Data Deduplication

Setting up automated data deduplication in CYBERQUEST takes three steps:

Step 1: Configure Deduplication settings at agent level:

The parameters used for deduplication are:

"deduplicateData": false, // deduplication is disabled by default
"deduplicateWindow": 300, // in seconds; deduplication window for an individual event
"deduplicateMaxWindow": 600, // in seconds; maximum event window
"deduplicateMaxCount": 1000, // maximum number of events per deduplicated event
"deduplicateDropShortTermStorage": false, // if true, duplicate events are dropped from shortTermDataStorage; false by default
"deduplicateDropLongTermStorage": false // if true, duplicate events are dropped from longTermDataStorage
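
The way the three limits could interact can be sketched in Python. This is only one plausible reading of the parameters, not CYBERQUEST's actual implementation. It assumes that `deduplicateWindow` is the maximum gap between two consecutive identical events, that `deduplicateMaxWindow` is the maximum span measured from the first event of a set, and that `deduplicateMaxCount` caps the number of events per set. The names `DuplicateSet`, `message_hash` and `is_duplicate` are hypothetical, and hashing the raw message with SHA-256 is an illustrative choice.

```python
import hashlib
from dataclasses import dataclass

# Values mirror the agent settings above; the logic is an assumed
# interpretation of them, not the product's real algorithm.
DEDUPLICATE_WINDOW = 300      # seconds allowed between consecutive identical events
DEDUPLICATE_MAX_WINDOW = 600  # seconds allowed since the first event of a set
DEDUPLICATE_MAX_COUNT = 1000  # maximum number of events in one set


@dataclass
class DuplicateSet:
    first_seen: float
    last_seen: float
    count: int = 1


def message_hash(message: str) -> str:
    """Hypothetical hash of the raw message text."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def is_duplicate(open_sets: dict, message: str, timestamp: float) -> bool:
    """Return True if the event joins an open set, False if it starts a new one."""
    key = message_hash(message)
    current = open_sets.get(key)
    if (
        current is not None
        and timestamp - current.last_seen <= DEDUPLICATE_WINDOW
        and timestamp - current.first_seen <= DEDUPLICATE_MAX_WINDOW
        and current.count < DEDUPLICATE_MAX_COUNT
    ):
        current.last_seen = timestamp
        current.count += 1
        return True
    # First occurrence, or one of the limits was exceeded: start a new set.
    open_sets[key] = DuplicateSet(first_seen=timestamp, last_seen=timestamp)
    return False


if __name__ == "__main__":
    open_sets = {}
    # The event at t=700 arrives 350 s after the previous one, so it starts a new set.
    print([is_duplicate(open_sets, "login failed", t) for t in (0, 100, 350, 700)])
    # prints [False, True, True, False]
```

Under these assumptions, a steady stream of identical events is split into a new set every time `deduplicateMaxWindow` or `deduplicateMaxCount` is reached, even if no gap ever exceeds `deduplicateWindow`.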

Step 2: Activate deduplication at Data Source level

Step 3: Watch for deduplicated events in Browser Module:

FirstSeen - when the first message in the duplicate dataset appeared;

LastSeen - when the last message in the duplicate dataset appeared;

DuplicateCount - how many times the message was duplicated;

DuplicationHash - hash used to match messages with one another (it is the common element shared by duplicate messages);

isDuplicate - appears for all messages detected as duplicate, except for the first message (true if the message is duplicate, otherwise false);

isLastDuplicate - true only on the last duplicate message, false on all others.
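
As an illustration of how these fields can be consumed, the following Python sketch groups events by `DuplicationHash` and counts the ones flagged `isDuplicate`. The sample records and the helper `summarize_duplicates` are hypothetical. Only the field names come from the list above, and the sketch assumes events have been exported as dictionaries keyed by those names.

```python
from collections import defaultdict

# Hypothetical exported events; only the field names are taken from the product.
events = [
    {"Message": "login failed", "DuplicationHash": "a1", "isDuplicate": False, "isLastDuplicate": False},
    {"Message": "login failed", "DuplicationHash": "a1", "isDuplicate": True, "isLastDuplicate": False},
    {"Message": "login failed", "DuplicationHash": "a1", "isDuplicate": True, "isLastDuplicate": True},
    {"Message": "disk full", "DuplicationHash": "b2", "isDuplicate": False, "isLastDuplicate": False},
]


def summarize_duplicates(records):
    """Group events by DuplicationHash and count those flagged as duplicates."""
    summary = defaultdict(lambda: {"total": 0, "duplicates": 0, "closed": False})
    for event in records:
        entry = summary[event["DuplicationHash"]]
        entry["total"] += 1
        if event.get("isDuplicate"):
            entry["duplicates"] += 1
        if event.get("isLastDuplicate"):
            # The last duplicate of the set has been seen.
            entry["closed"] = True
    return dict(summary)


if __name__ == "__main__":
    for dup_hash, stats in summarize_duplicates(events).items():
        print(dup_hash, stats)
```

Because the first message of a set is not flagged `isDuplicate`, the `duplicates` count here is one less than `total` for any set that contains repeats.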

Manual data deduplication is also possible, in order to mark events as duplicates: see Events Manual Deduplication.