Duplicate identification lets users see whether their records are duplicated across any of the content sources managed by the platform. This helps highlight risks associated with the distribution of content, so that users can take action to mitigate them.
Duplicate identification uses an MD5 hash of the record's binary content (extracted text) as a fingerprint, and pairs it with the file size so that the probability of a false match is very low.
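As an illustration of the fingerprinting idea, the following Python sketch pairs an MD5 digest with a file size. It is a minimal sketch under stated assumptions, not RecordPoint's implementation: the helper names `fingerprint` and `is_duplicate` are hypothetical, and the caller is assumed to supply both the bytes to hash and the file size.

```python
import hashlib
from typing import Tuple

# Hypothetical fingerprint shape: (MD5 hex digest, file size in bytes)
Fingerprint = Tuple[str, int]


def fingerprint(content: bytes, file_size: int) -> Fingerprint:
    """Pair the MD5 hex digest of the hashed content with the file size (hypothetical helper)."""
    return hashlib.md5(content).hexdigest(), file_size


def is_duplicate(a: Fingerprint, b: Fingerprint) -> bool:
    """Two items match only if both the digest and the size are equal."""
    return a == b


print(fingerprint(b"hello", 5))  # ('5d41402abc4b2a76b9719d911017c592', 5)
```

Two items are treated as duplicates only when both parts of the pair match. MD5 is not collision-resistant against deliberate attacks, so a fingerprint like this suits detecting accidental duplicates rather than adversarial ones.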
The RecordPoint Intelligence Signaling module is available for early access on request. Get free access to the Intelligence Signaling module until May 31st, 2023.
Viewing duplicates for a record
To view the duplicates for an individual record:
- Sign in to Records365.
- Go to the Browse page.
- Select any record to open the Records Details page.
- Select the Duplicates tab.
- Any duplicates of the record's binary content are displayed.
Please note that documents created or modified before Intelligence Signaling was enabled have not been processed and therefore have no duplicate information.
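The lookup behind a Duplicates tab can be sketched as an index from fingerprint to records. This is a hypothetical model, not the Records365 data model: `Record`, `build_index`, and `duplicates_of` are illustrative names, the digests are placeholders, and a fingerprint of `None` stands in for a record that was never processed because it predates Intelligence Signaling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Fingerprint = Tuple[str, int]  # (MD5 hex digest, file size)


@dataclass(frozen=True)
class Record:
    record_id: str
    content_source: str                 # e.g. "SharePoint" or "Exchange"
    fingerprint: Optional[Fingerprint]  # None: never processed, so no duplicate info


def build_index(records: List[Record]) -> Dict[Fingerprint, List[Record]]:
    """Group processed records by fingerprint; unprocessed records are skipped."""
    index: Dict[Fingerprint, List[Record]] = defaultdict(list)
    for rec in records:
        if rec.fingerprint is not None:
            index[rec.fingerprint].append(rec)
    return index


def duplicates_of(record: Record, index: Dict[Fingerprint, List[Record]]) -> List[Record]:
    """Return every other record that shares this record's fingerprint."""
    if record.fingerprint is None:
        return []  # no duplicate information for unprocessed records
    return [r for r in index.get(record.fingerprint, []) if r.record_id != record.record_id]


# Example with placeholder digests: a CV stored in SharePoint and also in Exchange.
cv_sharepoint = Record("r1", "SharePoint", ("hash-a", 2048))
cv_exchange = Record("r2", "Exchange", ("hash-a", 2048))
other_doc = Record("r3", "SharePoint", ("hash-b", 512))
old_doc = Record("r4", "Exchange", None)

index = build_index([cv_sharepoint, cv_exchange, other_doc, old_doc])
print([r.content_source for r in duplicates_of(cv_sharepoint, index)])  # ['Exchange']
```

Keying the index on the full (digest, size) pair means finding a record's duplicates is a single dictionary lookup, regardless of which content source each copy lives in.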
In the example below, the Duplicates tab shows a number of duplicates of a CV, with content sources across SharePoint and Exchange:
Managing duplicates using file plans
To manage duplicates, you can create a dedicated classification and manually reclassify duplicate records into it.
- Create a classification called "Duplicate". Optionally, assign it a disposal date, such as 1 week after creation, to make it easier to destroy duplicate records.
- Open a record and go to the Duplicates tab.
- Select a duplicate and select Reclassify.
- Select the Duplicate classification.
- The record is reclassified and can then be searched and reported on using the Duplicate classification.
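The reclassification steps above can be modelled in a few lines. This is a hypothetical sketch, not the Records365 API: `Classification`, `Record`, `reclassify`, `in_classification`, and the `disposal_after` offset are assumptions chosen to mirror a "Duplicate" classification whose disposal date falls one week after creation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Classification:
    name: str
    disposal_after: Optional[timedelta] = None  # hypothetical: offset from record creation


@dataclass
class Record:
    record_id: str
    created: datetime
    classification: Optional[Classification] = None
    disposal_date: Optional[datetime] = None


def reclassify(record: Record, target: Classification) -> None:
    """Move a record into the target classification and recompute its disposal date."""
    record.classification = target
    if target.disposal_after is None:
        record.disposal_date = None
    else:
        record.disposal_date = record.created + target.disposal_after


def in_classification(records: List[Record], name: str) -> List[Record]:
    """Find records by classification name, e.g. to report on all duplicates."""
    return [r for r in records if r.classification is not None and r.classification.name == name]


duplicate = Classification("Duplicate", disposal_after=timedelta(weeks=1))
original = Record("r1", created=datetime(2023, 3, 1))
dupe = Record("r2", created=datetime(2023, 3, 1))

reclassify(dupe, duplicate)  # only the duplicate is reclassified; the original is kept
print(dupe.disposal_date)    # 2023-03-08 00:00:00
print([r.record_id for r in in_classification([original, dupe], "Duplicate")])  # ['r2']
```

Attaching the disposal offset to the classification rather than to each record means every record moved into "Duplicate" picks up the same schedule.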