Duplicate identification lets users find records whose content is duplicated across all content sources managed by the platform. This highlights risks associated with the distribution of content and lets users take action to mitigate them.
Duplicate identification computes an MD5 hash of the binary content (extracted text) to fingerprint each item, then pairs that hash with the file size. Because both values must match, the combined fingerprint has a very low probability of producing a false match.
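A minimal Python sketch of the idea follows. It assumes the fingerprint is simply the pair (MD5 hex digest of the content bytes, file size in bytes) and that two items count as duplicates only when both values agree. `Fingerprint`, `fingerprint` and `is_duplicate` are hypothetical names for illustration, not part of the Records365 API, and the platform's actual storage and comparison may differ.

```python
import hashlib
from typing import NamedTuple


class Fingerprint(NamedTuple):
    """Hypothetical fingerprint: MD5 digest of the content paired with the file size."""
    md5: str
    size: int


def fingerprint(content: bytes, file_size: int) -> Fingerprint:
    # hexdigest() returns the 128-bit MD5 digest as a 32-character lowercase hex string.
    digest = hashlib.md5(content).hexdigest()
    return Fingerprint(md5=digest, size=file_size)


def is_duplicate(a: Fingerprint, b: Fingerprint) -> bool:
    # Tuple equality: both the digest and the size must match.
    return a == b


# Usage: two copies of the same CV stored in different content sources.
cv_bytes = b"Jane Doe - Curriculum Vitae"
in_sharepoint = fingerprint(cv_bytes, len(cv_bytes))
in_exchange = fingerprint(cv_bytes, len(cv_bytes))
print(is_duplicate(in_sharepoint, in_exchange))
```

Pairing the hash with the size means a false match would require two different items that share both the same size and the same MD5 digest.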
Viewing duplicates for a record
To view the duplicates for an individual record:
- Sign in to Records365.
- Go to the Browse page.
- Select any record to open the Records Details page.
- Select the Duplicates tab.
- Any duplicates of the record's binary content are displayed.
Note: documents created or modified before Intelligence Signaling was enabled have not been processed and therefore have no duplicate information.
Here you can see a number of duplicates of a CV, with content sources across SharePoint and Exchange:
Managing duplicates using file plans
To manage duplicates, create a dedicated classification and manually reclassify duplicate records to it.
- Create a classification called "Duplicate". You can assign it a disposal date, such as 1 week after creation, so that duplicate records become due for destruction.
- Open a record and go to the Duplicates tab.
- Select a duplicate and select Reclassify.
- Select the Duplicate classification.
- The record is reclassified and can then be searched and reported on using the Duplicate classification.
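The disposal rule in the first step is simple date arithmetic. The sketch below assumes the disposal date is counted from the record's creation date and that a record becomes due on that date. `DISPOSAL_PERIOD`, `disposal_due_date` and `is_due_for_disposal` are hypothetical names; Records365 applies the rule configured on the classification, not this code.

```python
from datetime import date, timedelta

# Hypothetical retention rule: dispose 1 week after the record was created.
DISPOSAL_PERIOD = timedelta(weeks=1)


def disposal_due_date(created: date) -> date:
    """Return the date on which a record classified as Duplicate becomes due for disposal."""
    return created + DISPOSAL_PERIOD


def is_due_for_disposal(created: date, today: date) -> bool:
    """A record is due once today reaches or passes its disposal date."""
    return today >= disposal_due_date(created)


# Usage: a duplicate created on 1 March 2024 becomes due on 8 March 2024.
print(disposal_due_date(date(2024, 3, 1)))
```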
Duplicate reports
You can use the Signal Intelligence Report to discover duplicates across your data landscape.
For more information, see Enterprise Reporting & Analytics.
Download the Signal Intelligence Report
- Sign in to Records365.
- Go to the Administration page by clicking the cog wheel on the top bar.
- Select Reporting from the left-hand navigation menu.
- Select the Enterprise tab.
- Download the Power BI file named Signal Intelligence Report.
Read Enterprise Reporting & Analytics to learn how to connect your report to your tenant, work with Power BI, and customize your report.
The report contains two tabs, Duplicates and Duplicates details, which you can use to identify duplicates across your connected sources. The report includes:
- The total number of items under management in RecordPoint.
- The total number of duplicates.
- A gauge that shows the number of duplicates relative to the total number of items. You can use it to set up Power BI automation or alerts.
- A filter for file size.
- A filter for content source.
- A grid that groups duplicates so you can explore them.
To investigate a group of duplicates, select its row in the grid, then right-click and drill into the detail report.
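The grouping, gauge and filters described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the Power BI report is built: `Item`, `group_duplicates`, `duplicate_ratio` and `filter_items` are hypothetical names, items are grouped by the (MD5, size) fingerprint, and "duplicates" is assumed to mean every copy beyond the first in each group.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical record row as it might appear in the report's data."""
    title: str
    content_source: str
    md5: str
    size: int


def group_duplicates(items: list[Item]) -> dict[tuple[str, int], list[Item]]:
    """Group items by (md5, size), keeping only groups that contain more than one item."""
    groups: dict[tuple[str, int], list[Item]] = defaultdict(list)
    for item in items:
        groups[(item.md5, item.size)].append(item)
    return {key: members for key, members in groups.items() if len(members) > 1}


def duplicate_ratio(items: list[Item]) -> float:
    """Gauge value: extra copies (all but the first in each group) divided by total items."""
    if not items:
        return 0.0
    extra_copies = sum(len(members) - 1 for members in group_duplicates(items).values())
    return extra_copies / len(items)


def filter_items(items: list[Item], content_source: str | None = None,
                 min_size: int | None = None, max_size: int | None = None) -> list[Item]:
    """Apply the optional content source and file size filters before grouping."""
    return [
        i for i in items
        if (content_source is None or i.content_source == content_source)
        and (min_size is None or i.size >= min_size)
        and (max_size is None or i.size <= max_size)
    ]


# Usage: three copies of a CV across SharePoint and Exchange, plus one unique file.
items = [
    Item("CV.docx", "SharePoint", "aaa", 100),
    Item("CV.docx", "Exchange", "aaa", 100),
    Item("CV (1).docx", "SharePoint", "aaa", 100),
    Item("Budget.xlsx", "SharePoint", "bbb", 200),
]
print(duplicate_ratio(items))
```

Here the filters run before grouping, so filtering to one content source shows only duplicates within that source; a report could instead show any group that contains an item from the selected source.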