RecordPoint Classification Intelligence extracts text and other signals from your content so you can classify documents at scale. Knowing which file types are processed helps you set expectations for what Classification Intelligence can analyse, and why Content Search can sometimes match terms in unexpected places (for example, inside packaged or compressed files).
Before you begin
Access
Access to the Classification Intelligence machine learning module may require an additional subscription, depending on your current licensing model. If you need access, contact your RecordPoint Account Manager.
Role required
To train a model, you need the Application Administrator or Records Manager role in RecordPoint.
How RecordPoint extracts text from files
When RecordPoint processes a file, it tries to extract text so it can be used by services that rely on content (including Classification Intelligence and Content Search).
RecordPoint uses a layered approach to extract text:
RecordPoint first tries its own extraction methods for common file types.
If no text is extracted, RecordPoint then uses the Apache Tika library to try to extract text.
Supported file types are based on this article and the formats supported by the content extraction libraries used by RecordPoint (including Apache Tika). See Apache Tika – Supported Document Formats for more information.
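The layered approach described above can be pictured as a simple fallback chain. The following is a minimal sketch only, not RecordPoint's implementation: `extract_text`, `native_extractor`, and `fallback_extractor` are hypothetical names, and the fallback function merely stands in for a general-purpose library such as Apache Tika.

```python
from typing import Callable

# An extractor takes raw file bytes and returns whatever text it could find.
Extractor = Callable[[bytes], str]


def extract_text(data: bytes, primary: Extractor, fallback: Extractor) -> str:
    """Try the primary extractor first; use the fallback only if no text came back."""
    text = primary(data)
    if text.strip():
        return text
    return fallback(data)


def native_extractor(data: bytes) -> str:
    # Hypothetical stand-in for a format-specific extractor:
    # it only understands valid UTF-8 text and returns nothing otherwise.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return ""


def fallback_extractor(data: bytes) -> str:
    # Hypothetical stand-in for a general-purpose library such as Apache Tika.
    # Latin-1 decoding never fails, so this always returns some text.
    return data.decode("latin-1")


if __name__ == "__main__":
    print(extract_text(b"hello", native_extractor, fallback_extractor))
    print(extract_text(b"\xff\xfe", native_extractor, fallback_extractor))
```

The point of the sketch is the ordering: the second extractor runs only when the first returns no text.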
Packaged and compressed files (archives)
Classification Intelligence supports packaged files such as ZIP archives (MIME type application/zip). When a packaged file is processed, the content extraction library can open the package and extract text from files inside it.
This can affect Content Search results:
A packaged file can appear in Content Search results if the search term exists in any file inside the package (even if the term is not visible when you open a “main” file in that package).
Packaged files can contain supporting files that are not part of your business content (for example, editor syntax or configuration files).
If you download and unzip a packaged file to investigate a match, search the extracted folder for file patterns that may contain the term (for example, *.vim) and then check the contents of those files.
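If you prefer to script this investigation, the check can be run directly against the downloaded ZIP file without unzipping it. The sketch below uses Python's standard `zipfile` and `fnmatch` modules; the function name `find_term_in_zip` and the sample file names are illustrative assumptions. It performs a raw, case-insensitive byte search, so it will not find terms inside members that are themselves compressed or binary formats (for example, a .docx file inside the archive).

```python
import fnmatch
import io
import zipfile


def find_term_in_zip(zip_source, term, pattern="*"):
    """Return names of archive members matching `pattern` whose bytes contain `term`."""
    needle = term.lower().encode("utf-8")
    matches = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Compare lower-cased names so "*.vim" also matches "FILE.VIM".
            if not fnmatch.fnmatchcase(info.filename.lower(), pattern.lower()):
                continue
            if needle in archive.read(info).lower():
                matches.append(info.filename)
    return matches


if __name__ == "__main__":
    # Build a small in-memory archive to demonstrate the search.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report.txt", "quarterly summary")
        archive.writestr("syntax/extra.vim", "keyword Invoice")
    print(find_term_in_zip(buffer, "invoice", "*.vim"))
```

Pass a file path instead of the in-memory buffer to search a real download, for example `find_term_in_zip("download.zip", "invoice", "*.vim")`.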
Supported text and document formats
The file types supported for content-based features, such as the Classification Intelligence module, are listed below. RecordPoint can still manage other file types using other classification techniques, such as rules. If you have questions about supported file types for your environment, contact your Account Manager or email support@recordpoint.com.
Supported file types
HTML documents:
text/html, application/vnd.wap.xhtml+xml, application/x-asp, application/xhtml+xml
HTTP:
application/x-httpresponse
XML files:
text/xml, application/xml
JSON files:
text/json, application/json
CSS stylesheets:
text/css
Plain text files:
text/plain
Batch files:
application/bat, application/x-bat, application/x-msdos-program
TextEdit documents:
application/textedit
CSV files:
text/csv, application/csv
Windows shortcuts:
application/x-ms-shortcut
Project files:
application/x-project
PHP scripts:
text/x-php
URL files:
text/x-url
Excel documents:
application/vnd.ms-excel, application/vnd.ms-excel.sheet.macroenabled.12, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
PowerPoint presentations:
application/vnd.openxmlformats-officedocument.presentationml.presentation
Word documents:
application/msword, application/vnd.ms-word.document.macroenabled.12, application/vnd.openxmlformats-officedocument.wordprocessingml.document
Outlook files:
application/vnd.ms-outlook
PDFs:
application/pdf
RTF documents:
application/rtf
Generic data files:
application/octet-stream
Email formats:
message/rfc822
Publisher files:
application/vnd.ms-publisher
OneNote files:
application/onenote; format=one
Packaged files:
application/zip
Book files:
application/epub+zip, application/x-ibooks+zip, application/x-fictionbook+xml
Open document formats:
application/vnd.oasis.opendocument.tika.flat.document, application/vnd.oasis.opendocument.flat.presentation, application/vnd.oasis.opendocument.flat.spreadsheet, application/vnd.oasis.opendocument.flat.text, application/x-vnd.oasis.opendocument.presentation, application/vnd.oasis.opendocument.chart, application/x-vnd.oasis.opendocument.text-web, application/x-vnd.oasis.opendocument.image, application/vnd.oasis.opendocument.graphics-template, application/x-vnd.oasis.opendocument.spreadsheet-template, application/vnd.oasis.opendocument.spreadsheet-template, application/vnd.sun.xml.writer, application/vnd.oasis.opendocument.graphics, application/vnd.oasis.opendocument.spreadsheet, application/x-vnd.oasis.opendocument.chart, application/x-vnd.oasis.opendocument.spreadsheet, application/vnd.oasis.opendocument.image, application/x-vnd.oasis.opendocument.text, application/x-vnd.oasis.opendocument.text-template, application/vnd.oasis.opendocument.formula-template, application/x-vnd.oasis.opendocument.formula, application/vnd.oasis.opendocument.image-template, application/x-vnd.oasis.opendocument.image-template, application/x-vnd.oasis.opendocument.presentation-template, application/vnd.oasis.opendocument.presentation-template, application/vnd.oasis.opendocument.text, application/vnd.oasis.opendocument.text-template, application/vnd.oasis.opendocument.chart-template, application/x-vnd.oasis.opendocument.chart-template, application/x-vnd.oasis.opendocument.formula-template, application/x-vnd.oasis.opendocument.text-master, application/vnd.oasis.opendocument.presentation, application/x-vnd.oasis.opendocument.graphics, application/vnd.oasis.opendocument.formula, application/vnd.oasis.opendocument.text-master
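If you keep your own inventory of file MIME types, you can compare it against the types listed above before expecting content-based classification to apply. The sketch below is illustrative only: `LISTED_SAMPLE` is a small hand-picked subset of this article's list, and `normalise_mime` and `is_listed` are hypothetical helper names. It assumes that case differences and parameters such as `charset` can be ignored when comparing, which may not match how RecordPoint itself compares types.

```python
# A small sample taken from the list in this article (not the full list).
LISTED_SAMPLE = frozenset({
    "text/html",
    "text/plain",
    "text/csv",
    "application/pdf",
    "application/zip",
    "message/rfc822",
})


def normalise_mime(mime: str) -> str:
    """Lower-case a MIME type and drop parameters such as '; charset=utf-8'."""
    return mime.split(";", 1)[0].strip().lower()


def is_listed(mime: str, listed=LISTED_SAMPLE) -> bool:
    """Return True if the normalised MIME type appears in the given collection."""
    return normalise_mime(mime) in listed


if __name__ == "__main__":
    for candidate in ("Text/HTML; charset=utf-8", "application/pdf", "video/mp4"):
        print(candidate, is_listed(candidate))
```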
Image file processing
RecordPoint supports processing various image formats. To optimise system performance, Image File Processing for Classification is not enabled by default.
To activate Image File Processing for Classification Intelligence, contact support so they can walk you through enabling this feature.
Supported image formats:
JPG/JPEG images:
image/jpg, image/jpeg
PNG images:
image/png
GIFs:
image/gif
TIFF images:
image/tif
BMP images:
image/bmp, image/x-ms-bmp
BPG images:
image/bpg, image/x-bpg
HEIF images:
image/heic-sequence, image/heif, image/heic, image/heif-sequence
ICNS images:
image/icns
JXL images:
image/jxl
PSD images:
image/vnd.adobe.photoshop
WebP images:
image/webp
EMF images:
image/emf
WMF images:
image/wmf
Other images:
image/vnd.wap.wbmp, image/x-jbig2, image/x-xcf, image/x-icon, image/svg+xml, image/vnd.dwg
RecordPoint supports PDFs, but image-only PDFs require Image File Processing, which is not enabled by default, to be fully accessible.
Audio and video file processing
Audio formats:
audio/mpeg, audio/mp4, audio/x-oggflac, audio/x-flac, audio/ogg, audio/x-oggpcm, audio/opus, audio/vorbis, audio/speex, audio/vnd.wave, audio/x-wav
Video formats:
image/wmf, video/avi, video/mpeg, video/x-msvideo, video/mp4, video/x-m4v, video/3gpp, video/3gpp2, video/quicktime
Email file types
RecordPoint identifies a file as an email when its metadata indicates a file name ending with .MSG or .OFT, regardless of the broader MIME type classification. This improves email classification.
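The file-name rule above amounts to a suffix check. As a rough illustration only, the helper below (a hypothetical `looks_like_email`, not a RecordPoint API) applies the same rule to a file name. Treating the comparison as case-insensitive is an assumption made for this sketch.

```python
# Extensions named in this article; matched case-insensitively (an assumption).
EMAIL_EXTENSIONS = (".msg", ".oft")


def looks_like_email(file_name: str) -> bool:
    """Return True if the file name ends with one of the email extensions."""
    return file_name.lower().endswith(EMAIL_EXTENSIONS)


if __name__ == "__main__":
    for name in ("Contract review.MSG", "template.oft", "notes.txt"):
        print(name, looks_like_email(name))
```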
Large files
Classification Intelligence has a default file size limit of 10 MB to maintain performance and system integrity.
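To estimate which files in a local export would exceed this default limit, a simple size check is enough. The sketch below is an assumption-laden illustration: it interprets 10 MB as 10 × 1024 × 1024 bytes and treats a file exactly at the limit as within it, neither of which is confirmed by this article. `exceeds_size_limit` and `file_exceeds_size_limit` are hypothetical helper names.

```python
import os

# Assumption: "10 MB" is interpreted here as 10 * 1024 * 1024 bytes.
DEFAULT_LIMIT_BYTES = 10 * 1024 * 1024


def exceeds_size_limit(size_bytes: int, limit_bytes: int = DEFAULT_LIMIT_BYTES) -> bool:
    """Return True if the size is strictly greater than the limit."""
    return size_bytes > limit_bytes


def file_exceeds_size_limit(path: str, limit_bytes: int = DEFAULT_LIMIT_BYTES) -> bool:
    """Check a file on disk against the limit using its size in bytes."""
    return exceeds_size_limit(os.path.getsize(path), limit_bytes)


if __name__ == "__main__":
    print(exceeds_size_limit(5 * 1024 * 1024))
    print(exceeds_size_limit(25 * 1024 * 1024))
```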