RecordPoint Supported File Types for Classification Intelligence Module

RecordPoint Classification Intelligence extracts text and other signals from your content so you can classify documents at scale. Knowing which file types are processed helps you set expectations for what Classification Intelligence can analyse, and why Content Search can sometimes match terms in unexpected places (for example, inside packaged or compressed files).

Before you begin

Access

Access to the Classification Intelligence machine learning module may require an additional subscription, depending on your current licensing model. If you need access, contact your RecordPoint Account Manager.

Role required

To train a model, you need the Application Administrator or Records Manager role in RecordPoint.

How RecordPoint extracts text from files

When RecordPoint processes a file, it tries to extract text so it can be used by services that rely on content (including Classification Intelligence and Content Search).

RecordPoint uses a layered approach to extract text:

  1. RecordPoint first tries its own extraction methods for common file types.

  2. If no text is extracted, RecordPoint then uses the Apache Tika library to try to extract text.
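The layered approach above can be pictured as a simple fallback chain. The following Python sketch is illustrative only and is not RecordPoint's implementation: the names extract_text, native_extractor, and fallback_extractor are hypothetical, and the second extractor merely stands in for a library such as Apache Tika.

```python
from typing import Callable, List, Optional

Extractor = Callable[[bytes], Optional[str]]


def extract_text(data: bytes, extractors: List[Extractor]) -> Optional[str]:
    """Try each extractor in order and return the first non-empty text."""
    for extractor in extractors:
        try:
            text = extractor(data)
        except Exception:
            text = None  # treat a failed extractor as "no text extracted"
        if text and text.strip():
            return text
    return None


def native_extractor(data: bytes) -> Optional[str]:
    # Hypothetical first-pass extractor: only handles valid UTF-8 text.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def fallback_extractor(data: bytes) -> Optional[str]:
    # Hypothetical stand-in for a library fallback; decodes leniently.
    return data.decode("latin-1")
```

The fallback only runs when the first extractor returns nothing, which mirrors the order of the two steps described above.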

Supported file types are based on this article and the formats supported by the content extraction libraries used by RecordPoint (including Apache Tika). See Apache Tika – Supported Document Formats for more information.

Packaged and compressed files (archives)

Classification Intelligence supports packaged files such as ZIP archives (MIME type application/zip). When a packaged file is processed, the content extraction library can open the package and extract text from files inside it.

This can affect Content Search results:

  • A packaged file can appear in Content Search results if the search term exists in any file inside the package (even if the term is not visible when you open a “main” file in that package).

  • Packaged files can contain supporting files that are not part of your business content (for example, editor syntax or configuration files).

If you download and unzip a packaged file to investigate a match, search the extracted folder for file patterns that may contain the term (for example, *.vim) and then check the contents of those files.
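If you prefer to script this investigation, the sketch below lists which files inside a ZIP archive contain a term. It is a minimal example that uses only Python's standard zipfile module; the function name find_term_in_zip is hypothetical, and it compares raw bytes, so it will not find terms inside members that are themselves compressed (for example, .docx files).

```python
import zipfile


def find_term_in_zip(zip_path, term):
    """Return names of archive members whose raw bytes contain the term.

    The comparison is case-insensitive for ASCII characters.
    """
    needle = term.lower().encode("utf-8")
    matches = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries
            with archive.open(info) as member:
                if needle in member.read().lower():
                    matches.append(info.filename)
    return matches
```

For example, find_term_in_zip("download.zip", "invoice") returns the paths of the matching files inside the archive, which you can then open and inspect.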

Supported text and document formats

The file types supported for content-based features, such as the Classification Intelligence module, are listed below. RecordPoint can still manage other file types using other classification techniques, such as rules. If you have questions about supported file types for your environment, contact your Account Manager or email support@recordpoint.com.

Supported file types

  • HTML documents: text/html, application/vnd.wap.xhtml+xml, application/x-asp, application/xhtml+xml

  • HTTP: application/x-httpresponse

  • XML files: text/xml, application/xml

  • JSON files: text/json, application/json

  • CSS stylesheets: text/css

  • Plain text files: text/plain

  • Batch files: application/bat, application/x-bat, application/x-msdos-program

  • TextEdit documents: application/textedit

  • CSV files: text/csv, application/csv

  • Windows shortcuts: application/x-ms-shortcut

  • Project files: application/x-project

  • PHP scripts: text/x-php

  • URL files: text/x-url

  • Excel documents: application/vnd.ms-excel, application/vnd.ms-excel.sheet.macroenabled.12, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

  • PowerPoint presentations: application/vnd.openxmlformats-officedocument.presentationml.presentation

  • Word documents: application/msword, application/vnd.ms-word.document.macroenabled.12, application/vnd.openxmlformats-officedocument.wordprocessingml.document

  • Outlook files: application/vnd.ms-outlook

  • PDFs: application/pdf

  • RTF documents: application/rtf

  • Generic data files: application/octet-stream

  • Email formats: message/rfc822

  • Publisher files: application/vnd.ms-publisher

  • OneNote files: application/onenote; format=one

  • Packaged files: application/zip

  • Book files: application/epub+zip, application/x-ibooks+zip, application/x-fictionbook+xml

  • Open document formats:

    application/vnd.oasis.opendocument.tika.flat.document,

    application/vnd.oasis.opendocument.flat.presentation,

    application/vnd.oasis.opendocument.flat.spreadsheet,

    application/vnd.oasis.opendocument.flat.text,

    application/x-vnd.oasis.opendocument.presentation,

    application/vnd.oasis.opendocument.chart,

    application/x-vnd.oasis.opendocument.text-web,

    application/x-vnd.oasis.opendocument.image,

    application/vnd.oasis.opendocument.graphics-template,

    application/x-vnd.oasis.opendocument.spreadsheet-template,

    application/vnd.oasis.opendocument.spreadsheet-template,

    application/vnd.sun.xml.writer,

    application/vnd.oasis.opendocument.graphics,

    application/vnd.oasis.opendocument.spreadsheet,

    application/x-vnd.oasis.opendocument.chart,

    application/x-vnd.oasis.opendocument.spreadsheet,

    application/vnd.oasis.opendocument.image,

    application/x-vnd.oasis.opendocument.text,

    application/x-vnd.oasis.opendocument.text-template,

    application/vnd.oasis.opendocument.formula-template,

    application/x-vnd.oasis.opendocument.formula,

    application/vnd.oasis.opendocument.image-template,

    application/x-vnd.oasis.opendocument.image-template,

    application/x-vnd.oasis.opendocument.presentation-template,

    application/vnd.oasis.opendocument.presentation-template,

    application/vnd.oasis.opendocument.text,

    application/vnd.oasis.opendocument.text-template,

    application/vnd.oasis.opendocument.chart-template,

    application/x-vnd.oasis.opendocument.chart-template,

    application/x-vnd.oasis.opendocument.formula-template,

    application/x-vnd.oasis.opendocument.text-master,

    application/vnd.oasis.opendocument.presentation,

    application/x-vnd.oasis.opendocument.graphics,

    application/vnd.oasis.opendocument.formula,

    application/vnd.oasis.opendocument.text-master
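
To pre-check whether a file is likely to be covered by the list above, you could compare its MIME type against the listed values. The sketch below is an assumption-laden example, not a RecordPoint tool: the names SUPPORTED_MIME_TYPES and is_listed_type are hypothetical, the set contains only a small sample of the MIME types listed above, and Python's mimetypes module guesses the type from the file extension rather than from the file's content.

```python
import mimetypes

# A small sample of the MIME types listed in this article (not the full list).
SUPPORTED_MIME_TYPES = {
    "application/pdf",
    "application/zip",
    "application/json",
    "application/rtf",
    "text/plain",
    "text/html",
    "text/csv",
}


def is_listed_type(file_name: str) -> bool:
    """Guess the MIME type from the file extension and check the sample set."""
    mime_type, _encoding = mimetypes.guess_type(file_name)
    return mime_type in SUPPORTED_MIME_TYPES
```

A file with an unrecognised extension yields no MIME type and is reported as not listed.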

Image file processing

RecordPoint supports processing various image formats. To optimise system performance, Image File Processing for Classification is disabled by default.

To activate Image File Processing for Classification Intelligence, contact support so they can walk you through enabling this feature.

Supported image formats:

  • JPG/JPEG images: image/jpg, image/jpeg

  • PNG images: image/png

  • GIFs: image/gif

  • TIFF images: image/tif

  • BMP images: image/bmp, image/x-ms-bmp

  • BPG images: image/bpg, image/x-bpg

  • HEIF images: image/heic-sequence, image/heif, image/heic, image/heif-sequence

  • ICNS images: image/icns

  • JXL images: image/jxl

  • PSD images: image/vnd.adobe.photoshop

  • WebP images: image/webp

  • EMF images: image/emf

  • WMF images: image/wmf

  • Other images: image/vnd.wap.wbmp, image/x-jbig2, image/x-xcf, image/x-icon, image/svg+xml, image/vnd.dwg

RecordPoint supports PDFs, but image-only PDFs (PDFs with no text layer) require Image File Processing to be enabled before their content is fully accessible.

Audio and video file processing

  • Audio formats: audio/mpeg, audio/mp4, audio/x-oggflac, audio/x-flac, audio/ogg, audio/x-oggpcm, audio/opus, audio/vorbis, audio/speex, audio/vnd.wave, audio/x-wav

  • Video formats: video/avi, video/mpeg, video/x-msvideo, video/mp4, video/x-m4v, video/3gpp, video/3gpp2, video/quicktime

Email file types

RecordPoint identifies a file as an email when its metadata indicates a file name ending with .MSG or .OFT, regardless of the broader MIME type detected for the file. This improves email classification.
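
The file name rule above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not RecordPoint's code; the function name looks_like_email is hypothetical, and treating the check as case-insensitive is an assumption.

```python
def looks_like_email(file_name: str) -> bool:
    """Return True if the file name ends with .msg or .oft.

    Assumption: the comparison ignores letter case.
    """
    return file_name.lower().endswith((".msg", ".oft"))
```

Under this sketch, "Quote.MSG" and "template.oft" are treated as emails, while "notes.txt" is not.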

Large files

Classification Intelligence has a default file size limit of 10 MB to maintain performance and system integrity.
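
To find files that may exceed the default limit before you submit them, a quick local check can help. The sketch below is an assumption-based example: the names DEFAULT_LIMIT_BYTES and exceeds_size_limit are hypothetical, and the limit is interpreted as 10 × 1024 × 1024 bytes, which this article does not specify.

```python
import os

# Assumption: the limit is read as 10 * 1024 * 1024 bytes; the article
# does not say whether the unit is binary or decimal megabytes.
DEFAULT_LIMIT_BYTES = 10 * 1024 * 1024


def exceeds_size_limit(path: str, limit_bytes: int = DEFAULT_LIMIT_BYTES) -> bool:
    """Return True if the file at path is larger than limit_bytes."""
    return os.path.getsize(path) > limit_bytes
```

If your environment uses a different limit, pass it as limit_bytes.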
