The SCC OCR Text Recognition Module for SCC MediaServer Digital Asset Management (DAM) systems uses artificial intelligence and machine learning services to recognize and extract text from scanned documents, with the added benefit of automatic column and article detection.
Each page scan takes only seconds to process: bitmap text within the scanned page is automatically detected, analyzed and returned to the database record as fully searchable text.
The module attempts to recognize the article shapes and columns in each document. When it succeeds, it returns all extracted words in logical, human-readable order, so that phrases (sentences, paragraphs) can be matched correctly.
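To illustrate why column detection matters for phrase matching, here is a minimal sketch. It is not SCC's algorithm: the `Word` box format, the `gap` threshold and the simple interval-merging heuristic are assumptions made for illustration only. Real newspaper layouts, with headlines spanning columns, photos and continued articles, need far more than this. The sketch compares naive row-by-row reading across the whole page with reading each detected column top to bottom.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical OCR word box: text plus its position on the page.
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical position of the text line


def naive_order(words):
    """Read row by row across the full page width, ignoring columns."""
    return [w.text for w in sorted(words, key=lambda w: (w.y, w.x0))]


def detect_columns(words, gap=20.0):
    """Merge horizontal word extents into column intervals.

    Assumption: two extents belong to the same column if they overlap
    or are separated by no more than `gap` units.
    """
    columns = []
    for x0, x1 in sorted((w.x0, w.x1) for w in words):
        if columns and x0 - columns[-1][1] <= gap:
            columns[-1][1] = max(columns[-1][1], x1)  # extend current column
        else:
            columns.append([x0, x1])  # start a new column
    return columns


def column_order(words, gap=20.0):
    """Read each detected column top to bottom, columns left to right."""
    ordered = []
    for left, right in detect_columns(words, gap):
        in_col = [w for w in words if left <= w.x0 <= right]
        ordered.extend(w.text for w in sorted(in_col, key=lambda w: (w.y, w.x0)))
    return ordered


# Hypothetical two-column page: each column holds a separate story.
words = [
    Word("Mayor", 0, 50, 0), Word("wins", 60, 100, 0),
    Word("second", 0, 60, 10), Word("term", 70, 110, 10),
    Word("Budget", 200, 260, 0), Word("passes", 270, 330, 0),
    Word("city", 200, 240, 10), Word("council", 250, 320, 10),
]

print(" ".join(naive_order(words)))   # Mayor wins Budget passes second term city council
print(" ".join(column_order(words)))  # Mayor wins second term Budget passes city council
```

With naive ordering, a search for the phrase "wins second term" fails because the two stories are interleaved line by line; with column-aware ordering the phrase is intact.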
The following scanned documents were kindly provided by SCC customer The Globe and Mail (Toronto, Canada). The corresponding text extract can be viewed for each document.
Scanned document: GM_Aug17_1979_A1.JPG
Extracted text: GM_Aug17_1979_A1.TXT
Scanned document: GM_May01_1987_A1.JPG
Extracted text: GM_May01_1987_A1.TXT
Scanned document: GM_Sep05_1972_A1.JPG
Extracted text: GM_Sep05_1972_A1.TXT
SCC OCR Text Recognition Module Features
- Identify and extract (OCR) text from scanned documents
- Automatically recognize article shapes and columns in each document
- Returns all extracted words in logical, human-readable order
- Allows phrases (sentences, paragraphs) to be searched and matched correctly