Accuracy
The measurement of accuracy for all stages of the document conversion process is typically calculated as follows:
Accuracy (%) = (Total Opportunities - Missed/Incorrect Opportunities) / Total Opportunities × 100
The industry standard for accuracy of data capture is 95%. Off-shore data entry vendors often guarantee accuracy as high as 99.95%. The industry standard for scanning is 100% for capture of originals and 99% for images that match the quality of the original document.
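As a minimal sketch, the accuracy calculation can be written as a small Python function. The function and parameter names are illustrative and not part of any vendor's tooling.

```python
def accuracy_percent(total_opportunities: int, missed_or_incorrect: int) -> float:
    """Return accuracy as a percentage of total opportunities."""
    if total_opportunities <= 0:
        raise ValueError("total_opportunities must be positive")
    if not 0 <= missed_or_incorrect <= total_opportunities:
        raise ValueError("missed_or_incorrect must be between 0 and total_opportunities")
    correct = total_opportunities - missed_or_incorrect
    return 100.0 * correct / total_opportunities


# Example: 10,000 opportunities with 5 missed or incorrect entries.
print(accuracy_percent(10_000, 5))  # 99.95
```

With these inputs, 5 errors in 10,000 opportunities corresponds to the 99.95% figure quoted above.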
Backfile Conversion
The process of scanning, indexing and storing a large backlog of documents on an imaging system. This type of conversion is typically used to eliminate the need for costly storage space, reduce the time spent filing and re-filing documents, and reduce the occurrence of misfiled documents.
Boolean Search
Search strategy for selecting information that uses AND, OR, NOT functions.
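One way to picture the three operators is as set operations over a keyword index. The sketch below assumes a tiny, made-up index that maps each keyword to the IDs of the documents containing it; real retrieval systems use far more elaborate structures.

```python
# Hypothetical keyword index: keyword -> set of document IDs.
index = {
    "contract": {1, 2, 4},
    "invoice": {2, 3},
    "draft": {4},
}


def docs_with(term: str) -> set:
    """Return the IDs of documents indexed under the given keyword."""
    return index.get(term, set())


# contract AND invoice: documents containing both keywords.
both = docs_with("contract") & docs_with("invoice")

# contract OR invoice: documents containing at least one keyword.
either = docs_with("contract") | docs_with("invoice")

# contract NOT draft: documents containing "contract" but not "draft".
not_draft = docs_with("contract") - docs_with("draft")

print(sorted(both))       # [2]
print(sorted(either))     # [1, 2, 3, 4]
print(sorted(not_draft))  # [1, 2]
```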
CCITT Groups III & IV
A raster image compression format designed to be used for facsimile transmission but also used in other image processing systems.
CD-ROM
Compact Disc-Read-Only Memory. A type of optical disk capable of storing large amounts of data -- up to 1GB, although the most common size is 650MB. A single CD-ROM has the storage capacity of 700 floppy disks.
Character Recognition
The ability of a machine to read human-readable text.
Coding Opportunity
A value to be coded into a given field as agreed upon by the vendor and client. A field that is blank because the document contained no data pertaining to that field is a correctly coded field, or 1 correct coding opportunity; that is, it is 100% accurate. However, a field left blank because it is not meant to be coded due to the level of the document is not a coding opportunity. A letter with 5 authors has 5 coding opportunities in the Author field. If 4 are correctly coded and the 5th is omitted or coded incorrectly, then the Author field has 4 correct entries and 1 error; that is, it is 80% accurate.
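The author example can be checked with a short Python sketch. It is a simplification under stated assumptions: each expected value counts as one opportunity, a match must be exact, and extra values that should not have been coded are ignored. The function name and sample author names are hypothetical.

```python
def field_accuracy(expected: list, coded: list) -> float:
    """Percentage of coding opportunities in one field that were coded correctly.

    A field with no expected values that was correctly left blank counts as
    one correct opportunity (100%).
    """
    if not expected:
        return 100.0 if not coded else 0.0
    correct = sum(1 for value in expected if value in coded)
    return 100.0 * correct / len(expected)


# A letter with 5 authors: 4 coded correctly, the 5th omitted.
expected_authors = ["Adams", "Baker", "Clark", "Davis", "Evans"]
coded_authors = ["Adams", "Baker", "Clark", "Davis"]
print(field_accuracy(expected_authors, coded_authors))  # 80.0

# A field correctly left blank because the document had no data for it.
print(field_accuracy([], []))  # 100.0
```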
Compact Disc (CD)
A standard medium for storage of digital data in a machine-readable form, accessible with a laser-based reader. CDs are faster and more accurate than magnetic tape for data storage.
Compression
A process that "shrinks" an image so that it occupies less storage space and can be transmitted more quickly and easily.
Contextual Search
To locate documents stored in a system by searching for text that appears in them, rather than by searching for them by file name or other indexing technique.
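A minimal sketch of the idea, assuming the document text is already available in memory (for example, as OCR output). The file names and contents below are made up for illustration, and production systems would normally use a full-text index rather than a linear scan.

```python
# Hypothetical store: file name -> extracted document text.
documents = {
    "letter_001.txt": "Please find the signed contract attached.",
    "memo_017.txt": "Meeting moved to Tuesday.",
    "letter_002.txt": "The contract renewal is due in March.",
}


def contextual_search(docs: dict, phrase: str) -> list:
    """Return the names of documents whose text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return sorted(name for name, text in docs.items() if needle in text.lower())


print(contextual_search(documents, "contract"))
# ['letter_001.txt', 'letter_002.txt']
```

Note that neither matching file has "contract" in its name; the match comes from the text inside the document.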
Database
(1) A collection of information organized in such a way that a computer program can quickly select desired pieces of data. Traditional databases are organized by fields, records and files. A field is a single piece of information; a record is one complete set of fields; and a file is a collection of records.
(2) A program that manages data, and can be used to store, retrieve, and sort information.
Day Forward
The process of scanning, indexing and storing documents on an imaging system as they are produced or received in the normal course of business.
De-skewing
The adjustment made to an image to make up for physical distortions inherent in the system or the adjustment made to an image to compensate for justification errors in scanning.
Decompression
The process of reversing the procedure conducted by compression software or hardware, thereby returning compressed data to its original size and condition.
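The compress and decompress round trip can be sketched with Python's standard zlib module. This uses general-purpose lossless DEFLATE compression as a stand-in, not the CCITT fax formats described in this glossary, and the byte string is a made-up placeholder for scanned image data.

```python
import zlib

# Stand-in for a mostly uniform scanned page: long runs of identical bytes.
original = b"\x00" * 5000 + b"\xff" * 5000

compressed = zlib.compress(original)      # "shrink" the data
restored = zlib.decompress(compressed)    # reverse the procedure

print(len(original), len(compressed))
print(restored == original)  # True
```

Because the method is lossless, the restored data is byte-for-byte identical to the original.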
Deskew
To straighten out a crooked image. Improves OCR accuracy and reduces image file size.
Despeckle
An image processing (clean-up) operation that removes random specks, called flyspecks or pepper, from an image to improve legibility, OCR accuracy and compression.
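As a rough illustration, the sketch below removes isolated black pixels from a tiny bitonal image stored as a list of rows (1 = black, 0 = white). The rule used here, dropping any black pixel with no black neighbor, is an assumption chosen for simplicity; commercial despeckle filters use more sophisticated criteria such as speck size thresholds.

```python
def despeckle(image: list) -> list:
    """Return a copy of a bitonal image with isolated black pixels removed."""
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Look at the (up to) 8 surrounding pixels in the original image.
            has_black_neighbor = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbor:
                cleaned[y][x] = 0
    return cleaned


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speck
    [0, 0, 0, 1, 1],  # part of a pen stroke
    [0, 0, 0, 1, 0],
]
for row in despeckle(page):
    print(row)
```

The speck in the second row is cleared, while the connected pixels of the stroke are left untouched.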
Digital Audio Tape (DAT)
A technology that records noise-free digital data on magnetic tape. Generally used for audio, a DAT cassette can hold up to two gigabytes when adapted for data storage.
Digital Linear Tape (DLT)
A technology, designed by DEC and sold to Quantum, used for backing up huge amounts of data (up to 35 GB per tape without compression, 70 GB with compression).
Directory Structure
Hierarchical file management used by operating systems. Consists of directories (files) and sub-directories (sub-files) "branching" away from the main, root directory (e.g., c:\word\letter.doc).
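The example path can be taken apart with Python's standard pathlib module to show the hierarchy. PureWindowsPath only parses the string, so the sketch runs on any operating system and does not require the file to exist.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"c:\word\letter.doc")

print(path.anchor)  # c:\          (the root directory of drive c:)
print(path.parent)  # c:\word      (the sub-directory holding the file)
print(path.name)    # letter.doc   (the file itself)
print(path.parts)   # ('c:\\', 'word', 'letter.doc')
```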
File Transfer Protocol (FTP)
The Internet protocol that permits you to transfer files between your system and another system.
Flat File Database
A database used to manage a simple collection of information. A flat file database is similar to a relational database, but it only has one table.
Flat-bed Scanner
Device for scanning that has a flat surface for input material. Generally used for scanning bound, delicate or small material.
Hypertext Markup Language (HTML)
The authoring language used to create documents on the World Wide Web. HTML is similar to SGML, although it is not a strict subset.
Image
The digitized representation of a picture, graphic or document.
Image Resolution
The fineness or coarseness of an image as it was digitized, measured in dots per inch (dpi), typically from 200 to 400 dpi.
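To see what resolution means in practice, the sketch below converts a page size and dpi setting into pixel dimensions and a rough uncompressed size. The 8.5 x 11 inch page and the 1 bit per pixel (black and white) depth are assumptions for the example, and the size estimate ignores file headers and per-row padding.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixel width and height of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int = 1) -> int:
    """Approximate raw image size in bytes, rounded up to a whole byte."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


for dpi in (200, 300, 400):
    width_px, height_px = pixel_dimensions(8.5, 11, dpi)
    print(dpi, width_px, height_px, uncompressed_bytes(width_px, height_px))
```

Doubling the resolution from 200 to 400 dpi quadruples the number of pixels, which is one reason compression matters for scanned images.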
Keyword
A word associated with a document or document image to aid in its retrieval from storage.
Multi-Occurring Field
A field that can have more than one entry for a given document.
Orientation
The relative direction of a display or printed page, either horizontal (called "landscape" orientation) or vertical (called "portrait" orientation).
Portable Document Format (PDF)
A file format developed by Adobe Systems. PDF captures formatting information from a variety of desktop publishing applications, making it possible to send formatted documents and have them appear on the recipient’s monitor or printer as they were intended. To view a file in PDF format, you need Adobe Acrobat Reader, a free application distributed by Adobe Systems.
Relational Database
A relational database is a set of tables containing data organized into predefined categories. Each table contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend. After the original database creation, a new data category can be added without requiring that all existing applications be modified.
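The point about extending a relational database can be sketched with Python's built-in sqlite3 module. The table name, column names and sample rows are hypothetical; the sketch shows an existing query continuing to work after a new data category (column) is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
)
conn.executemany(
    "INSERT INTO documents (title, author) VALUES (?, ?)",
    [("Lease agreement", "Smith"), ("Cover letter", "Jones")],
)


def titles_by_author(author: str) -> list:
    """An 'existing application' query that knows only the original columns."""
    rows = conn.execute(
        "SELECT title FROM documents WHERE author = ? ORDER BY title", (author,)
    )
    return [row[0] for row in rows]


before = titles_by_author("Smith")

# Add a new data category after the database has been created.
conn.execute("ALTER TABLE documents ADD COLUMN doc_date TEXT")

after = titles_by_author("Smith")
print(before == after)  # True: the existing query is unaffected
```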
Sampling
A random review of a percentage of database records and images to judge the accuracy of the data and the image quality, using the coding manual and any other instructions as the measure.
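Drawing the random sample itself might look like the sketch below, which uses Python's standard random module. The 5% rate and the record numbering are assumptions for illustration; the actual review against the coding manual is a manual step that this code does not attempt.

```python
import random


def draw_sample(record_ids: list, percent: float, seed=None) -> list:
    """Pick a random percentage of records for quality review, without repeats."""
    if not record_ids:
        return []
    rng = random.Random(seed)  # pass a seed only when a repeatable sample is needed
    size = max(1, round(len(record_ids) * percent / 100))
    return sorted(rng.sample(record_ids, size))


records = list(range(1, 1001))  # hypothetical record numbers 1..1000
sample = draw_sample(records, percent=5, seed=42)
print(len(sample))  # 50
```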
Scanner
A device that optically senses a human-readable image, and contains software to convert the image to machine-readable code.
Skew
The condition in which a document is crooked (that is, its orientation differs from that of the original hard copy page) when it is scanned. Skew reduces the legibility of an image when viewed and the quality of the hard copy when printed.
Tagged Image File Format (TIFF)
A bitmap file format for describing and storing color and grayscale images.
Validated Fields
All entries into the field are chosen from a list of possible values. The restriction of values entered to those on a pick-list allows ease of information retrieval.
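A validated field can be sketched as a check against a pick-list. The list of document types below is made up for the example; in practice the allowed values would come from the coding manual agreed with the client.

```python
# Hypothetical pick-list for a "Document Type" field.
DOC_TYPES = {"Letter", "Memo", "Contract", "Invoice"}


def validate_entry(value: str, allowed: set = DOC_TYPES) -> str:
    """Return the value if it is on the pick-list; otherwise reject it."""
    if value not in allowed:
        raise ValueError(f"{value!r} is not on the pick-list: {sorted(allowed)}")
    return value


print(validate_entry("Memo"))  # Memo

try:
    validate_entry("Memorandum")
except ValueError as error:
    print("rejected:", error)
```

Because every stored value is one of a known set, a search for "Memo" finds all memos, with no variants such as "Memorandum" left behind.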