Home Company Info Mission Statement Services Solutions Education Center Web Promotion

DEFINITION OF TERMS

Accuracy • Backfile Conversion • Boolean Search
CCITT Groups III & IV • CD-ROM • Character Recognition
Coding Opportunity • Compact Disc (CD) • Compression
Contextual Search • Database • Day Forward
De-skewing • Decompression • Deskew
Despeckle • Digital Audio Tape (DAT) • Digital Linear Tape (DLT)
Directory Structure • File Transfer Protocol (FTP) • Flat File Database
Flat-bed Scanner • Hypertext Markup Language (HTML) • Image
Image Resolution • Keyword • Multi-Occurring Field
Orientation • Portable Document Format (PDF) • Relational Database
Sampling • Scanner • Skew
Tagged Image File Format (TIFF) • Validated Fields

Document Statistics   Image vs. Film   FAQ

 

Accuracy –

The measurement of accuracy for all stages of the document conversion process is typically calculated as follows:

$$\text{Accuracy (\%)} = \frac{\text{Total Opportunities} - \text{Missed/Incorrect Opportunities}}{\text{Total Opportunities}} \times 100$$

The industry standard for data-capture accuracy is 95%, although offshore data entry vendors often guarantee accuracy as high as 99.95%. The industry standard for scanning is 100% for capturing the originals and 99% for producing images that match the quality of the original documents.
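
The formula above can be sketched in Python as follows. The function name `accuracy_percent` and the batch counts are hypothetical, chosen only for illustration; the 95% threshold is the data-capture standard cited above.

```python
def accuracy_percent(total_opportunities: int, missed_or_incorrect: int) -> float:
    """Accuracy as a percentage of all coding opportunities."""
    if total_opportunities <= 0:
        raise ValueError("total_opportunities must be positive")
    if not 0 <= missed_or_incorrect <= total_opportunities:
        raise ValueError("missed_or_incorrect must be between 0 and total_opportunities")
    correct = total_opportunities - missed_or_incorrect
    return correct / total_opportunities * 100


# Hypothetical batch: 2,000 coding opportunities, 60 of them missed or incorrect.
DATA_CAPTURE_STANDARD = 95.0  # industry standard for data capture, per the text
score = accuracy_percent(2000, 60)
print(f"{score:.2f}%")  # 97.00%
print("meets standard" if score >= DATA_CAPTURE_STANDARD else "below standard")
```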

Backfile Conversion –

The process of scanning, indexing and storing a large backlog of documents on an imaging system. This type of conversion is typically used to eliminate the need for costly storage space, reduce the time spent filing and re-filing documents, and reduce the number of misfiled documents.

Boolean Search –

A search strategy that selects information by combining search terms with the Boolean operators AND, OR and NOT.
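
As an illustration, the sketch below treats each search term as the set of document IDs that contain it, so AND, OR and NOT become set intersection, union and difference. The index contents and document IDs are hypothetical.

```python
# Hypothetical index: each term maps to the set of document IDs containing it.
index = {
    "invoice": {1, 2, 5, 7},
    "contract": {2, 3, 7},
    "draft": {3, 7},
}
all_docs = {1, 2, 3, 4, 5, 6, 7}


def docs_with(term):
    """Set of documents containing term (empty if the term is unknown)."""
    return index.get(term, set())


and_result = docs_with("invoice") & docs_with("contract")   # invoice AND contract
or_result = docs_with("invoice") | docs_with("contract")    # invoice OR contract
not_result = docs_with("contract") - docs_with("draft")     # contract NOT draft
not_only = all_docs - docs_with("draft")                    # NOT draft

print(sorted(and_result))  # [2, 7]
print(sorted(or_result))   # [1, 2, 3, 5, 7]
print(sorted(not_result))  # [2]
```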

CCITT Groups III & IV –

Compression standards for black-and-white (bilevel) raster images, designed for facsimile (fax) transmission but also used in other image processing systems.
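
Group III one-dimensional coding treats each scan line of a black-and-white image as a sequence of alternating white and black runs, and then replaces each run length with a variable-length (Modified Huffman) code word. The sketch below shows only the run-length step, using the convention that every line starts with a white run (of length zero if the first pixel is black); it does not produce actual CCITT code words, and the function name is hypothetical.

```python
def run_lengths(scanline):
    """Split a scan line (0 = white, 1 = black) into alternating run lengths.

    The first run is always white, so a line that begins with a black
    pixel starts with a white run of length 0.
    """
    runs = []
    current_color = 0  # start by counting white pixels
    count = 0
    for pixel in scanline:
        if pixel == current_color:
            count += 1
        else:
            runs.append(count)  # close the run of the previous color
            current_color = pixel
            count = 1
    runs.append(count)  # close the final run
    return runs


line = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
print(run_lengths(line))       # [4, 2, 6]
print(run_lengths([1, 1, 0]))  # [0, 2, 1]
```

Long runs of white are what make typical document pages compress so well: a mostly blank line reduces to a handful of numbers.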

CD-ROM –

Compact Disc Read-Only Memory. A type of optical disc capable of storing large amounts of data, up to 1GB, although the most common size is 650MB. A 650MB CD-ROM holds roughly as much data as 450 high-density (1.44MB) floppy disks.

Character Recognition –

The ability of a machine to read human-readable text. When applied to scanned images, this is usually called optical character recognition (OCR).

Coding Opportunity –

A value to be coded into a given field, as agreed upon by the vendor and client. A field left blank because the document contains no data for that field is correctly coded and counts as one correct coding opportunity (i.e., it is 100% accurate). However, a field left blank because it is not meant to be coded at the document's coding level is not a coding opportunity. A letter with 5 authors has 5 coding opportunities in the Author field. If 4 are coded correctly and the 5th is omitted or coded incorrectly, the Author field has 4 correct entries and 1 error (i.e., it is 80% accurate).
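
The counting rules above can be sketched as follows, assuming each field's expected values (taken from the source document) and coded values are available as lists. The `NOT_CODED` marker for fields outside the document's coding level, the `score_field` function and the author names are all hypothetical. This sketch does not penalize extra values that were coded but never appeared in the document.

```python
from collections import Counter

NOT_CODED = None  # hypothetical marker: field is not coded at this document's level


def score_field(expected, coded):
    """Return (opportunities, correct) for one field of one document."""
    if expected is NOT_CODED:
        return 0, 0  # not a coding opportunity at all
    if not expected:
        # The document has no data for this field: a blank entry is one
        # correct opportunity; any entry at all is an error.
        return 1, (1 if not coded else 0)
    # Each expected value is one opportunity; count the values coded correctly.
    matched = Counter(expected) & Counter(coded)
    return len(expected), sum(matched.values())


# A letter with five authors, one of which was omitted during coding.
authors_expected = ["Adams", "Baker", "Chen", "Diaz", "Evans"]
authors_coded = ["Adams", "Baker", "Chen", "Diaz"]
opps, correct = score_field(authors_expected, authors_coded)
print(opps, correct, f"{correct / opps:.0%}")  # 5 4 80%
```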

Compact Disc (CD) –

A standard medium for storing digital data in machine-readable form, read with a laser-based reader. Because any location on a CD can be read directly, CDs offer faster access to stored data than magnetic tape, which must be read sequentially.

Compression –

A process that "shrinks" an image so that it occupies less storage space and can be transmitted faster and more easily.
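
As a rough illustration of the storage savings, the sketch below compresses a synthetic, mostly white bitmap with Python's `zlib` module. zlib (DEFLATE) is a general-purpose lossless method chosen here only because it is built in; it is not one of the CCITT fax schemes typically used for document images, and the page data is made up.

```python
import zlib

# Synthetic page: 100 scan lines of 200 bytes each, mostly white (0x00)
# with a short black bar (0xFF), loosely resembling a sparse document image.
row = bytes(180) + b"\xff" * 20
page = row * 100

compressed = zlib.compress(page, 9)  # level 9 = maximum compression
print(len(page), "bytes before,", len(compressed), "bytes after")
print(f"compression ratio about {len(page) / len(compressed):.0f}:1")
```

Highly repetitive data like this compresses dramatically; real scanned pages, with noise and varied content, compress far less.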

Contextual Search –

Locating documents stored in a system by searching for text that appears within them, rather than by file name or another indexing technique. Also known as full-text search.
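
A common way to implement this kind of search is an inverted index that maps each word to the documents containing it. The sketch below is a minimal version under simplifying assumptions: plain lowercase word matching with no stemming, ranking or phrase search, over hypothetical document texts.

```python
import re
from collections import defaultdict

# Hypothetical document texts keyed by document ID.
documents = {
    "DOC-001": "Invoice for scanning services, March",
    "DOC-002": "Contract renewal for document scanning",
    "DOC-003": "Meeting notes on the contract terms",
}


def words_in(text):
    """Lowercase words (letters and digits) in text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in words_in(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """IDs of documents that contain every word in the query."""
    words = words_in(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


idx = build_index(documents)
print(sorted(search(idx, "contract")))           # ['DOC-002', 'DOC-003']
print(sorted(search(idx, "scanning contract")))  # ['DOC-002']
```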

Database –

(1) A collection of information organized in such a way that a computer program can quickly select desired pieces of data. Traditional databases are organized by fields, records and files. A field is a single piece of information; a record is one complete set of fields; and a file is a collection of records. 
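
The field/record/file organization can be illustrated with a small Python sketch. The `DocumentRecord` class and its field names are hypothetical: each attribute plays the role of a field, each object is a record, and the list of records stands in for a file.

```python
from dataclasses import dataclass


@dataclass
class DocumentRecord:
    """One record: a complete set of fields describing a single document."""
    doc_id: str      # each attribute is one field
    doc_type: str
    page_count: int


# A "file" is a collection of records.
records = [
    DocumentRecord("DOC-001", "Invoice", 2),
    DocumentRecord("DOC-002", "Contract", 14),
]

# Selecting the records whose doc_type field matches a value.
contracts = [r for r in records if r.doc_type == "Contract"]
print([r.doc_id for r in contracts])  # ['DOC-002']
```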

(2) A program that manages data, and can be used to store, retrieve, and sort information.

Day Forward –

The process of scanning, indexing and storing documents on an imaging system as they are produced or received in the normal course of business.

De-skewing –

An adjustment made to an image to compensate for physical distortions inherent in the scanning system, or for alignment (justification) errors during scanning.

Decompression –

The process of reversing the compression performed by software or hardware, returning compressed data to its original size and condition (exactly so, in the case of lossless compression).
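
For lossless schemes, decompression restores the data exactly. The sketch below uses Python's `zlib` module (a general-purpose lossless format, not an image-specific one) on made-up data to check that the round trip reproduces the original byte for byte. Lossy image formats such as JPEG only approximate the original on decompression.

```python
import zlib

original = b"Scanned page text " * 500  # made-up sample data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless decompression returns exactly the original bytes and size.
assert restored == original
print(len(compressed), "->", len(restored), "bytes")
```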

Deskew –

To straighten a crooked (skewed) image. Deskewing improves OCR accuracy and reduces image file size.
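
One simple way to estimate and correct skew, assuming the coordinates of two points on the same text baseline are already known (for example, from a detected line of text), is to compute the baseline's angle with `atan2` and rotate the image points back by that angle. Production deskew tools usually estimate the angle automatically (for example, from projection profiles or a Hough transform); the helper names and coordinates here are hypothetical.

```python
import math


def skew_angle_degrees(x1, y1, x2, y2):
    """Angle of the line through two baseline points, in degrees."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def rotate_point(x, y, angle_degrees, cx=0.0, cy=0.0):
    """Rotate (x, y) about (cx, cy) by angle_degrees (standard rotation formula)."""
    a = math.radians(angle_degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))


# Hypothetical baseline that drifts 35 pixels over 2,000 pixels of width.
angle = skew_angle_degrees(100, 500, 2100, 535)
print(f"skew: {angle:.2f} degrees")

# Rotating by -angle brings the baseline's end point level with its start.
x, y = rotate_point(2100, 535, -angle, cx=100, cy=500)
print(round(y, 6))  # 500.0
```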

Despeckle –

An image processing (clean-up) operation that removes random specks, called flyspecks or pepper, from an image to improve legibility, OCR accuracy and compression.
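
A minimal despeckle pass might remove black pixels that have no black neighbors, as sketched below on a small black-and-white grid. This only catches single-pixel specks; real clean-up filters typically remove specks up to a configurable size or use median filtering. The function name and sample grid are hypothetical.

```python
def despeckle(image):
    """Remove isolated black pixels (1s) that have no black 8-neighbors.

    image is a list of rows of 0 (white) and 1 (black); a new image is returned.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_black_neighbor = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbor:
                cleaned[y][x] = 0  # isolated speck: turn it white
    return cleaned


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # lone speck: removed
    [0, 0, 0, 1, 1],  # part of a stroke: kept
    [0, 0, 0, 1, 1],
]
for row in despeckle(page):
    print(row)
```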

Digital Audio Tape (DAT) –

A technology that records noise-free digital data on magnetic tape. Although generally used for audio, DAT has also been adapted for data storage, where a cassette can hold up to two gigabytes.

Digital Linear Tape (DLT) –

A magnetic tape technology developed by Digital Equipment Corporation (DEC), later sold to Quantum, used for backing up large amounts of data (up to 35 GB per tape uncompressed, or 70 GB with compression).

Directory Structure –

The hierarchical file organization used by operating systems, consisting of directories (folders) and subdirectories (subfolders) "branching" away from the main, or root, directory (e.g., c:\word\letter.doc).
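
Python's `pathlib.PureWindowsPath` can split the example path into its levels without touching the file system, which makes the hierarchy explicit. This is purely an illustration of the structure; the path is the example from the definition.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"c:\word\letter.doc")
print(path.parts)   # ('c:\\', 'word', 'letter.doc')
print(path.anchor)  # drive plus root directory: c:\
print(path.parent)  # the subdirectory holding the file: c:\word
print(path.name)    # letter.doc
```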

File Transfer Protocol (FTP) –

The standard Internet protocol for transferring files between your system and another system over a network.
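
A download with Python's standard `ftplib` module might look like the sketch below. The function name, host, credentials and file names are hypothetical placeholders, and the call is left commented out because it needs a reachable server. Plain FTP sends passwords unencrypted; `ftplib.FTP_TLS` provides an encrypted variant.

```python
from ftplib import FTP


def download_file(host, user, password, remote_dir, filename, local_path):
    """Download one file over FTP in binary mode (suitable for image files)."""
    with FTP(host) as ftp:  # connect to the server (default port 21)
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)  # change to the remote directory
        with open(local_path, "wb") as out:
            # RETR requests the file; each received block is written locally.
            ftp.retrbinary(f"RETR {filename}", out.write)


# Example call (hypothetical server and credentials):
# download_file("ftp.example.com", "client", "secret", "/images", "batch01.tif", "batch01.tif")
```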

Flat File Database –

A database used to manage a simple collection of information. A flat file database is similar to a relational database, but it only has one table.
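
A common form of flat file database is a single comma-separated (CSV) table. The sketch below reads one with Python's `csv` module; the column names and rows are hypothetical, and `io.StringIO` stands in for a file on disk.

```python
import csv
import io

# A hypothetical flat file: one table, one row per document, no links to other tables.
flat_file = io.StringIO(
    "doc_id,doc_type,author\n"
    "DOC-001,Invoice,Adams\n"
    "DOC-002,Contract,Baker\n"
)

rows = list(csv.DictReader(flat_file))
invoices = [row["doc_id"] for row in rows if row["doc_type"] == "Invoice"]
print(invoices)  # ['DOC-001']
```

Because everything lives in one table, a multi-occurring field such as several authors per document has no natural home here; that is one reason larger systems move to relational designs.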

Flat-bed Scanner –

A scanner with a flat glass surface on which the input material is placed. Generally used for scanning bound, delicate or small material.

Hypertext Markup Language (HTML) –

The authoring language used to create documents on the World Wide Web. HTML is similar to SGML (Standard Generalized Markup Language), although it is not a strict subset of it.

Image –

The digitized representation of a picture, graphic or document.

Image Resolution –

The fineness or coarseness of an image as it was digitized, measured in dots per inch (dpi). Document images are typically scanned at 200 to 400 dpi.
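
Resolution directly drives image size: doubling the dpi quadruples the number of pixels. The sketch below assumes a US letter page (8.5 x 11 inches) scanned in black and white at 1 bit per pixel, before any compression; the function name is hypothetical.

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel=1):
    """Pixel dimensions and uncompressed size in bytes of a scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    size_bytes = width_px * height_px * bits_per_pixel / 8
    return width_px, height_px, size_bytes


# A US letter page scanned in black and white at common document resolutions.
for dpi in (200, 300, 400):
    w, h, size = scan_size(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels, about {size / 1024:.0f} KB uncompressed")
```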

Keyword –

A word associated with a document or document image to aid in its retrieval from storage.

Multi-Occurring Field –

A field that can have more than one entry for a given document.

Orientation –

The relative direction of a display or printed page, either horizontal (called "landscape" orientation) or vertical (called "portrait" orientation).

Portable Document Format (PDF) –

A file format developed by Adobe Systems. PDF captures formatting information from a variety of desktop publishing applications, making it possible to send formatted documents and have them appear on the recipient’s monitor or printer as intended. PDF files can be viewed with Adobe Acrobat Reader, a free application distributed by Adobe Systems, or with other PDF viewers.

Relational Database –

A relational database is a set of tables containing data organized into predefined categories. Each table contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend. After the original database creation, a new data category can be added without requiring that all existing applications be modified.
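
The sketch below illustrates these ideas with Python's built-in `sqlite3` module and a throwaway in-memory database: two tables related by a shared document ID, a join across them, and a new data category added with `ALTER TABLE` without rewriting existing rows. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Two related tables: documents, and their authors (a multi-occurring field).
cur.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, doc_type TEXT)")
cur.execute("CREATE TABLE authors (doc_id TEXT REFERENCES documents(doc_id), name TEXT)")
cur.executemany("INSERT INTO documents (doc_id, doc_type) VALUES (?, ?)",
                [("DOC-001", "Letter"), ("DOC-002", "Memo")])
cur.executemany("INSERT INTO authors (doc_id, name) VALUES (?, ?)",
                [("DOC-001", "Adams"), ("DOC-001", "Baker"), ("DOC-002", "Chen")])

# Extending the database: a new data category; existing rows get NULL.
cur.execute("ALTER TABLE documents ADD COLUMN page_count INTEGER")

# Join the tables to list each letter's authors.
cur.execute("""
    SELECT d.doc_id, a.name
    FROM documents AS d JOIN authors AS a ON a.doc_id = d.doc_id
    WHERE d.doc_type = 'Letter'
    ORDER BY a.name
""")
letter_authors = cur.fetchall()
print(letter_authors)  # [('DOC-001', 'Adams'), ('DOC-001', 'Baker')]
conn.close()
```

The inserts name their columns explicitly, so code written before the new column was added keeps working unchanged.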

Sampling –

A random review of a percentage of database records and images to judge data accuracy and image quality, using the coding manual and any other project instructions as the standard.
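
Drawing such a sample can be sketched with Python's `random` module. The 5% rate, the record IDs and the `draw_sample` helper are hypothetical; a fixed seed is used so the same sample can be re-drawn during an audit.

```python
import math
import random


def draw_sample(record_ids, percent, seed=None):
    """Randomly select percent% of the records (at least one) for review."""
    if not record_ids:
        return []
    k = max(1, math.ceil(len(record_ids) * percent / 100))
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(record_ids, k)  # sampling without replacement


records = [f"DOC-{n:05d}" for n in range(1, 2001)]  # hypothetical batch of 2,000 records
sample = draw_sample(records, percent=5, seed=42)
print(len(sample))  # 100
```

Each sampled record would then be checked against the coding manual, and its results fed into the accuracy calculation described under Accuracy.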

Scanner –

A device that optically senses a human-readable image, such as a printed page, and uses software to convert it into machine-readable form.

Skew –

The condition in which a document is crooked when scanned, so that the image differs from the orientation of the original hard-copy page. Skew reduces the legibility of an image on screen and the quality of printed copies.

Tagged Image File Format (TIFF) –

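A bitmap file format for describing and storing black-and-white (bilevel), grayscale and color images. TIFF is widely used in document imaging, often with CCITT Group IV compression.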
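
A TIFF file begins with a two-byte byte-order mark ("II" for little-endian, "MM" for big-endian) followed by the number 42 in that byte order. The sketch below checks those first four bytes; it recognizes classic TIFF only (not the BigTIFF variant), and the function name is hypothetical.

```python
def tiff_byte_order(header: bytes):
    """'little-endian' or 'big-endian' if header starts like a TIFF file, else None."""
    if header[:4] == b"II*\x00":  # "II" (Intel order), then 42 as 0x2A 0x00
        return "little-endian"
    if header[:4] == b"MM\x00*":  # "MM" (Motorola order), then 42 as 0x00 0x2A
        return "big-endian"
    return None


print(tiff_byte_order(b"II*\x00\x08\x00\x00\x00"))  # little-endian
print(tiff_byte_order(b"%PDF-1.4"))                 # None
```

In practice the header would come from something like `open(path, "rb").read(8)`.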

Validated Fields –

A field whose entries must all be chosen from a list of allowed values (a pick-list). Restricting entries to the pick-list keeps values consistent, which makes information easier to retrieve.
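
Pick-list validation can be sketched as a membership check performed before a value is stored. The pick-list values and the `validate_entry` helper below are hypothetical.

```python
# Hypothetical pick-list for a "Document Type" field.
DOC_TYPE_PICKLIST = {"Contract", "Invoice", "Letter", "Memo"}


def validate_entry(value, picklist=DOC_TYPE_PICKLIST):
    """Return the cleaned entry if it is on the pick-list, otherwise raise ValueError."""
    cleaned = value.strip()
    if cleaned not in picklist:
        raise ValueError(f"{value!r} is not an allowed value: {sorted(picklist)}")
    return cleaned


print(validate_entry("Invoice"))  # Invoice
try:
    validate_entry("Invoce")  # a typo is rejected instead of being stored
except ValueError as err:
    print(err)
```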
