Accuracy
The measurement
of accuracy for all stages of the document conversion process is typically
calculated as follows:
Accuracy (%) = 100 x (Total Opportunities - Missed/Incorrect Opportunities)
               / Total Opportunities
The industry standard for accuracy of data capture is 95%. Off-shore data
entry vendors often guarantee accuracy as high as 99.95%. The industry standard
for scanning is 100% for capture of originals and 99% for images that match the
quality of the original document.
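
The accuracy calculation above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `accuracy_percent` and the sample counts are assumptions, not part of any vendor's tooling.

```python
def accuracy_percent(total_opportunities: int, missed_or_incorrect: int) -> float:
    """Return accuracy as a percentage of total opportunities."""
    if total_opportunities <= 0:
        raise ValueError("total_opportunities must be positive")
    if not 0 <= missed_or_incorrect <= total_opportunities:
        raise ValueError("missed_or_incorrect must be between 0 and the total")
    correct = total_opportunities - missed_or_incorrect
    return 100.0 * correct / total_opportunities


# Hypothetical batch: 10,000 opportunities with 5 missed or incorrect.
print(accuracy_percent(10_000, 5))    # 99.95
# Hypothetical batch: 10,000 opportunities with 500 missed or incorrect.
print(accuracy_percent(10_000, 500))  # 95.0
```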

Backfile Conversion
The process of scanning, indexing and storing a
large backlog of documents on an imaging system. This type of conversion is
typically used to eliminate the need for costly storage space, reduce the time
spent filing and re-filing documents, and reduce the incidence of misfiled
documents.

Boolean Search
A search strategy that selects information by combining terms with the
AND, OR and NOT operators.
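
As a rough illustration, the three operators map directly onto set operations over the documents that contain each term. The sample documents and the helper `docs_containing` are hypothetical, and a real retrieval system would use an index rather than scanning every document.

```python
# Hypothetical document collection: id -> text.
documents = {
    1: "invoice for scanning services",
    2: "contract for data entry services",
    3: "invoice for data entry",
}


def docs_containing(term):
    """Return the ids of documents whose text contains the given word."""
    return {doc_id for doc_id, text in documents.items()
            if term.lower() in text.lower().split()}


invoice = docs_containing("invoice")  # documents 1 and 3
data = docs_containing("data")        # documents 2 and 3

print(sorted(invoice & data))  # invoice AND data -> [3]
print(sorted(invoice | data))  # invoice OR data  -> [1, 2, 3]
print(sorted(invoice - data))  # invoice NOT data -> [1]
```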

CCITT Groups III & IV
A raster image compression format designed
to be used for facsimile transmission but also used in other image processing
systems.

CD-ROM
Compact Disc-Read-Only Memory. A type of
optical disk capable of storing large amounts of data -- up to 1GB, although the
most common size is 650MB. A 650MB CD-ROM holds about as much data as 450
1.44MB floppy disks.

Character Recognition
The ability of a machine to
read human-readable text.

Coding Opportunity
A value to be coded into a given
field, as agreed upon by the vendor and client. A field that is blank because the
document contained no data pertaining to that field is a correctly coded field,
or 1 correct coding opportunity; that is, it is 100% accurate. However, a field
left blank because it is not meant to be coded due to the level of the document
is not a coding opportunity. A letter with 5 authors has 5 coding opportunities
in the Author field. If 4 are correctly coded and the 5th is omitted or coded
incorrectly, then the Author field has 4 correct entries and 1 error; that is, it
is 80% accurate.
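
The counting rule in this entry can be sketched as follows. The function `field_accuracy` and the author names are hypothetical; the sketch assumes a coded value is correct only if it exactly matches a value present in the document.

```python
def field_accuracy(expected_values, coded_values):
    """Return the accuracy (%) of one field on one document.

    expected_values: values the document actually contains for the field.
    coded_values: values the coder entered for the field.
    """
    if not expected_values:
        # The document has no data for this field: that is one coding
        # opportunity, and it is correct only if the field was left blank.
        opportunities = 1
        correct = 1 if not coded_values else 0
    else:
        # Each value in the document is one coding opportunity.
        opportunities = len(expected_values)
        coded = set(coded_values)
        correct = sum(1 for value in expected_values if value in coded)
    return 100.0 * correct / opportunities


# Hypothetical letter with 5 authors; the 5th is coded incorrectly.
expected = ["Adams", "Baker", "Clark", "Davis", "Evans"]
coded = ["Adams", "Baker", "Clark", "Davis", "Evens"]
print(field_accuracy(expected, coded))  # 80.0

# Field correctly left blank because the document has no such data.
print(field_accuracy([], []))           # 100.0
```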

Compact Disc (CD)
A standard medium for storage of
digital data in a machine-readable form, accessible with a laser-based reader.
CDs are faster and more accurate than magnetic tape for data storage.

Compression
A process that "shrinks" an
image so that it occupies less storage space and can be transmitted more quickly
and easily.

Contextual Search
Locating documents stored in a
system by searching for text that appears in them, rather than by searching for
them by file name or another indexing technique.
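
A minimal sketch of the idea, assuming the document text is already available (for example from OCR). The file names, their contents and the function `contextual_search` are hypothetical; note that none of the file names reveals what the documents are about.

```python
# Hypothetical store: file name -> extracted text of the document.
documents = {
    "doc0001.txt": "Lease agreement between the landlord and the tenant.",
    "doc0002.txt": "Invoice for scanning and indexing services.",
    "doc0003.txt": "Amendment to the lease agreement.",
}


def contextual_search(phrase, docs):
    """Return the names of documents whose text contains the phrase."""
    needle = phrase.lower()
    return sorted(name for name, text in docs.items()
                  if needle in text.lower())


print(contextual_search("lease agreement", documents))
# ['doc0001.txt', 'doc0003.txt']
```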

Database
(1) A collection of information organized in
such a way that a computer program can quickly select desired pieces of data.
Traditional databases are organized by fields, records and files. A field is a
single piece of information; a record is one complete set of fields; and a file
is a collection of records.
(2) A program that manages data, and can be used to
store, retrieve, and sort information.

Day Forward
The process of scanning, indexing and
storing documents on an imaging system as they are produced or received in the
normal course of business.

De-skewing
The adjustment made to an image to make up
for physical distortions inherent in the system or the adjustment made to an
image to compensate for justification errors in scanning.

Decompression
The process of reversing the procedure
conducted by compression software or hardware, thereby returning compressed data
to its original size and condition.
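
The round trip can be demonstrated with Python's standard `zlib` module. This is only an illustration of lossless compression and decompression in general: `zlib` uses DEFLATE, not the CCITT Group III/IV schemes common in document imaging, and the sample "image" bytes are made up.

```python
import zlib

# Made-up image data: 100 identical scan lines, mostly white (0) with a
# short black (255) run, which compresses well because it is repetitive.
scan_line = bytes([0] * 200 + [255] * 16 + [0] * 200)
image_data = scan_line * 100

compressed = zlib.compress(image_data)
restored = zlib.decompress(compressed)

print(len(image_data))               # 41600 bytes before compression
print(len(compressed) < len(image_data))  # True: the data shrank
print(restored == image_data)        # True: original size and condition
```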

Deskew
To straighten out a crooked image. Improves OCR
accuracy and reduces image file size.

Despeckle
An image processing (clean-up) operation
that removes random specks, called flyspecks or pepper, from an image to improve
legibility, OCR accuracy and compression.
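
One simple way to despeckle a black-and-white image is to clear any black pixel that has no black neighbours. The sketch below assumes that rule and a tiny image stored as a list of rows (1 = black, 0 = white); production software typically uses more elaborate filters and size thresholds.

```python
def despeckle(image):
    """Return a copy of a bilevel image with isolated black pixels removed."""
    height = len(image)
    width = len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Look at the up-to-8 surrounding pixels in the original image.
            has_black_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbour:
                cleaned[y][x] = 0  # isolated speck: turn it white
    return cleaned


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speck
    [0, 0, 0, 1, 1],  # part of a real mark
    [0, 0, 0, 1, 1],
]
for row in despeckle(page):
    print(row)
```

After the call, the speck in the second row is gone while the 2x2 mark in the lower right is untouched.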

Digital Audio Tape (DAT)
A technology that records
noise-free digital data on magnetic tape. Generally used for audio, a DAT
cassette can hold up to two gigabytes when adapted for data storage.

Digital Linear Tape (DLT)
A tape technology designed by
DEC and later sold to Quantum, used for backing up large amounts of data (up to
35 GB per tape without compression, 70 GB with compression).

Directory Structure
Hierarchical file management used
by operating systems. Consists of directories (folders) and sub-directories
(sub-folders) "branching" away from the main, root directory (e.g., c:\word\letter.doc).
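
The example path can be taken apart with Python's standard `pathlib` module to show the hierarchy. `PureWindowsPath` only parses the string, so this sketch runs on any operating system and does not require the file to exist.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"c:\word\letter.doc")

print(path.drive)   # c:
print(path.parts)   # ('c:\\', 'word', 'letter.doc')
print(path.parent)  # c:\word
print(path.name)    # letter.doc
```

Here `c:\` is the root directory, `word` is a directory branching from it, and `letter.doc` is a file stored in that directory.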

File Transfer Protocol (FTP)
The Internet protocol that
permits you to transfer files between your system and another system.

Flat File Database
A database used to manage a simple
collection of information. A flat file database is similar to a relational
database, but it only has one table.

Flat-bed Scanner
Device for scanning that has a flat
surface for input material. Generally used for scanning bound, delicate or small
material.

Hypertext Markup Language (HTML)
The authoring language
used to create documents on the World Wide Web. HTML is similar to SGML,
although it is not a strict subset.

Image
The digitized representation of a picture, graphic
or document.

Image Resolution
The fineness or coarseness of an
image as it was digitized, measured in dots per inch (dpi), typically 200 to
400 dpi.
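
To make the numbers concrete, the pixel dimensions of a scanned page follow from its physical size multiplied by the resolution. The function name `pixel_dimensions` and the letter-size page (8.5 x 11 inches) are assumptions chosen for illustration.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# Letter-size page (8.5 x 11 inches) at the low and high end of the range.
print(pixel_dimensions(8.5, 11, 200))  # (1700, 2200)
print(pixel_dimensions(8.5, 11, 400))  # (3400, 4400)
```

Doubling the resolution doubles each dimension, so the image contains four times as many pixels.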

Keyword
A word associated with a document or document
image to aid in its retrieval from storage.

Multi-Occurring Field
A field that can have more than
one entry for a given document.

Orientation
The relative direction of a display or
printed page, either horizontal (called "landscape" orientation) or
vertical (called "portrait" orientation).

Portable Document Format (PDF)
A file format
developed by Adobe Systems. PDF captures formatting information from a variety
of desktop publishing applications, making it possible to send formatted
documents and have them appear on the recipient's monitor or printer as they
were intended. To view a file in PDF format, you need Adobe Acrobat Reader, a
free application distributed by Adobe Systems.

Relational Database
A relational database is a set of
tables containing data organized into predefined categories. Each table contains
one or more data categories in columns. Each row contains a unique instance of
data for the categories defined by the columns. In addition to being relatively
easy to create and access, a relational database has the important advantage of
being easy to extend. After the original database creation, a new data category
can be added without requiring that all existing applications be modified.
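
The point about extensibility can be illustrated with Python's built-in `sqlite3` module. The table and column names (`documents`, `title`, `author`, `doc_date`) are hypothetical; the sketch shows that a query written before a new column was added keeps returning the same rows afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# One table; each column is a data category, each row one document.
conn.execute(
    "CREATE TABLE documents ("
    "doc_id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
)
conn.execute(
    "INSERT INTO documents (title, author) VALUES (?, ?)",
    ("Lease agreement", "J. Smith"),
)

# A query that stands in for an existing application.
existing_query = "SELECT title, author FROM documents"
before = conn.execute(existing_query).fetchall()

# Extend the database with a new data category.
conn.execute("ALTER TABLE documents ADD COLUMN doc_date TEXT")

after = conn.execute(existing_query).fetchall()
print(before == after)  # True: the existing query is unaffected
```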

Sampling
A random review of a percentage of database
records and images to judge the accuracy of the data and the image quality, using
the coding manual and any other instructions as the measure.
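
Drawing the sample itself is straightforward. In this sketch the function `sample_records`, the 5% rate and the record ids are all assumptions; the reviewer would then check each selected record and image against the coding manual.

```python
import math
import random


def sample_records(record_ids, percent, seed=None):
    """Return a random selection of roughly `percent` % of the record ids."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    sample_size = math.ceil(len(record_ids) * percent / 100)
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(record_ids, sample_size)


record_ids = list(range(1, 1001))  # hypothetical database of 1,000 records
review_batch = sample_records(record_ids, percent=5, seed=42)
print(len(review_batch))  # 50
```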

Scanner
A device that optically senses a
human-readable image, and contains software to convert the image to
machine-readable code.

Skew
When a document is crooked (or differs from the
original hard copy page's orientation) when it is scanned. Skew reduces the
legibility of an image when viewed and the quality of the hard copy when
printed.

Tagged Image File Format (TIFF)
A bit map file format
for describing and storing color and gray scale images.

Validated Fields
All entries into the field are chosen from a list of
possible values. The restriction of values entered to those on a pick-list
allows ease of information retrieval.
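
A validated field can be sketched as a simple membership check against the pick-list. The list of document types and the function `validate_entry` are hypothetical examples.

```python
# Hypothetical pick-list for a "Document Type" field.
DOC_TYPES = {"Letter", "Memo", "Invoice", "Contract"}


def validate_entry(value, pick_list):
    """Return the value if it is on the pick-list, otherwise raise ValueError."""
    if value not in pick_list:
        raise ValueError(f"{value!r} is not an allowed value")
    return value


print(validate_entry("Memo", DOC_TYPES))  # Memo

try:
    validate_entry("Memorandum", DOC_TYPES)
except ValueError as error:
    print(error)  # 'Memorandum' is not an allowed value
```

Because every entry comes from the same fixed list, a search for "Memo" finds all memos without having to anticipate spelling variants.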