Key words: data extraction, data scrubbing, data migration, data warehousing, database modeling
Data Extraction, Cleansing and Migration Tools
to support Data Warehouse, Database Consolidation, and Systems Reengineering Projects
© 1996 richard j. orli
Glossary of Terms
Abstract (data) Combine several items into one, summarize, perform computation, take out a
subfact from one item to generate a new attribute, and other actions that change data.
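As an illustration only, the kinds of abstraction named above (summarize, compute, take out a subfact) might be sketched as follows. The row layout, field names, and the abstract() helper are hypothetical and are not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical source rows; the field names are illustrative only.
orders = [
    {"customer": "Ann Lee", "phone": "703-555-0101", "qty": 2, "unit_price": 10.0},
    {"customer": "Ann Lee", "phone": "703-555-0101", "qty": 1, "unit_price": 5.0},
    {"customer": "Bo Diaz", "phone": "202-555-0199", "qty": 4, "unit_price": 2.5},
]

def abstract(rows):
    """Return one summary row per customer."""
    totals = defaultdict(float)
    area_codes = {}
    for row in rows:
        # Perform a computation and summarize: add each line total to the customer total.
        totals[row["customer"]] += row["qty"] * row["unit_price"]
        # Take out a subfact: the area code embedded in the phone number.
        area_codes[row["customer"]] = row["phone"].split("-")[0]
    return [
        {"customer": name, "area_code": area_codes[name], "total": total}
        for name, total in totals.items()
    ]

summary = abstract(orders)
```

Here three order lines collapse into two customer rows, each carrying a derived attribute (area_code) that did not exist as a separate item in the source.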
Batch (transport) Data can be processed for quality or distributed either as a set of many
transactions ("batch" mode) or as individual transactions.
Bulk (transport) When data is moved in bulk, the entire database is refreshed periodically. The
opposite strategy is to selectively refresh the database with only the data that has changed.
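The contrast between the two refresh strategies can be sketched roughly as below. This is a minimal sketch in which a database is modeled as a plain dictionary; the function names and the convention of marking a deleted row with None are assumptions made for the example.

```python
def bulk_refresh(source):
    """Bulk: discard the old target and copy every source row."""
    return dict(source)

def selective_refresh(target, changes):
    """Selective: apply only the rows that changed since the last run.

    `changes` maps key -> new row, or key -> None for a deleted row
    (a convention assumed for this sketch).
    """
    updated = dict(target)
    for key, row in changes.items():
        if row is None:
            updated.pop(key, None)
        else:
            updated[key] = row
    return updated

source = {1: "widget", 2: "gadget", 3: "gizmo"}
target = bulk_refresh(source)
target = selective_refresh(target, {2: "gadget v2", 3: None, 4: "doohickey"})
```

The bulk path touches every row regardless of whether it changed; the selective path touches only the three keys listed in the change set.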
Clean (data) Process to check data for adherence to standards, internal consistency, referential
integrity, valid domain, and to replace/repair incorrect data with correct data. For example,
replacing an invalid zip code with a zip code derived from the state/city information. Checks data
quality and scrubs data by some combination of: look-up against valid data (e.g. a list of 140
million US mailing addresses), look-up against domain values (e.g. a list of valid US State and
Territory codes), domain range checks (e.g. employees younger than 15 or older than 90 years),
consistency checks among table data, pattern analysis of exceptions, correlations, and frequency
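A minimal sketch of three of the checks described under Clean (a domain value look-up, a domain range check, and repair of a zip code from state/city information) might look like this. The record layout, the small look-up tables, and the clean() function are hypothetical; a real tool would use far larger reference data.

```python
VALID_STATES = {"VA", "MD", "DC"}            # illustrative subset, not the full list
ZIP_BY_CITY = {("VA", "Reston"): "20190"}    # hypothetical look-up table

def clean(record):
    """Return (cleaned_record, problems) for one employee record."""
    rec = dict(record)
    problems = []
    # Domain value check against a list of valid codes.
    if rec["state"] not in VALID_STATES:
        problems.append("invalid state code")
    # Domain range check: flag ages below 15 or above 90.
    if not 15 <= rec["age"] <= 90:
        problems.append("age out of range")
    # Repair: replace a malformed zip with one derived from state/city.
    zip_code = rec["zip"]
    if not (len(zip_code) == 5 and zip_code.isdigit()):
        derived = ZIP_BY_CITY.get((rec["state"], rec["city"]))
        if derived is not None:
            rec["zip"] = derived
        else:
            problems.append("invalid zip code")
    return rec, problems

cleaned, problems = clean(
    {"state": "VA", "city": "Reston", "zip": "2019", "age": 12}
)
```

In this sketch the four-digit zip is repaired from the look-up table, while the out-of-range age is only reported, since there is no reference data from which a correct value could be derived.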
Cleanse (data) See Clean.
Data Content Quality The accuracy and validity of the actual values of the data, in contrast to
issues of data structure and database design.
Extract Selects data from various source system platforms. Facility to 1) specify which data is to
be extracted, 2) access to physical database
Extraction Select and copy data from a source database.
Filter (data) Process to check data for adherence to standards, consistency, valid domain; then
either clean or reject invalid data.
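The check-then-clean-or-reject flow of a filter could be sketched as below. The filter_records() function and its is_valid and repair callbacks are hypothetical names chosen for the example, and the records are reduced to bare state codes to keep it short.

```python
def filter_records(records, is_valid, repair):
    """Split records into accepted and rejected lists.

    `is_valid` and `repair` are caller-supplied, hypothetical callbacks;
    `repair` returns a fixed record, or None when it cannot fix one.
    """
    accepted, rejected = [], []
    for rec in records:
        if is_valid(rec):
            accepted.append(rec)
            continue
        # Invalid data: try to clean it, and reject it if cleaning fails.
        fixed = repair(rec)
        if fixed is not None and is_valid(fixed):
            accepted.append(fixed)
        else:
            rejected.append(rec)
    return accepted, rejected

VALID_CODES = {"VA", "MD", "DC"}
accepted, rejected = filter_records(
    ["VA", " md ", "XX"],
    is_valid=lambda code: code in VALID_CODES,
    repair=lambda code: code.strip().upper(),
)
```

Keeping the rejected records in their original form, rather than dropping them, leaves them available for later inspection.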
Load Add or replace data in a destination database(s).
Metadata Data describing other data, for example the column headers in a table.
Merge (data) Combine two or more data sets, whether values or structures. See Abstract.
Middleware Software designed to establish a permanent relationship (including filtering and
transformation) between source systems and a logical model. The logical model is then available
as a virtual database to end-user query tools or a data migration product such as InfoPump.
Migration See Transport.
Parameters A list or database of information that controls a process, for example check boxes or
values. Contrast with "script" or "program".
Replication Data replication extracts data from several platforms, performs some filtering and
transformation, and distributes and loads to another database or databases. Usually the term
replication implies limited or no transformation, and movement within a homogeneous environment.
Pump A data pump extracts data from several mainframe and client server platforms, performs
some filtering and transformation, and distributes and loads to another database(s). Usually the
term pump is used rather than "replicator" to connote its applicability in a cross-platform
Reverse Reverse engineering derives a consistent set of metadata from several potential source
Scrub (data) See Clean.
Transform (data) See Abstract.
Transport Extract data from source, interface with destination environment, load data to
Warehouse A permanent database formed of a collection of data extracted and abstracted from