Data Extraction, Transformation, and Migration Tools - Glossary

Key words: data extraction, data scrubbing, data migration, data warehousing, database modeling

Data Extraction, Cleansing and Migration Tools

to support Data Warehouse, Database Consolidation, and Systems Reengineering Projects

© 1996 richard j. orli

Glossary of Terms

Abstract (data) Combine several items into one, summarize, perform computation, take out a subfact from one item to generate a new attribute, and other actions that change data.
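
As a minimal sketch of abstraction, assuming a hypothetical list of order rows (the field layout and the names `orders` and `abstract` are illustrative, not from any particular tool), the following takes a subfact (the year) out of a date and summarizes several rows into one total:

```python
from collections import defaultdict

# Hypothetical source rows: (customer, order date as YYYY-MM-DD, amount)
orders = [
    ("acme", "1996-01-15", 120.0),
    ("acme", "1996-02-03", 80.0),
    ("zenith", "1996-01-20", 50.0),
]

def abstract(rows):
    """Summarize order amounts per customer and year."""
    totals = defaultdict(float)
    for customer, order_date, amount in rows:
        year = int(order_date[:4])          # take a subfact (the year) out of the date
        totals[(customer, year)] += amount  # combine several items into one
    return dict(totals)

summary = abstract(orders)
```

Here `summary` holds one row per customer and year rather than one row per order.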

Batch (transport) Data can be processed for quality or distributed either as a set of many transactions ("batch" mode) or as individual transactions.

Bulk (transport) When data is moved in bulk, the entire database is refreshed periodically. The opposite strategy is to refresh the database selectively with only the data that has changed.
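
The two refresh strategies can be contrasted in a small sketch. It assumes, purely for illustration, that a database is a Python dictionary keyed by record id; the function names `bulk_refresh` and `selective_refresh` are hypothetical.

```python
def bulk_refresh(target, source):
    """Bulk strategy: replace the entire target with a fresh copy of the source."""
    target.clear()
    target.update(source)

def selective_refresh(target, changes, deleted_keys=()):
    """Selective strategy: apply only the changed rows and the deletions."""
    target.update(changes)
    for key in deleted_keys:
        target.pop(key, None)

source = {1: "Ann", 2: "Bob", 3: "Cy"}
target_a = {1: "Ann", 2: "Bobby", 4: "Dee"}  # stale copy of the source
target_b = dict(target_a)

bulk_refresh(target_a, source)
selective_refresh(target_b, changes={2: "Bob", 3: "Cy"}, deleted_keys=[4])
```

Both targets end up matching the source. The bulk version moves every row, while the selective version moves only the differences but needs some way of knowing what changed.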

Clean (data) Process to check data for adherence to standards, internal consistency, referential integrity, and valid domain, and to replace or repair incorrect data with correct data; for example, replacing an invalid zip code with one derived from the state/city information. Cleaning checks data quality and scrubs data by some combination of: look-up against valid data (e.g. a list of 140 million US mailing addresses), look-up against domain values (e.g. a list of valid US State and Territory codes), domain range checks (e.g. flagging employees less than 15 or greater than 90 years old), consistency checks among table data, and pattern analysis of exceptions, correlations, and frequency distributions.
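
A few of these checks can be sketched as follows. The look-up tables are tiny, illustrative stand-ins for real reference data, and the record layout and the function name `clean` are assumptions made for this example only.

```python
# Illustrative stand-ins for real reference data (assumptions, not complete lists).
VALID_STATES = {"VA", "MD", "DC"}
ZIP_BY_CITY_STATE = {("Reston", "VA"): "20190"}

def clean(record):
    """Return (cleaned_record, problems) for one employee record."""
    rec = dict(record)
    problems = []
    # Look-up against domain values.
    if rec.get("state") not in VALID_STATES:
        problems.append("state not in domain")
    # Domain range check.
    age = rec.get("age")
    if age is None or age < 15 or age > 90:
        problems.append("age out of range")
    # Repair: derive the zip code from city/state when it is malformed.
    zip_code = str(rec.get("zip", ""))
    if not (len(zip_code) == 5 and zip_code.isdigit()):
        derived = ZIP_BY_CITY_STATE.get((rec.get("city"), rec.get("state")))
        if derived:
            rec["zip"] = derived
        else:
            problems.append("zip invalid and not derivable")
    return rec, problems

cleaned, problems = clean({"city": "Reston", "state": "VA", "zip": "2019", "age": 42})
bad, bad_problems = clean({"city": "Nowhere", "state": "XX", "zip": "abc", "age": 7})
```

The first record is repaired silently; the second cannot be repaired, so its problems are reported for review.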

Cleanse (data) See Clean.

Data Content Quality The accuracy and validity of the actual values of the data, in contrast to issues of data structure and database design.

Extract Selects data from various source system platforms. A facility to 1) specify which data is to be extracted and 2) gain access to the physical database.

Extraction Select and copy data from a source database.

Filter (data) Process to check data for adherence to standards, consistency, and valid domain, and then either clean or reject invalid data.
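
The "clean or reject" decision can be sketched as a function that splits rows into accepted and rejected sets. The validity rule, the repair rule, and all names here are hypothetical examples.

```python
def filter_rows(rows, is_valid, repair=None):
    """Split rows into (accepted, rejected), repairing invalid rows where possible."""
    accepted, rejected = [], []
    for row in rows:
        if is_valid(row):
            accepted.append(row)
            continue
        fixed = repair(row) if repair else None
        if fixed is not None and is_valid(fixed):
            accepted.append(fixed)  # cleaned
        else:
            rejected.append(row)    # rejected, kept for later inspection
    return accepted, rejected

VALID_STATES = {"VA", "MD", "DC"}  # illustrative domain list

def state_ok(row):
    return row["state"] in VALID_STATES

def normalize_state(row):
    return {**row, "state": row["state"].strip().upper()}

rows = [{"state": "VA"}, {"state": " md "}, {"state": "??"}]
accepted, rejected = filter_rows(rows, state_ok, normalize_state)
```

Keeping the rejected rows rather than discarding them makes it possible to analyze the exceptions afterwards.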

Load Add or replace data in a destination database(s).

Metadata Data describing other data, for example the column headers in a table.
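
As a small illustration, most database engines let a program read this kind of metadata directly. The sketch below uses Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info`; the `employee` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]
conn.close()
```

The `columns` list describes the table (column names and declared types) without containing any of the table's data.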

Merge (data) Combine the values or structures of two or more data sets. See Abstract.
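
A minimal sketch of a merge, assuming two hypothetical data sets that share an `id` key (the data set names and fields are invented for illustration):

```python
def merge(left, right, key):
    """Combine rows from two data sets that share a key into one row per key."""
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]

hr = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
payroll = [{"id": 2, "salary": 50000}, {"id": 3, "salary": 40000}]
combined = merge(hr, payroll, "id")
```

In this sketch, rows that appear in only one source are kept with whatever fields they have, and on a field conflict the later data set wins. A real merge would need an explicit rule for both cases.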

Middleware Software designed to establish a permanent relationship (including filtering and transformation) between source systems and a logical model. The logical model is then available as a virtual database to end-user query tools or a data migration product such as InfoPump.

Migration See Transport.

Parameters A list or database of information that controls a process, for example check boxes or values. Contrast with "script" or "program".

Replication Data replication extracts data from several platforms, performs some filtering and transformation, and distributes and loads the data to another database or databases. Usually the term replication implies limited or no transformation, and movement within a homogeneous environment. (See Pump.)

Pump A data pump extracts data from several mainframe and client/server platforms, performs some filtering and transformation, and distributes and loads the data to another database(s). Usually the term pump is used rather than "replicator" to connote its applicability in a cross-platform environment.

Reverse Reverse engineering derives a consistent set of metadata from the metadata of several potential source systems.

Scrub (data) See Clean.

Transform (data) See Abstract.

Transport Extract data from source, interface with destination environment, load data to destination.
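
The three steps can be sketched end to end with Python's built-in `sqlite3` module. Two in-memory databases stand in for the source and destination systems, and the `customer` table and the function name `transport` are assumptions made for the example.

```python
import sqlite3

def transport(source, destination):
    """Move the customer table from source to destination; return the row count."""
    # Extract: select and copy rows from the source database.
    rows = source.execute("SELECT id, name FROM customer").fetchall()
    # Interface: make sure the destination has a compatible table.
    destination.execute(
        "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # Load: add or replace rows in the destination database.
    destination.executemany(
        "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)", rows
    )
    destination.commit()
    return len(rows)

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

destination = sqlite3.connect(":memory:")
moved = transport(source, destination)
loaded = destination.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```

In a cross-platform setting the interface step is usually the hard part, since data types and encodings must be mapped between environments; this sketch sidesteps that by using the same engine on both sides.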

Warehouse A permanent database formed of a collection of data extracted and abstracted from source systems.

Last Updated September 4, 1996 by