Rapid Rationalization

Rapid Rationalization is our term for the technical approach to enterprise-wide meta-data analysis.

"Rationalization" is the process of identifying the commonalities among data in several different physical locations, and then either creating an index that maps the relationships, or changing the data or structures, for example by renaming an element to a "standard" name.

For example:

A PERSONNEL database may have Employee-Name.

Another PERSONNEL database may have EMPNME.

The PAYROLL database may have LST-NME.

The PARKING PASS database may have NAME.

A CREDIT database may have APPROVER-ID.

And so on.

Rationalization determines that these elements are really the same thing.
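The mapping that rationalization produces can be sketched as a simple lookup structure. Below is a minimal illustration in Python using the element names from the example above; the data structure and function are hypothetical, not the actual format used by any rationalization tool.

```python
# A rationalization index maps each physical (database, element-name) pair
# to a single "standard" logical name. The entries below come from the
# PERSONNEL / PAYROLL / PARKING / CREDIT example; the structure itself is
# an illustrative assumption.
RATIONALIZATION_INDEX = {
    ("PERSONNEL-1", "Employee-Name"): "EMPLOYEE-NAME",
    ("PERSONNEL-2", "EMPNME"):        "EMPLOYEE-NAME",
    ("PAYROLL",     "LST-NME"):       "EMPLOYEE-NAME",
    ("PARKING",     "NAME"):          "EMPLOYEE-NAME",
    ("CREDIT",      "APPROVER-ID"):   "EMPLOYEE-NAME",
}

def physical_elements(standard_name):
    """Return every (database, element) pair indexed under a standard name."""
    return sorted(
        key for key, value in RATIONALIZATION_INDEX.items()
        if value == standard_name
    )

# Looking up one standard name recovers all of its physical variants.
print(physical_elements("EMPLOYEE-NAME"))
```

Note that this "index" style leaves the underlying databases untouched; the alternative mentioned above, physically renaming elements to the standard name, removes the need for the lookup but requires changing every affected system.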

Rapid Data Rationalization

Abstract: Legacy system meta-data are often highly irregular, which complicates every aspect of data management and data access. An essential data administration job is crystallizing order out of this chaotic environment through a process of data rationalization (also called reconciliation). One product should be a subject index to database records and elements, akin to a library card catalog. Since a large organization uses tens or hundreds of thousands of physical data elements (and related meta-data), the reconciliation and indexing task is immense. This paper describes the Kismeta Analyst tool and the techniques required to economically create a whole-enterprise data index.


Data Content Inspection

Data Inspection uses some combination of three strategies:

1) Identify patterns in the data content (e.g., 333-22-4444) and attempt to find other databases with the same format, then deduce or impute relationships.

2) Use prior knowledge of a domain: e.g., find VA, MD, CA, or 22308, and then look for the rest of the address pattern nearby. This works well for very specific types of knowledge, such as our address example, or person and product names.

3) Look for implicit relationships. If a fact is associated with a result often enough, a causal relationship can be inferred.
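Strategy 1 can be illustrated with a simple format test: if most values in a column match a known pattern, the column is a candidate for that kind of data. The sketch below, using the 333-22-4444 example, is an assumption about how such a check might look; the column names, sample values, and 80% threshold are invented for illustration.

```python
import re

# Strategy 1 sketch: flag a column as a candidate SSN-like field when most
# of its non-empty values match the NNN-NN-NNNN format from the example.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def looks_like_ssn(values, threshold=0.8):
    """True if at least `threshold` of the non-empty values match the format."""
    values = [v for v in values if v]
    if not values:
        return False
    hits = sum(1 for v in values if SSN_PATTERN.match(v))
    return hits / len(values) >= threshold

# Invented sample columns from two hypothetical databases.
payroll_emp_id = ["333-22-4444", "123-45-6789", "987-65-4321"]
parking_badge  = ["A-1002", "B-2204", "C-0031"]

print(looks_like_ssn(payroll_emp_id))   # format matches: candidate SSN field
print(looks_like_ssn(parking_badge))    # format differs: not a candidate
```

Two columns that both pass the same format test become candidates for the "deduce or impute relationships" step; the match is evidence, not proof, which is why a threshold rather than an exact test is used.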

The data content approach addresses the fact that meta-data describes what the designer thought would be put into a field, not necessarily its actual information content. This matters because fields are often used for different or multiple purposes, and documentation is often wrong.

Vendors claim that they can identify "real" meta-data from data content and extract structured data from unformatted data. True, but substantial manual analysis by expert staff is required to achieve this. While this approach can be clearly superior for the right problem, such as extracting address data, it applies effectively to only certain problem types and is only moderately useful for grasping the big picture.
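The address-extraction case combines strategies 1 and 2: anchor on a known domain value (a state abbreviation or a 5-digit ZIP from the earlier example) and capture the surrounding pattern. The sketch below is an illustrative assumption, not any vendor's actual method; the regular expression, state list, and field names are invented, and real address extraction is far messier, which is the manual analytic work noted above.

```python
import re

# Strategy 2 sketch: look for the "City, ST 99999" tail of an address by
# anchoring on known state abbreviations (the VA/MD/CA examples from the
# text) and a 5-digit ZIP. The pattern is illustrative, not exhaustive.
ADDRESS_TAIL = re.compile(r"\b([A-Za-z .]+),\s*(VA|MD|CA)\s+(\d{5})\b")

def extract_city_state_zip(text):
    """Return {"city", "state", "zip"} if an address tail is found, else None."""
    m = ADDRESS_TAIL.search(text)
    if not m:
        return None
    city, state, zip_code = m.groups()
    return {"city": city.strip(), "state": state, "zip": zip_code}

print(extract_city_state_zip("Ship to: 14 Elm St., Alexandria, VA 22308"))
```

Even this toy version shows why the approach suits "very specific types of knowledge": it works only because states and ZIP codes have small, well-known value domains to anchor on.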


Last Updated on December 3, 1996. By: info@kismeta.com