What types of data pollution/cleansing
problems might occur with the Fitchwood OLTP system data?
The Solution
Most of the data pollution problems mentioned in the chapter could occur with
this data set. The most likely concerns are missing data, duplicate data, and
inconsistencies (for example, different primary keys for the same policy, or
different hire dates for the same agent, which might be legitimate if the agent
was hired on different dates to work in different product lines). It is also
possible that different source systems have different rules for creating
computed values (for example, various insurance products might have different
rules for using face value and commissions to calculate the amount paid to
agents). Territories might have different geographical boundaries across the
source systems. Other issues are also possible.
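As a rough illustration of the cross-source inconsistency described above, the sketch below flags agents whose hire date differs between two source systems. The extracts, the `agent_id` and `hire_date` field names, and the `conflicting_hire_dates` helper are all assumptions made for this example, not part of the Fitchwood case. A flagged agent is only a candidate for review, since the differing dates might be legitimate.

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracts from two source systems (field names are assumptions).
life_agents = [
    {"agent_id": "A100", "hire_date": date(2015, 3, 1)},
    {"agent_id": "A200", "hire_date": date(2018, 7, 15)},
]
auto_agents = [
    {"agent_id": "A100", "hire_date": date(2017, 9, 1)},
    {"agent_id": "A200", "hire_date": date(2018, 7, 15)},
]


def conflicting_hire_dates(*sources):
    """Return agents that have more than one distinct hire date across sources."""
    seen = defaultdict(set)
    for rows in sources:
        for row in rows:
            seen[row["agent_id"]].add(row["hire_date"])
    # Keep only agents with conflicting dates, listed in chronological order.
    return {agent: sorted(dates) for agent, dates in seen.items() if len(dates) > 1}


conflicts = conflicting_hire_dates(life_agents, auto_agents)
```

Here only the hypothetical agent A100 is reported, because its two source records disagree on the hire date.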
The data pollution/cleansing problems that might occur in the Fitchwood
Insurance Company system include:
- Misspelt names and addresses, and odd formats for customer names and addresses
- Impossible or erroneous effective dates in the Policy table or dates of hire in the Agent table
- Fields used for purposes for which they were not intended
- Mismatched addresses and area codes
- Missing data
- Duplicate data
- Inconsistencies (e.g., different addresses) in values or formats across sources
- Different primary keys across sources
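
A few of the problems listed above (missing data, duplicate data, and impossible dates) can be checked mechanically. The following is a minimal sketch, assuming Agent rows are available as dictionaries; the column names, the sample values, and the `find_problems` helper are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical Agent rows; column names and values are illustrative only.
agents = [
    {"agent_id": "A100", "name": "Dana Lee", "hire_date": date(2015, 3, 1)},
    {"agent_id": "A100", "name": "Dana Lee", "hire_date": date(2015, 3, 1)},  # repeated ID
    {"agent_id": "A300", "name": "", "hire_date": date(2019, 5, 20)},  # missing name
    {"agent_id": "A400", "name": "Sam Ortiz", "hire_date": date(2999, 1, 1)},  # future date
]


def find_problems(rows, today):
    """Group agent IDs by the kind of data problem found in their rows."""
    id_counts = Counter(row["agent_id"] for row in rows)
    return {
        # A row counts as incomplete if the name is empty or the hire date is absent.
        "missing": sorted(
            {r["agent_id"] for r in rows if not r["name"] or r["hire_date"] is None}
        ),
        # An agent ID that appears more than once is treated as a duplicate.
        "duplicate": sorted(a for a, n in id_counts.items() if n > 1),
        # A hire date later than the reference date cannot be correct.
        "impossible_date": sorted(
            {r["agent_id"] for r in rows if r["hire_date"] and r["hire_date"] > today}
        ),
    }


problems = find_problems(agents, today=date(2024, 1, 1))
```

Checks like misspelt names or fields used for unintended purposes usually need reference data or manual review, so they are left out of this sketch.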