SIIA Público

Book title: DOLAP: Proceedings of the ACM International Workshop on Data Warehousing and OLAP
Chapter title: Estimating and bounding aggregations in databases with referential integrity errors

UNAM authors:
JAVIER GARCIA GARCIA;
External authors:

Language:
English
Year of publication:
2008
Keywords:

Aggregate function; Aggregate queries; Answer set; Data quality; Database integration; Dimension tables; ETL process; Integrated database; Referential integrity; Referential integrity constraints; SQL; Errors; Knowledge management; Quality assurance; Set theory; Data warehouses


Abstract:

Database integration builds a single view of data from tables that originate in multiple databases. Each source database has its own tables, columns whose content overlaps with columns in other databases, and its own referential integrity constraints. A query on an integrated database is therefore likely to involve tables and columns with referential integrity errors. In a data warehouse environment, ETL processes do handle referential integrity errors, but in many scenarios they do so by adding 'dummy' records to the dimension tables, so that fact table rows with referential errors still have a record to reference. When two tables are joined and aggregations are computed, the tuples with an undefined foreign key value are aggregated into a group marked as undefined, effectively discarding potentially valuable information. With that motivation, we extend aggregate functions computed over tables with referential integrity errors in OLAP databases to return complete answer sets, in the sense that no tuple is excluded. We associate with each valid reference the probability that an invalid reference is in fact that correct reference. The main idea of our work is that, in certain contexts, tuples with invalid references can still be used by taking this probability into account. In this way, improved answer sets are obtained from aggregate queries in settings where a database violates referential integrity constraints. Copyright 2008 ACM.
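
The idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the function name `estimate_group_sums`, the row layout, and in particular the choice to estimate each probability from the relative frequency of the valid references are assumptions made here for illustration only. The bounds also assume that all measures are non-negative.

```python
from collections import defaultdict


def estimate_group_sums(fact_rows, valid_keys):
    """Estimate and bound SUM(measure) per dimension key.

    fact_rows  -- iterable of (foreign_key, measure) pairs (hypothetical layout)
    valid_keys -- set of keys that exist in the dimension table
    Assumes non-negative measures, so that the bounds hold.
    """
    valid_sum = defaultdict(float)
    valid_count = defaultdict(int)
    invalid_sum = 0.0

    for key, measure in fact_rows:
        if key in valid_keys:
            valid_sum[key] += measure
            valid_count[key] += 1
        else:
            # Invalid or missing reference: keep the measure instead of
            # dropping it into an "undefined" group.
            invalid_sum += measure

    total_valid = sum(valid_count.values())
    result = {}
    for key in valid_keys:
        # Assumption: the probability that an invalid reference really
        # points to `key` is the share of valid rows that point to `key`.
        if total_valid:
            p = valid_count[key] / total_valid
        else:
            p = 1.0 / len(valid_keys)
        result[key] = {
            "lower": valid_sum[key],                 # no invalid row belongs here
            "estimate": valid_sum[key] + p * invalid_sum,
            "upper": valid_sum[key] + invalid_sum,   # every invalid row belongs here
        }
    return result


rows = [("A", 40.0), ("B", 20.0), (None, 12.0), ("Z", 6.0)]
example = estimate_group_sums(rows, {"A", "B"})
```

In this example the rows keyed `None` and `"Z"` have no match among the valid keys, so their combined measure is spread over groups `A` and `B` according to the assumed probabilities. The estimates over all groups add up to the total of all measures, so no tuple is excluded from the answer set.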


UNAM entities cited: