SIIA Público

Book title: International Conference on Information and Knowledge Management, Proceedings
Chapter title: Vector and matrix operations programmed with UDFs in a relational DBMS

UNAM authors:
JAVIER GARCIA GARCIA;
External authors:

Language:
English
Year of publication:
2006
Keywords:

Data mining; Learning systems; Matrix algebra; Statistical methods; User interfaces; Vectors; Matrix operations; Memory management; User-Defined Functions (UDFs); Relational database systems


Abstract:

In general, a relational DBMS provides limited capabilities for multidimensional statistical analysis, which requires manipulating vectors and matrices. In this work, we study how to extend a DBMS with basic vector and matrix operators by programming User-Defined Functions (UDFs). We carefully analyze UDF features and limitations to implement vector and matrix operations commonly used in statistics, machine learning and data mining, paying attention to DBMS, operating system and computer architecture constraints. UDFs are a C programming interface that allows the definition of scalar and aggregate functions that can be called from SQL. UDFs have both advantages and limitations. A UDF allows fast evaluation of arithmetic expressions, direct memory manipulation, the use of multidimensional arrays and all C language control statements. Nevertheless, a UDF cannot perform disk I/O, the amount of heap and stack memory it can allocate is small, and its code must take into account specific architectural characteristics of the DBMS. We experimentally compare UDFs and SQL with respect to performance, ease of use, flexibility and scalability. We profile UDFs with respect to call overhead, memory management and interleaved disk access. We show that UDFs are faster than standard SQL aggregations and as fast as SQL arithmetic expressions. Copyright 2006 ACM.
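To illustrate what an aggregate UDF computing vector and matrix statistics might look like under the constraints described above (no disk I/O, small memory budget), here is a minimal sketch in C. It assumes a generic four-phase aggregate lifecycle (initialize, accumulate one row, merge partial results, finalize), which is a common design for parallel aggregates. The names `agg_state`, `agg_init`, `agg_accumulate`, `agg_merge`, `agg_finalize` and the bound `MAX_D` are hypothetical; this is not the API of any particular DBMS nor the authors' code. The state summarizes the rows by their count n, linear sum L and quadratic sum Q, from which the mean and covariance are derived.

```c
#include <assert.h>

/* Hypothetical fixed upper bound on dimensionality. Fixed-size arrays keep
 * the whole state in one small contiguous block, reflecting the limited
 * heap/stack memory available to a UDF. */
#define MAX_D 8

/* Aggregate state: n = row count, L = sum of x, Q = sum of x * x^T. */
typedef struct {
    int d;
    double n;
    double L[MAX_D];
    double Q[MAX_D][MAX_D];
} agg_state;

/* Phase 1: reset the state. Returns 0 on success, -1 if d is out of range. */
int agg_init(agg_state *s, int d) {
    if (d < 1 || d > MAX_D) return -1;
    s->d = d;
    s->n = 0.0;
    for (int a = 0; a < d; a++) {
        s->L[a] = 0.0;
        for (int b = 0; b < d; b++) s->Q[a][b] = 0.0;
    }
    return 0;
}

/* Phase 2: called once per row with that row's d attribute values. */
void agg_accumulate(agg_state *s, const double *x) {
    s->n += 1.0;
    for (int a = 0; a < s->d; a++) {
        s->L[a] += x[a];
        /* Q is symmetric; the full matrix is kept here for simplicity. */
        for (int b = 0; b < s->d; b++) s->Q[a][b] += x[a] * x[b];
    }
}

/* Phase 3: fold a partial state (e.g. from another thread) into dst. */
void agg_merge(agg_state *dst, const agg_state *src) {
    assert(dst->d == src->d); /* merging states of different d is a bug */
    dst->n += src->n;
    for (int a = 0; a < dst->d; a++) {
        dst->L[a] += src->L[a];
        for (int b = 0; b < dst->d; b++) dst->Q[a][b] += src->Q[a][b];
    }
}

/* Phase 4: mean = L / n and population covariance = Q / n - mean * mean^T.
 * Returns -1 if no rows were aggregated. */
int agg_finalize(const agg_state *s, double *mean, double cov[MAX_D][MAX_D]) {
    if (s->n == 0.0) return -1;
    for (int a = 0; a < s->d; a++) mean[a] = s->L[a] / s->n;
    for (int a = 0; a < s->d; a++)
        for (int b = 0; b < s->d; b++)
            cov[a][b] = s->Q[a][b] / s->n - mean[a] * mean[b];
    return 0;
}
```

For example, accumulating the rows (1, 2) and (3, 4) into two separate states and merging them yields n = 2, L = (4, 6) and a mean of (2, 3). Because n, L and Q are additive, partial states can be combined in any order, which is what makes the merge phase safe for parallel evaluation.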


Cited UNAM entities: