SIIA Público

Book title: 2016 Fifteenth Mexican International Conference on Artificial Intelligence (MICAI): Advances in Artificial Intelligence
Chapter title: An unsupervised approach for automatic discovery of metadata in document images

UNAM authors:
GERARDO SIERRA DIAZ; BORIS ESCALANTE RAMIREZ;
External authors:

Language:

Year of publication:
2016
Keywords:

Metadata; Maximally Stable Extremal Regions (MSER); Conditional Random Fields (CRF)


Abstract:

The visual information contained in documents provides a rich set of features that can be exploited to improve document understanding. The typography, design, and lexical properties of text are the clues that help us distinguish, at a glance, one kind of data from another. In this paper, we present a methodology to identify, extract, and automatically classify the metadata on document covers. A problem associated with metadata discovery is the processing of the original document format. We propose combining two methods: maximally stable extremal regions (MSER) for detecting text in cover images with complex backgrounds, and conditional random fields (CRF) for the logical labeling of elements in the document. We present a selected set of visual and linguistic features used to train our model. As a proof of concept, we incorporated the methods into a desktop application and ran several examples. Preliminary results show improved text recognition performance compared with traditional methods of metadata extraction for document images. In particular, one problem we seek to solve is the ambiguity between the book title and the author.


Cited UNAM entities: