SIIA Público

Book title:
Chapter title: Automatic Synthetic Data Selection for Highly Imbalanced Multi-class and Multi-label Obstetric Violence Classification

UNAM authors:
NATALIA LERIN HERNANDEZ; HELENA MONTSERRAT GOMEZ ADORNO; MONICA VAZQUEZ HERNANDEZ;
External authors:

Language:

Year of publication:
2026
Keywords:

BERTScore; Class labels; Data imbalance; Data Selection; Mexico; Multi-label classifications; Multi-labels; Obstetric violence identification; Synthetic data; Transformer


Abstract:

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026. This paper presents a labeled corpus of tweets related to obstetric violence in Mexico, annotated by narrative type and type of violence. We address the challenges of multi-class and multi-label classification under severe data imbalance using synthetic tweet generation via large language models and specialized loss functions. Our results show that RoBERTuito achieves the best performance, and that both data augmentation and loss adaptation improve classification metrics. This work contributes resources and methods for the automatic detection of obstetric violence in social media discourse.


UNAM entities cited: