Abstract
This study presents a Named Entity Recognition (NER) system designed for Urdu news headlines, aimed at addressing the shortage of linguistic resources for the language. We developed a corpus from diverse news sources, tailored to reflect Urdu’s orthographic and morphological characteristics. Our approach combines state-of-the-art (SOTA) neural components: transformers for deep contextual embeddings, Graph Convolutional Networks (GCN) for syntactic analysis, and Biaffine Attention mechanisms for modeling inter-token relationships. A Conditional Random Field (CRF) layer further enforces consistent entity labeling, improving the system’s precision. We first benchmarked our model with the established transformer models XLM-R, mBERT, and XLNet to set baseline performance. We then integrated the encoders of the generative models mBART and mT5 and compared them against these baselines. This phase assessed their ability to detect implicit entities, with a view to supporting complex search and automated content categorization on Urdu digital platforms. The work contributes to computational linguistics by extending SOTA language technologies to under-resourced languages and promoting greater inclusivity in Natural Language Processing (NLP).
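The abstract gives no implementation details, so the following is only a minimal sketch of how a CRF-style decoding step can keep entity labels consistent. It assumes a hypothetical BIO tag set (`TAGS`), hand-set emission scores, and a hard transition rule in place of learned transition weights; none of these names or values come from the paper.

```python
# Illustrative only: tag set, scores, and the hard constraint are assumptions,
# not the authors' model. A trained CRF would learn its transition scores.
TAGS = ["O", "B-PER", "I-PER"]
NEG = -1e9  # large penalty that effectively forbids a transition


def transition_score(prev, cur):
    """BIO rule: an I-X tag may only follow B-X or I-X."""
    if cur.startswith("I-"):
        entity = cur[2:]
        if prev not in ("B-" + entity, "I-" + entity):
            return NEG
    return 0.0


def viterbi(emissions):
    """Return the highest-scoring tag sequence.

    emissions: one dict per token, mapping each tag to a score.
    """
    # A sequence may not start with an I- tag.
    best = [{t: emissions[0][t] + (NEG if t.startswith("I-") else 0.0)
             for t in TAGS}]
    back = []
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for cur in TAGS:
            # Best previous tag for reaching `cur` at this position.
            prev = max(TAGS, key=lambda p: best[-1][p] + transition_score(p, cur))
            row[cur] = best[-1][prev] + transition_score(prev, cur) + scores[cur]
            ptr[cur] = prev
        best.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical per-token scores for a three-token headline.
emissions = [
    {"O": 2.0, "B-PER": 1.0, "I-PER": 0.0},
    {"O": 0.0, "B-PER": 1.0, "I-PER": 3.0},
    {"O": 2.0, "B-PER": 0.0, "I-PER": 0.0},
]
greedy = [max(TAGS, key=lambda t: e[t]) for e in emissions]
decoded = viterbi(emissions)
print(greedy)   # per-token argmax, ignores label consistency
print(decoded)  # sequence-level decoding under the BIO rule
```

With these made-up scores, picking the best tag per token yields an `I-PER` that follows `O`, which is not a valid BIO sequence. Decoding over the whole sequence instead returns a span that opens with `B-PER`.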
| Original language | English |
|---|---|
| Article number | 489 |
| Number of pages | 22 |
| Journal | Complex & Intelligent Systems |
| Volume | 11 |
| Issue number | 489 |
| DOIs | |
| Publication status | Published (VoR) - 29 Oct 2025 |