Tag Archives: Document classification

Document Classification Pattern Recognition via Information Fusion: A systematic review of multimodal and multiview representation approaches

Abstract

Information fusion is widely used to improve document classification by integrating multiple data sources (multimodal) or multiple representations of the same data (multiview). Yet the literature has been fragmented: there has been no unified framework, no quantitative synthesis of “how much fusion helps,” and limited practitioner-oriented guidance. In this systematic review, we analyse 139 primary studies, propose a formal framework to structure the field, summarise key qualitative trends, and perform a random-effects meta-analysis (to our knowledge, the first focused specifically on document classification). The results show that multimodal fusion significantly improves accuracy (mean gain +5.28 percentage points, p=0.0016), while multiview fusion yields consistent but modest improvements in accuracy (+4.67%), F1-score (+3.08%) and recall (all p<0.05). We also highlight a reproducibility gap: only 11.8% (multimodal) and 23.3% (multiview) of studies report statistical tests. Overall, the key lesson is practical: success depends less on algorithmic complexity and more on aligning the fusion strategy with the task context and committing to rigorous validation.
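
To make the idea of a random-effects meta-analysis concrete, here is a minimal sketch of one common estimator, DerSimonian-Laird pooling. The abstract does not say which estimator the review used, so this choice is an assumption, and the per-study gains and variances below are hypothetical numbers chosen for illustration, not data from the 139 studies.

```python
import math

def random_effects_meta(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird random-effects model."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures how much the studies disagree with the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                    # between-study variance, floored at 0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2))                  # two-sided p-value under a normal approximation
    return {"pooled": pooled, "se": se, "tau2": tau2, "z": z, "p": p}

# Hypothetical accuracy gains (percentage points) and their sampling variances.
gains = [2.0, 4.0, 6.0, 8.0]
variances = [1.0, 1.0, 1.0, 1.0]
result = random_effects_meta(gains, variances)
print(result)
```

With these illustrative inputs the pooled gain is 5.0 percentage points and the between-study variance is 17/3. The larger the between-study variance, the wider the uncertainty around the pooled gain, which is why a random-effects model suits a literature where datasets and tasks differ from study to study.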

The Outcomes and Publication Standards of Research Descriptions in Document Classification: A Systematic Review

Abstract

Document classification, a critical area of research, employs machine and deep learning methods to solve real-world problems. This study highlights the qualitative and quantitative outcomes of a literature review that covers a broad range of approaches, including machine and deep learning methods as well as nature-, biology-, and quantum-physics-inspired methods. A rigorous synthesis was conducted through a systematic literature review of 102 papers published between 2003 and 2023. The 20 Newsgroups dataset (bydate version) was used as the reference benchmark to ensure fair comparisons of methods. Qualitative analysis revealed that recent studies combine Graph Neural Networks (GNNs) with transformer-based models and propose end-to-end solutions. Quantitative analysis identified state-of-the-art results, with accuracy, micro F1-score, and macro F1-score of 90.38%, 88.28%, and 89.38%, respectively. However, the reproducibility of many studies may need to be improved for the benefit of the scientific community. The resulting overview covers a wide range of document classification methods and can contribute to a better understanding of this field. Additionally, the systematic review approach reduces systematic error, which makes the overview useful for researchers in the document classification community.
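
For readers unfamiliar with the three metrics reported above, the following sketch shows how accuracy, micro F1, and macro F1 are computed for a single-label multiclass task. The function name `f1_scores` and the toy labels are hypothetical and are not taken from the reviewed papers.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (accuracy, micro-F1, macro-F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 values, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, micro, macro

# Toy labels for illustration only.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "a"]
accuracy, micro, macro = f1_scores(y_true, y_pred)
print(accuracy, micro, macro)
```

Macro F1 drops sharply here because class "c" is never predicted correctly, while micro F1 is dominated by the frequent classes. In a single-label setting where every class is included, micro F1 coincides with accuracy, which the toy example reproduces.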

A recent overview of the state-of-the-art elements of text classification

Abstract

The aim of this study is to provide an overview of the state-of-the-art elements of text classification. For this purpose, we first select and investigate the primary and recent studies and objectives in this field. Next, we examine the state-of-the-art elements of text classification. In the following steps, we qualitatively and quantitatively analyse the related works. Herein, we describe six baseline elements of text classification: data collection, data analysis for labelling, feature construction and weighting, feature selection and projection, training of a classification model, and solution evaluation. This study will help readers acquire the necessary information about these elements and their associated techniques. Thus, we believe that this study will assist other researchers and professionals in proposing new studies in the field of text classification.
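
The six elements can be illustrated with a deliberately small end-to-end sketch in plain Python. Everything in it is an assumption made for illustration: the toy corpus, the TF-IDF weighting, the document-frequency filter used for feature selection, and the nearest-centroid classifier are stand-ins chosen for brevity, not techniques the study prescribes.

```python
import math
from collections import Counter, defaultdict

# 1-2. Data collection and labelling: a tiny hand-labelled corpus (hypothetical).
train = [
    ("the match ended with a late goal", "sport"),
    ("the team won the cup final", "sport"),
    ("the striker scored a goal", "sport"),
    ("the new phone has a fast chip", "tech"),
    ("the laptop chip runs fast software", "tech"),
    ("software update for the phone", "tech"),
]
test = [
    ("a late goal won the match", "sport"),
    ("fast software for the laptop", "tech"),
]

def tokenize(text):
    return text.lower().split()

# 3. Feature construction and weighting: bag of words with TF-IDF weights.
docs = [tokenize(text) for text, _ in train]
df = Counter(term for doc in docs for term in set(doc))
idf = {term: math.log(len(docs) / n) + 1.0 for term, n in df.items()}

# 4. Feature selection: drop terms found in more than half of the documents.
vocab = {term for term, n in df.items() if n <= len(docs) / 2}

def vectorize(tokens):
    """Map tokens to an L2-normalised sparse TF-IDF vector over the vocabulary."""
    tf = Counter(t for t in tokens if t in vocab)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

# 5. Training: one centroid (mean document vector) per class.
class_sizes = Counter(label for _, label in train)
centroids = defaultdict(Counter)
for tokens, (_, label) in zip(docs, train):
    for term, value in vectorize(tokens).items():
        centroids[label][term] += value / class_sizes[label]

def predict(text):
    """Return the label whose centroid has the largest dot product with the text."""
    vec = vectorize(tokenize(text))

    def score(label):
        return sum(v * centroids[label][t] for t, v in vec.items())

    return max(sorted(centroids), key=score)

# 6. Evaluation: accuracy on the held-out examples.
accuracy = sum(predict(text) == label for text, label in test) / len(test)
print(accuracy)
```

Each numbered comment corresponds to one or two of the six elements, so any stage can be swapped for a stronger technique (for example, a projection method in step 4 or a different classifier in step 5) without changing the overall shape of the pipeline.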