Document Classification Pattern Recognition via Information Fusion: A systematic review of multimodal and multiview representation approaches

Abstract

Information fusion is widely used to improve document classification by integrating multiple data sources (multimodal) or multiple representations of the same data (multiview). Yet the literature has been fragmented: there has been no unified framework, no quantitative synthesis of “how much fusion helps,” and little practitioner-oriented guidance. In this systematic review we analyse 139 primary studies, propose a formal framework to structure the field, summarise key qualitative trends, and perform a random-effects meta-analysis (to our knowledge, the first focused specifically on document classification). The results show that multimodal fusion significantly improves accuracy (mean gain +5.28 percentage points, p=0.0016), while multiview fusion yields consistent but modest improvements in accuracy (+4.67%), F1-score (+3.08%) and recall (all p<0.05). We also highlight a reproducibility gap: only 11.8% of multimodal studies and 23.3% of multiview studies report statistical tests. Overall, the key lesson is practical: success depends less on algorithmic complexity and more on aligning the fusion strategy with the task context and committing to rigorous validation.
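
For readers unfamiliar with random-effects pooling, the sketch below shows one common way to compute a pooled mean gain from per-study results. It uses the DerSimonian-Laird estimator, which is an assumption: the abstract does not say which estimator the review used. The function name `random_effects_pool` and the input numbers are hypothetical and are not taken from the 139 studies.

```python
import math

def random_effects_pool(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird estimator.

    effects:   per-study gains (e.g. fused minus baseline accuracy)
    variances: within-study sampling variance of each gain
    Returns (pooled_effect, standard_error, tau_squared, two_sided_p).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures how much the studies disagree.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to every study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    # Two-sided p-value from a normal approximation.
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, se, tau2, p


if __name__ == "__main__":
    # Hypothetical gains in percentage points and their variances.
    gains = [6.1, 3.4, 8.0, 4.2]
    var = [1.5, 0.9, 4.0, 1.2]
    pooled, se, tau2, p = random_effects_pool(gains, var)
    print(f"pooled gain={pooled:.2f} pp, SE={se:.2f}, tau^2={tau2:.2f}, p={p:.4f}")
```

When studies disagree more than their sampling variances predict, the estimated tau^2 grows, the weights become more equal across studies, and the standard error of the pooled gain widens.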
