Abstract
Information fusion is widely used to improve document classification by integrating multiple data sources (multimodal fusion) or multiple representations of the same data (multiview fusion). Yet the literature remains fragmented: it lacks a unified framework and a quantitative synthesis of how much fusion actually helps, and it offers limited practitioner-oriented guidance. In this systematic review, we analyse 139 primary studies, propose a formal framework to structure the field, summarise key qualitative trends, and perform a random-effects meta-analysis, to our knowledge the first focused specifically on document classification. The results show that multimodal fusion significantly improves accuracy (mean gain of +5.28 percentage points, p = 0.0016). Multiview fusion yields consistent but modest improvements in accuracy (+4.67%), F1-score (+3.08%) and recall (all p < 0.05). We also highlight a reproducibility gap: only 11.8% of multimodal studies and 23.3% of multiview studies report statistical tests. Overall, the key lesson is practical. Success depends less on algorithmic complexity than on aligning the fusion strategy with the task context and committing to rigorous validation.