Categorization of Multilingual Scientific Documents by a Compound Classification System

The 16th International Conference on Artificial Intelligence and Soft Computing (ICAISC 2017) starts soon in Zakopane, Poland, June 11-15, 2017. A paper on the classification of multilingual documents was submitted to this conference and accepted. Quoting the abstract: The aim of this study was to propose a classification method for documents that simultaneously include text parts in various languages. For this purpose, we constructed a three-level classification system. On its first level, a data processing module prepares a suitable vector space model. Next, in the middle tier, a set of monolingual or multilingual classifiers assigns the probabilities that each document or its parts belong to each of the possible categories. The models are trained using the Multinomial Naive Bayes and Long Short-Term Memory algorithms. Finally, in the last component, a multilingual decision module assigns a target class to each document. The module is built on a logistic regression classifier, which receives as inputs the probabilities produced by the classifiers. The system has been verified experimentally. According to the reported results, it can be assumed that the proposed system can deal with textual documents whose content is composed of many languages at the same time. Therefore, the system can be useful in the automatic organization of multilingual publications or other documents.
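The three levels described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal pure-Python approximation that assumes documents with one English and one Polish part, two hypothetical categories (`cs` and `bio`), a plain bag-of-words vector space, and only Multinomial Naive Bayes in the middle tier (the LSTM models are left out). All names and the toy data below are made up for illustration.

```python
import math
from collections import Counter

LANGS = ["en", "pl"]  # assumed language parts of every document


def tokenize(text):
    """Level 1 (simplified): lowercase bag-of-words counts."""
    return Counter(text.lower().split())


class MultinomialNB:
    """Level 2: monolingual Multinomial Naive Bayes with Laplace smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_proba(self, text):
        """Return class probabilities in the order of self.classes."""
        total_docs = sum(self.doc_counts.values())
        log_scores = []
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for word, n in tokenize(text).items():
                if word in self.vocab:  # words unseen in training are skipped
                    p = (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
                    score += n * math.log(p)
            log_scores.append(score)
        top = max(log_scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in log_scores]
        return [e / sum(exps) for e in exps]


class LogisticRegression:
    """Level 3: binary logistic regression trained by stochastic gradient descent."""

    def fit(self, X, y, lr=0.5, epochs=500):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(epochs):
            for x, target in zip(X, y):
                err = self.predict_proba(x) - target
                self.w = [w - lr * err * xi for w, xi in zip(self.w, x)]
                self.b -= lr * err
        return self

    def predict_proba(self, x):
        z = self.b + sum(w * xi for w, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))


def features(doc, models):
    """Concatenate the per-language class probabilities into one vector."""
    return [p for lang in LANGS for p in models[lang].predict_proba(doc[lang])]


def train_system(train):
    labels = [label for _, label in train]
    models = {lang: MultinomialNB().fit([d[lang] for d, _ in train], labels)
              for lang in LANGS}
    X = [features(d, models) for d, _ in train]
    y = [1 if label == "cs" else 0 for label in labels]
    return models, LogisticRegression().fit(X, y)


def classify(doc, models, decider):
    return "cs" if decider.predict_proba(features(doc, models)) >= 0.5 else "bio"


# Hypothetical toy corpus: each document has an English and a Polish part.
TRAIN = [
    ({"en": "neural network training algorithm", "pl": "sieć neuronowa algorytm uczenie"}, "cs"),
    ({"en": "classifier algorithm data", "pl": "klasyfikator algorytm dane"}, "cs"),
    ({"en": "cell protein gene expression", "pl": "komórka białko gen ekspresja"}, "bio"),
    ({"en": "gene mutation cell", "pl": "gen mutacja komórka"}, "bio"),
]
models, decider = train_system(TRAIN)
print(classify({"en": "neural algorithm", "pl": "algorytm dane"}, models, decider))
```

For brevity, the decision module here is fitted on the same documents as the base classifiers. A more careful setup would probably feed it probabilities obtained on held-out data, so that it does not learn from overconfident base predictions.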

Readers can find more information in the English version of this post or directly in the paper.

