Categorization of Multilingual Scientific Documents by a Compound Classification System

Abstract

The aim of this study was to propose a classification method for documents that simultaneously contain text in several languages. For this purpose, we constructed a three-level classification system. On the first level, a data processing module prepares a suitable vector space model. Next, in the middle tier, a set of monolingual or multilingual classifiers assigns to each document, or to its parts, the probabilities of belonging to each possible category. The models are trained using the Multinomial Naive Bayes and Long Short-Term Memory algorithms. Finally, in the last component, a multilingual decision module assigns a target class to each document. The module is built on a logistic regression classifier that takes as input the probabilities produced by the classifiers. The system has been verified experimentally. The reported results suggest that the proposed system can handle textual documents whose content is written in several languages at once. Therefore, the system can be useful in the automatic organization of multilingual publications or other documents.
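The three levels described above can be illustrated with a minimal sketch. It is not the system from the paper: the toy corpus, the class labels, the helper names (`TinyMultinomialNB`, `meta_features`, `classify`) and the hyperparameters are all assumptions made for illustration, the Long Short-Term Memory models are omitted, and the decision module is trained on the same documents as the base classifiers only to keep the example short.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_proba(self, text):
        """Return class probabilities in the order of self.classes."""
        total_docs = sum(self.doc_counts.values())
        log_scores = []
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][word] + 1)
                                      / (total_words + len(self.vocab)))
            log_scores.append(score)
        top = max(log_scores)
        exps = [math.exp(s - top) for s in log_scores]
        return [e / sum(exps) for e in exps]


def train_logistic_regression(features, targets, epochs=500, lr=0.5):
    """Binary logistic regression trained by stochastic gradient descent."""
    weights = [0.0] * (len(features[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        for x, y in zip(features, targets):
            z = weights[-1] + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            for i, v in enumerate(x):
                weights[i] -= lr * error * v
            weights[-1] -= lr * error
    return weights


def meta_features(doc, classifiers):
    """Concatenate the probability vectors of all language classifiers."""
    return [p for lang in sorted(classifiers)
            for p in classifiers[lang].predict_proba(doc[lang])]


def classify(doc, classifiers, weights):
    """Decision module: logistic regression over the base probabilities."""
    x = meta_features(doc, classifiers)
    z = weights[-1] + sum(w * v for w, v in zip(weights, x))
    return 1 if z > 0 else 0


# Hypothetical toy corpus: 1 = physics, 0 = biology; each document has
# an English part and a Polish part.
train_docs = [
    {"en": "quantum particle energy", "pl": "kwant cząstka energia"},
    {"en": "particle collider energy", "pl": "zderzacz cząstka energia"},
    {"en": "cell protein gene", "pl": "komórka białko gen"},
    {"en": "gene dna cell", "pl": "gen dna komórka"},
]
train_labels = [1, 1, 0, 0]

# Middle tier: one monolingual classifier per language.
classifiers = {
    lang: TinyMultinomialNB().fit([d[lang] for d in train_docs], train_labels)
    for lang in ("en", "pl")
}
# Last tier: the decision module learns from the stacked probabilities.
weights = train_logistic_regression(
    [meta_features(d, classifiers) for d in train_docs], train_labels)

print(classify({"en": "energy of a particle", "pl": "energia cząstka"},
               classifiers, weights))
```

In a realistic setting the decision module would be fitted on probabilities produced for held-out documents, so that it does not learn from overconfident base-classifier outputs.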

Detection of the Innovative Logotypes on the Web Pages

Abstract

The aim of this study was to describe a method for detecting logotypes that indicate the innovativeness of companies, where the images originate from the companies' Internet domains. For this purpose, we developed a system that combines a supervised and a heuristic approach to construct a reference dataset for each logotype category; the logistic regression classifiers use this dataset to recognize a logotype category. The proposed approach uses a one-versus-the-rest learning strategy to train the logistic regression classification models that recognize the classes of innovative logotypes. Thanks to this, we can detect whether a given company's Internet domain contains an innovative logotype or not. Moreover, we found a way to construct a simple, low-dimensional feature space for the image recognition process. The proposed feature space of the logotype classification models is based on image similarity weights and on the textual data of the images obtained from HTML ALT attributes.

A Diversified Classification Committee for Recognition of Innovative Internet Domains

Abstract

The objective of this paper was to propose a method for classifying innovative domains on the Internet. The proposed approach helped to estimate whether companies are innovative or not by analyzing their web pages. A Naïve Bayes classification committee was used as the classification system for the domains. The classifiers in the committee were based concurrently on Bernoulli and Multinomial feature distribution models, which were selected depending on the diversity of the input data. Moreover, information retrieval procedures were applied to find the documents in each domain that most likely indicate innovativeness. The proposed methods have been verified experimentally. The results have shown that the diversified classification committee, combined with the information retrieval approach in the preprocessing phase, boosts the classification quality of domains that may represent innovative companies. This approach may be applied to other classification tasks.
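The difference between the two feature distribution models mentioned above can be shown with a small sketch. It only contrasts how each model scores one page; the toy documents, the function names and the smoothing constants are assumptions, and the committee's rule for selecting a model from the diversity of the input data is not reproduced here.

```python
import math
from collections import Counter


def multinomial_log_likelihood(tokens, class_docs, vocab):
    """Score every token occurrence using word counts (Laplace smoothing)."""
    counts = Counter(t for doc in class_docs for t in doc)
    total = sum(counts.values())
    return sum(math.log((counts[t] + 1) / (total + len(vocab)))
               for t in tokens if t in vocab)


def bernoulli_log_likelihood(tokens, class_docs, vocab):
    """Score presence AND absence of every vocabulary word."""
    present = set(tokens)
    n_docs = len(class_docs)
    score = 0.0
    for word in vocab:
        doc_freq = sum(1 for doc in class_docs if word in doc)
        p = (doc_freq + 1) / (n_docs + 2)  # smoothed document frequency
        score += math.log(p if word in present else 1.0 - p)
    return score


# Hypothetical tokenized pages from two kinds of company domains.
innovative_docs = [["patent", "research", "prototype"],
                   ["research", "lab", "patent"]]
other_docs = [["discount", "shop", "delivery"],
              ["shop", "sale", "discount"]]
vocab = {t for doc in innovative_docs + other_docs for t in doc}

page = ["research", "research", "patent"]
for name, model in (("multinomial", multinomial_log_likelihood),
                    ("bernoulli", bernoulli_log_likelihood)):
    innovative = model(page, innovative_docs, vocab)
    other = model(page, other_docs, vocab)
    print(name, "innovative" if innovative > other else "other")
```

The multinomial model rewards repeated words, while the Bernoulli model also penalizes vocabulary words that are missing from the page, which is one reason the two can behave differently on short versus long documents.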

The hybrid decision support system for Fire Service – chosen project’s problems

Abstract

This article presents the design process of a hybrid decision support system (HSWD) for the State Fire Service (PSP). The Design for Trustworthy Software (DFTS) methodology was chosen to ensure system reliability. The paper focuses particularly on the requirements planning stage and the overall platform design. The study identifies key challenges in the early project stages, primarily stemming from methodology, environment, and user-related factors. These elements play a crucial role at the start of the design process, whereas aspects such as software, hardware, and measurement have a lesser initial impact. The authors analyze the causes of these challenges and propose solutions to address them. By outlining the lack of specific information solutions in the current State Fire Service infrastructure, this research highlights the importance of a structured approach in decision support system development. The findings contribute to the design of a robust and reliable platform that enhances decision-making in emergency response scenarios.

The Cascading Knowledge Discovery: A Smarter Way to Design Information Systems

Abstract

This article describes a proposed method for designing information systems. The method is based on the author's cascading knowledge discovery in databases process. The author also presents a use case of this process. All analyses presented in the article are based on text reports from the fire rescue service.

A Method for Designing a Knowledge Base and Rules for Text Segmentation Using Formal Concept Analysis

Abstract

Objective: Presentation of a specialist text segmentation technique. The text was derived from reports (a form “Information about the event”, field “Information about the event – descriptive data”) prepared by rescue units of the State Fire Service after firefighting and rescue operations.

Methods: To perform this task, the author proposed a method of designing the knowledge base and rules for a text segmentation tool. The proposed method is based on formal concept analysis (FCA). The knowledge base and rules designed with the proposed method make it possible to segment the available documentation. The correctness and effectiveness of the proposed method were verified by comparing its results with two other solutions used for text segmentation.

Results: During the research and analysis, the rules and abbreviations present in the studied specialist texts were grouped and described. Formal concept analysis was used to create a hierarchy of the detected rules and abbreviations. The extracted hierarchy served as both the knowledge base and the rule base of the text segmentation tool. Numerical experiments comparing the author's solution with two other methods showed significantly better performance of the former. For example, the F-measure obtained with the proposed method is 95.5%, which is 7-8% better than the other two solutions.

Conclusions: The proposed method of designing a knowledge and rule base for a text segmentation tool enables the design and implementation of software that divides text into segments with a small error. The basic rule for detecting the end of a sentence, which interprets dots and additional characters as the end of a segment, must in practice be supplemented with additional rules, especially for specialist texts. These actions significantly improve the quality of segmentation and reduce the error. The formal concept analysis presented in the article is suitable for constructing and representing such rules. Knowledge engineering and additional experiments can enrich the created hierarchy with new rules. The newly inserted knowledge can be easily applied to the currently established hierarchy, thereby contributing to improved text segmentation. Moreover, the numerical experiment produced two unique resources: a set of rules and abbreviations used in the reports, and a set of properly separated and labeled segments.
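The point about supplementing the basic end-of-sentence rule can be sketched as follows. This is only an illustration under stated assumptions: the abbreviation list and the sample report are hypothetical, and the rule base derived with formal concept analysis in the article is far richer than this single exception rule.

```python
# Hypothetical abbreviation list; the article's actual rule base is not
# reproduced here.
ABBREVIATIONS = {"approx.", "no.", "str.", "dr."}


def split_segments(text, abbreviations=ABBREVIATIONS):
    """Close a segment at a token ending in '.', '!' or '?',
    unless that token is a known abbreviation."""
    segments, current = [], []
    for token in text.split():
        current.append(token)
        ends_with_terminator = token[-1] in ".!?"
        is_abbreviation = token.lower() in abbreviations
        if ends_with_terminator and not is_abbreviation:
            segments.append(" ".join(current))
            current = []
    if current:  # keep trailing text that has no terminator
        segments.append(" ".join(current))
    return segments


report = "Fire at str. Main no. 5 was extinguished. Two units left approx. 14:30."
for segment in split_segments(report):
    print(segment)
```

With an empty abbreviation set the same report is cut after "str.", "no." and "approx.", which is the kind of error the additional rules are meant to remove.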

Designing Information Systems Through Text Mining: A Case Study of Fire Service Documentation

On September 25, 2013, at 12:15 PM in room WA-130 of the Rectorate building at Białystok University of Technology, I successfully defended my doctoral dissertation titled “Text Data Analysis in Designing a Selected Information System: A Case Study of National Fire Service Incident Documentation.” A detailed description of the proposed method can be found in the Publications – Seminars section or downloaded directly here. Below is a simplified overview of my research.