Abstracts
Information Extraction System for Transforming Unstructured Text Data in Fire Reports into Structured Forms: A Polish Case Study
In this paper, the author presents a novel information extraction system that analyses fire service reports. Although the reports contain valuable information concerning fire and rescue incidents, the narrative information in these reports has received little attention as a source of data. This is because of the challenges associated with processing these data and making sense of the contents through the use of machines. Therefore, a new issue has emerged: How can we bring to light valuable information from the narrative portions of reports that currently escapes the attention of analysts? The idea of information extraction and a system for analysing data that lies outside existing hierarchical coding schemes can be challenging for researchers and practitioners. Furthermore, comprehensive discussions and propositions of such systems in rescue service areas are insufficient. Therefore, the author comprehensively and systematically describes the ways in which information extraction systems transform unstructured text data from fire reports into structured forms. Each step of the process has been verified and evaluated on real cases, including data collected from the Polish Fire Service. The realisation of the system has illustrated that we must analyse not only the text data from the reports but also consider the data acquisition process. Consequently, we can create suitable analytical requirements. Moreover, the quantitative analysis and experimental results verify that we can (1) obtain good results in the text segmentation (F-measure 95.5%) and classification (F-measure 90%) processes and (2) implement the information extraction process and perform useful analyses.
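A minimal sketch of the kind of transformation described above, assuming purely illustrative field names and regular expressions (the system's actual extraction rules are not reproduced here):

```python
import re

# Hypothetical rules mapping simple regex patterns in a report's
# narrative to structured fields (illustrative, not the paper's rule set).
PATTERNS = {
    "vehicles": re.compile(r"(\d+)\s+fire\s+engines?"),
    "injured": re.compile(r"(\d+)\s+people?\s+injured"),
}

def extract(report_text: str) -> dict:
    """Turn an unstructured narrative into a structured record."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(report_text)
        if match:
            record[field] = int(match.group(1))
    return record

print(extract("Dispatched 3 fire engines; 2 people injured on site."))
# -> {'vehicles': 3, 'injured': 2}
```

A real system of this kind layers segmentation and segment classification before such field-level extraction, so that each rule is applied only to the segments where it is meaningful.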
Recognising innovative companies by using a diversified stacked generalisation method for website classification
In this paper, we propose a classification system which is able to decide whether a company is innovative or not, based only on its public website available on the internet. As innovativeness plays a crucial role in the development of myriad branches of the modern economy, an increasing number of entities are expending effort to be innovative. Thus, a new issue has appeared: how can we recognise them? Grasping the idea of innovativeness directly is not only challenging for humans but also impossible for any known machine learning algorithm. Therefore, we propose a new indirect technique: a diversified stacked generalisation method, which is based on a combination of a multi-view approach and a genetic algorithm. The proposed approach achieves better performance than all other classification methods considered, which include (i) models trained on single datasets and (ii) a simple voting method on these models. Furthermore, in this study, we check whether an unaligned feature space improves classification results. The proposed solution has been extensively evaluated on real data collected from companies’ websites. The experimental results verify that the proposed method improves the classification quality of websites which might represent innovative companies.
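The stacked-generalisation core of such an approach can be sketched with scikit-learn; the synthetic features, the choice of base models, and the omission of the multi-view split and genetic-algorithm diversification are all simplifying assumptions:

```python
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for website features: 100 sites, 10 features,
# with a binary label ("innovative or not").
X = rng.normal(size=(100, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Level 1: diverse base models; Level 2: a meta-classifier that
# combines their predicted probabilities (stacked generalisation).
stack = StackingClassifier(
    estimators=[("nb", GaussianNB()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)
print(stack.score(X, y))  # training accuracy of the stacked model
```

The meta-classifier is trained on cross-validated base-model outputs, which is what lets stacking outperform a simple vote over the same base models.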
Empirical evaluation of feature projection algorithms for multi-view text classification
This study aims to propose (i) a multi-view text classification method and (ii) a ranking method that allows for selecting the best information fusion layer among many variations. Multi-view document classification is worth a detailed study as it makes it possible to combine different feature sets into yet another view that further improves text classification. For this purpose, we propose a multi-view framework for text classification that is composed of two levels of information fusion. At the first level, classifiers are constructed using different data views, i.e. different vector space models, by various machine learning algorithms. At the second level, the information fusion layer combines the input information using a feature projection method and a meta-classifier modelled by a selected machine learning algorithm. A final decision is reached based on the classification results produced by the models positioned at the first level. Moreover, we propose a ranking method to assess various configurations of the fusion layer. We use heuristics that utilise statistical properties of F-score values calculated for classification results produced at the fusion layer. The information fusion layer of the classification framework and the ranking method have been empirically evaluated. For this purpose, we introduce a use case checking whether companies’ domains identify their innovativeness. The results empirically demonstrate that the information fusion layer enhances classification quality. Friedman’s aligned-rank and Wilcoxon signed-rank statistical tests and the effect size support this hypothesis. In addition, the Spearman statistical test carried out for the obtained results demonstrated that the assessment made by the proposed ranking method converges to a well-established method named Hellinger – The Technique for Order Preference by Similarity to Ideal Solution (H-TOPSIS). Thus, the proposed approach may be used for the assessment of classifier performance.
A recent overview of the state-of-the-art elements of text classification
The aim of this study is to provide an overview of the state-of-the-art elements of text classification. For this purpose, we first select and investigate the primary and recent studies and objectives in this field. Next, we examine the state-of-the-art elements of text classification. In the following steps, we qualitatively and quantitatively analyse the related works. Herein, we describe six baseline elements of text classification: data collection, data analysis for labelling, feature construction and weighting, feature selection and projection, training of a classification model, and solution evaluation. This study will help readers acquire the necessary information about these elements and their associated techniques. Thus, we believe that this study will assist other researchers and professionals in proposing new studies in the field of text classification.
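Several of the listed elements (feature construction and weighting, feature selection, and model training) can be chained in a single sketch with scikit-learn; the toy corpus, the chosen vectoriser, and the parameter values are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy labelled collection (the data collection and labelling steps, simplified).
texts = ["cheap deal buy now", "limited offer buy cheap", "meeting at noon",
         "project meeting agenda", "buy cheap offer", "agenda for the meeting"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = promotional, 0 = regular

pipeline = Pipeline([
    ("features", TfidfVectorizer()),     # feature construction + weighting
    ("select", SelectKBest(chi2, k=5)),  # feature selection
    ("model", MultinomialNB()),          # training a classification model
])
pipeline.fit(texts, labels)
print(pipeline.predict(["buy this cheap offer now"]))  # -> [1]
```

Solution evaluation, the sixth element, would then score such a pipeline on held-out data, e.g. with cross-validated F-measure.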
The BigGrams: the semi-supervised information extraction system from HTML: an improvement in the wrapper induction
The aim of this study is to propose an information extraction system, called BigGrams, which is able to retrieve relevant and structural information (relevant phrases, keywords) from semi-structured web pages, i.e. HTML documents. For this purpose, a novel semi-supervised wrapper induction algorithm has been developed and embedded in the BigGrams system. The wrapper induction algorithm utilizes formal concept analysis to induce information extraction patterns. Also, in this article, the author (1) presents the impact of the configuration of the information extraction system components on information extraction results and (2) tests the boosting mode of this system. Based on empirical research, the author established that the proposed taxonomy of seeds and the HTML tag-level analysis, with appropriate pre-processing, improve information extraction results. Also, the boosting mode works well when certain requirements are met, i.e. when well-diversified input data are ensured.
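The FCA machinery behind such wrapper induction can be illustrated with a minimal sketch that enumerates formal concepts from a toy object-attribute context; the page contents below are invented, and the actual BigGrams pattern induction is considerably more elaborate:

```python
from itertools import combinations

# Toy formal context: objects (pages) x attributes (tokens/tags they contain).
context = {
    "page1": {"<b>", "Audi", "</b>"},
    "page2": {"<b>", "Opel", "</b>"},
    "page3": {"<i>", "Audi", "</i>"},
}

def extent(attrs):
    """Objects that share all the given attributes."""
    return {o for o, a in context.items() if attrs <= a}

def intent(objs):
    """Attributes shared by all the given objects."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set()

# A formal concept is a pair (objects, attributes) closed under both maps;
# e.g. the pages wrapped in <b>...</b> form one concept, suggesting a pattern.
concepts = set()
for r in range(len(context) + 1):
    for objs in combinations(context, r):
        a = intent(set(objs))
        o = extent(a)
        concepts.add((frozenset(o), frozenset(a)))

for o, a in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(o), sorted(a))
```

Here the concept grouping page1 and page2 under the shared attributes `<b>` and `</b>` is the kind of regularity a wrapper induction step can turn into an extraction pattern for new seed values.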
Categorization of Multilingual Scientific Documents by a Compound Classification System
The aim of this study was to propose a classification method for documents that simultaneously include text parts in various languages. For this purpose, we constructed a three-level classification system. On its first level, a data processing module prepares a suitable vector space model. Next, in the middle tier, a set of monolingual or multilingual classifiers assigns to each document, or its parts, the probabilities of belonging to each possible category. The models are trained using the Multinomial Naive Bayes and Long Short-Term Memory algorithms. Finally, in the last component, a multilingual decision module assigns a target class to each document. The module is built on a logistic regression classifier, which receives as inputs the probabilities produced by the classifiers. The system has been verified experimentally. According to the reported results, it can be assumed that the proposed system can deal with textual documents whose content is composed of many languages at the same time. Therefore, the system can be useful in the automatic organizing of multilingual publications or other documents.
A Diversified Classification Committee for Recognition of Innovative Internet Domains
The objective of this paper was to propose a classification method for innovative domains on the Internet. The proposed approach helped to estimate whether companies are innovative or not by analyzing their web pages. A Naïve Bayes classification committee was used as the classification system for the domains. The classifiers in the committee were based concurrently on Bernoulli and Multinomial feature distribution models, which were selected depending on the diversity of the input data. Moreover, information retrieval procedures were applied to find the documents in each domain that most likely indicate innovativeness. The proposed methods have been verified experimentally. The results have shown that the diversified classification committee, combined with the information retrieval approach in the preprocessing phase, boosts the classification quality of domains that may represent innovative companies. This approach may be applied to other classification tasks.
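The committee idea, with the same data modelled under both a Bernoulli (presence/absence) and a Multinomial (term-count) feature distribution, can be sketched as follows; the synthetic data and the simple probability averaging are illustrative assumptions, not the paper's diversity-based selection rule:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

rng = np.random.default_rng(1)
# Synthetic stand-in for document vectors: term counts for 200 "pages".
X_counts = rng.poisson(2.0, size=(200, 20))
y = (X_counts[:, 0] > X_counts[:, 1]).astype(int)

# Committee (simplified): fit the two distribution models on the
# corresponding views and average their predicted probabilities.
bern = BernoulliNB().fit(X_counts > 0, y)   # presence/absence view
multi = MultinomialNB().fit(X_counts, y)    # term-count view
proba = (bern.predict_proba(X_counts > 0) + multi.predict_proba(X_counts)) / 2
committee_pred = proba.argmax(axis=1)
print((committee_pred == y).mean())  # committee training accuracy
```

The two distribution assumptions fail in different ways (binary features lose frequency information, counts can overweight long pages), which is what makes combining them worthwhile.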
The Cascading Knowledge Discovery in Databases process in the Information System development
This article describes a proposed method for designing information systems. The method is based on the author’s cascading Knowledge Discovery in Databases process. The author also presents a use case of this process. All analyses presented in this article are based on text reports from the rescue fire service.
The hybrid decision support system for the Fire Service – selected project problems
This article describes the process of designing a hybrid decision support system (HSWD) for the Fire Service. The design process follows a design-for-trustworthy-software (DFTS) methodology. The article describes selected project problems and their solutions at the first stage of the proposed design process.
The Method of Designing the Knowledge Database and Rules for a Text Segmentation Tool Based on Formal Concept Analysis
Objective: Presentation of a specialist text segmentation technique. The text was derived from reports (a form “Information about the event”, field “Information about the event – descriptive data”) prepared by rescue units of the State Fire Service after firefighting and rescue operations.
Methods: In order to perform this task, the author proposed a method of designing the knowledge base and rules for a text segmentation tool. The proposed method is based on formal concept analysis (FCA). The knowledge base and rules designed with the proposed method allow the segmentation of the available documentation to be performed. The correctness and effectiveness of the proposed method were verified by comparing its results with two other solutions used for text segmentation.
Results: During the research and analysis, the rules and abbreviations present in the studied specialist texts were grouped and described. Thanks to formal concept analysis, a hierarchy of the detected rules and abbreviations was created. The extracted hierarchy constituted both the knowledge base and the rule base of the text segmentation tool. Numerical and comparative experiments on the author’s solution and two other methods showed significantly better performance of the former. For example, the F-measure results obtained with the proposed method are 95.5%, which is 7-8% better than the other two solutions.
Conclusions: The proposed method of designing the knowledge and rule base of a text segmentation tool enables the design and implementation of software that divides text into segments with a small error. The basic rule, which detects the end of a sentence by interpreting dots and additional characters as the end of a segment, must in fact be packaged with additional rules, especially in the case of specialist texts. These additional rules significantly improve the quality of segmentation and reduce the error. Formal concept analysis, as presented in the article, is well suited to the construction and representation of such rules. Knowledge engineering and additional experiments can enrich the created hierarchy with new rules. Newly inserted knowledge can be easily applied to the established hierarchy, thereby contributing to improved text segmentation. Moreover, the numerical experiment produced two unique resources: a set of rules and abbreviations used in the reports, and a set of properly separated and labelled segments.
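The conclusion that a bare "dot ends the segment" rule must be packaged with additional rules can be illustrated with a minimal sketch; the abbreviation list below is an invented stand-in for the knowledge base actually derived in the paper:

```python
# Illustrative abbreviations that must NOT terminate a segment
# (invented examples in the spirit of Polish rank/street/time markers).
ABBREVIATIONS = {"st.", "kpt.", "ul.", "godz."}

def segment(text: str) -> list[str]:
    """Split text on '.', but keep known abbreviations inside a segment."""
    segments, current = [], []
    for token in text.split():
        current.append(token)
        # The extra rule: a trailing dot only ends a segment when the
        # token is not a known abbreviation.
        if token.endswith(".") and token.lower() not in ABBREVIATIONS:
            segments.append(" ".join(current))
            current = []
    if current:
        segments.append(" ".join(current))
    return segments

print(segment("Kpt. Nowak przybyl o godz. 14. Akcja zakonczona."))
# -> ['Kpt. Nowak przybyl o godz. 14.', 'Akcja zakonczona.']
```

A naive dot-based splitter would cut this example into four fragments; the abbreviation rule keeps "Kpt." and "godz." inside their segments, which is exactly the kind of rule the FCA-derived hierarchy organises.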
Detecting and Extracting Semantic Topics from Polish Fire Service Reports
This article presents the results of structuring text documents using a classification process. The proposed system is based on a classification process used to extract information about the semantics (meaning) of the segments (sentences) that build the text documents. The analysis was performed on reports from the incident recording system of the National Fire Service (Polish Fire Service). The article describes the classification results obtained with the proposed classifiers and presents some future directions of research.
Language-Independent Information Extraction Based on Formal Concept Analysis
This paper proposes the application of Formal Concept Analysis (FCA) to creating character-level information extraction patterns and presents BigGrams: a prototype of a language-independent information extraction system. The main goal of the system is to recognise and extract named entities belonging to certain semantic classes (e.g. cars, actors, pop stars, etc.) from semi-structured text (web page documents).
Proposition of a hybrid process model for the semi-structured description of events from fire service rescue operations
This paper reviews currently developed knowledge representations and case representations for a fire service case-based reasoning system. The article also describes a method of processing the cases of events. This processing method is based on classification and information retrieval.
Application of formal concept analysis for information extraction system analysis
This article describes the design process of an information extraction system (IES). The proposed design method is based on rules and formal concept analysis.
A proposed search component of a CBR system for the rescue service based on a domain ontology
This paper describes the problems of designing the search module of a case-based reasoning (CBR) system. In the first part of this article, the author reviews solutions available in the fire service, such as decision support systems implementing case-based reasoning. The second part describes the search component of this CBR system. The author proposes an ontology layer to support the case search process in this module. This ontology layer is the result of the author’s analysis of the documentation describing rescue actions. In the last section, the author summarises the proposed design of this component and presents newly developed ways to construct, refactor and extend the proposed ontology.
Review of methods and text data mining techniques
This article describes the author’s classification of the methods and techniques of textual data mining. It also describes the currently available methods of representing textual data and the techniques for processing them. A discussion is provided on the processing of text documents using the presented methods. The paper also discusses the possibilities and limitations of the individual methods for processing text documents.
Crowdsourcing in rescue fire service – proposed application
This article describes the author’s proposal to apply crowdsourcing in the Polish rescue fire service. It also describes the basic principles of implementing a crowdsourcing information platform in the rescue fire service, as well as the scheme of its implementation. The author also describes the genesis of this proposal, which is related to the evaluation of the author’s research on text mining analysis and information extraction in the design of information systems.
Information system about hydrants network for the rescue service: method of text segmentation and evaluation
This article describes the design process of a rule-based segmenter. It also describes the procedure for creating a reference set of segments. A numerical experiment is described, together with the reference collection of segments that was created. The designed segmenter was used to extract segments from fire service reports, and the segmentation results were compared with other text segmentation solutions.
The process and methods of text mining for processing reports from the rescue service
This paper describes a process for handling reports from rescue and firefighting operations. The processing of the reports uses methods and techniques from the field of textual data mining (text mining). This paper also presents a classification and analysis of the text mining methods considered for potential use in the proposed process.
Use of ERP components to build a second-generation hybrid decision support system for the PSP
This article extends the ideas of designing a hybrid decision support system in the context of enterprise resource planning for the Fire Service. New assumptions, terms and concepts for the designed platform are introduced, thanks to which the basic functional specification was extended. The authors also propose a uniform solution consisting of the connection of two complementary systems, i.e. the decision support system and a multimedia training system.
The modified FMEA analysis with elements of SFTA in the project of a hydrotechnical object information search system in a noSQL catalogue register
This article describes a modified failure mode and effects analysis (FMEA) with elements of software fault tree analysis (SFTA) in the project of a hydrotechnical object information search system based on a non-relational catalogue register. An FMEA sheet was created for this analysis (Table 1). The article also describes the current information search system, which is based on extended full-text search. The use case for searching information in the current system is shown in Fig. 1. On the basis of the analysis of the basic system, a new search system is proposed; the use case for searching information in the new system is shown in Fig. 2. This paper also describes the faults that produce the most important error: the decision maker cannot choose a hydrotechnical object. This fault is analysed using SFTA, and the results of the analysis are presented in an SFTA graph (Fig. 3). At the end of the article, a solution for the new hydrotechnical object information search system, based on a non-relational catalogue database, is presented, and the whole solution is summarised.
Database management system and agent architecture into fire service
This article describes new approaches to, and a classification of, database management systems (DBMS). It also describes the potential use of agent architectures in the fire service. The first part of the article is a review of DBMS: it describes the historically oldest solutions, such as the hierarchical database model and the relational database model, as well as the newest solutions, such as the object-oriented database model, the conceptual model, and models based on the Extensible Markup Language (XML). Currently, the application of DBMS in the Polish fire service is strongly limited: they are usually used to keep records of fire service events and, at most, provide minimal support for decision makers. The proposed review gives a wider view of the uses of DBMS in rescue services. In the second part, the author outlines a decision support system for the fire service based on an agent database management system. This section describes the system’s functions and its way of working. The article ends with a summary in which the author describes the techniques used and the problems that appeared in the project cycle of the agent-based platform.
Data mining review and use’s classification, methods and techniques
The large quantity of data and information accumulated in current information systems, and their successive extension, has forced the development of new processes, techniques and methods for storing, processing and analysing them. Currently, achievements from the areas of statistical analysis and artificial intelligence are used in the process of analysing large data sets. These fields make up the core of data exploration: data mining. Data mining currently aspires to be an independent scientific method for solving problems in the analysis of information coming from database management systems. This article presents a review and a classification of the uses, methods and techniques applied in the data exploration process. It also describes current development directions and the elements that this young applied discipline of science still requires.
The project of hybrid decision support system for the Fire Service
Issues of the model design of a hybrid decision support system (HDSS) for the Fire Service are described. In the project analysis, the aspects of using and incorporating an expert subsystem in the decision system were considered.
In the second section, the methodological assumptions of the system design, based on an object-oriented description and the Unified Modeling Language (UML), are presented. In the third section, the analysis and presentation of the HDSS project are performed. This analysis was made with the use of an Activity Diagram. The basic system components, the influences between them and the ways in which information flows were clearly defined. On the Activity Diagram, the basic components of the decision support system were separated and situated. The separated components and the physical realisation of the system’s activity were presented using a Component Diagram. The possible use of a text mining component in the designed system was also described.
Data mining into Knowledge Discovery In Databases (KDD) and methodology of Cross-Industry Standard Process For Data Mining (CRISP-DM) context
The article aims to introduce the reader to several problems connected with the KDD process and with data mining project modelling using CRISP-DM. Systematised knowledge, approaches and generic terms are presented in the article. The first part describes the approach to data exploration as one stage of the KDD cycle, which is a specialised knowledge discovery process. The article then takes up the subject of the CRISP-DM method and the context of its usage depending on the scale and integration of the project concerned: an investigation of the use of text mining in an Intelligent Decision Support System (IDSS) developed by the informatics faculty of the Fire Service. At the end of the article, a summary is made of the common features of the two views on exploration and on extracting knowledge from databases.
The concept of expert system for decision support in the State Fire Service
The article presents the concept of building and modelling a decision support system for the State Fire Service. This concept relies on a description of the object in UML for modelling the system. In addition, the aspects of using and building an expert subsystem into the decision-making system are described. The knowledge representation layer of the system uses an ontological description, as the most promising method for modelling current knowledge. The article also covers the role, place and scope of text analysis, which is a basic component of this decision-making platform.
The distributed system for collecting and analysing selected medical data
In this paper, the structure of a three-tiered distributed system for collecting and analysing medical examination data is presented. The idea of this work is to provide an assistant tool for urologists to make diagnosing lower urinary tract diseases and their symptoms easier. The data (which are processed from files produced by uroflowmeters – devices for measuring urine flow rate) are presented in a web browser. This has been done with the use of PHP scripts, which are accessed through an Apache web server.