Monthly Archives: October 2019

Recognising innovative companies by using a diversified stacked generalisation method for website classification

Abstract

In this paper, we propose a classification system which is able to decide whether a company is innovative or not, based only on its public website available on the internet. As innovativeness plays a crucial role in the development of myriad branches of the modern economy, an increasing number of entities are expending effort to be innovative. Thus, a new issue has appeared: how can we recognise them? Not only is grasping the idea of innovativeness challenging for humans, but also impossible for any known machine learning algorithm. Therefore, we propose a new indirect technique: a diversified stacked generalisation method, which is based on a combination of a multi-view approach and a genetic algorithm. The proposed approach achieves better performance than all other classification methods which include: (i) models trained on single datasets; or (ii) a simple voting method on these models. Furthermore, in this study, we check if unaligned feature space improves classification results. The proposed solution has been extensively evaluated on real data collected from companies’ websites. The experimental results verify that the proposed method improves the classification quality of websites which might represent innovative companies.

Information Extraction System for Transforming Unstructured Text Data in Fire Reports into Structured Forms: A Polish Case Study

Abstract

In this paper, the author presents a novel information extraction system that analyses fire service reports. Although the reports contain valuable information concerning fire and rescue incidents, the narrative information in these reports has received little attention as a source of data. This is because of the challenges associated with processing these data and making sense of the contents through the use of machines. Therefore, a new issue has emerged: How can we bring to light valuable information from the narrative portions of reports that currently escape the attention of analysts? The idea of information extraction and the relevant system for analysing data that lies outside existing hierarchical coding schemes can be challenging for researchers and practitioners. Furthermore, comprehensive discussion and propositions of such systems in rescue service areas are insufficient. Therefore, the author comprehensively and systematically describes the ways in which information extraction systems transform unstructured text data from fire reports into structured forms. Each step of the process has been verified and evaluated on real cases, including data collected from the Polish Fire Service. The realisation of the system has illustrated that we must analyse not only text data from the reports but also consider the data acquisition process. Consequently, we can create suitable analytical requirements. Moreover, the quantitative analysis and experimental results verify that we can (1) obtain good results of the text segmentation (F-measure 95.5%) and classification processes (F-measure 90%) and (2) implement the information extraction process and perform useful analysis.