Tag Archives: information extraction

Information Extraction System for Transforming Unstructured Text Data in Fire Reports into Structured Forms: A Polish Case Study

Abstract

In this paper, the author presents a novel information extraction system that analyses fire service reports. Although the reports contain valuable information concerning fire and rescue incidents, their narrative sections have received little attention as a source of data, because processing these data and making sense of their contents by machine is challenging. A new question has therefore emerged: how can we bring to light valuable information from the narrative portions of reports that currently escapes the attention of analysts? Information extraction, and systems for analysing data that lie outside existing hierarchical coding schemes, can be challenging for researchers and practitioners. Furthermore, such systems have not been comprehensively discussed or proposed for the rescue service domain. The author therefore describes, comprehensively and systematically, how an information extraction system transforms unstructured text data from fire reports into structured forms. Each step of the process has been verified and evaluated on real cases, including data collected from the Polish Fire Service. Building the system showed that we must analyse not only the text data from the reports but also the data acquisition process. As a result, we can define suitable analytical requirements. Moreover, the quantitative analysis and experimental results confirm that we can (1) obtain good results for text segmentation (F-measure 95.5%) and classification (F-measure 90%) and (2) implement the information extraction process and perform useful analysis.
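The F-measure quoted for segmentation and classification combines precision and recall into a single score. The sketch below shows the standard computation. The function name and the counts in the usage example are made up for illustration and are not taken from the paper, and the abstract does not say which variant (for example, F1 or a weighted F-beta) the author used.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    beta=1.0 gives the balanced F1 score (harmonic mean of precision and recall).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only: 8 correct segments,
# 2 spurious ones, 4 missed ones.
score = f_measure(tp=8, fp=2, fn=4)
print(f"F1 = {score:.3f}")
```

With beta equal to 1 the score penalises low precision and low recall equally, which is why it is a common single-number summary for segmentation and classification quality.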

The BigGrams: the semi-supervised information extraction system from HTML: an improvement in the wrapper induction

Abstract

The aim of this study is to propose an information extraction system, called BigGrams, which is able to retrieve relevant and structured information (relevant phrases, keywords) from semi-structured web pages, i.e. HTML documents. For this purpose, a novel semi-supervised wrapper induction algorithm has been developed and embedded in the BigGrams system. The wrapper induction algorithm uses formal concept analysis to induce information extraction patterns. In this article, the author also (1) presents the impact of the configuration of the information extraction system components on information extraction results and (2) tests the boosting mode of this system. Based on empirical research, the author established that the proposed taxonomy of seeds and the HTML tag-level analysis, with appropriate pre-processing, improve information extraction results. The boosting mode also works well when certain requirements are met, i.e. when well-diversified input data are ensured.
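The general idea of seed-based wrapper induction can be illustrated with a deliberately simplified sketch. It is not the BigGrams algorithm: it uses no formal concept analysis, and the function names, the fixed context width `k`, and the sample page are all assumptions made for this example. It learns the characters that surround known seed phrases and then reuses those contexts as extraction patterns.

```python
import re
from collections import Counter


def induce_wrappers(page: str, seeds: list[str], k: int = 4) -> list[tuple[str, str]]:
    """Collect (left, right) character contexts of width k around seed occurrences.

    A context is kept as a wrapper only if at least two seed occurrences share it.
    """
    counts: Counter = Counter()
    for seed in seeds:
        for m in re.finditer(re.escape(seed), page):
            left = page[max(0, m.start() - k):m.start()]
            right = page[m.end():m.end() + k]
            if len(left) == k and len(right) == k:
                counts[(left, right)] += 1
    return [ctx for ctx, n in counts.items() if n >= 2]


def apply_wrappers(page: str, wrappers: list[tuple[str, str]]) -> list[str]:
    """Extract every phrase enclosed by one of the induced (left, right) contexts."""
    found: list[str] = []
    for left, right in wrappers:
        pattern = re.escape(left) + r"(.+?)" + re.escape(right)
        for m in re.finditer(pattern, page):
            if m.group(1) not in found:
                found.append(m.group(1))
    return found


# Hypothetical page and seeds, for illustration only.
page = "<ul><li>Audi</li><li>Ford</li><li>Fiat</li><li>Opel</li></ul>"
wrappers = induce_wrappers(page, seeds=["Audi", "Ford"])
print(apply_wrappers(page, wrappers))
```

Two seeds are enough here because every list item shares the same markup. On real pages the choice of seeds and the amount of pre-processing matter much more, which is the kind of configuration effect the abstract reports.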

The Cascading Knowledge Discovery: A Smarter Way to Design Information Systems

Abstract

This article proposes a method for designing information systems. The method is based on the author’s cascading knowledge discovery in databases process. The author also presents a use case of this process. All analyses presented in the article are based on text reports from the fire and rescue service.

Article – Language-Independent Information Extraction Based on Formal Concept Analysis

This paper proposes applying Formal Concept Analysis (FCA) to create character-level information extraction patterns and presents BigGrams, a prototype of a language-independent information extraction system. The main goal of the system is to recognise and extract named entities belonging to given semantic classes (e.g. cars, actors, pop stars) from semi-structured text (web page documents).
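For readers unfamiliar with FCA, the sketch below enumerates the formal concepts of a tiny context by brute force. The choice of objects (short strings) and attributes (character bigrams) is an assumption made for illustration. The summary above does not specify how BigGrams builds its context, and a practical system would use a more efficient concept-generation algorithm than this exhaustive one.

```python
from itertools import combinations


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams occurring in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def formal_concepts(context: dict[str, set[str]]) -> set[tuple[frozenset, frozenset]]:
    """Enumerate all formal concepts (extent, intent) of a small context.

    context maps each object to its attribute set. For every subset of objects
    we take the attributes they all share (the intent) and then every object
    that has all of those attributes (the extent). Exponential in the number
    of objects, so suitable only for toy examples.
    """
    objects = list(context)
    all_attrs = set().union(*context.values()) if context else set()
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = set(all_attrs)
            for obj in subset:
                intent &= context[obj]
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


# Hypothetical toy context: two strings described by their character bigrams.
context = {s: char_ngrams(s) for s in ["abc", "abd"]}
for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each concept pairs a group of strings with exactly the character patterns they share, which is what makes a concept lattice a natural source of candidate extraction patterns.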

Designing Information Systems Through Text Mining: A Case Study of Fire Service Documentation

On September 25, 2013, at 12:15 PM in room WA-130 of the Rectorate building at Białystok University of Technology, I successfully defended my doctoral dissertation titled “Text Data Analysis in Designing a Selected Information System: A Case Study of National Fire Service Incident Documentation.” A detailed description of the proposed method can be found in the Publications – Seminars section or downloaded directly here. Below is a simplified overview of my research.

Od Informacji do Wiedzy

A blog with information about information and knowledge
