Abstract
Objective: To present a technique for segmenting specialist text. The text was drawn from reports (the form “Information about the event”, field “Information about the event – descriptive data”) prepared by rescue units of the State Fire Service after firefighting and rescue operations.
Methods: To perform this task, the author proposes a method for designing the knowledge base and rule base of a text segmentation tool. The method is based on formal concept analysis (FCA). The knowledge base and rules designed in this way make it possible to segment the available documentation. The correctness and effectiveness of the proposed method were verified by comparing its results with those of two other text segmentation solutions.
Results: During the research and analysis, the rules and abbreviations present in the studied specialist texts were grouped and described. Formal concept analysis was used to create a hierarchy of the detected rules and abbreviations. The extracted hierarchy served as both the knowledge base and the rule base of the text segmentation tool. Numerical experiments comparing the author’s solution with two other methods showed significantly better performance of the former. For example, the F-measure obtained with the proposed method is 95.5%, which is 7-8% better than the other two solutions.
Conclusions: The proposed method of designing the knowledge and rule base of a text segmentation tool enables the design and implementation of software that divides text into segments with a small error. The basic rule for detecting the end of a sentence, which interprets a dot and additional characters as the end of a segment, must be supplemented with additional rules, especially in the case of specialist texts. These measures significantly improve the quality of segmentation and reduce the error. Formal concept analysis, as presented in the article, is suitable for constructing and representing such rules. Knowledge engineering and additional experiments can enrich the created hierarchy with new rules. The newly added knowledge can easily be applied to the established hierarchy, thereby improving the segmentation of the text. Moreover, the numerical experiment produced two unique resources: a set of rules and abbreviations used in the reports, and a set of correctly separated and labeled segments.
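The point about supplementing the basic end-of-sentence rule can be illustrated with a minimal sketch. It is not the author's tool: the FCA-derived hierarchy is replaced here by a flat set of abbreviations, and the abbreviation list, the function name `segment`, and the sample sentence are all hypothetical, chosen only to show how one additional rule changes the result.

```python
import re

# Hypothetical abbreviation list, for illustration only; the rule and
# abbreviation set built in the article is not reproduced here.
ABBREVIATIONS = {"approx.", "no.", "dept."}


def segment(text, abbreviations=ABBREVIATIONS):
    """Split text at '.', '!' or '?' followed by whitespace or end of text,
    unless the word carrying the terminator is a known abbreviation."""
    segments = []
    start = 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        end = match.end()
        # Last whitespace-delimited word of the candidate segment.
        words = text[start:end].split()
        token = words[-1].lower() if words else ""
        if token in abbreviations:
            continue  # additional rule: an abbreviation does not end a segment
        candidate = text[start:end].strip()
        if candidate:
            segments.append(candidate)
        start = end
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


# Hypothetical report-like sentence (not taken from the studied corpus).
sample = "Fire in building no. 5 was extinguished. Approx. 3 people were evacuated."
naive = segment(sample, abbreviations=set())  # basic rule only
with_rules = segment(sample)                  # basic rule plus abbreviation rule
```

With the basic rule alone the sample is cut after "no." and "Approx.", giving four fragments; with the abbreviation rule it is split into the two intended segments.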