Segmentation Evaluation using SegEval

A package providing text segmentation evaluation metrics and utilities. (Installation)

Text segmentation is the task of splitting a text into segments by placing boundaries between atomic units (e.g., morphemes, words, lines, sentences, paragraphs, or sections). It’s a common pre-processing step in many Natural Language Processing (NLP) tasks.
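To make the idea of "placing boundaries between atomic units" concrete, here is a minimal sketch of two equivalent ways to write down one segmentation: as a sequence of segment sizes ("masses") or as the offsets at which boundaries fall. The function names are hypothetical and chosen for illustration; they are not taken from SegEval's API.

```python
from itertools import accumulate


def masses_to_positions(masses):
    """Convert segment sizes, e.g. (2, 3, 1), into boundary offsets.

    A boundary at offset p sits between unit p - 1 and unit p.
    """
    return tuple(accumulate(masses))[:-1]


def positions_to_masses(positions, total):
    """Convert boundary offsets back into segment sizes for `total` units."""
    edges = (0, *positions, total)
    return tuple(right - left for left, right in zip(edges, edges[1:]))


# Six sentences grouped into paragraphs of 2, 3, and 1 sentences.
paragraph_sizes = (2, 3, 1)
boundary_offsets = masses_to_positions(paragraph_sizes)
round_trip = positions_to_masses(boundary_offsets, sum(paragraph_sizes))
```

Both forms carry the same information; which one is more convenient depends on whether a metric reasons about segment sizes or about individual boundaries.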

For example, if we performed both manual and automatic syllabification of words, we would need a way to measure how close the automatic solution is to the manual one. For this, we can use Boundary Edit Distance and Boundary Similarity. Evaluating a hypothetical automatic syllabifier, we obtain the results shown below.

Word          Manual Solution   Automatic Solution   Boundary Edit Distance     Boundary Similarity
automatic     au·to·ma·tic      au·tom·a·tic         2 matches, 1 near          0.83
segmentation  seg·men·ta·tion   seg·ment·ation       1 match, 1 near, 1 miss    0.50
is            is                is                   No edits                   1.00
fun           fun               f·un                 1 miss                     0.00
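The scores in the table can be reproduced with a simplified sketch. It assumes a single boundary type, treats a "near" as two unmatched boundaries exactly one position apart, charges each near half a miss, and pairs nears greedily from left to right. These are simplifying assumptions for illustration, and the function names are hypothetical; this is not SegEval's implementation of Boundary Edit Distance.

```python
def boundary_positions(syllabified, sep="·"):
    """Return the character offsets at which `sep` places a boundary."""
    positions, offset = set(), 0
    for char in syllabified:
        if char == sep:
            positions.add(offset)
        else:
            offset += 1
    return positions


def count_edits(manual, automatic):
    """Count (matches, nears, misses) between two syllabifications."""
    a, b = boundary_positions(manual), boundary_positions(automatic)
    matches = a & b
    only_a, only_b = a - matches, b - matches
    nears = 0
    # Greedy pairing of unmatched boundaries that are off by one position.
    for pos in sorted(only_a):
        for neighbour in (pos - 1, pos + 1):
            if neighbour in only_b:
                only_b.remove(neighbour)
                only_a.remove(pos)
                nears += 1
                break
    misses = len(only_a) + len(only_b)
    return len(matches), nears, misses


def boundary_similarity(manual, automatic, near_weight=0.5):
    """1 minus the weighted edits over all matches and edits (1.0 if none)."""
    matches, nears, misses = count_edits(manual, automatic)
    total = matches + nears + misses
    if total == 0:
        return 1.0
    return 1.0 - (misses + near_weight * nears) / total


for manual, automatic in [
    ("au·to·ma·tic", "au·tom·a·tic"),
    ("seg·men·ta·tion", "seg·ment·ation"),
    ("is", "is"),
    ("fun", "f·un"),
]:
    print(manual, automatic, round(boundary_similarity(manual, automatic), 2))
```

Under these assumptions the four rows come out as 0.83, 0.5, 1.0, and 0.0, matching the table.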

This package is a collection of metrics for comparing text segmentations and evaluating automatic text segmenters. Both new (Boundary Similarity, Segmentation Similarity) and traditional (WindowDiff, Pk) metrics are included, as well as inter-coder agreement coefficients and confusion matrices based upon a boundary edit distance.
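For readers unfamiliar with the traditional window-based metrics, the following is a minimal sketch of Pk and WindowDiff over segmentations given as tuples of segment sizes. The window size is assumed to be half the mean reference segment size (integer division, at least 1), and every name here is hypothetical; SegEval's own functions may choose the window size and handle edge cases differently.

```python
def boundaries(masses):
    """Offsets of the boundaries implied by a tuple of segment sizes."""
    total, positions = 0, set()
    for mass in masses[:-1]:
        total += mass
        positions.add(total)
    return positions


def window_size(reference):
    """Half the mean reference segment size, at least 1 (assumed convention)."""
    return max(1, sum(reference) // (2 * len(reference)))


def window_counts(hypothesis, reference):
    """Boundary counts (hypothesis, reference) inside each sliding window."""
    n, k = sum(reference), window_size(reference)
    if sum(hypothesis) != n:
        raise ValueError("segmentations must cover the same number of units")
    if n <= k:
        raise ValueError("text is too short for the chosen window size")
    hyp, ref = boundaries(hypothesis), boundaries(reference)
    counts = []
    for i in range(n - k):
        window = range(i + 1, i + k + 1)  # offsets between unit i and unit i + k
        counts.append((sum(p in hyp for p in window),
                       sum(p in ref for p in window)))
    return counts


def window_diff(hypothesis, reference):
    """Fraction of windows whose boundary counts differ."""
    counts = window_counts(hypothesis, reference)
    return sum(h != r for h, r in counts) / len(counts)


def pk(hypothesis, reference):
    """Fraction of windows that disagree on whether any boundary is present."""
    counts = window_counts(hypothesis, reference)
    return sum((h == 0) != (r == 0) for h, r in counts) / len(counts)


reference = (5, 5)
print(window_diff((5, 1, 4), reference), pk((5, 1, 4), reference))
```

Both are error rates, so lower is better. A hypothesis with one extra boundary next to a correct one, such as (5, 1, 4) against (5, 5), is penalized more by WindowDiff than by Pk, because Pk only checks whether a window contains any boundary at all.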

For more examples of how to use SegEval, see “An initial study of topical poetry segmentation”.

Release: 2.0.11 (changelog)
Date: May 13, 2017

Feature Support

A variety of segmentation comparison metrics are implemented, including:

Additionally, B-based inter-coder agreement coefficients for segmentation that are suitable for 2 or more coders are provided, including:

User Guide

This part of the documentation, which is mostly prose, begins with some background information about SegEval, then focuses on step-by-step instructions for getting the most out of SegEval.

API Documentation

If you are looking for information on a specific function, class or method, this part of the documentation is for you.


If you have any suggestions, problems, or difficulties, please log an issue, or contact me.

Citing SegEval

If you’re using this software for research, please cite the ACL paper [Fournier2013] and, if you need to go into details, the thesis [Fournier2013b] describing this work.


@inproceedings{Fournier2013,
    author      = {Fournier, Chris},
    year        = {2013},
    title       = {{Evaluating Text Segmentation using Boundary Edit Distance}},
    booktitle   = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics},
    publisher   = {Association for Computational Linguistics},
    location    = {Sofia, Bulgaria},
    pages       = {to appear},
    address     = {Stroudsburg, PA, USA}
}

@mastersthesis{Fournier2013b,
    author      = {Fournier, Chris},
    title       = {Evaluating Text Segmentation},
    school      = {University of Ottawa},
    year        = {2013}
}


[ArtsteinPoesio2008] Ron Artstein and Massimo Poesio. 2008. Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555–596. MIT Press.
[Baker1990] David Baker. 1990. Stargazers look for life. South Magazine 117, 76–77. South Publications.
[BeefermanBerger1999] Doug Beeferman and Adam Berger. 1999. Statistical models for text segmentation. Machine Learning, 34(1–3):177–210. Springer Netherlands.
[Cohen1960] Jacob Cohen. 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20(1):37–46.
[Collins1868] Wilkie Collins. 1868. The Moonstone. Tinsley Brothers.
[DaviesFleiss1982] Mark Davies and Joseph L. Fleiss. 1982. Measuring agreement for multinomial data. Biometrics, 38(4):1047–1051.
[Fleiss1971] Joseph L. Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382.
[Fournier2013] Chris Fournier. 2013. Evaluating Text Segmentation using Boundary Edit Distance. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. To appear.
[Fournier2013b] Chris Fournier. 2013. Evaluating Text Segmentation. Master’s Thesis. University of Ottawa.
[FournierInkpen2012] Chris Fournier and Diana Inkpen. 2012. Segmentation Similarity and Agreement. Proceedings of Human Language Technologies: The 2012 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT ’12). Association for Computational Linguistics.
[Hearst1997] Marti A. Hearst. 1997. TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages. Computational Linguistics, 23(1):33–64.
[KazantsevaSzpakowicz2012] Anna Kazantseva and Stan Szpakowicz. 2012. Topical segmentation: a study of human performance. Proceedings of Human Language Technologies: The 2012 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT ’12). Association for Computational Linguistics.
[LamprierEtAl2007] Sylvain Lamprier, Tassadit Amghar, Bernard Levrat, and Frederic Saubion. 2007. On evaluation methodologies for text segmentation algorithms. Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence, 2:19–26. IEEE Computer Society.
[PevznerHearst2002] Lev Pevzner and Marti A. Hearst. 2002. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28(1):19–36. MIT Press, Cambridge, MA, USA.
[Scott1955] William A. Scott. 1955. Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19(3):321–325.
[SiegelCastellan1988] Sidney Siegel and N. John Castellan, Jr. 1988. Non-parametric Statistics for the Behavioral Sciences. 2nd Edition, Chapter 9.8. McGraw-Hill.