NER Error Analyzer

Quick Start

from nlu.error import *
from nlu.parser import *


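# Column layout of the input file: predicted tags in column 1, gold tags in column 2
cols_format = [{'type': 'predict', 'col_num': 1, 'tagger': 'ner'},
               {'type': 'gold', 'col_num': 2, 'tagger': 'ner'}]

parser = ConllParser('testb.pred.gold', cols_format)

# Basic statistics for the predicted and the gold tags
parser.obtain_statistics(entity_stat=True, source='predict')
parser.obtain_statistics(entity_stat=True, source='gold')

parser.set_entity_mentions()

# Annotate NER errors in the parsed documents
NERErrorAnnotator.annotate(parser)

# Print the correct predictions and all errors
parser.print_corrects()
parser.print_all_errors()

# Overall error statistics
parser.error_overall_stats()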

See the Input File Format section below for the expected input format.

Usage

import

from nlu.error import *
from nlu.parser import *

First, create a `ConllParser` instance from the input file path, specifying the column numbers in the `cols_format` argument

ConllParser(filepath)

cols_format = [{'type': 'predict', 'col_num': 1, 'tagger': 'ner'},
               {'type': 'gold', 'col_num': 2, 'tagger': 'ner'}]

parser = ConllParser('testb.pred.gold', cols_format)

Obtain basic statistics with the `obtain_statistics()` method

parser.obtain_statistics(entity_stat=True, source='predict')

parser.obtain_statistics(entity_stat=True, source='gold')

Annotate NER errors in the documents inside the `ConllParser`

NERErrorAnnotator.annotate(parser)

To print all correct predictions or all errors, use

`parser.print_corrects()` or `parser.print_all_errors()`

or use the `error_overall_stats()` method to get the overall statistics.

Input File Format

The input file for `ConllParser` follows the column format used by CoNLL03.

For example,

Natural I-ORG O
Language I-ORG O
Laboratory I-ORG I-ORG
...

Here the first column is the text, and the second and third columns are the predicted and ground-truth tags, respectively. The column order can be specified with the `cols_format` keyword when instantiating `ConllParser`:

cols_format = [{'type': 'predict', 'col_num': 1, 'tagger': 'ner'},
               {'type': 'gold', 'col_num': 2, 'tagger': 'ner'}]  # col_num starts from 0
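
To make the zero-based `col_num` indexing concrete, here is a minimal sketch of how one input line could be split according to `cols_format`. The helper `split_columns` is hypothetical and written only for illustration; it is not part of the `nlu` package and may differ from what `ConllParser` does internally.

```python
# Hypothetical helper for illustration only; not part of the nlu package.
def split_columns(line, cols_format):
    """Return the token plus one tag per entry in cols_format."""
    fields = line.split()
    row = {'text': fields[0]}  # column 0 holds the token text
    for col in cols_format:
        # col_num is zero-based, so col_num 1 is the second column
        row[col['type']] = fields[col['col_num']]
    return row


cols_format = [{'type': 'predict', 'col_num': 1, 'tagger': 'ner'},
               {'type': 'gold', 'col_num': 2, 'tagger': 'ner'}]

row = split_columns('Natural I-ORG O', cols_format)
print(row)  # {'text': 'Natural', 'predict': 'I-ORG', 'gold': 'O'}
```

Swapping the two `col_num` values would read the gold tag from the second column and the predicted tag from the third.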

I recommend using the shell command awk '{print $x}' filepath to obtain the x-th column, e.g. awk '{print $4}' filepath to obtain the 4th column.

Then use paste file1.txt file2.txt to join two files line by line.

For example,

awk '{print $4}' eng.train > ner_tags_file  # $num starts from 1
paste ner_pred_tags_file ner_tags_file
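
The same merge can also be done in Python. The sketch below is a minimal illustration, assuming the prediction file has one tag per line and is aligned line by line with the CoNLL03-style file (blank lines included). The function `merge_predictions` and the sample lines are hypothetical and not part of this repository.

```python
# Hypothetical helper for illustration only; not part of the nlu package.
def merge_predictions(conll_lines, pred_tags, token_col=0, gold_col=3):
    """Build 'token predicted gold' lines from aligned inputs.

    token_col and gold_col are zero-based here, unlike awk's $1, $2, ...
    """
    merged = []
    for line, pred in zip(conll_lines, pred_tags):
        fields = line.split()
        if not fields:  # a blank line marks a sentence boundary
            merged.append('')
            continue
        merged.append(f"{fields[token_col]} {pred.strip()} {fields[gold_col]}")
    return merged


# Made-up sample lines in a four-column layout (token, POS, chunk, NER tag)
conll_lines = ['EU NNP I-NP I-ORG', 'rejects VBZ I-VP O', '']
pred_tags = ['I-ORG', 'O', '']

merged = merge_predictions(conll_lines, pred_tags)
print('\n'.join(merged))
```

The resulting lines have the token in column 0, the predicted tag in column 1 and the gold tag in column 2, which matches the `cols_format` shown above.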

Types of Span Errors

| Types | Number of Mentions (Predicted and Gold) | Subtypes | Examples | Notes |
| --- | --- | --- | --- | --- |
| Missing Mention (False Negative) | 1 | TYPES → O | [] → None | |
| Extra Mention (False Positive) | 1 | O → TYPES | None → [...] | |
| Mention with Wrong Type (Type Errors) | ≥ 2 | TYPES → TYPES - self ({(p, g) \| p ∈ T, g ∈ T - p}) | [_PER...] → [_ORG...] # todo | But the spans are the same |
| Missing Tokens | 2 | L/R/LR Diminished | [_MISC1991 World Cup] → [_MISC1991] [_MISC World Cup] | also possible with type errors |
| Extra Tokens | 2 | L/R/LR Expanded | [...] → [......] | also possible with type errors |
| Missing + Extra Tokens | 2 | L/R Crossed | ..[...].. → .[..]... | also possible with type errors |
| Conflated Mention | ≥ 3 | | [][][] → [] | also possible with type errors |
| Divided Mention | ≥ 3 | | [_MISC1991 World Cup] → [_MISC1991] [_MISC World Cup]; [_PERBarack Hussein Obama] → [_PERBarack][_PERHussein][_PERObama] | also possible with type errors |
| Complicated Case | ≥ 3 | | [][][] → [][] | also possible with type errors |
| Ex - Mention with Wrong Segmentation (Same overall range but wrong segmentation) | ≥ 4 | | [...][......][.] → [......][.....] | also possible with type errors |
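
As a rough illustration of the first three rows of the table above, the sketch below labels mentions by comparing predicted and gold spans. It assumes spans are given as `(start, end, type)` tuples with an exclusive `end`. The functions `overlaps` and `classify_simple_errors` are hypothetical, and this is not how `NERErrorAnnotator` is implemented. Any gold mention that partially overlaps a prediction is left unclassified, since the remaining rows need more careful handling.

```python
# Hypothetical sketch for illustration only; not the NERErrorAnnotator logic.
def overlaps(a, b):
    """True if two (start, end) ranges with exclusive end share a token."""
    return a[0] < b[1] and b[0] < a[1]


def classify_simple_errors(pred_spans, gold_spans):
    """Label each mention as correct or as one of the three simplest errors."""
    pred = {(s, e): t for s, e, t in pred_spans}
    gold = {(s, e): t for s, e, t in gold_spans}
    labels = []
    for rng, gold_type in sorted(gold.items()):
        if rng in pred:
            # Same span: either correct or a type error
            same = pred[rng] == gold_type
            kind = 'Correct' if same else 'Mention with Wrong Type'
        elif not any(overlaps(rng, p) for p in pred):
            kind = 'Missing Mention'  # no prediction touches this gold span
        else:
            kind = 'Span error (not handled in this sketch)'
        labels.append((rng, kind))
    for rng in sorted(pred):
        if not any(overlaps(rng, g) for g in gold):
            labels.append((rng, 'Extra Mention'))  # prediction with no gold
    return labels


gold_spans = {(0, 2, 'PER'), (5, 6, 'ORG'), (8, 10, 'LOC')}
pred_spans = {(0, 2, 'ORG'), (8, 10, 'LOC'), (12, 13, 'MISC')}

labels = classify_simple_errors(pred_spans, gold_spans)
for rng, kind in labels:
    print(rng, kind)
# (0, 2) Mention with Wrong Type
# (5, 6) Missing Mention
# (8, 10) Correct
# (12, 13) Extra Mention
```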

