NERSO

Named entity disambiguation project using the DBpedia database

We developed a system called NERSO, short for Named Entity Recognition using Semantic Open data. It belongs to the family of named entity recognition and disambiguation systems that use linked data. NERSO applies a "centrality scoring mechanism" to the entity graph to disambiguate entities that share similar names.
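The exact centrality measure NERSO uses is not described here, so the following is only a minimal sketch of the general idea, assuming plain degree centrality as a stand-in. Each surface form has a list of candidate entities. The sketch links candidates that are connected in the knowledge graph and, for each surface form, keeps the candidate with the most links to other candidates. The function name `disambiguate` and its input shapes are hypothetical.

```python
from collections import defaultdict


def disambiguate(candidates, edges):
    """Pick one entity per surface form by degree centrality (illustrative only).

    candidates: dict mapping a surface form to its candidate entity names.
    edges: iterable of (entity_a, entity_b) links between entities,
           e.g. links between DBpedia resources (assumed input).
    Returns a dict mapping each surface form to the chosen entity.
    """
    # Every candidate entity for every surface form in the text.
    all_candidates = {e for cands in candidates.values() for e in cands}

    # Undirected entity graph restricted to the candidate entities;
    # links to entities that are not candidates are ignored.
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b and a in all_candidates and b in all_candidates:
            neighbours[a].add(b)
            neighbours[b].add(a)

    chosen = {}
    for surface_form, cands in candidates.items():
        if cands:
            # Highest degree wins; max() keeps the first candidate on ties.
            chosen[surface_form] = max(cands, key=lambda e: len(neighbours[e]))
    return chosen
```

For example, if a text mentions "Jaguar", "Land Rover" and "Tata", and the (hypothetical) edges link `Jaguar_Cars`, `Land_Rover` and `Tata_Motors` to each other, then `Jaguar_Cars` outscores the animal `Jaguar` because it is better connected to the other entities in the same text.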
Team

Test cases

Dataset1

This is the dataset used in the DBpedia Spotlight project. It consists of DBpedia resources manually annotated for the entities mentioned in 10 news articles. Here is the goldset (entities).

Dataset2

This is the dataset we compiled for 10 New York Times articles.

We manually annotated all articles and created a separate goldset for each article.

Evaluation of Zemanta, DBpedia Spotlight and NERSO on Dataset1 and Dataset2

Each zipped file contains 10 text files, one per article. Each text file lists the surface forms and the entities annotated for them by the corresponding system.

For example, in the line “Search engine Web_search_engine”, “Search engine” is the surface form found in the text and “Web_search_engine” is the annotated entity, i.e. the DBpedia/Wikipedia resource name (the last part of its URI).
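The separator between the two fields is not specified, so the sketch below assumes only that the entity is the last whitespace-separated token on the line. That holds for DBpedia/Wikipedia resource names, which use underscores instead of spaces, while the surface form may itself contain spaces. The helper names `parse_annotation_line` and `load_annotations` are hypothetical.

```python
def parse_annotation_line(line):
    """Split a line such as 'Search engine Web_search_engine' into
    (surface_form, entity).

    Assumption: the entity is the last whitespace-separated token,
    and everything before it is the surface form.
    """
    parts = line.strip().rsplit(None, 1)
    if len(parts) != 2:
        raise ValueError(f"cannot parse annotation line: {line!r}")
    surface_form, entity = parts
    return surface_form, entity


def load_annotations(path):
    """Read one annotation file into a list of (surface_form, entity) pairs,
    skipping blank lines."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                pairs.append(parse_annotation_line(line))
    return pairs
```

Splitting from the right with `rsplit(None, 1)` keeps multi-word surface forms intact and works whether the fields are separated by spaces or tabs.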


Dataset    Project            Zip file
Dataset1   NERSO              nersoDataset1
Dataset2   NERSO              nersoDataset2
Dataset1   Zemanta            zemantaDataset1
Dataset2   Zemanta            zemantaDataset2
Dataset1   DBpedia Spotlight  spotlightDataset1
Dataset2   DBpedia Spotlight  spotlightDataset2
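The page does not say which metric was used to compare the systems. A common choice is precision, recall and F1 against the goldset, and the sketch below computes them under the assumption that an annotation counts as correct only if the exact (surface form, entity) pair appears in the goldset for that article. The function name `evaluate` is hypothetical.

```python
def evaluate(system_pairs, gold_pairs):
    """Precision, recall and F1 of one system's annotations against a goldset.

    Both arguments are iterables of (surface_form, entity) pairs.
    Assumption: exact pair match; duplicates are counted once.
    """
    system = set(system_pairs)
    gold = set(gold_pairs)
    true_positives = len(system & gold)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

To compare systems, one would run this per article file (for example, a file from nersoDataset1 against the matching goldset) and then average, or pool all pairs across the 10 articles before computing the scores; the two choices can give different numbers.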