Building a Gold Standard for Reference Information Extraction in the German Social Sciences

Citation data for the social sciences, and for the German social sciences in particular, is currently scarce. The EXCITE project aims to close this gap by automatically extracting citation information from a large available corpus of research papers. One step towards this goal is to construct a gold standard that allows the individual steps of the extraction pipeline to be evaluated. In this talk I will describe our approach to building such a gold standard, including design decisions such as the selection of data sets and languages, the exclusion of certain types of papers, and the annotation formats used. The subsequent discussion will be used to gather feedback from the audience that may help to further improve the gold standard.

23.02.2017 - 10:15
B 017