EXCITE Workshop 2017: “Challenges in Extracting and Managing References”


When: 30.03.2017 - 31.03.2017
Where: GESIS-Leibniz-Institut für Sozialwissenschaften, Unter Sachsenhausen 6-8, 50667 Cologne, Germany

EXCITE is a collaborative project of GESIS – Leibniz Institute for the Social Sciences and the Institute for Web Science and Technologies (WeST) that started in September 2016. The project develops a tool chain implementing the following steps: extraction of text from the source documents, identification of individual references in the text, segmentation of those references, matching of reference strings against bibliographic databases, and export of the matched references in usable formats and services. Special attention will be paid to the overall optimization of the individual components of the citation extraction.
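
As a rough illustration only, the five steps listed above could be chained as in the following Python sketch. It is not the EXCITE implementation: every function name, the "Author (Year). Title." pattern, the "References" heading heuristic, and the in-memory database of records are assumptions made for this example. Real components would work on PDF input and use trained models instead of regular expressions.

```python
import json
import re
from typing import Dict, List, Optional


def extract_text(document: str) -> str:
    # Step 1 (placeholder): a real extractor would read a PDF;
    # this sketch assumes the document is already plain text.
    return document.strip()


def identify_references(text: str) -> List[str]:
    # Step 2 (toy heuristic): treat every non-empty line after the
    # first occurrence of "References" as one reference string.
    _, _, tail = text.partition("References")
    return [line.strip() for line in tail.splitlines() if line.strip()]


def segment_reference(ref: str) -> Dict[str, str]:
    # Step 3 (toy pattern): split "Author (Year). Title." into fields.
    # Strings that do not fit the pattern are kept unsegmented.
    m = re.match(r"(?P<author>.+?) \((?P<year>\d{4})\)\. (?P<title>.+?)\.?$", ref)
    return m.groupdict() if m else {"raw": ref}


def match_reference(segments: Dict[str, str],
                    database: List[Dict[str, str]]) -> Optional[str]:
    # Step 4 (toy matcher): case-insensitive exact match on the title.
    title = segments.get("title", "").lower()
    if not title:
        return None
    for record in database:
        if record["title"].lower() == title:
            return record["id"]
    return None


def export_matches(results: List[dict]) -> str:
    # Step 5: serialize the results; JSON is just one possible format.
    return json.dumps(results, indent=2)


def run_pipeline(document: str, database: List[Dict[str, str]]) -> List[dict]:
    # Chain steps 1 to 4 and collect one result entry per reference.
    text = extract_text(document)
    results = []
    for ref in identify_references(text):
        segments = segment_reference(ref)
        results.append({
            "reference": ref,
            "segments": segments,
            "match": match_reference(segments, database),
        })
    return results
```

Calling run_pipeline with a document string and a list of records (each a dictionary with hypothetical "id" and "title" keys) returns one entry per detected reference; export_matches turns that list into JSON. Keeping each step a separate function mirrors the idea that components can be evaluated and optimized individually.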

Our first community meeting is planned as a “noon to noon” event and aims to bring together experts in reference extraction, text mining, and machine learning to explore the possibilities in the project. We plan scientific presentations by invited speakers on the first day and hands-on sessions on the second day. For the second day, we will release a test corpus (PDF files of scientific papers and manually annotated data) for developers.


Day One (Thursday, 30.03.2017)

Time Title Speaker Slides Video
11:00 Arrival (Room: West II)      
Welcome and Introduction Steffen Staab, WeST; Philipp Mayr, GESIS
12:20 Information Extraction out of Born-Digital Scientific Articles Roman Kern, TU Graz Link Link
12:40 Advanced citation matching and large-scale full-text analysis Nees Jan van Eck, Leiden U Link Link
13:00 Lunch Break (Cafeteria)      
14:20 APIs for third parties to extract and deposit output executions of automated extraction pipelines (via videoconferencing) Min-Yen Kan, NU Singapore Link Link
14:40 Extracting references from scientific articles in CERMINE system Dominika Tkaczyk, U Warsaw Link Link
15:00 Coffee Break (Cafeteria)      
CitEc to CitEcCyr. A stab at distributed citation systems. (via videoconferencing) Link 1, Link 2
EXCITE project: Status report Behnam Ghavimi, GESIS; Martin Körner, WeST; Heinrich Hartmann
16:10 Processing of in-text References: Towards a Semantic Analysis Marc Bertin, U Toulouse    
16:30 Citations in Utopia Documents David Thorne, U Manchester Link Link
16:50 Coffee Break (Cafeteria)      
17:20 Research around the Tagging System BibSonomy Andreas Hotho, U Würzburg    
LOC-DB: A Linked Open Citation Database provided by Libraries. Motivation and Challenges. Kai Eckert, HDM Stuttgart; Anne Lauscher, HDM Stuttgart; Akansha Bhardwaj, DFKI
18:20 Record Linkage between CiteSeerX and Web of Science (via videoconferencing) Lee Giles, Penn State U    
18:50 Break      
20:00 Dinner at Gaffel am Dom (paid by participants)      
22:00 Socializing      
23:00 End      

Day Two (Friday, 31.03.2017)

Time Title    
9:00 Second Day Kickoff (Room: West II)
9:15 Extraction Result Discussion Group; Gold Standard Discussion Group; Collaboration Discussion Group
11:15 Coffee Break (Cafeteria)
11:30 Extraction Result Discussion Group; Gold Standard Discussion Group; Collaboration Discussion Group
12:30 Closing Talks (Room: West II)    
13:00 End    


Gold Standard

One part of the discussions during the second workshop day will revolve around a gold standard that we are currently building. The current version can be found on GitHub. Note that it is a work in progress. The corresponding PDFs can be found (for now) here.

Arrival and Accommodation

GESIS Cologne is located near the Cologne central train station. Further information on traveling to GESIS by air, rail, intercity bus, or car can be found on the GESIS website.
There are also special GESIS rates available for accommodations. More information can be found on this list.