Harvesting, Integrating, and Cleaning Metadata in a Bibliographic Information System

Ralf Schenkel

Bibliographic information systems need to rely on metadata provided by various sources, in various forms, and of varying quality. The talk gives some insight into how such systems are maintained and improved. It shows how metadata can be harvested automatically from publisher websites and how the harvesting process can be steered. It also discusses some open sources of bibliographic metadata and how they can be used to enrich existing bibliographic data. Finally, the talk presents some initial results on citation extraction from full-text documents based on ScienceParse.

CV: Ralf Schenkel has been full professor for Databases and Information Systems at Trier University since August 2016. Before that, he had been interim professor at the University of Passau since 2013. He received his PhD from Saarland University in 2001 for a thesis on distributed transaction management. From 2003, he worked at the Max Planck Institute for Informatics in Saarbrücken, and from 2007 to 2012 he was a research group leader in the cluster of excellence MMCI. In 2010, Ralf Schenkel was granted the venia legendi in Computer Science at Saarland University.
Ralf Schenkel works at the intersection of databases, information retrieval, and semantic information systems. His research interests include search on semantic and semi-structured data and bibliographic information systems. He has been co-chair of the special interest group on Information Retrieval of the German Computer Society (GI), co-chaired the INEX initiative and several international workshops, and edited several books. Ralf Schenkel has been editor-in-chief of the "Datenbank-Spektrum" since 2010, is a member of the editorial board of "Information Systems", and has served on the program committees of numerous international conferences such as SIGIR and VLDB.

25.01.2018 - 10:15
B 016