Multilingual Retrieval Using Thesauri Alignments

The project targets the Semantic Web representation of relations between keywords in the German Schlagwortnormdatei (SWD) and classes in the Dewey Decimal Classification (DDC). German and international libraries use SWD and DDC to annotate their collections and make them accessible. The relations between SWD and DDC will allow searching DDC-annotated collections with keywords from SWD and vice versa.

The relations between SWD and DDC are of high quality because they were created manually in the CrissCross project. Often, however, the meanings of SWD keywords and DDC classes do not match exactly. For this reason, each relation carries a deterministic level that indicates the strength of the relation. Information retrieval systems can use the relations and their deterministic levels to adjust precision and recall: following only strong relations favors precision, while also following weaker relations favors recall.
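To illustrate how a retrieval system might use such levels, here is a minimal sketch in Python. It assumes that levels are integers on a scale from 1 (weak) to 4 (strong); that scale, the `Mapping` type, the `expand_keyword` function and all sample data are invented for illustration and are not taken from the CrissCross data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    keyword: str    # SWD keyword (placeholder label)
    ddc_class: str  # DDC class (placeholder notation)
    level: int      # assumed scale: 1 (weak) .. 4 (strong)


# Invented sample data, not actual CrissCross relations.
MAPPINGS = [
    Mapping("keyword-a", "class-1", 4),
    Mapping("keyword-a", "class-2", 2),
    Mapping("keyword-a", "class-3", 1),
    Mapping("keyword-b", "class-2", 3),
]


def expand_keyword(keyword, min_level, mappings=MAPPINGS):
    """Return the DDC classes related to `keyword` whose level is at
    least `min_level`, strongest relation first."""
    hits = [m for m in mappings
            if m.keyword == keyword and m.level >= min_level]
    hits.sort(key=lambda m: m.level, reverse=True)
    return [m.ddc_class for m in hits]


if __name__ == "__main__":
    # A high threshold keeps only strong relations (favors precision).
    print(expand_keyword("keyword-a", min_level=4))
    # A low threshold also follows weak relations (favors recall).
    print(expand_keyword("keyword-a", min_level=1))
```

Raising `min_level` narrows the set of classes searched, and lowering it widens the set; a real system would choose the threshold per query or per collection.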

These relations will be published as part of the linked data activities of the German National Library. The publication will be based on the SKOS specification. However, the mapping relations currently defined in SKOS are not suitable for representing the different deterministic levels.

The project will start by analyzing the existing deterministic levels and their implications for information retrieval. It will then develop a recommendation on how the SKOS specification may be extended to represent the deterministic levels of the CrissCross relations.
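The difficulty can be made concrete with a small sketch. A SKOS mapping property such as skos:closeMatch links two concepts in a single triple, which leaves no place to attach a level. One conceivable workaround is to describe each mapping as a resource of its own that points to both concepts and carries the level. The Python snippet below emits such a description in Turtle. The namespace `EX`, the terms `Mapping`, `source`, `target` and `level`, and the example URIs are all hypothetical; this is not the recommendation developed by the project.

```python
# Hypothetical namespace for illustration only; it is not part of SKOS.
EX = "http://example.org/mapping-sketch#"


def mapping_as_turtle(mapping_id, swd_uri, ddc_uri, level):
    """Serialize one mapping as a Turtle resource that carries a level.

    The mapping becomes a node of its own, so further statements
    (here: the level) can be attached to it.
    """
    if level < 1:
        raise ValueError("level must be a positive integer")
    return (
        f"<{EX}{mapping_id}> a <{EX}Mapping> ;\n"
        f"    <{EX}source> <{swd_uri}> ;\n"
        f"    <{EX}target> <{ddc_uri}> ;\n"
        f"    <{EX}level> {level} .\n"
    )


if __name__ == "__main__":
    # Placeholder URIs, not real SWD or DDC identifiers.
    print(mapping_as_turtle(
        "m1",
        "http://example.org/swd/keyword-a",
        "http://example.org/ddc/class-1",
        4,
    ))
```

The trade-off of this shape is that a plain SKOS consumer no longer sees a direct link between the two concepts, which is one reason an agreed extension of the specification is preferable to ad hoc vocabularies.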


Duration

  • March 2010 - June 2010

Source of funding

  • German National Library


Prof. Dr. Steffen Staab

Short CV

I studied computer science and computational linguistics at the Universität Erlangen-Nürnberg and at the University of Pennsylvania. I worked in the former computational linguistics research group at the Universität Freiburg and completed my Ph.D. in computer science in the faculty for technology in 1998. Afterwards I joined Universität Stuttgart, Institute IAT & Fraunhofer IAO, before I moved on to the Universität Karlsruhe (now: KIT), where I progressed from project lead to lecturer and senior lecturer and completed my habilitation in 2002. In 2004 I became professor for databases and information systems at Universität Koblenz-Landau, where I founded the Institute for Web Science and Technologies (WeST) in 2009. In parallel, I have held a Chair for Web and Computer Science at the University of Southampton since March 2015.

Research Interests

Data represent the world on our computers. While the world is very intriguing, data may be quite boring if one does not know what they mean. I am interested in making data more meaningful in order to find interesting insights into the world outside.

How does meaning arise?

  • One can model data and information. Conceptual models and ontologies are the foundations for knowledge networks that enable the computer to treat data in a meaningful way.
  • Text and data mining as well as information extraction find meaningful patterns in data (e.g. using ontology learning or text clustering) as well as connections between data and their use in context (e.g. using smartphones). Hence, knowledge networks may be found in data.
  • Humans communicate information. In order to understand what data and information mean, one has to understand social interactions. In the context of social networks, knowledge networks become meaningful for human consumption.
  • Ultimately, meaning does not exist in a void. Data and information must be communicated to people who can make use of the insights they contain. Interaction between humans and computers must happen in a way that matches the meaning of data and information.

The World Wide Web is the largest information construct made by mankind to convey meaningful data. Web Science is the discipline that considers how networks of people and knowledge arise on the Web, how humans deal with them and what consequences this has for all of us. The Web is a meaning machine that I want to understand through my research.


