As a free and collaborative knowledge base, created and maintained by volunteers, Wikidata may represent knowledge from different points of view. These points of view can be conflicting, especially when the data comes from different information sources.
Finding suitable datasets is a major task in many scientific fields. The social sciences in particular rely heavily on existing studies, since creating new data with hundreds of participants from different countries over longer periods of time is resource-intensive. The reuse and recombination of existing studies is therefore crucial.
Wikimedia Deutschland emerged as an association from the Wikipedia movement and aims, among other things, to advocate for free knowledge. However, free education, an integral part of free knowledge, receives too little attention as an outcome in the struggling Wikimedia projects Wikibooks and Wikiversity.
Semantic data fuels many different applications but still lacks proper integration into programming languages. Untyped access is error-prone, while mapping approaches cannot fully capture the conceptualization of semantic data.
Recent developments in the SemGIS project. Part one gives an update on recent developments in the SemGIS project concerning automated approaches for mapping geospatial data into the Semantic Web.
The efficiency of SPARQL query evaluation against Linked Open Data may benefit from schema-based indexing. However, many data items come with incomplete schema information or lack schema descriptions entirely. We outline an approach to indexing linked data graphs based on schemata induced through Formal Concept Analysis.
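As a rough illustration of the idea, the following Python sketch induces a schema from the property sets that entities actually use and keeps the resulting formal concepts as an index. The function names `formal_concepts` and `lookup`, the toy entities, and the property names are hypothetical and are not taken from the approach outlined above; this is a minimal sketch under those assumptions, not the indexing scheme itself.

```python
def formal_concepts(context):
    """Return a dict mapping each concept intent to its extent.

    `context` maps an entity (object) to the set of properties
    (attributes) it uses. All names here are illustrative assumptions.
    """
    rows = {obj: frozenset(attrs) for obj, attrs in context.items()}
    all_attrs = frozenset().union(*rows.values())

    # The intents of a formal context are exactly the intersections of
    # object property sets, together with the full attribute set.
    intents = {all_attrs}
    for attrs in rows.values():
        intents |= {attrs & intent for intent in intents}

    # The extent of an intent is every entity that has all its properties.
    return {
        intent: frozenset(o for o, attrs in rows.items() if intent <= attrs)
        for intent in intents
    }


def lookup(index, required):
    """Entities that have every property in `required`, read from the index."""
    required = frozenset(required)
    candidates = [ext for intent, ext in index.items() if required <= intent]
    # The largest matching extent belongs to the closure of `required`.
    return max(candidates, key=len, default=frozenset())


# Toy data: entities without any explicit schema description.
CONTEXT = {
    "alice": {"name", "email"},
    "bob": {"name", "email", "affiliation"},
    "paper1": {"title", "author"},
}
INDEX = formal_concepts(CONTEXT)
```

A query that asks for all entities with an `email` property can then be answered with `lookup(INDEX, {"email"})` instead of scanning every entity.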
The Resource Description Framework (RDF) is a triple-based representation of directed graphs with labelled edges. With the emergence of RDF graphs, special databases called RDF stores were developed. To query graphs stored in these RDF stores, the query language SPARQL (SPARQL Protocol and RDF Query Language) is used.
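To make the triple model concrete, the sketch below stores a tiny graph as (subject, predicate, object) tuples in plain Python and matches a single triple pattern against it, where `None` plays the role of a variable. The `match` helper and the `ex:` identifiers are hypothetical; a real RDF store and SPARQL engine offer far more (joins, filters, indexes), so this only approximates one basic triple pattern.

```python
from typing import Iterable, Iterator, Optional, Tuple

# One labelled, directed edge: subject --predicate--> object.
Triple = Tuple[str, str, str]

# Illustrative graph; the ex: names are made up for this example.
GRAPH = [
    ("ex:Koral", "rdf:type", "ex:RDFStore"),
    ("ex:Koral", "ex:queriedWith", "ex:SPARQL"),
    ("ex:SPARQL", "rdf:type", "ex:QueryLanguage"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples that agree with every position that is not None."""
    for subj, pred, obj in triples:
        if (
            (s is None or subj == s)
            and (p is None or pred == p)
            and (o is None or obj == o)
        ):
            yield (subj, pred, obj)


# Roughly the pattern `?s rdf:type ?o`.
typed = list(match(GRAPH, p="rdf:type"))
```

Here `typed` holds the two `rdf:type` edges, which is roughly what the pattern `?s rdf:type ?o` would bind in a SPARQL query over the same graph.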
To help in overcoming a shortage of citation information for the German social sciences, we contribute an approach for extracting author names from reference sections. Instead of relying on small amounts of manually labeled data, we use a distantly supervised approach in combination with the widely used probabilistic framework of conditional random fields.
Our objective is to identify and assess gender bias related to professions in the results of collaborative community work. We present the results of studies in which we characterize the gender inequality present along three dimensions: redirections, images, and people mentioned in the articles.
During the development of a distributed database for big data, small datasets are used to test the implementation and to evaluate alternative solutions. These small datasets have the advantage that evaluation results are produced more quickly and the implementation progresses faster. When the database is later tested with large datasets, several design decisions based on the experiments with small datasets can lead to poor performance or an unstable database. This talk presents some of these flawed design decisions made during the implementation of the distributed RDF store Koral and discusses how they could be improved.