Manual creation of ontologies is a time-consuming, costly and complicated process. Consequently, over the past two decades a significant number of methods have been proposed for the (semi-)automatic generation of ontologies from existing data, especially textual data. However, ontologies generated by these methods usually do not meet the needs of many reasoning-based applications in different domains.
When processing natural language on a computer, vector representations of words have many fields of application. They enable a computer to score the similarity between two words and to determine missing words in an analogy. In 2013, Mikolov et al. [2013a] published an algorithm called word2vec for creating such vector representations. It produced vector representations that far exceeded the performance of earlier methods. This thesis explains the algorithm's popular skip-gram model from the perspective of neural networks.
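The two applications mentioned above, similarity scoring and analogy completion, can be sketched on toy vectors. The vectors below are illustrative values, not learned word2vec embeddings, which typically have 100-300 dimensions:

```python
import math

# Toy 3-dimensional word vectors (illustrative values only; real
# word2vec vectors are learned from large text corpora).
vectors = {
    "king":  [0.8, 0.7, 0.1],
    "queen": [0.8, 0.1, 0.7],
    "man":   [0.3, 0.9, 0.2],
    "woman": [0.3, 0.3, 0.8],
}

def cosine(u, v):
    """Cosine similarity scores how alike two word vectors are."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via vector arithmetic b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = set(vectors) - {a, b, c}
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "woman", "king"))  # → queen
```

With these hand-picked vectors, the offset from "man" to "woman" added to "king" lands exactly on "queen", mirroring the well-known analogy result from the word2vec papers.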
In ongoing efforts to investigate the effectiveness of the ScaSpa type system and of SPARQL query validation approaches, we take a look at query logs. In particular, we aim to apply the ScaSpa approach to queries that were executed against the public DBpedia and Wikidata SPARQL endpoints. This talk presents some of the challenges related to, and shortcomings of, the data sets available for this purpose. In addition to intermediate results, we investigate routes to improve the applicability of our approaches, ranging from the introduction of disjointness to leveraging mappings between DBpedia and Wikidata.
The most common interfaces for human-computer interaction are graphical interfaces. Usability of these interfaces is therefore of importance for both research and industry. However, interfaces are becoming ever more dynamic in appearance and functionality, which makes usability analysis a complex task. We propose a visual approach that identifies changes in an interface as stimuli in order to cluster the experiences of multiple users. We have trained a classifier that takes visual features from the video recording of an interaction with an interface and detects visual changes. We use these visual changes to split and merge the experiences of multiple users into representations that comprise coherent visual states of an interface.
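The splitting step can be illustrated with a minimal sketch. The function below is not the trained classifier from the work; it stands in for it with a simple frame-difference heuristic, and the threshold value is an assumption chosen for the toy data:

```python
def split_into_states(frames, threshold=0.2):
    """Split a frame sequence into segments of coherent visual
    appearance: start a new segment whenever the mean absolute pixel
    difference to the previous frame exceeds the threshold.

    `frames` is a list of flat grayscale pixel lists in [0, 1]; the
    threshold is an illustrative assumption, not a tuned parameter.
    """
    segments = [[0]]
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))
        diff /= len(frames[i])
        if diff > threshold:
            segments.append([])  # visual change detected: new state begins
        segments[-1].append(i)
    return segments

# Two stable screens with an abrupt change between frames 2 and 3:
frames = [[0.1] * 4, [0.1] * 4, [0.12] * 4, [0.9] * 4, [0.9] * 4]
print(split_into_states(frames))  # → [[0, 1, 2], [3, 4]]
```

Segments produced this way for different users could then be aligned and merged by matching their visual states, as the abstract describes.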
In this talk, I will present our submission to RumourEval 2019, named CLEARumor, and I aim to get feedback from you for improving it. RumourEval consists of two tasks: detecting the stance towards a rumour and identifying the veracity of a rumour. The goal of stance detection is to label the type of interaction between a rumour tweet and a reply tweet as support, query, deny or comment. The other task is to predict the veracity of a given rumour as true, false or unverified. For stance detection, CLEARumor uses a CNN-based neural network with pre-trained ELMo embeddings and auxiliary features extracted from the metadata of the posts. For veracity detection, it leverages the probabilistic estimations from the first task together with further auxiliary features.
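The link between the two tasks can be sketched as follows. This is a deliberately simplified illustration of using stance probabilities as input to veracity prediction; it is not CLEARumor's actual veracity model, and the decision margin is an assumption:

```python
def veracity_from_stances(stance_probs):
    """Aggregate per-reply stance probabilities into a rough veracity
    signal: many supporting replies suggest 'true', many denying ones
    suggest 'false', otherwise 'unverified'. A simplified illustration
    only, not CLEARumor's actual model.

    Each element of `stance_probs` is a dict with probabilities for
    the four RumourEval stance labels: support, deny, query, comment.
    """
    n = len(stance_probs)
    support = sum(p["support"] for p in stance_probs) / n
    deny = sum(p["deny"] for p in stance_probs) / n
    margin = 0.2  # illustrative decision margin (assumption)
    if support - deny > margin:
        return "true"
    if deny - support > margin:
        return "false"
    return "unverified"

replies = [
    {"support": 0.7, "deny": 0.1, "query": 0.1, "comment": 0.1},
    {"support": 0.6, "deny": 0.2, "query": 0.1, "comment": 0.1},
]
print(veracity_from_stances(replies))  # → true
```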
RDF graphs are sets of triples, each consisting of a subject, a property and an object resource. Nowadays, an RDF graph can consist of trillions of triples. In order to cope with such huge graphs, distributed RDF stores combine the computational power of several compute nodes. To query the RDF graphs stored in these distributed RDF stores efficiently, statistical information about the occurrences of resources on the different compute nodes is required. A naive way to store this statistical information is a table in a single random-access file. This naive implementation leads to files of several tens of gigabytes in size and thus to slow read and write operations.
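The naive random-access table can be sketched with fixed-width binary records. The record layout below (resource id, compute-node id, occurrence count) is an assumption for illustration; the point is that every record has the same size, so any entry is reachable with a single seek:

```python
import os
import struct
import tempfile

# One fixed-width record per (resource, compute node) pair. The layout
# is assumed for illustration: 8-byte resource id, 4-byte node id,
# 8-byte occurrence count = 20 bytes per record.
RECORD = struct.Struct("<QIQ")

def write_stats(path, rows):
    """Append all (resource_id, node_id, count) rows as fixed-width records."""
    with open(path, "wb") as f:
        for resource_id, node_id, count in rows:
            f.write(RECORD.pack(resource_id, node_id, count))

def read_record(path, index):
    """Random access: seek straight to the index-th record."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

path = os.path.join(tempfile.mkdtemp(), "stats.bin")
write_stats(path, [(1, 0, 42), (1, 1, 7), (2, 0, 3)])
print(read_record(path, 1))  # → (1, 1, 7)
```

With billions of distinct resources spread over many compute nodes, even 20 bytes per entry adds up to the tens-of-gigabyte files mentioned above, which motivates looking for more compact representations.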
While RDF in particular and graph-based data models in general have gained traction in the last few years, programming with them is still error-prone. In the case of RDF, part of the problem has been the lack of integrity constraints that can make guarantees about the data. The recently introduced W3C standard SHACL can now provide such integrity constraints in the form of so-called SHACL shapes. The talk presents an approach for integrating them into a programming language in order to avoid run-time errors. In particular, we use SHACL shapes as types for programming language constructs and for queries that access the RDF data.
In this seminar, I shall introduce a (relatively) novel way to characterize the macroscopic states of a dynamical model, the XY spin model, on networks. The method is based on the spectral decomposition of time series using topological information about the underlying networks.
Governmental geospatial data are usually considered a de jure gold standard by official authorities and by many companies working with geospatial data. Yet, official geospatial data are far from perfect. Such datasets are updated at long intervals (e.g. yearly), only selectively according to the judgements and regulations of the local governmental organizations, and in a migratory process at the state and federal levels. However, volunteered geographic information projects such as OpenStreetMap can provide an alternative, both in freshness of data and possibly in a broader coverage of semantic attributes in the respective areas.
First-order theorem proving with large knowledge bases makes it necessary to select those parts of the knowledge base that are necessary to prove the theorem at hand. We propose to extend syntactic axiom selection procedures like SInE to make use of the semantics of symbol names. To this end, not only exact occurrences of symbol names but also similar names are taken into account. We propose to use a similarity measure based on word embeddings such as ConceptNet Numberbatch. An evaluation of this similarity-based SInE is given using problem sets from TPTP's CSR problem class and from Adimen-SUMO. The evaluation is carried out with two very different systems, namely the HYPER tableau prover and the saturation-based system E.
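The selection idea can be sketched in a few lines. The embeddings and the similarity threshold below are illustrative assumptions standing in for ConceptNet Numberbatch vectors and a tuned parameter; the point is that an axiom about "vehicle" becomes selectable for a goal about "car" even though the symbol names differ syntactically:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

# Toy 2-dimensional embeddings standing in for ConceptNet Numberbatch
# vectors (illustrative values only).
embedding = {
    "car":     [0.9, 0.1],
    "vehicle": [0.8, 0.2],
    "banana":  [0.1, 0.9],
}

def select_axioms(goal_symbols, axioms, threshold=0.95):
    """Similarity-based selection: keep an axiom if any of its symbols
    is embedding-similar to a goal symbol, generalizing the purely
    syntactic symbol matching of SInE. Threshold is an assumption.
    Each axiom is a (symbols, formula_string) pair.
    """
    selected = []
    for symbols, axiom in axioms:
        if any(cosine(embedding[g], embedding[s]) >= threshold
               for g in goal_symbols for s in symbols):
            selected.append(axiom)
    return selected

axioms = [
    (["vehicle"], "vehicle(X) -> has_wheels(X)"),
    (["banana"], "banana(X) -> fruit(X)"),
]
print(select_axioms(["car"], axioms))  # → ['vehicle(X) -> has_wheels(X)']
```

A syntactic procedure would select neither axiom for the goal symbol "car"; the embedding-based variant selects the semantically related one while still discarding the unrelated one.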