In recent years, scalable cloud-based RDF stores have been developed in which graph data is distributed over compute and storage nodes to scale query processing and memory capacity.
With the popularity of RDF as an independent data model came the need for specifying constraints on RDF graphs, and for mechanisms to detect violations of such constraints. One of the most promising schema languages for RDF is SHACL, a recent W3C recommendation. Unfortunately, the specification of SHACL leaves open the problem of validation against recursive constraints. This omission is important because SHACL by design favors constraints that reference other ones, which in practice may easily yield reference cycles. In this paper, we propose a concise formal semantics for the so-called “core constraint components” of SHACL. This semantics handles arbitrary recursion, while being compliant with the current standard.
Deep learning is a very flexible framework: it works well for various tasks without expert knowledge, but it has difficulty leveraging explicit knowledge. It generally requires a massive dataset and is therefore applicable to only a limited range of tasks. I introduce deep generative models, which are Bayesian networks implemented with deep neural networks. By expressing our knowledge as the network structure, a deep generative model can work with a small dataset and provide interpretable results.
We present a family of stochastic local search algorithms for finding a single stable extension in an abstract argumentation framework. These incomplete algorithms start from a random labelling of the arguments and iteratively select a random mislabelled argument and flip its label. We describe a general version of this approach and an optimisation that allows for greedy selection of arguments. We conduct an empirical evaluation with benchmark graphs from the previous two ICCMA competitions and further random instances. Our results show that our approach is competitive in general and significantly outperforms previous direct approaches and reduction-based approaches for the Barabasi-Albert graph model.
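The general version of the search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the abstract: the function names, the boolean in/out labelling, and the `max_flips` and `max_restarts` parameters are hypothetical, and the greedy optimisation is not shown. An argument counts as mislabelled if it is labelled in while one of its attackers is in, or labelled out while none of its attackers is in.

```python
import random


def attackers_of(arguments, attacks):
    """Map each argument to the set of arguments attacking it."""
    attackers = {a: set() for a in arguments}
    for source, target in attacks:
        attackers[target].add(source)
    return attackers


def is_mislabelled(arg, labelling, attackers):
    """True = labelled 'in', False = labelled 'out'.

    'in' is wrong if some attacker is 'in';
    'out' is wrong if no attacker is 'in'.
    """
    attacked_by_in = any(labelling[b] for b in attackers[arg])
    return labelling[arg] == attacked_by_in


def find_stable_extension(arguments, attacks, max_flips=1000,
                          max_restarts=20, rng=None):
    """Incomplete search: returns a stable extension or None if none was found."""
    rng = rng or random.Random()
    arguments = sorted(arguments)
    attackers = attackers_of(arguments, attacks)
    for _ in range(max_restarts):
        # Start from a uniformly random in/out labelling.
        labelling = {a: rng.random() < 0.5 for a in arguments}
        for _ in range(max_flips):
            wrong = [a for a in arguments
                     if is_mislabelled(a, labelling, attackers)]
            if not wrong:
                return {a for a in arguments if labelling[a]}
            # Flip the label of one randomly chosen mislabelled argument.
            chosen = rng.choice(wrong)
            labelling[chosen] = not labelling[chosen]
    return None


def is_stable(extension, arguments, attacks):
    """Check conflict-freeness and that every outside argument is attacked."""
    conflict_free = not any(s in extension and t in extension
                            for s, t in attacks)
    attacked = {t for s, t in attacks if s in extension}
    return conflict_free and all(a in extension or a in attacked
                                 for a in arguments)
```

For example, calling `find_stable_extension(["a", "b", "c"], [("a", "b"), ("b", "a"), ("b", "c")])` should return one of the two stable extensions of that small framework. Because the search is incomplete, a result of `None` does not prove that no stable extension exists.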
In both conversation and writing, grammar allows us to leave out parts of a sentence that are overtly expressed in the preceding linguistic context. For instance, in the sentence /I wanted to play football but I couldn’t/, the phrase /play football/ can be dropped after /couldn’t/ because it can be understood from the context. In linguistics, this phenomenon is known as verb phrase (VP) ellipsis. Detecting and resolving ellipsis contributes to understanding text properly, which could help improve language understanding systems. Since this phenomenon is optional, the challenge was to find a way to systematically distinguish auxiliaries and modals that indicate VP ellipsis from those that do not.
The internet has often been hailed as an opportunity for democracy. Political parties especially have high hopes for the new possibilities for communication and participation. But the digital divide still mediates the access to, use of, and benefits derived from use of the internet. These online inequalities are a problem when combined with democratic processes: If equal opportunity to participate is the goal, how can an unequal tool be beneficial?
Most information extraction (IE) systems are designed to construct Knowledge Bases (KBs) consisting of high-precision facts. The focus is primarily on confidence in the correctness of the data. As KBs are increasingly considered to be representations of the real world, it is imperative that the data in KBs is not only correct but also complete. Yet most widely used KBs today do not store completeness information for many common predicates, such as names of children or winners of an award. This incompleteness of KB facts results from an oversight in most information extraction processes, which emphasize optimizing precision but largely ignore recall. A recall-oriented IE system is highly desirable for many use cases.
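The difference between the two measures can be made concrete with a small sketch. The function name `precision_recall` and the example triples are hypothetical and invented purely for illustration; they do not come from any real KB.

```python
def precision_recall(extracted, ground_truth):
    """Compute precision and recall of extracted facts against a reference set."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    correct = extracted & ground_truth
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Invented example: a person with four children, of whom the KB lists two.
ground_truth = {
    ("person_1", "hasChild", "child_a"),
    ("person_1", "hasChild", "child_b"),
    ("person_1", "hasChild", "child_c"),
    ("person_1", "hasChild", "child_d"),
}
extracted = {
    ("person_1", "hasChild", "child_a"),
    ("person_1", "hasChild", "child_b"),
}

precision, recall = precision_recall(extracted, ground_truth)
print(precision, recall)  # 1.0 0.5
```

Every extracted fact here is correct, so precision is perfect, yet half of the true facts are missing. A KB built this way contains no errors but still gives an incomplete answer to the question of who the person's children are.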
This talk will present the paper "Deep Contextualized Word Representations" by Peters et al., which won the best paper award at NAACL-HLT 2018. Its approach, ELMo (Embeddings from Language Models), constitutes a fundamentally new way of representing words by considering linguistic context and achieved state-of-the-art results on six important NLP problems.
This talk will introduce the required background material (recurrent neural networks, word embeddings, and language models) and summarize the key contributions of the paper.
What is the challenge of interdisciplinary research on democracy? Based on the implications of text data, I present the study of democracy in the digital era as the dilemma between large data and deep validity.
This focus reflects the fact that the philosophies behind the choice of methods differ across disciplines, which is the main difficulty but also the desired advantage. Computer science prefers data at a very large scale, while social and political science move flexibly between medium and small scales, with the smallest scale being the time-intensive immersion in language, culture, and society. Contrary to popular assumption, I argue
Extracting and parsing cited references from publications in PDF format is important to ensure the acknowledgement of the sources of information. However, the way these sources are mentioned differs from one community to another and from one publication to another. This citation diversity lies mainly in the indexation style (e.g., one or several reference sections), the components present (e.g., editor, source, URL) and the type of references (e.g., grey literature, academic literature). In order to accurately extract and segment different kinds of references, EXCITE proposes a generic approach that combines Random Forest and Conditional Random Fields (CRF) in a coherent mechanism.