In the past decade, quantitative text analysis has established itself as a frequently used method in political science for studying political actors and processes. Typical applications include the identification of concepts and topics, text classification, and the measurement of latent policy positions. While quantitative text analysis reduces the cost of analysing texts, and while various user-friendly open-source libraries have been developed recently, several challenges remain. In this presentation, I will describe popular text-as-data methods from the perspective of implementation in R, one of the most commonly used statistical programming languages in the social sciences.
Research at WeST now includes a wider range of politics-related topics than before, for several reasons. First, politics drives behaviors on the Web that trouble society as a whole. Second, politics influences content in news and social media. Third, democracy online raises the question of what should and what can be regulated. "Politics" in this sense is broader than the level of politicians and includes the audience level. From a data perspective, politics provides a mapping of expected behaviors by groups and positions within text. My talk gives a brief overview, with snippets from ongoing work, of the following application areas: misinformation, partisanship, discourses, platforms, and retro-theories.
The impact of intercultural exposure can be observed in multilingual societies, where bilingual speakers often switch between or mix the grammar and lexicon of more than one language. This phenomenon is an inevitable outcome of language contact, and the switch can occur between sentences (inter-sentential), within a sentence (intra-sentential), or even at the word level.
Recent research on code-switching (CS) has turned towards neural networks and word embeddings for these low-resource languages. The standard approach for a given CS corpus combines information from existing distributional representations pre-trained on the source languages with neural network models, with varying degrees of success.
In 2013, property paths were introduced with the release of SPARQL 1.1. Property paths allow complex queries to be described in a more concise and comprehensive way. The W3C introduced a formal specification of the semantics of property paths, to which implementations should adhere. Most commonly used RDF stores claim to support property paths. To give insight into how well current implementations of property paths work, we have developed BeSEPPI, a benchmark for the semantics-based evaluation of property path implementations. BeSEPPI measures the execution times of queries containing property paths and checks whether RDF stores follow the W3C's semantics by testing the correctness and completeness of query result sets.
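The idea of testing result sets for correctness and completeness can be illustrated with a minimal Python sketch. It assumes that correctness is the share of returned results that are expected and that completeness is the share of expected results that are returned; these definitions, the function name `evaluate_result_set`, and the example bindings are illustrative assumptions, not taken from BeSEPPI itself.

```python
def evaluate_result_set(expected, returned):
    """Compare a store's result set against a reference result set.

    Assumed definitions (not taken from the benchmark):
      correctness  = |expected & returned| / |returned|
      completeness = |expected & returned| / |expected|
    """
    expected, returned = set(expected), set(returned)
    matching = expected & returned
    correctness = len(matching) / len(returned) if returned else 1.0
    completeness = len(matching) / len(expected) if expected else 1.0
    return correctness, completeness


# Hypothetical reference bindings for a property path query such as
#   SELECT ?x WHERE { :a :knows+ ?x }
expected = {":b", ":c", ":d"}
# Hypothetical output of a store that misses :d and wrongly returns :e
returned = {":b", ":c", ":e"}

correctness, completeness = evaluate_result_set(expected, returned)
print(f"correctness={correctness:.2f}, completeness={completeness:.2f}")
```

A store that follows the semantics exactly would score 1.0 on both measures under these assumed definitions.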
The objective of this thesis is the system identification of ships for multistep prediction, i.e., simulation, with deep learning methods. First-principles modeling of ships is a challenging and expensive task, as it requires complex numerical computations, model tests, sea trials, and expert knowledge in marine engineering. The collection of sensor data during the routine operation of ships enables system identification methods for deriving mathematical models of the vessel dynamics.
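Multistep prediction in the sense of simulation can be sketched as a rollout in which each predicted state is fed back into a one-step model. The sketch below is a minimal illustration under that assumption; `simulate` and `toy_step_model` are hypothetical names, and the toy model is a hand-written stand-in, not a learned model of vessel dynamics.

```python
def simulate(step_model, initial_state, controls):
    """Roll a one-step model forward, feeding each prediction back in."""
    states = [initial_state]
    for control in controls:
        states.append(step_model(states[-1], control))
    return states


def toy_step_model(state, control):
    # Hypothetical stand-in for a learned model: the state decays by half
    # at every step and is pushed up by the control input.
    return 0.5 * state + control


trajectory = simulate(toy_step_model, initial_state=0.0, controls=[1.0, 1.0, 1.0])
print(trajectory)
```

Because each prediction becomes the next input, one-step errors can accumulate over the horizon, which is one reason multistep prediction is harder than one-step prediction.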
Given the increasing threat that fake news poses to the trustworthiness of online information, recognizing the truthfulness of news can help to minimize its potential harm to society. However, finding truthful information in social media, which covers a large number of subjects, is a very complex task. Fake news detection is more than a simple keyword-spotting task: the truth of a statement cannot be assessed from the context of the news alone. It also requires automatically understanding human behavior and sentiment in social media, which are usually vague and subject-dependent and therefore need to be interpreted and represented in different ways.
When processing natural language on a computer, vector representations of words have many fields of application. They enable a computer to score the similarity between two words and to determine the missing word in an analogy. In 2013, Mikolov et al. [2013a] published an algorithm called word2vec for creating such vector representations. It produced vector representations that far exceeded the performance of earlier methods. This thesis explains the algorithm’s popular skip-gram model from the perspective of neural networks.
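How word vectors support similarity scoring and analogy completion can be sketched in a few lines of Python. The sketch assumes cosine similarity as the score and the common vector-offset formulation of analogies; the three-dimensional vectors are hand-made toy values chosen for illustration, not embeddings trained with word2vec.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hand-made toy vectors (NOT trained word2vec embeddings).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.2, 0.5, 0.1],
}


def analogy(a, b, c, vectors):
    """Return the word d that best completes 'a is to b as c is to d'."""
    # Vector-offset method: d should lie close to b - a + c.
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("man", "king", "woman", vectors))
```

With real embeddings the same two functions apply unchanged; only the `vectors` dictionary would be replaced by trained representations.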
Manual creation of ontologies is a time-consuming, costly, and complicated process. Consequently, over the past two decades a significant number of methods have been proposed for the (semi-)automatic generation of ontologies from existing data, especially textual data. However, ontologies generated by these methods usually do not meet the needs of many reasoning-based applications in different domains.
In ongoing efforts to investigate the effectiveness of the ScaSpa type system and SPARQL query validation approaches, we take a look at query logs. In particular we aim to apply the ScaSpa approach to queries that were executed against the public DBpedia and Wikidata SPARQL endpoints. This talk presents some of the challenges related to, and shortcomings of, available data sets for this purpose. In addition to intermediate results, we investigate routes to improve the applicability of our approaches, ranging from the introduction of disjointness to leveraging mappings between DBpedia and Wikidata.
The most common interfaces for human-computer interaction are graphical interfaces. The usability of those interfaces is therefore important for research and industry. However, interfaces are becoming more and more dynamic in appearance and functionality, which makes usability analysis a complex task. We propose a visual approach that identifies changes in an interface as stimulus in order to cluster the user experiences of multiple users. We have trained a classifier that takes visual features from the video recording of an interaction with an interface and decides whether a visual change has occurred. We use the visual changes to split and merge the user experiences of multiple users into representations that comprise coherent visual states of an interface.
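The splitting step can be illustrated with a minimal Python sketch that cuts a recording into segments wherever a visual change is flagged. It assumes one boolean flag per video frame as the classifier's output; the function name `split_at_changes` and the example flags are hypothetical, and the merging of segments across users is not covered here.

```python
def split_at_changes(change_flags):
    """Group frame indices into segments; a True flag starts a new segment."""
    segments, current = [], []
    for index, changed in enumerate(change_flags):
        if changed and current:
            # A visual change closes the running segment.
            segments.append(current)
            current = []
        current.append(index)
    if current:
        segments.append(current)
    return segments


# Hypothetical per-frame classifier decisions for one recording.
flags = [False, False, True, False, True, False, False]
segments = split_at_changes(flags)
print(segments)
```

Each resulting segment then stands for a stretch of the recording in which the interface looked visually stable.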