The internet has often been hailed as an opportunity for democracy. Political parties especially have high hopes for the new possibilities for communication and participation. But the digital divide still mediates the access to, use of, and benefits derived from use of the internet. These online inequalities are a problem when combined with democratic processes: If equal opportunity to participate is the goal, how can an unequal tool be beneficial?
Most information extraction (IE) systems are designed to construct Knowledge Bases (KBs) consisting of high-precision facts. The focus is primarily on confidence in the correctness of the data. As KBs are increasingly considered to be representations of the real world, it is imperative that the data in KBs is not only correct, but also complete. Yet most widely used KBs today do not store completeness information for many common predicates, such as names of children or winners of an award. This incompleteness of KB facts results from an oversight in most information extraction processes, which emphasize optimizing precision but largely ignore recall. A recall-oriented IE system is highly desirable for many use cases.
This talk will present the paper "Deep Contextualized Word Representations" by Peters et al., which won the best paper award at NAACL-HLT 2018. Its approach, ELMo (Embeddings from Language Models), constitutes a fundamentally new way of representing words by considering linguistic context and achieved state-of-the-art results on six important NLP problems.
This talk will introduce the required background material (recurrent neural networks, word embeddings, and language models) and summarize the key contributions of the paper.
What is the challenge of interdisciplinary research on democracy? Based on the implications of text data, I present the study of democracy in the digital era as the dilemma between large data and deep validity.
This focus reflects that the philosophies behind the choice of methods differ across disciplines, which is the main difficulty but also the desired advantage. Computer science prefers data at a very large scale, while social and political science move flexibly between medium and small scales, with the smallest being the time-intensive immersion in language, culture, and society. Against popular assumption, I argue
Extracting and parsing cited references from publications in PDF format is important to ensure the acknowledgement of the sources of information. However, the way these sources are cited differs from one community to another and from one publication to another. This citation diversity lies mainly in the indexation style (e.g., one or several reference sections), the existence of components (e.g., editor, source, URL, etc.) and the type of references (e.g., grey literature, academic literature, etc.). In order to accurately extract and segment different kinds of references, EXCITE proposes a generic approach that combines Random Forest and Conditional Random Fields (CRF) in a coherent mechanism.
A compendium of applications of tailor-made network-theoretic tools has been devised and implemented in a data-driven fashion. In the first part, a (formerly) novel centrality metric, aptly named “bridgeness”, based on a decomposition of the standard betweenness centrality, will be introduced. A prominent feature is its agnosticism with regard to any prior assumption about community structure. A second application is aimed at describing dynamic features of temporal graphs which are apparent at the mesoscopic level. A dataset comprising 40 years' worth of selected scientific publications is used to highlight the appearance and evolution in time of a specific field of study: “wavelets”.
Anomalous diffusion processes, in both the superdiffusive and subdiffusive regimes, have spurred a great deal of theoretical research, along with experimental validation, for decades now. Their description, however, strongly relies on the existence of a metric in continuous space. Complex networks lack an intrinsic metric definition and, in this talk, I will present some theoretical "recipes" to work around this issue and recover such regimes on networks as well. On the applied side, some machine learning algorithms, like the celebrated PageRank, exploit diffusion for classification and ranking tasks. I will therefore show how, through enhanced diffusion regimes, it is possible to address and correct some shortcomings of those algorithms and improve classification performance.
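As a rough illustration of how PageRank exploits diffusion, the sketch below runs the standard power iteration of a random walk with teleportation on a small directed graph. It shows only the ordinary diffusion baseline, not the enhanced regimes presented in the talk. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy graph are all assumptions made for this example.

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power iteration for PageRank on a directed graph.

    `adjacency` maps every node to the list of nodes it links to.
    All link targets are assumed to appear as keys as well.
    """
    nodes = list(adjacency)
    n = len(nodes)
    # Start the walker with equal probability on every node.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Teleportation: every node receives a uniform share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = adjacency[node]
            if targets:
                # Diffusion step: split the node's mass over its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its mass uniformly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "c" is linked from every other node.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

Each iteration is one step of a diffusion process on the graph: probability mass flows along out-links, and the teleportation term keeps the walk from getting trapped. The scores are the stationary distribution of that walk.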
Knowledge-based authentication methods are vulnerable to shoulder surfing. Given how widely these methods are used, leaving this limitation unaddressed could result in users' information being compromised.
At WeST we have had prior discussions on how eye tracking studies can be useful in other projects of our group. We currently have 8 eye trackers in our lab, and we should plan to extend the scope of our eye tracking expertise and resources. In this direction, I will take a topic common to most of our group members, i.e., how eye tracking has been used to evaluate ontologies. I will take a simple example from related work to evaluate two commonly used ontology visualization techniques, namely, indented list and graph. I will discuss the eye-tracking experiment and analysis procedure and how it complements the set of existing evaluation protocols for ontology visualization.
Graph-based data models allow for flexible data representation. In particular, semantic data based on RDF and OWL fuels use cases ranging from general knowledge graphs to domain-specific knowledge. The flexibility of these approaches, however, makes programming with semantic data tedious and error-prone. In particular, the logic-based data descriptions used in OWL are problematic for existing error-detecting techniques such as type systems. In the LISeQ project, we investigate the integration of such data descriptions and associated query languages into programming languages. In this presentation, we discuss the first publication (currently under submission) of this project: ScaSpa.