Institute for Web Science and Technologies · Universität Koblenz - Landau
This course took place in a past semester or will take place in a future semester. If you are looking for current courses, go to the course overview.

Research Lab/Project Lab "Machine Learning Application"


Summer semester 2020

In this research lab, you will build a complete machine learning system that follows the generic pipeline in order to solve a specific problem. For each phase of this pipeline, you will apply the methods and techniques taught in the Machine Learning and Data Mining course. Completing that lecture is therefore mandatory. Other fundamental approaches [1] will also be used where necessary, including more sophisticated or modified state-of-the-art approaches.

Important Information

For whom?

Master's and Bachelor's students in:

  • Web Science
  • Computer Science
  • Computational Visualistics
  • Business Informatics

Kick-off / Introductory meeting

  • When: February 20 at 14:00 (your presence is mandatory)
  • Where: B 016

How to register?

  • Form a group of four people to work on a topic
  • Give a name to your group
  • Send one email per group to boukhers@uni-koblenz.de with the subject: MLA registration request (group) before the kick-off
  • In the email, list the topics in order of preference, starting with the most preferred one.
  • Attend the introductory meeting
  • After the topic is assigned, write a proposal (up to two pages) describing your proposed solution.
  • Register for the exam

Important note: If you cannot form a group, you may still take part in the research lab. However, you will have to work with other people who could not form groups either. Please send an email to boukhers@uni-koblenz.de with the subject: MLA registration request (individual)

Exam

  • When:----
  • Where:----
  • Type: Presentation + Report + Software
  • Registration (Klips): Open from ---- to ---- (do not miss the deadline!)
  • Cancellation (Klips): Until ----

Topics

Note: The list is not yet complete!

Topic 1

  • Title: Paragraph segmentation
  • Main advisor: Zeyd Boukhers
  • Description: In this topic, you will build a machine learning system that recognizes paragraphs in text lines extracted from PDF documents. The pdf-to-text extractor provides the content of each line independently. Each line is associated with some features (e.g. length, position, width, etc.). The available dataset is not labelled. Therefore, you will need to label some documents in order to build a supervised or semi-supervised model; alternatively, you can apply an unsupervised model. Moreover, it is necessary to remove noise artifacts, such as page numbers, from the documents. More details will be provided in the introductory meeting.
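
As a rough illustration of the unsupervised route, the following Python sketch groups lines into paragraphs using only layout features. It is a minimal heuristic under stated assumptions, not the expected solution: the `Line` fields (`x`, `y`), the thresholds `gap_factor` and `indent_tol`, and the page-number pattern are all hypothetical and would need to be adapted to the actual extractor output.

```python
import re
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    """One extracted text line; the field names are assumptions."""
    text: str
    x: float  # left edge of the line
    y: float  # top coordinate, increasing down the page


# Assumed noise pattern: a line holding only a number, e.g. "3" or "Page 3".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)


def remove_page_numbers(lines):
    """Drop lines that look like bare page numbers."""
    return [ln for ln in lines if not PAGE_NUMBER.match(ln.text)]


def segment_paragraphs(lines, gap_factor=1.5, indent_tol=5.0):
    """Start a new paragraph on a large vertical gap or an indented line."""
    lines = remove_page_numbers(lines)
    if not lines:
        return []
    gaps = [b.y - a.y for a, b in zip(lines, lines[1:])]
    typical_gap = median(gaps) if gaps else 0.0
    paragraphs = [[lines[0]]]
    for prev, cur in zip(lines, lines[1:]):
        large_gap = (cur.y - prev.y) > gap_factor * typical_gap
        indented = (cur.x - prev.x) > indent_tol
        if large_gap or indented:
            paragraphs.append([cur])
        else:
            paragraphs[-1].append(cur)
    return paragraphs


# Made-up sample page: two paragraphs followed by a page number.
sample = [
    Line("First line of paragraph one", 50, 100),
    Line("continues here.", 50, 112),
    Line("Second paragraph starts", 50, 140),
    Line("and ends.", 50, 152),
    Line("3", 300, 800),
]
paragraphs = segment_paragraphs(sample)
```

A rule set like this can also be used to pre-label documents, so that manual labelling only has to correct the boundaries it gets wrong.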

Topic 2

  • Title: Metadata extraction from German scientific papers
  • Main advisor: Zeyd Boukhers
  • Description: In this topic, you will build a model that extracts metadata from scientific papers (PDF format), such as the title, author names, institute and abstract. You will create a labelled dataset, which is not expensive for this kind of task. The model has to handle different templates and font types. The input of the model is a PDF document and its output is the metadata.
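
To make the input/output contract concrete, here is a small rule-based baseline in Python that a learned model could be compared against. It is only a sketch: it assumes the PDF has already been converted into lines with a font size (the `TextLine` type is hypothetical), takes the largest-font lines as the title, and reads the abstract from the lines following an "Abstract" or "Zusammenfassung" heading. Authors and institute are left out, and the sample data is made up.

```python
import re
from typing import NamedTuple


class TextLine(NamedTuple):
    """One extracted line; this structure is an assumption."""
    text: str
    font_size: float


# Assumed headings that introduce the abstract in English or German papers.
ABSTRACT_HEADING = re.compile(
    r"^\s*(abstract|zusammenfassung)\b[:.]?\s*", re.IGNORECASE
)


def extract_metadata(lines):
    """Return a dict with a heuristic title and abstract."""
    lines = [ln for ln in lines if ln.text.strip()]
    result = {"title": "", "abstract": ""}
    if not lines:
        return result

    # Heuristic 1: the title is set in the largest font.
    max_size = max(ln.font_size for ln in lines)
    result["title"] = " ".join(
        ln.text.strip() for ln in lines if ln.font_size == max_size
    )

    # Heuristic 2: the abstract runs from its heading until the font changes.
    for i, ln in enumerate(lines):
        match = ABSTRACT_HEADING.match(ln.text)
        if not match:
            continue
        parts = []
        rest = ln.text[match.end():].strip()
        if rest:
            parts.append(rest)
        body = lines[i + 1:]
        if body:
            body_size = body[0].font_size
            for b in body:
                if b.font_size != body_size:
                    break
                parts.append(b.text.strip())
        result["abstract"] = " ".join(parts)
        break
    return result


# Made-up first page of a German paper.
page = [
    TextLine("Maschinelles Lernen für", 18),
    TextLine("die Dokumentenanalyse", 18),
    TextLine("Erika Mustermann", 12),
    TextLine("Zusammenfassung", 11),
    TextLine("Dieser Beitrag beschreibt", 10),
    TextLine("ein Beispielverfahren.", 10),
    TextLine("1 Einleitung", 14),
]
metadata = extract_metadata(page)
```

Rules like these break as soon as a template uses a large font for something other than the title, which is one reason the topic asks for a model that generalises across templates and font types.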

Topic 3

  • Title: Optimising online documents for fact-checking
  • Main advisor: Ipek Baris
  • Description: In this topic, you will implement a web application that supports fact-checking. The application first checks whether a given URL has already been fact-checked by fact-checking organisations. If it has not, the text mining module of the system evaluates the full article behind the URL. Finally, the system ranks the URL among the other URLs that have not been fact-checked. The baseline for the overall system is ClaimPortal [2], and the baseline for the text mining module is the information nutritional label extractor [3]. You are expected to implement your own novel method for at least three categories of your choice from the nutritional label.
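
The three steps described above (look-up, evaluation, ranking) can be sketched in Python as follows. All names here (`normalize`, `triage`, `fetch_text`, `score`) are hypothetical, the scoring function is a placeholder rather than a nutritional-label category, and the assumption that a higher score means a higher priority for fact-checking is ours.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so that trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def triage(urls, fact_checked, fetch_text, score):
    """Split URLs into already-checked ones and a ranked list of the rest.

    fetch_text(url) returns the article text and score(text) returns a
    number; both are supplied by the caller.
    """
    checked = {normalize(u) for u in fact_checked}
    already, pending = [], []
    for url in urls:
        if normalize(url) in checked:
            already.append(url)
        else:
            pending.append((score(fetch_text(url)), url))
    # Highest score first; ties keep their input order.
    pending.sort(key=lambda pair: pair[0], reverse=True)
    return already, [url for _, url in pending]


# Made-up articles and a stand-in score (number of exclamation marks).
articles = {
    "https://example.org/a": "Calm report.",
    "https://example.org/b": "Shocking!!! Unbelievable!",
    "https://www.example.org/c/": "Already reviewed article.",
}
already, ranked = triage(
    urls=list(articles),
    fact_checked=["http://example.org/c"],
    fetch_text=articles.get,
    score=lambda text: text.count("!"),
)
```

Passing `fetch_text` and `score` in as parameters keeps the look-up and ranking logic independent of the text mining module, so the baseline and a new method can be swapped without touching the rest.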

Topic 4

  • Title: False Article Detection with Weakly Supervised Learning
  • Main advisor: Ipek Baris
  • Description: This topic aims to predict whether a full-text article is fake or not. You will investigate weakly supervised learning methods, which are popular in computer vision, and adapt them to a natural language processing task. You will use the datasets and methodology described in [4] as a baseline.
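
One common form of weak supervision, shown here only as a generic illustration and not as the methodology of [4], is to combine several noisy heuristics ("labelling functions") by majority vote and then train a classifier on the resulting labels. The three heuristics in this Python sketch are toy assumptions, not validated signals of misinformation.

```python
from collections import Counter

FAKE, REAL, ABSTAIN = 1, 0, -1


def lf_many_exclamations(text):
    """Toy heuristic: heavy use of exclamation marks votes FAKE."""
    return FAKE if text.count("!") >= 3 else ABSTAIN


def lf_clickbait_phrase(text):
    """Toy heuristic: a clickbait phrase votes FAKE."""
    return FAKE if "you won't believe" in text.lower() else ABSTAIN


def lf_cites_source(text):
    """Toy heuristic: an attribution phrase votes REAL."""
    return REAL if "according to" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_many_exclamations, lf_clickbait_phrase, lf_cites_source]


def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Majority vote over non-abstaining functions; ties and no votes abstain."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, n), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == n:
        return ABSTAIN
    return top


# Made-up example texts.
labels = [
    weak_label("You won't believe this!!!"),
    weak_label("According to the ministry, rates fell."),
    weak_label("Plain sentence."),
]
```

Articles that receive `ABSTAIN` are simply left out of the weakly labelled training set; the downstream model is then expected to generalise beyond the heuristics.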

References

[1] Sergios Theodoridis and Konstantinos Koutroumbas. 2008. Pattern Recognition (4th ed.). Academic Press, Inc., Orlando, FL, USA. (More than 10 copies are available in the library.)

[2] Majithia et al. 2019. ClaimPortal. ACL.

[3] Fuhr et al. 2018. An Information Nutritional Label for Online Documents. ACM SIGIR.

[4] https://github.com/isspek/weakly_misinformation_learning/tree/master/documents

Instructors

  • boukhers@uni-koblenz.de
  • Research associate
  • B 114
  • +49 261 287-2765
  • ibaris@uni-koblenz.de
  • Research associate
  • B 007
  • +49 261 287-2863