Machine Learning, an important branch of Artificial Intelligence, was defined by Arthur Samuel in 1959 as “the ability to learn without being explicitly programmed.” In this research project, students will build a complete machine learning system that follows the generic pipeline in order to solve a specific problem. For each phase of this pipeline, the students will apply the methods and techniques taught in the Machine Learning and Data Mining lecture, so successfully completing this lecture is essential. Where necessary, other fundamental approaches will also be used, as well as more sophisticated or modified approaches from the state of the art.
Two teams of 4-5 students each will be formed to work on the following two tasks:
Task I: Citation Style Identification:
Across the research community, authors cite each other's work in different ways. In this task, the different citation styles present in the corpus will be categorised based on several factors. Afterwards, each reference string will be assigned to one citation style. Note that the references in a single article do not necessarily all follow the same citation style (e.g. those in the reference section and those in the footnotes). For evaluation, all references in the reference section of a given article will be considered to have the same style. Similarly, all references in the footnotes of a given article will be considered to have the same style.
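As a rough illustration of where the categorisation could start, the sketch below extracts a few surface cues from each reference string and then applies the evaluation convention described above, giving every (article, section) group its majority predicted style. The feature names, the regular expressions and the `toy_rule` classifier are hypothetical placeholders, not part of the task definition; in the project the per-reference classifier would be learned.

```python
import re
from collections import Counter, defaultdict

# Hypothetical surface cues that tend to differ between citation styles.
FEATURES = {
    "year_in_parens": re.compile(r"\(\d{4}[a-z]?\)"),
    "quoted_title": re.compile(r'["“].+?["”]'),
    "starts_with_bracket_label": re.compile(r"^\s*\[\w+\]"),
    "page_marker": re.compile(r"\bpp?\.\s*\d+"),
    "et_al": re.compile(r"\bet al\b"),
}


def extract_features(reference: str) -> dict[str, int]:
    """Return binary indicator features for one reference string."""
    return {name: int(bool(rx.search(reference))) for name, rx in FEATURES.items()}


def assign_group_styles(references, predict_style):
    """Give every (article, section) group its majority predicted style.

    references: iterable of (article_id, section, reference_string), where
        section is e.g. "references" or "footnotes".
    predict_style: hypothetical per-reference classifier mapping a feature
        dict to a style label.
    """
    votes = defaultdict(Counter)
    for article_id, section, ref in references:
        votes[(article_id, section)][predict_style(extract_features(ref))] += 1
    return {group: counts.most_common(1)[0][0] for group, counts in votes.items()}


def toy_rule(features):
    # Hypothetical stand-in for a trained classifier.
    return "author-year" if features["year_in_parens"] else "numeric"


refs = [
    ("a1", "references", "Smith, J. (2008). Pattern Recognition. Academic Press."),
    ("a1", "references", "Doe, A. (2010). Another Book."),
    ("a1", "references", "[3] Roe, B. Some Paper, pp. 1-10."),
    ("a1", "footnotes", "See Müller, Recht, p. 12."),
]
print(assign_group_styles(refs, toy_rule))
# {('a1', 'references'): 'author-year', ('a1', 'footnotes'): 'numeric'}
```

A majority vote is only one way to enforce the per-section convention; it also absorbs the occasional misclassified reference within a section.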
Task II: Reference Matching:
In this task, references in different articles will be compared, and those referring to the same article will be grouped and matched. The difficulties we might encounter include: 1) the order of the entities; 2) the absence of some entities (e.g. the publisher is not always present); 3) the content of these entities (e.g. with or without umlauts).
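A minimal sketch of how the three difficulties might be handled, assuming references are compared as plain strings: umlauts and other diacritics are folded to a common form, tokens are compared as unordered sets so entity order does not matter, and Jaccard similarity tolerates a missing entity such as an absent publisher. The normalisation rules, the 0.6 threshold and the function names are illustrative assumptions; a learned similarity model would replace these hand-set choices in the actual project.

```python
import re
import unicodedata

# Assumption: transliterate German umlauts first, so "Müller" and "Mueller"
# normalise to the same token.
UMLAUTS = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}
)


def normalise(reference: str) -> set[str]:
    """Lower-cased, diacritic-free token set (order-insensitive)."""
    text = reference.translate(UMLAUTS)
    # Decompose remaining accented characters and drop the accents (é -> e).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared tokens divided by all tokens; missing entities lower it only a little."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_match(ref_a: str, ref_b: str, threshold: float = 0.6) -> bool:
    return jaccard(normalise(ref_a), normalise(ref_b)) >= threshold


a = "Müller, K. 2005. Maschinelles Lernen. Springer."
b = "Maschinelles Lernen, K. Mueller, 2005"  # reordered, no publisher
c = "Schmidt, A. 2010. Data Mining. Academic Press."
print(is_match(a, b), is_match(a, c))  # True False
```

Transliterating umlauts (ü to ue) rather than simply stripping them means "Müller" matches "Mueller" but not "Muller"; which convention works better would have to be decided from the corpus.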
What each student needs to know:
2- Feature Extraction
3- Feature Selection (or feature transformation, e.g., PCA)
4- Learning (supervised, unsupervised and/or semi-supervised)
5- Matching (including similarity computation)
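For the feature transformation option in step 3, a minimal PCA sketch based on NumPy's SVD is shown below. It assumes the extracted features are already arranged as a numeric matrix with one row per reference string; the function name, the toy data and the choice of two components are illustrative only, and in practice a library implementation such as scikit-learn's `PCA` would typically be used.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Project the rows of X onto the top principal components.

    Returns the projected data and the fraction of the total variance
    explained by each kept component.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained[:n_components]


rng = np.random.default_rng(0)
# Toy feature matrix: 100 samples, 5 features; feature 1 is a noisy copy of feature 0.
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)
Z, ratio = pca(X, n_components=2)
print(Z.shape, ratio.round(3))  # Z.shape is (100, 2)
```

Because two of the toy features are nearly redundant, the first component absorbs most of their shared variance, which is the kind of redundancy PCA is meant to remove before learning.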
- Successfully completing the Machine Learning and Data Mining lecture is mandatory.
- Attending the introductory meeting, which will take place on February 19th at 2:00 pm in room B016 (slides).
- Sending an email that includes your full name, your matriculation number and your field of study to firstname.lastname@example.org
Sergios Theodoridis and Konstantinos Koutroumbas. 2008. Pattern Recognition (4th ed.). Academic Press, Inc., Orlando, FL, USA. (More than 10 copies are available in the library.)