
Classifying Visual States of a Web Page for Eye Gaze Data Mapping


In Web page usability studies, eye tracking has become a prominent measure for assessing which sections of a Web page are read, glanced at, or skipped. Such assessments depend primarily on mapping gaze data to a representation of the Web page. However, modern Web pages use interactive and dynamic content, e.g., carousels, menus, or fixed navigation bars. Because the visible content changes while a user interacts with the page, aggregating the interaction and eye tracking data of multiple users is not trivial.

The GazeMining project aims to detect the visual states of a Web page and to aggregate interaction and eye tracking data onto representations of these visual states. However, it remains an open research question how to define a visual state and, in particular, how to split a user's experience of a Web page into multiple visual states in accordance with the user's perception.

This master thesis should explore what "perceptual difference" means and propose measurements on the structural (DOM tree) and visual (screen pixels) representations of a Web page. Pixel-level analysis (image similarity or temporal features) might be required. The work must be integrated into our GazeMining framework; hence, basic knowledge of C++ and OpenCV is required.
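To give a rough idea of what a pixel-level measurement could look like, here is a minimal sketch in standard C++. It is not the GazeMining implementation: the function names `changedPixelRatio` and `isNewVisualState`, the per-pixel `tolerance`, and the `ratioThreshold` are all hypothetical and chosen only for illustration. The sketch assumes two screenshots of equal size that have already been converted to 8-bit grayscale buffers, and it treats a large share of changed pixels as a hint that a new visual state has begun.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: fraction of pixels whose grayscale intensity differs
// by more than `tolerance` between two equally sized frames.
double changedPixelRatio(const std::vector<std::uint8_t>& before,
                         const std::vector<std::uint8_t>& after,
                         int tolerance) {
    // Assumption: both frames have identical dimensions.
    assert(before.size() == after.size());
    if (before.empty()) {
        return 0.0;
    }

    std::size_t changed = 0;
    for (std::size_t i = 0; i < before.size(); ++i) {
        const int diff = std::abs(static_cast<int>(before[i]) -
                                  static_cast<int>(after[i]));
        if (diff > tolerance) {
            ++changed;
        }
    }
    return static_cast<double>(changed) /
           static_cast<double>(before.size());
}

// Hypothetical decision rule: report a new visual state when more than
// `ratioThreshold` of the pixels changed. The threshold is an illustrative
// assumption, not a value taken from the project.
bool isNewVisualState(const std::vector<std::uint8_t>& before,
                      const std::vector<std::uint8_t>& after,
                      int tolerance,
                      double ratioThreshold) {
    return changedPixelRatio(before, after, tolerance) > ratioThreshold;
}
```

Within an OpenCV-based framework, the same comparison would more likely be expressed with `cv::absdiff`, `cv::threshold`, and `cv::countNonZero` on `cv::Mat` images. Note that a raw pixel ratio does not distinguish perceptually relevant changes (an opened menu) from irrelevant ones (a blinking cursor), which is exactly the gap the thesis is meant to investigate.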


[1] Michael Cormier, Karyn Moffatt, Robin Cohen, and Richard Mann. 2016. Purely vision-based segmentation of web pages for assistive technology. Comput. Vis. Image Underst. 148, C (July 2016), 46-66. DOI:
[2] Morgan Dixon and James Fogarty. 2010. Prefab: implementing advanced behaviors using pixel-based reverse engineering of interface structure. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '10). ACM, New York, NY, USA, 1525-1534. DOI: