Generating Counterfactual Images for Visual Question Answering by Editing Question-Critical Objects
While Visual Question Answering (VQA) systems have improved significantly in recent years, they still produce errors that are hard for human users to retrace. The lack of interpretability in black-box VQA models calls for discriminative explanations alongside the models’ outputs. This thesis aims to introduce a method for generating counterfactual images for an arbitrary VQA model. Given a question-image pair, the counterfactual generator should mask the question-critical objects in the image and then predict a minimal number of edits to the image such that the VQA model outputs a different answer. The new image should contain semantically meaningful changes, be visually realistic, and remain unchanged in question-answer-irrelevant regions (e.g., the background). To the best of my knowledge, this is the first counterfactual image generator for VQA systems that applies edits not to individual pixels but to a spatial mask, without requiring additional manual annotations.
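The described loop (mask question-critical objects, edit until the answer flips, leave the rest untouched) can be sketched as follows. Note that `vqa_model`, `object_masks`, and `edit_fn` are hypothetical stand-ins, not the thesis components: the actual method predicts edits with a learned generator rather than greedily applying a fixed list of edits.

```python
import numpy as np

def generate_counterfactual(image, question, vqa_model, object_masks, edit_fn):
    """Greedy illustration of the counterfactual loop, under assumptions:

    - `vqa_model(image, question)` returns an answer string,
    - `object_masks` are boolean masks, assumed sorted by question-relevance,
    - `edit_fn(image, mask)` edits only the masked region (e.g., via inpainting).
    """
    original_answer = vqa_model(image, question)
    edited = image.copy()
    edited_masks = []
    for mask in object_masks:
        # Edit only the masked, question-critical region; everything
        # outside the mask (e.g., the background) stays untouched.
        edited = edit_fn(edited, mask)
        edited_masks.append(mask)
        if vqa_model(edited, question) != original_answer:
            return edited, edited_masks  # answer flipped with few edits
    return None, edited_masks  # no counterfactual found
```

Stopping at the first answer flip keeps the number of edited regions small, which mirrors the "minimal number of edits" requirement stated above.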
17.06.21 - 10:15
via Big Blue Button