Toward Measuring Visualization Insight

This paper begins by noting that one of the main purposes of visualization is gaining insight. Because insight is hard to define in the context of visualization, the article instead identifies its essential characteristics: insight is complex, deep, qualitative, unexpected, and relevant, and an insight is more interesting the more of these characteristics it has. Visualizations are often evaluated with controlled experiments, but the benchmark tasks used in such experiments are not proper tools for measuring insight, since the method depends on the assumption that the benchmark tasks and metrics represent insight. According to the author, benchmark tasks have four fundamental problems when measured against the characteristics above:

  • they must be predefined by the test administrators, leaving little room for unexpected insight and even forcing users into a line of thought they might not otherwise take;
  • they need definitive completion times;
  • they must have definitive answers so that accuracy can be measured;
  • they require simple answers.

This forces the experimenter toward search-like tasks that do not represent insight well; such benchmark tasks are far too simplistic and constrained to indicate the insight a visualization provides. A claim often made to generalize results from simple benchmark tasks is that complex tasks are built up from simple ones. The author counters this: first, efficiency on simple benchmark tasks is often due to specific visualization interface features that do not generalize to more complex tasks; second, a clear decomposition of complex tasks into simple ones does not yet exist. Another problem in interpreting benchmark results is the trade-off between speed and accuracy: users are often forced to continue until they complete a task correctly, leading to a trial-and-error approach and a misrepresentation of accuracy. The author concludes that controlled experiments on benchmark tasks are not the right method to evaluate insight.

As a first improvement, the author suggests including more complex benchmark tasks, although this still involves some uncertainty because such tasks generally favor visualization overviews rather than detail views. Another option is to let users interpret the visualization in a free-form textual answer, but this is difficult to score, and offering multiple-choice answers could again bias the user. These methods also lead to longer task times and require a larger group of participants to reach statistically significant results.

A second suggestion is to eliminate benchmark tasks altogether and let researchers observe what insights users gain on their own. An open-ended protocol is one possible method: users are instructed to explore the data and report their insights. Qualitative insight analysis, for example via a think-aloud protocol, is another. For each reported insight, a coding method quantifies various metrics (insight category, complexity, and so on), and these can be grouped into common clusters such as usability. The coding converts qualitative data into quantitative data; it remains subjective, but it respects the qualitative nature of insight. The advantage of eliminating benchmark tasks is that the measures reveal which insights users actually gained from the visualization, and these measures relate closely to the fundamental characteristics of insight mentioned earlier. The observed insights can also be compared with the insights the researcher expected users to gain.
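To make the coding idea concrete, here is a minimal sketch of how coded insights might be recorded and aggregated. The field names and scoring scale are hypothetical illustrations, not taken from the paper:

```python
from dataclasses import dataclass

# Hypothetical record for one coded insight; the metrics
# (category, complexity, unexpectedness) echo the characteristics
# discussed above, but this structure is illustrative only.
@dataclass
class CodedInsight:
    description: str   # the insight as reported by the participant
    category: str      # e.g. "overview", "detail", "usability"
    complexity: int    # coder-assigned score, e.g. 1 (shallow) to 5 (deep)
    unexpected: bool   # did the test administrators not anticipate it?

def summarize(insights):
    """Aggregate a session's coded insights into quantitative measures."""
    total = len(insights)
    return {
        "count": total,
        "avg_complexity": sum(i.complexity for i in insights) / total,
        "unexpected_share": sum(i.unexpected for i in insights) / total,
    }

# Example: two insights coded from one think-aloud session
session = [
    CodedInsight("cluster A shrinks over time", "overview", 4, True),
    CodedInsight("axis labels are hard to read", "usability", 1, False),
]
print(summarize(session))
# {'count': 2, 'avg_complexity': 2.5, 'unexpected_share': 0.5}
```

The point of such a scheme is exactly the conversion the article describes: the coder's judgments stay subjective, but once recorded they can be counted and compared across visualizations.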

The author concludes that both types of controlled experiments are needed: benchmark tasks for low-level effects, and open-ended methods for broader insight. Note that when combining both approaches in a single experiment, the benchmark tasks should not precede the open-ended portion, as this could constrain the user.


This article helped me understand that I need to pay more attention to the open-ended portion of the evaluation of my visualizations. I will combine both methods to gain more information; in my previous evaluations I allowed the user to explore the visualisation only for a very short time, and this should be extended. I will also need to note what kind of insights I would like my visualization to provide and compare these to the insights gained during the evaluation. In my previous evaluation I also noticed how hard it is to find a good benchmark to test a visualization; this article confirms my impression that such benchmarks are often too simple and force the user in a certain direction. I will also need to pay more attention to how I formulate my questions so as not to bias the user.

