How to do visualization…

by Mike Gleicher on December 25, 2014

Here it is: the whole class distilled into a single Blog posting! Maybe you don’t need to take the class.

Over the course of the semester, we’ll get lots of perspectives on how different people tell you to do visualization. Here’s an attempt to write down my perspective. You’ll see that it is well aligned with Munzner (which is part of the reason I like her book so much). In particular, when you learn the nested evaluation model in class, you’ll see a strong parallel here.

This is what we (in my group) try to do when we work on creating visualizations (or, to be more precise, when we help solve data understanding problems).

There are four elements to visualization:

  1. Task
  2. Data
  3. Design
  4. Implementation

The challenge is that you can’t do a good job with one of these unless you’ve done the levels above it. Hopefully this explains the structure of the class.

I was going to title this posting “All I ever really needed to know (to do Visualization) I (could have) learned in Munzner’s Nested Model Paper” –  as a play on the title of the book “All I Ever Really Needed to Know I Learned in Kindergarten,” but I haven’t actually read that book, and the humor is lost if you haven’t heard of it.

The core is the TASK

Visualizations help someone do something for some reason (who, what, why).

The better that you understand what the visualization is trying to achieve (what will it help the person do), the more likely you will come up with a good solution. In the end, everything serves the tasks.

Note the plural: you may have a set of tasks. Often, there isn’t just one at a time. There is a set of things that a set of someones may want to do for a set of reasons. And maybe your solution will address many of these.

I was going to say “it starts with the tasks,” but sometimes you start someplace else (like you have some data and say “I’d like to do something with it” – but even then, I would probably say you have a task: figure out what the right questions to ask are!). However, in those cases, it’s really important to remember that task is key: the sooner you get to “what is this thing going to do for someone,” the better off you are.

This is also not to say that you need to fully understand the task at the beginning. Sometimes, your understanding of the task is hazy, or changes as you learn more (from later stages).

Task is an informal, fuzzy notion. It doesn’t always get explicitly written down or defined. But the clearer you are about it, the better off everything else will be. You can’t succeed unless you have something to succeed at.

One other detail on task: there is a range of kinds of tasks. There are abstract tasks and concrete application tasks. This is actually a spectrum/continuum.

While task is the most central thing, it’s also hard to talk about. We lack good, rigorous ways to talk about it. For the longest time, this meant that it didn’t get discussed enough (in the literature, in my class, in my work, …). The fact that it is hard shouldn’t get in the way of us trying to get better at thinking about it. We particularly lack good ways to talk about different levels of task abstraction.

Where I start…

When I talk to a new (potential) domain collaborator, I always start with the question “tell me about your science.” I want to know the big picture (the why) – because without it, it’s hard to have context.

My first goal is to identify the problem that needs to be solved – it won’t help anyone if we solve the wrong problem.

Usually people come thinking they want specific help – they want to start with the data, or worse, with the way they are looking at their data (can you make a better chart for me? not without understanding what you are trying to do, so I know what “better” means!). We will get to that, but I think it’s important to identify the task first.

I’ll stress this: if you want to be a visualization scientist (or more generally, a data scientist or computer scientist), one of the best skills you can have is to be able to help people identify their problems. I think it’s hard for people to identify their problems. Part of this is that people get so caught up in the details that they lose sight of the big picture. Or they are so set in how they do things that they lose the ability to imagine alternatives.

And, as computer scientists (and/or mathematicians), we have a secret weapon: abstraction. This is something that we value/stress much more than other disciplines. For this task phase of visualization, abstraction is a key tool. If we can recognize the abstract task of which the real problem is an instance, the path to solving it becomes much clearer.

We should also remember that our goal is to solve people’s problems. Sometimes, that requires inventing a novel and complicated visualization. Other times, it might mean applying some simple, off-the-shelf solution.

Here’s my favorite analogy. You go to the doctor’s office because you feel sick. The last thing you want to hear is “that’s a novel and interesting problem! we need to devise a novel treatment. let’s write a grant proposal and hire some research assistants…” No, you want to hear “I’ve seen that before. No problem. Take two aspirin and see me in the morning.”

As visualization practitioners, our goal is to be able to look at a problem and make those kinds of prescriptions. The task identification and abstraction is key here. It’s how we can say “I’ve seen that before” and get to “take two scatterplots and see me in the morning.”

Where we’ll finish…

Ultimately, your quest is to make a good visualization, not just any visualization. Generally, a big part of this will be “did you solve the problem,” which of course means that you need to know what the problem was (i.e., the task).

It turns out the idea of “is it good” will be a challenging question – there are many different ways to think about whether or not a visualization is good, and many ways to assess it. But a lot turns out to come back to “did it solve the problem.”

It’s worth mentioning that the whole line of thinking here is based on the “nested model” which is a discussion of evaluation.

Sometimes, the best visualization isn’t a visualization at all: you might be able to come up with a solution to the problem that doesn’t require a visualization. Understanding when and why to use visualization as a strategy for solving data (or understanding) problems is a key part of being good at doing vis.

You need to have / know your data

We’ll see lots of examples of visualizations that don’t really show lots of data. However, even there, the data is important and may be hidden.

There are lots of challenges with data. It’s a lot of work to obtain it, wrangle it, validate it, store it, keep track of it, …

For our purposes, the important part is to understand what is and isn’t in it, so that we can see how it can be used to address the task. Or how the form of the data, or factors of it, may change the task (or create new, intermediate tasks).

One way to think of visualization is that it’s the process of applying data to address a task.

Generally, the data is there to do something – it’s part of the bigger problem. Rarely, is it an end unto itself. If you find yourself saying “I want to understand my data” – ask yourself why. Generally, data is in service of a goal. Keeping that ultimate goal in mind (the real task) is a good idea – even though you might solve a “data task” (which usually corresponds to an abstract task).

For example, the common data question “I want to see if my data is noisy” is a good example of an abstract task. Probably, you want to know if the data is noisy because you want to do something with it (or maybe because the source of noise might be interesting for your ultimate question).
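As a rough illustration of how small such an abstract task can be, here is a minimal Python sketch of a "how noisy is this?" check. Everything in it is an assumption made for the example: the readings are made up, the function names are hypothetical, and "noise" is taken to mean the spread of each value around the average of its immediate neighbors, which is only one of many possible definitions.

```python
from statistics import mean, pstdev

def local_residuals(values, window=3):
    """Difference between each interior value and the mean of its window."""
    half = window // 2
    residuals = []
    for i in range(half, len(values) - half):
        local_mean = mean(values[i - half:i + half + 1])
        residuals.append(values[i] - local_mean)
    return residuals

def noise_estimate(values, window=3):
    """Spread of the residuals: a crude, assumed definition of 'noisy'."""
    return pstdev(local_residuals(values, window))

# Made-up readings: a smooth ramp, and the same ramp with some jitter.
smooth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
jittery = [1.0, 2.6, 2.5, 4.7, 4.4, 6.0]

print("smooth: ", noise_estimate(smooth))
print("jittery:", noise_estimate(jittery))
```

The number by itself does not answer the real question. Whether that much jitter matters depends on what you want to do with the data, which is the point of keeping the ultimate task in mind.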

A Design Addresses the Problem

Once you know your task and your data, you can try to design a solution. I say “design” to explicitly separate the act of coming up with the idea from actually building it (implementation). Design is the act of making conscious choices to solve a problem. (Defining design is a whole philosophical debate – but that definition is one I like, and will work with for the moment.)

In terms of the class, a big part of what we’ll do is focus on design. What are the choices you can make, and how can you make good choices.

There are a few key elements to a visualization design:

  • Transformation – changing the data to a form that is better suited to what you want to do.
  • Layout – figuring out where to put things “in space” (where space is usually the 2D of the page/screen)
  • Encoding – figuring out what things to put in that layout
  • Interaction – how you might change one of the other elements in response to what the viewer is doing

I find this list to be a useful way to organize the larger list of more specific things you might do. Most things fit into one category or another. I won’t waste time arguing this is the best categorization – but it’s good enough to give you a sense of the kinds of things that you can think about.
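To make the four elements concrete, here is a minimal Python sketch that walks a tiny table through each of them. It is an illustration under stated assumptions, not a recommended tool: the store data is invented, the function names are hypothetical, the "encoding" is just the length of a text bar, and "interaction" is reduced to the caller changing one choice and re-rendering.

```python
# Invented data: (label, sales, visitors) for a few made-up stores.
rows = [
    ("north", 120, 400),
    ("south", 90, 150),
    ("east", 200, 1000),
    ("west", 60, 120),
]

def transform(rows):
    """Transformation: derive sales per visitor from the raw columns."""
    return [(label, sales / visitors) for label, sales, visitors in rows]

def layout(derived, descending=True):
    """Layout: decide the order in which items appear on the page."""
    return sorted(derived, key=lambda item: item[1], reverse=descending)

def encode(ordered, width=40):
    """Encoding: map each value to the length of a text bar."""
    top = max(value for _, value in ordered)
    lines = []
    for label, value in ordered:
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<6} {bar} {value:.2f}")
    return lines

def show(rows, descending=True):
    """Interaction stand-in: change one choice, then re-render."""
    return "\n".join(encode(layout(transform(rows), descending)))

print(show(rows))
print()
print(show(rows, descending=False))
```

Each function is one conscious choice that could have been made differently: a different derived quantity, a different ordering, a different mark. Swapping one out while keeping the others is a cheap way to explore alternatives.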

In the class, we’ll spend a lot of time thinking about design. We’ll pick apart existing designs to understand how they work. We’ll learn foundations, like perception, that will help us come up with and evaluate designs. We’ll try to give ourselves a library of standard designs to try out on problems we encounter and to modify as necessary. We’ll practice making designs and assessing how well they address problems.

Implementation

Actually creating the design is the last part.

If you were thinking “this is a CS class, we should focus on implementation,” you will be disappointed. As I’ve said, this class is more about how to figure out what the right picture to make is (i.e., the design) than how to make it.

In the ideal world, you can think about implementation last – it’s an afterthought. In practice, the constraints of having to implement things will probably influence the kinds of designs you will want to consider. A design becomes less attractive if it’s too hard to build. In practice, there’s often a tradeoff between the practical issues of implementation and having the best design.

Even within implementation, there is a spectrum of levels. I like to think of this as “fidelity of prototypes.” In a sense, you can think of a back-of-the-napkin sketch as an implementation of a design. Most likely an incomplete, non-final one, but a concrete instantiation. It might be a good enough implementation that you can evaluate your design and decide if you want to pursue the design further (and make a higher-fidelity prototype). If you’re lucky, a crude prototype might just solve the actual problem.

One thing I like to stress is the importance of prototyping to explore designs. It’s best to try out lots of ideas, and see if you can figure out their problems before investing a lot in implementing them. Good “Designers” (graphic designers, industrial designers, …) usually like to explore an entire space of designs – by using very crude “implementations” (e.g. sketches).

Data analysis tools – things like Excel (yes, Excel will turn out to be one of my favorite visualization tools) or Tableau or … – often let you prototype lots of different things with your data. This “playing” with data – re-ordering it, making various kinds of pictures with it, looking at it all kinds of different ways – is actually a form of rapid prototyping. You can explore a lot of designs easily – often to decide that they don’t solve your problem – but sometimes to see that some of the simple elements actually can help. This “playing with data” (if you can do it) is a lot like sketching a lot of visual designs.

Having a good toolbox so that you can implement your designs is useful. If you don’t have one, you will be limited in what designs you can explore, and won’t be able to choose designs that you can’t realize (that’s not quite true: if you can come up with a great design, you may be able to get someone else to implement it). Part of my premise for this class (or at least this instantiation of it) is that we can all have different toolboxes – some students might be wizard programmers, some might be fabulous artists – but we all can have some common basic tools (e.g. sketching), and we can all explore designs using our respective toolboxes.

Now, if you’re saying “but I want visualization to be about writing fancy programs using complex data analysis methods and algorithms and spiffy programming things …” let me give you a bit of caution.

Building a custom visualization solution by programming should be a last resort. You should really believe that your problem cannot be solved by some easier method. Going back to the medical analogy, writing a program for a new design is like inventing a completely new (and therefore untested) treatment. Yes, if your patient has a mysterious disease and is going to die you want to take these drastic measures. Or, you might do an experiment if you believe that you can afford the risk on this patient in order to learn something to save the next ones (this is the excuse we use as researchers).

That said, all too often there are other factors that make us want to take the extreme measure. Sometimes, we just want to practice our inventive skills. Sometimes our “customers” think they want to have something novel (don’t make it look too easy!). Sometimes we really want to try out some implementation idea, or show off some challenging design idea. And sometimes, it might just be easier to re-implement a standard design than to figure out how to make an “easy” tool do what we want. (you’d be amazed how often I’ve found myself writing Python code for scatterplots because I wasn’t in the mood to wrestle with Excel). Sometimes, it’s hard to find a decent “easy” tool for something that should be easy (like graph layout).

And if you’re saying “but I really want to learn about some of the implementation things (D3, SVG, Python, Tableau, …)” – I’ll have some postings soon that will help. Note that in class we will learn about these things, but we won’t necessarily learn to use them (again, that will be explained in a future post).

But is looking at a table in Excel Visualization?

Two answers:

Yes! You can often solve hard data understanding problems using visual methods in Excel. And that’s even before we get to fancy charts (some of which it can do). And understanding what you can and can’t do this way is important.

No! Learning to use Excel to analyze data is not what this class is about.

However, a lot of the elements of visualization designs can be tried out and appreciated. And you can see how simple problems can often have simple solutions.

So, we can look at how having data in a table can be OK sometimes since it can solve some problems. And we can see how we can apply those four design elements: transformation (computing derived quantities), layout (ordering data in different ways), encoding (highlighting different values), and even a little interaction (switching between the different things). Yes, you can experience the core ideas in visualization with a table in Excel.

And then we can move on to more complicated examples.
