Assignment 1: A First Vis Paper and MizBee

January 4, 2010


due before Thursday, January 21st.

This first assignment has three goals, which make it a little different than the typical reading:

  1. To read and learn from a particularly good paper, not just because the system they describe is good, but because it’s a nice account of the entire vis process.
  2. To get a handle on how a class this diverse copes with reading Computer Science papers. For many of you in other disciplines this might be the first technical CS paper you’ve read.
  3. To test out some of the class mechanics.

The reading for this assignment is the paper:

  • Meyer, M., Munzner, T., and Pfister, H. MizBee: A Multiscale Synteny Browser. IEEE Transactions on Visualization and Computer Graphics, 15 (6) 2009. Proceedings Infovis 2009. doi:10.1109/TVCG.2009.167

You can get the paper from the authors’ website here. There is other information at the website, including a video and a download of the program to play with. But for this assignment, it is particularly important to read the paper. Part of the reason for this assignment is that I think this is a particularly illustrative paper.

Here’s the assignment:

  1. Read the paper and be prepared to discuss it in class on Thursday, January 21.
  2. Register for this website (you’ll just have “subscriber status”, but that will enable you to comment on postings. We’ll upgrade your membership manually once you are subscribed).
  3. Post a comment to this assignment post with some thoughts about the paper. You can also begin a dialog with others (by replying to their comments, etc.), but each person should post their thoughts.

Send an email to both the instructor and TA listing the things that you didn’t understand in the paper. To a computer scientist (who knows a little bit about genetics), this paper is easy in terms of the technical content. But I am trying to gauge how it is for others to read papers like this. I am looking for background that you don’t think you have, terms that you couldn’t understand (or had to look up), concepts that you felt you needed before reading this paper, … If the paper was totally understandable to you, please say that. (Everyone needs to send this email.)

Please do this (all 3 parts) before Thursday, January 21st. I am going to look at what people wrote on Thursday morning and use it to plan class, so make sure that your comment and email have been done.

{ 20 comments }

mikola January 19, 2010 at 4:19 pm

The visualization was very pretty; I liked the choice of spline curves to draw the connections between elements. However, I didn’t quite get how they were generating the control points; my best guess is that they were doing something like the following (as an experiment I’m going to try formatting some code):

For each group:
    Compute normal direction from center of group, and normal from center of target; n_s, n_t
    For each curve:
        compute normal direction at start of curve and target; n’_s, n’_t
        create control points c_{-n}, … , c_{n}
        for k = n to 1
            t = k / n * s ; // where s is the length of the straight segment of each curve
            c_k = t * n’_s * (1 - f(t,beta)) + t * n_s * (f(t,beta))
            c_{-k} = t * n’_t * (1 - f(t,beta)) + t * n_t * (f(t,beta))
        draw b-spline c_{-n}, … c_{n}

Where in this method the function f(t,beta) is some monotone function on t, taking values in [0,1] with the parameter beta controlling the curvature. One possible example is something like:

f(t,beta) = (t / s)^beta
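
For anyone who wants to try the guess above, here is a minimal Python sketch of the blending step for one side of a curve. It is only an illustration of the commenter's guess, not the method from the paper. Two things are assumptions: the function and parameter names are made up, and the offsets are added to a base point (`origin`), which the pseudocode leaves implicit.

```python
def blend(t, s, beta):
    """Blend weight f(t, beta) = (t / s) ** beta; runs from 0 to 1 as t goes from 0 to s."""
    return (t / s) ** beta


def side_control_points(origin, curve_normal, group_normal, n, s, beta):
    """Control points c_1 .. c_n for one end of a curve (hypothetical helper).

    origin       -- 2D point where the curve starts (an assumption; not in the pseudocode)
    curve_normal -- n'_s, the normal at the start of this curve
    group_normal -- n_s, the normal from the center of the curve's group
    n            -- number of control points on this side
    s            -- length of the straight segment
    beta         -- exponent controlling how quickly the curve bends toward the group
    """
    points = []
    for k in range(1, n + 1):
        t = k / n * s
        f = blend(t, s, beta)
        # Near the endpoint the curve follows its own normal; farther out it
        # is pulled toward the shared group normal, which produces the bundle.
        x = origin[0] + t * (curve_normal[0] * (1 - f) + group_normal[0] * f)
        y = origin[1] + t * (curve_normal[1] * (1 - f) + group_normal[1] * f)
        points.append((x, y))
    return points
```

Calling `side_control_points` once with the source normals and once with the target normals gives the two halves of the control polygon; feeding them to a B-spline routine is left out of the sketch.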

mikola January 19, 2010 at 4:19 pm

Well, looks like that didn’t work! Fortunately I don’t think it is too unreadable so I guess I will leave it as it is…

Michael Correll January 20, 2010 at 9:42 am

What concerns me with this reading is that while it claims that the principles of design in use are scientific, the actual evidence of improvements is all anecdotal. If the tool were a work of art then I would be okay with what are ultimately unscientific subjective judgements, but it doesn’t seem okay to me when the tool is “designed” in a more formal way. Granted, it seems remarkably difficult to come up with some sort of non-subjective metric to measure ease of use, but perhaps not impossible.

dalbers January 20, 2010 at 10:03 pm

I agree on the shaky grounds for the claim. One of the most common ways I’ve heard of people “validating” visualization tools is by recruiting users and having them execute a series of tasks, such as interacting with the tool or answering questions about a data set based on the visualization. Typically, the users execute these tasks using both the old way of visualizing things and the new techniques and then the researchers run an analysis based on the users’ interactions with both methods.

The hard thing with genomic visualizations is that the user has to understand genetics in order to know what they are looking at and thus provide reasonable feedback. Unfortunately, that leaves the user candidate pool pretty shallow, so anecdotal case studies may provide the best analysis available. Not sure if we will have to quantify our reasoning in our visualizations for this class, but if so, hopefully these techniques can be somewhat helpful!
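
As a rough illustration of the kind of analysis described above, here is a minimal Python sketch of a paired comparison, where each user completes the same task with both the old and the new tool. The choice of a paired t statistic and the function name are assumptions made for the example; a real study would choose its own measures and tests.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(old_times, new_times):
    """Paired t statistic for per-user task times under two tools.

    old_times[i] and new_times[i] must come from the same user. A large
    positive value suggests users were faster with the new tool.
    """
    if len(old_times) != len(new_times) or len(old_times) < 2:
        raise ValueError("need at least two users, measured under both tools")
    diffs = [old - new for old, new in zip(old_times, new_times)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    # Mean improvement divided by its standard error.
    return mean(diffs) / (spread / sqrt(len(diffs)))
```

The statistic would then be compared against a t distribution with one fewer degrees of freedom than the number of users. With the shallow pool of users mentioned above, such a test has little power, which is part of why case studies are common.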

punkish January 20, 2010 at 10:52 am

A very nicely presented paper, marred by just one badly formed sentence in Section 1, para 3: “Synteny… is the property that features occur on the same chromosome.”

I found figure 2, particularly 2c, to be a tad confusing. The graphic was not conveying what the caption claimed.

MizBee uses ColorBrewer, originally developed by a geographer to assist in computer cartography. This is personally interesting to me because of my background in geographic information systems (GIS), and also because I have had at least perfunctory contact with ColorBrewer’s creator.

MizBee is tackling a difficult problem: answering questions at multiple scales in a single view window. A map application typically tries to present at most two scales, the detailed canvas and an overview map. More than that tends to lead to cognitive overload.

It is hard to fully understand the various points of the paper without actually seeing the software in action. Thankfully, I was able to download and play with the software, but a deep understanding of the mechanics of the software and the problems presented in the paper would be possible only after repeated use of the software and possibly several re-reads of the paper.

Nevertheless, a very interesting problem presented in a well-organized, albeit relatively dense, paper.

njvack January 20, 2010 at 10:18 pm

Agreed that Fig. 2 is confusing. It would have been clearer if, in (c), they’d ordered their counterexamples in the same order in the description and on the source genome. It’s especially confusing with a printed black & white copy.

Shuang January 20, 2010 at 4:53 pm

The visualization figures illustrate the relationships between chromosomes well. One of the first questions that came to mind when I read Figure 1 was whether it can also show the similarity between the chr1’s of two individuals, in addition to that between chr1 and the other chromosomes of the same individual. I think errors happen when DNA duplicates, and two homologous chromatids may exchange some parts because they are close together (deletions and inversions are also possible). It would also be useful for studying evolution to test the differences between the same regions in different individuals. Another issue is that in Figure 1, and elsewhere, the unit for chromosome length is Mb (megabases). Other units, such as the centimorgan, which estimates the probability of crossover between two loci on a chromosome, might also be useful.

Besides the genetics, I am also curious about the data structures and algorithms. The base pairs on each chromosome are grouped into defined blocks, which I guess are regions that are expressed or of interest to scientists. To compare similarity, is the whole genome scanned?

Figure 9 is definitely a good example. It shows the raw results and the procedure used to analyze the data.

This technical paper is well organized and detailed, with good design.

Jim Hill January 20, 2010 at 7:15 pm

I found the paper very interesting. It seems like they have a three-step process for developing visualization applications: requirements (the 14 questions), a taxonomy of visualization possibilities, and implementation. I would be very interested to understand the process they went through to develop these areas. From the paper it sounds like they occurred one after the other, but in my experience with software development, there’s usually a certain amount of iteration involved.

I wonder if perhaps the genome view takes up more than its fair share of the user interface. The outer ring seems to simply be a selector, but it takes up a lot of real estate. The block/feature view looks very small in Figure 1 and it seems like that would be where a lot of useful data would be. In fact, 5.3 starts out by saying this is the most detailed view of the data.

I also wonder if there are more interesting layouts that could have been used. The authors stated that they used previous synteny browsers as their inspirations; I wonder if there are more layouts to add to the taxonomy that simply haven’t been discovered yet. Having said that, the bundled b-splines looked really nice and helped to determine what was connected to what.

Kris Kosmatka January 20, 2010 at 10:03 pm

MizBee seems to be a very effective solution for this problem. I found the hierarchical viewing approach to be a nice way of addressing the immense scale range present in genomic data. It struck me that the problem specification itself is only examining one dimension of a much richer picture.

When generating the syntenies, each feature in the source genome is matched to one, and only one, sequence in the destination genome according to some similarity measure. This single choice may be artificial; in fact, there may be many similar matches in the destination genome. Indeed in the case of repeating sequences, which are extremely common, there may be very large numbers of matches that are equally similar.

Instead, for a given feature, we might identify a large set of potential matches with a probability distribution over them according to their similarity measures. This would expand the visualization problem from one-to-one sequence correspondences to one-to-many relations. One might imagine an alternate mode to the interface that would display links from each feature, or block, to all its potential matches with the color transparency weighted by the estimated probability. This would, however, lead to considerable visual clutter.
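
The weighting idea above can be sketched in a few lines of Python. This is only an illustration of the suggestion, not anything MizBee does: the function name, the block labels, and the `min_probability` cutoff (added as one crude way to limit clutter) are all hypothetical.

```python
def link_alphas(similarities, min_probability=0.0):
    """Turn similarity scores for one feature's candidate matches into link opacities.

    similarities    -- dict mapping a candidate match to a non-negative similarity score
    min_probability -- candidates below this estimated probability are not drawn
    Returns a dict mapping each drawn candidate to an alpha value in [0, 1].
    """
    total = sum(similarities.values())
    if total <= 0:
        raise ValueError("at least one candidate needs a positive similarity")
    alphas = {}
    for candidate, score in similarities.items():
        probability = score / total  # normalise so the weights sum to 1
        if probability >= min_probability:
            alphas[candidate] = probability
    return alphas
```

For example, scores of 6, 3, and 1 for three candidate blocks give opacities of 0.6, 0.3, and 0.1, and a cutoff of 0.2 hides the weakest link entirely.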

In another line of thought: I wondered about the method used for edge bundling. The visual impact of this strategy is clearly useful and important, but are the bundles justified? For example in Fig. 5b one of the bundles appears to span across a small gap, whereas the bundle adjacent to it does not. It would be more confidence inspiring to see some rigorous measure of statistical confidence in the cluster when, say, the bundle is hovered over.

njvack January 20, 2010 at 10:39 pm

This seems like an interesting and rather elegant solution to the “let’s compare genomes” problem. I don’t know enough about the genetic analysis workflow to really understand what the diagrams convey. Beyond knowing “these lines connect to those lines, so these areas are similar to those areas,” I was left wondering how I’d interpret that information. And some things, such as the histogram in the block view, left me quite confused.

The biggest surprise to me was how much this paper read like an advertisement. I’d expected more of the paper to be devoted to objectively comparing different solutions to some of the data analysis problems — perhaps early design iterations — rather than the more holistic finished product presented.

I do have one gripe with the paper, and probably MizBee itself: when printed out in monochrome, most of the colors are the same shade of grey. And there’s a lot of information conveyed mainly by color here.

But: it looks like a useful app, and it sure is pretty.

dalbers January 20, 2010 at 10:55 pm

This paper is rather impressive compared to similar works in that it very thoroughly considers all of the science, art, and practice of visualization. Most of the literature that I have come across in this domain tends to use the actual biological motivation of the visualization as more of a data set and doesn’t pay it much regard in the actual construction of the visualization. However, in this paper, the biology and aesthetic of the visualization constantly appear to influence the design, as evident in the use of corresponding blocks when determining control points and using annotations as navigational cues. Also, with regard to the case studies, Figure 9 provided a great concrete example of how the tool could potentially be useful.

Overall, this paper exhibits a strong focus on the task at hand and thorough explanations of the contributions the author was attempting to make. One thing that did bother me, however, was the sudden inclusion of the related works section. This may just be my opinion, but it did feel as if the authors knew how they intended to structure the paper and simply threw in the analysis of related works in a transitional part of the paper simply because they felt it a necessary element of any research paper.

lyalex January 20, 2010 at 11:21 pm

This IEEE Transactions paper is really good, both in the method it presents and the style in which it’s organized.
The paper first addresses the motivation for MizBee and some important features it looks into, and then makes a thorough study of a taxonomy of layouts for chromosome sets, which makes the following description and comparison of different design decisions and applications much clearer. Then it presents 14 questions from the biologists that need to be addressed. After that, to deal with the 14 questions, the authors give a detailed description of the MizBee interface, from the genome view through the chromosome view to the block view. Two case studies are presented as the evaluation of MizBee, followed by the conclusion. In my opinion, the highlights of the paper’s style are that it reviews the problem MizBee deals with well, extracts 14 questions from the seemingly ambiguous demands of users, and then uses a detailed taxonomy study to describe the design, which makes the paper concise and clear. The previous-work section and the case studies provide convincing evidence for the value of MizBee. The only thing I find a little awkward is that in most IEEE papers the previous-work section goes with the introduction, which is not the case here.

MizBee itself is indeed a very useful tool for genome conservation studies. Its outstanding performance has already been demonstrated by the authors in the paper and in the case studies. However, the implementation part is too concise for me to discuss further. In my opinion, there might still be some drawbacks in MizBee. Firstly, it seems that MizBee lacks a way to precisely locate an interesting block of base pairs; for example, a box to enter the base-pair number would be desirable. Secondly, another thing that may be worth integrating is a function to swap the source chromosomes and the destination chromosomes. Thirdly, the capacity and performance (i.e., what is the maximum size of the data sets that MizBee can handle? How fast can the view, filtering and zoom operations be performed given certain hardware?) are not presented in the paper. From the paper, we can see that it can handle a data set size of up to 10,000,000.

The next thing I find a little hard to understand is how to create the data files for MizBee. The instructions at Mizbee.org explain the data format, and it seems quite complex. Considering the overwhelming data sizes in molecular genetics, how can an end user create such a data file on their own? A parser might be helpful, but any given parser may also require a certain input format, so how to maintain a valid interface still seems an issue.

Kevin January 21, 2010 at 4:11 am

My understanding of genetics is limited, and so I don’t feel comfortable arguing if MizBee is a good solution in this problem space. The discussion of developing the tool, though, was problematic because they didn’t sufficiently “justify [their] design choices for spatial layout, color, and interaction in terms of known perceptual principles” (p. 2), though perhaps they do so better than other systems they examined. I also found their characterization of the users insufficient to evaluate their decisions.

I assume that their critique of “other synteny browsers” (p. 3) refers to the browsers they discuss in section 6. The known perceptual principles they base their design on include the “obvious and effective way” of lines and curves; alternatives are not discussed and no research cited. And if “less than one dozen colors are distinguishable when showing categorical data” (p. 4, though it appears we’ll discuss this more later), then why did they use eight instead of 10? Or 11? The color choice is also questionable since the selection they used from ColorBrewer (8-class qualitative Set1) is not explained. Looking at the tool, this selection appears “print friendly” but neither “colorblind safe” nor “photocopy-able”. Whether RGB and CMYK compatibility alone justifies the color selection is unclear. Similar issues abound in the paper.

The two case studies described do not address the users’ use of or familiarity with other synteny browsers; the second collaborator’s description of using “scatter plots and raw text analysis” (p. 7) suggests that he may have limited experience. Thus the praise assigned to MizBee, though deserved, may also apply to other tools. Making a case for MizBee’s superiority is therefore restricted to a theoretical discussion, and I am unable to evaluate the questions they used as being appropriate for general synteny browsing versus being a union of two distinct use cases with little overlap.
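
The “photocopy-able” concern can be checked roughly with a short Python sketch that converts each palette color to a grey level and finds the pair that would be hardest to tell apart in monochrome. The hex values are the 8-class Set1 palette as commonly published for ColorBrewer; whether MizBee renders exactly these values is an assumption, and so is the use of the Rec. 601 luma weights as a stand-in for what a printer or photocopier does.

```python
from itertools import combinations

# ColorBrewer 8-class qualitative Set1, as commonly published (assumed to match the tool).
SET1_8 = [
    "#e41a1c",  # red
    "#377eb8",  # blue
    "#4daf4a",  # green
    "#984ea3",  # purple
    "#ff7f00",  # orange
    "#ffff33",  # yellow
    "#a65628",  # brown
    "#f781bf",  # pink
]


def luma(hex_color):
    """Approximate grey level (0-255) of an '#rrggbb' color using Rec. 601 weights."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.299 * r + 0.587 * g + 0.114 * b


def closest_grey_pair(colors):
    """Return the two colors whose grey levels are most similar."""
    return min(combinations(colors, 2),
               key=lambda pair: abs(luma(pair[0]) - luma(pair[1])))
```

Under these assumptions, blue and purple land within about two grey levels of each other, and brown is not far behind, which is consistent with the complaint elsewhere in this thread that a monochrome printout loses much of the color coding.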

The tool itself looks very attractive; I’d only like to see more justification for its design, as this is one of their stated goals, and to learn more about how it compares to other synteny browsers in actual practice.

Adrian Mayorga January 21, 2010 at 7:10 am

I like the way that the authors present MizBee. There is a concise and understandable explanation of both the biological background and the task that the tool is supposed to perform. While I do not know if their table of questions is thorough, they clearly use this to guide the design and functionality of the tool. There is also some discussion about the possible layout choices.

In general I feel the weakest point of this paper is the lack of comparison with other systems. For one, they fail to compare other systems in the specific cases of their contributors. It is also not very clear how much more functionality (in terms of the questions answered) MizBee has over other tools. Lastly, there is very little discussion about the use of the tool in the general sense. While I was reasonably convinced that MizBee was useful to the two users that the authors mention, I have no notion of how it would operate in the general case, especially against other tools already out there.

Nakho Kim January 21, 2010 at 7:56 am

Reading this paper, I found the pattern matching and linking process interesting because, being a journalism major, my personal interests are in using data visualization to explore relationship patterns between social agents. But I think what helped me most was the 14-question approach (Table 1), which shows how the various complex research interests are boiled down to just two distinct spatial dimensions (scale and relationship).

On the other hand, I couldn’t really understand from the paper alone what specific algorithms are being used to remove the “noise” and cluster the edges, especially in the second example case. Come to think of it, how much of this kind of detail is usually included in visualisation tool introduction papers like this?

jeeyoung January 21, 2010 at 8:19 am

I like how the authors put the questions together and show at which scale and relationship each question can be addressed. This information, Table 1, seems useful both for users, to check which scale they should look at or what they may have overlooked, and for others designing a similar browser.

In the genome view, edge bundling lets us see minor block connections, as Figure 5 shows. I like that they make the user choose one source chromosome, which avoids the problems that arise when the user chooses all the chromosomes. I am curious how the genome view would change if the ordering of the destination chromosomes could change.

I like the layering of annotation tracks in the chromosome view. The browser can use any annotation tracks and it seems easy to spot meaningful annotations in the chromosome view.

It would be nice if the browser could be expanded to compare more than two species, but that may not be a straightforward extension.

ChamanSingh January 21, 2010 at 8:48 am

This wonderfully lucid paper, with nice illustrations and an end-user perspective, summarizes some of the challenges and new opportunities in domain-specific visualization software. The most effective visualizations are simple and must provide insight into a pattern or answer some specific questions that are almost perfunctory in nature.

Although this paper is very specific to genomics which is outside my domain of knowledge, there are however some general rules that are clearly mentioned in the paper which could be useful for any other domain specific software.

1. Characterization of the domain: List the higher-level questions that the software must answer.

2. Taxonomy of the design space:

3. Non-scalability of colors:

This paper also emphasizes “Edge Bundling” to remove the visual clutter that often occurs with large datasets. Although the paper mentions the use of splines, I am sure that the real implementation must be difficult.

Finally, this paper nicely reinforces the general thinking that no matter how hard the implementors think about and design the software, it is the end users who decide how good the visualization software is, and their decisions are based on the answers they are looking for from their incomprehensible dataset.

The only concern that I have with this paper and the MizBee software is that it probably does a great job of answering the 14 questions, but what if the user slightly changes a question or poses a new one? In that sense, how is this software “General” as claimed by the authors?

Overall this paper is very interesting to read and thought provoking.

watkins January 21, 2010 at 9:03 am

After Table 2, I wondered how the designers created the layout taxonomies, and whether they covered all the possibilities or just the ones the researchers were considering for the purposes of this project. Then in Previous Work, when describing previous software with similar functionality to MizBee, it became clear that most of these layouts had been tried before. The previous programs had shortcomings, but that seemed to be due mostly to a lack of different levels of view and the ability to move between them.

I’m curious as to how many people would find this tool useful in their work, as it seems to be very specialized. I would expect the number to be small, but biology is a large field, and there were a lot of existing tools, so maybe they were trying to meet a larger demand than is apparent to me.

I’m also interested in the evaluation methods (I saw a conversation had been started about this previously in the comments). If not many people use this tool, maybe anecdotal evaluation is sufficient for testing MizBee. I am curious, though, as to whether they tested it using a different group of people than the ones they interviewed to determine the 14 important questions that motivated MizBee’s design.

sarahc January 21, 2010 at 9:46 am

For a paper in a specialized field that is a great departure from my own area of study, I found it easy enough to follow due to its succinct introductory sections clearly explaining the functions and characteristics of genomes, which provided sufficient background knowledge for understanding the remainder of the paper.

The rest of the paper however was also largely descriptive, such as in its explanation of MizBee, and it would have benefitted from further justification for their design choices. On page 3, the authors indicate that their generalized taxonomy is a result of their critique of “the design choices taken in other synteny browsers presented in the literature.” What, in particular, were such critiques? And how does their system differ or compare to these other browsers in terms of its design choices? Why were these decisions made?

Many others have mentioned color use in their comments, and I, too, had a similar initial response to the visualizations. Especially in Fig. 9c, the red and green in the block would fail to provide useful information to a biologist who is red-green colorblind.

turetsky January 24, 2010 at 8:51 pm

Note: Joining the class late, so doing this late.

I found the paper readable and those genetics concepts that I didn’t understand from my one high school biology course were explained well enough for me to follow. I would have liked to see more proof that MizBee is an improvement over other genome visualization software. To be fair, though, I imagine that would have to be conducted by biologists and be measured purely by their subjective preferences.

While I understand one of their methods of using a circular structure for simplifying some of the clutter and providing more information between source and destination, it just doesn’t seem as intuitive as the block view.
