Design Challenge 3: A Real Problem

by Mike Gleicher on April 7, 2017

Update: 4/18/2017 – Handin Policy and Milestone requirements added.
Update: 4/18/2017 – Working with a partner policy posted.
Update: 5/2/2017 – Demo policy changed

Update: 5/2/2017 – See the other “Endgame” posting for more info on handins/demos. (the post will be made today)

Schedule

  • April 10, Challenge Announced!
  • April 17 (or before) – initial data sets provided
  • April 19 – in class discussion
  • Tuesday, April 25 – Milestone 1 (status update)
  • Tuesday, May 2 – Milestone 2 (rough draft)
  • Thursday, May 4 – “Official” deadline
  • Tuesday, May 9 – Unofficial deadline
  • Wednesday, May 10 – Last day to turn things in

Officially, the class ends on May 4th. However, I can give everyone an extension to this deadline. The actual ending is hard to figure out since this class wasn’t assigned a final exam time. So, I am arbitrarily picking Tuesday, May 9th as the goal (you get a “no-cost extension” if you turn things in by then). Wednesday, May 10th is a hard deadline – I need to get grading done so I can get my grades in before graduation, which is technically not a requirement but is helpful for those “last-semester” students.

Overview

As you are aware from the mid-semester feedback, I now have the ability to collect quantitative data about the discussions in this class. While the simple statistics from the discussions are not a great proxy for the actual content (e.g. a post can be short and insightful, or long-winded and irrelevant), they may be better than nothing.

Your challenge: help me look at this data!

I might be able to get more data. I might be able to derive some more features of the data (e.g., do some language understanding to count interesting words or something). But for now, consider only the simple data. (if you want to suggest how your design scales to better data, let me know)

In some ways, the data is simple: it’s a 2D array (students are rows, assignments are columns).

In some ways, it’s not so simple: for every entry of the table (student, assignment), there is a list of postings. For each posting, we know the length in words and the number of images/links. For every assignment, we know if the initial posting was on time, and we have the manually assigned score. (here the “ontime” is given as “number of hours after the deadline” so a negative number is ontime).

Name    ID  Assignment 1                        Assignment 2
Alice   1   50,-24,[(1000,1),(200,0),(150,1)]   50,-23,[(1200,2),(125,1)]
Bob     2   25,25,[(400,1)]                     40,-1,[(1000,1),(100,0),(120,0),(80,0),(90,0)]
Carrol  3   45,2,[(2000,0)]                     50,-25,[(1500,1)]
Dave    4   45,26,[(2000,0),(100,0),(100,0)]    50,4,[(1500,1),(200,1),(150,1)]

Read this as “for Assignment 1, Alice got a score of 50, turned in her assignment 24 hours before the deadline, and made 3 posts (the first has length 1000 characters and had one image/link)”.

The tricky thing in the data is that each table entry has a list, and that list is actually of pairs of numbers.
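To make the nesting concrete, here is a minimal Python sketch that parses one cell written the way the example table shows it. It assumes a cell is exactly "score, hours after the deadline, list of (length, images/links) pairs". The names `Cell` and `parse_cell` are made up for this illustration, and the real data files may be laid out differently.

```python
import ast
from typing import NamedTuple


class Cell(NamedTuple):
    score: int        # manually assigned score
    hours_late: int   # hours after the deadline; negative means on time
    posts: list       # list of (length, images_or_links) pairs


def parse_cell(text: str) -> Cell:
    """Parse one table entry such as '50,-24,[(1000,1),(200,0)]'."""
    # Wrapping the entry in parentheses turns it into a tuple literal,
    # which ast.literal_eval can read without executing any code.
    score, hours_late, posts = ast.literal_eval(f"({text})")
    return Cell(score, hours_late, [tuple(p) for p in posts])


alice_a1 = parse_cell("50,-24,[(1000,1),(200,0),(150,1)]")
on_time = alice_a1.hours_late < 0           # True: 24 hours early
total_length = sum(length for length, _ in alice_a1.posts)
```

Each cell therefore carries two scalars plus a variable-length list, which is why a plain heatmap of the table cannot show everything at once.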

I can tell you from experience, looking at a big table of this (49 students in class * (9 discussions + 9 seek and find discussions)) is hard to make much sense of. Last year’s data (more students, more assignments) was even harder. (and my table didn’t even have the ontime information)

How can you apply your visualization experience to make this better?

Of course, your first question should be “What’s the task?” – and of course, you’ll need to figure that out. But generally, the task is to help the instructor use this information to assess the class (both in terms of grading, but also to see if there are good/bad assignments). You may consider secondary tasks (like helping a student figure out where they stand), however, the primary task is your main responsibility.

The next question is “what’s the data” – the table is an example. I’ll give you more (fake data, I can’t give you class data – but if you write a program, we can run it on the real data). You need to present data that comes in this form.

And of course, since this is a class assignment, there’s the question “what do I need to do?”

In the ideal world, I would ask students to build interactive systems that could load in a data file with this kind of data in it and allow a user to explore it. But, since not everyone is a programmer, and even if you are a programmer 3 weeks is not a lot of time to build an interactive system, we have lower expectations – and give you a choice in how to meet them.

Kinds of Solutions

I distinguish between three different kinds of solutions:

  1. a sketch – this is a “description” of a design (it may include a visualization using fake data). You can create some “mockup” or “prototype” (which doesn’t actually show the real data), and use text or diagrams or storyboards to convey what the real thing would look like (especially with interaction/animation/ …)
  2. a visualization – this is an actual visualization using real data (one of the provided sets)
  3. a tool – this is a program that can take in data sets and produce visualizations. For this assignment, your tool should produce visualizations (#2) for all of the example data sets. But the real test in this category: will it work on a data set that you don’t have for testing?

Given the range of students in the class, your final turn-in can be from any one of those categories. Of course, the expectations are different: to do a “great” sketch, it really has to be great; if you submit a tool, doing something decent/reasonable on the test data will be great (since building something on such short order is impressive).

You can pick which category your submission is in. Category 3 (tools) requires programming, but you might do some programming for Category 2 or Category 1 (e.g. write something that programmatically generates a visualization).

A tool must be able to read in different data files. It may either allow for interaction, or make a static picture. For example, you might make a python program that reads in a data file and writes out an image file, or a (static) web page with a picture on it. (that’s what I would do, if you’re wondering).
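As a rough sketch of that "read a data file, write a picture" shape, the following Python reads JSON and writes an SVG using plain string formatting. The JSON layout used here (a list of students, each with a `name` and a list of `assignments` that hold `posts`) is an assumption for illustration only; the actual format is documented by the data generator in the repository. The function names and the choice of encoding (darker square = more total post length) are likewise hypothetical.

```python
import json
from xml.sax.saxutils import escape

CELL = 20     # pixel size of one (student, assignment) square
LABEL = 120   # pixels reserved on the left for student names


def render_svg(students):
    """Return an SVG string: one row per student, one square per assignment."""
    # Total post length per cell; posts are (length, images_or_links) pairs.
    totals = [[sum(length for length, _ in a["posts"]) for a in s["assignments"]]
              for s in students]
    biggest = max((t for row in totals for t in row), default=0) or 1
    n_cols = max((len(row) for row in totals), default=0)
    width, height = LABEL + n_cols * CELL, len(students) * CELL
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for r, (student, row) in enumerate(zip(students, totals)):
        y = r * CELL
        parts.append(f'<text x="0" y="{y + 15}" font-size="12">'
                     f'{escape(student["name"])}</text>')
        for c, total in enumerate(row):
            parts.append(f'<rect x="{LABEL + c * CELL}" y="{y}" '
                         f'width="{CELL - 2}" height="{CELL - 2}" '
                         f'fill="steelblue" fill-opacity="{total / biggest:.2f}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


def main(in_path, out_path):
    """Read a JSON data file and write the picture next to it."""
    with open(in_path) as f:
        students = json.load(f)
    with open(out_path, "w") as f:
        f.write(render_svg(students))
```

Calling `main("data.json", "overview.svg")` (both file names are placeholders) would produce a static picture. Because nothing in the code depends on a particular data set, the same program can be run on a file it was never tested with.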

The expectations vary from category to category. More details forthcoming (we’ll give a rubric), but…

If you submit a sketch, there is pressure to come up with a really good design and rationale. You need to provide a convincing explanation that your design would scale to the size of this class (and beyond). You’ll need to explain how your tool would help show some of the “problems” described below. You’ll need to provide some self-critique as to how well things work, and how well they will scale (to bigger classes and more assignments). You’ll need to show that you’ve thought through a range of tasks. You should probably describe how interaction would work.

If you submit a tool, we understand that just getting the basics in place is a big task, and that you don’t have a lot of time to learn about vis and graphics programming, and you might not have a toolkit at your fingertips. So, your designs may not be as fancy and well-thought out as the things we’ll see in sketches. I want you to think through the design before you implement (I recommend submitting a sketch of what you’re trying to implement). If you get something that can read in the test file and show it in a reasonable way, that will get you a good grade.

As an added bonus, for at least one of the discussion assignments, we will have lower expectations if you are planning to turn in a tool submission.

Sample Tasks and Problems

This problem should be familiar enough to you (since you have experience with at least one side of it) that you can think of some of the tasks.

Imagine you are the class instructor. You want to get a sense of how students are doing on these quantitative measures. Maybe it’s the end of the semester and you’re giving grades, maybe you’re double checking your manual scoring system to see if there are anomalies. Maybe you’re planning the next class and trying to decide how to improve the assignments.

The main task you must consider is giving the instructor an overview of the semester’s worth of data.

There are many more specific tasks that can be part of this. Assess individual students, figure out the “baseline” for the class, identify problematic assignments, find anomalies that might be indicative of problems, …

You can also imagine tasks that a student might want to do with this data (e.g. see where they stand in relation to their peers). In such a scenario, the visualization would probably show only a single student, and provide some kind of summary of the whole class (since a student shouldn’t be shown everyone else’s data except in summary)

NOTE: if you want to provide a design for this “student view” that’s great – but you also must provide a solution for the instructor view (get a picture of the whole class)
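One possible way to prepare data for such a student view is sketched below in Python: it pairs a single student's score on each assignment with the class median, so no other individual's numbers are exposed. The function name `student_view` and the choice of the median as the class summary are assumptions for illustration; the scores are taken from the example table.

```python
from statistics import median


def student_view(scores, student):
    """Return (own_score, class_median) for each assignment.

    scores maps each student name to a list of scores, one per assignment.
    Only the chosen student's values and a class-wide summary are returned.
    """
    own = scores[student]
    return [(own[i], median(row[i] for row in scores.values()))
            for i in range(len(own))]


scores = {
    "Alice": [50, 50],
    "Bob": [25, 40],
    "Carrol": [45, 50],
    "Dave": [45, 50],
}
bob = student_view(scores, "Bob")
```

Here Bob's 25 on Assignment 1 would be shown against a class median of 45, and his 40 on Assignment 2 against a median of 50.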

Some things that might be detectable in the data (and I will hopefully generate synthetic data that provides examples of some, if not all, of these):

  • A student who does consistently well / consistently poorly
  • An assignment that seemed to be a dud (didn’t create interesting responses or discussion)
  • A student whose responses are “fishy” and might be a sign of trying to beat the system since they know we’re applying quantitative metrics. (I’m not accusing anyone in this class of trying to cheat the quantitative metrics in this class. But you could imagine if we were teaching a big undergrad class or something)
  • A good student who got sick and had a while where their assignments were bad
  • A student who does very good work occasionally, but is inconsistent
  • A student who consistently does good work
  • Signs that the data gathering process was broken and had some noise
  • A student who got mid semester feedback and started doing better work
  • An overall assessment as to how well the quantitative metrics match up with the subjective ones.
  • Identification of anomalies where the subjective scores and the objective scores don’t line up. (e.g., a student writes long answers but gets low scores or is very terse and gets high scores).
  • and you can think of others…
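The last two items (comparing the objective and subjective measures) can be explored numerically before designing a picture. The Python sketch below standardizes total post length and score across all table entries and flags entries where the two disagree strongly. The function names, the use of z-scores, and the threshold of 1.2 are all arbitrary choices for illustration, not a recommended method; the entries are the eight cells of the example table.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def find_mismatches(entries, threshold):
    """entries: list of (student, assignment, score, total_length).

    Returns (student, assignment, gap) where gap = z(length) - z(score).
    A large positive gap means long posts but a low score; a large
    negative gap means terse posts but a high score.
    """
    z_len = zscores([e[3] for e in entries])
    z_score = zscores([e[2] for e in entries])
    return [(e[0], e[1], round(zl - zs, 2))
            for e, zl, zs in zip(entries, z_len, z_score)
            if abs(zl - zs) > threshold]


entries = [
    ("Alice", "Assignment 1", 50, 1350), ("Alice", "Assignment 2", 50, 1325),
    ("Bob", "Assignment 1", 25, 400), ("Bob", "Assignment 2", 40, 1390),
    ("Carrol", "Assignment 1", 45, 2000), ("Carrol", "Assignment 2", 50, 1500),
    ("Dave", "Assignment 1", 45, 2200), ("Dave", "Assignment 2", 50, 1850),
]
flagged = find_mismatches(entries, threshold=1.2)
```

On this tiny table only Dave's Assignment 1 is flagged (the longest posts, but not the top score). A visualization would need to make the same kind of mismatch visible without the viewer running any code.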

As part of the assignment you will need to provide an analysis of tasks (what tasks is your design meant to address?), as well as to show some specific examples of how the design would allow for some of these specific cases to be identified.

Data

For privacy reasons, I cannot provide you with the actual data from the class. However, I have it and will test your program on it (if you make a Category 3 submission). I also hope to have data from the 2015 class (which was bigger and had even more diversity in discussion responses).

I will generate synthetic data for people to use. Expect to get:

  1. A small data set for testing
  2. [at least one, maybe a few] “full size” realistic data sets (same size as the class)
  3. [at least one, maybe a few] “full size” realistic data set with some interesting things “planted” – this way you can test to see if your design works to help an instructor. Some of these, I’ll tell you what to look for. Some of these, you can puzzle over.
  4. Data sets at various scales – so you can see how your design will scale as the class gets bigger, or we have more assignments.

The data is provided in JSON format. A simple random data generator is available in the GitHub Repo, as are some sample data sets generated with it. The sample data generator documents the JSON format.

The data is also provided in CSV format – it is meant to be a “human readable” format. If you are doing a tool submission, use the JSON format (that is a recommendation, not a requirement).

Simple data examples on GitHub.

Turning in Programs

Your documentation should explain what your design was trying to do (even if it doesn’t do it perfectly).

In case I cannot run your program, or it doesn’t work on the test data, your hand in must provide example outputs on the sample data sets.

I have no idea how I will be able to run your program, especially if it uses tools that I don’t normally use. We’ll probably need to have a “demo session” where you can come and show off your program. It is more fun to do this with everyone (so people can see what each other has done). But that might be tiresome with such a large class.

For this reason, we will have a questionnaire about hand-in requirements so we can plan for the finale.

If you turn in a program, you should turn in:

  1. A document describing your task analysis that clearly expresses your goals (what tasks are you trying to support) and describes your design.
  2. Example outputs (if your tool is interactive, please make screen shots or — even better — screencast videos) for some of the more challenging data sets (specific requirements to follow – but probably 2-3 of the data sets with more than 40 students).
  3. A document explaining how you can see some of the “interesting things” in outputs from the data (the examples you turned in). For example, you might say “you can see a student who is consistently bad since their entire row will be blue” and so on. (this connects with #2)
  4. Description of how to run / use the tool. Be sure to include all requirements (like what version of the language you need, what libraries, …). Note: we may not try to run it ourselves.
  5. All the source code and assets required to run it.

We will invite you to turn in a draft so we can check to see if things are on the right page with expectations and give you a little “pre-final submission feedback.” Also, since the demo will be interactive, if things don’t work perfectly you can tweak/adjust/explain as necessary.

In a sense, your assignment will be to turn in everything for a “Visualization” submission, (including a visualization and the discussion of it) – and a program to make more.

At the end, we will expect a ZIP file on Canvas.

Note: we might require a demo and/or ask people to make video demonstrations.

(see “Final Turn In for Assignments” below)

Turning in a Visualization Submission

If you are just turning in a visualization…

  1. A document describing your task analysis that clearly expresses your goals (what tasks are you trying to support) and describes your design.
  2. The visualization – including specifying what data set it was done on (you might want to turn in multiple visualizations for multiple data sets)
  3. An explanation of how you can see interesting things in the data in your visualization. You may also want to say the things that you are not seeing (but you could see with the design).
  4. A description of how you made the visualization.
  5. A discussion of how your design would work on other data sets (if you had a tool that automated the creation of it). Consider how the design would scale (to more students, more data, other information about the assignments, …)

Again, turn in a ZIP file.

If you want to do well at this, you may consider adding some of the elements of a sketch submission (like a rationale for your design, discussion of iteration, and a self-critique).

(see “Final Turn In for Assignments” below)

Turning in a Sketch Submission

Note that the expectations are higher here for the parts that you have to do. Since you aren’t doing the data operation work, you need to make up for it by doing more design/critique work.

  1. A document describing your task analysis that clearly expresses your goals (what tasks are you trying to support) and describes your design. Be sure to give a rationale for your design. You may want to describe some alternatives you considered so you can help us understand why this design is better than others.
  2. Your design – this probably includes some sketches, as well as some description (what would it look like for different data, if there are interactions, describe them …)
  3. An explanation of how different kinds of phenomena that the viewer might be looking for would show up in your design (again, maybe more sketches)
  4. A self-critique

(see “Final Turn In for Assignments” below)

Some thoughts on the design

The nature of the data doesn’t obviously fit into a standard design. Maybe you can be more clever and come up with a way to show the data as an ensemble of standard designs. Most of my thoughts turn out to be unlike standard designs (which is why sketching / programming is necessary).

It could be that a single chart isn’t the right strategy. My initial design was 2 separate tables (rows=student, columns=assignment, cells=[length of longest, # of postings]), which didn’t convey all of the information, but gave me a starting point. It was inconvenient to go back and forth between tables.
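For reference, those two summary tables are easy to derive from the per-cell post lists. The Python sketch below does so, and also shows one hypothetical way to avoid flipping between tables by writing both numbers in a single cell as "longest/count". The function names and the combined text layout are assumptions for illustration; the data is the first two rows of the example table.

```python
# Posts per (student, assignment), copied from the example table.
posts = {
    "Alice": [[(1000, 1), (200, 0), (150, 1)], [(1200, 2), (125, 1)]],
    "Bob": [[(400, 1)], [(1000, 1), (100, 0), (120, 0), (80, 0), (90, 0)]],
}


def summary_tables(posts):
    """Return two tables: longest post length and number of postings."""
    longest = {name: [max((length for length, _ in cell), default=0)
                      for cell in cells]
               for name, cells in posts.items()}
    counts = {name: [len(cell) for cell in cells]
              for name, cells in posts.items()}
    return longest, counts


def combined_rows(posts):
    """One text row per student, each cell written as 'longest/count'."""
    longest, counts = summary_tables(posts)
    return [name + "  " + "  ".join(f"{l}/{c}" for l, c
                                     in zip(longest[name], counts[name]))
            for name in posts]


longest, counts = summary_tables(posts)
rows = combined_rows(posts)
```

Either form still drops most of the data (the other post lengths, the image/link counts, the scores and the on-time information), which is the gap a better design needs to fill.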

Really think about the tasks – what do you want to do at the scale of the whole class.

How will this be graded?

The design challenge – no matter which option you choose – will be graded on an A-F scale.

Given the short amount of time we have to grade submissions, I am not sure how we’re going to do it.

Some things we will look for:

  1. Have you identified good tasks?
  2. Have you thought through a design that addresses the tasks?
  3. Have you communicated your design well?
  4. Have you demonstrated that your design can be successful on the tasks?
  5. How creative is your design? Does it seem like something that would be helpful?
  6. Does your implementation actually produce the designs given valid input data?
  7. Are the details of the design well-chosen and making use of the principles discussed in class?

Thoughts on Tools

I want you to use whatever tools you want. If you’re brave, you can use this as an excuse to learn Javascript (so you can learn D3), but you might want to use tools you are familiar with.

As I mentioned in class, Processing is a Java variant with easy graphics, a JSON reader, and is a simple way to prototype interactive visualizations.

I did my experiments in Python (and I will share my code). I used the svgwrite library to make it easier to write SVG – but it’s unclear if that was actually easier than just doing it with string I/O.

The Example Code and Tools

Note:  I will make more stuff available.

Things are available in a GitHub repository: https://github.com/uwgraphics/cs765-dc3-data-and-code

There is Python code to make a simple visualization (reads in JSON and writes out an SVG). An example of its output is in the “SimpleData” subdirectory. It’s really simple. It took me longer to set up the GitHub repo or write about it than to write the code. It can give you a starting point.

I have made a random data generator. It used a list of silly names generated by a web based random name generator (so yes, they are silly). Right now, the random data is boring – I will improve the random generator to make it generate more interesting data.

To get a handle on the file formats, here are some really simple files. Test your programs on these. You cannot turn in visualizations of these if you are doing a “visualization submission” – but you can use them to practice.

Working with a partner

You may work with a partner. If you wish to work with a partner, you must tell us in the initial check in (Tuesday April 25). If you work with a partner, the following rules apply:

  1. Both partners get the same grade for the assignment.
  2. Both partners must turn in something using the turn in system – one partner should turn in a note saying who their partner is and where we should find the assignment.
  3. The expectations for pairs are higher than for individuals, but not significantly so.
  4. Pairs may not do “sketch” solutions.
  5. If pairs choose to do a “visualization” (as opposed to a tool) solution, they must turn in visualizations for 2 different data sets (using the same design). (see the 5 parts of a Visualization submission – you need to have 2 of part 2)

The Milestones

Note: we will “grade” the milestones as check/no check. We will penalize your project grade if you do not turn them in on time.

  • Milestone 1 – Tuesday, April 25th – getting started. For this milestone, you need to leave a comment in Canvas (just use the type in box) that says the following:
    • whether you are working with a partner, and who the partner is (note: if you are working in a pair, both partners need to do the milestones)
    • which type of solution you plan to submit (sketch, visualization, or tool). Note that this is not “binding” – you can change your mind – but give us your best guess.
    • if you plan to do a tool, what environment you plan to do it in (e.g., Javascript with D3, or Processing). please tell us your level of familiarity with the tool (for example “I’m good with Javascript, but am just learning D3” – this will help us try to guide you if you are being too ambitious). Note that this is not “binding” – you can change your mind – but give us your best guess.
  • Milestone 2 – Tuesday, May 2nd – confirming that you’ve made progress. You must leave a comment in Canvas (just use the comment in the type-in box) and tell us the following:
    • confirm the type of submission you are making – at this point, your choice is binding (or, if you want to change you must inform the course staff). We need to know this to plan grading
    • if you are doing a tool assignment, let us know the platform. (this is important to help us plan for looking at assignments). tell us what we’ll need to run it ourselves (e.g. “Python 3.6 with libraries X,Y and Z” or “it will be a web page that runs in chrome and can be hosted by any web server”)
    • give a brief description of the status of your project. (a few sentences). we want to make sure everyone has at least started.
    • confirm that you’ve at least looked at some of the data (unless you’re doing a sketch submission)

Final Turn In for Assignments

For Sketch Solutions and Visualization Solutions: Please turn in either: a single PDF that has all the listed parts (preferred), or a ZIP file with each part separate on the Canvas assignment.

You may turn your assignment in any time before noon on Tuesday, May 9th without penalty. (Note: unusual deadline of noon! not the usual midnight). Assignments turned in after noon on Tuesday may be penalized depending on how late they are. Assignments turned in after Wednesday, May 10th (e.g. midnight) will not be graded.

If you turn things in before noon on Monday May 8th, we will check your assignment. This will give us a chance to ask questions or possibly make suggestions (so you can resolve issues to improve things before real grading). We may give preferential grading to these early assignments.

For Tool Solutions: This is tricky since we may or may not be able to run your program without you. We may need to have you give us a demo. Or we might be able to tell enough by looking at what you turn in (e.g. sample outputs and code).

We will hold a “demo session” on Tuesday afternoon, May 9th starting at 2pm, and going until we’ve seen everyone’s program. Location to be determined. Unfortunately, we cannot schedule slots – so we might just have people go in random order and show things in front of everyone. We may try to figure out some scheme so that everyone doesn’t have to be there the whole time.

We will schedule demos on Tuesday/Wednesday afternoon so that everyone doesn’t have to sit around for 6+ hours.

If you are going to demo, you must turn in your assignment before the demo session. If things come up at the demo session, we may allow you to change your assignment.

If you turn in your assignment before noon on Monday, May 8th, we will check it. If we are able to run it ourselves, we will let you know (so the demo is optional). If we have questions or feedback, we will give you a chance to fix things.

If you do not participate in the demo session, you may turn in your assignment any time on Tuesday, May 9th, without a lateness penalty. Assignments turned in after Wednesday, May 10th (e.g. midnight) will not be graded. If you do not choose to demo (or submit early so we can confirm that you do not need to), you are at your own risk.

If you turn in a video showing off your tool operating on multiple data sets, you probably won’t need a demo.

 
