DC Example Data

Another Simple Example

February 27, 2010

in Uncategorized

Here is a simple example of a trajectory (an evolution of the group).

In each case, the nodes start out in 2 groups and reconnect into 3 groups. There are 3 intermediate steps between the start and the end.

I have generated this data for two different network sizes (12 and 18).

To make things easier, I have not permuted the groups in the 3-group state (so, for the 12-node network, the groups are [0,1,2,3], [4,5,6,7], [8,9,10,11]). I have also included cases where everything is permuted.
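To make the idea of a trajectory concrete, here is a minimal sketch of one way such a sequence could be built. It is not the generator used for these files. It assumes that the 2-group start is an even split of the nodes, that each state is a 0/1 "same group" matrix, and that intermediate steps are linear mixes of the start and end matrices. The names group_matrix, blend, and trajectory are made up for this illustration.

```python
def group_matrix(groups, n):
    """Return an n x n matrix with 1.0 where two distinct nodes share a group."""
    m = [[0.0] * n for _ in range(n)]
    for members in groups:
        for i in members:
            for j in members:
                if i != j:
                    m[i][j] = 1.0
    return m


def blend(start, end, weight_end):
    """Mix two matrices entry by entry: (1 - w) * start + w * end."""
    return [
        [(1 - weight_end) * s + weight_end * e for s, e in zip(row_s, row_e)]
        for row_s, row_e in zip(start, end)
    ]


def trajectory(start, end, intermediate_steps=3):
    """Return (pct_start, pct_end, matrix) for both endpoints and the steps between."""
    total = intermediate_steps + 1
    out = []
    for k in range(total + 1):
        w = k / total
        out.append((round(100 * (1 - w)), round(100 * w), blend(start, end, w)))
    return out


n = 12
# ASSUMPTION: the 2-group start is an even split; the post does not say.
two_groups = [list(range(0, 6)), list(range(6, 12))]
# The unpermuted 3-group end state given in the post.
three_groups = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

steps = trajectory(group_matrix(two_groups, n), group_matrix(three_groups, n))
for pct_start, pct_end, _matrix in steps:
    print(f"{pct_start:03d}% start, {pct_end:03d}% end")
```

With 3 intermediate steps this gives five states in total, from 100% start to 100% end.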

The file naming convention is, for example, bg18pn_000_100.csv, which means:

  • 18 = 18 node network
  • pn = start (2 groups) is permuted, end (3 groups) is not permuted (pp = both start and end are permuted)
  • 000 = 0% of the start
  • 100 = 100% of the end
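If you want to pick these names apart in a script, something like the following sketch might help. It assumes the name always looks like the example above: bg, the node count, two letters for start and end, and two three-digit percentages. It also assumes that n in either position means "not permuted"; only pn and pp are described above. BgName and parse_bg_name are hypothetical names used for illustration.

```python
import re
from typing import NamedTuple


class BgName(NamedTuple):
    nodes: int
    start_permuted: bool
    end_permuted: bool
    pct_start: int
    pct_end: int


# ASSUMPTION: every file follows the shape of bg18pn_000_100.csv.
_BG_PATTERN = re.compile(r"^bg(\d+)([pn])([pn])_(\d{3})_(\d{3})\.csv$")


def parse_bg_name(filename: str) -> BgName:
    """Split a bg file name into its documented parts."""
    match = _BG_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    nodes, start_flag, end_flag, pct_start, pct_end = match.groups()
    return BgName(
        nodes=int(nodes),
        start_permuted=(start_flag == "p"),
        end_permuted=(end_flag == "p"),
        pct_start=int(pct_start),
        pct_end=int(pct_end),
    )


info = parse_bg_name("bg18pn_000_100.csv")
print(info)
```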

Some examples:

Big ZIP of n=12,18,24, with and without permuted ends: bg.zip

Here is another set of real data. This data comes from a single venue, and represents 4 “stories” that the students wrote over the semester (stories 1, 2, 4, and 10). Is there a progression over time?

Some Real Data

February 23, 2010

in Uncategorized

Here is some real data for the design challenge.

These are 8×8 matrices (for an 8-concept epistemic frame). There were 3 sets of students (practicum, game, and course). Each of these matrices represents the average over all students in the set (known as a venue) and all of the “stories” in the venue.

If you want, the labels for the 8 nodes (in order) are:

S_investigating
S_detailed_description
K_story
K_reporting
K_reader
V_informing_the_public
V_engaging_reader_story
E_rich_details
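
To attach these labels to a matrix, a small sketch like the one below may be useful. It assumes the matrix has already been read in as 8 rows of 8 numbers in the label order above, and that it is symmetric, so only the entries above the diagonal are ranked. The demo matrix is a placeholder made up to show the calls, not real data, and connection and strongest_pairs are hypothetical helper names.

```python
LABELS = [
    "S_investigating",
    "S_detailed_description",
    "K_story",
    "K_reporting",
    "K_reader",
    "V_informing_the_public",
    "V_engaging_reader_story",
    "E_rich_details",
]
INDEX = {label: i for i, label in enumerate(LABELS)}


def connection(matrix, a, b):
    """Look up the entry for a pair of concepts by label."""
    return matrix[INDEX[a]][INDEX[b]]


def strongest_pairs(matrix, k=3):
    """Return the k largest entries above the diagonal as (label, label, value)."""
    pairs = [
        (LABELS[i], LABELS[j], matrix[i][j])
        for i in range(len(LABELS))
        for j in range(i + 1, len(LABELS))
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:k]


# PLACEHOLDER values, not real data: all zeros except one made-up entry.
demo = [[0.0] * 8 for _ in range(8)]
demo[INDEX["K_story"]][INDEX["E_rich_details"]] = 0.5

print(connection(demo, "K_story", "E_rich_details"))
print(strongest_pairs(demo, k=1))
```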

In order to keep the “assignments” category clean, I am putting posts about example data for the design challenge under a tag: DC Example Data. If you go to that link, you’ll see all the posts about example data. There’s not much there yet, but keep watching…

This is a simple example of synthetic data, generated using the cocktail party simulator.

All of these data files come from the same network: a 12 person party with 1 host. All guests know the host and 2 other people (so D knows A, the host, and its two neighbors, C and E).
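
As a rough sketch, the "who knows whom" network for this party could be written down like this. It assumes the 11 guests (B through L) sit in a ring, so the last guest's second neighbor wraps around to B. That wrap-around is a guess, as is the function name party_adjacency.

```python
import string


def party_adjacency(n=12):
    """Build a symmetric 0/1 matrix: person 0 is the host, the rest are guests."""
    names = list(string.ascii_uppercase[:n])
    adj = [[0] * n for _ in range(n)]
    guests = list(range(1, n))
    for pos, g in enumerate(guests):
        # Every guest knows the host.
        adj[0][g] = adj[g][0] = 1
        # ASSUMPTION: guests form a ring, so the last guest also knows the first.
        nxt = guests[(pos + 1) % len(guests)]
        adj[g][nxt] = adj[nxt][g] = 1
    return names, adj


names, adj = party_adjacency()
d = names.index("D")
print([names[j] for j in range(len(names)) if adj[d][j]])
```

Under these assumptions the host knows all 11 guests and every guest knows exactly 3 people.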

In the simulation, we add two factors:

  • Sampling (how many observations we take to build the matrix). In many cases, we are undersampling: not getting enough samples to really capture the phenomenon, which leads to noisy measurements.
  • Measurement noise (random chance added to the numbers). Basically, this says that when we make an observation, there’s a chance it might be a random event: two people who do not know each other may still talk to each other, or two people are talking to each other but we missed it.

This example should allow you to see how well your techniques deal with these two factors. The underlying phenomenon is the same (so we would hope to have very similar representations), but the errors might make that harder to discover.
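
To get a feel for how the two factors interact, here is a toy sketch. It is not the cocktail party simulator itself. It assumes each observation records one conversation between a pair of people, and that with probability noise_prob the pair is drawn from everyone rather than from people who know each other. It only models the "strangers talking" kind of noise, not missed conversations. The ring layout of the guests and the names observe and noise_prob are assumptions made for the illustration.

```python
import random


def observe(adj, samples, noise_prob=0.0, seed=None):
    """Count observed conversations; a noisy observation picks any pair at random."""
    rng = random.Random(seed)
    n = len(adj)
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    known_pairs = [(i, j) for i, j in all_pairs if adj[i][j]]
    counts = [[0] * n for _ in range(n)]
    for _ in range(samples):
        pool = all_pairs if rng.random() < noise_prob else known_pairs
        i, j = rng.choice(pool)
        counts[i][j] += 1
        counts[j][i] += 1
    return counts


# ASSUMPTION: 12 people, person 0 is the host, guests 1..11 sit in a ring.
n = 12
adj = [[0] * n for _ in range(n)]
for g in range(1, n):
    nxt = 1 + (g % (n - 1))
    for a, b in ((0, g), (g, nxt)):
        adj[a][b] = adj[b][a] = 1

few = observe(adj, samples=10, seed=1)           # undersampled, no noise
many = observe(adj, samples=1000, seed=1)        # well sampled, no noise
noisy = observe(adj, samples=1000, noise_prob=0.3, seed=1)

unseen = sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j] and few[i][j] == 0)
print("known pairs never observed with 10 samples:", unseen)
```

With only 10 samples most of the 22 known pairs are never observed at all, which is the undersampling problem described above.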

The datafiles have the names formed as:

P 12 x 100 – 0 – 1

which means:

  • 12 means a 12 person party (all these are the same)
  • x means that it’s the single host party (we’ll see other networks in future data)
  • 100 means 100 samples
  • 0 means no noise (6 means noise of +/- 3 added to each conversation selection)
  • 1 is the trial (there are two trials of each condition given)
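
A small parser for these names might look like the sketch below. The exact spelling of the real file names (spaces, hyphens or dashes, extension) is not shown here, so the pattern is deliberately forgiving about whitespace and separators. Treat it as a guess, and treat PartyName and parse_party_name as hypothetical names.

```python
import re
from typing import NamedTuple


class PartyName(NamedTuple):
    people: int
    network: str
    samples: int
    noise: int
    trial: int


# ASSUMPTION: fields are separated by optional spaces and a hyphen, en dash,
# or underscore, with an optional .csv extension.
_PARTY_PATTERN = re.compile(
    r"^P\s*(\d+)\s*([a-z])\s*(\d+)\s*[-\u2013_]\s*(\d+)\s*[-\u2013_]\s*(\d+)(?:\.csv)?$",
    re.IGNORECASE,
)


def parse_party_name(name: str) -> PartyName:
    """Split a party data file name into size, network, samples, noise, trial."""
    match = _PARTY_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    people, network, samples, noise, trial = match.groups()
    return PartyName(int(people), network.lower(), int(samples), int(noise), int(trial))


print(parse_party_name("P 12 x 100 \u2013 0 \u2013 1"))
```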

Here is a ZIP of a bunch of these: p12x.zip (16 to be exact)

(right now, I can’t upload individual CSV files – but we’re working on fixing that)