Design Challenge 1 – Datasets

by Aditya Barve on September 16, 2019

These are the three data set choices for Design Challenge 1. The first choice comes in three variants.

Airline On-Time Performance

These data sets come from the Bureau of Transportation Statistics. Each data set contains data about 40,000 flights from 2019. Refer to this page for explanations of all the fields and for lookup tables that describe what the codes mean. You may use one of these three variants:

  1. 40k flights from January (download dataset)
  2. 20k flights each from January and February (download dataset)
  3. 10k flights each from January to April (download dataset)
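Because the three variants differ only in how the 40,000 flights are spread across months, a quick count per month can confirm which variant you have loaded. The following is a minimal sketch, not part of the assignment: the column names (MONTH, ORIGIN, DEP_DELAY) and the inline sample rows are assumptions for illustration, so check the field explanations for the actual names in the file you download.

```python
import csv
import io
from collections import Counter


def flights_per_month(csv_file, month_field="MONTH"):
    """Count rows per month in an open CSV file.

    month_field is an assumed column name; adjust it to match
    the header of the downloaded data set.
    """
    reader = csv.DictReader(csv_file)
    return Counter(int(row[month_field]) for row in reader)


# Hypothetical three-row sample standing in for the real file.
sample = io.StringIO(
    "MONTH,ORIGIN,DEP_DELAY\n"
    "1,MSN,5\n"
    "1,ORD,-2\n"
    "2,MSN,12\n"
)
counts = flights_per_month(sample)
for month, n in sorted(counts.items()):
    print(f"month {month}: {n} flights")
```

With a real file, replace the in-memory sample with open("your_file.csv", newline=""), where the file name is whatever you saved the download as.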

Census Data by County (download dataset)

Socioeconomic indicators like poverty rates, population change, unemployment rates, and education levels vary geographically across U.S. states and counties. This data set combines data on these topics into a single file. See documentation on the constituent data sets.

Beijing Air Quality and Weather (download dataset)

Data about Beijing air quality and weather has been joined into a single cohesive table.

The air quality data describes the concentration of air pollutants, measured in micrograms per cubic meter. Multiple samples were measured for each day, and the fields recorded are statistical summaries of that day's pollutant distribution. For example, “Upper95Pollution” is the 95th percentile: the level that only the top 5% of samples exceed.
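To make the percentile summaries concrete, here is a small sketch of how a 95th percentile could be computed from one day's samples. The readings are made up, and the linear-interpolation method is an assumption; the data set does not state which percentile definition was used, so values computed this way may differ slightly from the recorded fields.

```python
import math


def percentile(samples, q):
    """Return the q-th percentile (0-100) of samples.

    Uses linear interpolation between the two nearest ranks,
    which is one common definition among several.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return ordered[lower]
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Hypothetical readings for a single day, in micrograms per cubic meter.
day_samples = [35.0, 42.0, 38.0, 120.0, 55.0, 61.0, 47.0, 90.0]

upper95 = percentile(day_samples, 95)  # level exceeded only by the top 5%
median = percentile(day_samples, 50)
print(f"95th percentile: {upper95:.1f}, median: {median:.1f}")
```

Note how far the 95th percentile sits above the median in this sample: an upper-percentile field captures the day's pollution spikes, which an average or median would smooth over.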

The weather data comes from Weather Underground.

This data set was contributed by Brad Stieber, a past CS765 student.
