Difference between revisions of "Assignment 2 - Data Exploration using SQL"
(18 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
== Assignment Description == | == Assignment Description -- Due on March 3, 2014 == | ||
In your first assignment, you explored MTA data about subway fares using a data exploration tool. Now, you will further explore this data set using SQL. | In your first assignment, you explored MTA data about subway fares using a data exploration tool. Now, you will further explore this data set using SQL. | ||
You will use MySQL. Please follow the instructions in the [http://www.vistrails.org/index.php/Big_Data_Lab_notes_02/19/14 lab notes] to install the system. | You will use MySQL. Please follow the instructions in the [http://www.vistrails.org/index.php/Big_Data_Lab_notes_02/19/14 lab notes] to install the system. | ||
You will work with the following tables: | |||
<code> | |||
create table fares_jan18( | |||
remote varchar(10), | |||
station varchar(100), | |||
ff int, | |||
sendis int(11), | |||
7d int(11), | |||
30d int(11), | |||
students int(11)); | |||
create table fares_feb1( | |||
remote varchar(10), | |||
station varchar(100), | |||
ff int(11), | |||
sendis int(11), | |||
7d int(11), | |||
30d int(11), | |||
students int(11)); | |||
create table stations( | |||
name varchar(100), | |||
lat float, | |||
long float, | |||
line varchar(50), | |||
lines varchar(50)); | |||
</code> | |||
To create these tables and populate them, first download the dump file from: http://vgc.poly.edu/~juliana/courses/BigData2014/Assignments/2-SQL/mta.sql | |||
Then, from a terminal window invoke MySQL and issue the following commands: | |||
<code> | |||
> use test; | |||
> source mta.sql | |||
</code> | |||
If you are not running MySQL from the same directory where the file mta.sql is located, you need to specify the full path for the file. | |||
You are encouraged to use the Web as a resource to find more information about MySQL. | You are encouraged to use the Web as a resource to find more information about MySQL. | ||
Line 10: | Line 57: | ||
== Submission Instructions == | == Submission Instructions == | ||
You | You are required to submit to NYU Classes two text files: | ||
* as2.sql -- containing the queries you will write to answer the questions below. Use the following template for this file: | * as2.sql -- containing the queries you will write to answer the questions below. Use the following template for this file: | ||
<code> | <code> | ||
-- remember to end each query with a semi-colon | -- remember to end each query with a semi-colon | ||
-- ********************Q1******************* | -- ********************Q1******************* | ||
SELECT 'Q1' AS ' '; | SELECT 'Q1' AS ' '; | ||
<insert your query Q1>; | |||
-- ********************Q2******************* | -- ********************Q2******************* | ||
SELECT 'Q2' AS ' '; | SELECT 'Q2' AS ' '; | ||
<insert your query Q1>; | |||
</code> | </code> | ||
.... | .... | ||
* as2.log -- containing the answers to you queries | * as2.log -- containing the answers to you queries | ||
* For extra credit, you can submit a third file named as2-ex.txt where we can explain how you would improve and optimize the design of this database. | |||
== Queries == | |||
* 1. How many subway stations are there in New York City? | * 1. How many subway stations are there in New York City? | ||
Line 39: | Line 90: | ||
* 6. Which station had the largest increase in the number of full-fare tickets sold between Jan 18 and Feb 1? | * 6. Which station had the largest increase in the number of full-fare tickets sold between Jan 18 and Feb 1? | ||
* 7. Which station had the largest decrease in the number of full-fare tickets sold between Jan 18 and Feb 1? | * 7. Which station had the largest decrease in the number of full-fare tickets sold between Jan 18 and Feb 1? | ||
* 8. What was the average difference in the number of full-fare tickets sold between Jan 18 and Feb 1 across the stations of the Broadway line? | * 8. List the names of the stations whose decrease in the number of full-fare tickets sold between Jan 18 and Feb 1 is greater than 1000. | ||
* | * 9. What was the average difference in the number of full-fare tickets sold between Jan 18 and Feb 1 across the stations of the Broadway line? | ||
* 10. How many full-fare tickets were sold in NYC in the week of Jan 18? | |||
* 11. List all the stops of the F train. |
Latest revision as of 15:13, 27 February 2014
Assignment Description -- Due on March 3, 2014
In your first assignment, you explored MTA data about subway fares using a data exploration tool. Now, you will further explore this data set using SQL.
You will use MySQL. Please follow the instructions in the lab notes to install the system.
You will work with the following tables:
create table fares_jan18(
remote varchar(10),
station varchar(100),
ff int,
sendis int(11),
7d int(11),
30d int(11),
students int(11));
create table fares_feb1(
remote varchar(10),
station varchar(100),
ff int(11),
sendis int(11),
7d int(11),
30d int(11),
students int(11));
create table stations(
name varchar(100),
lat float,
long float,
line varchar(50),
lines varchar(50));
To create these tables and populate them, first download the dump file from: http://vgc.poly.edu/~juliana/courses/BigData2014/Assignments/2-SQL/mta.sql Then, from a terminal window invoke MySQL and issue the following commands:
> use test;
> source mta.sql
If you are not running MySQL from the same directory where the file mta.sql is located, you need to specify the full path for the file.
You are encouraged to use the Web as a resource to find more information about MySQL.
You can exchange ideas with your classmates, but the work you submit should be your own. Copying is not allowed.
Submission Instructions
You are required to submit to NYU Classes two text files:
- as2.sql -- containing the queries you will write to answer the questions below. Use the following template for this file:
-- remember to end each query with a semi-colon
-- ********************Q1*******************
SELECT 'Q1' AS ' ';
<insert your query Q1>;
-- ********************Q2*******************
SELECT 'Q2' AS ' ';
<insert your query Q1>;
....
- as2.log -- containing the answers to you queries
- For extra credit, you can submit a third file named as2-ex.txt where we can explain how you would improve and optimize the design of this database.
Queries
- 1. How many subway stations are there in New York City?
- 2. List all stations in the Broadway Line.
- 3. List the latitude, longitude and number of full-fare tickets sold during the week of Jan 18 at all stations in the Broadway line in decreasing order of number of tickets.
- 4. List the difference between the number of full-fare tickets sold at the different stations of the Broadway line between Jan 18 and Feb 1.
- 5. List the difference between the number of 7-day and 30-day tickets sold at the different stations in the Broadway line between on Jan 18 and Feb 1.
- 6. Which station had the largest increase in the number of full-fare tickets sold between Jan 18 and Feb 1?
- 7. Which station had the largest decrease in the number of full-fare tickets sold between Jan 18 and Feb 1?
- 8. List the names of the stations whose decrease in the number of full-fare tickets sold between Jan 18 and Feb 1 is greater than 1000.
- 9. What was the average difference in the number of full-fare tickets sold between Jan 18 and Feb 1 across the stations of the Broadway line?
- 10. How many full-fare tickets were sold in NYC in the week of Jan 18?
- 11. List all the stops of the F train.