Difference between revisions of "Course: Big Data 2015"
Line 69: | Line 69:
* Programming assignment: Map Reduce | * Programming assignment: Map Reduce
== Week 5: Parallel Databases vs MapReduce | == Week 5: MapReduce Algorithm Design Patterns; Parallel Databases vs MapReduce
* Lecture notes: | * Lecture notes:
| ** http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/mapreduce-algo-design-patterns.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/paralleldb-vs-hadoop-2014.pdf | ** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/paralleldb-vs-hadoop-2014.pdf
= Transparency and Reproducibility (1 week) = | = Transparency and Reproducibility (1 week) =
Revision as of 23:29, 1 March 2015
DS-GA 1004 - Big Data: Tentative Schedule -- subject to change
- Course Web page: http://vgc.poly.edu/~juliana/courses/BigData2015
- Instructor: Professor Juliana Freire (http://vgc.poly.edu/~juliana)
- Lecture: Mondays, 4:55pm-7:35pm at Silver, room 208.
- Some classes will include a lab session, so please always bring your laptop.
News
- 2/26/2015: An Amazon AWS token was emailed to each student. Please create your Amazon AWS account. You can find instructions at: http://www.vistrails.org/index.php/AWS_Setup
- 2/26/2015: You should install the Cloudera VM on your laptop. We will need that for the lab on March 9th. Here are the instructions: Cloudera VM Setup
- There is a new version of the textbook Mining of Massive Datasets; we will use the latest version, 2.1.
Background (2 weeks)
Week 1 - Feb 2: Course Overview; The evolution of Data Management and introduction to Big Data
- Lecture notes: http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/course-overview.pdf
- Reading: Chapter 1 of Mining of Massive Data Sets (version 1.1)
- Course survey: https://docs.google.com/forms/d/1LTiJwkDVvp0cF62Fw_d9Y86US5LCkorRUIQtV2T8KWE/viewform?usp=send_form
Week 2 - Feb 9: Introduction to Databases, Relational Model and SQL
- Lecture notes:
- Lab:
- SQL hands on: Big Data 2015 - SQL Lab
- Other useful reading:
- Programming assignment: Using SQL for data analysis and cleaning (see NYU Classes)
Feb 16: Holiday
Big Data Foundations and Infrastructure (3 weeks)
Week 3 - Feb 23: Introduction to Map Reduce
- Lab: (continuation)
- SQL hands on: Big Data 2015 - SQL Lab
- Lecture notes:
- Required Reading:
- Data-Intensive Text Processing with MapReduce. Chapters 1 and 2
- Mining of Massive Datasets (v 2.1). Chapter 2, Sections 2.1, 2.2, and 2.3
- Other useful reading:
- Hadoop: The Definitive Guide. http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520
- Quiz 1 (Map Reduce) assigned -- check http://www.newgradiance.com/services
Week 4: Algorithm Design for MapReduce: Relational Operations
- Lecture notes:
- Lab: Hands-on Hadoop
- Required reading:
- Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
- Mining of Massive Datasets (2nd Edition), Chapter 2.
- Programming assignment: Map Reduce
Week 5: MapReduce Algorithm Design Patterns; Parallel Databases vs MapReduce
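- Lecture notes:
- http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/mapreduce-algo-design-patterns.pdf
- http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/paralleldb-vs-hadoop-2014.pdf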
Transparency and Reproducibility (1 week)
Week 6: Data Exploration and Reproducibility
- Lab: VisTrails
- Programming assignment: Exploring urban data
Big Data Algorithms, Mining Techniques, and Visualization (6 weeks)
Week 8: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Huy Vo (NYU CUSP)
- Lecture notes:
Week 9: Association Rules
- Lecture notes:
- Assignment on frequent items and association rule mining. Due on Dec 7th. Check http://www.newgradiance.com/services
- Reading: Chapter 6 Mining of Massive Datasets
- Suggested additional reading:
- Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
- Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
- Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html
Week 10: Finding similar items
- Reading: Chapter 3 Mining of Massive Datasets
- Homework Assignment
- There are two new quizzes on Gradiance -- Distance measures and document similarity. They are due on May 5th.
- Your final assignment is available at http://www.vistrails.org/index.php/Assignment_4_-_Querying_with_Pig_and_Mapreduce. This assignment is optional and will count towards extra credit.