Difference between revisions of "Course: Massive Data Analysis 2014"
Revision as of 01:57, 8 September 2014
CS-GY 6333 Massive Data Analysis: Tentative Schedule -- subject to change
- Course Web page: http://cs.nyu.edu/courses/spring14/CSCI-GA.2568-001/index.html
- Instructor: Professor Juliana Freire (http://vgc.poly.edu/~juliana/)
- Lecture: Mondays, 1:00pm-3:25pm at 2MTC, room 9.011.
News
- Welcome!
Background (4 weeks)
Week 1 -- Sept 8: Course Overview; the evolution of Data Management
- Lecture notes: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/course-overview.pdf
- Reading: Chapter 1 of Mining of Massive Datasets (version 1.1)
- Course survey: https://docs.google.com/spreadsheet/embeddedform?formkey=dFpwTjROVzhLUWY2NVNXb0xvNTVLMnc6MA
Week 2 -- Sept 15: Introduction to Databases
- Lecture notes: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/intro-to-db.pdf
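- Other useful reading:
  - Greenspun's SQL for Web Nerds Intro: http://philip.greenspun.com/sql/introduction.html
  - SQL/Nerds Modeling (parts): http://philip.greenspun.com/sql/data-modeling.html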
- Homework assignment: Assignment 1 - Data Exploration
Week 3 -- Sept 22: Overview: Relational Model and SQL
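- Lecture notes:
  - http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/relational-algebra.pdf
- Other useful reading:
  - SQL/Nerds Modeling (parts): http://philip.greenspun.com/sql/data-modeling.html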
Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization
- Lecture notes:
- Homework assignment: Assignment 2 - Data Exploration using SQL
Big Data Foundations and Infrastructure (2 weeks)
Week 5 -- Oct 6: Cloud computing, Map Reduce and Hadoop
- Lecture notes:
  - http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/mapreduce-intro.pdf
- Required reading:
  - Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
  - Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce)
- Other useful reading:
  - Hadoop: The Definitive Guide. http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520
- Homework Assignment -- Your first quiz is available on Gradiance (http://www.newgradiance.com). It is due on March 17th at 5pm.
Week 6 -- Oct 13: Algorithm Design for MapReduce
- Required reading:
  - Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
  - Mining of Massive Datasets (2nd Edition), Chapter 2
Machine Learning and Big Data (3 weeks)
Week 7 -- Oct 20: Hashing and AllReduce
- Invited lecture by John Langford
- Lecture notes:
- Homework assignment: Assignment 3 - MapReduce algorithm design
Week 8 -- Oct 27: Bandits
- Invited lecture by John Langford
- Lecture notes:
  - http://cilvr.cs.nyu.edu/diglib/lsml/lecture10_doing_exploration.pdf
Week 9 -- Nov 3: Large Scale Machine Learning in the Real World
- Invited lecture by Leon Bottou
- Lecture notes:
Big Data Foundations and Infrastructure -- cont. (2 weeks)
Week 10 -- Nov 10: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages
- Lecture notes:
- Required reading:
  - Data-Intensive Text Processing with MapReduce (Jan 27, 2013), Chapter 6 -- Processing Relational Data. This chapter appears only in the 2013 version of the textbook, which is available at http://vgc.poly.edu/~juliana/courses/BigData2014/Textbooks/MapReduce-algorithms-Jan2013-draft.pdf
  - Benchmark DBMS vs MapReduce (2009): http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf
  - MapReduce: A Flexible Data Processing Tool: http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext
- Additional reading:
  - Pig Latin: A Not-So-Foreign Language for Data Processing: http://pages.cs.brandeis.edu/~olga/cs228/Reading%20List_files/piglatin.pdf
  - Hive - A Warehousing Solution Over a Map-Reduce Framework: http://www.vldb.org/pvldb/2/vldb09-938.pdf
Big Data Algorithms and Techniques (3 weeks)
Week 11 -- Nov 17: Data Management for Big Data (cont) and Association Rules
- Reading: Chapter 6 of Mining of Massive Datasets
- Homework Assignment -- Your quiz is available on Gradiance (http://www.newgradiance.com). It is due on April 28th.
Week 12 -- Nov 25: Finding similar items: Invited lecture by Dr. Harish Doraiswami
- Reading: Chapter 3 of Mining of Massive Datasets
- Homework Assignment
  - There are two new quizzes on Gradiance -- Distance measures and document similarity. They are due on May 5th.
  - Your final assignment is available at http://www.vistrails.org/index.php/Assignment_4_-_Querying_with_Pig_and_Mapreduce. This is an optional assignment and will count towards extra credit.
Week 13 -- Dec 1: Graph Analysis and Exam Review
- Lecture notes:
  - http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/exam-review.pdf
Week 14 -- Dec 8: Final Exam
Week 15 -- Dec 15: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research)
- Lecture notes:
- Reading:
  - The Value of Visualization, Jarke van Wijk: http://www.win.tue.nl/~vanwijk/vov.pdf
  - Tamara Munzner's book, draft 2, available online: http://www.cs.ubc.ca/~tmm/courses/533/book/
  - Nanocubes paper: http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf (project page: http://nanocubes.net)