Difference between revisions of "Course: Big Data 2015"
Revision as of 05:49, 26 January 2015
DS-GA 1004 - Big Data: Tentative Schedule (subject to change)
- Course Web page: http://vgc.poly.edu/~juliana/courses/BigData2015
- Instructor: Professor Juliana Freire (http://vgc.poly.edu/~juliana)
- Lecture: Mondays, 4:55pm-7:35pm at Silver, room 208.
- Some classes will include a lab session, so please always bring your laptop.
Background (4 weeks)
Week 1: Course Overview; The evolution of Data Management and introduction to Big Data
- Lecture notes: http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/course-overview.pdf
- Reading: Chapter 1 of Mining of Massive Datasets (version 1.1)
- Course survey: https://docs.google.com/forms/d/1LTiJwkDVvp0cF62Fw_d9Y86US5LCkorRUIQtV2T8KWE/viewform?usp=send_form
Week 2: Introduction to Databases, Relational Model and SQL
- Lecture notes:
  - http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/intro-to-db.pdf
  - http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/relational-algebra.pdf
- Other useful reading:
  - SQL/Nerds Modeling (parts): http://philip.greenspun.com/sql/data-modeling.html
Week 3: Other Data Models and Query Optimization
- Lecture notes:
  - http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/xml_schema_query.pdf
- Lab: SQL
- Programming assignment: Using SQL for data analysis and cleaning
Week 4: Data Exploration and Reproducibility
- Lecture notes: http://vgc.poly.edu/~fchirigati/mda-class/provenance-reproducibility.pdf
- Lab: VisTrails
- Programming assignment: Exploring urban data
Big Data Foundations and Infrastructure (3 weeks)
Week 5: Cloud Computing, MapReduce and Hadoop
- Lecture notes:
  - http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/mapreduce-intro.pdf
- Required reading:
  - Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
  - Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce)
- Other useful reading:
  - Hadoop: The Definitive Guide. http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520
- Lab: Hands-on Hadoop
- Homework Assignment -- Your first quiz is available on Gradiance. It is due on March 17th at 5pm.
Week 6: Algorithm Design for MapReduce
- Required reading:
  - Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
  - Mining of Massive Datasets (2nd Edition), Chapter 2
Week 7: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages
- Lecture notes:
Big Data Algorithms, Mining Techniques, and Visualization (3 weeks)
Week 8: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Huy Vo (NYU CUSP)
- Lecture notes:
Week 9: Association Rules
- Lecture notes:
- Assignment on frequent items and association rule mining. Due on Dec 7th. Check http://www.newgradiance.com/services
- Reading: Chapter 6 of Mining of Massive Datasets
- Suggested additional reading:
  - Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
  - Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
  - Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html
Week 10: Finding similar items
- Reading: Chapter 3 of Mining of Massive Datasets
- Homework Assignment
  - There are two new quizzes on Gradiance: Distance measures and Document similarity. They are due on May 5th.
  - Your final assignment is available at http://www.vistrails.org/index.php/Assignment_4_-_Querying_with_Pig_and_Mapreduce. This is an optional assignment and will count towards extra credit.