Difference between revisions of "Course: Big Data 2015"

From VistrailsWiki
(Created page with '= DS-GA 1004- Big Data: Tentative Schedule -- ''subject to change'' = * Course Web page: http://vgc.poly.edu/~juliana/courses/BigData2015 * Instructor: Professor Juliana Freire…')
 
= Background (4 weeks) =


== Week 1: Course Overview; The evolution of Data Management and introduction to Big Data ==


* Lecture notes:  http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/course-overview.pdf
* Course survey: https://docs.google.com/forms/d/1LTiJwkDVvp0cF62Fw_d9Y86US5LCkorRUIQtV2T8KWE/viewform?usp=send_form


== Week 2: Introduction to Databases, Relational Model and SQL ==
* Lecture notes:  http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/intro-to-db.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/relational-algebra.pdf
** [http://philip.greenspun.com/sql/data-modeling.html SQL/Nerds Modeling (parts)]


== Week 3: Other Data Models and Query Optimization ==
* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/xml_schema_query.pdf
* Programming assignment: Using SQL for data analysis and cleaning


== Week 4: Data Exploration and Reproducibility  ==


* Lecture notes:  http://vgc.poly.edu/~fchirigati/mda-class/provenance-reproducibility.pdf
= Big Data Foundations and Infrastructure (3 weeks) =


== Week 5: Cloud computing, MapReduce and Hadoop ==
* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/mapreduce-intro.pdf
* Homework Assignment -- Your first quiz is available on [http://www.newgradiance.com Gradiance]. It is ''due on March 17th at 5pm.''


== Week 6: Algorithm Design for MapReduce  ==


* Lecture notes:




== Week 7: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages ==


* Lecture notes:
= Big Data Algorithms, Mining Techniques, and Visualization (3 weeks) =


== Week 8: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Huy Vo (NYU CUSP) ==


* Lecture notes:




== Week 9: Association Rules  ==


* Lecture notes:




== Week 10: Finding similar items  ==


* Lecture notes:
** Your final assignment is available at http://www.vistrails.org/index.php/Assignment_4_-_Querying_with_Pig_and_Mapreduce. This optional assignment counts toward extra credit.


== Week 11: Graph Analysis ==


* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/graph-algos.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2015/Lectures/exam-review.pdf


== Week 12: TBD ==


== Week 13: TBD ==


== Week 14: Final Exam  ==
 
== Week 15: Project Presentations ==

Revision as of 05:49, 26 January 2015

DS-GA 1004 - Big Data: Tentative Schedule -- subject to change

  • Lecture: Mondays, 4:55pm-7:35pm at Silver, room 208.
  • Some classes will include a lab session; please always bring your laptop.

Background (4 weeks)

Week 1: Course Overview; The evolution of Data Management and introduction to Big Data

Week 2: Introduction to Databases, Relational Model and SQL

Week 3: Other Data Models and Query Optimization

  • Lab: SQL
  • Programming assignment: Using SQL for data analysis and cleaning

Week 4: Data Exploration and Reproducibility

  • Lab: VisTrails
  • Programming assignment: Exploring urban data


Big Data Foundations and Infrastructure (3 weeks)

Week 5: Cloud computing, MapReduce and Hadoop

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).
  • Lab: Hands-on Hadoop
  • Homework Assignment -- Your first quiz is available on Gradiance. It is due on March 17th at 5pm.

Week 6: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.


Week 7: Parallel Databases vs MapReduce, Query Processing on MapReduce and High-level Languages


Big Data Algorithms, Mining Techniques, and Visualization (3 weeks)

Week 8: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Huy Vo (NYU CUSP)


Week 9: Association Rules

  • Suggested additional reading:
    • Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann.
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data, Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html


Week 10: Finding similar items

Week 11: Graph Analysis

Week 12: TBD

Week 13: TBD

Week 14: Final Exam

Week 15: Project Presentations