Difference between revisions of "Course: Big Data 2016"

= News =


* 1/25/2016: Amazon has kindly donated time on AWS for all the students in this class. To obtain your credit, please follow the instructions at http://www.vistrails.org/index.php/AWS_Setup
* 1/25/2016: Amazon has kindly donated time on AWS for all the students in this class. You must sign up for the AWS Educate program; see http://www.vistrails.org/index.php/AWS_Setup
* 1/25/2016: Access your NYU HPC account, which you will use for in-class exercises and homework assignments. See [[NYU HPC Access Instructions]]



Revision as of 21:46, 23 January 2016

DS-GA 1004 - Big Data: Tentative Schedule (subject to change)

  • TAs:
    • Yuan Feng
    • Kevin Ye
  • Lecture: Mondays, 4:55pm-7:35pm at 19 University Pl., room 102.
  • Some classes will include a lab session; please always bring your laptop.

News

Week 1 - Jan 25: Course Overview; Lab: Computing infrastructure for the course

Week 2 - Feb 1: The evolution of Data Management and introduction to Big Data; Introduction to Databases, Relational Model and SQL

  • In-class assignment: relational algebra

Week 3 - Feb 8: Introduction to Databases, Relational Model and SQL (cont.)

  • Lab: SQL
  • Programming assignment: Using SQL for data analysis and cleaning

Week 4 - Feb 15: Holiday

Big Data Foundations and Infrastructure (3 weeks)

Week 5 - Feb 22: Introduction to MapReduce

  • Lab: Hands-on Hadoop (local and AWS)

Week 6 - Feb 29: MapReduce Algorithm Design Patterns

  • Lab: Hands-on Hadoop (HPC)
  • Programming assignment: MapReduce (check NYU Classes)

Week 7 - March 7: Parallel Databases vs. MapReduce; Introduction to Spark

  • Lab: Hands-on Spark (HPC)
  • Programming assignment: check NYU Classes on March 10th

Week 8 - March 14: Spring Break

Transparency and Reproducibility (1 week)

Week 9 - March 21: Data Exploration and Reproducibility

  • Programming assignment 4: Exploring urban data (see NYU Classes)

Big Data Algorithms, Mining Techniques, and Visualization (6 weeks)

Week 10 - March 28th: Finding similar items

  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 11 - April 4th: Association Rules


  • Suggested additional reading:
    • Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html
  • Homework Assignment
    • See quizzes on Gradiance -- Distance measures and document similarity.

Week 12 - April 11th: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Harish Doraiswamy (NYU CUSP)

Week 13 - April 18th: Parallel Databases

Week 14 - April 25th: Graph Analysis

  • Required Reading: Data-Intensive Text Processing with MapReduce. Chapter 5 -- Graph Algorithms

Week 15 - May 2: Final Exam

Week 16 - May 9: Project Presentations

Week 17 - May 16: Project Presentations