Difference between revisions of "Course: Big Data 2015"

From VistrailsWiki

Revision as of 01:51, 27 February 2015

DS-GA 1004 - Big Data: Tentative Schedule -- subject to change

  • Lecture: Mondays, 4:55pm-7:35pm at Silver, room 208.
  • Some classes will include a lab session; please always bring your laptop.

News

  • 2/26/2015: Please create your Amazon AWS account. You can find instructions at: http://www.vistrails.org/index.php/AWS_Setup
  • 2/26/2015: You should install the Cloudera VM on your laptop. We will need that for the lab on March 9th. Here are the instructions: Cloudera VM Setup
  • There is a new version of the textbook Mining of Massive Datasets; we will use the latest version, 2.1.

Background (2 weeks)

Week 1 - Feb 2: Course Overview; The Evolution of Data Management and Introduction to Big Data

Week 2 - Feb 9: Introduction to Databases, Relational Model and SQL

  • Programming assignment: Using SQL for data analysis and cleaning (see NYU Classes)

Feb 16: Holiday

Big Data Foundations and Infrastructure (3 weeks)

Week 3 - Feb 23: Introduction to MapReduce


Week 4: Algorithm Design for MapReduce

  • Lab: Hands-on Hadoop
  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.
  • Programming assignment: MapReduce

Week 5: Parallel Databases vs. MapReduce, Query Processing on MapReduce, and High-level Languages


Transparency and Reproducibility (1 week)

Week 6: Data Exploration and Reproducibility

  • Lab: VisTrails
  • Programming assignment: Exploring urban data


Big Data Algorithms, Mining Techniques, and Visualization (6 weeks)

Week 8: Visualization and Spatio-Temporal Data -- Invited lecture by Dr. Huy Vo (NYU CUSP)


Week 9: Association Rules

  • Suggested additional reading:
    • Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html


Week 10: Finding Similar Items

Week 11: Graph Analysis

Week 12: TBD

Week 13: TBD

Week 14: Final Exam

Week 15: Project Presentations