Difference between revisions of "Course: Massive Data Analysis 2014"

From VistrailsWiki


* Getting started with Hadoop: You will use two different Hadoop systems
** NYU HPC will provide accounts so that you can use a local Hadoop cluster. Please submit a request for an HPC account *ASAP*, following the instructions at: https://wikis.nyu.edu/display/NYUHPC/HPC+at+NYU+-+Access. You can find instructions on how to log in to and use the NYU Hadoop cluster at: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/MapReduceExample/readme-nyu-hadoop.txt
** Amazon AWS: Each student will receive a token with $100 credit towards computing time at AWS. See http://www.vistrails.org/index.php/AWS_Setup for instructions on how to set up AWS. '''Always remember to terminate your instances! If you don't, you will be charged, and you are responsible for any charges beyond your credit.'''





Revision as of 16:04, 22 September 2014

CS-GY 6333 Massive Data Analysis: Tentative Schedule -- subject to change

  • Lecture: Mondays, 1:00pm-3:25pm at 2MTC, room 9.011.

News

  • Welcome!

Background (4 weeks)

Week 1 -- Sept 8: Course Overview; the evolution of Data Management

Week 2 -- Sept 15: Provenance and Reproducibility

  • Github setup:

Week 3 -- Sept 22: Introduction to Databases; Relational Model and SQL

Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization

Big Data Foundations and Infrastructure (3 weeks)

Week 5 -- Oct 6: Cloud Computing, MapReduce and Hadoop


  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2, Sections 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).

Week 6 -- Oct 13: Fall Break

Week 7 -- Oct 20: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.


Week 8 -- Oct 27: Parallel Databases vs. MapReduce; Query Processing on MapReduce and High-Level Languages



Big Data Algorithms and Techniques (3 weeks)

Week 9 -- Nov 3: Association Rules


Week 10 -- Nov 10: Finding similar items


Week 11 -- Nov 17: Graph Analysis


Week 12 -- Nov 25: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research)

  • Reading:
    • The Value of Visualization, Jarke Van Wijk: http://www.win.tue.nl/~vanwijk/vov.pdf
    • Tamara Munzner's book, draft 2, available online: http://www.cs.ubc.ca/~tmm/courses/533/book/
    • Nanocubes paper: http://nanocubes.net and http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf


Week 13 -- Dec 1: Data Cleaning and Integration

Week 14 -- Dec 8: Project Presentations

Week 15 -- Dec 15: Project Presentations