Course: Big Data Analysis
Make sure to check for course announcements
Week 1: Monday Sept. 10th - Course Overview
- Course overview (First day of classes!)
- Student survey
- Introduction to Big Data
- Dilbert's BigData
- New York Times' "How Big Data Became So Big"
- World Economic Forum: Big Data, Big Impact
- The Analytics Journey
- BigData Analytics Usecases
- Data-Intensive Text Processing with MapReduce, Chapter 1
- PDBMS vs. MapReduce
- Benchmark DBMS vs MapReduce (2009)
Week 2: Monday Sept. 17th - Map-Reduce
- Introduction to map-reduce
- Introduction to Hadoop
- Map-Reduce ecosystem: Pig, Hive, Jaql, Mahout, BigInsights
- Original Google MapReduce paper
- Mining of Massive Datasets, Chapter 2
- Data-Intensive Text Processing with MapReduce, Chapter 2, Chapter 3
- Pig Latin: A Not-So-Foreign Language for Data Processing
- Jaql: A Scripting Language for Large Scale Semistructured Data Analysis
- Hive - A Warehousing Solution Over a Map-Reduce Framework
Week 3: Monday Sept. 24th - Statistics is easy
- Guest lecture by Dennis Shasha
- Statistics and Big Data
- Book is available for free to NYU students
- JF: add references for issues related to stats and big data
Week 4: Monday Oct. 1st - Databases and Big Data
- Databases and Big Data
- JF: ADD: NoSQL databases (reading papers from literature)
- Column store vs. tuple store; HBase, MongoDB, VoltDB, Cassandra, HadoopDB (Facebook); overview of different architectures; distributed databases vs. Hadoop; transaction support...
Week 5: Monday Oct. 8th - Finding Similar Items
- Overview of information integration
- Mining of Massive Datasets, Chapter 3; information integration; entity resolution
Week 6: Monday Oct. 15th - Graph Analysis
- Graph algorithms, link analysis, social networks
- Mining of Massive Datasets, Chapter 5
- Data-Intensive Text Processing with MapReduce, Chapter 5
Week 7: Monday Oct. 22nd - Introduction to Visualization; Data stewardship and provenance
- Guest lecture by Claudio Silva and Lauro Lins
- Hellerstein (ask Claudio for additional references)
- ADD: provenance and reproducibility
Week 8: Monday Oct. 29th - TBD (swap with Oct. 15th)
- Reading: inverted index and crawling (Lin chapter 4)
- Ask Torsten (tentative, ask him for reading material)
- Data-Intensive Text Processing with MapReduce, Chapter 4
Week 9: Monday Nov. 12th - Frequent Itemsets
- Mining of Massive Datasets, Chapter 6
Week 10: Monday Nov. 5th - Mining Data Streams
- Mining of Massive Datasets, Chapter 4
Week 11: Monday Nov. 19th - Clustering
- Mining of Massive Datasets, Chapter 7
Week 12: Monday Nov. 26th - Recommendation Systems
- Mining of Massive Datasets, Chapter 9
Week 13: Monday Dec. 3rd - EM algorithms for text processing
- Data-Intensive Text Processing with MapReduce, Chapter 6