Course: Big Data Analysis
Revision as of 00:11, 27 August 2012
Make sure to check my.poly.edu for course announcements
Week 1: Monday Sept. 10th - Course Overview
- Course overview (First day of classes!)
- Student survey
- Introduction to Big Data
Week 2: Monday Sept. 17th - Map-Reduce
- Introduction to map-reduce
Readings
- Original Google MapReduce paper (Dean and Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", 2004)
- Mining of Massive Datasets, Chapter 2
- Data-Intensive Text Processing with MapReduce, Chapter 2
Week 3: Monday Sept. 24th - Statistics is easy
- Guest lecture by Dennis Shasha
- Statistics and Big Data
Readings
- http://www.morganclaypool.com/doi/abs/10.2200/S00142ED1V01Y200807MAS001 (the book is available free to NYU students)
- JF: add references for issues related to stats and big data
Week 4: Monday Oct. 1st - Databases and Big Data
- Databases and Big Data
Readings
- JF: ADD: NoSQL databases (reading papers from literature)
Column stores vs. tuple stores: HBase, MongoDB, VoltDB, Cassandra, HadoopDB (Facebook). Overview of different architectures, distributed databases vs. Hadoop, transaction support...
Week 5: Monday Oct. 8th - Finding Similar Items
- Overview of information integration
Readings
- Mining of Massive Datasets, Chapter 3; information integration; entity resolution
Week 6: Monday Oct. 15th - Graph Analysis
- Graph algorithms, link analysis, social networks
Readings
- Mining of Massive Datasets, Chapter 5
- Data-Intensive Text Processing with MapReduce, Chapter 5
Week 7: Monday Oct. 22nd - Introduction to Visualization; Data stewardship and provenance
- Guest lecture by Claudio Silva and Lauro Lins
Readings
- Hellerstein (ask Claudio for additional references)
- ADD: provenance and reproducibility
Week 8: Monday Oct. 29th - TBD (swap with Oct. 15th)
- Reading: inverted index and crawling (Lin, Chapter 4)
- Ask Torsten (tentative, ask him for reading material)
Readings
- Data-Intensive Text Processing with MapReduce, Chapter 4
Week 9: Monday Nov. 5th - Frequent Itemsets
Readings
- Mining of Massive Datasets, Chapter 6
Week 10: Monday Nov. 12th - Mining Data Streams
Readings
- Mining of Massive Datasets, Chapter 4
Week 11: Monday Nov. 19th - Clustering
Readings
- Mining of Massive Datasets, Chapter 7
Week 12: Monday Nov. 26th - Recommendation Systems
Readings
- Mining of Massive Datasets, Chapter 9
Week 13: Monday Dec. 3rd - EM Algorithms for Text Processing
Readings
- Data-Intensive Text Processing with MapReduce, Chapter 6
Week 14: Monday Dec. 10th - Project Presentations
- Project presentations