Difference between revisions of "Course: Big Data Analysis"

Revision as of 16:19, 17 September 2012

This schedule is tentative and subject to change

Make sure to check my.poly.edu for course announcements

Week 1: Monday Sept. 10th - Course Overview

Course overview and introduction to Big Data Analysis
Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/intro.pdf
Student survey -- to be filled out today!

Required Reading

Additional References

Week 2: Monday Sept. 17th - Map-Reduce

Introduction to Map-Reduce
Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/Hadoop.pdf
Introduction to Hadoop: http://hadoop.apache.org/
The Map-Reduce ecosystem: Pig, Hive, Jaql, Mahout, BigInsights

Required Reading

Additional References

Week 3: Monday Sept. 24th - Databases and Big Data

Databases and Big Data: Persistence, Querying, Indexing, Transactions
BigTables and NoSQL stores. Tuple stores vs. column stores: HBase, MongoDB, Cassandra
Transactions in NoSQL stores. Google's Percolator.
"NewSQL" stores: more on Hive, VoltDB, HadoopDB
Beyond MapReduce: Berkeley's Spark, UC Irvine's Asterix, Google's Dremel

Readings

Week 4: Monday Oct. 1st - Statistics is easy - Invited Speaker: Dennis Shasha

Guest lecture by Dennis Shasha: Statistics and Big Data
Provenance and data exploration

Required Reading

http://www.morganclaypool.com/doi/abs/10.2200/S00142ED1V01Y200807MAS001 -- book is available for free for NYU students

Making Computations and Publications Reproducible with VisTrails

Juliana Freire and Claudio Silva. In Computing in Science and Engineering 14(4): 18-25, 2012.

Provenance for Computational Tasks: A Survey

Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.

Week 5: Monday Oct. 8th - Finding Similar Items

Overview of information integration

Readings

Mining of Massive Datasets, Chapter 3; information integration; entity resolution

Week 6: Monday Oct. 15th - Invited Speaker: Torsten Suel

Reading: inverted indexing and crawling (Lin and Dyer, Data-Intensive Text Processing with MapReduce, Chapter 4)
Ask Torsten (tentative, ask him for reading material)

Readings

1998 PageRank Paper
Mining of Massive Datasets, Chapter 5
Data-Intensive Text Processing with MapReduce, Chapter 5

Week 7: Monday Oct. 22nd - Invited Speakers: Claudio Silva and Lauro Lins

Introduction to Visualization; Data stewardship and provenance
Guest lecture by Claudio Silva and Lauro Lins

Readings

Hellerstein (ask Claudio for additional references)
ADD: provenance and reproducibility

Week 8: Monday Oct. 29th - Graph Analysis

Graph algorithms, link analysis, social networks

Readings

Data-Intensive Text Processing with MapReduce, Chapter 4

Week 9: Monday Nov. 5th - Frequent Itemsets

Reading

Mining of Massive Datasets, Chapter 6

Week 10: Monday Nov. 12th - Mining Data Streams

Readings

Mining of Massive Datasets, Chapter 4

Week 11: Monday Nov. 19th - Clustering

Readings

Mining of Massive Datasets, Chapter 7

Week 12: Monday Nov. 26th - Recommendation Systems

Readings

Mining of Massive Datasets, Chapter 9

Week 13: Monday Dec. 3rd - EM algorithms for text processing

Data-Intensive Text Processing with MapReduce, Chapter 6

@@ Line 26: / Line 26: @@
 * Introduction to Map-Reduce
+* Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/Hadoop.pdf
 * Introduction to [http://hadoop.apache.org/ Hadoop]
 * The Map-Reduce ecosystem: [http://pig.apache.org/ Pig], [http://hive.apache.org/ Hive], [http://code.google.com/p/jaql/ Jaql], [http://mahout.apache.org/ Mahout], BigInsights

Week 14: Monday Dec. 10th - Project presentation

Further Readings
