Difference between revisions of "Course: Big Data Analysis"

Revision as of 23:07, 26 November 2012

This schedule is tentative and subject to change

Make sure to check my.poly.edu for course announcements

News

Project description

Week 1: Monday Sept. 10th - Course Overview

Course overview and introduction to Big Data Analysis
Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/intro.pdf
Student survey -- to be filled out today!

Required Reading

Additional References

Week 2: Monday Sept. 17th - Map-Reduce

Introduction to Map-Reduce
Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/Hadoop.pdf
Introduction to [1]
The Map-Reduce ecosystem: Pig, Hive, Jaql, Mahout, BigInsights

Required Reading

Additional References

Week 3: Monday Sept. 24th - Databases and Big Data

Databases and Big Data: Persistence, Querying, Indexing, Transactions
Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/paralleldb-vs-hadoop.pdf
In-class exercise (to be distributed in class)

Required Reading

Additional Readings

Week 4: Monday Oct. 1st - Statistics is Easy - Invited Speaker: Dennis Shasha

Guest lecture by Dennis Shasha: Statistics is Easy
Pig Latin and Query Processing:
- Relational query processing: Review
- Query Processing in Pig

Required Reading

http://www.morganclaypool.com/doi/abs/10.2200/S00142ED1V01Y200807MAS001 -- book is available for free for NYU students
Second edition of the book: http://www.morganclaypool.com/doi/pdf/10.2200/S00295ED1V01Y201009MAS008

Homework Assignment

Due October 9th: BigDataHW1

Week 5: Monday Oct. 8th - Finding Similar Items

Similarity: Applications, Measures and Efficiency considerations
- Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/similarity.pdf
Similarity application: Information integration on the Web:
- Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/web-info-integration.pdf
Homework presentation and demo

Required Reading

Mining of Massive Datasets, Chapter 3; information integration; entity resolution

Homework Assignment

Due October 15th at noon. Your assignment is at http://www.newgradiance.com/services. Please see http://vgc.poly.edu/~juliana/courses/cs9223 for instructions on how to access this service.

Week 6: Wednesday Oct. 17th - Invited Speaker: Torsten Suel

Note this class will be held on Wednesday!

Big Data and Information Retrieval. Invited lecture by Torsten Suel.
- Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/search-data.pdf

Readings

Week 7: Monday Oct. 22nd - Invited lecture by and Lauro Lins

Introduction to Visualization
- Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/intro-to-visualization.pdf

Readings

The Value of Visualization. IEEE Visualization 2005. Jarke J. van Wijk. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.1138

Visualization Analysis and Design: Principles, Methods, and Practice. Tamara Munzner (Book Draft 2 from Sep. 2012). http://www.cs.ubc.ca/~tmm/courses/533-11/book/vispmp-draft.pdf

Week 8: Monday Oct. 29th - Class canceled due to storm

Week 9: Monday Nov. 5th - Data infrastructure and information integration

Big Table, HadoopDB.
Similarity application: Information integration on the Web:
- Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/web-info-integration.pdf

Readings

HBase book: HBase: The Definitive Guide. Random Access to Your Planet-Size Data. http://shop.oreilly.com/product/0636920014348.do
HBase book, Chapter 8 (Architecture): covers transactional processing, notably the Write-Ahead Log, and how consistency is maintained.

Week 10: Monday Nov. 12th - Frequent Itemsets

- Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/assoc-rules1.pdf, http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/assoc-rules1.pdf

Readings

Mining of Massive Datasets, Chapter 4

Additional Reading

Mining association rules between sets of items in large databases. Agrawal et al., SIGMOD 1993. http://delivery.acm.org/10.1145/180000/170072/p207-agrawal.pdf?ip=128.238.251.32&acc=ACTIVE%20SERVICE&CFID=198467341&CFTOKEN=23537886&__acm__=1352747519_b80a516e0f5e294b36dc021f13f55bbb
Fast algorithms for mining association rules. Agrawal and Srikant, VLDB 1994. https://www.seas.upenn.edu/~jstoy/cis650/papers/Apriori.pdf
An effective hash-based algorithm for mining association rules. Park et al., SIGMOD 1995. http://dl.acm.org/citation.cfm?id=223813

Week 11: Monday Nov. 19th - Algorithms on MapReduce: text processing

Algorithms, link analysis, social networks
- Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/mapreduce-indexing-graph.pdf
Discussion on the project

Readings

Data-Intensive Text Processing with MapReduce, Chapter 4

Week 12: Monday Nov. 26th - Graph Algorithms and Phase-I project presentations

- Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/mapreduce-indexing-graph.pdf

@@ Line 49: / Line 49: @@
 === Related Topics ===
 * BigTables and NoSQL stores. Tuple store vs. column stores: [http://hbase.apache.org/ HBase], [http://www.mongodb.org/ MongoDB], [http://cassandra.apache.org/ Cassandra]
-* Transactions in NoSQL stores. Google's percolator.
+* Transactions in NoSQL stores. Google's percolator, [http://research.google.com/pubs/pub36726.html].
 * "NewSQL" stores: more on [http://hive.apache.org/ Hive], [http://voltdb.com/ VoltDB], [http://db.cs.yale.edu/hadoopdb/hadoopdb.html HadoopDB],
 * Beyond MapReduce: [http://spark-project.org/ Berkeley's Spark], [http://asterix.ics.uci.edu/ UC Irvine's Asterix], Google's [http://code.google.com/p/dremel/ Dremel]

Week 13: Monday Dec. 3rd - Clustering

Readings

Week 14: Monday Dec. 10th - EM algorithms for text processing

Readings

Week 15: Monday Dec. 17th - Phase-II project presentations

Further Readings

Other topics

Provenance
