Difference between revisions of "Course: Big Data Analysis"

From VistrailsWiki


== Week 5: Monday Oct. 8th - Finding Similar Items ==
* Overview of information integration
* Similarity: Applications, Measures and Efficiency considerations
* Provenance and data exploration
** Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/similarity.pdf
* Similarity application: Information integration on the Web:
** Lecture notes: http://vgc.poly.edu/~juliana/courses/cs9223/Lectures/web-info-integration.pdf


=== Required Reading ===
* [http://infolab.stanford.edu/~ullman/mmds/ch3.pdf Mining of Massive Datasets, chapter 3; information integration; entity resolution]
* [http://vgc.poly.edu/~juliana/pub/vistrails-reproducibility2012.pdf Making Computations and Publications Reproducible with VisTrails]
Juliana Freire and Claudio Silva. In Computing in Science and Engineering 14(4): 18-25, 2012.
* [http://vgc.poly.edu/~juliana/pub/freire-cise2008.pdf Provenance for Computational Tasks: A Survey]
Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.


== Week 6: Wednesday Oct. 17th - Invited Speaker: Torsten Suel ==

Revision as of 21:57, 7 October 2012

This schedule is tentative and subject to change

Make sure to check my.poly.edu for course announcements

Week 1: Monday Sept. 10th - Course Overview

Required Reading

Additional References

Week 2: Monday Sept. 17th - Map-Reduce

Required Reading

Additional References

Week 3: Monday Sept. 24th - Databases and Big Data

Related Topics

Required Reading

Additional Readings

Week 4: Monday Oct. 1st - Statistics is easy - Invited Speaker: Dennis Shasha

Required Reading

Homework Assignment

Due October 9th BigDataHW1

Week 5: Monday Oct. 8th - Finding Similar Items

Required Reading

Week 6: Wednesday Oct. 17th - Invited Speaker: Torsten Suel

Note this class will be held on Wednesday!

  • Big Data and Information Retrieval. Invited lecture by Torsten Suel.

Readings

Week 7: Monday Oct. 22nd - Invited Speakers: Claudio Silva and Lauro Lins

  • Introduction to Visualization; Data stewardship and provenance
  • Guest lecture by Claudio Silva and Lauro Lins

Readings

  • Hellerstein (ask Claudio for additional references)
  • ADD: provenance and reproducibility

Week 8: Monday Oct. 29th - Graph Analysis

  • Graph algorithms, link analysis, social networks

Readings

  • Data-Intensive Text Processing with MapReduce, Chapter 4


Week 9: Monday Nov. 5th - Frequent Itemsets

Readings

  • Mining of Massive Datasets, Chapter 6


Week 10: Monday Nov. 12th - Mining Data Streams

Readings

  • Mining of Massive Datasets, Chapter 4


Week 11: Monday Nov. 19th - Clustering

Readings

  • Mining of Massive Datasets, Chapter 7

Week 12: Monday Nov. 26th - Recommendation Systems

Readings

  • Mining of Massive Datasets, Chapter 9

Week 13: Monday Dec. 3rd - EM algorithms for text processing

  • Data-Intensive Text Processing with MapReduce, Chapter 6

Week 14: Monday Dec. 10th - Project presentation

Further Readings