Difference between revisions of "Course: Massive Data Analysis 2014"

 
(51 intermediate revisions by 2 users not shown)
Line 1: Line 1:
= CS-GY 6333 Massive Data Analysis: Tentative Schedule -- ''subject to change'' =


* Course Web page: http://cs.nyu.edu/courses/spring14/CSCI-GA.2568-001/index.html
* Course Web page: http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/


* Instructor: Professor Juliana Freire (http://vgc.poly.edu/~juliana/)
Line 9: Line 9:
= News =


* The final exam will take place on May 12th.
* [[Massive Data Analysis 2014: Class project]]
* Aditi Nakta, our TA, will hold office hours on Tuesdays from 1 - 3 pm @ 2 MTC room 10.98D
* Your Gradiance assignment on MapReduce has been posted:  http://www.newgradiance.com/services. If you haven't registered yet, do so and use the class token 1AEF5F24. Make sure to use your official NYU email and id when you register.
* On Sept 22nd, I distributed AWS tokens that will be needed for your assignments. If you have not received your token, let me know.
* Your first assignment has been posted -- see details below and in NYU Classes.
* Instructions on how to set up your AWS account: http://www.vistrails.org/index.php/AWS_Setup
* You should get an NYU HPC account so that you can use the NYU Hadoop cluster. To submit a request for an account, follow the instructions in: https://wikis.nyu.edu/display/NYUHPC/HPC+at+NYU+-+Access. You can find instructions on how to log in to and use the NYU Hadoop cluster at: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/MapReduceExample/readme-nyu-hadoop.txt


* We will have our last class on May 19th.
= Background (4 weeks) =


* 4/21/2014: There are two new quizzes on Gradiance. They are due on 2014-04-28 23:59 PST.
== Week 1 -- Sept 8: Course Overview; the evolution of Data Management ==


* Homework assignment 4 has been posted: [[Assignment 4 - Querying with Pig and Mapreduce]]
* Lecture notes: http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/course-overview.pdf (http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/course-overview-6p.pdf)
* Reading: Chapter 1 of Mining of Massive Data Sets (version 1.1)
* Course survey: https://docs.google.com/spreadsheet/embeddedform?formkey=dFpwTjROVzhLUWY2NVNXb0xvNTVLMnc6MA


* Homework assignment 3 has been posted: [[Assignment 3 - MapReduce algorithm design]]
== Week 2 -- Sept 15: Provenance and Reproducibility ==
** You can find instructions on how to log into the NYU Hadoop cluster at: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/MapReduceExample/readme-nyu-hadoop.txt
** I have created a list of frequently-asked questions which I hope will help you with your assignment: [[Assignment 3 - FAQ]]


* Your first assignment has been posted and it is due on Feb 17, 2014 5:00 pm. Here are the instructions: http://vistrails.org/index.php/Assignment_1_-_Data_Exploration
* Lecture notes: http://vgc.poly.edu/~fchirigati/mda-class/provenance-reproducibility.pdf
* The class will have a lab component. Please bring your laptops.
* Before class, follow the instructions below to install and set up VisTrails as well as github


* I have sent a test email to the class list. If you have not received the message, make sure to sign up: http://www.cs.nyu.edu/mailman/listinfo/csci_ga_2568_001_sp14
* VisTrails setup:
** Download VisTrails 2.1.4 from http://www.vistrails.org/index.php/Downloads and follow the installation instructions. Start the system and then quit.
** Download the following packages:
***http://vgc.poly.edu/~fchirigati/mda-class/gmaps.zip
***http://vgc.poly.edu/~fchirigati/mda-class/tabledata-backport.zip
** After you extract the content of the zip files, place them under $HOME/.vistrails/userpackages
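
The package step above can also be scripted. The following is a minimal sketch, not part of the official setup instructions: it assumes both zip files have already been downloaded, and that extracting them directly into the userpackages folder produces the layout VisTrails expects. The helper name <code>install_userpackages</code> is hypothetical.

```python
import zipfile
from pathlib import Path


def install_userpackages(zip_paths, userpackages_dir=None):
    """Extract each downloaded zip archive into the VisTrails userpackages folder.

    Returns the list of archive entries that were extracted.
    """
    # Default location taken from the setup step: $HOME/.vistrails/userpackages
    if userpackages_dir is None:
        userpackages_dir = Path.home() / ".vistrails" / "userpackages"
    dest = Path(userpackages_dir)
    # Create the folder if VisTrails has not created it yet.
    dest.mkdir(parents=True, exist_ok=True)

    extracted = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            # Assumption: the archive contents can be placed as-is under dest.
            archive.extractall(dest)
            extracted.extend(archive.namelist())
    return extracted
```

For example, calling <code>install_userpackages(["gmaps.zip", "tabledata-backport.zip"])</code> from the download folder would extract both archives under <code>$HOME/.vistrails/userpackages</code>. If the extracted folders are not picked up by VisTrails, follow the manual instructions instead.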


* Starting on Feb 10th, our class will meet at a new location: Cantor 101
* Github setup:
** Create a github account (https://github.com/join)
** Learn how to set up git and create a public repository.


* We will have lab on Thu at CIWW, room 109. ''Bring your laptop!''
* During class, you will add the trail of your analysis to github, and submit the link to your public github repo using this form: https://docs.google.com/forms/d/17OScN8Ea-El20AC4mHIb32S3e62mAbGEiU-BET0PyX8/viewform?usp=send_form


= Background (4 weeks) =
== Week 3 -- Sept 22: Introduction to Databases; Relational Model and SQL ==
 
* Lecture notes:   
== Week 1 -- Jan 27: Course Overview; the evolution of Data Management ==
**http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/intro-to-db.pdf
 
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/relational-algebra.pdf
* Lecture notes:  http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/course-overview.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/sql-intro.pdf
* Reading: Chapter 1 of Mining of Massive Data Sets (version 1.1)
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/sql-more.pdf
* Course survey: https://docs.google.com/spreadsheet/embeddedform?formkey=dDRoTVcyMnRQUXhFUjl0cFFuTEVER1E6MA
 
 
== Week 2 -- Feb 3: Introduction to Databases ==
* Lecture notes:  http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/intro-to-db.pdf
* Other useful reading:
** [http://philip.greenspun.com/sql/introduction.html Greenspun's SQL for Web Nerds Intro]
** [http://philip.greenspun.com/sql/data-modeling.html SQL/Nerds Modeling (parts)]


* Feb 6: Lab: Data Exploration and Reproducibility
* [[Assignment 1: Provenance and Data Exploration]]
** [[Lab notes 02/06/14]]


* Homework assignment: [[Assignment 1 - Data Exploration]]
== Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization  ==


== Week 3 -- Feb 10: Overview: Relational Model and SQL  ==
* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/relational-algebra.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/xml_schema_query.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/sql-intro.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/query-opt.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/sql-more.pdf
* Other useful reading:
** [http://philip.greenspun.com/sql/introduction.html Greenspun's SQL for Web Nerds Intro]
** [http://philip.greenspun.com/sql/data-modeling.html SQL/Nerds Modeling (parts)]


* Feb 13: Lab: Canceled -- University closed due to snow
* In-class exercise: http://vistrails.org/index.php/Big_Data_Lab_SQL


= Big Data Foundations and Infrastructure (3 weeks) =


== Week 3.1 -- Feb 17: Holiday ==
== Week 5 -- Oct 6: Cloud computing, Map Reduce and Hadoop ==
* No class, holiday
* Lecture notes:
* Feb 20 Lab: hands-on SQL
** http://vgc.poly.edu/~fchirigati/mda-class/mapreduce-intro.pdf
** [[Big Data Lab notes 02/19/14]]


== Week 4 -- Feb 24: Overview: Advanced SQL and Query Optimization  ==
* Lab: after the lecture, you will work on an in-class exercise. For this you need to install Hadoop on your laptop and have your account set up on AWS. See instructions below.
 
* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/xml_schema_query.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/query-opt.pdf


* Homework assignment: [[Assignment 2 - Data Exploration using SQL]]
* You will use two different Hadoop configurations:
** Local (on your laptop)
<!--** NYU HPC will provide accounts so that you can use a local Hadoop cluster. Please submit a request to have an account created for you *ASAP*. Follow the instructions to obtain an HPC account in: https://wikis.nyu.edu/display/NYUHPC/HPC+at+NYU+-+Access. You can find instructions on how to log in to and use the NYU Hadoop cluster at: http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/MapReduceExample/readme-nyu-hadoop.txt
** Amazon AWS: each student will receive a token with $100 credit towards computing time at AWS. See http://www.vistrails.org/index.php/AWS_Setup for instructions on how to set up AWS. '''Always remember to terminate your instances! If you don't you will be charged and you are responsible for the charges beyond your credit.'''-->
** Amazon AWS: Each student should have received a token with $100 credit towards computing time at AWS. If you have not received the token yet, contact us immediately! '''When using AWS, always remember to terminate your instances! If you don't, you will be charged and you are responsible for the charges beyond your credit.'''
** See the instructions for installing Hadoop on your local machine and setting up your AWS account in http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/HadoopExerciseInstructions.pdf
** '''Warning: Install Hadoop on your machine and set up your AWS account before class starts. There will be no time for installing software during our in-class exercise.'''


= Big Data Foundations and Infrastructure (2 weeks) =
* In-Class Exercise: [[Course:_Massive_Data_Analysis_2014/Hadoop_Exercise | Hadoop Exercise]]


== Week 5 -- Mar 3: Cloud computing, Map Reduce and  Hadoop ==
* Lecture notes: 
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/mapreduce-intro.pdf


* Required reading:  
Line 87: Line 90:
** Hadoop: The Definitive Guide.  http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520


* Homework Assignment -- Your first quiz is available on [http://www.newgradiance.com Gradiance]. It is ''due on March 17th at 5pm.''
== Week 6 -- Oct  13: Fall Break ==
 
== Week 7 -- Oct  20: Big Data Analysis with Myria  ==
 
* Lecture notes: 
** http://bigdata.poly.edu/~fchirigati/mda-class/dan-myria.pdf
 
* Useful reading:
** Myria Demo Paper: http://myria.cs.washington.edu/publications/Halperin_Myria_demo_SIGMOD_2014.pdf


== Week 6 -- Mar 10: Algorithm Design for MapReduce  ==
== Week 7 -- Oct  27: Algorithm Design for MapReduce  ==


* Lecture notes:   
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/mapreduce-algo-design.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/mapreduce-algo-design.pdf


* Required reading:  
Line 98: Line 109:
** Mining of Massive Datasets (2nd Edition), Chapter 2.


 
== Week 8 -- Nov 3: Parallel Databases vs MapReduce, Query Processing on Mapreduce and High-level Languages ==
= Machine Learning and Big Data  (3 weeks) =
 
== Week 7 -- Mar 23: Hashing and AllReduce ==
* Invited lecture by John Langford
 
* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/langford_hashing_2014.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/langford_parallel_learning_2014.pdf
** http://cilvr.cs.nyu.edu/diglib/lsml/lecture08-hashing.pdf
** http://cilvr.cs.nyu.edu/diglib/lsml/lecture04-allreduce.pdf
 
* Homework assignment: [[Assignment 3 - MapReduce algorithm design]]
 
== Week 8 -- Mar 30: Bandits ==
* Invited lecture by John Langford


* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/langford_interact.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/paralleldb-vs-hadoop-2014.pdf
** http://cilvr.cs.nyu.edu/diglib/lsml/lecture10_using_exploration.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/data-analysis-mapreduce.pdf
** http://cilvr.cs.nyu.edu/diglib/lsml/lecture10_doing_exploration.pdf
 
== Week 9 -- Apr 7: Large Scale Machine Learning in the Real World ==
* Invited lecture by Leon Bottou
 
* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/bottou-ml-real-world.pdf
** http://cilvr.cs.nyu.edu/diglib/lsml/lecture09-ads-bottou.pdf
** http://cilvr.cs.nyu.edu/diglib/lsml/lecture11-ads-bottou.pdf
 
= Big Data Foundations and Infrastructure -- cont. (2 weeks) =


== Week 10 -- April 14:  Parallel Databases vs MapReduce, Query Processing on Mapreduce and High-level Languages ==
* Discussion about project


* Lecture notes:
* Assignment: check Gradiance!
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/paralleldb-vs-hadoop-2014.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/hive-pig.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/data-analysis-mapreduce.pdf


* Required reading:  
** Data-Intensive Text Processing with MapReduce (Jan 27, 2013), Chapter 6 -- Processing Relational Data (this chapter appears in the 2013 version of the textbook -- I have placed this version in http://vgc.poly.edu/~juliana/courses/BigData2014/Textbooks/MapReduce-algorithms-Jan2013-draft.pdf)
** Data-Intensive Text Processing with MapReduce (Jan 27, 2013), Chapter 6 -- Processing Relational Data (this chapter appears in the 2013 version of the textbook -- I have placed this version in http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Textbooks/MapReduce-algorithms-Jan2013-draft.pdf)
** Benchmark DBMS vs MapReduce (2009): http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf
** MapReduce: A Flexible Data Processing Tool: http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext
Line 147: Line 129:
** Hive - A Warehousing Solution Over a Map-Reduce Framework: http://www.vldb.org/pvldb/2/vldb09-938.pdf


= Big Data Algorithms and Techniques (3 weeks) =
= Big Data Algorithms, Techniques, and Visualization (3 weeks) =


== Week 11 -- April 21: Data Management for Big Data (cont) and Association Rules  ==
== Week 9 -- Nov 10: Visualization and Big Data -- Invited lecture by Dr. Huy Vo (NYU CUSP) ==


* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/association-rules.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/vis_and_big_data_resized.pdf


* Reading: Chapter 6 [http://vgc.poly.edu/~juliana/courses/BigData2014/Textbooks/ullman-book-v1.1-mining-massive-data.pdf Mining of Massive Datasets]


* Homework Assignment -- Your  quiz is available on [http://www.newgradiance.com Gradiance]. It is ''due on April  28th.''
== Week 10 -- Nov 17:  Visualization Techniques -- Invited lecture by Dr. Lauro Lins (AT&T Research) ==
 
* Project status report due!
 
* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/intro-to-visualization.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/nanocubes.pdf
 
* Reading:
** Nanocubes for real-time exploration of spatiotemporal datasets. Lins et al. http://nanocubes.net/assets/pdf/nanocubes_paper.pdf


== Week 12 -- Apr 28: Finding similar items: Invited lecture by Dr. Harish Doraiswami ==
== Week 11 -- Nov 25: Association Rules ==


* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/similarity.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/association-rules.pdf


* Reading: Chapter 3 [http://vgc.poly.edu/~juliana/courses/BigData2014/Textbooks/ullman-book-v1.1-mining-massive-data.pdf Mining of Massive Datasets]
* Assignment on frequent items and association rule mining. ''Due on Dec 7th.''  Check http://www.newgradiance.com/services


* Homework Assignment
* Reading: Chapter 6 [http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Textbooks/ullman-book-v1.1-mining-massive-data.pdf Mining of Massive Datasets]
** There are two new quizzes on [http://www.newgradiance.com Gradiance] -- Distance measures and document similarity. They are ''due on May 5th.''
** Your final assignment is available at http://www.vistrails.org/index.php/Assignment_4_-_Querying_with_Pig_and_Mapreduce. This is an optional assignment and will count towards extra credit


== Week 13 -- May 5: Graph Analysis and Exam Review ==
* Suggested additional reading:
**Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
**Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
**Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html
 
== Week 12 -- Dec 1: Project Updates  ==


* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/graph-algos.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/similarity.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/exam-review.pdf


== Week 14 -- May 12: Final Exam  ==
* Reading: Chapter 3 [http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Textbooks/ullman-book-v1.1-mining-massive-data.pdf Mining of Massive Datasets]


* Quizzes on Distance Measures and Document Similarity. ''These quizzes are optional and will count as extra credit. Due on Dec 14th.''  Check http://www.newgradiance.com/services


== Week 15 -- May 19: Large-Scale Visualization -- Invited lecture by Dr. Lauro Lins (AT&T Research) ==
== Week 13 -- Dec 8: Finding Similar Items and Link Analysis ==


* Lecture notes:
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/intro-to-visualization.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/similarity.pdf
** http://vgc.poly.edu/~juliana/courses/BigData2014/Lectures/nanocubes.pdf
** http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/graph-algos.pdf


* Reading:  
* Readings:  
**Chapter 3 (pages 55-79) [http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Textbooks/ullman-book-v1.1-mining-massive-data.pdf Mining of Massive Datasets]
**Chapter 5 (pages 87-106) [http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Textbooks/MapReduce-algorithms-Jan2013-draft.pdf Data-Intensive Text Processing with MapReduce]
 
== Week 13 -- Dec 10: Project Discussion ==
 
* Meeting with individual groups at 2 MTC, 10.097


The Value of Visualization, Jarke Van Wijk
== Week 14 -- Dec 15: Project Presentations  ==
http://www.win.tue.nl/~vanwijk/vov.pdf


Tamara Munzner's Book draft 2 available online
http://www.cs.ubc.ca/~tmm/courses/533/book/


Nanocubes Paper
<!--== Week 15 -- Dec 15: Project Presentations ==-->
http://nanocubes.net
http://nanocubes.net/assets/pdf/nanocubes_paper_preprint.pdf

Latest revision as of 20:58, 8 December 2014

CS-GY 6333 Massive Data Analysis: Tentative Schedule -- subject to change

  • Lecture: Mondays, 1:00pm-3:25pm at 2MTC, room 9.011.

News

Background (4 weeks)

Week 1 -- Sept 8: Course Overview; the evolution of Data Management

Week 2 -- Sept 15: Provenance and Reproducibility

  • Github setup:

Week 3 -- Sept 22: Introduction to Databases; Relational Model and SQL

Week 4 -- Sept 29: Overview: Advanced SQL and Query Optimization

Big Data Foundations and Infrastructure (3 weeks)

Week 5 -- Oct 6: Cloud computing, Map Reduce and Hadoop

  • Lab: after the lecture, you will work on an in-class exercise. For this you need to install Hadoop on your laptop and have your account set up on AWS. See instructions below.
  • You will use two different Hadoop configurations:
    • Local (on your laptop)
    • Amazon AWS: Each student should have received a token with $100 credit towards computing time at AWS. If you have not received the token yet, contact us immediately! When using AWS, always remember to terminate your instances! If you don't, you will be charged and you are responsible for the charges beyond your credit.
    • See the instructions for installing Hadoop on your local machine and setting up your AWS account in http://vgc.poly.edu/~juliana/courses/MassiveDataAnalysis2014/Lectures/HadoopExerciseInstructions.pdf
    • Warning: Install Hadoop on your machine and set up your AWS account before class starts. There will be no time for installing software during our in-class exercise.


  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2 - 2.1 and 2.2 (Large-Scale File Systems and Map-Reduce).

Week 6 -- Oct 13: Fall Break

Week 7 -- Oct 20: Big Data Analysis with Myria

Week 7 -- Oct 27: Algorithm Design for MapReduce

  • Required reading:
    • Data-Intensive Text Processing with MapReduce, Chapters 1 and 2
    • Mining of Massive Datasets (2nd Edition), Chapter 2.

Week 8 -- Nov 3: Parallel Databases vs MapReduce, Query Processing on Mapreduce and High-level Languages

  • Discussion about project
  • Assignment: check Gradiance!


Big Data Algorithms, Techniques, and Visualization (3 weeks)

Week 9 -- Nov 10: Visualization and Big Data -- Invited lecture by Dr. Huy Vo (NYU CUSP)


Week 10 -- Nov 17: Visualization Techniques -- Invited lecture by Dr. Lauro Lins (AT&T Research)

  • Project status report due!

Week 11 -- Nov 25: Association Rules

  • Suggested additional reading:
    • Fast algorithms for mining association rules, Agrawal and Srikant, VLDB 1994.
    • Data Mining Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann
    • Dynamic Itemset Counting and Implication Rules for Market Basket Data. Brin et al., SIGMOD 1997. http://www-db.stanford.edu/~sergey/dic.html

Week 12 -- Dec 1: Project Updates

Week 13 -- Dec 8: Finding Similar Items and Link Analysis

Week 13 -- Dec 10: Project Discussion

  • Meeting with individual groups at 2 MTC, 10.097

Week 14 -- Dec 15: Project Presentations