Difference between revisions of "CS6093/Lectures"

From VistrailsWiki
'''''Make sure to check my.poly.edu for course announcements'''''
'''''Every week, you must write position papers for the papers in the Required Readings list'''''
== Week 1 - Jan 24 ==


== Week 3 - Feb 7 ==


* Information integration overview
* Information extraction: survey
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/information-extraction.pdf
 
=== Announcements ===
 
* The topic winners were: Information Extraction, Deep Web, Relational Data on the Web, Web Schema Matching, NoSQL DB, Provenance in DB, Graph Indexing, Usable query interfaces
 
* I will email you preliminary assignments tomorrow


* Information extraction
=== Assignment ===
* Write a position paper for the article: ONDUX: on-demand unsupervised learning for information extraction


===Readings ===


* [http://www.sigmod.org/publications/sigmod-record/0206/laender-survey.pdf A Brief Survey of Web Data Extraction Tools.] Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran Soares da Silva, Juliana S. Teixeira: SIGMOD Record 31(2): 84-93 (2002)
* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/cortez-sigmod2010.pdf ONDUX: on-demand unsupervised learning for information extraction.] Eli Cortez, Altigran Soares da Silva, Marcos André Gonçalves, Edleno Silva de Moura:  SIGMOD Conference 2010: 807-818


Some history and perspective:


== Week 4 - Feb 14 ==  
* Provenance and Databases
* Graph Indexing
=== Assignment ===
* Write 2 position papers, one for each of the articles in the required reading for this week (see below)
=== Required Reading ===
* Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330 http://db.cis.upenn.edu/DL/whywhere.pdf
** Presenter: Fernando Seabra [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/WhyWherePresentation.pdf Presentation]
** Rebuttal: Joe Miller (tentative)
* [http://www.vldb.org/conf/2007/papers/research/p938-zhao.pdf Graph Indexing: Tree + Delta >= Graph] P. Zhao, J. X. Yu, and P. S. Yu.  VLDB 2007.
** Presenter: Nivan Ferreira
** Rebuttal: Sergey Nepomnyachiy
===Additional Suggested Reading===
* A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.
http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf
* Total Recall | Oracle Database
http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf
* [http://www.soe.ucsc.edu/~wctan/papers/2007/pdb-ieee.pdf Provenance in Databases: Past, Current, and Future] W. Tan. IEEE Data Engineering Bulletin.
* [http://www.cs.ucsb.edu/~dbl/papers/he_icde_2006.pdf Closure-Tree: An Index Structure for Graph Queries] H. He and A. K. Singh.  ICDE 2006.
* Answering pattern match queries in large graph databases via graph embedding
Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011
* Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf
* [http://www.cs.nyu.edu/shasha/papers/graphgrep/pods2002.pdf Algorithmics and Applications of Tree and Graph Searching] D. Shasha, J. T. L. Wang, and R. Giugno.  PODS 2002.


== Week 5 - Feb 21 ==  
* NoSQL databases
=== Assignment ===
* Write position papers for the required papers
===Required Reading ===
* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/dean-cacm2008.pdf MapReduce: simplified data processing on large clusters] Jeffrey Dean and  Sanjay Ghemawat, CACM 2008
* Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
**Presenters: Dmitriy Gromov [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf Presentation], Xiang Liu
**Rebuttal: Fernando Seabra,  Shoshana Gottesman
=== Additional suggested reading ===
* Debate between MR and DB people:
**http://cacm.acm.org/magazines/2010/1/55743-mapreduce-and-parallel-dbmss-friends-or-foes/fulltext
**http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext
* http://www.computerworld.com/s/article/9224180/What_s_the_big_deal_about_Hadoop_
* [http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2010/SQLvsNoSQLDatabases.pdf SQL databases v. NoSQL databases.] Michael Stonebraker, CACM 2010.
* [http://www.christof-strauch.de/nosqldbs.pdf NoSQL Databases.] Christof Strauch. 2010.
For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics


== Week 6 - Feb 28 ==


[http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/intro-to-visualization.pdf Introduction to Visualization.]  Lecture will be given by Professors Claudio Silva and Lauro Lins
 
There will be no assignment this week, but I plan to give you a quiz on visualization next week.
 
=== Suggested Reading ===
 
Visualization. Tamara Munzner. Chapter 27, p 675-707, of Fundamentals of Computer Graphics, Third Edition
http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf
 
Lecture notes. Claudio Silva
http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf


== Week 7 - March 6 ==


* NoSQL Databases
=== Assignment ===
* Write position papers for the required papers
===Required Reading ===
* [http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.] Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, Alex Rasin. VLDB 2009.
* [http://cs-www.cs.yale.edu/homes/dna/papers/split-execution-hadoopdb.pdf Efficient Processing of Data Warehousing Queries in a Split Execution Environment.] Bajda-Pawlikowski et al., SIGMOD 2011
* [http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf Pig Latin: a not-so-foreign language for data processing.] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
** Presenters: Julie Odongo, Majed Hakami [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/majed-hadoopdb.pdf Presentation], Yuan Ding
** Rebuttal:  Nivan Ferreira, Dmitriy Gromov, Juliana Freire
For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics


== Week 8 - March 13 ==


== Week 10 - March 27 ==
* Web information integration
=== Assignment ===
* Write position papers for the required papers
===Required Reading ===
* [http://pages.cs.wisc.edu/~anhai/papers/imap.pdf iMAP: Discovering Complex Semantic Matches between Database Schemas.] R. Dhamankar, Y. Lee, A. Doan, A. Halevy, and P. Domingos. SIGMOD 2004.
* [http://portal.acm.org/citation.cfm?id=1132863.1132872&coll=GUIDE&dl=GUIDE Automatic complex schema matching across Web query interfaces] Bin He, Kevin Chuan Chang, ACM Trans. Database Syst. 2006
** Presenters: Joe Miller, Vineet Meghani
** Rebuttal:  Yuan Ding,  Chunqing Jiang
=== Additional Reading ===
* [http://portal.acm.org/citation.cfm?id=767154 A survey of approaches to automatic schema matching] Erhard Rahm and Philip A. Bernstein, VLDB Journal 2001


== Week 11 - April 3 ==
* Wikipedia
=== Assignment ===
* Write position papers for the required papers
===Required Reading ===
*  [http://www.cs.washington.edu/homes/weld/papers/adar-wsdm09.pdf  Information Arbitrage in Multi-Lingual Wikipedia.] Adar, E. and Skinner, M. and Weld, D., Second ACM International Conference on Web Search and Data Mining (WSDM'09)
* [http://suchanek.name/work/publications/www2007.pdf Yago - A Core of Semantic Knowledge.] Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. 16th International World Wide Web Conference (WWW 2007)
** Presenters: Sergey Nepomnyachiy, Shoshana Gottesman, Haibo Zeng
** Rebuttal: Wei Jiang,  Juliana Freire, Majed Hakami
=== Additional Reading ===
* [http://www.mpi-inf.mpg.de/yago-naga/yago/publications/YAGO-NAGA-Appr.pdf The YAGO-NAGA Approach to Knowledge Discovery] Gjergji Kasneci, Fabian M. Suchanek, Maya Ramanath, Gerhard Weikum SIGMOD Record 37:4, December 2008
* [http://vgc.poly.edu/~juliana/pub/wikimatch-vldb2012.pdf Multilingual Schema Matching for Wikipedia Infoboxes] Nguyen et al., VLDB 2012


== Week 12 - April 10 ==
* Information extraction
=== Assignment ===
* Write position papers for the required papers
===Required Reading ===
* [http://vgc.poly.edu/~juliana/courses/cs6093/Readings/bizer-web-sem2009..pdf DBpedia - A crystallization point for the Web of Data] Bizer et al., Web Semantics 2009.
* [http://pages.cs.wisc.edu/~anhai/papers/delex-sigmod09.pdf  Optimizing Complex Extraction Programs over Evolving Text Data.] F. Chen, B. Gao, A. Doan, J. Yang, R. Ramakrishnan. SIGMOD 2009
* [http://pages.cs.wisc.edu/~anhai/papers/ie-provenance-vldb08.pdf On the Provenance of Non-Answers to Queries over Extracted Data]. Huang et al, VLDB 2008
** Presenters: Haibo Zeng, Chunqing Jiang, Bhaktavatsalam Nallanthighal
** Rebuttal:  Majed Hakami, Xiang Liu,  May Thazin
=== Additional Reading ===
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction From Wikipedia:  Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld
* [http://www.it.iitb.ac.in/~sunita/papers/ieSurvey.pdf Information extraction] Sunita Sarawagi.  FnT Databases, 1(3), 2008.
* [http://pages.cs.wisc.edu/~anhai/papers/spec-issue-intro-sigmodrec08.pdf Introduction to the Special Issue on Managing Information Extraction] Doan et al., SIGMOD Record 2008.


== Week 13 - April 17 ==
* Twitter and News: finding entities and trends
=== Assignment ===
* Write position papers for the required papers
===Required Reading ===
* [http://vgc.poly.edu/wiki/vgc/index.php/File:D11-1141.pdf Named Entity Recognition in Tweets: An Experimental Study.] EMNLP 2011
* [http://vgc.poly.edu/wiki/vgc/index.php/File:NerTwitter.pdf Recognizing Named Entities in Tweets]  ACL 2011
* [http://vgc.poly.edu/wiki/vgc/index.php/File:TrackingTrends.pdf Tracking Trends: Incorporating Term Volume into Temporal Topic Models.] KDD 2011
** Presenters:  Fernando Seabra, Wei Jiang, Nivan Ferreira
** Rebuttal:  Juliana Freire, Bhaktavatsalam Nallanthighal,  Julie Ondongo
=== Additional reading ===
* [http://www.www2011india.com/proceeding/proceedings/p267.pdf Unified Analysis of Streaming News] WWW 2011
*  [http://www.cs.ust.hk/~qyang/Docs/2011/cikm-short-text.pdf Transferring Topical Knowledge from Auxiliary Long Texts for Short Text Clustering] CIKM 2011


== Week 14 - April 24 ==
* Keyword queries over relational data
=== Assignment ===
* Write position papers for the required papers
===Required Reading ===
* [http://pages.cs.wisc.edu/~anhai/papers/scalable-kws-vldb10.pdf Toward Scalable Keyword Search over Relational Data] Baid et al., VLDB 2010
* [http://www.vldb.org/conf/2002/S33P11.pdf BANKS: Browsing and Keyword Searching in Relational Databases] Aditya et al., VLDB 2002
* [http://pages.cs.wisc.edu/~anhai/papers/ie-provenance-vldb08.pdf On the Provenance of Non-Answers to Queries over Extracted Data]. Huang et al, VLDB 2008
** Presenters:  May Thazin,  Tehila Minkus, Bhaktavatsalam Nallanthighal
** Rebuttal:  Tehila Minkus, Vineet Meghani, May Thazin


== Week 15 - May 1 ==
Project presentation

Latest revision as of 19:55, 24 April 2012

Make sure to check my.poly.edu for course announcements

Every week, you must write position papers for the papers in the Required Readings list

Week 1 - Jan 24

  • Course overview (First day of classes!)

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/lecture1.pdf

  • Provenance and Workflows

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf

Readings

  • Querying and Creating Visualizations by Analogy. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.

Week 2 - Jan 31

  • Provenance and Workflows (cont.)

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf

  • Discussion about literature search

Readings

same as last week

Week 3 - Feb 7

  • Information extraction: survey

http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/information-extraction.pdf

Announcements

  • The topic winners were: Information Extraction, Deep Web, Relational Data on the Web, Web Schema Matching, NoSQL DB, Provenance in DB, Graph Indexing, Usable query interfaces
  • I will email you preliminary assignments tomorrow

Assignment

  • Write a position paper for the article: ONDUX: on-demand unsupervised learning for information extraction

Readings

Some history and perspective:

Week 4 - Feb 14

  • Provenance and Databases
  • Graph Indexing

Assignment

  • Write 2 position papers, one for each of the articles in the required reading for this week (see below)


Required Reading

Additional Suggested Reading

  • A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.

http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf

  • Total Recall | Oracle Database

http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf

  • Answering pattern match queries in large graph databases via graph embedding

Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011

  • Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf

Week 5 - Feb 21

  • NoSQL databases

Assignment

  • Write position papers for the required papers

Required Reading

  • Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf

    • Presenters: Dmitriy Gromov [http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf Presentation], Xiang Liu
    • Rebuttal: Fernando Seabra, Shoshana Gottesman

Additional suggested reading

For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics

Week 6 - Feb 28

Introduction to Visualization. Lecture will be given by Professors Claudio Silva and Lauro Lins

There will be no assignment this week, but I plan to give you a quiz on visualization next week.

Suggested Reading

Visualization. Tamara Munzner. Chapter 27, p 675-707, of Fundamentals of Computer Graphics, Third Edition http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf

Lecture notes. Claudio Silva http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf

Week 7 - March 6

  • NoSQL Databases

Assignment

  • Write position papers for the required papers

Required Reading

    • Presenters: Julie Odongo, Majed Hakami (Presentation), Yuan Ding
    • Rebuttal: Nivan Ferreira, Dmitriy Gromov, Juliana Freire

For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics

Week 8 - March 13

Spring break - no class

Week 9 - March 20

TBD

Week 10 - March 27

  • Web information integration

Assignment

  • Write position papers for the required papers

Required Reading

Additional Reading

Week 11 - April 3

  • Wikipedia

Assignment

  • Write position papers for the required papers

Required Reading

Additional Reading

Week 12 - April 10

  • Information extraction

Assignment

  • Write position papers for the required papers

Required Reading

Additional Reading

Week 13 - April 17

  • Twitter and News: finding entities and trends

Assignment

  • Write position papers for the required papers

Required Reading

Additional reading

Week 14 - April 24

  • Keyword queries over relational data

Assignment

  • Write position papers for the required papers

Required Reading

Week 15 - May 1

Project presentation