Difference between revisions of "CS6093/Lectures"
Latest revision as of 19:55, 24 April 2012
Make sure to check my.poly.edu for course announcements
Every week, you must write position papers for the papers in the Required Readings list
Week 1 - Jan 24
- Course overview (First day of classes!)
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/lecture1.pdf
- Provenance and Workflows
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf
Readings
- Provenance and Scientific Workflows: Challenges and Opportunities Susan Davidson and Juliana Freire. In Proceedings of ACM SIGMOD International Conference on Management of Data, 2008. Tutorial resources
- Provenance for Computational Tasks: A Survey Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.
- Querying and Creating Visualizations by Analogy. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.
Week 2 - Jan 31
- Provenance and Workflows (cont.)
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf
- Discussion about literature search
Readings
same as last week
Week 3 - Feb 7
- Information extraction: survey
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/information-extraction.pdf
Announcements
- The topic winners were: Information Extraction, Deep Web, Relational Data on the Web, Web Schema Matching, NoSQL DB, Provenance in DB, Graph Indexing, Usable query interfaces
- I will email you the preliminary assignments tomorrow
Assignment
- Write a position paper for the article: ONDUX: on-demand unsupervised learning for information extraction
Readings
- A survey of approaches to automatic schema matching. Erhard Rahm and Philip A. Bernstein, VLDB Journal 2001
- A Brief Survey of Web Data Extraction Tools. Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran Soares da Silva, Juliana S. Teixeira: SIGMOD Record 31(2): 84-93 (2002)
- ONDUX: on-demand unsupervised learning for information extraction. Eli Cortez, Altigran Soares da Silva, Marcos André Gonçalves, Edleno Silva de Moura: SIGMOD Conference 2010: 807-818
Some history and perspective:
- Data integration: the teenage years. A. Halevy, A. Rajaraman, J. Ordille. VLDB 2006.
- Generic Schema Matching, Ten Years Later. Philip A. Bernstein, Jayant Madhavan, Erhard Rahm: PVLDB 4(11): 695-701 (2011)
Week 4 - Feb 14
- Provenance and Databases
- Graph Indexing
Assignment
- Write 2 position papers, one for each of the articles in the required reading for this week (see below)
Required Reading
- Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330 http://db.cis.upenn.edu/DL/whywhere.pdf
- Presenter: Fernando Seabra (Presentation: http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/WhyWherePresentation.pdf)
- Rebuttal: Joe Miller (tentative)
- Graph Indexing: Tree + Delta >= Graph P. Zhao, J. X. Yu, and P. S. Yu. VLDB 2007.
- Presenter: Nivan Ferreira
- Rebuttal: Sergey Nepomnyachiy
Additional Suggested Reading
- A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.
http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf
- Total Recall | Oracle Database
http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf
- Provenance in Databases: Past, Current, and Future W. Tan. IEEE Data Engineering Bulletin.
- Closure-Tree: An Index Structure for Graph Queries H. He and A. K. Singh. ICDE 2006.
- Answering pattern match queries in large graph databases via graph embedding
Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011
- Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf
- Algorithmics and Applications of Tree and Graph Searching D. Shasha, J. T. L. Wang, and R. Giugno. PODS 2002.
Week 5 - Feb 21
- NoSQL databases
Assignment
- Write a position paper for each of the required papers
Required Reading
- MapReduce: simplified data processing on large clusters Jeffrey Dean and Sanjay Ghemawat, CACM 2008
- Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
- Presenters: Dmitriy Gromov (Presentation: http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf), Xiang Liu
- Rebuttal: Fernando Seabra, Shoshana Gottesman
Additional suggested reading
- Debate between MR and DB people:
- SQL databases v. NoSQL databases. Michael Stonebraker, CACM 2010.
- NoSQL Databases. Christof Strauch. 2010.
For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics
Week 6 - Feb 28
Introduction to Visualization. The lecture will be given by Professors Claudio Silva and Lauro Lins.
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/intro-to-visualization.pdf
There will be no assignment this week, but I plan to give you a quiz on visualization next week.
Suggested Reading
Visualization. Tamara Munzner. Chapter 27, pp. 675-707, of Fundamentals of Computer Graphics, Third Edition http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf
Lecture notes. Claudio Silva http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf
Week 7 - March 6
- NoSQL Databases
Assignment
- Write a position paper for each of the required papers
Required Reading
- HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, Alex Rasin. VLDB 2009.
- Efficient Processing of Data Warehousing Queries in a Split Execution Environment. Bajda-Pawlikowski et al., SIGMOD 2011
- Pig Latin: a not-so-foreign language for data processing. C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
- Presenters: Julie Odongo, Majed Hakami (Presentation: http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/majed-hadoopdb.pdf), Yuan Ding
- Rebuttal: Nivan Ferreira, Dmitriy Gromov, Juliana Freire
For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics
Week 8 - March 13
Spring break - no class
Week 9 - March 20
TBD
Week 10 - March 27
- Web information integration
Assignment
- Write a position paper for each of the required papers
Required Reading
- iMAP: Discovering Complex Semantic Matches between Database Schemas. R. Dhamankar, Y. Lee, A. Doan, A. Halevy, and P. Domingos. SIGMOD 2004.
- Automatic complex schema matching across Web query interfaces Bin He, Kevin Chuan Chang, ACM Trans. Database Syst. 2006
- Presenters: Joe Miller, Vineet Meghani
- Rebuttal: Yuan Ding, Chunqing Jiang
Additional Reading
- A survey of approaches to automatic schema matching. Erhard Rahm and Philip A. Bernstein, VLDB Journal 2001
Week 11 - April 3
- Wikipedia
Assignment
- Write a position paper for each of the required papers
Required Reading
- Information Arbitrage in Multi-Lingual Wikipedia. Adar, E. and Skinner, M. and Weld, D., Second ACM International Conference on Web Search and Data Mining (WSDM'09)
- Yago - A Core of Semantic Knowledge. Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. 16th international World Wide Web conference (WWW 2007)
- Presenters: Sergey Nepomnyachiy, Shoshana Gottesman, Haibo Zeng
- Rebuttal: Wei Jiang, Juliana Freire, Majed Hakami
Additional Reading
- The YAGO-NAGA Approach to Knowledge Discovery Gjergji Kasneci, Fabian M. Suchanek, Maya Ramanath, Gerhard Weikum SIGMOD Record 37:4, December 2008
- Multilingual Schema Matching for Wikipedia Infoboxes Nguyen et al., VLDB 2012
Week 12 - April 10
- Information extraction
Assignment
- Write a position paper for each of the required papers
Required Reading
- DBpedia - A crystallization point for the Web of Data Bizer et al., Web Semantics 2009.
- Optimizing Complex Extraction Programs over Evolving Text Data. F. Chen, B. Gao, A. Doan, J. Yang, R. Ramakrishnan. SIGMOD 2009
- On the Provenance of Non-Answers to Queries over Extracted Data. Huang et al, VLDB 2008
- Presenters: Haibo Zeng, Chunqing Jiang, Bhaktavatsalam Nallanthighal
- Rebuttal: Majed Hakami, Xiang Liu, May Thazin
Additional Reading
- Information Extraction From Wikipedia: Moving Down the Long Tail Fei Wu, Raphael Hoffmann, Daniel S. Weld
- Information extraction Sunita Sarawagi. FnT Databases, 1(3), 2008.
- Introduction to the Special Issue on Managing Information Extraction Doan et al., SIGMOD Record 2008.
Week 13 - April 17
- Twitter and News: finding entities and trends
Assignment
- Write a position paper for each of the required papers
Required Reading
- Named Entity Recognition in Tweets: An Experimental Study. EMNLP 2011
- Recognizing Named Entities in Tweets ACL 2011
- Tracking Trends: Incorporating Term Volume into Temporal Topic Models. KDD 2011
- Presenters: Fernando Seabra, Wei Jiang, Nivan Ferreira
- Rebuttal: Juliana Freire, Bhaktavatsalam Nallanthighal, Julie Odongo
Additional reading
- Unified Analysis of Streaming News WWW 2011
- Transferring Topical Knowledge from Auxiliary Long Texts for Short Text Clustering CIKM 2011
Week 14 - April 24
- Keyword queries over relational data
Assignment
- Write a position paper for each of the required papers
Required Reading
- Toward Scalable Keyword Search over Relational Data Baid et al., VLDB 2010
- BANKS: Browsing and Keyword Searching in Relational Databases Aditya et al., VLDB 2002
- On the Provenance of Non-Answers to Queries over Extracted Data. Huang et al, VLDB 2008
- Presenters: May Thazin, Tehila Minkus, Bhaktavatsalam Nallanthighal
- Rebuttal: Tehila Minkus, Vineet Meghani, May Thazin
Week 15 - May 1
Project presentation