CS6093/Lectures
Week 1 - Jan 24
- Course overview (First day of classes!)
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/lecture1.pdf
- Provenance and Workflows
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf
Readings
- Provenance and Scientific Workflows: Challenges and Opportunities. Susan Davidson and Juliana Freire. In Proceedings of ACM SIGMOD International Conference on Management of Data, 2008. Tutorial resources
- Provenance for Computational Tasks: A Survey. Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.
- Querying and Creating Visualizations by Analogy. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.
Week 2 - Jan 31
- Provenance and Workflows (cont.)
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf
- Discussion about literature search
Readings
- Same as last week
Week 3 - Feb 7
- Information extraction: survey
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/information-extraction.pdf
Announcements
- The topic winners were: Information Extraction, Deep Web, Relational Data on the Web, Web Schema Matching, NoSQL DB, Provenance in DB, Graph Indexing, Usable query interfaces
- I will email you the preliminary assignments tomorrow
Assignment
- Write a position paper on the article: ONDUX: on-demand unsupervised learning for information extraction
Readings
- A survey of approaches to automatic schema matching. Erhard Rahm and Philip A. Bernstein. VLDB Journal, 2001
- A Brief Survey of Web Data Extraction Tools. Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran Soares da Silva, Juliana S. Teixeira: SIGMOD Record 31(2): 84-93 (2002)
- ONDUX: on-demand unsupervised learning for information extraction. Eli Cortez, Altigran Soares da Silva, Marcos André Gonçalves, Edleno Silva de Moura: SIGMOD Conference 2010: 807-818
Some history and perspective:
- Data integration: the teenage years. A. Halevy, A. Rajaraman, J. Ordille. VLDB 2006.
- Generic Schema Matching, Ten Years Later. Philip A. Bernstein, Jayant Madhavan, Erhard Rahm: PVLDB 4(11): 695-701 (2011)
Week 4 - Feb 14
Assignment
- Write 2 position papers: one for each of the articles in the required reading for this week (see below)
Required Reading
- Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330
http://db.cis.upenn.edu/DL/whywhere.pdf
- Presenter: Fernando Seabra
- Rebuttal: Joe Miller (tentative)
- Graph Indexing: Tree + Delta >= Graph. P. Zhao, J. X. Yu, and P. S. Yu. VLDB 2007.
- Presenter: Nivan Ferreira
- Rebuttal: Sergey Nepomnyachiy (tentative)
Additional Suggested Reading
- A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.
http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf
- Total Recall | Oracle Database
http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf
- Provenance in Databases: Past, Current, and Future. W. Tan. IEEE Data Engineering Bulletin.
- Closure-Tree: An Index Structure for Graph Queries. H. He and A. K. Singh. ICDE 2006.
- Answering pattern match queries in large graph databases via graph embedding. Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011
- Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf
- Algorithmics and Applications of Tree and Graph Searching. D. Shasha, J. T. L. Wang, and R. Giugno. PODS 2002.
Week 5 - Feb 21
Week 6 - Feb 28
TBD
Week 7 - March 6
Week 8 - March 13
Spring break - no class
Week 9 - March 20
TBD
Week 10 - March 27
Week 11 - April 3
Week 12 - April 10
Week 13 - April 17
Week 14 - April 24
Week 15 - May 1
Project presentation