CS6093/Lectures
Make sure to check my.poly.edu for course announcements.
Every week, you must write a position paper for each paper in that week's Required Reading list.
Week 1 - Jan 24
- Course overview (First day of classes!)
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/lecture1.pdf
- Provenance and Workflows
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf
Readings
- Provenance and Scientific Workflows: Challenges and Opportunities. Susan Davidson and Juliana Freire. In Proceedings of ACM SIGMOD International Conference on Management of Data, 2008. Tutorial resources
- Provenance for Computational Tasks: A Survey. Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.
- Querying and Creating Visualizations by Analogy. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.
Week 2 - Jan 31
- Provenance and Workflows (cont.)
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/provenance-workflows.pdf
- Discussion about literature search
Readings
- Same as last week
Week 3 - Feb 7
- Information extraction: survey
http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/information-extraction.pdf
Announcements
- The topic winners were: Information Extraction, Deep Web, Relational Data on the Web, Web Schema Matching, NoSQL DB, Provenance in DB, Graph Indexing, Usable query interfaces
- I will email you the preliminary assignments tomorrow
Assignment
- Write a position paper for the article: ONDUX: on-demand unsupervised learning for information extraction
Readings
- A survey of approaches to automatic schema matching. Erhard Rahm and Philip A. Bernstein, VLDB Journal 2001
- A Brief Survey of Web Data Extraction Tools. Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran Soares da Silva, Juliana S. Teixeira: SIGMOD Record 31(2): 84-93 (2002)
- ONDUX: on-demand unsupervised learning for information extraction. Eli Cortez, Altigran Soares da Silva, Marcos André Gonçalves, Edleno Silva de Moura: SIGMOD Conference 2010: 807-818
Some history and perspective:
- Data integration: the teenage years. A. Halevy, A. Rajaraman, J. Ordille. VLDB 2006.
- Generic Schema Matching, Ten Years Later. Philip A. Bernstein, Jayant Madhavan, Erhard Rahm: PVLDB 4(11): 695-701 (2011)
Week 4 - Feb 14
- Provenance and Databases
- Graph Indexing
Assignment
- Write 2 position papers, one for each of the articles in the required reading for this week (see below)
Required Reading
- Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330 http://db.cis.upenn.edu/DL/whywhere.pdf
- Presenter: Fernando Seabra (presentation)
- Rebuttal: Joe Miller (tentative)
- Graph Indexing: Tree + Delta >= Graph. P. Zhao, J. X. Yu, and P. S. Yu. VLDB 2007.
- Presenter: Nivan Ferreira
- Rebuttal: Sergey Nepomnyachiy
Additional Suggested Reading
- A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.
http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf
- Total Recall | Oracle Database
http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf
- Provenance in Databases: Past, Current, and Future. W. Tan. IEEE Data Engineering Bulletin.
- Closure-Tree: An Index Structure for Graph Queries. H. He and A. K. Singh. ICDE 2006.
- Answering pattern match queries in large graph databases via graph embedding
Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011
- Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf
- Algorithmics and Applications of Tree and Graph Searching. D. Shasha, J. T. L. Wang, and R. Giugno. PODS 2002.
Week 5 - Feb 21
- NoSQL databases
Assignment
- Write a position paper for each of the required papers
Required Reading
- MapReduce: simplified data processing on large clusters. Jeffrey Dean and Sanjay Ghemawat, CACM 2008
- Parallel data processing with MapReduce: a survey. Lee et al., SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
- Presenters: Dmitriy Gromov (presentation: http://vgc.poly.edu/~juliana/courses/cs6093/Lectures/MapReducePresentation_DmitriyGromov.pdf), Xiang Liu
- Rebuttal: Fernando Seabra, Shoshana Gottesman
Additional suggested reading
- Debate between MR and DB people:
- SQL databases v. NoSQL databases. Michael Stonebraker, CACM 2010.
- NoSQL Databases. Christof Strauch. 2010.
For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics
Week 6 - Feb 28
- Introduction to Visualization. The lecture will be given by Professors Claudio Silva and Lauro Lins.
There will be no assignment this week, but I plan to give you a quiz on visualization next week.
Suggested Reading
Visualization. Tamara Munzner. Chapter 27, pp. 675-707, of Fundamentals of Computer Graphics, Third Edition http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf
Lecture notes. Claudio Silva http://www.cs.utah.edu/~csilva/courses/cs5630/lec01-notes.pdf
Week 7 - March 6
- NoSQL Databases
Assignment
- Write a position paper for each of the required papers
Required Reading
- HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, Alex Rasin. VLDB 2009.
- Efficient Processing of Data Warehousing Queries in a Split Execution Environment. Bajda-Pawlikowski et al., SIGMOD 2011
- Pig Latin: a not-so-foreign language for data processing. C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
- Presenters: Julie Odongo, Majed Hakami (presentation), Yuan Ding
- Rebuttal: Nivan Ferreira, Dmitriy Gromov, Juliana Freire
For additional suggested readings, see http://www.vistrails.org/index.php?title=CS6093/Selected_Papers_and_Topics
Week 8 - March 13
Spring break - no class
Week 9 - March 20
TBD
Week 10 - March 27
- Web information integration
Assignment
- Write a position paper for each of the required papers
Required Reading
- iMAP: Discovering Complex Semantic Matches between Database Schemas. R. Dhamankar, Y. Lee, A. Doan, A. Halevy, and P. Domingos. SIGMOD 2004.
- Automatic complex schema matching across Web query interfaces. Bin He and Kevin Chen-Chuan Chang, ACM Trans. Database Syst. 2006
- Presenters: Joe Miller, Vineet Meghani
- Rebuttal: Yuan Ding, Chunqing Jiang
Additional Reading
- A survey of approaches to automatic schema matching. Erhard Rahm and Philip A. Bernstein, VLDB Journal 2001
Week 11 - April 3
- Wikipedia
Assignment
- Write a position paper for each of the required papers
Required Reading
- Information Arbitrage in Multi-Lingual Wikipedia. Adar, E. and Skinner, M. and Weld, D., Second ACM International Conference on Web Search and Data Mining (WSDM'09)
- Yago - A Core of Semantic Knowledge. Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. 16th international World Wide Web conference (WWW 2007)
- Presenters: Sergey Nepomnyachiy, Shoshana Gottesman, Haibo Zeng
- Rebuttal: Wei Jiang, Juliana Freire, Majed Hakami
Additional Reading
- The YAGO-NAGA Approach to Knowledge Discovery. Gjergji Kasneci, Fabian M. Suchanek, Maya Ramanath, Gerhard Weikum. SIGMOD Record 37(4), December 2008
- Multilingual Schema Matching for Wikipedia Infoboxes. Nguyen et al., VLDB 2012
Week 12 - April 10
- Information extraction
Assignment
- Write a position paper for each of the required papers
Required Reading
- Optimizing Complex Extraction Programs over Evolving Text Data. F. Chen, B. Gao, A. Doan, J. Yang, R. Ramakrishnan. SIGMOD 2009
- On the Provenance of Non-Answers to Queries over Extracted Data. Huang et al., VLDB 2008
- Presenters: Chunqing Jiang, Bhaktavatsalam Nallanthighal
- Rebuttal: Xiang Liu, May Thazin, Haibo Zeng
Additional Reading
- Information Extraction From Wikipedia: Moving Down the Long Tail. Fei Wu, Raphael Hoffmann, Daniel S. Weld
- Information extraction. Sunita Sarawagi. FnT Databases, 1(3), 2008.
- Introduction to the Special Issue on Managing Information Extraction. Doan et al., SIGMOD Record 2008.
Week 13 - April 17
- Twitter and News: finding entities and trends
Assignment
- Write a position paper for each of the required papers
Required Reading
- Named Entity Recognition in Tweets: An Experimental Study. EMNLP 2011
- Recognizing Named Entities in Tweets. ACL 2011
- Tracking Trends: Incorporating Term Volume into Temporal Topic Models. KDD 2011
- Presenters: ?????, Wei Jiang
- Rebuttal: ????, Bhaktavatsalam Nallanthighal, Julie Ondongo
Additional reading
- Unified Analysis of Streaming News. WWW 2011
- Transferring Topical Knowledge from Auxiliary Long Texts for Short Text Clustering. CIKM 2011
Week 14 - April 24
- Keyword queries over relational data
Assignment
- Write a position paper for each of the required papers
Required Reading
- Toward Scalable Keyword Search over Relational Data. Baid et al., VLDB 2010
- BANKS: Browsing and Keyword Searching in Relational Databases. Aditya et al., VLDB 2002
- Presenters: May Thazin, Tehila Minkus
- Rebuttal: Vineet Meghani, Tehila Minkus
Week 15 - May 1
Project presentation