Difference between revisions of "CS6093/Selected Papers and Topics"
Revision as of 21:26, 7 February 2012
Provenance and Databases
- Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330
http://db.cis.upenn.edu/DL/whywhere.pdf
- A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.
http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf
- Total Recall | Oracle Database
http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf
Additional Suggested Reading:
- Provenance in Databases: Past, Current, and Future. W. Tan. IEEE Data Engineering Bulletin.
Graph Indexing
- Algorithmics and Applications of Tree and Graph Searching. D. Shasha, J. T. L. Wang, and R. Giugno. PODS 2002.
- Graph Indexing: Tree + Delta >= Graph. P. Zhao, J. X. Yu, and P. S. Yu. VLDB 2007.
- Closure-Tree: An Index Structure for Graph Queries. H. He and A. K. Singh. ICDE 2006.
- Answering pattern match queries in large graph databases via graph embedding. Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao.
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011
- Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf
Provenance Applications: Reproducible Publications
- papers from challenge
Web Schema Matching and Integration
NoSQL Databases
- Intro to Hadoop (TBD)
- Automatic optimization for MapReduce programs. Eaman Jahani, Michael J. Cafarella, Christopher Ré. PVLDB, 2011.
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/jahani-vldb2011.pdf
- Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
- Scalable SQL and NoSQL Data Stores. Rick Cattell, SIGMOD Record 2011. (overview of current data stores)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/cattel-sigrec2011.pdf
- Pig Latin: a not-so-foreign language for data processing. C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
- PNUTS: Yahoo!'s Hosted Data Serving Platform. Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, et al. Proceedings of the VLDB Endowment (2008).
Additional suggested reading:
- SQL databases v. NoSQL databases. Michael Stonebraker, CACM 2010.
- NoSQL Databases. Christof Strauch. 2010.
Relational Data on the Large
- Swoosh: a generic approach to entity resolution. Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Euijong Whang and Jennifer Widom.
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/swoosh-vldbj2009pdf
- Automatically incorporating new sources in keyword search-based data integration. Talukdar et al, SIGMOD 2010
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/ives-sigmod2010pdf
- Data cleaning: Problems and current approaches. Rahm, IEEE DEB 2000.
http://dc-pubs.dbs.uni-leipzig.de/files/Rahm2000DataCleaningProblemsand.pdf