Difference between revisions of "CS6093/Selected Papers and Topics"

From VistrailsWiki
* Total Recall | Oracle Database
http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf
Additional Suggested Reading:
* [http://www.soe.ucsc.edu/~wctan/papers/2007/pdb-ieee.pdf Provenance in Databases: Past, Current, and Future] W. Tan. IEEE Data Engineering Bulletin.
== Graph Indexing ==
* [http://www.cs.nyu.edu/shasha/papers/graphgrep/pods2002.pdf Algorithmics and Applications of Tree and Graph Searching] D. Shasha, J. T. L. Wang, and R. Giugno.  PODS 2002.
* [http://www.vldb.org/conf/2007/papers/research/p938-zhao.pdf Graph Indexing: Tree + Delta >= Graph] P. Zhao, J. X. Yu, and P. S. Yu.  VLDB 2007.
* [http://www.cs.ucsb.edu/~dbl/papers/he_icde_2006.pdf Closure-Tree: An Index Structure for Graph Queries] H. He and A. K. Singh.  ICDE 2006.
* Answering pattern match queries in large graph databases via graph embedding
Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011
* Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf
==  Provenance Applications: Reproducible Publications ==
- papers from challenge
== Web Schema Matching and Integration ==
== NoSQL Databases ==
* Intro to Hadoop (TBD)
* Automatic optimization for MapReduce programs. Eaman Jahani, Michael J. Cafarella, Christopher Ré. PVLDB, 2011.
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/jahani-vldb2011.pdf
* Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf
* Scalable SQL and NoSQL Data Stores. Rick Cattell, SIGMOD Record 2011. (overview of current data stores)
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/cattel-sigrec2011.pdf
* [http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf Pig Latin: a not-so-foreign language for data processing.] C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
* [http://infolab.stanford.edu/~usriv/papers/pnuts.pdf PNUTS: Yahoo!'s Hosted Data Serving Platform.] Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, et al. Proceedings of the VLDB Endowment (2008).
Additional suggested reading:
* [http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2010/SQLvsNoSQLDatabases.pdf SQL databases v. NoSQL databases.] Michael Stonebraker, CACM 2010.
* [http://www.christof-strauch.de/nosqldbs.pdf NoSQL Databases.] Christof Strauch. 2010.
== Relational Data on the Large ==
* [http://fleixeiras.cs.utah.edu/researchTopics/images/e/e7/Webtables-vldb08.pdf WebTables: exploring the power of tables on the web.] Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wang, Eugene Wu, Yang Zhang. PVLDB 1(1): 538-549 (2008)
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/dassarma-sigmod2008.pdf Bootstrapping pay-as-you-go data integration systems.] Anish Das Sarma, Xin Dong, Alon Y. Halevy, SIGMOD Conference 2008: 861-874.
* Swoosh: a generic approach to entity resolution. Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Euijong Whang and Jennifer Widom
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/swoosh-vldbj2009pdf
* Automatically incorporating new sources in keyword search-based data integration. Talukdar et al, SIGMOD 2010
http://vgc.poly.edu/~juliana/courses/cs6093/Readings/ives-sigmod2010pdf
* Discovering data quality rules. Chiang and Miller. PVLDB 2008
[http://vgc.poly.edu/~juliana/courses/cs6093/Readings/chiang-vldb2008.pdf]
* Data cleaning: Problems and current approaches. Rahm, IEEE DEB 2000.
http://dc-pubs.dbs.uni-leipzig.de/files/Rahm2000DataCleaningProblemsand.pdf
== Deep Web ==
* [http://www.cs.cornell.edu/~lucja/Publications/I03.pdf Google's Deep Web crawl.]  Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Y. Halevy. PVLDB 1(2): 1241-1252 (2008)
== Information Extraction ==
* Efficiently Incorporating User Feedback into Information Extraction and Integration Programs. Chai et al., SIGMOD 2009
[http://vgc.poly.edu/~juliana/courses/cs6093/Readings/chai-sigmod2009pdf]
* [http://www.it.iitb.ac.in/~sunita/papers/ieSurvey.pdf Information extraction] Sunita Sarawagi.  FnT Databases, 1(3), 2008.
* [http://www.cs.utah.edu/~juliana/rtdb2008/References/huang-vldb2008.pdf On the Provenance of Non-Answers to Queries over Extracted Data.] J. Huang, T. Chen, A. Doan, J. Naughton. VLDB-08.
* [http://turing.cs.washington.edu/papers/kdd08.pdf Information Extraction From Wikipedia:  Moving Down the Long Tail] Fei Wu, Raphael Hoffmann, Daniel S. Weld
== Using and Analyzing Social Networking Data ==

Latest revision as of 21:33, 7 February 2012

Provenance and Databases

  • Peter Buneman, Sanjeev Khanna, Wang Chiew Tan: Why and Where: A Characterization of Data Provenance. ICDT 2001: 316-330

http://db.cis.upenn.edu/DL/whywhere.pdf

  • A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported Versioned DBMS. Proceedings of the 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, June 2010.

http://ilpubs.stanford.edu:8090/926/1/versioning-TR.pdf

  • Total Recall | Oracle Database

http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf

Additional Suggested Reading:

  • Provenance in Databases: Past, Current, and Future. W. Tan. IEEE Data Engineering Bulletin.

http://www.soe.ucsc.edu/~wctan/papers/2007/pdb-ieee.pdf

Graph Indexing

  • Answering pattern match queries in large graph databases via graph embedding

Lei Zou, Lei Chen, M. Tamer Özsu and Dongyan Zhao http://vgc.poly.edu/~juliana/courses/cs6093/Readings/graph-matching-vldbj2011

  • Chenghui Ren, Eric Lo, Ben Kao, Xinjie Zhu, Reynold Cheng: On Querying Historical Evolving Graph Sequences. PVLDB 4(11): 726-737 (2011)

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/evolving-graphs-vldb11.pdf

Provenance Applications: Reproducible Publications

- papers from challenge

Web Schema Matching and Integration

NoSQL Databases

  • Intro to Hadoop (TBD)
  • Automatic optimization for MapReduce programs. Eaman Jahani, Michael J. Cafarella, Christopher Ré. PVLDB, 2011.

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/jahani-vldb2011.pdf

  • Parallel data processing with MapReduce: a survey. Lee et al, SIGMOD Record 2011

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/lee-sigrec2011.pdf

  • Scalable SQL and NoSQL Data Stores. Rick Cattell, SIGMOD Record 2011. (overview of current data stores)

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/cattel-sigrec2011.pdf

Additional suggested reading:

  • SQL databases v. NoSQL databases. Michael Stonebraker, CACM 2010.

http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2010/SQLvsNoSQLDatabases.pdf

  • NoSQL Databases. Christof Strauch. 2010.

http://www.christof-strauch.de/nosqldbs.pdf

Relational Data on the Large


  • Swoosh: a generic approach to entity resolution. Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Euijong Whang and Jennifer Widom

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/swoosh-vldbj2009pdf

  • Automatically incorporating new sources in keyword search-based data integration. Talukdar et al, SIGMOD 2010

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/ives-sigmod2010pdf

  • Discovering data quality rules. Chiang and Miller. PVLDB 2008

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/chiang-vldb2008.pdf

  • Data cleaning: Problems and current approaches. Rahm, IEEE DEB 2000.

http://dc-pubs.dbs.uni-leipzig.de/files/Rahm2000DataCleaningProblemsand.pdf

Deep Web

  • Google's Deep Web crawl. Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Y. Halevy. PVLDB 1(2): 1241-1252 (2008)


Information Extraction

  • Efficiently Incorporating User Feedback into Information Extraction and Integration Programs. Chai et al., SIGMOD 2009

http://vgc.poly.edu/~juliana/courses/cs6093/Readings/chai-sigmod2009pdf


Using and Analyzing Social Networking Data