Difference between revisions of "Reading List"
Revision as of 17:50, 31 January 2012
Provenance
Overview
- Provenance and Scientific Workflows: Challenges and Opportunities Susan Davidson and Juliana Freire. In Proceedings of ACM SIGMOD International Conference on Management of Data, 2008. Tutorial resources
- Provenance for Computational Tasks: A Survey Juliana Freire, David Koop, Emanuele Santos, and Claudio T. Silva. In IEEE Computing in Science & Engineering, 2008.
- Lineage retrieval for scientific data processing: a survey. R. Bose and J. Frew. ACM Computing Surveys, 37(1):1-28, 2005.
- Provenance in Databases: Past, Current, and Future W. Tan. IEEE Data Engineering Bulletin.
- A survey of data provenance in e-science, Yogesh L. Simmhan, Beth Plale, Dennis Gannon, SIGMOD Record, September, 2005.
Provenance in Databases
- Provenance in Databases: Past, Current, and Future W. Tan. IEEE Data Engineering Bulletin. (short overview)
- Curated Databases W. Tan, P. Buneman, J. Cheney, S. Vansummeren. ACM Symposium on Principles of Database Systems (PODS), 2008.
- Provenance as Dependency Analysis James Cheney, Amal Ahmed, Umut A. Acar. DBPL 2007: 138-152
- Database Provenance Tutorial W. Tan and P. Buneman
Provenance Management: Storage, Indexing and Querying
- Querying and Creating Visualizations by Analogy. Carlos E. Scheidegger, Huy T. Vo, David Koop, Juliana Freire and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1560-1567, 2007. Best paper in IEEE Visualization 2007.
- Efficient Provenance Storage. Adriane Chapman, H. V. Jagadish and Prakash Ramanan. SIGMOD 2008.
- Efficient lineage tracking for scientific workflows. Thomas Heinis, Gustavo Alonso. SIGMOD Conference 2008: 1007-1018
- Querying and Managing Provenance through User Views in Scientific Workflows. Olivier Biton, Sarah Cohen Boulakia, Susan B. Davidson, Carmem S. Hara. ICDE 2008: 1072-1081
- Querying Business Processes. Catriel Beeri, Anat Eyal, Simon Kamenkovich, Tova Milo. VLDB 2006: 343-354
Provenance/Workflow/Graph Indexing
- Algorithmics and Applications of Tree and Graph Searching D. Shasha, J. T. L. Wang, and R. Giugno. PODS 2002.
- Graph Indexing: Tree + Delta >= Graph P. Zhao, J. X. Yu, and P. S. Yu. VLDB 2007.
- Closure-Tree: An Index Structure for Graph Queries H. He and A. K. Singh. ICDE 2006.
Additional papers:
- Efficient Matching and Indexing of Graph Models in Content-Based Retrieval by Stefano Berretti, Alberto Del Bimbo, Enrico Vicario, IEEE TPAMI 2001
- Computing Frequent Graph Patterns from Semistructured Data by N. Vanetik, E. Gudes, S. E. Shimony, ICDM 2002
- Graph indexing based on discriminative frequent structure analysis by Xifeng Yan, Philip S. Yu, Jiawei Han TODS 2004
- Graph Indexing: A Frequent Structure-based Approach by Xifeng Yan, Philip S. Yu, Jiawei Han SIGMOD 2004
- Graph Database Indexing Using Structured Graph Decomposition by David W. Williams, Jun Huan, Wei Wang ICDE 2007
- Towards graph containment search and indexing by Chen Chen, Xifeng Yan, Philip S. Yu, Jiawei Han, Dong-Qing Zhang, Xiaohui Gu, VLDB 2007
- Treepi: A novel graph indexing method by S Zhang, M Hu, J Yang ICDE 2007
- Summarization Graph Indexing: Beyond Frequent Structure-based Approach by Lei Zou, Lei Chen, Huaming Zhang, Yansheng Lu, and Qiang Lou
Presentation:
Video lecture:
- Mining, Indexing, and Searching Graphs in Large Data Sets by Jiawei Han Nature 2007
Provenance Mining
- VisComplete: Automating Suggestions for Visualization Pipelines. David Koop, Carlos E. Scheidegger, Steven P. Callahan, Huy T. Vo, Juliana Freire and Claudio T. Silva. In IEEE Transactions on Visualization and Computer Graphics, 14(6), pp. 1691-1698, 2008.
- A First Study on Clustering Collections of Workflow Graphs E. Santos, L. Lins, J. P. Ahrens, J. Freire, C. Silva. In Proceedings of IPAW, pp. 160-173, 2008
- Process Mining Based on Clustering: A Quest for Precision. A.K. Alves de Medeiros, A. Guzzo, G. Greco, W.M.P. van der Aalst, A.J.M.M. Weijters, B. van Dongen, and D. Saccà. In A. ter Hofstede, B. Benatallah, and H.-Y. Paik, editors, BPM 2007 Workshops, LNCS 4928: 17–29, 2008.
- Mining and Reasoning on Workflows Greco et al. TKDE 2005
Provenance Applications: Publications
- Reproducible Research. Sergey Fomel and Jon F. Claerbout. CiSE, 11(1), pp. 5-7, Jan.-Feb. 2009. DOI 10.1109/MCSE.2009.14
- Reproducible Research: A Bioinformatics Case Study Robert Gentleman. Bioconductor Project Working Papers. Working Paper 3. (May 2004).
- An Introduction to the Dataverse Network as an Infrastructure for Data Sharing. Gary King. Sociological Methods and Research. Vol. 32, No. 2 (November, 2007): pp. 173-199.
Provenance: Security and Privacy
- Securing provenance. U. Braun, A. Shinnar, and M. Seltzer. In HotSec’08, 2008.
- The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance, Ragib Hasan, Radu Sion, and Marianne Winslett, USENIX FAST 2009
- Introducing Secure Provenance: Problems and Challenges, Ragib Hasan, Radu Sion, Marianne Winslett, in ACM StorageSS 2007.
- TAPIDO: Trust and Authorization via Provenance and Integrity in Distributed Objects. A. Cirillo, R. Jagadeesan, C. Pitcher, and J. Riely. In European Symposium on Programming (ESOP), Lecture Notes in Computer Science, Springer, 2008.
- Evidence-Based Audit. Jeffrey A. Vaughan, Limin Jia, Karl Mazurak, Steve Zdancewic. CSF 2008: 177-191
- SELinks: End to end security for Web applications. Hicks, Swamy, and Corcoran. Project Web Site
Data on the Web
Web Schema Matching and Integration
- An interactive clustering-based approach to integrating source query interfaces on the deep Web Wensheng Wu, Clement Yu, AnHai Doan, Weiyi Meng, SIGMOD 2004
- Automatic complex schema matching across Web query interfaces Bin He, Kevin Chuan Chang, ACM Trans. Database Syst. 2006
- Web-scale Data Integration: You can only afford to Pay As You Go Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin (Luna) Dong, David Ko, Cong Yu, Alon Halevy. CIDR 2007
- Data Integration with Uncertainty Xin Dong, Alon Y. Halevy, Cong Yu. VLDB 2007
Additional papers:
- A survey of approaches to automatic schema matching Erhard Rahm and Philip Bernstein, VLDB Journal 2001
- A Survey of Schema-based Matching Approaches Pavel Shvaiko and Jerome Euzenat, JoDS 2005
- Why is schema matching tough and what can we do about it? Avigdor Gal. ACM SIGMOD Record
- Wise-integrator: An automatic integrator of web search interfaces for e-commerce. Hai He and Weiyi Meng. VLDB 2003
- Holistic query interface matching using parallel schema matching. W. Su, J. Wang, and F. Lochovsky. ICDE '06
- Corpus-based schema matching. Jayant Madhavan, Philip A. Bernstein, Anhai Doan, Alon Halevy. ICDE 05
- A Robust Approach to Schema Matching over Web Query Interfaces Jin Pei, Jun Hong, David Bell. ICDE 06
- Statistical Schema Matching across Web Query Interfaces Bin He, Kevin Chen-Chuan Chang, SIGMOD 2003
- Merging Source Query Interfaces on Web Databases, Eduard Dragut, ICDE 06
Additional papers on Dataspaces:
- From databases to dataspaces: a new abstraction for information management by Michael Franklin, Alon Halevy, David Maier, SIGMOD 2005
- A first tutorial on dataspaces by Michael Franklin, Alon Halevy, David Maier, VLDB 2008
NoSQL Databases
- SQL databases v. NoSQL databases. Michael Stonebraker, CACM 2010.
- NoSQL Databases. Christof Strauch. 2010.
- Pig Latin: a not-so-foreign language for data processing. C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. SIGMOD 2008.
- PNUTS: Yahoo!'s Hosted Data Serving Platform. Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, et al. In Proceedings of the VLDB Endowment (2008).
Relational data on the Web
- WebTables: exploring the power of tables on the web. Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wang, Eugene Wu, Yang Zhang: PVLDB 1(1): 538-549 (2008)
- Uncovering the Relational Web. Michael J. Cafarella, Alon Y. Halevy, Yang Zhang, Daisy Zhe Wang, Eugene Wu. WebDB 2008
- Mining database structure; or, how to build a data quality browser. Tamraparni Dasu, Theodore Johnson, S. Muthukrishnan, Vladislav Shkapenyuk. SIGMOD 2002
- Information-theoretic tools for mining database structure from large data sets. Periklis Andritsos, Renee J. Miller and Panayiotis Tsaparas. SIGMOD 2004
Additional Papers:
- Duplicate Record Detection: A Survey. Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, Vassilios S. Verykios. IEEE TKDE, 2007
- Efficient Discovery of Functional and Approximate Dependencies Using Partitions. Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, and Hannu Toivonen. In Proc. IEEE Intl. Conf. on Data Engineering, 1998.
- Mining Association Rules between Sets of Items in Large Databases Rakesh Agrawal, Tomasz Imielinski, Arun Swami. SIGMOD 1993
- LIMBO: Scalable Clustering of Categorical Data. Periklis Andritsos, Panayiotis Tsaparas, Renée J. Miller, and Kenneth C. Sevcik. In EDBT 2004.
Data integration on the fly (or almost...)
- From databases to dataspaces: a new abstraction for information management. Michael Franklin, Alon Halevy, David Maier. SIGMOD Record, 2005
- Indexing dataspaces. Xin Dong and Alon Halevy. SIGMOD 2007.
- Pay-as-you-go user feedback for dataspace systems. Shawn R. Jeffery, Michael J. Franklin, Alon Y. Halevy. SIGMOD Conference 2008: 847-860
- Bootstrapping pay-as-you-go data integration systems. Anish Das Sarma, Xin Dong, Alon Y. Halevy, SIGMOD Conference 2008: 861-874.
- Building Community Wikipedias: A Human-Machine Approach. P. DeRose, X. Chai, B. Gao, W. Shen, A. Doan, P. Bohannon, J. Zhu. ICDE-08.
- The Case for a Structured Approach to Managing Unstructured Data. A. Doan, J. F. Naughton, A. Baid, X. Chai, F. Chen, T. Chen, E. Chu, P. DeRose, B. Gao, C. Gokhale, J. Huang, W. Shen, B. Vuong. CIDR-09.
Usable query interfaces for structured data
- Discover: keyword search in relational databases. Vagelis Hristidis, Yannis Papakonstantinou. VLDB 2002.
- Bidirectional Expansion For Keyword Search on Graph Databases. Varun Kacholia, Shashank Pandit, Soumen Chakrabarti, S Sudarshan, Rushi Desai and Hrishikesh Karambelkar, VLDB 2005
- Keyword Searching and Browsing in databases using BANKS. Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan. ICDE 2002
- Effective keyword search in relational databases. Fang Liu, Clement Yu, Weiyi Meng, Abdur Chowdhury. SIGMOD 2006, pp. 563-574.
Snippet Generation and Ranking
- A system for query-specific document summarization. Ramakrishna Varadarajan, Vagelis Hristidis. CIKM, 2006
- Fast generation of result snippets in web search. Andrew Turpin, Yohannes Tsegay, David Hawking, Hugh E. Williams. ACM SIGIR, 2007 (Ramesh: Will present this)
- Object-level ranking: bringing order to Web objects. Zaiqing Nie, Yuanzhi Zhang, Ji-Rong Wen, Wei-Ying Ma. WWW, 2005
- Page quality: in search of an unbiased web ranking. Junghoo Cho, Sourashis Roy, Robert E. Adams. SIGMOD, 2005
The Deep Web
- Google's Deep Web crawl. Jayant Madhavan, David Ko, Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Y. Halevy. PVLDB 1(2): 1241-1252 (2008) (*Ramesh will present this)
- Siphoning Hidden-Web Data through Keyword-Based Interfaces. Luciano Barbosa and Juliana Freire. In Proceedings of Brazilian Symposium on Databases (SBBD), 2004. (*Huong will present this)
- Instance-based schema matching for web databases by domain-specific query probing. Jiying Wang, Ji-Rong Wen, Fred Lochovsky, Wei-Ying Ma. VLDB 2004
- Query Selection Techniques for Efficient Crawling of Structured Web Sources. Ping Wu, Ji-Rong Wen, Huan Liu, Wei-Ying Ma. ICDE 2006
Information Extraction
- Information extraction Sunita Sarawagi. FnT Databases, 1(3), 2008.
- On the Provenance of Non-Answers to Queries over Extracted Data. J. Huang, T. Chen, A. Doan, J. Naughton. VLDB-08.
- Information Extraction From Wikipedia: Moving Down the Long Tail Fei Wu, Raphael Hoffmann, Daniel S. Weld
- Intelligence in Wikipedia Daniel S. Weld, Fei Wu, Eytan Adar
- Semantic annotation of unstructured and ungrammatical text Matthew Michelson and Craig A. Knoblock. IJCAI 2005
- Domain adaptation of information extraction models. Rahul Gupta and Sunita Sarawagi. In SIGMOD Record, 2008.
- Information Extraction Challenges in Managing Unstructured Data. AnHai Doan et al. SIGMOD Record, Winter 08, Special Issue on Managing Information Extraction.
- Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data John Lafferty, Andrew McCallum, Fernando Pereira, ICML 2001.
- 2D Conditional Random Fields for Web Information Extraction Jun Zhu, Wei-Ying Ma. ICML 2005
- Simultaneous record detection and attribute labeling in web data extraction Jun Zhu, Zaiqing Nie, Ji-Rong Wen, Bo Zhang, Wei-Ying Ma. KDD 2006
- Web Data Extraction Based on Partial Tree Alignment Yanhong Zhai, Bing Liu. WWW 2005