RepeatabilityCentral
News
- The VisTrails methodology and infrastructure for creating provenance-rich, executable publications has been selected as a finalist in the Executable Paper Grand Challenge. We will present this work at the ICCS meeting in Singapore.
Project Description
A hallmark of the scientific method has been that experiments should be described in enough detail that they can be repeated and perhaps generalized. This implies the ability to redo experiments in nominally equal settings and also to test the generalizability of a claimed conclusion by trying similar experiments in different settings. In principle, this should be easier for computational experiments than for natural science experiments, because not only can computational processes be automated but also computational systems do not suffer from the 'biological variation' that plagues the life sciences. Unfortunately, the state of the art falls far short of this goal. Most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available; and configuration parameters change results in unforeseen ways. Because important scientific discoveries are often the result of sequences of smaller, less significant steps, the ability to publish results that are fully documented and reproducible is necessary for advancing science. While concern about repeatability and generalizability cuts across virtually all natural, computational, and social science fields, no single field has identified this concern as a target of a research effort.
This collaborative project between the University of Utah and New York University develops tools and infrastructure that support the process of sharing, testing, and re-using scientific experiments and results by leveraging and extending the infrastructure provided by provenance-enabled scientific workflow systems. The project explores three key research questions: 1) How can compendia of scientific results be packaged and published so that they are reproducible and generalizable? 2) What are appropriate algorithms and interfaces for exploring, comparing, and re-using results, or for discovering better approaches to a given problem? 3) How can reviewers be aided in generating the experiments that are most informative given a time/resource limit?
An expected result of this work is a software infrastructure that allows authors to create workflows encoding the computational processes that derive their results (including the data used, the configuration parameters set, and the underlying software), and to publish these workflows and connect them to the publications where the results are reported. Testers (or reviewers) can repeat and validate results, ask questions anonymously, and modify experimental conditions. Researchers who want to build upon previous work are able to search, reproduce, compare, and analyze experiments and results. The infrastructure helps scientists in any discipline construct, publish, and share reproducible results.
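To make the idea of a reproducible compendium concrete, here is a minimal sketch, in Python, of the kind of provenance record that could link a published figure to the workflow version, input data, and configuration parameters that produced it. The ResultRecord class, its field names, and the checksum helper are illustrative assumptions, not the schema of our actual infrastructure.

  # Hypothetical provenance record for one published result.
  # Field names are illustrative only; they are not the schema used by
  # the project's actual infrastructure.
  import hashlib
  import json
  from dataclasses import dataclass, field, asdict

  @dataclass
  class ResultRecord:
      paper_doi: str         # publication the result appears in
      figure_id: str         # which figure or table the result backs
      workflow_url: str      # where the executable workflow lives
      workflow_version: str  # exact workflow version used for the paper
      input_data: dict = field(default_factory=dict)  # dataset name -> checksum
      parameters: dict = field(default_factory=dict)  # configuration used
      output_checksum: str = ""                       # checksum of published output

  def checksum(path):
      """SHA-256 of a result file, so a re-run can be compared to the original."""
      with open(path, "rb") as f:
          return hashlib.sha256(f.read()).hexdigest()

  record = ResultRecord(
      paper_doi="10.xxxx/example",                 # placeholder DOI
      figure_id="Figure 3",
      workflow_url="http://example.org/workflow",  # placeholder URL
      workflow_version="v42",
      input_data={"dataset-a": "sha256:..."},
      parameters={"query_depth": 2},
  )
  print(json.dumps(asdict(record), indent=2))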
Infrastructure to Create Provenance-Rich Papers
The first prototype of our infrastructure is described at http://www.vistrails.org/index.php/ExecutablePapers. We have also written a paper that will appear in the Proceedings of the International Conference on Computational Science (ICCS), 2011: http://www.cs.utah.edu/~juliana/pub/vistrails-executable-paper.pdf
To see our infrastructure in action, check out the following videos and tutorial:
- Video: Editing an executable paper written using LaTeX and VisTrails
- Video: Exploring a Web-hosted paper using server-based computation
- Tutorial: Editing an executable paper using VisTrails and LaTeX extensions
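The videos and tutorial above show the real tooling; as a conceptual complement, the sketch below illustrates in plain Python the check a tester performs when validating a result: re-execute the packaged experiment and compare the regenerated output against a checksum recorded with the paper. The run_workflow and validate functions are stand-ins for illustration and are not VisTrails API calls.

  # Conceptual illustration of the reviewer-side validation step.
  # run_workflow stands in for whatever actually executes the packaged
  # experiment (in our case, VisTrails); it is not a real API call.
  import hashlib
  import subprocess

  def sha256_of(path):
      """Checksum of a file, used to compare a re-run against the published result."""
      with open(path, "rb") as f:
          return hashlib.sha256(f.read()).hexdigest()

  def run_workflow(command, output_path):
      """Re-execute the packaged experiment and return the path of its output."""
      subprocess.run(command, check=True)  # e.g. the command shipped with the paper
      return output_path

  def validate(command, output_path, published_checksum):
      """True if the re-run reproduces the published result byte-for-byte."""
      regenerated = run_workflow(command, output_path)
      return sha256_of(regenerated) == published_checksum

  # A reviewer might call, for example:
  # validate(["python", "experiment.py"], "figure3.png", "<checksum from the paper>")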
SIGMOD Repeatability Effort
As part of this project, in collaboration with Philippe Bonnet, we are using (and extending) our infrastructure to support the SIGMOD Repeatability effort.
Below are some case studies that illustrate how authors can create provenance-rich and reproducible papers, and how reviewers can both reproduce the experiments and perform workability tests:
- Packaging an experiment on a distributed database system: http://effdas.itu.dk/repeatability/tuning.html
- Packaging an experiment on querying Wikipedia: WikiQuery
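A workability test goes beyond exact repetition: the reviewer perturbs a configuration parameter and checks that the packaged experiment still runs. The sketch below is a hypothetical illustration of such a sweep; run_experiment, experiment.py, and the --buffer-size flag are invented for illustration and are not part of the case studies above.

  # Hypothetical workability test: vary one configuration parameter and
  # re-run the packaged experiment, recording whether each run completes.
  # run_experiment and the --buffer-size flag are illustrative stand-ins.
  import subprocess

  def run_experiment(buffer_size):
      """Invoke the packaged experiment with one parameter overridden."""
      result = subprocess.run(
          ["python", "experiment.py", "--buffer-size", str(buffer_size)],
          capture_output=True, text=True,
      )
      return result.returncode == 0

  def workability_sweep(values):
      """Report, for each parameter value, whether the experiment still completes."""
      return {v: run_experiment(v) for v in values}

  # e.g. workability_sweep([64, 256, 1024]) might return {64: True, 256: True, 1024: False}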
Publications
- Exploring the Coming Repositories of Reproducible Experiments: Challenges and Opportunities, by Juliana Freire, Philippe Bonnet and Dennis Shasha. In PVLDB, 2011. To appear.
- A Provenance-Based Infrastructure for Creating Executable Papers, by David Koop, Emanuele Santos, Phillip Mates, Huy Vo, Philippe Bonnet, Matthias Troyer, Dean Williams, Joel Tohline, Juliana Freire and Claudio Silva. In Proceedings of ICCS, 2011.
Presentations
- Towards an Infrastructure to Create Provenance-Rich Papers, by Juliana Freire. Presentation at the Beyond The PDF Workshop, San Diego, January 19-21, 2011.
- Publishing Reproducible Results with VisTrails, by Juliana Freire and Claudio Silva. Presentation at the SIAM Workshop on Verifiable, Reproducible Research and Computational Science, Reno, March 4, 2011.
People
Several people have contributed to this project, including:
- Philippe Bonnet
- David Koop
- Emanuele Santos
- Huy Vo
- Dennis Shasha
- Claudio Silva
- Juliana Freire
- Joel Tohline
- Matthias Troyer
- The VisTrails team
Funding
This project is sponsored by the National Science Foundation under awards IIS#1139832, IIS#1050422, IIS#1050388, IIS#0905385, and CNS#0751152.