Difference between revisions of "Lab notes"




Revision as of 02:35, 6 February 2014

Provenance and Reproducibility

Data exploration is inherently a trial-and-error process: as we formulate and test hypotheses, we often need to follow many different lines of reasoning, use different tools, and explore multiple combinations of parameter values. It is not uncommon to arrive at an interesting result and not remember the exact path that took you there. Therefore, it is important to maintain detailed provenance of the steps followed and of the data and parameter values used. This is particularly important for Big Data, where complex processes and data are used.

Today, we will use VisTrails, an open-source data analysis and visualization system that systematically captures provenance as a user explores data using computational processes. We will discuss the benefits of provenance, in particular the ability to reproduce results and re-use knowledge.

The Problem: Analyzing MTA Fare Data

The Wall Street Journal published a story in 2011 that examined how MetroCard usage changed as the cost of fares changed. The original work was created by Albert Sun and Andrew Grossman and published at http://graphicsweb.wsj.com/documents/MTAFARES1108/ on October 17, 2011. It used the publicly available fare data from the Metropolitan Transportation Authority (MTA), and the results were an interesting snapshot of usage patterns in the six months before and after the fare change. Because this data is made available on a weekly basis, it is possible to analyze more recent data as it becomes available. In addition, we can restrict views to specific lines or compare different ranges of time. As we will see, VisTrails makes these kinds of exploration much easier.

Installation

You will need to install VisTrails 2.1.1 to run this example.

You can download the system from http://www.vistrails.org/index.php/Downloads. Select the link that matches your operating system.

You will need 3 packages to run the example:
http://vgc.poly.edu/~dakoop/mta_example/HTTP_new.zip
http://vgc.poly.edu/~dakoop/mta_example/tabledata_new.zip
http://vgc.poly.edu/~dakoop/mta_example/gmaps.zip


Before you install these packages:
1. Start VisTrails
2. Go to Preferences and disable the HTTP and tabledata packages
3. Quit the system


To install the packages, copy them to ~/.vistrails/userpackage and unzip. Then you have to enable the packages in VisTrails:
1. Start VisTrails
2. Go to Preferences and enable HTTP_new, tabledata_new, and gmaps packages
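If you would rather script the copy-and-unzip step, the following is a minimal sketch in Python 3, run as a standalone script outside VisTrails. The function name `install_packages` and the `~/Downloads` location in the commented example are assumptions made for illustration, not part of VisTrails or of the original instructions; adjust the paths to wherever you saved the archives.

```python
import zipfile
from pathlib import Path


def install_packages(archives, dest_dir):
    """Extract each zip archive into dest_dir; return the archive file names processed.

    install_packages is a hypothetical helper, not a VisTrails API.
    """
    dest = Path(dest_dir).expanduser()
    # Create the user package directory if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    installed = []
    for archive in archives:
        archive = Path(archive).expanduser()
        # Unpack the archive contents directly into the destination directory.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        installed.append(archive.name)
    return installed


# Example call, assuming the three archives were saved to ~/Downloads:
# install_packages(
#     ["~/Downloads/HTTP_new.zip",
#      "~/Downloads/tabledata_new.zip",
#      "~/Downloads/gmaps.zip"],
#     "~/.vistrails/userpackage",
# )
```

The script only unpacks the archives. You still need to disable the old packages beforehand and enable the new ones in Preferences, as described in the steps above.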

Here's the example we will use: http://vgc.poly.edu/~dakoop/mta_example/mta.vt

After you download it, load it into VisTrails.

Now, you are ready to go!




Acknowledgments

This example was provided by Dr. David Koop (http://vgc.poly.edu/~dakoop/).