SciVisFall2007/Assignment 1
Revision as of 22:47, 21 September 2007
This is your first real assignment for CS 5630/6630.
The assignment is due at midnight on September ??, 2007. You will need to use the CADE handin functionality to turn in your assignment. The class account is "cs5630".
The purpose of this initial assignment is to make sure you understand the basic plotting concepts covered in class. Examples of plotting were provided after the lectures and can be found here: PlottingVistrails.zip. As you work on the assignment, we encourage you to read the available documentation for both matplotlib and Python.
Here is the initial vistrail file, hw1.vt, that you should use to complete your work. As before, show your work by submitting the complete vistrail you used to solve each problem.
The data we will be using for this assignment comes from weather measurements near Snowbird Ski Resort in Little Cottonwood Canyon (original data found here). To make things simpler, the data we provide has been reformatted so that it is easy to parse.
Problem 1
This problem deals with simple connected symbol plots, as shown in the MaunaLoa.vt example. The "Snow Depth" node in the history tree plots the maximum monthly snow water equivalent (the amount of water contained in the snow) measured in 2006. Start from this node and make the following changes, labeling the resulting nodes "Problem 1a", "Problem 1b", etc.
a. Apply the principles of plotting described in the notes to make the plot easier to read and understand. In the notes for this node, list the principles you addressed and how you addressed them.
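As one possible starting point, the sketch below draws a connected symbol plot with several common principles applied: labeled axes, a descriptive title, month names instead of bare indices, and a y axis anchored at zero. It is plain matplotlib rather than the VisTrails module in hw1.vt, and the function name plot_swe, the month labels, and the assumption that SWE is reported in inches are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; no display required
import matplotlib.pyplot as plt

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def plot_swe(values, year):
    """Draw one year of monthly maximum SWE as a connected symbol plot (hypothetical helper)."""
    _, ax = plt.subplots(figsize=(8, 4))
    months = list(range(1, len(values) + 1))
    # Symbols mark the actual measurements; the thin line only guides the eye.
    ax.plot(months, values, marker="o", linewidth=1)
    ax.set_xticks(months)
    ax.set_xticklabels(MONTHS[:len(values)])  # readable month names on the x axis
    ax.set_xlabel("Month")
    ax.set_ylabel("Max. snow water equivalent (in)")  # units are an assumption
    ax.set_title("Snow water equivalent, %d" % year)
    ax.set_ylim(bottom=0)  # SWE cannot be negative, so anchor the scale at zero
    return ax
```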
b. The "Snow Depth" pipeline reads the 2006 data from SWE2006.dat. Compare it directly with the 2005 measurements in SWE2005.dat using superposition (both years on the same plot).
c. Repeat part b, but compare the two years using juxtaposition (each plot in a different spreadsheet cell). In the notes, describe which technique (superposition or juxtaposition) makes the most sense for this data and why.
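A minimal matplotlib sketch of the two comparison techniques follows. The function names superpose and juxtapose are hypothetical, both assume each year is a plain list of monthly values, and in VisTrails the two juxtaposed panels would be separate spreadsheet cells rather than subplots of one figure.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; no display required
import matplotlib.pyplot as plt

def superpose(swe_2005, swe_2006):
    """Superposition: both years share one set of axes."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(range(1, len(swe_2005) + 1), swe_2005, marker="s", label="2005")
    ax.plot(range(1, len(swe_2006) + 1), swe_2006, marker="o", label="2006")
    ax.set_xlabel("Month")
    ax.set_ylabel("Max. snow water equivalent")
    ax.legend()  # the legend is what tells the two series apart
    return fig

def juxtapose(swe_2005, swe_2006):
    """Juxtaposition: one panel per year, side by side."""
    # sharey=True gives both panels the same y scale so they can be compared by eye.
    fig, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))
    for ax, values, year in zip(axes, (swe_2005, swe_2006), (2005, 2006)):
        ax.plot(range(1, len(values) + 1), values, marker="o")
        ax.set_title(str(year))
        ax.set_xlabel("Month")
    axes[0].set_ylabel("Max. snow water equivalent")
    return fig
```

If the juxtaposed cells do not share the same y range, differences in height between the years can be misleading, which is worth keeping in mind when answering the question in part c.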
Problem 2
Histogram
Problem 3
Labeled data
Problem 4
This problem deals with correlation (for an example, see the Correlation.vt example). The temp_precip.dat file contains one line for each day of the year giving the air temperature in degrees Celsius and the amount of precipitation in inches, in the form "10:0.5" (10 degrees C and 0.5 inches). This format is similar to the one used for the labeled data in the MammalScaling.vt example, so you can use a similar parser. Perform the following tasks and label the nodes "Problem4a", "Problem4b", etc.
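A parser for this format can be quite short. The sketch below assumes one "temperature:precipitation" pair per line with no header; parse_temp_precip is a hypothetical name, not the parser from MammalScaling.vt.

```python
def parse_temp_precip(lines):
    """Split lines like "10:0.5" into parallel lists of temperatures and precipitation amounts."""
    temps, precips = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at the end of the file
        temp, precip = line.split(":")
        temps.append(float(temp))      # degrees Celsius
        precips.append(float(precip))  # inches
    return temps, precips
```

Because a file object iterates over its lines, it can be passed in directly, e.g. `with open("temp_precip.dat") as f: temps, precips = parse_temp_precip(f)`.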
a. (Grads and UGrads) Plot the data using a scatterplot with temperature on the X axis and precipitation on the Y axis. Be sure to use the basic principles of plotting. In the notes for this node, describe any correlation that you can perceive (rough judgement, not calculated) and any conclusions that could be drawn.
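One way to set up the basic scatterplot in matplotlib is sketched below, assuming the temperatures and precipitation amounts have already been parsed into two lists. The function name and the open-circle marker style are illustrative choices, not requirements of the assignment; unfilled markers make some overlap visible, although they do not solve the stacking issue addressed in part b.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; no display required
import matplotlib.pyplot as plt

def scatter_temp_precip(temps, precips):
    """Scatterplot of daily temperature (x) against precipitation (y)."""
    fig, ax = plt.subplots(figsize=(6, 5))
    # Markers only, no connecting line: the order of the days carries no meaning here.
    ax.scatter(temps, precips, s=15, facecolors="none", edgecolors="black")
    ax.set_xlabel("Air temperature (°C)")
    ax.set_ylabel("Precipitation (in)")
    ax.set_title("Daily temperature vs. precipitation")
    return fig, ax
```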
b. (Grads only) Because of the limited resolution of the measurements, the data falls on a regular grid and many points are stacked on top of one another. This makes it difficult to see where the data is concentrated. Resolve this problem using one of the following techniques:
- jittering: Perturb each point by a small random amount so that the overlap is reduced.
- symbols: Find stacked points and represent each stack with a single point that is drawn differently (heavier weight or a different symbol).
- colormap: Find stacked points and color them according to how many points are in the stack.
In the notes for the node, describe what you did.
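The jittering and stack-counting ideas can be sketched in a few lines of plain Python. Both helper names are hypothetical. jitter adds uniform noise in [-amount, amount] to each value; stack_counts collapses identical (temperature, precipitation) pairs and returns how many days fall on each, which is the information the symbols and colormap approaches need.

```python
import random
from collections import Counter

def jitter(values, amount, rng=random):
    """Return a copy of values with uniform noise in [-amount, amount] added to each."""
    return [v + rng.uniform(-amount, amount) for v in values]

def stack_counts(xs, ys):
    """Collapse stacked (identical) points into unique points plus the size of each stack."""
    counts = Counter(zip(xs, ys))
    points = list(counts)
    return points, [counts[p] for p in points]
```

Keeping amount below half the measurement spacing means a jittered point cannot be mistaken for a neighboring grid value. The counts can be passed to matplotlib's scatter as marker sizes (s=) for the symbols approach, or as colors (c=, together with a cmap) for the colormap approach.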
c. (Grads only) Perform a linear regression to fit a line through the data. Is a degree 1 polynomial (a line) sufficient? What happens with a higher-degree polynomial, such as a cubic (degree 3)? Note that the third argument of the scipy.polyfit function sets the degree of the polynomial, and a fit of degree n returns n+1 coefficients. Thus (ar,br) = scipy.polyfit(x,y,1) would need to become (ar,br,cr) = scipy.polyfit(x,y,2) for a quadratic, and the call to polyval would need to change in the same way. Also note that the data may need to be sorted along the x axis so that the points passed to polyval are monotonic (and the fitted curve does not overlap itself). In the notes, describe which fit you settled on and why.
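The sketch below compares fits of different degrees. SciPy releases of that era re-exported many NumPy functions, including polyfit and polyval, at the top level, which is why the assignment refers to scipy.polyfit; this sketch calls numpy.polyfit and numpy.polyval directly, which take the same arguments. The function name fit_polynomial and the use of the residual sum of squares as a comparison number are assumptions for illustration.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, sorted x, fitted y, and residual sum of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)   # degree+1 coefficients, highest power first
    order = np.argsort(x)               # sort so the fitted curve is drawn left to right
    x_sorted = x[order]
    y_fit = np.polyval(coeffs, x_sorted)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    return coeffs, x_sorted, y_fit, rss
```

Plot x_sorted against y_fit on top of the scatterplot. Because the residual sum of squares never increases when the degree goes up, a smaller value for the cubic does not by itself show that the cubic is the better model; check whether the extra curvature follows a real trend in the data or simply chases noise.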