Difference between revisions of "Hadoop Package"
Revision as of 17:47, 10 January 2014
This page describes how to use the hadoop package in VisTrails. The package works on Mac and Linux.
Installation
Install vistrails
Get vistrails using git and check out a version supporting the hadoop package:
git clone http://vistrails.org/git/vistrails.git
cd vistrails
git checkout 976255974f2b206f030b2436a5f10286844645b0
If you are using a binary distribution of VisTrails, replace the vistrails folder in that distribution with this one.
Install BatchQ-PBS and the RemotePBS package
This Python package is used for communication over SSH. Get it with:
git clone https://github.com/rexissimus/BatchQ-PBS
Copy BatchQ-PBS/batchq to the site-packages folder of your VisTrails Python installation.
Copy BatchQ-PBS/batchq/contrib/vistrails/RemotePBS to ~/.vistrails/userpackages/
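The two copy steps above can also be scripted. The following is a minimal sketch, not part of BatchQ-PBS or VisTrails: the function name install_batchq is hypothetical, the site-packages location must be supplied by you because it depends on your installation, and the script assumes it is run with a standalone Python 3.8 or newer (for dirs_exist_ok).

```python
import shutil
from pathlib import Path


def install_batchq(repo: Path, site_packages: Path, userpackages: Path) -> None:
    """Copy batchq and RemotePBS out of a BatchQ-PBS checkout (hypothetical helper)."""
    # Step 1: the batchq library goes into the VisTrails Python site-packages folder.
    shutil.copytree(repo / "batchq", site_packages / "batchq", dirs_exist_ok=True)
    # Step 2: the RemotePBS VisTrails package goes into the userpackages folder.
    shutil.copytree(
        repo / "batchq" / "contrib" / "vistrails" / "RemotePBS",
        userpackages / "RemotePBS",
        dirs_exist_ok=True,
    )
```

A call would pass the BatchQ-PBS checkout, your site-packages path, and Path.home() / ".vistrails" / "userpackages". Existing files at the destinations are overwritten, so rerunning after a git pull refreshes both copies.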
Install the hadoop package
git clone git://vgc.poly.edu:src/vistrails-hadoop.git ~/.vistrails/userpackages/hadoop
Modules used by the hadoop package
Dialogs/PasswordDialog
Used to specify the password for the remote machine.
Remote PBS/Machine
Represents a remote machine running SSH.
- server - the server URL
- username - the remote server username; the default is your local username
- password - your password; connect the PasswordDialog here
- port - the remote ssh port, set to 0 to use the default port
Example connecting to the Poly cluster through vgchead
The hadoop job submitter runs on gray02.poly.edu. If you are outside the Poly network, you need to use an SSH tunnel to get through the firewall.
Add this to ~/.ssh/config:
Host vgctunnel
    HostName vgchead.poly.edu
    LocalForward 8101 gray02.poly.edu:22
Host gray02
    HostName localhost
    Port 8101
    ForwardX11 yes
Set up a tunnel to gray02 by running:
ssh vgctunnel
In VisTrails, create a Machine module with host=gray02 and port=0. You now have a connection that can be used by the hadoop package.
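Before running a workflow, it can help to confirm that the tunnel is actually listening. The sketch below is an illustration only and is not part of the hadoop package: the function name tunnel_is_up is hypothetical, and the default port 8101 is taken from the LocalForward line in the ssh config above. It only checks that something accepts TCP connections on the port; it does not verify that the listener is an SSH server or that login succeeds.

```python
import socket


def tunnel_is_up(host: str = "localhost", port: int = 8101, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening on the forwarded port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If tunnel_is_up() returns False, the ssh vgctunnel session is most likely not running or the port forward failed, and the Machine module will not be able to connect.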