Grid FTP Tests
From BingGridWiki
OnurDemir and MichaelHead have been working on this project since July 2005.
GridFTP setup
- log in as globus
- download http://firefighter.cs.binghamton.edu/~burner/gt4/gt4.0.1-smaller.tgz
- extract in /home/globus
- Open a terminal and run these commands:
  - sudo apt-get install build-essential cvs
  - cvs -d:ext:globus@10.0.0.51:/home/globus/cvsroot co gt4.0.1-small
  - cvs -d:ext:head@nsrg.cs.binghamton.edu:/usr/local/cvsroot co -r AUTHENTICATION_SERVER gt4.0.1-all-source-installer
  - unset GLOBUS_LOCATION
  - rm -rf globus
  - cd gt4.0.1-all-source-installer
  - ./configure --prefix=$HOME/globus --disable-prewsgram --disable-rls --disable-wsjava --disable-wsmds --disable-wsdel --disable-wsrft --disable-wsgram --disable-rndvz --disable-wscas --disable-wsc --disable-tests --disable-wstests --disable-webmds
  - make clean
  - make globus_libtool   # This makes libltdl for gcc64dbg
  - make all
  - make install
- add 'export GLOBUS_LOCATION=$HOME/globus' to .bashrc
- Follow the instructions on http://www-unix.globus.org/toolkit/docs/4.0/security/simpleca/admin-index.html#s-simpleca-admin-installing
  - make sure the first 'ou' in the simple CA's subject name is the proper hostname for the CA server (I think?)
- use 'grid-mapfile-add-entry -f ~/.gridmap -dn <unknown DN> -ln <username>'. ~/.gridmap is the file that globus-gridftp-server will look at when run in nonroot mode
- run setup-gsi (as instructed by $GLOBUS_LOCATION/setup/globus/setup-simple-ca) with the -nonroot option.
Set up a CA
For reference, here are the steps to set up the CA (cook is the CA):
- $GLOBUS_LOCATION/setup/globus/setup-simple-ca
- $GLOBUS_LOCATION/setup/globus_simple_ca_c7881362_setup/setup-gsi -nonroot -default
  Note: just use the command line as suggested by the output from the previous command
- $GLOBUS_LOCATION/bin/grid-cert-request -host 'IP address'
  Note: fix up the IP address (proper hostname is best)
- $GLOBUS_LOCATION/bin/grid-ca-sign -in $GLOBUS_LOCATION/etc/hostcert_request.pem -out $GLOBUS_LOCATION/etc/hostcert.pem
Deploying the CA
To share the certificate authority, run these commands on the AS (cook):
- scp $HOME/.globus/simpleCA/globus_simple_ca_c7881362_setup-0.18.tar.gz othermachine:
  Fixup: the hash and othermachine
- ssh othermachine \$GLOBUS_LOCATION/sbin/gpt-build globus_simple_ca_c7881362_setup-0.18.tar.gz
- ssh othermachine \$GLOBUS_LOCATION/sbin/gpt-postinstall
- ssh othermachine \$GLOBUS_LOCATION/setup/globus_simple_ca_c7881362_setup/setup-gsi -nonroot -default
Making a User Cert
To make a user cert:
- $GLOBUS_LOCATION/bin/grid-cert-request -nopassphrase
- $GLOBUS_LOCATION/bin/grid-ca-sign -in $HOME/.globus/usercert_request.pem -out $HOME/.globus/usercert.pem
- and if it's on a remote machine:
  - scp .globus/usercert_request.pem cook:
  - ssh cook \$GLOBUS_LOCATION/bin/grid-ca-sign -in usercert_request.pem -out usercert.pem
  - scp cook:usercert.pem .globus/usercert.pem
Requesting and signing a cert remotely
To request a cert on another machine, run these commands on the remote machine, not on cook:
- $GLOBUS_LOCATION/bin/grid-cert-request -host 'IP address'
- scp /home/globus/globus/etc/hostcert_request.pem cook:
  Fixup: cook should be the name of the simpleca machine
- ssh cook \$GLOBUS_LOCATION/bin/grid-ca-sign -in hostcert_request.pem -out hostcert.pem
- scp cook:hostcert.pem $GLOBUS_LOCATION/etc/
Hacking GridFTP
- In globus_gridftp_server_control_commands.c::globus_l_gsc_auth_cb(), if the response was a success, notify the active NIC with a UDP packet containing the IP, username, and remote port (if possible).
- Struct for server->activenic communication:
  ip address : (result of htonslon) 32bit
  timestamp : (long) 64bit
  nBytes : 32bit
  nBytesRemaining : 32bit
  username : null term. char*
  filename : null term. char* (or hashvalue of some kind)
  port : port (short) - use 0 if unknown
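A minimal sketch of how the server->activenic record above might be serialized into a UDP payload. The page only lists the fields and their widths, so everything else here is an assumption: the on-wire field order (taken from the listing order), network byte order for every integer, and the helper names pack_notification and unpack_notification, which are hypothetical.

```python
import socket
import struct

# Fixed-width leading fields: 4-byte IP, 64-bit timestamp, 32-bit nBytes,
# 32-bit nBytesRemaining. "!" means network byte order with no padding.
_FIXED = struct.Struct("!4sqII")
# Trailing 16-bit port; 0 means "unknown", as in the field list.
_PORT = struct.Struct("!H")


def pack_notification(ip, timestamp, n_bytes, n_remaining, username, filename, port=0):
    """Serialize one record (assumed layout, see the text above)."""
    return (
        _FIXED.pack(socket.inet_aton(ip), timestamp, n_bytes, n_remaining)
        + username.encode("utf-8") + b"\0"   # null-terminated username
        + filename.encode("utf-8") + b"\0"   # null-terminated filename
        + _PORT.pack(port)
    )


def unpack_notification(payload):
    """Inverse of pack_notification; returns a dict of the fields."""
    raw_ip, timestamp, n_bytes, n_remaining = _FIXED.unpack_from(payload, 0)
    # Split on the first two NUL bytes only, so NUL bytes inside the
    # binary port field are left untouched in the tail.
    username, filename, tail = payload[_FIXED.size:].split(b"\0", 2)
    (port,) = _PORT.unpack(tail)
    return {
        "ip": socket.inet_ntoa(raw_ip),
        "timestamp": timestamp,
        "n_bytes": n_bytes,
        "n_remaining": n_remaining,
        "username": username.decode("utf-8"),
        "filename": filename.decode("utf-8"),
        "port": port,
    }
```

The resulting bytes could be handed to a UDP socket's sendto(); the address and port of the active NIC are not given on this page.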
Journal Paper
Enhancing GridFTP Performance Using Intelligent Gateways
The paper is due January 31, 2006. We are running a number of experiments. They all involve calling globus-url-copy from a number of clients. The control connection routes through an ActiveNic router, which can drop and massage the different connections as decided by a program running on the router.
We had a discussion and made some notes about The Graphs
The doc version of the paper is in CVS now. I still need to do more on the experiments section.
Journal Paper as Word Document
The references should be completed. Bios should be added.
HPDC Workshop Paper
- HPDC Workshop on Next-Generation Distributed Data Management; Due February 28, 2006
For this workshop, we should improve the test scripts so that instead of running N times, the tests run for T seconds. This will make the output a bit more comparable.
We should also add concurrency to the client scripts, so multiple clients on a machine can be downloading at the same time.
The download test script should do a better job of timing the download process.
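A rough sketch of the three script changes described above: run for T seconds instead of N iterations, run several clients concurrently on one machine, and time each download individually. The function name run_for_duration and the download callable are hypothetical; the existing test scripts are not shown on this page, so this is not a description of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_for_duration(download, duration_s, n_clients=1):
    """Call download() in a loop from n_clients threads for duration_s seconds.

    Returns a flat list of per-download wall-clock times in seconds.
    A download that starts before the deadline is allowed to finish.
    """
    deadline = time.monotonic() + duration_s

    def client_loop():
        timings = []
        while time.monotonic() < deadline:
            start = time.monotonic()          # monotonic clock: immune to NTP jumps
            download()
            timings.append(time.monotonic() - start)
        return timings

    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        futures = [pool.submit(client_loop) for _ in range(n_clients)]
        return [t for future in futures for t in future.result()]


# Hypothetical real transfer (host and paths are placeholders):
#   import subprocess
#   download = lambda: subprocess.run(
#       ["globus-url-copy", "gsiftp://server.example/data.bin", "file:///tmp/data.bin"],
#       check=True)
#   timings = run_for_duration(download, duration_s=60, n_clients=4)
```

Because every run covers the same wall-clock window, the number of completed downloads and their timing distribution can be compared directly across configurations.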
We should attempt to do authentication on the host or activenic. Then we can do load balancing and lots of other cool stuff.
Grid Workshop paper
Our plan here was to separate the data servers from the authentication node and provide a backchannel to the activenic to provide a real working solution for grid ftp providers.
Cluster Workshop paper
Repeat the Grid Workshop plan.
Compare server realized throughput, client wait time until data starts (== response time?), and reliability(?) across these configurations:
- No active NIC
- Active NIC with one server (previous experiment)
- Active NIC with remote AS
- Active NIC with local AS (AS is on ANIC host)
Test clients' effective bandwidth(?) and server realized throughput against several policies. Check the number of requests completed per minute and the response time. Policies:
- let small files through first
- smallest percentage remaining
- smallest bytes remaining
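One way to read the three policies above is as ordering keys over the set of pending transfers. The sketch below is illustrative only: the Transfer record, its field names, and next_transfer are hypothetical and not part of the ActiveNIC program.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    name: str
    total_bytes: int
    remaining_bytes: int


# Each policy maps a transfer to a sort key; the lowest key is served first.
POLICIES = {
    # let small files through first
    "small_files_first": lambda t: t.total_bytes,
    # smallest percentage remaining (guard against zero-length files)
    "smallest_percentage_remaining": lambda t: t.remaining_bytes / max(t.total_bytes, 1),
    # smallest bytes remaining
    "smallest_bytes_remaining": lambda t: t.remaining_bytes,
}


def next_transfer(pending, policy):
    """Return the pending transfer the named policy would favor."""
    return min(pending, key=POLICIES[policy])
```

For example, with transfers of 1000 bytes (100 remaining), 200 bytes (150 remaining), and 5000 bytes (250 remaining), the three policies each pick a different transfer, which is what makes them worth comparing experimentally.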
Outbound Traffic Shaping with an ActiveNIC-based Egress Switch
Future stuff
If we have an outgoing Active NIC
- Multiple services using same outbound NIC
- Each app sends client -> IP+Port mapping to active NIC
- Active NIC queues packets before forwarding. When packets must be dropped and one client is using a high percentage of the queue slots, prefer to drop that client's packets.
- Works for multiple grid FTP servers when a certain client opens many channels and starves other clients
- Can we discover bottlenecks in the network cloud by looking at the number of retransmits? If so, and client1 has 10 connections and client2 has one connection and client1 is retransmitting at 10x the rate of client2 (but the outbound interface isn't saturated) prefer client2's packets a little.
- Also works when outbound interface is saturated with gridftp data and ssh packets want to get through, so look also at the packets per connection for the high usage client
- Even consider the size of the packet. Small packets get priority (ssh vs. scp).
- Software-only server optimizations
- Implement fast authentication using pre-shared public keys when possible
- If a connection has multiple timeouts, cache its file for a while so that it can resume faster
- Improve scheduling based on file usage/disk layout. Attempt to reduce the need to seek around the disk at the application level.
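The queue-drop preference described for the outgoing Active NIC (drop from a client that occupies a high percentage of the queue slots) could be sketched roughly as follows. ShapingQueue, the high_share threshold, and the choice to drop the heavy client's newest queued packet are all assumptions made for illustration; the notes above do not specify them.

```python
from collections import Counter, deque


class ShapingQueue:
    def __init__(self, capacity, high_share=0.5):
        self.capacity = capacity          # number of queue slots (>= 1)
        self.high_share = high_share      # share of slots that counts as "high"
        self.packets = deque()            # (client, payload) in arrival order
        self.per_client = Counter()       # queued packets per client

    def enqueue(self, client, payload):
        """Queue a packet. Returns the (client, payload) dropped, or None."""
        dropped = None
        if len(self.packets) >= self.capacity:
            heavy, count = self.per_client.most_common(1)[0]
            if count / self.capacity > self.high_share:
                # A dominant client exists: evict one of its packets instead.
                dropped = self._drop_newest_from(heavy)
            else:
                # No dominant client: plain tail drop of the arriving packet.
                return (client, payload)
        self.packets.append((client, payload))
        self.per_client[client] += 1
        return dropped

    def _drop_newest_from(self, client):
        for i in range(len(self.packets) - 1, -1, -1):
            if self.packets[i][0] == client:
                victim = self.packets[i]
                del self.packets[i]
                self.per_client[client] -= 1
                return victim
        return None

    def dequeue(self):
        """Forward the oldest queued packet."""
        client, payload = self.packets.popleft()
        self.per_client[client] -= 1
        return client, payload
```

The retransmit-rate and packet-size preferences from the list would need extra per-connection counters and are left out of this sketch.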
General GridFTP Implementation Notes
α) The Globus Striped GridFTP Framework and Server
β) File and Object Replication in Data Grids
γ) Data Management and Transfer in High-Performance Computational Grid Environments

