About the Condor Grid at UH

Condor is a software framework for distributing computing workload across a variety of platforms. It can schedule and run jobs on dedicated compute clusters as well as "scavenge" resources from otherwise idle desktop computers.
The grid at UH has access to dedicated and scavenged cycles from servers at the RCC as well as from selected desktop computers on the UH campus.
The grid is available to all UH researchers and to graduate and undergraduate students.
At this time, six pools form a grid through a Condor feature known as "flocking". A user with local access to any one of the individual pools can submit jobs to the entire grid.
3 Universes in 6 Pools
In Condor, the execution environment is called a universe. The UH Condor grid supports three universes: standard, vanilla and java. The standard universe is the default, but any of the three can be selected when a job is submitted.
The standard universe supports checkpointing. Condor jobs may be temporarily interrupted by jobs with higher priority. In the standard universe, an interrupted job resumes from its last checkpoint rather than restarting from the beginning. To run in the standard universe, you must relink your program with the Condor support libraries using the condor_compile command. No code changes are required.
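The relink-and-submit steps for the standard universe can be sketched as follows. This is a minimal illustration, not a site-specific recipe: the program name myprog, its source file myprog.c, the choice of gcc and the submit file name myprog.sub are all assumptions made for the example. The script only calls the Condor tools when they are installed and the source file is present.

```shell
#!/bin/sh
# Hypothetical program name; substitute your own.
PROG=myprog

# Write a minimal submit description file that selects the standard universe.
cat > "$PROG.sub" <<EOF
universe   = standard
executable = $PROG
output     = $PROG.out
error      = $PROG.err
log        = $PROG.log
queue
EOF

# Relink with the Condor libraries and submit, but only where the
# Condor tools and the (assumed) C source file are actually available.
if command -v condor_compile >/dev/null 2>&1 && [ -f "$PROG.c" ]; then
    # condor_compile wraps the normal compile/link command line.
    condor_compile gcc -o "$PROG" "$PROG.c"
    condor_submit "$PROG.sub"
else
    echo "Condor tools or $PROG.c not found; wrote $PROG.sub only."
fi
```

The compiler and flags after condor_compile are whatever you would normally use to build the program; only the link step changes.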
The vanilla universe lacks support for checkpointing, but it will run programs that cannot be relinked. It also runs scripts written for various interpreters, such as command shells, Perl or Python.
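As a rough sketch of a vanilla universe job, the script below creates a small shell script and a matching submit description file. The file names hello.sh and hello.sub are hypothetical, and the two file-transfer settings assume that the pools do not share a file system, which may not hold for every cluster in the grid.

```shell
#!/bin/sh
# Hypothetical job script; a Perl or Python script with a suitable
# "#!" line could be used the same way.
cat > hello.sh <<'EOF'
#!/bin/sh
echo "Hello from $(uname -n)"
EOF
chmod +x hello.sh

# Minimal submit description file for the vanilla universe.
cat > hello.sub <<'EOF'
universe   = vanilla
executable = hello.sh
output     = hello.out
error      = hello.err
log        = hello.log
# Assumption: no shared file system, so let Condor move the files.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue
EOF

# Submit only where the Condor tools are installed.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit hello.sub
else
    echo "condor_submit not found; wrote hello.sh and hello.sub only."
fi
```

Because the vanilla universe does not checkpoint, a job that is interrupted starts over, so short or restartable scripts are the better fit here.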
The java universe provides a JVM with an appropriate classpath for running Java applications.
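A java universe submission might look like the sketch below. It assumes a compiled class file named Hello.class whose main class is Hello; both names, and the submit file name hello_java.sub, are hypothetical. In the java universe the executable is the class file, and the first argument names the class that contains main.

```shell
#!/bin/sh
# Minimal submit description file for the java universe.
cat > hello_java.sub <<'EOF'
universe   = java
executable = Hello.class
arguments  = Hello
output     = hello_java.out
error      = hello_java.err
log        = hello_java.log
queue
EOF

# Submit only where Condor is installed and the (assumed) class file exists.
if command -v condor_submit >/dev/null 2>&1 && [ -f Hello.class ]; then
    condor_submit hello_java.sub
else
    echo "condor_submit or Hello.class not found; wrote hello_java.sub only."
fi
```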
In the RCC, each compute cluster participating in the Condor grid appears as a separate pool. Use the frontend or "login" node in the cluster to submit and monitor jobs. The compute nodes are dedicated to job execution. Jobs submitted in a particular pool will generally start on the execution nodes of that cluster, but if no resources are available there, the job will automatically be assigned to another pool in the grid.
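A typical submit-and-monitor session on a login node could look like the following sketch. The submit file name job.sub is a placeholder, and the script does nothing beyond printing a message when the Condor tools are not installed.

```shell
#!/bin/sh
# Hypothetical submit description file; replace with your own.
SUBMIT_FILE=job.sub

if command -v condor_submit >/dev/null 2>&1; then
    HAVE_CONDOR=yes
    # Show the execution slots in the local pool and their state.
    condor_status
    # Queue the job if the submit file is present.
    if [ -f "$SUBMIT_FILE" ]; then
        condor_submit "$SUBMIT_FILE"
    fi
    # List your jobs that are queued or running on this submit node.
    condor_q
else
    HAVE_CONDOR=no
    echo "Condor tools not found; run this on a cluster login node."
fi
```

A job that is no longer wanted can be removed from the queue with condor_rm followed by the job ID that condor_q reports.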
Suggested Reading
Basic Condor Commands
Condor Submit Description Files
Condor Version 7.0.5 Manual
Wikipedia: the Free Encyclopedia "Condor High-Throughput Computing System"
Recommended Viewing
Condor Overview is a 78-minute video presentation recorded at UH in August 2008.
The presenter is Jason Stowe of CycleComputing. Jason covers the basics of how and why you would use Condor.
Note: The media site storing the presentation uses ActiveX controls, so Internet Explorer on Microsoft Windows is probably required for viewing.