General Questions
PARAM Ganga is the name of the supercomputer hosted at IIT Roorkee.
All IIT Roorkee faculty members and their research groups working in the HPC domain can get an account, provided the PARAM Ganga administration approves their online account request form.
Download the account request form, fill it in, and send it to gangasupport@iitr.ac.in.
New users can start with the Quick Start Guide and the User Manual.
Acknowledging the usage of the facility is mandatory.
Proforma for acknowledging the usage:
Refer to the list of applications installed on PARAM Ganga.
Raise a support ticket at https://paramganga.iitr.ac.in/support/
MegaFLOPS/GigaFLOPS/TeraFLOPS/PetaFLOPS are millions/billions/trillions/quadrillions of FLoating-point Operations (calculations) Per Second.
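As a worked example of these units, a system's theoretical peak performance is commonly estimated as the product of node count, cores per node, clock rate, and the floating-point operations each core completes per cycle. The figures used here (one node, 48 cores, 2.5 GHz, 32 double-precision FLOPs per cycle) are hypothetical, chosen only to show the arithmetic; they are not PARAM Ganga's specifications.

```latex
\begin{aligned}
1~\mathrm{MFLOPS} &= 10^{6}~\mathrm{FLOPS}, \qquad 1~\mathrm{GFLOPS} = 10^{9}~\mathrm{FLOPS}, \\
1~\mathrm{TFLOPS} &= 10^{12}~\mathrm{FLOPS}, \qquad 1~\mathrm{PFLOPS} = 10^{15}~\mathrm{FLOPS}, \\[6pt]
R_{\mathrm{peak}} &= N_{\mathrm{nodes}} \times N_{\mathrm{cores/node}} \times f_{\mathrm{clock}} \times n_{\mathrm{FLOP/cycle}}, \\
R_{\mathrm{peak}}^{(1\,\mathrm{node})} &= 1 \times 48 \times \left(2.5 \times 10^{9}~\mathrm{s^{-1}}\right) \times 32
  = 3.84 \times 10^{12}~\mathrm{FLOPS} = 3.84~\mathrm{TFLOPS}.
\end{aligned}
```

Sustained performance measured by benchmarks such as LINPACK is lower than this theoretical peak.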
Linux Questions
Linux is an open-source operating system that is similar to UNIX. It is widely used in High-Performance Computing.
Many tutorials are available on the web, which can be a good starting point.
SSH Questions
Secure Shell (SSH) is a program to log into another computer over a network, execute commands on a remote machine, and move files from one device to another. It provides strong authentication and secure communications over insecure channels. In addition, SSH provides secure X11 connections and secure forwarding of arbitrary TCP connections.
Do NOT use IP addresses to access the HPC system! Always use the domain name paramganga.iitr.ac.in (e.g. $ ssh <username>@paramganga.iitr.ac.in).
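A minimal sketch of logging in and copying files with the standard OpenSSH tools, wrapped in small helper functions so the host name is written only once. The function names and the GANGA_USER variable are hypothetical; set GANGA_USER to the username issued with your PARAM Ganga account.

```shell
# Hypothetical helpers around the standard OpenSSH client tools (ssh, scp).
# Always connect via the domain name, never via an IP address.
GANGA_HOST="paramganga.iitr.ac.in"

# Log in to PARAM Ganga. Extra arguments are passed to ssh as options,
# e.g. "ganga_login -X" enables X11 forwarding.
ganga_login() {
    ssh "$@" "${GANGA_USER:?set GANGA_USER to your PARAM Ganga username}@${GANGA_HOST}"
}

# Copy a local file to PARAM Ganga. The optional second argument is the remote
# destination path; when it is omitted, the file lands in your home directory.
ganga_upload() {
    scp "${1:?usage: ganga_upload <file> [remote_path]}" \
        "${GANGA_USER:?set GANGA_USER to your PARAM Ganga username}@${GANGA_HOST}:${2:-}"
}
```

For example, after GANGA_USER=your_username, running ganga_login -X opens a session with X11 forwarding, and ganga_upload results.tar.gz copies that file to your home directory.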
Batch Processing Questions
On PARAM Ganga, batch processing is managed by SLURM. SLURM batch requests (jobs) are shell scripts that contain the same commands you would enter interactively. These requests may also include options for the batch system that specify time, memory, and processor requirements. For more information, see the general procedure on the website and the sbatch manual page (man sbatch).
SLURM uses sbatch to submit a batch request, squeue to check its status, and scancel to delete it.
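A minimal batch script sketch. The #SBATCH options are standard sbatch options, but the values (task count, time limit, file names) are placeholders, and any partition or account settings PARAM Ganga may require are not shown; check the User Manual for those.

```shell
#!/bin/bash
#SBATCH --job-name=hello          # name shown by squeue
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks=4                # number of tasks (processes)
#SBATCH --time=00:10:00           # wall-time limit (HH:MM:SS)
#SBATCH --output=hello_%j.out     # standard output; %j expands to the job ID
#SBATCH --error=hello_%j.err      # standard error

# Everything below runs on the allocated compute node, just as if typed interactively.
# SLURM_* variables are set by Slurm; the defaults let the script also run outside a job.
NTASKS="${SLURM_NTASKS:-1}"
echo "Job ${SLURM_JOB_ID:-none} running on $(uname -n) with ${NTASKS} task(s)"
```

Saved as, say, hello.sh, the script would be submitted with $ sbatch hello.sh (which prints the job ID), monitored with $ squeue -u $USER, and cancelled with $ scancel <jobid>.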
There are numerous reasons why a job might not run even though there appear to be processors and/or memory available. These include:
a. Your account may be at or near the job count or processor count limit for an individual user.
b. Your group/faculty may be at or near the job count or processor count limit for a group.
c. The scheduler may be trying to free enough processors to run a large parallel job.
d. Your job may need to run longer than the time left before a scheduled downtime starts.
e. You may have requested a scarce resource or node type, either inadvertently or by design.
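To narrow down which of these reasons applies, Slurm can report the expected start time of your pending jobs and the limits recorded for your account. A minimal sketch; the function names are hypothetical, the start time may show N/A until the scheduler has estimated it, and some sites restrict sacctmgr queries for regular users.

```shell
# Hypothetical helpers; squeue and sacctmgr are standard Slurm commands.

# Expected start time and pending reason for each of your waiting jobs.
my_start_estimates() {
    squeue --start -u "${USER:?USER is not set}"
}

# Per-user and per-group limits (job counts, node/CPU limits, wall time)
# recorded in Slurm's accounting database for your user.
my_limits() {
    sacctmgr show association where user="${USER:?USER is not set}"
}
```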
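A job's output should be at the location given by the --output (and --error) options in the batch script. If no location is given, SLURM writes both to slurm-<jobid>.out in the directory from which the job was submitted.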
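For a job that is still pending or running, scontrol show job reports the StdOut path Slurm will write to. A minimal sketch that extracts just that path; the function name is hypothetical, and finished jobs drop out of scontrol after a short time, so this is mainly useful while the job is queued or running.

```shell
# Hypothetical helper: print the standard-output path Slurm recorded for a job.
# Usage: job_stdout <jobid>
job_stdout() {
    scontrol show job "${1:?usage: job_stdout <jobid>}" |
        sed -n 's/.*StdOut=\([^ ]*\).*/\1/p'
}
```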
To cancel all of your jobs at once, use scancel -u followed by your username (e.g. $ scancel -u $USER).
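scancel can also narrow the selection instead of cancelling everything. A minimal sketch; the function names are hypothetical, while -u, --state and --name are standard scancel options. A single job is cancelled with $ scancel <jobid>.

```shell
# Hypothetical helpers around scancel.

# Cancel only your jobs that are still waiting in the queue (running jobs are kept).
cancel_my_pending() {
    scancel -u "${USER:?USER is not set}" --state=PENDING
}

# Cancel your jobs that have a given job name (as set with --job-name).
cancel_my_jobs_named() {
    scancel -u "${USER:?USER is not set}" --name="${1:?usage: cancel_my_jobs_named <name>}"
}
```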
By default, we don’t start an X server on GPU nodes because it impacts computational performance, and therefore, it is not possible to use them for visualization.
The job priority depends on the job’s current wait time, the queue priority, the size of the job, and job wall time.
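Slurm's sprio command shows how these factors combine for your pending jobs. A minimal sketch; the function names are hypothetical, the columns shown depend on which priority factors PARAM Ganga's configuration enables, and only pending jobs are listed.

```shell
# Hypothetical helpers around sprio.

# Long format: one column per priority factor for each of your pending jobs.
my_priority() {
    sprio -l -u "${USER:?USER is not set}"
}

# The configured weight of each priority factor.
priority_weights() {
    sprio -w
}
```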
Job reason codes, shown by squeue in the NODELIST(REASON) column, identify the reason a job is waiting for execution. A job may be waiting for more than one reason, in which case only one of those reasons is displayed.
| Job reason code | Explanation |
| --- | --- |
| AssociationJobLimit | The job’s association has reached its maximum job count. |
| AssocGrpNodeLimit | The job requested more nodes than allowed for the entire project/association/group. |
| AssocMaxNodesPerJobLimit | The job requested more nodes than allowed per job. |
| AssocMaxJobsLimit | Generally occurs when you have reached the maximum number of jobs allowed to run at the same time. |
| AssocMaxWallDurationPerJobLimit | The job requested a run time greater than that allowed by the queue. |
| AssociationResourceLimit | The job’s association has reached some resource limit. |
| AssociationTimeLimit | The job’s association has reached its time limit. |
| BadConstraints | The job’s constraints cannot be satisfied. |
| BeginTime | The job’s earliest start time has not yet been reached. |
| Cleaning | The job is being requeued and is still cleaning up from its previous execution. |
| Dependency | The job is waiting for a job it depends on to complete. |
| FrontEndDown | No front end node is available to execute this job. |
| InactiveLimit | The job reached the system InactiveLimit. |
| InvalidAccount | The job’s account is invalid. |
| InvalidQOS | The job’s QOS is invalid. |
| JobHeldAdmin | The job is held by a system administrator. |
| JobHeldUser | The job is held by the user. |
| JobLaunchFailure | The job could not be launched. This may be due to a file system problem, an invalid program name, etc. |
| Licenses | The job is waiting for a license. |
| NodeDown | A node required by the job is down. |
| NonZeroExitCode | The job terminated with a non-zero exit code. |
| PartitionConfig | The job requested more resources than the partition is configured for, or an invalid number of them. |
| PartitionDown | The partition required by this job is in a DOWN state. |
| PartitionInactive | The partition required by this job is in an Inactive state and not able to start jobs. |
| PartitionNodeLimit | The number of nodes required by this job is outside its partition’s current limits. Can also indicate that required nodes are DOWN or DRAINED. |
| PartitionTimeLimit | The job’s time limit exceeds its partition’s current time limit. |
| Priority | One or more higher-priority jobs exist for this partition or advanced reservation. |
| Prolog | The job’s PrologSlurmctld program is still running. |
| QOSJobLimit | The job’s QOS has reached its maximum job count. |
| QOSResourceLimit | The job’s QOS has reached some resource limit. |
| QOSTimeLimit | The job’s QOS has reached its time limit. |
| ReqNodeNotAvail | Some node specifically required by the job is not currently available. For example, the node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes that are DOWN, DRAINED, or not responding are identified in the job’s “reason” field as “UnavailableNodes”. Such nodes typically require the intervention of a system administrator to make them available. |
| Reservation | The job is waiting for its advanced reservation to become available. |
| Resources | The job is waiting for resources to become available. |
| SystemFailure | Failure of the Slurm system, a file system, the network, etc. |
| TimeLimit | The job exhausted its time limit. |
| QOSUsageThreshold | The required QOS usage threshold has been breached. |
| WaitingForScheduling | No reason has been set for this job yet; it is waiting for the scheduler to determine the appropriate reason. |
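To look up which of these codes applies to your own jobs, squeue can print the reason column for just your pending jobs. A minimal sketch; the function name is hypothetical, while the format specifiers (%i job ID, %P partition, %j name, %T state, %R reason) are standard squeue ones. For a single job, the Reason= field of $ scontrol show job <jobid> gives the same information.

```shell
# Hypothetical helper: list your pending jobs with the reason code for each.
# For pending jobs, %R prints the reason in parentheses, e.g. (Priority).
my_pending_reasons() {
    squeue -u "${USER:?USER is not set}" -t PENDING -o "%.12i %.12P %.20j %.10T %R"
}
```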
Compiling System Questions
Fortran, C, and C++ compilers are available on PARAM Ganga. For more details, refer to the PARAM Ganga documentation.
Although executables built on other machines may run, users are advised to recompile any software that is not already available on PARAM Ganga in their home directory.
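A minimal sketch of building a single C or Fortran source file in your home directory on a login node. It assumes the compilers are reachable as gcc and gfortran (overridable through the CC and FC variables); the function name and the module name in the comment are hypothetical placeholders, so check $ module avail, if the module system is set up, for what PARAM Ganga actually provides.

```shell
# Hypothetical helper: compile one C or Fortran source file into an executable
# named after it (hello.c -> hello, solver.f90 -> solver).
# The compiler comes from CC (C) or FC (Fortran), defaulting to gcc and gfortran.
# If needed, load a compiler first, e.g.:  module load <compiler-module>
build() {
    local src="${1:?usage: build <file.c|file.f90>}"
    case "$src" in
        *.c)   ${CC:-gcc} -O2 -o "${src%.c}" "$src" ;;
        *.f90) ${FC:-gfortran} -O2 -o "${src%.f90}" "$src" ;;
        *)     echo "build: unsupported file type: $src" >&2; return 1 ;;
    esac
}
```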
Other Common Questions
Programs run on the login nodes are subject to strict CPU time limits. To run an application that takes longer, you need to create a batch request. Your batch request should include an appropriate estimate of the wall time (elapsed run time) your application will need.
Programs run on the login nodes are subject to strict CPU time limits. Because file transfers use encryption, which consumes CPU time, you may hit this limit when transferring a large file. To run longer programs, use the batch system.
Windows and Mac have different end-of-line conventions for text files than UNIX and Linux systems do, and most UNIX shells (including the ones interpreting your batch script) don’t like seeing the extra character that Windows appends to each line or the alternate character used by Mac. You can use the following commands on the Linux system to convert a text file from Windows or Mac format to UNIX format:
$ dos2unix myfile.txt
$ mac2unix myfile.txt
A text file created on Linux/UNIX will usually display correctly in WordPad but not in Notepad. To convert a text file from UNIX format to Windows format, use the following command on the Linux system:
$ unix2dos myfile.txt
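If dos2unix is not installed, or to check a file before converting it, standard tools can do the same job. A minimal sketch; the function names are hypothetical. Note that tr -d removes every carriage return in the file, not only those at line ends, which is what a Windows-format batch script needs.

```shell
# Hypothetical helpers for Windows (CRLF) and classic Mac (CR) line endings,
# using only grep and tr.

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "${1:?usage: has_crlf <file>}"
}

# Write a copy of the input with all carriage returns removed; the input is untouched.
crlf_to_lf() {
    tr -d '\r' < "${1:?usage: crlf_to_lf <in> <out>}" > "${2:?usage: crlf_to_lf <in> <out>}"
}

# Write a copy of the input with CR-only line endings turned into UNIX newlines.
cr_to_lf() {
    tr '\r' '\n' < "${1:?usage: cr_to_lf <in> <out>}" > "${2:?usage: cr_to_lf <in> <out>}"
}
```

For example, $ has_crlf job.sh && echo "job.sh has Windows line endings" flags a script that SLURM's shell would choke on.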
- Do NOT run any job that takes longer than a few minutes on the login nodes. The login nodes are meant for compiling code; run your jobs on the compute nodes.
- New users are recommended to read the PARAM Ganga quick start guide, which should serve as a good starting point.
- Before installing any software in your home directory, ensure that it is from a reliable and safe source.
- Please do not use spaces in directory and file names.
- Please inform PARAM Ganga support when you notice something unusual, e.g., unexpected slowdowns or missing/corrupted files.