General Questions

PARAM Ganga is the name of the supercomputer hosted at IIT Roorkee.

All IIT Roorkee faculty members and their research groups working in the HPC domain can get an account, provided the PARAM Ganga administration approves their account request form.

Download the form from the link above and send the completed account request form to: gangasupport@iitr.ac.in

New users can start with the Quick Start Guide and the User Manual.

Acknowledging the usage of the facility is mandatory.

If you use the supercomputers and services provided under the National Supercomputing Mission, Government of India, please inform us of any resulting publications, including student theses, conference papers, journal papers, and patents.

Proforma for acknowledging usage:

“We acknowledge the National Supercomputing Mission (NSM) for providing computing resources of ‘PARAM Ganga’ at the Indian Institute of Technology Roorkee, which is implemented by C-DAC and supported by the Ministry of Electronics and Information Technology (MeitY) and Department of Science and Technology (DST), Government of India”.

Please also send copies of dissertations, reports, and reprints, along with URLs, in which “National Supercomputing Mission, Government of India” is acknowledged, to:
HoD, HPC Technologies,
Centre for Development of Advanced Computing,
CDAC Innovation Park, S.N. 34/B/1, Panchavati, Pashan,
Pune – 411008
Maharashtra

Communication of your achievements using resources provided by the National Supercomputing Mission will help the Mission in measuring outcomes and gauging future requirements. This will also help in further augmentation of resources at a given site of the National Supercomputing Mission.

Refer to the list of applications installed on PARAM Ganga.

MegaFLOPS/GigaFLOPS/TeraFLOPS/PetaFLOPS are millions/billions/trillions/quadrillions of FLoating-point Operations (calculations) Per Second.
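As a worked example of these units, the theoretical peak of a compute node is commonly estimated as cores × clock frequency × floating-point operations per core per clock cycle. The sketch below applies that formula to a hypothetical node (48 cores at 2.5 GHz, 32 double-precision FLOPs per cycle); these numbers are illustrative assumptions, not PARAM Ganga specifications, and real applications reach only a fraction of peak.

```shell
#!/bin/bash
# Theoretical peak of ONE hypothetical CPU node (illustrative numbers, not PARAM Ganga specs).
cores=48            # cores per node (assumed)
clock_ghz=2.5       # clock frequency in GHz (assumed)
flops_per_cycle=32  # double-precision FLOPs per core per cycle (assumed)

# peak (GFLOPS) = cores x clock (GHz) x FLOPs per cycle
peak_gflops=$(awk -v c="$cores" -v f="$clock_ghz" -v p="$flops_per_cycle" 'BEGIN { printf "%.1f", c * f * p }')
# 1 TFLOPS = 1000 GFLOPS
peak_tflops=$(awk -v g="$peak_gflops" 'BEGIN { printf "%.2f", g / 1000 }')

echo "Peak per node: ${peak_gflops} GFLOPS = ${peak_tflops} TFLOPS"
```

With these assumed values the node peaks at 3840 GFLOPS, i.e. 3.84 TFLOPS; multiplying by the number of nodes gives a system-level estimate.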

Linux Questions

Linux is an open-source operating system that is similar to UNIX. It is widely used in High-Performance Computing.

Many tutorials are available on the web, which can be a good starting point.

SSH Questions

Secure Shell (SSH) is a program for logging into another computer over a network, executing commands on a remote machine, and moving files from one machine to another. It provides strong authentication and secure communication over insecure channels. SSH also provides secure X11 connections and secure forwarding of arbitrary TCP connections.

Do NOT use IP addresses to access PARAM Ganga. Use the domain name paramganga.iitr.ac.in instead, for example:
$ssh paramganga.iitr.ac.in
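A minimal sketch of logging in and copying files with the standard ssh and scp clients, always addressing the system by its domain name. The helper names param_login and param_put are hypothetical conveniences, and the script assumes your PARAM Ganga username is in PARAM_USER (falling back to your local $USER).

```shell
#!/bin/bash
# Hypothetical helpers for reaching PARAM Ganga by its domain name (never an IP address).
PARAM_HOST="paramganga.iitr.ac.in"
# Assumption: your PARAM Ganga username; defaults to your local username if not set.
PARAM_USER="${PARAM_USER:-$USER}"

# Log in interactively; extra arguments are passed to ssh as options (e.g. -X for X11 forwarding).
param_login() {
  ssh "$@" "${PARAM_USER}@${PARAM_HOST}"
}

# Copy one or more local files to a directory under your PARAM Ganga home:
#   param_put file1 [file2 ...] remote_dir
param_put() {
  if [ "$#" -lt 2 ]; then
    echo "usage: param_put file... remote_dir" >&2
    return 1
  fi
  local dest="${@: -1}"                 # last argument is the remote directory
  scp "${@:1:$#-1}" "${PARAM_USER}@${PARAM_HOST}:${dest}"
}
```

After sourcing the file, "param_put input.dat myproject/" copies input.dat into ~/myproject on PARAM Ganga, and "param_login -X" opens a session with X11 forwarding.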

Batch Processing Questions

On PARAM Ganga, batch processing is managed by SLURM. SLURM batch requests (jobs) are shell scripts containing the same commands you would enter interactively. These requests may also include options for the batch system that specify time, memory, and processor requirements. For more information, see the general procedure on the PARAM Ganga website and the manual page (“man sbatch”).

SLURM uses sbatch to submit a batch request, squeue to check its status, and scancel to cancel it.
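A minimal batch script illustrating these points might look like the following. The job name, resource values, and file names are illustrative assumptions; PARAM Ganga's actual partitions and limits are documented in the User Manual, and your jobs may need additional directives such as a partition.

```shell
#!/bin/bash
# Minimal SLURM batch script (a sketch; resource values are illustrative, not PARAM Ganga limits).
# Job name shown by squeue:
#SBATCH --job-name=my_first_job
# One node, four tasks on it:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
# Wall-time limit (HH:MM:SS); keep it a realistic estimate:
#SBATCH --time=00:30:00
# Output and error files; %j is replaced by the job ID:
#SBATCH --output=my_first_job.%j.out
#SBATCH --error=my_first_job.%j.err
# A partition directive may also be required; partition names are not given in this FAQ.

# Everything below is ordinary shell: the same commands you would type interactively.
workdir="${SLURM_SUBMIT_DIR:-$PWD}"   # SLURM sets this to the directory sbatch was run from
jobid="${SLURM_JOB_ID:-not-under-slurm}"
cd "$workdir" || exit 1

echo "Job ${jobid} started on $(uname -n) at $(date)"
# srun ./my_program    # hypothetical executable; srun starts it on the allocated cores
echo "Job ${jobid} finished"
```

Save it as, say, my_first_job.sh, submit it with "sbatch my_first_job.sh", check it with "squeue -u $USER", and cancel it if needed with scancel followed by the job ID that sbatch prints.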

There are numerous reasons why a job might not run even though there appear to be processors and/or memory available. These include:
a. Your account may be at or near the job count or processor count limit for an individual user.
b. Your group/faculty may be at or near the job count or processor count limit for a group.
c. The scheduler may be trying to free enough processors to run a large parallel job.
d. Your job may need to run longer than the time left until the start of a scheduled downtime.
e. You may have requested a scarce resource or node type, either inadvertently or by design.

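Job output should be in the location specified in your batch script (the --output and --error options). If no location is specified, SLURM writes the output to a file named slurm-<jobid>.out in the directory from which the job was submitted.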

By default, we do not start an X server on the GPU nodes because it affects computational performance. As a result, the GPU nodes cannot be used for visualization.

Job priority depends on the job’s current wait time, the queue priority, the size of the job, and its requested wall time.

The reason codes below identify why a job is waiting for execution; squeue displays them in the NODELIST(REASON) column. A job may be waiting for more than one reason, in which case only one of those reasons is displayed.

Job reason code: Explanation

AssociationJobLimit: The job’s association has reached its maximum job count.
AssocGrpNodeLimit: The jobs of the entire project/association/group have requested more nodes than the group is allowed.
AssocMaxNodesPerJobLimit: The job requested more nodes than are allowed per job.
AssocMaxJobsLimit: Generally occurs when you have reached the maximum number of jobs allowed to run in the queue.
AssocMaxWallDurationPerJobLimit: The job requested a run time longer than the queue allows.
AssociationResourceLimit: The job’s association has reached some resource limit.
AssociationTimeLimit: The job’s association has reached its time limit.
BadConstraints: The job’s constraints cannot be satisfied.
BeginTime: The job’s earliest start time has not yet been reached.
Cleaning: The job is being requeued and is still cleaning up from its previous execution.
Dependency: The job is waiting for a job it depends on to complete.
FrontEndDown: No front-end node is available to execute this job.
InactiveLimit: The job reached the system InactiveLimit.
InvalidAccount: The job’s account is invalid.
InvalidQOS: The job’s QOS is invalid.
JobHeldAdmin: The job is held by a system administrator.
JobHeldUser: The job is held by the user.
JobLaunchFailure: The job could not be launched. This may be due to a file system problem, an invalid program name, etc.
Licenses: The job is waiting for a license.
NodeDown: A node required by the job is down.
NonZeroExitCode: The job terminated with a non-zero exit code.
PartitionConfig: The job requests more resources, or a different number of resources, than the partition is configured for.
PartitionDown: The partition required by this job is in a DOWN state.
PartitionInactive: The partition required by this job is in an Inactive state and cannot start jobs.
PartitionNodeLimit: The number of nodes required by this job is outside its partition’s current limits. This can also indicate that required nodes are DOWN or DRAINED.
PartitionTimeLimit: The job’s time limit exceeds its partition’s current time limit.
Priority: One or more higher-priority jobs exist for this partition or advanced reservation.
Prolog: Its PrologSlurmctld program is still running.
QOSJobLimit: The job’s QOS has reached its maximum job count.
QOSResourceLimit: The job’s QOS has reached some resource limit.
QOSTimeLimit: The job’s QOS has reached its time limit.
ReqNodeNotAvail: Some node specifically required by the job is not currently available. For example, the node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes that are DOWN, DRAINED, or not responding are listed in the job’s reason field as “UnavailableNodes”. Such nodes typically require the intervention of a system administrator to make them available.
Reservation: The job is waiting for its advanced reservation to become available.
Resources: The job is waiting for resources to become available.
SystemFailure: Failure of the Slurm system, a file system, the network, etc.
TimeLimit: The job exhausted its time limit.
QOSUsageThreshold: The required QOS threshold has been breached.
WaitingForScheduling: No reason has been set for this job yet. It is waiting for the scheduler to determine the appropriate reason.
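To see why your own pending jobs are waiting, squeue can print just the job ID and reason code for each of them (the -h, -t PENDING and -o "%i %r" options). The sketch below pairs that with a hypothetical explain_reason function that maps a few common codes to short hints paraphrased from the reason codes listed above; it is a convenience script, not a SLURM tool, and its hints are not exhaustive.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): one-line hints for common pending-reason codes.
# Glob patterns are used because some reasons carry extra text,
# e.g. "ReqNodeNotAvail, UnavailableNodes:...". Order matters: the first match wins.
explain_reason() {
  case "$1" in
    Priority)
      echo "Higher-priority jobs are ahead of yours; wait for your turn." ;;
    Resources)
      echo "The requested nodes or cores are busy; the job starts when they free up." ;;
    Dependency)
      echo "Waiting for a job it depends on to complete." ;;
    BeginTime)
      echo "The requested start time has not been reached yet." ;;
    ReqNodeNotAvail*)
      echo "A specifically requested node is unavailable (in use, reserved, DOWN, or DRAINED)." ;;
    AssocMaxWallDurationPerJobLimit|PartitionTimeLimit)
      echo "The requested wall time exceeds the allowed limit; reduce --time and resubmit." ;;
    Assoc*|QOS*)
      echo "A user, group, or QOS limit has been reached; the job waits until usage drops." ;;
    JobHeldUser)
      echo "You have held this job; release it with: scontrol release <jobid>" ;;
    JobHeldAdmin)
      echo "The job is held by an administrator; contact gangasupport@iitr.ac.in." ;;
    *)
      echo "No hint available; see the reason-code list or contact gangasupport@iitr.ac.in." ;;
  esac
}

# List your pending jobs with their reasons (only where SLURM's squeue is available).
if command -v squeue >/dev/null 2>&1; then
  squeue -u "$USER" -h -t PENDING -o "%i %r" | while read -r jobid reason; do
    printf '%s  %s: %s\n' "$jobid" "$reason" "$(explain_reason "$reason")"
  done
fi
```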

Compiling System Questions

Fortran, C, and C++ compilers are available on PARAM Ganga. For more details, refer to the link.

Although executables built on other machines may run on PARAM Ganga, users are advised to recompile, in their home directory, any software that is not already available on PARAM Ganga.

Other Common Questions

Programs run on the login nodes are subject to strict CPU time limits. To run an application that needs more time, create a batch request, and include in it a realistic estimate of the wall time (elapsed run time) your application will need.

Programs run on the login nodes are subject to strict CPU time limits. Because file transfers use encryption, you may hit this limit when transferring a large file. To run longer programs, use the batch system.

Windows and classic (pre-OS X) Mac OS use different end-of-line conventions for text files than UNIX and Linux do. Most UNIX shells, including the one interpreting your batch script, do not handle the extra carriage-return character that Windows appends to each line, or the carriage return that classic Mac OS uses in place of a newline. Use the following commands on the Linux system to convert a text file from Windows or Mac format to UNIX format:
$dos2unix myfile.txt
$mac2unix myfile.txt

A text file created on Linux/UNIX will usually display correctly in WordPad but may not in older versions of Notepad. To convert a text file from UNIX format to Windows format, use the following command on the Linux system:
$unix2dos myfile.txt
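To check whether a script has Windows line endings before submitting it, and to fix it where dos2unix is not installed, standard tools are enough. In this sketch, has_crlf and to_unix are hypothetical helper names; to_unix deletes every carriage return, which matches dos2unix for Windows (CRLF) files but is not a substitute for mac2unix on classic Mac (CR-only) files.

```shell
#!/bin/bash
# Hypothetical helpers: detect and strip Windows (CRLF) line endings with standard tools.

# Succeeds if any line of the file ends with a carriage return.
has_crlf() {
  grep -q $'\r$' "$1"
}

# Delete all carriage returns; writing back with cat keeps the file's permissions.
to_unix() {
  local tmp status
  tmp=$(mktemp) || return 1
  tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo on a throwaway file with Windows line endings.
demo=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$demo"
has_crlf "$demo" && echo "CRLF line endings found in $demo"
to_unix "$demo"
has_crlf "$demo" || echo "$demo is now in UNIX format"
```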

  1. Do NOT run any job that takes longer than a few minutes on the login nodes. The login nodes are for compiling code; run your jobs on the compute nodes.
  2. New users should refer to the PARAM Ganga quick start guide, which is a good starting point.
  3. Before installing any software in your home directory, make sure it comes from a reliable and safe source.
  4. Do not use spaces in directory or file names.
  5. Inform PARAM Ganga support when you notice anything unusual, e.g. unexpected slowdowns or missing or corrupted files.