FAQ (Hexagon)

From HPC documentation portal

Revision as of 14:06, 26 May 2010

How do I log in to Hexagon?

To log in to Hexagon you need an SSH client installed on your desktop. The exact syntax depends on which SSH client you use. From a Linux desktop, "ssh username@hexagon.bccs.uib.no" is sufficient.

I typed my password wrong several times and now I cannot log in. Has my account been closed?

Your account has not been closed. Only your computer has been temporarily blocked in our firewall to prevent brute-force attacks. Try again in 15 minutes.

How do I compile my software with MPI?

When compiling software on Hexagon, you have to use the compiler wrappers provided by Cray: ftn, cc, CC and f77. These wrappers include MPI automatically.
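As a quick reminder of which wrapper goes with which language, here is a small sketch. The helper name cray_wrapper_for and the file-suffix mapping are assumptions made for illustration, not something Cray provides.

```shell
# Hypothetical helper: pick the Cray compiler wrapper for a source file.
# The suffix-to-wrapper mapping below is an illustrative assumption.
cray_wrapper_for() {
    case "$1" in
        *.c)                   echo cc  ;;  # C
        *.cpp|*.cxx|*.cc|*.C)  echo CC  ;;  # C++
        *.f90|*.F90|*.f95)     echo ftn ;;  # Fortran 90 and later
        *.f|*.F|*.for)         echo f77 ;;  # fixed-form Fortran 77
        *) echo "unknown source type: $1" >&2; return 1 ;;
    esac
}

# Usage on a login node, e.g.:
#   $(cray_wrapper_for hello.c) -o hello hello.c    # runs: cc -o hello hello.c
```

Whatever the language, you call the Cray wrapper and never a separate MPI compiler command.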

How do I change the compiler?

By default, the PGI compiler is loaded when you log on to the system. If this compiler for some reason does not work correctly, or does not give optimal performance for your program, you can switch to GNU or PathScale. This is done with the module command.

For example, "module swap PrgEnv-pgi PrgEnv-gnu" changes the compiler from PGI to GNU. You still use the same wrappers to compile your program.

If you are uncertain which compiler you are using, "module list" will show you a list of the modules you currently have loaded. Either PrgEnv-pgi, PrgEnv-gnu or PrgEnv-pathscale will be listed.
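In a build or job script it can be handy to check the loaded programming environment automatically. The sketch below assumes the classic Environment Modules behaviour of printing "module list" to stderr (hence 2>&1). The function name current_prgenv is hypothetical.

```shell
# Hypothetical helper: read "module list" output on stdin and print the
# first PrgEnv-* module found, e.g. "PrgEnv-pgi".
current_prgenv() {
    grep -o 'PrgEnv-[A-Za-z]*' | head -n 1
}

# Sketch of use on Hexagon:
#   module list 2>&1 | current_prgenv
#   [ "$(module list 2>&1 | current_prgenv)" = "PrgEnv-pgi" ] && module swap PrgEnv-pgi PrgEnv-gnu
```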

Mpiexec or mpirun does not seem to be available. How do I run my MPI program?

Cray systems do not use mpiexec or mpirun. Instead they provide aprun, which HAS to be used to run programs on the compute nodes.

You have to provide aprun with some flags depending on how you want your software to run.

aprun -n 4 -N 4 ./a.out

The above example runs a.out on 4 cpus on ONE node, since Hexagon has 4 cpus per node. -N 4 is the default, so in this case it could be omitted and the command shortened to "aprun -n 4 ./a.out".

aprun -n 4 -N 2 ./a.out
    

The above example will run a.out on 4 cpus, where each node will use 2 cpus. Hence, TWO nodes will be used.

Please note that even though the examples above run on the same number of cores, the second example is charged as if it were running on all cpus of the two nodes, that is, 8 cpus. This is because each node is completely reserved for your job: no other job can run on its free cpus.
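The charging rule is simple arithmetic: nodes = ceil(-n / -N), and charged cpus = nodes × 4. A minimal sketch, assuming 4 cpus per node and whole-node charging as described above. The function names are hypothetical.

```shell
CPUS_PER_NODE=4   # Hexagon: 4 cpus per node (as stated in this FAQ)

# Nodes occupied by "aprun -n <total> -N <per_node>".
aprun_nodes() {
    echo $(( ($1 + $2 - 1) / $2 ))   # ceiling of total / per_node
}

# cpus charged, assuming every occupied node is charged in full.
aprun_charged_cpus() {
    echo $(( $(aprun_nodes "$1" "$2") * CPUS_PER_NODE ))
}

aprun_charged_cpus 4 4   # aprun -n 4 -N 4 ./a.out: 1 node, prints 4
aprun_charged_cpus 4 2   # aprun -n 4 -N 2 ./a.out: 2 nodes, prints 8
```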

When I try to run my software through aprun, I get the error message "No such file or directory" for my home directory. What is wrong?

The /home directory is not mounted on the compute nodes. To run your software, the executable has to be located somewhere under /work/$USER.
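A job script can catch this mistake before calling aprun. This is a minimal sketch, assuming bash. The function exe_on_home and the directory /work/$USER/myrun are hypothetical names chosen for illustration.

```shell
# Hypothetical pre-flight check: succeed (return 0) if the given absolute
# path lies under /home, which the compute nodes cannot see.
exe_on_home() {
    case "$1" in
        /home/*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Sketch of use in a job script (copy the binary to /work first):
#   mkdir -p /work/$USER/myrun && cp ./a.out /work/$USER/myrun/
#   cd /work/$USER/myrun
#   exe=$(pwd)/a.out
#   if exe_on_home "$exe"; then echo "Move $exe to /work/$USER first" >&2; exit 1; fi
#   aprun -n 4 "$exe"
```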

I get a strange error message. What am I doing wrong?

The error message:

[unset]: _pmi_init: _pmi_preinit encountered an internal error
Assertion failed in file /tmp/ulib/mpt/nightly/3.1/040709/mpich2/src/mpid/cray/src/adi/mpid_init.c 
at line 178: 0  aborting job:
(null)

This error message appears if you try to run a program compiled for the compute nodes directly on a login node. The program has to be launched with aprun. See here for more information about the batch system and aprun.
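A wrapper script can refuse to start outside a batch job instead of failing with the _pmi_init error. This sketch assumes a PBS/Torque batch system (suggested by the mppmem and mppnppn parameters mentioned below), which sets the PBS_JOBID environment variable inside jobs. The function name in_batch_job is hypothetical.

```shell
# Hypothetical guard: succeed only when running inside a PBS/Torque job.
in_batch_job() {
    [ -n "${PBS_JOBID:-}" ]
}

# Sketch of use:
#   if ! in_batch_job; then
#       echo "Submit this through the batch system; do not run ./a.out on a login node." >&2
#       exit 1
#   fi
#   aprun -n 4 ./a.out
```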

What is the OOM killer?

The error message:

_pmii_daemon(SIGCHLD): PE 4 exit signal Killed
[NID 21]Apid 611039: initiated application termination
[NID 00021] Apid 611039: OOM killer terminated this process.

This error message appears in the output if your program uses more memory than is available on one or more compute nodes. OOM stands for Out-Of-Memory. The OOM killer is a standard Linux kernel feature that kills a process when it uses up all the memory on a machine. The problem can be fixed or worked around in several ways. Running on more cores (or sometimes fewer) can reduce the memory used per core and give the job as a whole more memory. You can also ask for more memory per core. You control this with the batch system parameters mppmem (memory per core) and mppnppn (cores per node). See here for more information about the batch system and aprun.
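The effect of placing fewer processes per node is easy to estimate: each process gets roughly the node's memory divided by the number of processes on that node. In this sketch NODE_MEM_MB is a hypothetical placeholder, not a figure from this FAQ, so replace it with the real usable memory per compute node. The function name is also hypothetical.

```shell
NODE_MEM_MB=4000   # hypothetical value; set to the real usable memory per node

# Approximate memory per process for a given number of processes per node
# (the value you would request with mppnppn).
mem_per_process_mb() {
    echo $(( NODE_MEM_MB / $1 ))
}

mem_per_process_mb 4   # all 4 cores busy: prints 1000
mem_per_process_mb 2   # 2 processes per node: prints 2000
```

Halving the processes per node doubles the memory per process, but the job then needs twice as many nodes, and all cpus on those nodes are charged.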

How do I change my password?

Passwords on Hexagon are stored in a read-only file system, so users cannot change their password directly on the machine. Notur users can change their password from this webpage (select the Notur domain).

Local users have to contact support-uib@notur.no.

I do not have enough quota to run my job.

If you are running out of space in your home folder, or your job produces large amounts of stdout and stderr, redirect stdout and stderr to files on the /work file system, as in one of these examples:

# Both stderr and stdout into one file
aprun .... >/work/$USER/combined.out 2>&1
# Stderr and stdout into different files
aprun .... >/work/$USER/app.out 2>/work/$USER/app.err
# Stdout into file and dropping stderr
aprun .... >/work/$USER/app.out 2>/dev/null
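The three patterns can be tried out safely without a batch job. This sketch uses substitutes: fake_app is a hypothetical function standing in for "aprun ....", and a temporary directory from mktemp stands in for /work/$USER.

```shell
# Hypothetical stand-in for "aprun ....": writes one line to each stream.
fake_app() {
    echo "result line"          # stdout
    echo "warning line" >&2     # stderr
}

outdir=$(mktemp -d)             # placeholder for /work/$USER

fake_app >"$outdir/combined.out" 2>&1                # both into one file
fake_app >"$outdir/app.out" 2>"$outdir/app.err"      # separate files
fake_app >"$outdir/app_only.out" 2>/dev/null         # stderr discarded
```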

My problem is not listed here. What do I do?

Send an email to our administrators at Support describing your problem. It helps to include the ID of the failed job and the paths to the output file, error file, submit script and Makefile. One of our engineers will then help you as soon as possible.