Difference between revisions of "FAQ (Hexagon)"

From HPC documentation portal

Revision as of 13:19, 3 June 2010

How do I log in on hexagon?

To log in on hexagon you need an SSH client installed on your desktop. The login syntax depends on which SSH client you use. From a Linux desktop, "ssh username@hexagon.bccs.uib.no" is sufficient. See also Secure Shell.

I typed my password wrong several times, and now I cannot log in. Has my account been closed?

Your account has most likely not been closed. Your computer (IP address) has been temporarily blocked in our firewall to prevent brute-force attacks. Try again in 15 minutes. If you still cannot connect, please contact Support.

How do I compile my software with MPI?

When compiling your software on hexagon you have to use the compiler wrappers provided by Cray: ftn, cc, CC and f77. These wrappers automatically include MPI.
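If you compile by hand and are unsure which wrapper matches a source file, a small helper like the one below can pick it from the file extension. This is a hypothetical sketch, not a Cray tool: the function name `cray_wrapper` and the extension-to-wrapper mapping are assumptions. It sends all Fortran to ftn, which also accepts fixed-form source; the f77 wrapper remains available for plain Fortran 77 code.

```shell
# Hypothetical helper (not part of the Cray environment): print the Cray
# compiler wrapper for a source file, based only on its extension.
# All of these wrappers already include MPI, so no mpicc/mpif90 is needed.
cray_wrapper() {
    case "$1" in
        *.c)                          echo cc ;;   # C
        *.C|*.cc|*.cpp|*.cxx)         echo CC ;;   # C++
        *.f|*.F|*.f90|*.F90|*.f95|*.F95) echo ftn ;;  # Fortran (fixed or free form)
        *)
            echo "cray_wrapper: unknown source extension: $1" >&2
            return 1
            ;;
    esac
}
```

Usage on the login node, for example: `"$(cray_wrapper hello.c)" -o hello hello.c`, which runs `cc -o hello hello.c`.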

How do I change the compiler?

By default the PGI compiler is loaded when you log on to the system. If this compiler for some reason does not work correctly or optimally for your program, you can change to GNU or PathScale. This is done with the module command.

For example, "module swap PrgEnv-pgi PrgEnv-gnu" changes the compiler from PGI to GNU. You still use the same wrappers to compile your program.

If you are uncertain which compiler you are using, "module list" will show you a list of the modules you currently have loaded. Either PrgEnv-pgi, PrgEnv-gnu or PrgEnv-pathscale will be listed.
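The two steps above (checking which PrgEnv module is loaded, then swapping it) can be combined in a small shell sketch. The function names `current_prgenv` and `use_prgenv` are hypothetical. It assumes, as in the common Environment Modules implementations, that `module list` writes to stderr, hence the `2>&1`.

```shell
# Hypothetical helpers built on the module command.

# Read `module list` output on stdin and print the loaded programming
# environment, e.g. "PrgEnv-pgi". Prints nothing if none is loaded.
current_prgenv() {
    grep -o 'PrgEnv-[A-Za-z]*' | head -n 1
}

# Swap whatever PrgEnv is loaded for PrgEnv-<name>, e.g. `use_prgenv gnu`.
use_prgenv() {
    local current
    # module list usually prints to stderr, so merge it into the pipe.
    current=$(module list 2>&1 | current_prgenv)
    if [ -z "$current" ]; then
        echo "use_prgenv: no PrgEnv module is loaded" >&2
        return 1
    fi
    module swap "$current" "PrgEnv-$1"
}
```

With PGI loaded, `use_prgenv gnu` runs `module swap PrgEnv-pgi PrgEnv-gnu`, exactly as in the example above.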

See Application development (Hexagon) for more information.

Mpiexec or mpirun does not seem to be available. How do I run my MPI program?

Cray systems do not use mpiexec or mpirun. Instead they provide aprun, which MUST be used to run programs on the compute nodes.

You have to provide aprun with some flags depending on how you want your software to run.

aprun -n 4 -N 4 ./a.out

The above example runs a.out on 4 CPUs on ONE node; Hexagon has 4 CPUs per node. Since -N 4 is the default, it can be omitted here: "aprun -n 4 ./a.out" does the same.

aprun -n 4 -N 2 ./a.out
    

The above example runs a.out on 4 CPUs, with 2 CPUs used on each node. Hence, TWO nodes are used.

Please note that even though both examples above run on the same number of cores, the second one is charged as if it used all CPUs on both nodes (8 CPUs). This is because each node is completely reserved for your job, so no other job can run on its free CPUs.
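The node count and the charge follow from simple arithmetic: the number of nodes is the number of processes divided by processes per node, rounded up, and you are charged for every core on those nodes. The sketch below is a hypothetical helper (not an aprun feature) that assumes 4 cores per node, as stated in this FAQ; override `CORES_PER_NODE` if that differs.

```shell
# Hypothetical helper: for `aprun -n NPES -N PER_NODE`, print how many nodes
# are used and how many cores are charged. Assumes CORES_PER_NODE cores per
# node (4 on Hexagon according to this FAQ).
CORES_PER_NODE=${CORES_PER_NODE:-4}

aprun_cost() {
    local npes=$1
    local per_node=${2:-$CORES_PER_NODE}   # -N defaults to a full node
    if [ "$per_node" -gt "$CORES_PER_NODE" ]; then
        echo "aprun_cost: -N $per_node exceeds $CORES_PER_NODE cores per node" >&2
        return 1
    fi
    # Round up: a partly used node is still reserved for your job in full.
    local nodes=$(( (npes + per_node - 1) / per_node ))
    echo "nodes=$nodes charged_cores=$(( nodes * CORES_PER_NODE ))"
}
```

For the two examples above, `aprun_cost 4` prints `nodes=1 charged_cores=4` and `aprun_cost 4 2` prints `nodes=2 charged_cores=8`.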

See Job execution (Hexagon) for more information.

When I try to run my software through aprun I get the error message "No such file or directory", referring to my home directory. What is wrong?

The /home directory is not mounted on the compute nodes. In order to run your software, the executable has to be located in /work/$USER/somewhere. Additionally, your current working directory must also be in the /work file system.
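To catch this before aprun fails, you could run a quick check on the login node. The sketch below is hypothetical (the names `under_work` and `check_aprun_paths` are not system tools). It only compares path prefixes and does not resolve symbolic links, so a /home path that is a symlink into /work will still be rejected.

```shell
# Hypothetical pre-flight check: both the executable and the current
# directory must be under /work, because /home is not mounted on the
# compute nodes. This is a purely textual prefix check.
under_work() {
    case "$1" in
        /work/*) return 0 ;;
        *)       return 1 ;;
    esac
}

check_aprun_paths() {
    local exe=$1
    # Make a relative executable path absolute using the current directory.
    case "$exe" in
        /*) ;;
        *)  exe="$PWD/$exe" ;;
    esac
    if ! under_work "$exe"; then
        echo "check_aprun_paths: executable is not under /work: $exe" >&2
        return 1
    fi
    # Trailing slash so that /work itself also counts.
    if ! under_work "$PWD/"; then
        echo "check_aprun_paths: current directory is not under /work: $PWD" >&2
        return 1
    fi
}
```

Usage, for example: `cp ~/myprog/a.out /work/$USER/run/ && cd /work/$USER/run && check_aprun_paths ./a.out && aprun -n 4 ./a.out` (the directory names are placeholders).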

I get this strange error message. What am I doing wrong?

The error message:

[unset]: _pmi_init: _pmi_preinit encountered an internal error
Assertion failed in file /tmp/ulib/mpt/nightly/3.1/040709/mpich2/src/mpid/cray/src/adi/mpid_init.c 
at line 178: 0  aborting job:
(null)

This error message (or something very similar) is printed if you try to run a program that is compiled for the compute nodes directly on a login node. The program has to be started with aprun. See Job execution (Hexagon) for more information about the batch system and aprun.

What is the OOM killer?

The error message:

_pmii_daemon(SIGCHLD): PE 4 exit signal Killed
[NID 21]Apid 611039: initiated application termination
[NID 00021] Apid 611039: OOM killer terminated this process.

This error message appears in the output if your program uses more memory than is available on one or more compute nodes. OOM stands for Out-Of-Memory. The OOM killer is a standard Linux kernel feature that kills a process when it uses up all the memory on a machine. The issue can be fixed or worked around in several ways. Running on more (or sometimes fewer) cores can reduce the memory used per core and give the job as a whole more memory. You can also ask for more memory per core. You control this with the batch system parameters mppmem ("memory per core") and mppnppn ("cores per node"). See Job execution (Hexagon) for more information about the batch system and aprun.
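As an illustration, the sketch below prints a PBS job script that sets mppnppn and mppmem and passes matching -n/-N values to aprun, so each process can get more memory by placing fewer processes on each node. The function `make_job_script` is hypothetical. The use of mppwidth for the total number of processes, the walltime value and the mppmem unit (e.g. `1000mb`) are assumptions; check Job execution (Hexagon) for the exact values your site accepts.

```shell
# Hypothetical generator for a PBS job script.
# Arguments: total processes, processes per node, memory per process, executable.
# mppwidth, the walltime value and the mppmem unit are assumptions.
make_job_script() {
    local npes=$1 per_node=$2 mem=$3 exe=$4
    cat <<EOF
#!/bin/bash
#PBS -l mppwidth=$npes
#PBS -l mppnppn=$per_node
#PBS -l mppmem=$mem
#PBS -l walltime=00:30:00
# Submit from a directory under /work: /home is not mounted on compute nodes.
cd "\$PBS_O_WORKDIR"
aprun -n $npes -N $per_node $exe
EOF
}
```

For example, `make_job_script 8 2 1000mb ./a.out > job.pbs && qsub job.pbs` spreads 8 processes over 4 nodes with 2 processes per node, leaving more memory per process than packing 4 per node would.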

How do I change my password?

Passwords on hexagon are stored in a read-only file system, so users cannot change their password directly. Please contact Support.

I do not have enough disk quota to run my job. What should I do?

If you are running out of space in your home directory, or your job produces a large amount of stdout and stderr output, it is recommended to redirect stdout and stderr into files on the /work file system. If possible, avoid producing stdout or stderr output from the application altogether, since it creates a high load on the login node and may slow down your program. Some example usage (note that your shell may require a different syntax):

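# Both stderr and stdout into one file (csh/tcsh syntax; bash also accepts it)
aprun .... >& /work/$USER/combined.out
# The same in POSIX sh/bash syntax
aprun .... >/work/$USER/combined.out 2>&1
# Stderr and stdout into different files (sh/bash syntax)
aprun .... >/work/$USER/app.out 2>/work/$USER/app.err
# Stdout into file and dropping stderr (sh/bash syntax)
aprun .... >/work/$USER/app.out 2>/dev/null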

My problem is not listed here. What do I do?

Send an email to our administrators at Support describing your problem. It helps to include the ID of the job that failed and the paths to the output file, error file, submit script and Makefile. One of our engineers will then help you as soon as possible.