wodel youchi
2017-01-31 13:25:37 UTC
Hi,
I am a newbie in the HPC world.
I am trying to run the hpcc benchmark on our cluster, but every time I
start the job I get this error and the job exits:
compute017.22840Exhausted 1048576 MQ irecv request descriptors, which
usually indicates a user program error or insufficient request descriptors
(PSM_MQ_RECVREQS_MAX=1048576)
compute024.22840Exhausted 1048576 MQ irecv request descriptors, which
usually indicates a user program error or insufficient request descriptors
(PSM_MQ_RECVREQS_MAX=1048576)
compute019.22847Exhausted 1048576 MQ irecv request descriptors, which
usually indicates a user program error or insufficient request descriptors
(PSM_MQ_RECVREQS_MAX=1048576)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus
causing the job to be terminated. The first process to do so was:
  Process name: [[19601,1],272]
  Exit code: 255
--------------------------------------------------------------------------
Platform : IBM PHPC
OS : RHEL 6.5
one management node
32 compute nodes : 16 cores, 32GB RAM, Intel QLogic QLE7340 single-port QDR
InfiniBand 40Gb/s
I compiled hpcc against IBM MPI, Open MPI 2.0.1 (compiled with gcc 4.4.7),
and Open MPI 1.8.1 (compiled with gcc 4.4.7).
I get the error each time, but on different compute nodes.
This is the command I used to start the job:
mpirun -np 512 --mca mtl psm --hostfile hosts32
/shared/build/hpcc-1.5.0b-blas-ompi-181/hpcc hpccinf.txt
Any help will be appreciated, and if you need more details, let me know.
Thanks in advance.
Regards.