Reuti
2017-09-13 16:10:11 UTC
Hi,
I wonder whether it has ever come up for discussion that SGE can behave similarly to Torque/PBS regarding the mangling of hostnames. It's similar to https://github.com/open-mpi/ompi/issues/2328 in that a node can have multiple network interfaces, each with a unique name. SGE's operation can be routed to a specific network interface by means of a file:
$SGE_ROOT/$SGE_CELL/common/host_aliases
which has the format:
<sge-name of the node> <one or more blanks> <real long or short hostname>
Hence the generated $PE_HOSTFILE lists the name known to SGE, although the `hostname` command returns the real name. In this case Open MPI would start a `qrsh -inherit …` call instead of forking, as it thinks that these are different machines (assuming an allocation_rule of $PE_SLOTS, so that `mpiexec` is supposed to run on the same machine as the started tasks).
I tried the "old" way of providing a start_proc_args to the PE that creates a symbolic link for `hostname` in $TMPDIR, so that an adjusted `hostname` call is available inside the job script, but obviously Open MPI calls gethostname() directly rather than going through an external binary.
So I mangled the hostname in the machinefile created in the job script, to feed an "adjusted" $PE_HOSTFILE to Open MPI, and then it works as intended: Open MPI forks.
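To illustrate the kind of rewrite meant here, the following is a minimal sketch and not the actual script used in the job. It assumes the host_aliases format quoted above (SGE name first, real hostname second) and that each $PE_HOSTFILE line starts with the hostname followed by further whitespace-separated fields; the function names `parse_host_aliases` and `rewrite_pe_hostfile` are made up for this example. How the adjusted file is then handed to Open MPI (for instance by pointing $PE_HOSTFILE at it in the job script) is left out.

```python
def parse_host_aliases(text):
    """Map each SGE-internal name to the real hostname.

    Assumes the format quoted above: <sge-name> <blanks> <real hostname>.
    Lines with fewer than two fields, or starting with '#', are skipped.
    """
    aliases = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        aliases[fields[0]] = fields[1]
    return aliases


def rewrite_pe_hostfile(text, aliases):
    """Replace the first column of each hostfile line if it is a known SGE name.

    All other fields (slots, queue, ...) are passed through untouched,
    apart from runs of blanks being collapsed on rewritten lines.
    """
    out = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in aliases:
            fields[0] = aliases[fields[0]]
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

With an aliases file containing `login foo`, a hostfile line beginning with `login` would come out beginning with `foo`, which is what gethostname() reports on that machine, so the names match again.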
Does anyone else need such a patch in Open MPI, and would it be suitable for inclusion?
-- Reuti
PS: In our case only the headnodes have more than one network interface, so this hadn't come to my attention until now, when the need arose to also use some cores on the headnodes. SGE knows them internally as "login" and "master", but their external names, which gethostname() returns, may be "foo" and "baz".