Resources @CERN¶
Access to CERN resources is managed via subscription to e-groups.
Granted to everybody in the Muon Collider collaboration:
- muoncollider-batch (batch queue)
- muoncollider-readers (read from disk)
Granted to developers for the moment:
- muoncollider-writers (write to disk)
Accounting Groups¶
To see if you are able to submit you can check your accounting groups with this command on lxplus:
$ haggis rights
+----------------------------+
| gianelle |
+----------------------------+
| group_u_MUONCOLLIDER.users |
| group_u_LHCB.u_z5 |
+----------------------------+
If you are a member of multiple accounting groups, you should specify which group you wish to use for a specific activity; otherwise the system will assign you to one, normally the first in the list:
The syntax, for the submit file, to specify the group is:
+AccountingGroup = "group_u_MUONCOLLIDER.users"
Condor local submission¶
Submission to the HTCondor schedds at CERN normally makes use of a shared filesystem, i.e. AFS. This is convenient, but shared filesystems also introduce instability.
To submit a Muon Collider job to condor at CERN you first need a submit file:
job.sub
IDX is just an example of a parameter that you can use as an argument for your job.
JobFlavour is needed to set the maximum job wall time.
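A minimal sketch of such a submit file, modeled on the spool example shown later on this page (the memory request and the flavour are illustrative; IDX is passed at submission time):
# Unix submit description file
executable = executable.sh
arguments = $(IDX)
log = log_$(IDX)_$(ClusterId).txt
output = outfile_$(IDX)_$(ClusterId).txt
error = errors_$(IDX)_$(ClusterId).txt
request_memory = 2GB
+JobFlavour = "testmatch"
queue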
Then you need to define your executable:
executable.sh
#!/bin/bash
## Job's argument
IDX=$1
#### Define some paths
### Eos
## User space where to store output file (you can also leave the files in the local directory)
EOS_USER_URL=root://eosuser.cern.ch
EOS_PATH=/eos/user/g/gianelle/MuonC/muontest
## Experiment space where for example BIB files are stored
EOS_EXP_URL=root://eosexperiment.cern.ch
EOS_EXP="/eos/experiment/muoncollider/"
### Docker file to use (i.e. which release version)
DOCKER=/cvmfs/unpacked.cern.ch/registry.hub.docker.com/infnpd/mucoll-ilc-framework:1.7-almalinux9
## the job's working directory
BASE="/afs/cern.ch/user/g/gianelle/muonc/job"
####
## Define a unique directory for the job using the argument
## in local mode all the jobs submitted to condor are executed in the same local directory
## so pay attention to possible conflicts
WORKHOME="JOB_$IDX"
## cd in the working directory
cd $BASE
## a simple function to quit script
# ARGs: <error message> <error code>
quit () {
echo $1
# comment the line below if you plan to leave outputs in the local directory
rm -rf $WORKHOME
exit $2
}
### function to copy file to eos space
# ARG1: input file name
# ARG2: output file name
copy() {
IN=$1
OUT=$2
if [ -f $1 ]; then
if ! xrdcp -f $IN ${OUT} ; then
quit "ERROR! Failed to transfer $IN" 2
fi
else
quit "ERROR! File $IN not exists" 2
fi
}
# create the unique working dir and cd in it
mkdir $WORKHOME
cd $WORKHOME
## copy or link the auxiliary files of the job inside the job directory
# you can use a generic name for the input file so you don't need to
# customize the steering file at each execution
cp $BASE/k10_out3.hepmc input.hepmc
ln -s $BASE/sim_steer.py sim_steer.py
# back to $BASE directory
cd $BASE
# exec the singularity container
echo "Start singularity job at `date`"
# NB It mounts the eos experiment directory so the BIB files are accessible
singularity exec -B$EOS_EXP $DOCKER /bin/bash sim.sh ${IDX} &> job_${IDX}.log
echo "End job at `date`"
# copy outfile on user EOS space
ext=$(( 10#${IDX} ))
postfix=$(printf "%03d" $IDX )
copy ${WORKHOME}/OUTPUT_${ext}.slcio ${EOS_USER_URL}//${EOS_PATH}/z2jmu_k10_10evt_${postfix}.slcio
copy ${WORKHOME}/simout.log ${EOS_USER_URL}//${EOS_PATH}/z2jmu_k10_10evt_${postfix}.log
quit "All is done" 0
The last script is the executable that will run inside the container (for example a bash script). Remember that when you execute the singularity command you land (inside the container) in the same directory from which you ran the command, usually your AFS home directory or, as in the previous script, the $BASE directory.
sim.sh
#!/bin/bash
# Job's argument
IDX=$1
# define as in the previous script the _same_ unique job directory
WORKHOME=JOB_$IDX
# number of events to process
NEVT=10
# set muoncollider environment
source /opt/ilcsoft/muonc/init_ilcsoft.sh
# cd in the working directory
cd $WORKHOME
# define the arguments for the ddsim script
INPUT=input.hepmc
# note: the output file name must be the same as the one used in the previous script
ext=$(( 10#${IDX} ))
OUTPUT=OUTPUT_${ext}.slcio
skipEvt=$(( ( 10#${IDX} ) * $NEVT ))
echo "Start simulation at `date`"
# submit a simulation job
ddsim --steeringFile sim_steer.py --inputFile ${INPUT} --outputFile ${OUTPUT} --numberOfEvents ${NEVT} --skipNEvents ${skipEvt} &> simout.log
echo "End simulation at `date`"
To submit the job, passing the argument, use the usual condor_submit command:
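For example (the IDX value is illustrative):
condor_submit IDX=01 job.sub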
See the CERN job submission documentation for more details.
Condor spool submission¶
There are some schedds that do not allow shared filesystems on the worker node, which should make them more suitable for users who have longer jobs and are willing to have slightly more constraints.
To select these schedds, refer to the CERN batch documentation. Some updates to the scripts are also needed:
job.sub
# Unix submit description file
+owner = undefined
request_memory = 2GB
executable = executable.sh
Arguments = $(IDX)
log = log_$(IDX)_$(ClusterId).txt
transfer_input_files = sim_steer.py, sim.sh
transfer_output_files = job.log
output = outfile_$(IDX)_$(ClusterId).txt
error = errors_$(IDX)_$(ClusterId).txt
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
+JobFlavour = "testmatch"
queue
You need to define the files that have to be transferred, both for input and for output.
executable.sh
#!/bin/bash
## Job's argument
IDX=$1
#### Define some paths
### Eos
## User space where to store output file (you can also leave the files in the local directory)
EOS_USER_URL=root://eosuser.cern.ch
EOS_PATH=/eos/user/g/gianelle/MuonC/muontest
### Docker file to use (i.e. which release version)
DOCKER=/cvmfs/unpacked.cern.ch/registry.hub.docker.com/infnpd/mucoll-ilc-framework:1.7-almalinux9
## the job's working directory
BASE="/afs/cern.ch/user/g/gianelle/muonc/job"
####
## a simple function to quit script
# ARGs: <error message> <error code>
quit () {
echo $1
exit $2
}
### function to copy output files to eos space
# ARG1: input file name
# ARG2: output file name
copyout() {
IN=$1
OUT=$2
if [ -f $1 ]; then
if ! xrdcp -f $IN ${OUT} ; then
quit "ERROR! Failed to transfer $IN" 2
fi
else
quit "ERROR! File $IN not exists" 2
fi
}
### first copy the input file from the eos path
xrdcp ${EOS_USER_URL}//${EOS_PATH}/k10_out3.hepmc input.hepmc
# exec the singularity container
echo "Start singularity job at `date`"
singularity exec -B $PWD $DOCKER /bin/bash sim.sh ${IDX} &> job.log
echo "End job at `date`"
# copy outfile on user EOS space
ext=$(( 10#${IDX} ))
postfix=$(printf "%03d" $IDX )
copyout OUTPUT_${ext}.slcio ${EOS_USER_URL}//${EOS_PATH}/z2jmu_k10_10evt_${postfix}.slcio
copyout simout.log ${EOS_USER_URL}//${EOS_PATH}/z2jmu_k10_10evt_${postfix}.log
quit "All is done" 0
The major difference is that we use the xrdcp command to transfer input and output files from/to the EOS space. The shared filesystem (i.e. AFS) is still available on the worker nodes, but it is not safe to rely on it.
In the singularity command we mount the condor spool directory as the user HOME.
The last script is the executable that runs inside the container (for example a bash script). Remember that when you execute the singularity command you land (inside the container) in the same directory from which you ran the command, usually your AFS home directory or, as in the previous script, the condor spool directory.
The only difference from the local approach is that we don't need to create a unique directory for the job.
sim.sh
#!/bin/bash
echo "Start job at `date`"# Job's argument
IDX=$1
# number of events to process
NEVT=10
# set muoncollider environment
source /opt/ilcsoft/muonc/init_ilcsoft.sh
# define the arguments for the ddsim script
INPUT=input.hepmc
# note: the output file name must be the same as the one used in the previous script
ext=$(( 10#${IDX} ))
OUTPUT=OUTPUT_${ext}.slcio
skipEvt=$(( ( 10#${IDX} ) * $NEVT ))
echo "Start simulation at `date`"
# submit a simulation job
ddsim --steeringFile sim_steer.py --inputFile ${INPUT} --outputFile ${OUTPUT} --numberOfEvents ${NEVT} --skipNEvents ${skipEvt} &> simout.log
echo "End simulation at `date`"
To submit the job, passing the argument, use the usual condor_submit command:
condor_submit -spool IDX=01 job.sub
A more complex example¶
If you need to reconstruct events using your own customized code (i.e. a custom processor), you first need to commit your code to git in an ad hoc branch (or you can zip your code and add it as an input file).
In the following example we will use, as configuration file, one of the official definitions committed in the ProductionConfig package.
We will also see how to manage the BIB files.
We use the spool method, so the submission script is like the previous one (note that for reconstruction we need more memory):
job.sub
# Unix submit description file
+owner = undefined
request_memory = 10GB
executable = executable.sh
arguments = $(IDX)
log = log_$(IDX)_$(ClusterId).txt
transfer_input_files = job.sh
transfer_output_files = job.log, reco.out
output = outfile_$(IDX)_$(ClusterId).txt
error = errors_$(IDX)_$(ClusterId).txt
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
+JobFlavour = "testmatch"
queue
The executable is also similar to the previous one.
Warning
We cannot copy all 1000 BIB files, so we randomly choose NBIBs files that will be overlaid with our signal.
executable.sh
#!/bin/bash
## ARG1: an index to identify the job
IDX=$1
## Software docker image
DOCKER="/cvmfs/unpacked.cern.ch/registry.hub.docker.com/infnpd/mucoll-ilc-framework:1.6-centos8"
## Eos
EOS_USER_URL=root://eosuser.cern.ch
EOS_EXP_URL=root://eosexperiment.cern.ch
EOS_PATH="/eos/user/g/gianelle/MuonC/muontest"
EOS_BIB="/eos/experiment/muoncollider/data/BIB/MuCollv1_25ns_nEkin150MeV_QGSPBERT"
# Signal input file
INPUT="muonGun_testBIB_100k_sim.slcio"
# How many BIBs ?
NBIBs=10
### Exit function:
# ARG1: message
# ARG2: exit code
quit () {
echo $1
exit $2
}
### function to copy file to eos space
# ARG1: input file name
# ARG2: output file name
copy() {
IN=$1
OUT=$2
if ! xrdcp -f $IN ${OUT} ; then
quit "ERROR! Failed to transfer $IN" 2
fi
}
## Retrieve input Signal file
copy ${EOS_USER_URL}//${EOS_PATH}/${INPUT} input.slcio
## Retrieve input BKG files
BKGPre=${EOS_EXP_URL}//${EOS_BIB}/sim_mumu-1e3x500-26m-lowth-excl_seed
BKGPost="_allHits.slcio"
# total number of available BIBs files
BKGTot=1000
# array with BIB file names
BIBs=()
for (( i=1; i<=${NBIBs}; i++ )); do
RNDBKG=$(printf "%04d" $(($RANDOM % $BKGTot)) )
BKGFILE=${BKGPre}${RNDBKG}${BKGPost}
BIBFILE=BKG_seed${RNDBKG}.slcio
copy $BKGFILE $BIBFILE
BIBs+=( $BIBFILE )
done
# exec the singularity container
echo "Start singularity job at `date`"
singularity exec -B $PWD $DOCKER ./job.sh ${IDX} "${BIBs[@]}" &> job.log
echo "End job at `date`"
## save output files
POSTOUT=$(printf "%03d" $IDX )
copy Output_REC.slcio ${EOS_USER_URL}//${EOS_PATH}/Output_REC_${POSTOUT}.slcio
copy reco.out ${EOS_USER_URL}//${EOS_PATH}/reco_${POSTOUT}.out
quit "Well Done" 0
More interesting is the job that needs to be executed in the container:
job.sh
#!/bin/bash
## ARG1: the job index
## ARG2...ARGn: BIB file names
JOB=$1
shift
# the other args are BIB files
BIBs=("$@")
# define how many events per job
EVTxJOB=500
if [ ${JOB} -eq 0 ]; then
# the first job needs one event more...
NEVT=$(( ${EVTxJOB} + 1 ))
SKIP=0
else
NEVT=${EVTxJOB}
SKIP=$(( ${EVTxJOB} * ${JOB} ))
fi
source /opt/ilcsoft/muonc/init_ilcsoft.sh
BASE=`pwd`
## Function to recompile processor, it also redefines MARLIN_DLL
# ARG1: processor name
compile () {
PROCNAME=$1
WD=`pwd`
mkdir BUILD/${PROCNAME}
cd BUILD/${PROCNAME}
cmake -C $ILCSOFT/ILCSoft.cmake -DBOOST_INCLUDEDIR=/usr/include/boost173 -DBOOST_LIBRARYDIR=/usr/lib64/boost173 ../../${PROCNAME}
make
cd $WD
echo $MARLIN_DLL | grep ${PROCNAME} &> /dev/null
if [ $? -eq 0 ]; then
NEWDLL=$(echo $MARLIN_DLL | sed -e "s#/opt/ilcsoft/muonc/${PROCNAME}/.*/lib/lib${PROCNAME}.so#${BASE}/BUILD/${PROCNAME}/lib/lib${PROCNAME}.so#g")
else
NEWDLL=$MARLIN_DLL:${BASE}/BUILD/${PROCNAME}/lib/lib${PROCNAME}.so
fi
export MARLIN_DLL=$NEWDLL
}
### Clone the needed repositories
# configuration file
git clone https://github.com/MuonColliderSoft/ProductionConfig.git
# clone new processors, specifying the required branch
git clone -b MoveHitsCut https://github.com/MuonColliderSoft/MarlinTrkProcessors.git
git clone -b jetReco https://github.com/MuonColliderSoft/LCTuple.git
# compile processors
echo "Starting compiling code at `date`"
mkdir BUILD
compile "MarlinTrkProcessors"
compile "LCTuple"
echo "Start job at `date`"
Marlin --global.MaxRecordNumber=${NEVT} --global.SkipNEvents=${SKIP} --OverlayTrimmed.BackgroundFileNames="${BIBs[*]}" allProcess.xml &> reco.out
echo "Everything end at `date`"
Submitting Multiple Jobs¶
To submit more jobs you can use a for loop from the bash command line, changing the IDX parameter, or you can modify your submit file in this way:
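A sketch of the relevant submit-file lines (the macro name InitialIDX is illustrative; check the HTCondor manual for the exact $INT() syntax, and use $INT(IDX) wherever the numeric value is needed, e.g. in Arguments and in the log/output/error file names):
InitialIDX = 20
IDX = $(Process) + $(InitialIDX)
Arguments = $INT(IDX)
queue 40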
With these lines, for example, condor submits 40 jobs with the argument starting from 20
(see the first line of the snippet). In other words this is equivalent to the bash command line:
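For instance (a sketch; add the -spool option when using the spool schedds):
for i in $(seq 20 59); do
  condor_submit IDX=$i job.sub
done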
For other options see the HTCondor documentation.
Commands¶
This is a list of the most used commands (typical invocations are sketched below); see the documentation for more info:
- submit a job
- query all my jobs
- query all jobs
- remove a single job
- remove all terminated jobs
- ssh to the job's worker node
- transfer stdout/stderr
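A sketch of the typical invocations (the job IDs are illustrative, and condor_transfer_data applies to spool submission):
condor_submit IDX=01 job.sub              # submit a job
condor_q                                  # query all my jobs
condor_q -allusers                        # query all jobs
condor_rm 1234.0                          # remove a single job (ClusterId.ProcId)
condor_rm -constraint 'JobStatus == 4'    # remove all terminated (completed) jobs
condor_ssh_to_job 1234.0                  # ssh to the job's worker node
condor_transfer_data 1234.0               # transfer stdout/stderr back from a spool job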
Eos¶
Info
/eos/experiment/muoncollider/ is the experiment space
Info
eosexperiment.cern.ch is the host of the EOS (xroot) server
Commands¶
Users can access (list and read) files in the CERN EOS space from non-lxplus nodes without being authenticated.
In order to get full permissions (write/delete), one should use a CERN account to get Kerberos authentication:
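For example (with your CERN username):
kinit gianelle@CERN.CH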
To check if the token is valid:
[gianelle@muonc ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_3807
Default principal: gianelle@CERN.CH
Valid starting Expires Service principal
06/29/23 11:42:42 06/30/23 12:42:39 krbtgt/CERN.CH@CERN.CH
renew until 07/04/23 11:42:39
- Create directory
- Copy file
- List files
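A sketch of the corresponding commands using the xrootd client tools (paths and file names are illustrative):
xrdfs root://eosuser.cern.ch mkdir /eos/user/g/gianelle/MuonC/newdir                # create a directory
xrdcp myfile.slcio root://eosuser.cern.ch//eos/user/g/gianelle/MuonC/myfile.slcio   # copy a file to EOS
xrdfs root://eosuser.cern.ch ls /eos/user/g/gianelle/MuonC                          # list files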
HowTo¶
See here for a tutorial on how to access EOS from lxplus.cern.ch.
Here is a talk on EOS for users.