Experimental environment
Table of Contents
- 1. Literate programming
- 2. Building reproducible software environments
- 3. Performing benchmarks
- 3.1. gcvb
- 3.2. sbatch templates
- 3.3. Ensuring filesystem
- 3.4. Configuration file
- 3.5. Definition file
- 3.6. Resource monitoring
- 3.7. Energy consumption monitoring
- 3.8. Result parsing
- 3.9. Database injecting
- 3.10. Extracting additional results
- 3.11. Wrapper scripts
- 3.12. Generate benchmark runs
- 3.13. Job submission
- 4. Post-processing results
- 5. Appendix
Reproducibility of numerical experiments has always been a complex matter. Building exactly the same software environment on multiple computing platforms can be long, tedious and sometimes virtually impossible to do manually.
Within our research work, we put strong emphasis on ensuring the reproducibility of the experimental environment. On one hand, we make use of the GNU Guix transactional package manager allowing for a complete reproducibility of software environments across different machines. On the other hand, we rely on the principles of literate programming in an effort to maintain an exhaustive, clear and accessible documentation of our experiments and the associated environment.
1. Literate programming
We choose to write the source code of the scripts and the various configuration files allowing us to design and automate numerical experiments following the paradigm known as literate programming Knuth84. The idea of this approach is to associate source code with an explanation of its purpose written in a natural language.
There are numerous software tools designed for literate programming. We rely on
Org mode for Emacs Dominik18,emacs which defines the Org markup language
allowing us to combine formatted text, images and figures with traditional source
code. Files containing documents written in Org mode should end with the .org
extension.
Extracting a compilable or interpretable source code from an Org document is called tangling OrgTangle. It is also possible to evaluate a particular source code block directly from the Emacs editor OrgEval while editing. For example, this is particularly useful for the visualization of experimental results.
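For instance, a minimal Org source block carrying a tangle header could look as follows; the target file name hello.sh is purely illustrative:

#+BEGIN_SRC shell :tangle hello.sh
#! /usr/bin/env bash
# This block is written verbatim to hello.sh when the document is tangled.
echo "Hello from a tangled script."
#+END_SRC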
Finally, an Org document can be exported to various output formats OrgExport such as LaTeX or Beamer, HTML and so on.
The .org source this document is based on 1 features the source code blocks of multiple scripts and configuration files that need to be tangled into separate source files before constructing our software environment and carrying out the experiments.
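As a hedged example, assuming this document is stored in a file named environment.org (an illustrative name), all of its source blocks can be tangled non-interactively from a terminal:

emacs --batch -l org environment.org -f org-babel-tangle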
2. Building reproducible software environments
We are likely to work on various computing platforms and need to manage multiple software packages and their dependencies across different machines. To keep our software environment reproducible, we rely on the GNU Guix guix package manager.
2.1. GNU Guix
Guix is a transactional package manager and a stand-alone GNU/Linux distribution where each user can install their own packages without any impact on the others and with the possibility to switch between multiple system or package versions. An environment created using Guix is fully reproducible across different computing platforms. In other words, a package built on one computer from a given commit remains the same when rebuilt on another machine.
2.2. Channels
Software packages in Guix are available through dedicated Git repositories, called channels, containing package definitions.
The first and default channel is the system channel providing Guix itself as
well as the definitions of some commonly used packages such as system libraries,
compilers, text editors and so on. Afterwards, we need to include additional
channels using a custom channel file channels.scm.
For each channel, we specify the commit of the associated repository to acquire. This way, we make sure to always build the environment using the exact same versions of every single package in the system and guarantee a better reproducibility.
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "9235dd136e70dfa97684aff4e9af4c0ce366ad68"))
Following the system channel, we include guix-hpc and guix-hpc-non-free
providing various open-source and non-free scientific software and libraries.
      (channel
        (name 'guix-hpc)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")
        (commit "7e7fdfb641f5009dcf21cd8f8edc0d41d1faa7f5"))
      (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")
        (commit "1305692c778810747c903cfa7e1d729d74ff2fb4"))
We also need the private channel guix-hpc-airbus defining the proprietary Airbus packages providing the benchmark software we use. As the channel is private, one must have access to the corresponding repository and a duly configured SSH key in $HOME/.ssh on the target machine.
      (channel
        (name 'guix-hpc-airbus)
        (url "git@gitlab.inria.fr:mfelsoci/guix-hpc-airbus.git")
        (commit "014bccdfebc4aa4ec4cc3f523301e55759e7a9ac"))
Finally, we import the guix-extra channel providing some utility packages we use mainly during the post-processing of benchmark results.
      (channel
        (name 'guix-extra)
        (url "https://gitlab.inria.fr/mfelsoci/guix-extra.git")
        (commit "35cdd19f80516ad7b89a68be5184910719324aca")))
The channel file must be saved to $HOME/.config/guix/channels.scm and, to make the system aware of the changes, it is necessary to construct a new Guix generation using guix pull.
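For example, after saving the channel file, the following commands build the new generation and then display the channels and commits actually in use, which should match those pinned in channels.scm:

guix pull
guix describe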
See the complete source file 2.
2.3. Environments
To enter a particular software environment using Guix, we use the command guix shell. Its options allow us to specify the packages to include together with the desired version, commit or branch.
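As a simple illustration, unrelated to our actual environments, the following one-off invocation enters a pure environment containing Python and NumPy and runs a command inside it; the package names are chosen only for the example:

guix shell --pure python python-numpy -- python3 -c 'import numpy; print(numpy.__version__)'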
To avoid typing a lot of command line options every time we want to enter the environment, we use the setenv.sh shell script to automate the process. It also allows us to adjust the environment using specific options.
We rely on the benchmark suite test_FEMBEM from Airbus provided in one of
their proprietary software packages, namely scab. test_FEMBEM allows us to
evaluate numerous solver configurations. There is also an open-source version of
test_FEMBEM connected exclusively to open-source solvers. It is available in
Guix as a separate package.
scab as well as its direct dependencies, mpf and hmat, are proprietary
packages with closed source code residing in private Git repositories. Before
setting up the environment, one has to acquire their local copies so that Guix
can build the environment.
The setenv.sh script begins with a help message function accessible through
the -h option.
function help() { echo "Set up the experimental environment for the Ph.D. thesis of Marek" \ "Felšöci." >&2 echo "Usage: $0 [options]" >&2 echo >&2 echo "Options:" >&2 echo " -A Instead of the mainstream version of 'hmat', use" \ "its derived version containing the developments made by Aurélien" \ "Falco during his Ph.D. thesis." >&2 echo " -e ENVIRONMENT Switch the software environment to ENVIRONMENT." \ "Available environments are: 'benchmark' (includes packages for" \ "performing benchmarks, default choice), 'gcvb' (includes packages" \ "only for generating and launching benchmark job scripts, does not" \ "allow for running benchmark executables) 'ppp' (include packages for" \ "pre-processing and post-processing actions)." >&2 echo " -E Instead of the mainstream versions of 'mpf' and" \ "'scab', use their derived versions supporting the energy_scope" \ "tool." >&2 echo " -g Get the final 'guix environment' command line" \ "that can be followed by an arbitrary command to be executed inside" \ "the environment. The trailing '--' should be added manually!" >&2 echo " -h Show this help message." >&2 echo " -M Instead of the mainstream versions of 'hmat'," \ "'mpf' and 'scab' use their derived versions containing our work," \ "e. g. the 'mf/devel' branches of the associated Git repositories." >&2 echo " -o Use exclusively open-source software" \ "(open-source 'test_FEMBEM' with sequential version of 'hmat')." echo " -O Use OpenBLAS instead of Intel(R) MKL." >&2 echo " -p Prepare the source tarballs of the Airbus" \ "packages. If combined with '-r', the tarballs shall be placed into" \ "the directory specified by the latter. Also, the option can be used" \ "together with the '-A' or the '-M' option." >&2 echo " -r ROOT Search for Airbus sources at ROOT in lieu of the" \ "default location at '$HOME/src'." >&2 echo " -s PATH Instead of entering the specified environment," \ "create the corresponding Singularity image at PATH." >&2 echo " -S If the '-s' option is set, build the Singularity" \ "image using the same version of Open MPI as the one available on the" \ "Curta super-computer, e. g. replace the 'openmpi' and the" \ "'openmpi-mpi1-compat' packages by 'openmpi-curta'." >&2 echo " -x Use StarPU with the FXT support." >&2 }
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
The variable WITH_INPUT_STARPU_FXT contains the --with-input option for the guix shell command necessary to replace the ordinary StarPU Starpu2010 package by the one compiled with FXT trace generation support whenever the -x option is passed on the command line.
WITH_INPUT_STARPU_FXT=""
Similarly, the WITH_INPUT_MKL variable contains the --with-input options necessary to replace OpenBLAS OpenBLAS by the Intel(R) Math Kernel Library MKL in the packages concerned.
WITH_INPUT_MKL="--with-input=pastix-5=pastix-5-mkl \ --with-input=mumps-scotch-openmpi=mumps-mkl-scotch-openmpi \ --with-input=openblas=mkl "
OPEN_SOURCE handles the -o option allowing us to use exclusively open-source solvers and the open-source version of the test_FEMBEM suite. By default, the option is disabled.
OPEN_SOURCE=0
AIRBUS_ROOT specifies the location where to search for the source tarballs of
the Airbus packages. This can be modified using the -r option. The default
value is $HOME/src.
AIRBUS_ROOT="$HOME/src"
By default, the script assumes the tarballs already exist and tries to set up the environment directly. PREPARE_TARBALLS is a boolean switch indicating whether the source tarballs of the Airbus packages should be generated instead of setting up the environment. For machines without access to the private Airbus repositories concerned, the tarballs can be generated on another machine using the -p option.
PREPARE_TARBALLS=0
Normally, we want to set up the environment using the mainstream release of hmat. The -A option and the associated boolean AIRBUS_AF allow switching to the version containing the developments made by Aurélien Falco Falco2019 that have not been merged into the mainstream version of the package yet.
AIRBUS_AF=0
The -M option and the associated boolean AIRBUS_MF allow switching to our development versions of the Airbus packages.
AIRBUS_MF=0
The -E option and the associated boolean AIRBUS_ES allow switching to the versions of mpf and scab supporting the energy_scope tool (see Section 3.7).
AIRBUS_ES=0
At this stage, we are ready to parse the options and check the validity of option arguments where applicable.
GET_COMMAND=0 CURTA=0 ENVIRONMENT="benchmark" while getopts ":Ae:EgMoOhpr:s:Sx" option; do case $option in A) AIRBUS_AF=1 ;;
The -e option allows us to choose among the multiple software environments available.
e)
ENVIRONMENT=$OPTARG
;;
E)
AIRBUS_ES=1
;;
The -g option allows us to print out the final guix shell command instead of directly entering the environment. This is useful for writing one-line commands, for example, in the GitLab continuous integration configuration.
g)
GET_COMMAND=1
;;
M)
AIRBUS_MF=1
;;
o)
OPEN_SOURCE=1
;;
O)
WITH_INPUT_MKL=""
;;
p)
PREPARE_TARBALLS=1
;;
r)
AIRBUS_ROOT=$OPTARG
if test ! -d "$AIRBUS_ROOT";
then
error "'$AIRBUS_ROOT' is not a valid directory!"
exit 1
fi
;;
The -s option allows us to create a Singularity container corresponding to the selected environment instead of directly entering it. If the container is intended to be used on the Curta supercomputer curta, the -S option ensures that the same version of Open MPI as the one available on Curta is used during the build.
s)
SINGULARITY=$OPTARG
SINGULARITY_DIR=$(dirname "$SINGULARITY")
if test ! -d $SINGULARITY_DIR || test ! -w $SINGULARITY_DIR;
then
error "'$SINGULARITY_DIR' is not a valid and writable directory!"
exit 1
fi
;;
S)
CURTA=1
;;
x)
WITH_INPUT_STARPU_FXT="--with-input=starpu=starpu-fxt"
;;
We must also take into account unknown options, missing option arguments, syntax
mismatches as well as the case when the -h option is specified.
\?) error "Arguments mismatch! Invalid option '-$OPTARG'." echo help exit 1 ;; :) error "Arguments mismatch! Option '-$OPTARG' expects an argument!" echo help exit 1 ;; h | *) help exit 0 ;; esac done
The following variables indicate the commit numbers, branch names and archive locations to use by default for the generation of the Airbus source tarballs.
HMAT_BASENAME="hmat-2021.1.0-21.52c4f6c" HMAT_TARBALL="$AIRBUS_ROOT/$HMAT_BASENAME.tar.gz" HMAT_COMMIT="52c4f6c9dfa084ad67bb9876c30a049edbd82b07" HMAT_BRANCH="master" MPF_BASENAME="mpf-2021.1.0-26.a158c8c" MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz" MPF_COMMIT="a158c8c65c223b732479ff3f321c9b0cb5dd3549" MPF_BRANCH="master" SCAB_BASENAME="scab-2021.1.0-30.6f669fe" SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz" SCAB_COMMIT="6f669fe98773013265240cdca4ce274f4dd85237" SCAB_BRANCH="master" ACTIPOLE_BASENAME="actipole-2020.0.1-105.c84cd1a" ACTIPOLE_TARBALL="$AIRBUS_ROOT/$ACTIPOLE_BASENAME.tar.gz" ACTIPOLE_COMMIT="c84cd1a84224ea6eb34c1a4f4ed59c818d66c3fe" ACTIPOLE_BRANCH="master"
However, when the -A, the -M or the -E option is used, we need to update
these specifications accordingly.
if test $AIRBUS_AF -ne 0; then HMAT_BASENAME="hmat-af-1.6.0-382.60743b1" HMAT_TARBALL="$AIRBUS_ROOT/$HMAT_BASENAME.tar.gz" HMAT_COMMIT="60743b119afb966b6987c4421b949482dfbbf04f" HMAT_BRANCH="mf/af/bcsf" MPF_BASENAME="mpf-af-1.25.0-134.6fbcc2e" MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz" MPF_COMMIT="6fbcc2e67713205b1c3bbebdda1f6377d9698070" MPF_BRANCH="af/devel" SCAB_BASENAME="scab-af-1.9.0-175.2971244" SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz" SCAB_COMMIT="29712447d805d957d715a1f5ab7d8e6903997652" SCAB_BRANCH="af/ND" elif test $AIRBUS_MF -ne 0; then HMAT_BASENAME="hmat-mf-2021.1.0-23.db095d4" HMAT_TARBALL="$AIRBUS_ROOT/$HMAT_BASENAME.tar.gz" HMAT_COMMIT="db095d49d6282582701e84ab64a9a03b65780271" HMAT_BRANCH="mf/devel" MPF_BASENAME="mpf-mf-2021.1.0-27.54cd92b" MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz" MPF_COMMIT="54cd92b1976f634053771a98f4af8afe0bc71c90" MPF_BRANCH="mf/devel" SCAB_BASENAME="scab-mf-2021.1.0-30.6f669fe" SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz" SCAB_COMMIT="6f669fe98773013265240cdca4ce274f4dd85237" SCAB_BRANCH="mf/devel" elif test $AIRBUS_ES -ne 0; then MPF_BASENAME="mpf-energy_scope-2021.1.0-27.9aba318" MPF_TARBALL="$AIRBUS_ROOT/$MPF_BASENAME.tar.gz" MPF_COMMIT="9aba3184bb806dc0a05e2a933bfe858472bb207a" MPF_BRANCH="mf/energy_scope" SCAB_BASENAME="scab-energy_scope-2021.1.0-32.3056f3c" SCAB_TARBALL="$AIRBUS_ROOT/$SCAB_BASENAME.tar.gz" SCAB_COMMIT="3056f3c7a986cc2fabb6663707272ccf16fbb614" SCAB_BRANCH="mf/energy_scope" ACTIPOLE_BASENAME="actipole-energy_scope-2020.0.1-105.c84cd1a" ACTIPOLE_TARBALL="$AIRBUS_ROOT/$ACTIPOLE_BASENAME.tar.gz" ACTIPOLE_COMMIT="c84cd1a84224ea6eb34c1a4f4ed59c818d66c3fe" ACTIPOLE_BRANCH="master" fi
If the -p option is specified, we clone the Airbus repositories and create the source tarballs using the specified commit numbers and branch names instead of trying to set up the environment.
if test $PREPARE_TARBALLS -ne 0; then
We begin by removing any previous clones of the Airbus repositories in
AIRBUS_ROOT.
rm -rf $AIRBUS_ROOT/$HMAT_BASENAME $AIRBUS_ROOT/$MPF_BASENAME \ $AIRBUS_ROOT/$SCAB_BASENAME $AIRBUS_ROOT/$ACTIPOLE_BASENAME $HMAT_TARBALL \ $MPF_TARBALL $SCAB_TARBALL $ACTIPOLE_TARBALL
Then, we make fresh clones, check out the required revisions
git clone --recurse-submodules --single-branch --branch $HMAT_BRANCH \ https://aseris-lab.info/airbus/hmat $AIRBUS_ROOT/$HMAT_BASENAME cd $AIRBUS_ROOT/$HMAT_BASENAME git checkout $HMAT_COMMIT git submodule update git clone --single-branch --branch $MPF_BRANCH \ https://aseris-lab.info/airbus/mpf $AIRBUS_ROOT/$MPF_BASENAME cd $AIRBUS_ROOT/$MPF_BASENAME git checkout $MPF_COMMIT git clone --single-branch --branch $SCAB_BRANCH \ https://aseris-lab.info/airbus/scab $AIRBUS_ROOT/$SCAB_BASENAME cd $AIRBUS_ROOT/$SCAB_BASENAME git checkout $SCAB_COMMIT git clone --single-branch --branch $ACTIPOLE_BRANCH \ https://aseris-lab.info/airbus/actipole $AIRBUS_ROOT/$ACTIPOLE_BASENAME cd $AIRBUS_ROOT/$ACTIPOLE_BASENAME git checkout $ACTIPOLE_COMMIT
and verify that the cloned repositories are valid directories.
if test ! -d $AIRBUS_ROOT/$HMAT_BASENAME || \
   test ! -d $AIRBUS_ROOT/$MPF_BASENAME || \
   test ! -d $AIRBUS_ROOT/$SCAB_BASENAME || \
   test ! -d $AIRBUS_ROOT/$ACTIPOLE_BASENAME; then
  error "Failed to clone the Airbus repositories!"
  exit 1
fi
We remove the .git folders from inside the clones to shrink the size of the
final tarballs created using the tar utility.
rm -rf $AIRBUS_ROOT/$HMAT_BASENAME/.git \ $AIRBUS_ROOT/$HMAT_BASENAME/hmat-oss/.git $AIRBUS_ROOT/$MPF_BASENAME/.git \ $AIRBUS_ROOT/$SCAB_BASENAME/.git $AIRBUS_ROOT/$ACTIPOLE_BASENAME/.git tar -czf $HMAT_TARBALL -C $AIRBUS_ROOT $HMAT_BASENAME tar -czf $MPF_TARBALL -C $AIRBUS_ROOT $MPF_BASENAME tar -czf $SCAB_TARBALL -C $AIRBUS_ROOT $SCAB_BASENAME tar -czf $ACTIPOLE_TARBALL -C $AIRBUS_ROOT $ACTIPOLE_BASENAME
At the end of the procedure, we check if the tarballs were created and remove the clones.
if test ! -f $HMAT_TARBALL || test ! -f $MPF_TARBALL || \ test ! -f $SCAB_TARBALL || test ! -f $ACTIPOLE_TARBALL; then error "Failed to create tarballs!" exit 1 fi rm -rf $AIRBUS_ROOT/$HMAT_BASENAME $AIRBUS_ROOT/$MPF_BASENAME \ $AIRBUS_ROOT/$SCAB_BASENAME $AIRBUS_ROOT/$ACTIPOLE_BASENAME exit 0 fi
Finally comes the guix shell command itself. We use a variable to hold the name of the package containing the solver test suite test_FEMBEM. By default, we use its mainstream proprietary version from the scab package. When the -A option is used, we rely on scab-af. When the -M option is used, we rely on scab-mf. When the -E option is used, we rely on scab-energy_scope and actipole-energy_scope. Lastly, if the -o option is used, we rely on the open-source version of test_FEMBEM available directly as a standalone Guix package of the same name.
SCAB="scab" ACTIPOLE="actipole" if test $AIRBUS_AF -ne 0; then SCAB="scab-af" elif test $AIRBUS_MF -ne 0; then SCAB="scab-mf" elif test $AIRBUS_ES -ne 0; then SCAB="scab-energy_scope" ACTIPOLE="actipole-energy_scope" elif test $OPEN_SOURCE -ne 0; then SCAB="test_FEMBEM" fi
In order to access the additional features we implemented in gcvb (see Section 3), we switch to our fork of the package's repository, namely gcvb-felsocim. Sometimes, a local clone of the latter is necessary. Being hosted on GitHub, it cannot be fetched directly by Guix on computing platforms with overly restrictive proxy settings.
if test ! -d $AIRBUS_ROOT/gcvb; then
  git clone https://github.com/felsocim/gcvb.git $AIRBUS_ROOT/gcvb
fi
The list of packages to include in the resulting environment, as well as the package modifiers and options to pass to the guix shell command, or to the guix pack command if the -s option is set, depend on the environment switch -e and on the -x option. Also, if the -S option is used together with -s, the version of Open MPI must match the one available on the Curta cluster.
WITH_INPUT_OPENMPI="" if test $CURTA -ne 0 && test "$SINGULARITY" != ""; then WITH_INPUT_OPENMPI="--with-input=hwloc=hwloc@1 --with-input=openmpi=openmpi-curta --with-input=openmpi-mpi1-compat=openmpi-curta" fi
Available environments are listed below. Note that the --preserve option allows us to inherit selected environment variables from the parent environment.
benchmark: environment for performing benchmarks,
MODIFIERS_BENCHMARK="$WITH_INPUT_OPENMPI $WITH_INPUT_MKL $WITH_INPUT_STARPU_FXT --with-git-url=gcvb-minimal-felsocim=$AIRBUS_ROOT/gcvb" OPTIONS_BENCHMARK="--preserve=^SLURM" PACKAGES_BENCHMARK="bash coreutils inetutils findutils grep sed bc jq openssh tar gzip likwid python python-psutil python-numpy python-matplotlib gcvb-minimal-felsocim $SCAB $ACTIPOLE" if test "$SINGULARITY" == ""; then OPTIONS_BENCHMARK="$OPTIONS_BENCHMARK --with-input=slurm=slurm@19" PACKAGES_BENCHMARK="$PACKAGES_BENCHMARK slurm openmpi" fi if test "$WITH_INPUT_STARPU_FXT" != ""; then PACKAGES_BENCHMARK="$PACKAGES_BENCHMARK r r-starvz" fi
gcvb: environment for generating and launching benchmark job scripts, but not for running the benchmark jobs themselves, which are expected to run in a Singularity container,
MODIFIERS_GCVB="--with-input=slurm=slurm@19 --with-git-url=gcvb-minimal-felsocim=$AIRBUS_ROOT/gcvb" OPTIONS_GCVB="--preserve=SINGULARITY_EXEC --preserve=SINGULARITY_IMAGE" PACKAGES_GCVB="bash coreutils inetutils findutils grep sed bc jq openssh tar gzip likwid openmpi slurm python python-psutil gcvb-minimal-felsocim"
ppp: environment for pre-processing and post-processing actions, e.g. tangling source code from Org documents, HTML and LaTeX export, …
MODIFIERS_PPP="--with-input=r-ggplot2=r-ggplot2@git.bd50a551" OPTIONS_PPP="--preserve=TZDIR" PACKAGES_PPP="bash coreutils sed which emacs emacs-org2web emacs-org emacs-htmlize emacs-biblio emacs-org-ref emacs-ess python-pygments texlive r r-dbi r-rsqlite r-plyr r-dplyr r-readr r-tidyr r-ggplot2@git.bd50a551 r-scales r-cowplot r-stringr r-gridextra r-ggrepel r-rjson r-starvz r-ascii r-r-utils inkscape svgfix graphviz"
Based on the value of $ENVIRONMENT, we select the environment to set up.
MODIFIERS="" OPTIONS="" PACKAGES="" case $ENVIRONMENT in benchmark) MODIFIERS="$MODIFIERS_BENCHMARK" OPTIONS="$OPTIONS_BENCHMARK" PACKAGES="$PACKAGES_BENCHMARK" ;; gcvb) MODIFIERS="$MODIFIERS_GCVB" OPTIONS="$OPTIONS_GCVB" PACKAGES="$PACKAGES_GCVB" ;; ppp) MODIFIERS="$MODIFIERS_PPP" OPTIONS="$OPTIONS_PPP" PACKAGES="$PACKAGES_PPP" ;; *) error "'$ENVIRONMENT' is not a valid software environment switch!" exit 1 ;; esac
Now it is possible to assemble the guix shell command, its options and the list of packages to include in the resulting environment. To unset any existing environment variables of the current environment, we use the --pure option.
ENVIRONMENT_COMMAND="guix shell --pure $MODIFIERS $OPTIONS $PACKAGES"
If the -g option is set, we only print the command to the standard output. If the -s option is set, we generate the corresponding Singularity image using the guix pack command and exit. Otherwise, we directly enter the new environment and launch a shell interpreter. The --norc option of bash prevents the sourcing of the current user's .bashrc file, which could compromise the final environment with unwanted environment variables.
if test $GET_COMMAND -ne 0 && test "$SINGULARITY" == ""; then echo $ENVIRONMENT_COMMAND exit 0 fi
Note that the gcvb environment is intended for running benchmarks in a
Singularity container. Therefore generating a container for the environment
itself makes no sense!
if test "$SINGULARITY" != "" && test "$ENVIRONMENT" == "gcvb"; then error "Generating a container for the 'gcvb' environment makes no sense!" exit 1 fi if test "$SINGULARITY" != ""; then PACK_COMMAND="guix pack -f squashfs -S /usr/bin/env=/bin/env $MODIFIERS $PACKAGES" PACK_COMMAND=$(echo $PACK_COMMAND | sed 's/\n/ /g') if test $GET_COMMAND -ne 0; then echo $PACK_COMMAND exit 0 fi echo "Building Singularity container using '$PACK_COMMAND'..." $PACK_COMMAND > .packing if test $? -ne 0; then error "Failed to build the Singularity container!" exit 1 fi echo "Done" echo "Copying the container to '$SINGULARITY'..." cp $(tail -n 1 .packing) $SINGULARITY if test $? -ne 0; then error "Failed to copy the Singularity container to '$SINGULARITY'!" exit 1 fi echo "Done" echo "Making the container readable..." chmod 644 $SINGULARITY if test $? -ne 0; then error "Failed to make the Singularity container '$SINGULARITY' readable!" exit 1 fi echo "Done" exit 0 fi $ENVIRONMENT_COMMAND -- bash --norc
See the complete source file 3.
3. Performing benchmarks
To automate the generation and the execution of benchmarks, we use gcvb gcvb, an open-source tool developed at Airbus. gcvb allows us to define benchmarks, generate the corresponding shell job scripts for every benchmark or a selected group of benchmarks, submit these job scripts for execution, then gather and optionally validate or post-process the results. To generate multiple variants of the same benchmark, gcvb supports templates.
3.1. gcvb
gcvb uses a specific file and directory structure. A minimal setup requires a configuration file and a benchmark definition Yaml file. These files must be placed in the same folder. Furthermore, the name of the configuration file must be config.yaml. On the other hand, the benchmark definition file may have an arbitrary name. The folder we place these files in represents the root of our gcvb filesystem:
- benchmarks/ represents the root of the benchmark filesystem.
  - data/ contains data necessary to generate and perform benchmarks.
    - all/ represents one of possibly more folders containing benchmark data. For the sake of simplicity, we use one single folder for all benchmarks.
      - input holds any input file necessary to generate benchmarks.
      - references holds any reference file needed for benchmark results validation.
      - templates provides file templates for template-based benchmarks. There is one subfolder for each file template. Note that we use templates to produce specific batch job script header directives for the workload manager on the target computing platform (see Section 3.2) as well as to generate wrapper scripts for launching benchmarks (see Section 3.11).
  - results/ contains benchmark results. Here, one sub-folder is produced every time a new session of benchmarks is generated based on the definition file. It contains job scripts and one folder per generated benchmark. These folders may hold any template-based input file as well as the result of the corresponding benchmark execution.
    - 1/
    - …
  - config.yaml represents the configuration file.
  - gcvb.db represents an auto-generated NoSQL database which can be used to store benchmark results.
  - benchmarks.yaml represents the benchmark definition file.
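Schematically, the layout described above can be pictured as follows:

benchmarks/
  data/
    all/
      input/
      references/
      templates/
  results/
    1/
    ...
  config.yaml
  gcvb.db
  benchmarks.yaml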
3.2. sbatch templates
We use slurm slurm to schedule and execute our experiments on the target
high-performance computing platforms. gcvb produces a job script for each
benchmark described in the definition file. This script is then passed to
slurm for scheduling on a computation node or nodes.
Each job script produced by gcvb is prepended with a header containing the
configuration statements for the sbatch command of slurm slurmGuide
used to submit jobs for computation. We take advantage of the template feature
in order to be able to dynamically generate #SBATCH headers specific to a
given set of benchmarks.
An sbatch template begins as a standard shell script.
#! /usr/bin/env bash # # /slurm/ batch job script #
We use multiple template files but most of the #SBATCH directives are common
to all of them:
count of computational nodes to reserve,
#SBATCH -N {scheduler[nodes]}
count of task slots per node to reserve,
#SBATCH -n {scheduler[tasks]}
exclusion of the other users from the usage of the reserved resources,
#SBATCH --exclusive
reservation time,
#SBATCH --time={scheduler[time]}
location to place the slurm log files in, where %x is the corresponding job name and %j the identifier of the job,
#SBATCH -o slurm/%x-%j.log
exclusion of selected nodes because of inconsistent configuration (less RAM than expected).
#SBATCH --exclude=miriel019,miriel023,miriel056
Note that {scheduler[nodes]} and so on are placeholders replaced by actual values taken from the definition Yaml file during template expansion (see Section 3.5).
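For example, with scheduler values of nodes: 1, tasks: 24 and time: "0-02:00:00" (those of the SPIDO benchmark in Section 3.5.1), the common directives above would expand to:

#SBATCH -N 1
#SBATCH -n 24
#SBATCH --exclusive
#SBATCH --time=0-02:00:00
#SBATCH -o slurm/%x-%j.log
#SBATCH --exclude=miriel019,miriel023,miriel056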
One of the #SBATCH directives specific to each template file is the job name. Based on the job name, we distinguish different sets of benchmarks. Grouping individual benchmarks into a single job script allows us to submit fewer jobs. This way, they are better balanced in terms of the computation time we need to reserve for them on the target computing platform. For example, instead of submitting 12 jobs, each with a time limit of 10 minutes, we submit two jobs with a time limit of 1 hour each. Benchmarks to be placed into a given common job script are identified by matching their job name against a regular expression. Finally, the --constraint switch allows us to specify the node family to rely on.
In the sbatch header template monobatch, used for benchmarks with all the jobs running on a single computational node, the job name is simply composed of a
prefix which typically corresponds to the constant part of a benchmark name (see
Section 3.5):
<<sbatch-beginning>> #SBATCH --job-name={scheduler[prefix]} #SBATCH --constraint={scheduler[family]} <<sbatch-end>>
When a template-based benchmark definition yields a large number of benchmarks,
we prefer to group them into multiple job scripts and launch the latter in
parallel. The value of {job[batch]} in the polybatch header determines which
benchmark belongs to which job script:
<<sbatch-beginning>> #SBATCH --job-name={scheduler[prefix]}-{job[batch]} #SBATCH --constraint={scheduler[family]} <<sbatch-end>>
polybatch-distributed is a variation of polybatch for distributed parallel jobs requiring additional scheduling constraints.
<<sbatch-beginning>> #SBATCH -N {scheduler[nodes]} #SBATCH --ntasks-per-node {scheduler[nt]} #SBATCH --exclusive #SBATCH --time={scheduler[time]} #SBATCH -o slurm/%x-%j.log #SBATCH --job-name={scheduler[prefix]}-{job[batch]} #SBATCH --constraint={scheduler[family]}{scheduler[constraint]} ulimit -c 0
For coupled solver benchmarks, we need a finer-grained distribution of the latter among slurm jobs. So, in the coupled sbatch header, we also add to the job name the names of the sparse and dense solvers involved in the benchmark:
<<sbatch-beginning>> #SBATCH --job-name={scheduler[prefix]}-{job[batch]}-{sparse[name]}-{dense[name]} #SBATCH --constraint={scheduler[family]} <<sbatch-end>>
However, in some cases, the sparse and dense solver names are provided through a single placeholder:
<<sbatch-beginning>> #SBATCH --job-name={scheduler[prefix]}-{job[batch]}-{solver[name]} #SBATCH --constraint={scheduler[family]} <<sbatch-end>>
For coupled solver benchmarks relying on out-of-core computation, we need to be able to specify extra constraints in addition to the node family name, as well as to exclude specific nodes from the list of available computational nodes, because not all nodes of a given family have the desired hard disk type.
<<sbatch-beginning>> #SBATCH --job-name={scheduler[prefix]}-{job[batch]}-{sparse[name]}-{dense[name]} #SBATCH --constraint={scheduler[family]}{scheduler[constraint]} <<sbatch-end>>
In the case of multi-node parallel distributed benchmarks, we specify the task count per node instead of the total number of tasks. Furthermore, just like for out-of-core benchmarks, we need to be able to specify additional node constraints.
<<sbatch-beginning>> #SBATCH -N {scheduler[nodes]} #SBATCH --ntasks-per-node {scheduler[nt]} #SBATCH --exclusive #SBATCH --time={scheduler[time]} #SBATCH -o slurm/%x-%j.log #SBATCH --job-name={scheduler[prefix]}-{sparse[name]}-{dense[name]}-{scheduler[nodes]}-{job[nbpts]} #SBATCH --constraint={scheduler[family]}{scheduler[constraint]} ulimit -c 0
The same applies to scalability benchmarks. In the scalability template, we add the name of the solver used to the job name:
<<sbatch-beginning>> #SBATCH --job-name={scheduler[prefix]}-{solver[name]}-{parallel[map]} #SBATCH --constraint={scheduler[family]} <<sbatch-end>>
At the end of the header, we want to record the date and the time when the job was scheduled and on which node.
echo "Job scheduled on $(hostname), on $(date)" echo
Then, we disable the creation of core dump files in case of a memory error. Even though they can be particularly useful, in some cases they consume too much disk space and prevent other jobs from running.
ulimit -c 0
See the complete source files monobatch 4, polybatch
3.3. Ensuring filesystem
3.3.1. Initialization script
We wrote the shell script mkgcvbfs.sh to automate the initialization of a gcvb filesystem or to check whether a specific gcvb filesystem is valid.
Traditionally, the script begins with a help message function that can be
triggered using the -h option.
function help() { echo "Initialize a gcvb file system described in FSTAB at FSPATH." >&2 echo "Usage: ./$(basename $0) [options]" >&2 echo >&2 echo "Options:" >&2 echo " -h Show this help message." >&2 echo " -c Check if a valid gcvb filesystem is present in PATH." >&2 echo " -f FSTAB Initialize the gcvb filesystem specified in FSTAB." >&2 echo " -o FSPATH Set the output path for the filesystem to create." >&2 }
We use a generic error message function. The error message to print is expected to be the first argument to the function. If not present, the function displays a generic error message.
function error() {
  if test $# -lt 1; then
    echo "An unknown error occurred!" >&2
  else
    echo "Error: $1" >&2
  fi
}
The script requires an .fstab file describing the filesystem to create (see Section 3.3.2), i.e. the entries to initialize the filesystem with and the destination path of the latter.
FSTAB holds the path to an .fstab description file provided using the -f
option.
FSTAB=""
FSPATH holds the destination path to create the filesystem in, specified using the -o option.
FSPATH=""
The -c option, corresponding to the CHECK_ONLY boolean variable, allows us to check an existing gcvb filesystem against an .fstab description instead of creating it.
CHECK_ONLY=0
At this stage, we are ready to parse the options and check the validity of option arguments where applicable.
while getopts ":hcf:o:" option; do case $option in c) CHECK_ONLY=1 ;; f) FSTAB=$OPTARG if test ! -f $FSTAB; then error "'$FSTAB' is not a valid file!" exit 1 fi ;; o) FSPATH=$OPTARG ;;
We must also take into account unknown options, missing option arguments, syntax
mismatches as well as the case when the -h option is specified.
\?) # Unknown option error "Arguments mismatch! Invalid option '-$OPTARG'." echo help exit 1 ;; :) # Missing option argument error "Arguments mismatch! Option '-$OPTARG' expects an argument!" echo help exit 1 ;; h | *) help exit 0 ;; esac done
Next, we have to check if the user has provided the path to the .fstab file
if test "$FSTAB" == ""; then error "No filesystem description file was specified!" exit 1 fi
as well as the destination path of the gcvb filesystem to create.
if test "$FSPATH" == ""; then error "No output location for the filesystem was specified!" exit 1 fi
Finally, we process all of the entries in the .fstab description file. Each line represents the specification of an entry in the gcvb filesystem to initialize (see Section 3.3.2). Notice that the fields of an entry specification are separated by colons.
for entry in $(cat $FSTAB); do
The first field tells us whether a file or a directory should be initialized.
ACTION=$(echo $entry | cut -d':' -f 1) case $ACTION in
If it is a file, its source path and its destination in the target filesystem follow.
F|f)
SOURCE=$(echo $entry | cut -d':' -f 2)
DESTINATION=$(echo $entry | cut -d':' -f 3)
If the -c option is passed (see variable CHECK_ONLY), we only check that the
target filesystem contains the file.
if test $CHECK_ONLY -ne 0; then
  if test ! -f $FSPATH/$DESTINATION; then
    error "Filesystem is incomplete! Missing '$FSPATH/$DESTINATION'."
    exit 1
  fi
  continue
fi
Otherwise, we need to check if the source file exists
if test ! -f $SOURCE; then error "Failed to initialize file '$SOURCE'!" exit 1 fi
before creating it at the desired path in the destination filesystem.
mkdir -p $FSPATH/$(dirname $DESTINATION) && \ cp $SOURCE $FSPATH/$DESTINATION if test $? -ne 0; then error "Failed to initialize file '$FSPATH/$DESTINATION'!" exit 1 fi ;;
If the entry specifies a directory, its destination path in the filesystem being initialized follows.
D|d)
DESTINATION=$(echo $entry | cut -d':' -f 2)
If the -c option is passed (see variable CHECK_ONLY), we only check that the
target filesystem contains the directory.
if test $CHECK_ONLY -ne 0; then
  if test ! -d $FSPATH/$DESTINATION; then
    error "Filesystem is incomplete! Missing '$FSPATH/$DESTINATION'."
    exit 1
  fi
  continue
fi
Otherwise, we create the directory at the specified path.
mkdir -p $FSPATH/$DESTINATION if test $? -ne 0; then error "Failed to initialize directory '$FSPATH/$DESTINATION'!" exit 1 fi ;;
We also need to take care of the case where the action specified in the description file is not known.
*)
error "Failed to initialize filesystem! '$ACTION' is not a valid action."
exit 1
;;
esac
done
We finish by printing a message about the successful filesystem initialization, or about the successful check if the -c option is passed.
if test $CHECK_ONLY -ne 0; then echo "Successfully checked the filesystem '$FSPATH'." else echo "Successfully initialized a fresh gcvb filesystem at '$FSPATH'." fi
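For example, using the benchmarks.fstab description file from the next section, a fresh filesystem can be initialized and later verified as follows; the destination path is illustrative:

./mkgcvbfs.sh -f benchmarks.fstab -o $HOME/benchmarks
./mkgcvbfs.sh -c -f benchmarks.fstab -o $HOME/benchmarks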
See the complete source file mkgcvbfs.sh 5.
3.3.2. Description file
The format of an .fstab description file is very straightforward. Each line must begin with either a D/d or an F/f indicating whether a directory or a file should be initialized. In the case of a directory, the letter is followed by a colon and the destination path of the directory. In the case of a file, the letter is followed by a colon, the source path of the file, another colon and the destination path in the target filesystem.
To describe the filesystem from Section 3.1, we use the benchmarks.fstab
description file:
D:data/all/input D:data/all/references D:data/all/templates D:results D:slurm F:monobatch:data/all/templates/monobatch/sbatch F:polybatch:data/all/templates/polybatch/sbatch F:polybatch-distributed:data/all/templates/polybatch-distributed/sbatch F:coupled:data/all/templates/coupled/sbatch F:coupled-simple:data/all/templates/coupled-simple/sbatch F:coupled-ooc:data/all/templates/coupled-ooc/sbatch F:coupled-distributed:data/all/templates/coupled-distributed/sbatch F:scalability:data/all/templates/scalability/sbatch F:wrapper-in-core.sh:data/all/templates/wrapper-in-core/wrapper.sh F:wrapper-ooc.sh:data/all/templates/wrapper-ooc/wrapper.sh F:wrapper-fxt.sh:data/all/templates/wrapper-fxt/wrapper.sh F:wrapper-in-core-distributed.sh:data/all/templates/wrapper-in-core-distributed/wrapper.sh F:es_config.json:data/all/templates/es/es_config.json F:setenv.sh:scripts/setenv.sh F:submit.sh:scripts/submit.sh F:rss.py:scripts/rss.py F:inject.py:scripts/inject.py F:parse.sh:scripts/parse.sh F:config.yaml:config.yaml F:benchmarks.yaml:benchmarks.yaml
See the complete description file benchmarks.fstab 6.
3.4. Configuration file
The configuration file is designed to provide machine-specific information for a gcvb benchmark collection, such as the submit command for job scripts. Nevertheless, our configuration does not vary from machine to machine, so we use the same config.yaml everywhere.
The configuration of a gcvb benchmark collection is simple. It usually fits in a few lines, beginning with a machine identifier.
machine_id: generic
The most important part is to define the path to the executable used to submit the job scripts produced by gcvb. We rely on slurm as the workload manager and use its sbatch command to submit job scripts.
submit_command: sbatch
Sometimes, we need to re-run the validation phase (see Section
3.5) of benchmarks without repeating the computation itself,
e.g. in case of a change in the result parsing script (see Section
3.8). This phase does not have to be performed as a Slurm job on
a separate node. Therefore, we do not want to use sbatch as submit command but
simply execute the job script produced by gcvb in a Bash shell on the current
node.
va_submit_command: bash
Finally, an associative list of executables can be defined for handy access from the definition file. However, as the executables are not available in the validation phase (see Section 3.5), we cannot make use of this mechanism and initialize executables as an empty list.
executables: []
See the complete configuration file config.yaml 7.
3.5. Definition file
The benchmark definition file begins with a set of default values automatically set for each benchmark defined in the file.
At first, we make all the benchmarks use the same data folder (see Section 3.1). Defining the benchmarks as being of type template allows us to make gcvb automatically generate benchmarks for different sets of parameters (see Section 3). We address this functionality further in this section too. Also, we want to keep resource usage and energy consumption logs in the gcvb SQLite database.
default_values:
test:
description: "A test_FEMBEM benchmark."
data: "all"
type: "template"
keep: ["rss*.log", "hdd*.log", "flops*.csv",
"tmp_energy_scope*/energy_scope_eprofile*.txt"]
For each task, we define the MPI parallel configuration options using the
nprocs key. The placeholders {parallel[np]}, {parallel[map]},
{parallel[rank]} and {parallel[bind]} shall be replaced during the
generation of the benchmark job script (see below).
task:
nprocs: "-np {parallel[np]} -map-by ppr:1:{parallel[map]}
-rank-by {parallel[rank]} -bind-to {parallel[bind]}"
The nthreads key can be used to specify the number of threads to use. Nevertheless, we set these values in the wrapper script used to launch
benchmark tasks (see Section 3.11). As nthreads must be
initialized even if it is not used, we initialize it with an empty string.
nthreads: ""
The main executable to run a benchmark is given by the executable key. Although
we use the test_FEMBEM solver test suite, we do not launch the corresponding
executable directly. We use a Shell wrapper to perform some preparation and
completion actions (see Section 3.11) as well as a Python
wrapper to trace memory and storage resource consumption (see Section
3.6). At the end, all of this is run using mpirun.
executable: mpirun
executable and nprocs are then reused in the final launch_command of a
given benchmark. Below, we define the global launch command for all the
benchmark tasks based on the keys from the @job_creation alias allowing us to
access task attributes potentially specific to each benchmark, i.e.
executable, options and nprocs, from within launch_command.
Note that commands are run from within benchmark-specific folders under the
results/<session> directories (see Section 3.1). Also, we redirect
standard and error outputs to dedicated log files, i.e. stdout.log and
stderr.log respectively. stdout.log is used later for parsing benchmark
results (see Section 3.8). Another copy of the standard output
is saved to slurm-$SLURM_JOBID.out where SLURM_JOBID is an environment
variable set by the job scheduler slurm. This copy is required by the power
consumption monitoring tool energy_scope we rely on (see Section
3.7).
The wrapper script generated by gcvb based on the corresponding template is
not executable. Thus, we need to make it executable before running the
launch_command.
launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
{@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash
./wrapper.sh python3 ../../../scripts/rss.py test_FEMBEM -radius 2 -height 4
{@job_creation[options]} 2>&1 | tee stdout.log | tee slurm-$SLURM_JOBID.out"
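To illustrate the template expansion, for a variant using the default parallel configuration (np: 1, with map, rank and bind set to node, node and none) and the options of the first SPIDO benchmark from Section 3.5.1, the launch command would resemble the following sketch:

chmod +x wrapper.sh && mpirun -np 1 -map-by ppr:1:node -rank-by node \
  -bind-to none $SINGULARITY_EXEC $SINGULARITY_IMAGE bash ./wrapper.sh \
  python3 ../../../scripts/rss.py test_FEMBEM -radius 2 -height 4 \
  -z -nbrhs 50 --bem -withmpf -nbpts 25000 2>&1 \
  | tee stdout.log | tee slurm-$SLURM_JOBID.out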
In our case, we do not perform any result validation in terms of value checking. However, we take advantage of the validation phase in gcvb to gather data from
log files into a separate data.csv file per benchmark and inject them into the
gcvb NoSQL database (see Section 3.1) using a Python script (see Section
3.9). The latter calls our parsing script (see Section
3.8) to extract data from the output logs.
We begin by defining the type of each validation. The most suitable type for our
needs is the generic configuration_independent type gcvb.
validation:
type: "configuration_independent"
Then, we set the validation executable. The executable, i.e. inject.py,
takes at least two arguments. The first one is the *.csv file containing
parsed benchmark results. The call to the parsing script, together with its arguments, follows. In this case, we specify the output log file to be
stdout.log (see above). The -r . parameter tells the parsing script to look
for the resource monitoring logs produced by the associated Python script (see
Section 3.6) in the current working directory and the -o data.csv
parameter defines the output file for the parsing result. We shall define the
validation launch_command individually for each benchmark.
executable: "../../../scripts/inject.py data.csv ../../../scripts/parse.sh
-s stdout.log -r . -o data.csv"
At this stage, we can define the actual benchmarks. They are structured in packs. We define four packs, i.e. in-core, out-of-core, multi-node parallel distributed and test benchmarks. Each pack then contains a list of benchmarks and each benchmark may be composed of one or more tasks, each having one or more validation tasks.
3.5.1. In-core benchmarks
Packs:
-
pack_id: "in-core"
description: "In-core benchmarks."
Tests:
Firstly, we want to benchmark the SPIDO solver on dense linear systems resulting
from BEM discretization for various unknown counts. Under
template_instantiation there are four maps later expanded by gcvb to generate
multiple variants of the benchmark, e.g. for various problem sizes.
The solver map provides the name of the solver being evaluated. This
information is not uniformly present in the standard output of test_FEMBEM.
Therefore, for the sake of clarity and simplicity, we manually specify the
solver map in all benchmarks, not only when we evaluate multiple solvers
within one benchmark.
scheduler holds the common job name prefix and the scheduling information used
for the generation of the associated sbatch header file monobatch (see
Section 3.2) as well as the wrapper script (see Section
3.11).
parallel specifies the parallel configuration. The parallel[np] key gives the number of MPI processes. The parallel[nt] key gives the number of threads per MPI process. The parallel[map], parallel[rank] and parallel[bind] keys
indicate the mapping, the ranking and the binding of the MPI processes,
respectively.
The nbpts array defines the problem sizes to generate the benchmark for. Note
that {scheduler[prefix]}, {scheduler[platform]}, {nbpts} and so on are the
placeholders for the values defined in template_instantiation.
The Cartesian product of all the map tuples under template_instantiation
gives the total number of generated benchmarks. Here, we generate
1 \(\times\) 1 \(\times\) 3 = 3 variants grouped into a single job script with a
time limit of 2 hours.
Note that &MONO and &PARALLEL_DEFAULT are Yaml anchors for the corresponding
data allowing us to reuse them later in the document using *MONO and
*PARALLEL_DEFAULT, respectively.
-
id: "spido-{nbpts}"
template_files: &MONO [ "wrapper-in-core", "monobatch" ]
template_instantiation:
scheduler:
- { prefix: "spido", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-02:00:00" }
parallel:
- &PARALLEL_DEFAULT { np: 1, nt: 24, map: "node", rank: "node",
bind: "none" }
nbpts: [ 25000, 50000, 100000 ]
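Given the single scheduler tuple, the single parallel tuple and the three nbpts values above, the expansion of the id template yields three benchmarks:

spido-25000
spido-50000
spido-100000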
The task corresponding to this benchmark follows. In this case, we only have to indicate the options of test_FEMBEM specific to this set of benchmarks.
Tasks:
-
options: "-z -nbrhs 50 --bem -withmpf -nbpts {nbpts}"
For the validation phase we only need to specify an id and the corresponding
launch_command. Here it consists of the validation executable obtained
through the {@job_creation[va_executable]} placeholder followed by an option
of the parsing script specific to this benchmark. The -K option generally
allows us to include custom key-value pairs in the result output. Here, we use
it to specify the solver being evaluated.
Validations:
-
id: "va-spido-{nbpts}"
launch_command: "{@job_creation[va_executable]} -K solver=spido"
In the next definition, we benchmark the HMAT solver on dense BEM systems under similar conditions as SPIDO. However, here we also vary the precision
parameter \(\epsilon\) (see the epsilon map) in template_instantiation.
-
id: "hmat-bem-{epsilon}-{nbpts}"
template_files: *MONO
template_instantiation:
scheduler:
- { prefix: "hmat-bem", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-03:00:00" }
parallel:
- *PARALLEL_DEFAULT
epsilon: [ "1e-3", "1e-6" ]
nbpts: [ 25000, 50000, 100000, 200000, 400000, 800000, 1000000 ]
Tasks:
-
options: "-z -nbrhs 50 --bem -withmpf --hmat --hmat-eps-assemb
{epsilon} --hmat-eps-recompr {epsilon} -nbpts {nbpts}"
Validations:
-
id: "va-hmat-bem-{epsilon}-{nbpts}"
launch_command: "{@job_creation[va_executable]}
-K solver=hmat-bem"
As for sparse solvers, we begin by benchmarking MUMPS on sparse linear systems arising from FEM discretization. Besides the unknown count (see the nbpts map), we vary the value of the precision parameter \(\epsilon\) when the low-rank compression
mechanism is enabled (see the epsilon key in the compression map). The
options key in the compression map defines the options for test_FEMBEM to
enable or disable the compression.
-
id: "mumps-{compression[epsilon]}-{nbpts}"
template_files: *MONO
template_instantiation:
scheduler:
- { prefix: "mumps", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-04:00:00" }
parallel:
- *PARALLEL_DEFAULT
compression:
- { epsilon: "1e-3", options: "--mumps-blr --mumps-blr-variant 1
--mumps-blr-accuracy" }
- { epsilon: "1e-6", options: "--mumps-blr --mumps-blr-variant 1
--mumps-blr-accuracy" }
- { epsilon: "", options: "--no-mumps-blr" }
nbpts: [ 250000, 500000, 1000000, 2000000, 4000000, 8000000,
10000000 ]
Tasks:
-
options: "-z -nbrhs 50 --fem -withmpf {compression[options]}
{compression[epsilon]} --mumps-verbose -nbpts {nbpts}"
Validations:
-
id: "va-mumps-{compression[epsilon]}-{nbpts}"
launch_command: "{@job_creation[va_executable]} -K solver=mumps"
For illustrative purposes, we define a couple of additional benchmarks of MUMPS
on the same kind of linear systems. In this case, the problem size is fixed to
962,831 which matches the count of unknowns related to the FEM-discretized domain in the case of a coupled FEM/BEM system counting 1,000,000 unknowns. Moreover, we consider both symmetric and non-symmetric systems (see the symmetry map). The precision parameter \(\epsilon\) is fixed to \(10^{-3}\).
-
id: "additional-mumps-{symmetry[label]}"
template_files: *MONO
template_instantiation:
scheduler:
- { prefix: "additional-mumps", platform: "plafrim",
family: "miriel", nodes: 1, tasks: 24, time: "0-00:20:00" }
parallel:
- *PARALLEL_DEFAULT
symmetry:
- { label: "symmetric", options: "" }
- { label: "non-symmetric", options: "-nosym" }
Tasks:
-
options: "-z -nbrhs 50 --fem -withmpf --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-verbose
-nbpts 962831 {symmetry[options]}"
Validations:
-
id: "va-additional-mumps-{symmetry[label]}"
launch_command: "{@job_creation[va_executable]} -K solver=mumps"
For HMAT on FEM systems, we evaluate the impact of the same parameters as in the case of MUMPS. In the benchmark prefix, we specify that we are working with symmetric matrices as we later define further HMAT benchmarks involving non-symmetric matrices.
-
id: "hmat-fem-symmetric-{epsilon}-{nbpts}"
template_files: *MONO
template_instantiation:
scheduler:
- { prefix: "hmat-fem-symmetric", platform: "plafrim",
family: "miriel", nodes: 1, tasks: 24, time: "0-02:00:00" }
parallel:
- *PARALLEL_DEFAULT
epsilon: [ "1e-3", "1e-6" ]
nbpts: [ 250000, 500000, 1000000, 2000000 ]
Tasks:
-
options: "-z -nbrhs 50 --fem -withmpf --hmat --hmat-eps-assemb
{epsilon} --hmat-eps-recompr {epsilon} -nbpts {nbpts}"
Validations:
-
id: "va-hmat-fem-symmetric-{epsilon}-{nbpts}"
launch_command: "{@job_creation[va_executable]}
-K solver=hmat-fem"
We also want to evaluate the performance of the prototype implementation of the
Nested Dissection (ND) ordering technique in HMAT for the solution of sparse
systems (see the variant map). The current implementation limits the
application of the algorithm to non-symmetric matrices. To be able to compare it
to runs without using the Nested Dissection, we must redo the latter using
non-symmetric matrices as well (see the variant map). Moreover, when the ND is
enabled, the HMAT solver uses a significantly higher amount of memory.
Consequently, cases counting more than 250,000 unknowns cause a memory overflow.
Therefore, we benchmark the algorithm on smaller systems.
-
id: "hmat-fem-non-symmetric{variant[ND]}-{epsilon}-{nbpts}"
template_files: *MONO
template_instantiation:
scheduler:
- { prefix: "hmat-fem-non-symmetric", platform: "plafrim",
family: "miriel", nodes: 1, tasks: 24, time: "0-01:00:00" }
parallel:
- *PARALLEL_DEFAULT
variant:
- { ND: "", options: "--hmat-lu -nosym" }
- { ND: "-nd", options: "--hmat-nd" }
epsilon: [ "1e-3", "1e-6" ]
nbpts: [ 25000, 50000, 100000, 200000, 250000 ]
Tasks:
-
options: "-z -nbrhs 50 --fem {variant[options]} -withmpf --hmat
--hmat-eps-assemb {epsilon} --hmat-eps-recompr {epsilon}
-nbpts {nbpts}"
Validations:
-
id: "va-hmat-fem-non-symmetric{variant[ND]}-{epsilon}-{nbpts}"
launch_command: "{@job_creation[va_executable]}
-K solver=hmat-fem{variant[ND]}"
In the next part, we benchmark the solvers for coupled sparse/dense FEM/BEM
systems. For solvers allowing data compression, we always set the precision parameter \(\epsilon\) to \(10^{-3}\), whatever the goal of the benchmark. However, as for the sparse part of the system, we consider both the case when it is compressed (by MUMPS) and when it is not (see the BLR and options keys in the
sparse map).
At first, we consider the two-stage implementation scheme multi-solve using
MUMPS as the sparse solver and SPIDO as the dense solver. We vary the problem's unknown
count (see the nbpts key in the job map) as well as the count of right-hand
sides to be processed at once by MUMPS during the Schur complement computation
(see the nbrhs key in the job map). The maps sparse and dense define the
options for the sparse and the dense solver respectively. Finally, the track
key in the job map allows us to enable execution timeline tracing for selected
runs.
-
id: "multi-solve-{job[batch]}-{sparse[BLR]}{sparse[name]}-\
{dense[name]}-{job[nbrhs]}-{job[nbpts]}"
template_files: &COUPLED [ "wrapper-in-core", "coupled" ]
template_instantiation:
scheduler:
- { prefix: "multi-solve", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "1-03:00:00" }
parallel:
- *PARALLEL_DEFAULT
job:
# N = 1M
- { nbpts: 1000000, nbrhs: 32, batch: 1, track: "" }
- { nbpts: 1000000, nbrhs: 64, batch: 1, track: "" }
- { nbpts: 1000000, nbrhs: 128, batch: 1, track: "" }
- { nbpts: 1000000, nbrhs: 256, batch: 1,
track: "--timeline-trace-calls" }
- { nbpts: 1000000, nbrhs: 512, batch: 1, track: "" }
# N = 2M
- { nbpts: 2000000, nbrhs: 32, batch: 2, track: "" }
- { nbpts: 2000000, nbrhs: 64, batch: 2, track: "" }
- { nbpts: 2000000, nbrhs: 128, batch: 2, track: "" }
- { nbpts: 2000000, nbrhs: 256, batch: 2, track: "" }
- { nbpts: 2000000, nbrhs: 512, batch: 3, track: "" }
- { nbpts: 2000000, nbrhs: 1024, batch: 3, track: "" }
# N = 4M
- { nbpts: 4000000, nbrhs: 32, batch: 4, track: "" }
- { nbpts: 4000000, nbrhs: 64, batch: 4, track: "" }
- { nbpts: 4000000, nbrhs: 128, batch: 5, track: "" }
- { nbpts: 4000000, nbrhs: 256, batch: 6, track: "" }
- { nbpts: 4000000, nbrhs: 512, batch: 6, track: "" }
# N = 7M
- { nbpts: 7000000, nbrhs: 32, batch: 7, track: "" }
- { nbpts: 7000000, nbrhs: 64, batch: 8, track: "" }
- { nbpts: 7000000, nbrhs: 128, batch: 9, track: "" }
- { nbpts: 7000000, nbrhs: 256, batch: 10, track: "" }
sparse:
- { name: "mumps", BLR: "blr-", options: "--mumps-verbose
--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-solve -nbrhsmumps" }
- { name: "mumps", BLR: "", options: "--mumps-verbose --no-mumps-blr
--mumps-multi-solve -nbrhsmumps" }
dense:
- { name: "spido", options: "" }
Tasks:
-
options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
{job[nbrhs]} {dense[options]} -nbpts {job[nbpts]} {job[track]}"
Validations:
-
id: "va-multi-solve-{job[batch]}-{sparse[BLR]}{sparse[name]}-\
{dense[name]}-{job[nbrhs]}-{job[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K solver={sparse[name]}/{dense[name]}"
Due to differences in parameter specification, we define a separate benchmark
for the two-stage multi-solve implementation relying on HMAT as the dense solver.
Here, nbrhs does not vary except for the 2,000,000 unknowns problem. For all
the other runs, we set nbrhs to 256, which appears to be the optimal value based
on the runs involving SPIDO as the dense solver. However, when using HMAT, the
parameter we are interested in is referred to as schur here (see the job
map). The value of schur must be at least as high as nbrhs. In the 2,000,000
unknowns case, we want to demonstrate the effect of overly low values of schur.
That is why we lower schur (together with nbrhs) down to 32 in this case.
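Because this constraint is easy to violate when editing the job list below, a short check such as the following sketch (illustrative only, not part of the benchmark definition) can be used to verify that every entry satisfies it:
# Illustrative sanity check: schur must be at least as high as nbrhs.
jobs = [
    {"nbpts": 1000000, "schur": 512, "nbrhs": 256, "batch": 1},
    {"nbpts": 2000000, "schur": 32, "nbrhs": 32, "batch": 2},
]

for job in jobs:
    assert job["schur"] >= job["nbrhs"], "schur too low: %s" % job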
-
id: "multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
{job[nbrhs]}-{job[schur]}-{job[nbpts]}"
template_files: *COUPLED
template_instantiation:
scheduler:
- { prefix: "multi-solve", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "1-00:00:00" }
parallel:
- *PARALLEL_DEFAULT
job:
# N = 1M
- { nbpts: 1000000, schur: 512, nbrhs: 256, batch: 1 }
- { nbpts: 1000000, schur: 1024, nbrhs: 256, batch: 1 }
- { nbpts: 1000000, schur: 2048, nbrhs: 256, batch: 1 }
- { nbpts: 1000000, schur: 4096, nbrhs: 256, batch: 1 }
# N = 2M
- { nbpts: 2000000, schur: 32, nbrhs: 32, batch: 2 }
- { nbpts: 2000000, schur: 64, nbrhs: 64, batch: 2 }
- { nbpts: 2000000, schur: 128, nbrhs: 128, batch: 2 }
- { nbpts: 2000000, schur: 256, nbrhs: 256, batch: 2 }
- { nbpts: 2000000, schur: 512, nbrhs: 256, batch: 1 }
- { nbpts: 2000000, schur: 1024, nbrhs: 256, batch: 1 }
- { nbpts: 2000000, schur: 2048, nbrhs: 256, batch: 1 }
- { nbpts: 2000000, schur: 4096, nbrhs: 256, batch: 1 }
# N = 4M
- { nbpts: 4000000, schur: 512, nbrhs: 256, batch: 3 }
- { nbpts: 4000000, schur: 1024, nbrhs: 256, batch: 3 }
- { nbpts: 4000000, schur: 2048, nbrhs: 256, batch: 4 }
- { nbpts: 4000000, schur: 4096, nbrhs: 256, batch: 3 }
# N = 7M
- { nbpts: 7000000, schur: 512, nbrhs: 256, batch: 5 }
- { nbpts: 7000000, schur: 1024, nbrhs: 256, batch: 6 }
- { nbpts: 7000000, schur: 2048, nbrhs: 256, batch: 7 }
- { nbpts: 7000000, schur: 4096, nbrhs: 256, batch: 8 }
# N = 9M
- { nbpts: 9000000, schur: 512, nbrhs: 256, batch: 9 }
- { nbpts: 9000000, schur: 1024, nbrhs: 256, batch: 10 }
- { nbpts: 9000000, schur: 2048, nbrhs: 256, batch: 11 }
- { nbpts: 9000000, schur: 4096, nbrhs: 256, batch: 12 }
sparse:
- { name: "mumps", options: "--mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-solve" }
dense:
- { name: "hmat", options: "--hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
Tasks:
-
options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
-nbrhsmumps {job[nbrhs]} --mumps-nbcols-schur-hmat {job[schur]}
{dense[options]} -nbpts {job[nbpts]}"
Validations:
-
id: "va-multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
{job[nbrhs]}-{job[schur]}-{job[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K solver={sparse[name]}/{dense[name]}"
Then, we benchmark the two-stage implementation scheme multi-factorization using
MUMPS as the sparse solver and either SPIDO or HMAT as the dense solver. We vary
the problem's unknown count (see the nbpts key in the job map) as well as the
size of the row and column blocks of the Schur complement matrix during the
computation (see the schur key in the job map). The nb key in job
indicates the corresponding count of blocks per block row or block column of the
Schur complement matrix. We use the IOblock key under job to set the size of
disk blocks when using SPIDO as the dense solver, ensuring that the chosen Schur
complement block size is a multiple of the disk block size. As only three
combinations are possible, we use here a common solver map providing both the
sparse and the dense solver specifications.
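The schur, IOblock and nb values in the job map below follow two relations that can be checked mechanically: schur is an exact multiple of IOblock, and nb blocks of size schur suffice to cover the n_BEM unknowns of the dense part. The sketch below (illustrative only) verifies them for one of the N = 1.0M entries:
# Illustrative consistency check for one multi-factorization job entry
# (N = 1.0M, n_BEM = 37,169).
n_bem = 37169
job = {"schur": 12390, "IOblock": 2478, "nb": 3}

# The Schur complement block size is a multiple of the disk block size...
assert job["schur"] % job["IOblock"] == 0
# ...and nb blocks of size schur cover the n_BEM dense unknowns.
assert job["nb"] * job["schur"] >= n_bem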
-
id: "multi-factorization-{job[batch]}-{solver[BLR]}{solver[name]}-\
{job[nbpts]}-{job[schur]}"
template_files: [ "wrapper-in-core", "coupled-simple" ]
template_instantiation:
scheduler:
- { prefix: "multi-factorization", platform: "plafrim",
family: "miriel", nodes: 1, tasks: 24, time: "0-05:00:00" }
parallel:
- *PARALLEL_DEFAULT
job:
# N = 1.0M, n_BEM = 37,169
- { nbpts: 1000000, schur: 9296, IOblock: 2324, nb: 4, batch: 1 }
- { nbpts: 1000000, schur: 12390, IOblock: 2478, nb: 3, batch: 1 }
- { nbpts: 1000000, schur: 18585, IOblock: 3717, nb: 2, batch: 1 }
- { nbpts: 1000000, schur: 37170, IOblock: 3717, nb: 1, batch: 1 }
# N = 1.5M, n_BEM = 48,750
- { nbpts: 1500000, schur: 12188, IOblock: 3047, nb: 4, batch: 3 }
- { nbpts: 1500000, schur: 16250, IOblock: 3250, nb: 3, batch: 3 }
- { nbpts: 1500000, schur: 24375, IOblock: 4875, nb: 2, batch: 2 }
- { nbpts: 1500000, schur: 48750, IOblock: 4875, nb: 1, batch: 2 }
# N = 2.0M, n_BEM = 58,910
- { nbpts: 2000000, schur: 11784, IOblock: 1964, nb: 5, batch: 4 }
- { nbpts: 2000000, schur: 14728, IOblock: 2104, nb: 4, batch: 4 }
- { nbpts: 2000000, schur: 19638, IOblock: 3273, nb: 3, batch: 5 }
- { nbpts: 2000000, schur: 29456, IOblock: 4208, nb: 2, batch: 6 }
# N = 2.5M, n_BEM = 68,524
- { nbpts: 2500000, schur: 11421, IOblock: 1269, nb: 6, batch: 7 }
- { nbpts: 2500000, schur: 13705, IOblock: 2741, nb: 5, batch: 8 }
- { nbpts: 2500000, schur: 17135, IOblock: 3427, nb: 4, batch: 9 }
- { nbpts: 2500000, schur: 22842, IOblock: 3807, nb: 3, batch: 10 }
- { nbpts: 2500000, schur: 34264, IOblock: 4283, nb: 2, batch: 11 }
solver:
- { name: "mumps-spido", BLR: "blr-", options: "--mumps-verbose
--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-facto" }
- { name: "mumps-spido", BLR: "", options: "--mumps-verbose
--no-mumps-blr --mumps-multi-facto" }
- { name: "mumps-hmat", BLR: "blr-", options: "--mumps-verbose
--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-facto --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
Tasks:
-
options: "-d -nbrhs 50 --fembem -withmpf --coupled {solver[options]}
--mumps-sizeschur {job[schur]} -diskblock {job[IOblock]} -nbpts
{job[nbpts]}"
Validations:
-
id: "va-multi-factorization-{job[batch]}-\
{solver[BLR]}{solver[name]}-{job[nbpts]}-{job[schur]}"
launch_command: "{@job_creation[va_executable]}
-K solver={solver[name]},n_blocks={job[nb]}"
As for the partially sparse-aware single-stage scheme, we benchmark the
implementation using HMAT on both the sparse and dense parts. Here, we vary only
the problem's unknown count (see the nbpts map). The track key in the
nbpts map allows us to enable execution timeline tracing for selected runs.
-
id: "full-hmat-{nbpts[count]}"
template_files: *MONO
template_instantiation:
scheduler:
- { prefix: "full-hmat", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "3-00:00:00" }
parallel:
- *PARALLEL_DEFAULT
nbpts:
- { count: 1000000, track: "--timeline-trace-calls" }
- { count: 2000000, track: "" }
- { count: 4000000, track: "" }
- { count: 7000000, track: "" }
- { count: 9000000, track: "" }
Tasks:
-
options: "-d -nbrhs 50 --fembem -withmpf --coupled --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 -nbpts {nbpts[count]}
{nbpts[track]}"
Validations:
-
id: "va-full-hmat-{nbpts[count]}"
launch_command: "{@job_creation[va_executable]}
-K solver=hmat/hmat"
To evaluate the scalability of the solvers, we consider a large dense BEM or
sparse FEM linear system (see the nbpts key in the solver map) and vary the
parallel configuration. Note that for solvers allowing data compression, we set
the precision parameter ε to \(10^{-3}\).
-
id: "scalability-{solver[name]}-{parallel[map]}-\
{parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
template_files: &SCALA [ "wrapper-in-core", "scalability" ]
template_instantiation:
scheduler:
- { prefix: "scalability", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-20:00:00" }
solver:
- { name: "spido", nbpts: 100000, options: "--bem" }
- { name: "mumps", nbpts: 2000000, options: "--fem
--mumps-verbose --mumps-blr --mumps-blr-variant 1
--mumps-blr-accuracy 1e-3" }
- { name: "mumps", nbpts: 4000000, options: "--fem
--mumps-verbose --mumps-blr --mumps-blr-variant 1
--mumps-blr-accuracy 1e-3" }
We consider four kinds of parallel configurations:
- 1 MPI process mapped and ranked by node and 1 to 24 OpenMP and MKL threads;
mpirun configuration example: OMP_NUM_THREADS=24 MKL_NUM_THREADS=24 mpirun -np 1 -map-by ppr:1:node -rank-by node -bind-to none test_FEMBEM ...
parallel:
- { np: 1, nt: 1, map: "node", rank: "node", bind: "none" }
- { np: 1, nt: 6, map: "node", rank: "node", bind: "none" }
- { np: 1, nt: 12, map: "node", rank: "node", bind: "none" }
- { np: 1, nt: 18, map: "node", rank: "node", bind: "none" }
- { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
- 2 MPI processes mapped by, ranked by and bound to sockets and 1 to 12
OpenMP and MKL threads;
mpirun configuration example: OMP_NUM_THREADS=12 MKL_NUM_THREADS=12 mpirun -np 2 -map-by ppr:1:socket -rank-by socket -bind-to socket test_FEMBEM ...
- { np: 2, nt: 1, map: "socket", rank: "socket", bind: "socket" }
- { np: 2, nt: 2, map: "socket", rank: "socket", bind: "socket" }
- { np: 2, nt: 4, map: "socket", rank: "socket", bind: "socket" }
- { np: 2, nt: 8, map: "socket", rank: "socket", bind: "socket" }
- { np: 2, nt: 12, map: "socket", rank: "socket", bind: "socket" }
- 4 MPI processes mapped by, ranked by and bound to numa sub-nodes and 1 to 6
OpenMP and MKL threads;
mpirun configuration example: OMP_NUM_THREADS=6 MKL_NUM_THREADS=6 mpirun -np 4 -map-by ppr:1:numa -rank-by numa -bind-to numa test_FEMBEM ...
- { np: 4, nt: 1, map: "numa", rank: "numa", bind: "numa" }
- { np: 4, nt: 2, map: "numa", rank: "numa", bind: "numa" }
- { np: 4, nt: 4, map: "numa", rank: "numa", bind: "numa" }
- { np: 4, nt: 6, map: "numa", rank: "numa", bind: "numa" }
- 1 to 24 MPI processes mapped by, ranked by and bound to cores and 1 OpenMP and
MKL thread.
mpirun configuration example: OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 mpirun -np 24 -map-by ppr:1:core -rank-by core -bind-to core test_FEMBEM ...
- { np: 1, nt: 1, map: "core", rank: "core", bind: "core" }
- { np: 6, nt: 1, map: "core", rank: "core", bind: "core" }
- { np: 12, nt: 1, map: "core", rank: "core", bind: "core" }
- { np: 18, nt: 1, map: "core", rank: "core", bind: "core" }
- { np: 24, nt: 1, map: "core", rank: "core", bind: "core" }
Finally, we have to set solver options (see the options value) and configure
the validation phase.
Tasks:
-
options: "-z -nbrhs 50 -withmpf {solver[options]} -nbpts
{solver[nbpts]}"
Validations:
-
id: "va-scalability-{solver[name]}-{parallel[map]}-\
{parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K solver={solver[name]}"
We define two further, separate scalability benchmarks for HMAT, as its MPI parallelization is not adapted to execution on a single node and is consequently not studied here.
-
id: "scalability-{solver[name]}-{parallel[map]}-\
{parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
template_files: *SCALA
template_instantiation:
scheduler:
- { prefix: "scalability", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-14:00:00" }
solver:
- { name: "hmat-bem", nbpts: 100000, options: "--bem --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3" }
- { name: "hmat-bem", nbpts: 1000000, options: "--bem --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3" }
- { name: "hmat-fem", nbpts: 2000000, options: "--fem --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --no-hmat-nd" }
We consider only one kind of parallel configuration here: 1 MPI process mapped and ranked by node and 1 to 24 OpenMP and MKL threads and StarPU workers.
parallel:
- { np: 1, nt: 1, map: "node", rank: "node", bind: "none" }
- { np: 1, nt: 6, map: "node", rank: "node", bind: "none" }
- { np: 1, nt: 12, map: "node", rank: "node", bind: "none" }
- { np: 1, nt: 18, map: "node", rank: "node", bind: "none" }
- { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
Tasks:
-
options: "-z -nbrhs 50 -withmpf {solver[options]} -nbpts
{solver[nbpts]}"
Validations:
-
id: "va-scalability-{solver[name]}-{parallel[map]}-\
{parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K solver={solver[name]}"
By means of another HMAT scalability benchmark, we run the solver on one BEM and one FEM system in three different parallel configurations and generate an FXT execution trace for each of them. Here, we consider smaller systems (25,000 unknowns for BEM and 50,000 unknowns for FEM) on a lower count of cores (4 cores maximum) than in the previous benchmark. The goal is to show the difference in performance between an exclusively StarPU-parallelized execution and an execution involving multiple MPI processes on a single node.
-
id: "fxt-scalability-{solver[name]}-{parallel[map]}-\
{parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
template_files: [ "wrapper-fxt", "scalability" ]
template_instantiation:
scheduler:
- { prefix: "fxt-scalability", platform: "plafrim",
family: "miriel", nodes: 1, tasks: 4, time: "0-00:30:00" }
solver:
- { name: "hmat-bem", nbpts: 25000, options: "--bem --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3" }
- { name: "hmat-fem", nbpts: 50000, options: "--fem --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --no-hmat-nd" }
We consider three parallel configurations here:
- 1 MPI process mapped and ranked by node and 4 OpenMP and MKL threads and StarPU workers;
- 2 MPI processes mapped, ranked and bound to sockets and 2 OpenMP and MKL threads and StarPU workers;
- 4 MPI processes mapped, ranked and bound to cores.
parallel:
- { np: 1, nt: 4, map: "node", rank: "node", bind: "none" }
- { np: 2, nt: 2, map: "socket", rank: "socket", bind: "socket" }
- { np: 4, nt: 1, map: "core", rank: "core", bind: "core" }
In this case, we have an extra validation task performing the analysis of the produced FXT traces with the StarVZ R package (see Section 4.2).
Tasks:
-
options: "-z -nbrhs 50 -withmpf {solver[options]} -nbpts
{solver[nbpts]}"
Validations:
-
id: "va-1-fxt-scalability-{solver[name]}-{parallel[map]}-\
{parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K solver={solver[name]}"
-
id: "va-2-fxt-scalability-{solver[name]}-{parallel[map]}-\
{parallel[np]}x{parallel[nt]}-{solver[nbpts]}"
launch_command: "phase1-workflow.sh $(pwd)/ &&
phase2-workflow.R $(pwd)/
$GUIX_ENVIRONMENT/site-library/starvz/etc/default.yaml"
3.5.2. energy_scope benchmarks
To experiment with energy profiling of the solvers, we use the energy_scope tool
(see Section 3.7). In the first place, we want to measure the
potential overhead energy_scope may have on the performance of test_FEMBEM.
We consider a coupled FEM/BEM system with a fixed size of 1,000,000 unknowns. We
use MUMPS as the sparse and either SPIDO or HMAT as the dense solver while applying
the multi-solve two-stage implementation scheme. Low-rank compression is enabled and
the precision parameter ε is set to \(10^{-3}\). The number of right-hand
sides in the system is set to 150 for a more representative final solve phase.
We execute the benchmarks both with and without energy_scope.
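In the es map below, the acquisition parameters appear to be linked: the sampling interval in milliseconds is the inverse of the acquisition frequency, and idle_time is the interval in seconds minus a small margin. The sketch below is only an observation drawn from the two entries used here, not a documented energy_scope rule:
# Assumed relation between the acquisition frequency and the interval and
# idle_time values below; the 0.01 s margin is inferred, not documented.
def es_params(frequency_hz):
    interval_ms = int(1000 / frequency_hz)
    idle_time = interval_ms / 1000.0 - 0.01
    return {"interval": interval_ms, "idle_time": idle_time}

print(es_params(1))  # {'interval': 1000, 'idle_time': 0.99}
print(es_params(2))  # {'interval': 500, 'idle_time': 0.49}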
-
pack_id: "energy_scope"
description: "energy_scope benchmarks."
Tests:
-
id: "es-overhead-{es[switch]}-{es[frequency]}-{es[interval]}-\
{es[profile]}-{dense[solver]}-{iteration}"
template_files: ["wrapper-in-core", "monobatch", "es"]
template_instantiation:
scheduler:
- { prefix: "es-overhead", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-12:00:00" }
parallel:
- *PARALLEL_DEFAULT
dense: &ES_DENSE
- { solver: "spido", options: "" }
- { solver: "hmat", options: "--mumps-nbcols-schur-hmat 512
--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
--coupled-mumps-hmat" }
es:
- { switch: "on", frequency: "1Hz", interval: 1000,
idle_time: 0.99, profile: "e",
executable: "~/energy_scope/energy_scope_slurm.sh mpirun
--report-bindings" }
- { switch: "on", frequency: "2Hz", interval: 500,
idle_time: 0.49, profile: "et",
executable: "~/energy_scope/energy_scope_slurm.sh mpirun
--report-bindings" }
- { switch: "off", frequency: "1Hz", interval: 1000,
idle_time: 0.99, profile: "e", executable: "mpirun" }
iteration: [ 1, 2, 3 ]
Tasks:
-
executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
options: "-d -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose
--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-solve -nbrhsmumps 256 -nbpts 1000000 {dense[options]}
--energy-scope-tags"
Validations:
-
id: "va-es-overhead-{es[switch]}-{es[frequency]}-\
{es[interval]}-{es[profile]}-{dense[solver]}-{iteration}"
launch_command: "{@job_creation[va_executable]}
-K solver=mumps/{dense[solver]},energy_scope={es[switch]}
-K es_frequency={es[frequency]},es_profile={es[profile]}"
-
id: "va-es-overhead-eprofile-{es[switch]}-{es[frequency]}-\
{es[interval]}-{es[profile]}-{dense[solver]}-{iteration}"
launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
~/energy_scope/energy_scope_run_analysis_all.sh"
The next stage consists of creating the energetic profile of test_FEMBEM when
solving purely sparse FEM and purely dense BEM systems, respectively. In this
case, we consider FEM and BEM systems of fixed sizes proportional to the
corresponding coupled FEM/BEM test cases with \(1,000,000 \leq N \leq 7,000,000\).
We use MUMPS, both with and without compression, to solve the FEM systems and
SPIDO or HMAT to solve the BEM systems. When compression is enabled, the
precision parameter ε is set to \(10^{-3}\). The number of right-hand sides
is set to 150 to ensure a more representative solve phase.
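The FEM and BEM sizes used below match the sparse and dense parts of the corresponding coupled test cases: the two nbpts lists sum pairwise to 1, 3, 5 and 7 million unknowns. A short check (illustrative only, based on the values listed below) confirms this:
# The nbpts values of the purely sparse (FEM) and purely dense (BEM)
# benchmarks below sum pairwise to the sizes of the coupled test cases.
fem = [962831, 2922756, 4891562, 6864384]
bem = [37169, 77244, 108438, 135616]

for n_fem, n_bem in zip(fem, bem):
    print(n_fem + n_bem)  # 1000000, 3000000, 5000000, 7000000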
-
id: "es-fem-{job[batch]}-{nbpts}"
template_files: ["wrapper-in-core", "polybatch", "es"]
template_instantiation:
scheduler:
- { prefix: "es-fem", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-08:00:00" }
parallel:
- *PARALLEL_DEFAULT
es: &ES_DEFAULT
- { interval: 1000, idle_time: 0.99, profile: "e",
executable: "~/energy_scope/energy_scope_slurm.sh mpirun
--report-bindings" }
job:
- { batch: 1, options: "--no-mumps-blr" }
- { batch: 2, options: "--mumps-blr --mumps-blr-variant 1
--mumps-blr-accuracy 1e-3" }
nbpts: [ 962831, 2922756, 4891562, 6864384 ]
Tasks:
-
executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
options: "-d -nbrhs 150 -nbrhsmumps 150 -withmpf --fem
--mumps-verbose {job[options]} -nbpts {nbpts} --energy-scope-tags"
Validations:
-
id: "va-es-fem-{job[batch]}-{nbpts}"
launch_command: "{@job_creation[va_executable]} -K solver=mumps"
-
id: "va-es-fem-eprofile-{job[batch]}-{nbpts}"
launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
~/energy_scope/energy_scope_run_analysis_all.sh"
-
id: "es-bem-{job[batch]}-{nbpts}"
template_files: ["wrapper-in-core", "polybatch", "es"]
template_instantiation:
scheduler:
- { prefix: "es-bem", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-08:00:00" }
parallel:
- *PARALLEL_DEFAULT
es: *ES_DEFAULT
job:
- { batch: 1, solver: "spido", options: "" }
- { batch: 2, solver: "hmat", options: "--hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3" }
nbpts: [ 37169, 77244, 108438, 135616 ]
Tasks:
-
executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
options: "-d -nbrhs 150 -withmpf --bem {job[options]} -nbpts {nbpts}
--energy-scope-tags"
Validations:
-
id: "va-es-bem-{job[batch]}-{nbpts}"
launch_command: "{@job_creation[va_executable]}
-K solver={job[solver]}"
-
id: "va-es-bem-eprofile-{job[batch]}-{nbpts}"
launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
~/energy_scope/energy_scope_run_analysis_all.sh"
For the energetic profiling of test_FEMBEM when solving coupled systems, we
consider both the multi-solve and the multi-factorization two-stage
implementation schemes. We profile the schemes on a system with \(N = 1,000,000\).
The number of right-hand sides is set to 150 for a more representative final
solve phase. We measure the impact of low-rank compression on the dense part,
but the compression is always enabled for the sparse part. When applicable, the
precision parameter ε is set to \(10^{-3}\).
-
id: "es-coupled-single-node-{job[scheme]}-{job[solver]}-{job[config]}"
template_files: ["wrapper-in-core", "monobatch", "es"]
template_instantiation:
scheduler:
- { prefix: "es-coupled-single-node", platform: "plafrim",
family: "miriel", nodes: 1, tasks: 24, time: "0-03:00:00" }
parallel:
- *PARALLEL_DEFAULT
es: *ES_DEFAULT
job:
- { scheme: "multi-solve", solver: "spido", config: "opti",
options: "--mumps-multi-solve -nbrhsmumps 256" }
- { scheme: "multi-solve", solver: "hmat", config: "ssize",
options: "--mumps-multi-solve -nbrhsmumps 256
--mumps-nbcols-schur-hmat 256 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
- { scheme: "multi-solve", solver: "hmat", config: "opti",
options: "--mumps-multi-solve -nbrhsmumps 256
--mumps-nbcols-schur-hmat 1024 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
- { scheme: "multi-facto", solver: "spido", config: "opti",
options: "--mumps-multi-facto --mumps-sizeschur 12390" }
- { scheme: "multi-facto", solver: "hmat", config: "opti",
options: "--mumps-multi-facto --mumps-sizeschur 12390 --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
--coupled-mumps-hmat" }
Tasks:
-
executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
options: "-d -nbrhs 50 -nbpts 1000000 -withmpf --fembem --coupled
--mumps-verbose --mumps-blr --mumps-blr-variant 1
--mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
{@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash
./wrapper.sh likwid-perfctr -g FLOPS_AVX -o flops-%r.csv -O -t 1s
python3 ../../../scripts/rss.py test_FEMBEM -radius 2 -height 4
{@job_creation[options]} 2>&1 | tee stdout.log | tee
slurm-$SLURM_JOBID.out"
Validations:
-
id: "va-es-coupled-single-node-{job[scheme]}-{job[solver]}-\
{job[config]}"
launch_command: "{@job_creation[va_executable]}
-K solver=mumps/{job[solver]}"
-
id: "va-es-coupled-single-node-eprofile-{job[scheme]}-\
{job[solver]}-{job[config]}"
launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
~/energy_scope/energy_scope_run_analysis_all.sh"
-
id: "es-coupled-gc-single-node-{job[batch]}-{job[scheme]}-\
{job[solver]}-{job[nbpts]}"
template_files: ["wrapper-in-core", "polybatch", "es"]
template_instantiation:
scheduler:
- { prefix: "es-coupled-gc-single-node", platform: "plafrim",
family: "miriel", nodes: 1, tasks: 24, time: "0-18:00:00" }
parallel:
- *PARALLEL_DEFAULT
es: *ES_DEFAULT
job:
# Multi-solve
# MUMPS/SPIDO
- { batch: 1, scheme: "multi-solve", solver: "spido",
nbpts: 1000000, options: "--mumps-multi-solve -nbrhsmumps 256" }
- { batch: 1, scheme: "multi-solve", solver: "spido",
nbpts: 3000000, options: "--mumps-multi-solve -nbrhsmumps 256" }
- { batch: 1, scheme: "multi-solve", solver: "spido",
nbpts: 5000000, options: "--mumps-multi-solve -nbrhsmumps 256" }
# MUMPS/HMAT
- { batch: 1, scheme: "multi-solve", solver: "hmat",
nbpts: 1000000, options: "--mumps-multi-solve -nbrhsmumps 256
--mumps-nbcols-schur-hmat 1024 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
- { batch: 1, scheme: "multi-solve", solver: "hmat",
nbpts: 3000000, options: "--mumps-multi-solve -nbrhsmumps 256
--mumps-nbcols-schur-hmat 1024 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
- { batch: 1, scheme: "multi-solve", solver: "hmat",
nbpts: 5000000, options: "--mumps-multi-solve -nbrhsmumps 256
--mumps-nbcols-schur-hmat 1024 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
# Multi-factorization
# MUMPS/SPIDO
- { batch: 2, scheme: "multi-facto", solver: "spido",
nbpts: 1000000, options: "--mumps-multi-facto
--mumps-sizeschur 12390" }
- { batch: 2, scheme: "multi-facto", solver: "spido",
nbpts: 1500000, options: "--mumps-multi-facto
--mumps-sizeschur 16250" }
- { batch: 2, scheme: "multi-facto", solver: "spido",
nbpts: 2000000, options: "--mumps-multi-facto
--mumps-sizeschur 19683" }
# MUMPS/HMAT
- { batch: 2, scheme: "multi-facto", solver: "hmat",
nbpts: 1000000, options: "--mumps-multi-facto
--mumps-sizeschur 12390 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
- { batch: 2, scheme: "multi-facto", solver: "hmat",
nbpts: 1500000, options: "--mumps-multi-facto
--mumps-sizeschur 16250 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
- { batch: 2, scheme: "multi-facto", solver: "hmat",
nbpts: 2000000, options: "--mumps-multi-facto
--mumps-sizeschur 19683 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
Tasks:
-
executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
options: "-d -nbrhs 50 -nbpts {job[nbpts]} -withmpf --fembem
--coupled --mumps-verbose --mumps-blr --mumps-blr-variant 1
--mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
Validations:
-
id: "va-es-coupled-gc-single-node-{job[batch]}-{job[scheme]}-\
{job[solver]}-{job[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K solver=mumps/{job[solver]}"
-
id: "va-es-coupled-gc-single-node-eprofile-{job[batch]}-\
{job[scheme]}-{job[solver]}-{job[nbpts]}"
launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
~/energy_scope/energy_scope_run_analysis_all.sh"
-
id: "es-coupled-multi-node-{job[batch]}-{job[scheme]}-{job[solver]}"
template_files: ["wrapper-in-core-distributed", "polybatch-distributed",
"es"]
template_instantiation:
scheduler:
- { prefix: "es-coupled-multi-node", platform: "plafrim",
family: "miriel", constraint: "&omnipath", nodes: 4, nt: 1,
np: 1, time: "0-03:00:00" }
parallel:
- *PARALLEL_DEFAULT
es: *ES_DEFAULT
job:
- { batch: 1, scheme: "multi-solve", solver: "spido",
options: "--mumps-multi-solve -nbrhsmumps 256" }
- { batch: 1, scheme: "multi-solve", solver: "hmat",
options: "--mumps-multi-solve -nbrhsmumps 256
--mumps-nbcols-schur-hmat 1024 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
- { batch: 2, scheme: "multi-facto", solver: "spido",
options: "--mumps-multi-facto --mumps-sizeschur 19638" }
- { batch: 2, scheme: "multi-facto", solver: "hmat",
options: "--mumps-multi-facto --mumps-sizeschur 19638 --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
--coupled-mumps-hmat" }
Tasks:
-
nprocs: "-np {scheduler[nodes]} -map-by
ppr:{scheduler[np]}:{parallel[map]} -rank-by {parallel[rank]}
-bind-to {parallel[bind]}"
executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
options: "-d -nbrhs 50 -nbpts 2000000 -withmpf --fembem --coupled
--mumps-verbose --mumps-blr --mumps-blr-variant 1
--mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
{@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash
./wrapper.sh likwid-perfctr -g FLOPS_AVX -o flops-%r.csv -O -t 1s
python3 ../../../scripts/rss.py test_FEMBEM -radius 2 -height 4
{@job_creation[options]} 2>&1 | tee stdout.log | tee
slurm-$SLURM_JOBID.out"
Validations:
-
id: "va-es-coupled-multi-node-{job[batch]}-{job[scheme]}-\
{job[solver]}"
launch_command: "{@job_creation[va_executable]}
-K solver=mumps/{job[solver]}"
-
id: "va-es-coupled-multi-node-eprofile-{job[batch]}-\
{job[scheme]}-{job[solver]}"
launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
~/energy_scope/energy_scope_run_analysis_all.sh"
-
id: "es-coupled-debug-multi-facto-{parallel[np]}"
template_files: ["wrapper-in-core", "monobatch", "es"]
template_instantiation:
scheduler:
- { prefix: "es-coupled-debug-multi-facto", platform: "plafrim",
family: "miriel", nodes: 1, tasks: 24, time: "0-12:00:00" }
parallel:
- { np: 4, nt: 6, map: "numa", rank: "numa", bind: "numa" }
- { np: 2, nt: 12, map: "socket", rank: "socket", bind: "socket" }
- { np: 1, nt: 24, map: "node", rank: "node", bind: "none" }
es: *ES_DEFAULT
job:
- { options: "--mumps-multi-facto --mumps-sizeschur 19638 --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
--coupled-mumps-hmat" }
Tasks:
-
executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
options: "-d -nbrhs 50 -nbpts 2000000 -withmpf --fembem --coupled
--mumps-verbose --mumps-blr --mumps-blr-variant 1
--mumps-blr-accuracy 1e-3 {job[options]} --energy-scope-tags"
launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
{@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash
./wrapper.sh likwid-perfctr -g FLOPS_AVX -o flops-%r.csv -O -t 1s
python3 ../../../scripts/rss.py test_FEMBEM -radius 2 -height 4
{@job_creation[options]} 2>&1 | tee stdout.log | tee
slurm-$SLURM_JOBID.out"
Validations:
-
id: "va-es-coupled-debug-multi-facto-{parallel[np]}"
launch_command: "{@job_creation[va_executable]}
-K solver=mumps/hmat"
-
id: "va-es-coupled-debug-multi-facto-eprofile-{parallel[np]}"
launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
~/energy_scope/energy_scope_run_analysis_all.sh"
-
id: "es-industrial-{job[batch]}-{job[scheme]}-{job[solver]}"
template_files: ["wrapper-in-core-distributed", "polybatch-distributed",
"es"]
template_instantiation:
scheduler:
- { prefix: "es-industrial", platform: "plafrim", family: "miriel",
constraint: "&omnipath", nodes: 4, nt: 24, np: 1,
time: "1-00:00:00" }
parallel:
- *PARALLEL_DEFAULT
es: *ES_DEFAULT
job:
- { batch: 1, scheme: "multi-solve", solver: "spido",
options: "--mumps-multi-solve -nbrhsmumps 256" }
- { batch: 2, scheme: "multi-solve", solver: "hmat",
options: "--mumps-multi-solve -nbrhsmumps 256
--mumps-nbcols-schur-hmat 1024 --hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
- { batch: 3, scheme: "multi-facto", solver: "spido",
options: "--mumps-multi-facto --mumps-sizeschur 16885" }
- { batch: 4, scheme: "multi-facto", solver: "hmat",
options: "--mumps-multi-facto --mumps-sizeschur 16885 --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
--coupled-mumps-hmat" }
Tasks:
-
nprocs: "-np {scheduler[nodes]} -map-by
ppr:{scheduler[np]}:{parallel[map]} -rank-by {parallel[rank]}
-bind-to {parallel[bind]}"
executable: "mkdir -p wrk && mv es_config.json wrk/es_config.json &&
export ENERGY_SCOPE_TRACES_PATH=$(pwd) && {es[executable]}"
launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
{@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash
./wrapper.sh python3 ../../../scripts/rss.py actipole -p01
EPI_FAN_rwd_isolated_FEM_BEM.p01 -cprec -potential
{@job_creation[options]} 2>&1 | tee stdout.log | tee
slurm-$SLURM_JOBID.out"
options: "-withmpf --fembem --coupled --mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 {job[options]}
--energy-scope-tags --putenv HMAT_DISABLE_OOC=1
--putenv MPF_SPIDO_INCORE=1 --mumpsic"
Validations:
-
id: "va-es-industrial-{job[batch]}-{job[scheme]}-\
{job[solver]}"
launch_command: "{@job_creation[va_executable]}
-K solver=mumps/{job[solver]}"
-
id: "va-es-industrial-eprofile-{job[batch]}-\
{job[scheme]}-{job[solver]}"
launch_command: "export ENERGY_SCOPE_TRACES_PATH=$(pwd) &&
~/energy_scope/energy_scope_run_analysis_all.sh"
3.5.3. Out-of-core benchmarks
-
pack_id: "ooc"
description: "Out-of-core benchmarks."
default_values:
launch_command: "chmod +x wrapper.sh && {@job_creation[executable]}
{@job_creation[nprocs]} $SINGULARITY_EXEC $SINGULARITY_IMAGE bash
./wrapper.sh python3 ../../../scripts/rss.py --with-hdd test_FEMBEM
-radius 2 -height 4 {@job_creation[options]} 2> stderr.log | tee
stdout.log | tee slurm-$SLURM_JOBID.out"
Tests:
The goal is to analyze the impact of out-of-core computation on the two-stage
implementation schemes multi-solve and multi-factorization using MUMPS as the
sparse solver and SPIDO or HMAT as the dense solver. We vary the problem's unknown
count (see the nbpts key in the job map). For multi-solve, we also vary the count
of right-hand sides to be processed at once by MUMPS during the Schur complement
computation (see the nbrhs key in the job map). The maps sparse and
dense define the options for the sparse and the dense solver respectively.
-
id: "ooc-multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
{dense[ooc]}-{job[nbrhs]}-{job[nbpts]}"
template_files: &COUPLED_OOC [ "wrapper-ooc", "coupled-ooc" ]
template_instantiation:
scheduler:
- { prefix: "ooc-multi-solve", platform: "plafrim",
family: "miriel", constraint: "", exclude: "", nodes: 1,
tasks: 24, time: "2-12:00:00" }
parallel:
- *PARALLEL_DEFAULT
job:
# N = 1M
- { nbpts: 1000000, nbrhs: 32, batch: 1 }
- { nbpts: 1000000, nbrhs: 64, batch: 1 }
- { nbpts: 1000000, nbrhs: 128, batch: 1 }
- { nbpts: 1000000, nbrhs: 256, batch: 1 }
- { nbpts: 1000000, nbrhs: 512, batch: 1 }
# N = 2M
- { nbpts: 2000000, nbrhs: 32, batch: 2 }
- { nbpts: 2000000, nbrhs: 64, batch: 2 }
- { nbpts: 2000000, nbrhs: 128, batch: 2 }
- { nbpts: 2000000, nbrhs: 256, batch: 2 }
- { nbpts: 2000000, nbrhs: 512, batch: 3 }
# N = 4M
- { nbpts: 4000000, nbrhs: 32, batch: 4 }
- { nbpts: 4000000, nbrhs: 64, batch: 4 }
- { nbpts: 4000000, nbrhs: 128, batch: 5 }
- { nbpts: 4000000, nbrhs: 256, batch: 3 }
- { nbpts: 4000000, nbrhs: 512, batch: 3 }
# N = 6M
- { nbpts: 6000000, nbrhs: 32, batch: 6 }
- { nbpts: 6000000, nbrhs: 64, batch: 7 }
- { nbpts: 6000000, nbrhs: 128, batch: 7 }
- { nbpts: 6000000, nbrhs: 256, batch: 6 }
- { nbpts: 6000000, nbrhs: 512, batch: 5 }
# N = 7M
- { nbpts: 7000000, nbrhs: 32, batch: 8 }
- { nbpts: 7000000, nbrhs: 64, batch: 9 }
- { nbpts: 7000000, nbrhs: 128, batch: 10 }
- { nbpts: 7000000, nbrhs: 256, batch: 11 }
- { nbpts: 7000000, nbrhs: 512, batch: 12 }
sparse:
- { name: "mumps", options: "-mumpsooc --mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve
-nbrhsmumps" }
dense:
- { name: "spido", ooc: 1 }
- { name: "spido", ooc: 0 }
Tasks:
-
options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
{job[nbrhs]} -nbpts {job[nbpts]}"
Validations:
-
id: "va-ooc-multi-solve-{job[batch]}-{sparse[name]}-\
{dense[name]}-{dense[ooc]}-{job[nbrhs]}-{job[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K dense_ooc={dense[ooc]},solver={sparse[name]}/{dense[name]}"
-
id: "ooc-multi-solve-{job[batch]}-{sparse[name]}-{dense[name]}-\
{dense[ooc]}-{job[nbrhs]}-{job[schur]}-{job[nbpts]}"
template_files: *COUPLED_OOC
template_instantiation:
scheduler:
- { prefix: "ooc-multi-solve", platform: "plafrim",
family: "miriel", constraint: "", exclude: "", nodes: 1,
tasks: 24, time: "2-12:00:00" }
parallel:
- *PARALLEL_DEFAULT
job:
# N = 1M
- { nbpts: 1000000, schur: 512, nbrhs: 256, batch: 1 }
- { nbpts: 1000000, schur: 1024, nbrhs: 256, batch: 1 }
- { nbpts: 1000000, schur: 2048, nbrhs: 256, batch: 1 }
- { nbpts: 1000000, schur: 4096, nbrhs: 256, batch: 1 }
# N = 2M
- { nbpts: 2000000, schur: 512, nbrhs: 256, batch: 1 }
- { nbpts: 2000000, schur: 1024, nbrhs: 256, batch: 1 }
- { nbpts: 2000000, schur: 2048, nbrhs: 256, batch: 1 }
- { nbpts: 2000000, schur: 4096, nbrhs: 256, batch: 1 }
# N = 4M
- { nbpts: 4000000, schur: 512, nbrhs: 256, batch: 3 }
- { nbpts: 4000000, schur: 1024, nbrhs: 256, batch: 3 }
- { nbpts: 4000000, schur: 2048, nbrhs: 256, batch: 4 }
- { nbpts: 4000000, schur: 4096, nbrhs: 256, batch: 3 }
# N = 6M
- { nbpts: 6000000, schur: 512, nbrhs: 256, batch: 4 }
- { nbpts: 6000000, schur: 1024, nbrhs: 256, batch: 5 }
- { nbpts: 6000000, schur: 2048, nbrhs: 256, batch: 5 }
- { nbpts: 6000000, schur: 4096, nbrhs: 256, batch: 6 }
# N = 7M
- { nbpts: 7000000, schur: 512, nbrhs: 256, batch: 13 }
- { nbpts: 7000000, schur: 1024, nbrhs: 256, batch: 14 }
- { nbpts: 7000000, schur: 2048, nbrhs: 256, batch: 14 }
- { nbpts: 7000000, schur: 4096, nbrhs: 256, batch: 13 }
# N = 8M
- { nbpts: 8000000, schur: 512, nbrhs: 256, batch: 7 }
- { nbpts: 8000000, schur: 1024, nbrhs: 256, batch: 8 }
- { nbpts: 8000000, schur: 2048, nbrhs: 256, batch: 8 }
- { nbpts: 8000000, schur: 4096, nbrhs: 256, batch: 6 }
# N = 9M
- { nbpts: 9000000, schur: 512, nbrhs: 256, batch: 9 }
- { nbpts: 9000000, schur: 1024, nbrhs: 256, batch: 10 }
- { nbpts: 9000000, schur: 2048, nbrhs: 256, batch: 11 }
- { nbpts: 9000000, schur: 4096, nbrhs: 256, batch: 12 }
sparse:
- { name: "mumps", options: "-mumpsooc --mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve
-nbrhsmumps" }
dense:
- { name: "hmat", ooc: 1}
- { name: "hmat", ooc: 0}
Tasks:
-
options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
{job[nbrhs]} --mumps-nbcols-schur-hmat {job[schur]} --hmat
--hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3 --coupled-mumps-hmat
-nbpts {job[nbpts]}"
Validations:
-
id: "va-ooc-multi-solve-{job[batch]}-{sparse[name]}-\
{dense[name]}-{dense[ooc]}-{job[nbrhs]}-{job[schur]}-\
{job[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K dense_ooc={dense[ooc]},solver={sparse[name]}/{dense[name]}"
For multi-factorization, we vary the size of the blocks of the Schur complement
matrix during the computation (see the schur key in the job map) and use the
IOblock key under job to set the size of a disk block when using SPIDO as the
dense solver. This ensures that the chosen Schur complement block size is a
multiple of the disk block size.
-
id: "ooc-multi-factorization-{job[batch]}-{sparse[name]}-{dense[name]}-\
{dense[ooc]}-{job[nbpts]}-{job[schur]}"
template_files: *COUPLED_OOC
template_instantiation:
scheduler:
- { prefix: "ooc-multi-factorization", platform: "plafrim",
family: "miriel", constraint: "", exclude: "", nodes: 1,
tasks: 24, time: "2-12:00:00" }
parallel:
- *PARALLEL_DEFAULT
job:
# N = 1.0M, n_BEM = 37,169
- { nbpts: 1000000, schur: 9296, IOblock: 2324, nb: 4, batch: 1 }
- { nbpts: 1000000, schur: 12390, IOblock: 2478, nb: 3, batch: 1 }
- { nbpts: 1000000, schur: 18585, IOblock: 3717, nb: 2, batch: 1 }
- { nbpts: 1000000, schur: 37170, IOblock: 3717, nb: 1, batch: 1 }
# N = 1.5M, n_BEM = 48,750
- { nbpts: 1500000, schur: 12188, IOblock: 3047, nb: 4, batch: 3 }
- { nbpts: 1500000, schur: 16250, IOblock: 3250, nb: 3, batch: 3 }
- { nbpts: 1500000, schur: 24375, IOblock: 4875, nb: 2, batch: 2 }
- { nbpts: 1500000, schur: 48750, IOblock: 4875, nb: 1, batch: 2 }
# N = 2.0M, n_BEM = 58,910
- { nbpts: 2000000, schur: 11784, IOblock: 1964, nb: 5, batch: 4 }
- { nbpts: 2000000, schur: 14728, IOblock: 2104, nb: 4, batch: 4 }
- { nbpts: 2000000, schur: 19638, IOblock: 3273, nb: 3, batch: 5 }
- { nbpts: 2000000, schur: 29456, IOblock: 4208, nb: 2, batch: 5 }
# N = 2.5M, n_BEM = 68,524
- { nbpts: 2500000, schur: 13705, IOblock: 2741, nb: 5, batch: 6 }
- { nbpts: 2500000, schur: 17135, IOblock: 3427, nb: 4, batch: 7 }
- { nbpts: 2500000, schur: 22842, IOblock: 3807, nb: 3, batch: 7 }
- { nbpts: 2500000, schur: 34264, IOblock: 4283, nb: 2, batch: 6 }
sparse:
- { name: "mumps", options: "-mumpsooc --mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-facto" }
dense:
- { name: "spido", ooc: 1, options: "" }
- { name: "spido", ooc: 0, options: "" }
- { name: "hmat", ooc: 1, options: "--hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
- { name: "hmat", ooc: 0, options: "--hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
Tasks:
-
options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
{dense[options]} --mumps-sizeschur {job[schur]}
-diskblock {job[IOblock]} -nbpts {job[nbpts]}"
Validations:
-
id: "va-ooc-multi-factorization-{job[batch]}-{sparse[name]}-\
{dense[name]}-{dense[ooc]}-{job[nbpts]}-{job[schur]}"
launch_command: "{@job_creation[va_executable]}
-K dense_ooc={dense[ooc]},solver={sparse[name]}/{dense[name]}
-K n_blocks={job[nb]}"
3.5.4. Multi-node parallel distributed benchmarks
-
pack_id: "distributed"
description: "Multi-node parallel distributed benchmarks."
Tests:
The goal here is to analyze the impact of computation in a distributed memory
environment on the two-stage implementation schemes multi-solve and
multi-factorization using MUMPS as the sparse solver and SPIDO or HMAT as the dense
solver. We vary the problem's unknown count (see the nbpts key in the job map).
For multi-solve, we also vary the count of right-hand sides to be processed at once
by MUMPS during the Schur complement computation (see the nbrhs key in the
job map). For multi-factorization, we vary the size of the blocks of the Schur
complement matrix during the computation (see the schur key in the job map)
and use the IOblock key under job to set the size of a disk block when using
SPIDO as the dense solver. This ensures that the chosen Schur complement block
size is a multiple of the disk block size. The maps sparse and dense define the
options for the sparse and the dense solver respectively.
-
id: "distributed-multi-solve-{sparse[name]}-{dense[name]}-\
{scheduler[nodes]}-{job[nbpts]}"
template_files: &DISTRIBUTED [ "wrapper-in-core-distributed",
"coupled-distributed" ]
template_instantiation:
scheduler:
- { prefix: "distributed-multi-solve", platform: "plafrim",
family: "miriel", constraint: "&omnipath", nodes: 2, nt: 24,
np: 1, time: "1-12:00:00" }
- { prefix: "distributed-multi-solve", platform: "plafrim",
family: "miriel", constraint: "&omnipath", nodes: 4, nt: 24,
np: 1, time: "1-12:00:00" }
- { prefix: "distributed-multi-solve", platform: "plafrim",
family: "miriel", constraint: "&omnipath", nodes: 8, nt: 24,
np: 1, time: "1-12:00:00" }
parallel:
- *PARALLEL_DEFAULT
job:
- { nbpts: 1000000 }
- { nbpts: 2000000 }
- { nbpts: 4000000 }
- { nbpts: 7000000 }
- { nbpts: 9000000 }
sparse:
- { name: "mumps", options: "--mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 --mumps-multi-solve
-nbrhsmumps 256" }
dense:
- { name: "spido", options: ""}
- { name: "hmat", options: "--hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat 1024"}
Tasks:
-
nprocs: "-np {scheduler[nodes]} -map-by
ppr:{scheduler[np]}:{parallel[map]} -rank-by {parallel[rank]}
-bind-to {parallel[bind]}"
options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
{dense[options]} -nbpts {job[nbpts]}"
Validations:
-
id: "va-distributed-multi-solve-{sparse[name]}-{dense[name]}-\
{scheduler[nodes]}-{job[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K solver={sparse[name]}/{dense[name]}"
-
id: "distributed-multi-factorization-{sparse[name]}-{dense[name]}-\
{scheduler[nodes]}-{job[nbpts]}"
template_files: *DISTRIBUTED
template_instantiation:
scheduler:
- { prefix: "distributed-multi-factorization", platform: "plafrim",
family: "miriel", constraint: "&omnipath", nodes: 2, nt: 24,
np: 1, time: "1-12:00:00" }
- { prefix: "distributed-multi-factorization", platform: "plafrim",
family: "miriel", constraint: "&omnipath", nodes: 4, nt: 24,
np: 1, time: "1-12:00:00" }
- { prefix: "distributed-multi-factorization", platform: "plafrim",
family: "miriel", constraint: "&omnipath", nodes: 8, nt: 24,
np: 1, time: "1-12:00:00" }
parallel:
- *PARALLEL_DEFAULT
job:
- { nbpts: 1000000, schur: 37170, IOblock: 3717, nb: 1 }
- { nbpts: 1500000, schur: 24375, IOblock: 4875, nb: 2 }
- { nbpts: 2000000, schur: 29456, IOblock: 4208, nb: 2 }
- { nbpts: 2500000, schur: 13705, IOblock: 2741, nb: 5 }
sparse:
- { name: "mumps", options: "--mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-facto" }
dense:
- { name: "spido", options: "" }
- { name: "hmat", options: "--hmat --hmat-eps-assemb 1e-3
--hmat-eps-recompr 1e-3 --coupled-mumps-hmat" }
Tasks:
-
nprocs: "-np {scheduler[nodes]} -map-by
ppr:{scheduler[np]}:{parallel[map]} -rank-by {parallel[rank]}
-bind-to {parallel[bind]}"
options: "-d -nbrhs 50 --fembem -withmpf --coupled {sparse[options]}
{dense[options]} --mumps-sizeschur {job[schur]}
-diskblock {job[IOblock]} -nbpts {job[nbpts]}"
Validations:
-
id: "va-distributed-multi-factorization-{sparse[name]}-\
{dense[name]}-{scheduler[nodes]}-{job[nbpts]}"
launch_command: "{@job_creation[va_executable]}
-K solver={sparse[name]}/{dense[name]},n_blocks={job[nb]}"
3.5.5. Test benchmarks
-
pack_id: "test"
description: "Benchmarks for testing the benchmark framework."
Tests:
In this section, we define a small number of testing benchmarks allowing us to check the correct operation of the benchmark framework itself for each category of benchmarks, i.e. in-core, out-of-core and parallel distributed.
-
id: "test-ic"
template_files: *MONO
template_instantiation:
scheduler:
- { prefix: "test-ic", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-00:15:00" }
parallel:
- *PARALLEL_DEFAULT
Tasks:
-
options: "-z --fembem -withmpf --coupled --mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-solve -nbpts 200000"
Validations:
-
id: "va-test-ic"
launch_command: "{@job_creation[va_executable]}"
-
options: "-z -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose
--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-solve -nbpts 200000 --mumps-nbcols-schur-hmat 512
--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
--coupled-mumps-hmat"
Validations:
-
id: "va-test-ic"
launch_command: "{@job_creation[va_executable]}"
-
id: "test-ooc"
template_files: ["wrapper-ooc", "monobatch"]
template_instantiation:
scheduler:
- { prefix: "test-ooc", platform: "plafrim", family: "miriel",
nodes: 1, tasks: 24, time: "0-00:15:00" }
parallel:
- *PARALLEL_DEFAULT
dense:
- { ooc: 1 }
Tasks:
-
options: "-z -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose
--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
-mumpsooc --mumps-multi-solve -nbpts 200000"
Validations:
-
id: "va-test-ooc"
launch_command: "{@job_creation[va_executable]}"
-
options: "-z --fembem -withmpf --coupled --mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3 -mumpsooc
--mumps-multi-solve -nbpts 200000 --mumps-nbcols-schur-hmat 512
--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
--coupled-mumps-hmat"
Validations:
-
id: "va-test-ooc"
launch_command: "{@job_creation[va_executable]}"
-
id: "test-distributed"
template_files: ["wrapper-in-core-distributed", "monobatch"]
template_instantiation:
scheduler:
- { prefix: "test-distributed", platform: "plafrim",
family: "miriel", nodes: 2, tasks: 24, nt: 24, np: 1,
time: "0-00:15:00" }
parallel:
- *PARALLEL_DEFAULT
Tasks:
-
nprocs: "-np 2 -map-by ppr:1:node -rank-by node -bind-to none"
options: "-z -nbrhs 50 --fembem -withmpf --coupled --mumps-verbose
--mumps-blr --mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-solve -nbpts 400000"
Validations:
-
id: "va-test-distributed"
launch_command: "{@job_creation[va_executable]}"
-
nprocs: "-np 2 -map-by ppr:1:node -rank-by node -bind-to none"
options: "-z --fembem -withmpf --coupled --mumps-verbose --mumps-blr
--mumps-blr-variant 1 --mumps-blr-accuracy 1e-3
--mumps-multi-solve -nbpts 400000 --mumps-nbcols-schur-hmat 512
--hmat --hmat-eps-assemb 1e-3 --hmat-eps-recompr 1e-3
--coupled-mumps-hmat"
Validations:
-
id: "va-test-distributed"
launch_command: "{@job_creation[va_executable]}"
See the complete definition file benchmarks.yaml 8.
3.6. Resource monitoring
To continuously measure the amount of storage (RAM and optionally hard drive)
resources consumed during each benchmark run, we launch test_FEMBEM through the
Python script rss.py, which produces as many memory and disk usage logs as there
are MPI processes created during the benchmark execution. The peak values are
included at the very end of these log files. Note that measurements are taken
once per second and the output unit is mebibytes [MiB].
import subprocess
import sys
import threading
import time
import os
import psutil

from datetime import datetime
After importing necessary Python modules, we begin by defining a help message
function that can be triggered with the -h option.
def help():
    print("Monitor memory and disk usage of PROGRAM.\n",
          "Usage: ", sys.argv[0], " [options] [PROGRAM]\n\n",
          "Options:\n -h Show this help message.",
          file = sys.stderr, sep = "")
We must check that at least one argument has been passed to the script. The
script expects as arguments the executable whose resource consumption shall be
measured, followed by the arguments of that executable. Alternatively, one can
specify the -h option to make the script print a help message on how to use it.
Also, the --with-hdd switch allows one to enable hard drive consumption
monitoring.
if len(sys.argv) < 2:
    print("Error: Arguments mismatch!\n", file = sys.stderr)
    help()
    sys.exit(1)

if sys.argv[1] == "-h":
    help()
    sys.exit(0)

with_hdd = False
if sys.argv[1] == "--with-hdd":
    with_hdd = True
Several function definitions follow. First, a function to determine the MPI rank of the currently monitored process.
def mpi_rank():
    rank = 0
    for env_name in ['MPI_RANKID', 'OMPI_COMM_WORLD_RANK']:
        try:
            rank = int(os.environ[env_name])
            break
        except:
            continue
    return rank
We also need a function to determine and format the current date and time.
Indeed, we include a timestamp in the output log files every time we record a
measurement. This helps us synchronize the records with other monitoring tools
we are using, namely energy_scope (see Section 3.7).
def timestamp():
    now = datetime.now()
    return now.strftime("%Y/%m/%dT%H:%M:%S")
The script gathers real (resident) memory usage, i.e. the amount of data stored
in random access memory (RAM). On Linux, these values can be obtained from the
/proc filesystem.
Memory usage statistics of a particular process are stored in
/proc/<pid>/statm, where <pid> is the process identifier (PID). In this file,
the second field holds the resident set size (VmRSS), i.e. the amount of real
memory used by the process at instant \(t\), expressed in memory pages. See the
associated function below.
def rss(pid):
    with open("/proc/%d/statm" % pid, "r") as f:
        # The second field of statm is the resident set size in pages,
        # multiplied by 4 to obtain kibibytes (assuming 4 KiB pages).
        line = f.readline().split()
        VmRSS = int(line[1]) * 4
    # Return the value converted to mebibytes together with a timestamp.
    return timestamp(), (VmRSS / 1024.)
To be able to monitor the disk space used during benchmarks, we assign to each
run a dedicated temporary folder into which all the files used by test_FEMBEM
are put. We then monitor the evolution of the size of TMPDIR in our Python
script using the du Linux command.
def hdd(tmpdir):
    # 'du -s' reports the total size of the temporary folder in kibibytes.
    du = subprocess.Popen(["du", "-s", tmpdir], stdout = subprocess.PIPE)
    HddNow = int(du.stdout.read().decode("utf-8").split("\t")[0])
    # Return the value converted to mebibytes together with a timestamp.
    return timestamp(), (HddNow / 1024.)
Before entering into the monitoring loop, we need to:
- initialize variables to store peak values;
VmHWM = int(0)
HddPeak = int(0)
- get the path of the temporary folder from the TMPDIR environment variable. If
  the variable is not set, /tmp is returned, allowing the script to detect that
  no separate temporary folder has been created and to exit with failure, thus
  preventing potentially incorrect measurements;
tmpdir = os.getenv('TMPDIR', '/tmp')
if tmpdir == '/tmp':
    print('Error: Temporary files were not redirected!', file = sys.stderr)
    sys.exit(1)
- launch the program to monitor and get the rank of the currently monitored process.
myargs = sys.argv[1:]
if with_hdd == True:
    myargs = sys.argv[2:]

p = subprocess.Popen(myargs)
rank = mpi_rank()
At this point, we begin the monitoring loop. The latter breaks when the monitored process exits. Note that the measured values are initially stored in memory. They are dumped into the corresponding log files at the end of the benchmark execution. This is to prevent frequent disk accesses that may impact the execution time of the monitored benchmark.
rss_log = []
hdd_log = []
while p.returncode == None:
The following tasks are performed to gather necessary results:
- get the current memory usage, update the peak memory usage if necessary and append the corresponding value, converted from pages to mebibytes, to the log;
    VmRSS = rss(p.pid)
    if VmRSS[1] > VmHWM:
        VmHWM = VmRSS[1]
    rss_log.append(VmRSS)
- collect disk usage statistics, update the peak disk space usage if necessary and append the corresponding value, converted from kibibytes to mebibytes, to the log;
    if with_hdd == True:
        HddNow = hdd(tmpdir)
        if HddNow[1] > HddPeak:
            HddPeak = HddNow[1]
        hdd_log.append(HddNow)
- sleep for one second and repeat.
    time.sleep(1)
    p.poll()
Eventually, before exiting, we append usage peaks to the logs and dump the latter to disk.
print("Writing storage resource usage logs...") m = open("rss-%d.log" % rank, "w") for entry in rss_log: m.write("%s\t%g\n" % (entry[0], entry[1])) m.write("%g\n" % VmHWM) m.flush() m.close() if with_hdd == True: d = open("hdd-%d.log" % rank, "w") for entry in hdd_log: d.write("%s\t%g\n" % (entry[0], entry[1])) d.write("%g\n" % HddPeak) d.flush() d.close() sys.exit(p.returncode)
See the complete source file rss.py 9.
3.7. Energy consumption monitoring
energy_scope is an energy consumption monitoring tool. To configure what should
be monitored and how, it uses a configuration file in JSON format. Among other
parameters, it allows one to choose the acquisition frequency and profile.
In this section, we define a gcvb template (see Section 3.1) of the
configuration file.
{{
"parameter": {{
"comment":"configuration file template",
"version":"2022-01",
"owner":"energy_scope",
"cpu_info":"/proc/cpuinfo",
"nvidia_gpu_info":"/proc/driver/nvidia/gpus",
"stop_file_prefix":"/tmp/es_read_rt_stop_flag_",
"synchro_file_saved_prefix":"wrk/energy_scope_tmp_file_",
"synchro_started_prefix":"wrk/energy_scope_tmp_started_",
"msr_cmd":"rdmsr",
"opt_enode":"node",
"opt_estat":"shell",
"opt_profile":"none",
"opt_send":"none",
"analyseafteracquisition":"no",
"eprofile_period(ms)":{es[interval]},
"default_read_rt_profile":"{es[profile]}"
}},
"data": {{
"intel": {{
"generic" : {{
"read_rt_idle_time":{es[idle_time]},
"registers": {{
"0x010": {{"when":"ae", "pct":"package"}},
"0x0ce": {{"when":"b", "pct":"package"}},
"0x0e7": {{"when":"N", "pct":"thread"}},
"0x0e8": {{"when":"a", "pct":"thread"}},
"0x198": {{"when":"N", "pct":"core"}},
"0x19c": {{"when":"at", "pct":"core"}},
"0x1A2": {{"when":"b", "pct":"package"}},
"0x606": {{"when":"b", "pct":"package"}},
"0x610": {{"when":"b", "pct":"package"}},
"0x611": {{"when":"ae", "pct":"package"}},
"0x613": {{"when":"a", "pct":"package"}},
"0x614": {{"when":"b", "pct":"package"}},
"0x619": {{"when":"ae", "pct":"package"}},
"0x620": {{"when":"b", "pct":"package"}},
"0x621": {{"when":"a", "pct":"package"}},
"0x639": {{"when":"ae", "pct":"package"}},
"0x641": {{"when":"ae", "pct":"package"}},
"0x64e": {{"when":"a", "pct":"thread"}},
"0x770": {{"when":"b", "pct":"package"}},
"0x771": {{"when":"b", "pct":"thread"}},
"0x774": {{"when":"b", "pct":"thread"}}
}}
}}
}},
"amd":{{
"generic" : {{
"read_rt_idle_time":{es[idle_time]},
"registers": {{
"0x010": {{"when":"ae", "pct":"package"}},
"0x0e7": {{"when":"a", "pct":"thread"}},
"0x0e8": {{"when":"a", "pct":"thread"}},
"0xC0000104": {{"when":"b", "pct":"thread"}},
"0xC0010064": {{"when":"b", "pct":"core"}},
"0xC0010299": {{"when":"b", "pct":"package"}},
"0xC001029A": {{"when":"a", "pct":"core"}},
"0xC001029B": {{"when":"ae", "pct":"package"}}
}}
}}
}}
}}
}}
See the complete energy_scope configuration file template
es_config.json 10.
3.8. Result parsing
The Shell script parse.sh parses data from a test_FEMBEM benchmark run and
stores the result in a comma-separated values (.csv) file.
We begin with a help message function that can be triggered using the -h
option.
function help() {
    echo -n "Parse data from a test_FEMBEM run and store the result to a " >&2
    echo "comma-separated (*.csv) file." >&2
    echo "Usage: ./parse.sh [options]" >&2
    echo >&2
    echo "Options:" >&2
    echo "  -h                           Show this help message." >&2
    echo -n "  -s FILE                      Read the standard output of " >&2
    echo "test_FEMBEM from FILE rather than from the standard input." >&2
    echo -n "  -r PATH                      Read memory and disk usage " >&2
    echo "logs provided by rss.py from the directory at PATH." >&2
    echo -n "  -l FILE=FUNCTION[,FUNCTION]  Read the trace log of the " >&2
    echo -n "test_FEMBEM run from FILE and parse execution time and " >&2
    echo "iteration count of one or more FUNCTION(s)." >&2
    echo -n "  -K KEY=VALUE[,KEY=VALUE]     Parse one or more additional " >&2
    echo "KEY=VALUE pairs to be added to the output (*.csv) file." >&2
    echo "  -H                           Do not print header on output." >&2
    echo -n "  -o FILE                      Specify the file to store the " >&2
    echo "output of the parsing to." >&2
}
We also define an error message function. The error message to print is expected to be the first argument to the function; if it is not present, the function displays a generic error message.
function error() {
    if test $# -lt 1;
    then
        echo "An unknown error occurred!" >&2
    else
        echo "Error: $1" >&2
    fi
}
There should be at least one argument provided on the command line.
if test $# -lt 1;
then
    help
    exit 1
fi
STDOUT holds the path to the input file containing test_FEMBEM standard
output.
STDOUT=""
RSS contains the path and name pattern of the input files containing the logs of
the resource monitoring script (see Section 3.6).
RSS=""
TRACE is the input file containing the trace call log produced by
test_FEMBEM.
TRACE=""
OUTPUT is the path to the output file to store the parsed values to.
OUTPUT=""
FUNCTIONS contains the names of the functions whose information shall be parsed
from the trace call log produced by test_FEMBEM.
FUNCTIONS=""
CUSTOM_KV holds custom key-value pairs to be included in the output *.csv
file.
CUSTOM_KV=""
DISABLE_HEADING controls whether the header line is printed in the output *.csv
DISABLE_HEADING=0
At this point, we parse provided arguments and check the validity of option values.
while getopts ":hs:r:l:K:o:H" option; do case $option in
Standard output of the test_FEMBEM executable run can be passed to the script
either through its standard input or in a file specified using the -s option.
s)
STDOUT=$OPTARG
if test ! -f $STDOUT;
then
error "'$STDOUT' is not a valid file!"
exit 1
fi
;;
When the -r option is provided, the script shall also parse resource
monitoring logs.
r)
RSS=$OPTARG
if test ! -d $RSS;
then
error "'$RSS' is not a valid directory!"
exit 1
fi
;;
When the -l option is provided, the script shall read the trace call log of
test_FEMBEM from the file passed as argument and parse execution time, iteration
count and floating-point operations (Flops) of one or more functions.
l)
TRACE=$(echo $OPTARG | cut -d '=' -f 1)
FUNCTIONS=$(echo $OPTARG | cut -d '=' -f 2 | sed 's/,/\t/g')
if test ! -f $TRACE;
then
error "'$TRACE' is not a valid file!"
exit 1
fi
;;
The -K option allows adding one or more KEY=VALUE pairs into the output
*.csv file.
K)
if test "$CUSTOM_KV" == "";
then
CUSTOM_KV="$OPTARG"
else
CUSTOM_KV="$CUSTOM_KV,$OPTARG"
fi
;;
The -H option disables printing of the header in the output *.csv file.
H)
DISABLE_HEADING=1
;;
Using the -o option, one can specify the name of the output *.csv file.
o)
OUTPUT=$OPTARG
;;
Finally, we take care of unknown options or missing arguments and raise an error if any.
\?) # Unknown option error "Arguments mismatch! Invalid option '-$OPTARG'." echo help exit 1 ;; :) # Missing option argument error "Arguments mismatch! Option '-$OPTARG' expects an argument!" echo help exit 1 ;; h | *) help exit 0 ;; esac done
If the standard output of test_FEMBEM is not provided in a file, we try to read its content from the standard input of the script.
if test "$STDOUT" == ""; then rm -rf .input while read line; do echo $line >> .input done STDOUT=.input fi
If no standard output from test_FEMBEM to parse is found, we abort the script and raise an error.
if test $(wc --bytes $STDOUT | cut -d' ' -f 1) -lt 1; then error "'$STDOUT' contains no 'test_FEMBEM' standard output to parse!" exit 1 fi
The treatment starts by separating custom key-value pairs, if any, by a
tabulation character, so they can be looped over using a for loop.
CUSTOM_KV=$(echo $CUSTOM_KV | sed 's/,/\t/g')
We print the header line to the output file if it was not explicitly disabled
using the -H option.
if test $DISABLE_HEADING -ne 1; then echo -n "processes,by,mapping,ranking,binding,omp_thread_num," > $OUTPUT echo -n "mkl_thread_num,hmat_ncpu,mpf_max_memory,nbpts,radius," >> $OUTPUT echo -n "height,nbrhs,step_mesh,nbem,nbpts_lambda,lambda," >> $OUTPUT echo -n "thread_block_size,proc_block_size,disk_block_size," >> $OUTPUT echo -n "tps_cpu_facto_mpf,tps_cpu_solve_mpf,error,symmetry," >> $OUTPUT echo -n "h_assembly_accuracy,h_recompression_accuracy," >> $OUTPUT echo -n "assembled_size_mb,tps_cpu_facto,factorized_size_mb," >> $OUTPUT echo -n "tps_cpu_solve,mumps_blr,mumps_blr_accuracy," >> $OUTPUT echo -n "mumps_blr_variant,coupled_method,coupled_nbrhs," >> $OUTPUT echo -n "size_schur,cols_schur,sparse_ooc,platform,node_family," >> $OUTPUT echo -n "singularity_container,slurm_jobid,system_kind,es_cpu," >> $OUTPUT echo -n "es_dram,es_duration,rm_peak,hdd_peak" >> $OUTPUT
The common header entries are followed by custom key-value pairs and additional details about selected functions acquired from the trace call logs produced by test_FEMBEM, if any.
for kv in $CUSTOM_KV; do KEY=$(echo $kv | cut -d '=' -f 1) echo -n ",$KEY" >> $OUTPUT done for f in $FUNCTIONS; do echo -n ","$f"_exec_time,"$f"_nb_execs,"$f"_flops" >> $OUTPUT done echo >> $OUTPUT fi
Next, we parse all the interesting information from the provided standard output
of test_FEMBEM. The regular expressions and parameters of grep and sed
utilities used in the script are chosen based on the output format of
test_FEMBEM messages (see Section 5.1 in Appendix).
processes=$(cat $STDOUT | grep "<PERFTESTS> MPI process count" | uniq | \ cut -d '=' -f 2 | sed 's/[^0-9]//g') mapping=$(cat $STDOUT | grep "<PERFTESTS> MPI processes mapped by" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') ranking=$(cat $STDOUT | grep "<PERFTESTS> MPI processes ranked by" | uniq |\ cut -d '=' -f 2 | sed 's/[^a-z]//g') binding=$(cat $STDOUT | grep "<PERFTESTS> MPI processes bound to" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') omp_thread_num=$(cat $STDOUT | grep "OpenMP thread number" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') mkl_thread_num=$(cat $STDOUT | grep "MKL thread number" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') hmat_ncpu=$(cat $STDOUT | grep "HMAT_NCPU" | cut -d '=' -f 4 | \ sed 's/[^0-9]//g') mpf_max_memory=$(cat $STDOUT | grep "MPF_MAX_MEMORY" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') nbpts=$(cat $STDOUT | grep "<PERFTESTS> NbPts =" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') radius=$(cat $STDOUT | grep "Reading radius" | cut -d '=' -f 2 | \ sed 's/[^0-9.]//g') height=$(cat $STDOUT | grep "Reading height" | cut -d '=' -f 2 | \ sed 's/[^0-9.]//g') nbrhs=$(cat $STDOUT | grep "<PERFTESTS> NbRhs" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') step_mesh=$(cat $STDOUT | grep "<PERFTESTS> StepMesh" | cut -d '=' -f 2 | \ sed 's/[^-0-9e.+]//g') nbem=$(cat $STDOUT | grep "<PERFTESTS> NbPtsBEM" | cut -d '=' -f 2 | \ sed 's/[^0-9]//g') nbpts_lambda=$(cat $STDOUT | grep "<PERFTESTS> nbPtLambda" | \ cut -d '=' -f 2 | sed 's/[^-0-9e.+]//g') lambda=$(cat $STDOUT | grep "<PERFTESTS> Lambda" | cut -d '=' -f 2 | \ sed 's/[^-0-9e.+]//g') thread_block_size=$(cat $STDOUT | grep "thread block" | cut -d ':' -f 2 | \ cut -d 'x' -f 1 | sed 's/[^0-9]//g') proc_block_size=$(cat $STDOUT | grep "proc block" | cut -d ':' -f 2 | \ cut -d 'x' -f 1 | sed 's/[^0-9]//g') disk_block_size=$(cat $STDOUT | grep "disk block" | cut -d ':' -f 2 | \ cut -d 'x' -f 1 | sed 's/[^0-9]//g') tps_cpu_facto_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFactoMPF =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') tps_cpu_solve_mpf=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolveMPF =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') error=$(cat $STDOUT | grep "<PERFTESTS> Error" | cut -d '=' -f 2 | \ sed 's/[^-0-9e.+]//g') symmetry=$(cat $STDOUT | grep "Testing: Non-Symmetric matrices and solvers.") if test "$symmetry" != ""; then symmetry="non-symmetric" else symmetry="symmetric" fi h_assembly_accuracy=$(cat $STDOUT | grep "\[HMat\] Compression epsilon" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') h_recompression_accuracy=$(cat $STDOUT | \ grep "\[HMat\] Recompression epsilon" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') assembled_size_mb=$(cat $STDOUT | grep "<PERFTESTS> AssembledSizeMb" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') tps_cpu_facto=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuFacto =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') factorized_size_mb=$(cat $STDOUT | grep "<PERFTESTS> FactorizedSizeMb" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') tps_cpu_solve=$(cat $STDOUT | grep "<PERFTESTS> TpsCpuSolve =" | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') mumps_blr="NA" if test "$(cat $STDOUT | grep 'NOT Using Block-Low-Rank compression')" != ""; then mumps_blr=0 elif test "$(cat $STDOUT | grep 'Using Block-Low-Rank compression')" != ""; then mumps_blr=1 fi mumps_blr_accuracy=$(cat $STDOUT | \ grep "Accuracy parameter for Block-Low-Rank" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') mumps_blr_variant=$(cat $STDOUT | grep "BLR Factorization variant" | \ cut -d ':' -f 2 | sed 's/[^-0-9e.+]//g') coupled_method="" if test "$(cat $STDOUT | grep 
'Multi solve method.')" != ""; then coupled_method="multi-solve" elif test "$(cat $STDOUT | grep 'Multi facto method.')" != ""; then coupled_method="multi-facto" fi coupled_nbrhs=$(cat $STDOUT | grep "Number of simultaneous RHS" | \ cut -d ':' -f 2 | sed 's/[^0-9.]//g') if test "$coupled_nbrhs" == ""; then coupled_nbrhs=32 # Default value fi size_schur=$(cat $STDOUT | grep "Size of the block Schur" | \ cut -d ':' -f 2 | sed 's/[^0-9.]//g') cols_schur=$(cat $STDOUT | \ grep "Number of columns in the H-matrix Schur block" | \ cut -d ':' -f 2 | sed 's/[^0-9.]//g') sparse_ooc=$(cat $STDOUT | grep "[mumps] Out-of-core.") if test "$sparse_ooc" != ""; then sparse_ooc=1 else sparse_ooc=0 fi platform=$(cat $STDOUT | grep "<PERFTESTS> Underlying platform" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') node_family=$(cat $STDOUT | grep "<PERFTESTS> Node family" | uniq | \ cut -d '=' -f 2 | sed 's/[^a-z]//g') singularity_container=$(cat $STDOUT | grep "<PERFTESTS> Singularity image") if test "$singularity_container" != ""; then singularity_container=$(echo $singularity_container | cut -d '=' -f 2 | \ sed 's/[[:space:]]*//g') else singularity_container="NA" fi slurm_jobid=$(cat $STDOUT | grep "<PERFTESTS> Slurm job identifier" | uniq | \ cut -d '=' -f 2 | sed 's/[^0-9.]//g') if test "$(cat $STDOUT | grep 'Testing : FEM-BEM')" != ""; then system_kind="fem-bem" elif test "$(cat $STDOUT | grep 'Testing : FEM')" != ""; then system_kind="fem" elif test "$(cat $STDOUT | grep 'Testing : BEM')" != ""; then system_kind="bem" else system_kind="unknown" fi es_cpu="NA" es_dram="NA" es_duration="NA" if test -f energy_scope_estat*; then es_estat=$(cat energy_scope_estat*) es_cpu=$(echo $es_estat | jq '.arch.data."joule(J)"."ecpu(J)"') es_dram=$(echo $es_estat | jq '.arch.data."joule(J)"."edram(J)"') es_duration=$(echo $es_estat | jq '."duration(sec)"') fi
If the -r option was provided, we also parse the storage resource monitoring logs. Memory usage log files are named rss-x.log and disk usage log files are named hdd-x.log, where \(x\) is the MPI process rank.
if test "$RSS" != ""; then rm_peak="0.0" hdd_peak="0.0" for rss in $(ls $RSS | grep -e "^rss"); do rm_peak="$rm_peak + $(cat $RSS/$rss | tail -n 1)" done for hdd in $(ls $RSS | grep -e "^hdd"); do hdd_peak="$hdd_peak + $(cat $RSS/$hdd | tail -n 1)" done rm_peak=$(echo $rm_peak | bc -l) hdd_peak=$(echo $hdd_peak | bc -l) fi
Finally, we print the parsed values, the values from custom key-value pairs, if any, and the additional function information possibly parsed from the trace call logs produced by test_FEMBEM.
echo -n "$processes,$by,$mapping,$ranking,$binding,$omp_thread_num," >> $OUTPUT echo -n "$mkl_thread_num,$hmat_ncpu,$mpf_max_memory,$nbpts," >> $OUTPUT echo -n "$radius,$height,$nbrhs,$step_mesh,$nbem,$nbpts_lambda," >> $OUTPUT echo -n "$lambda,$thread_block_size,$proc_block_size," >> $OUTPUT echo -n "$disk_block_size,$tps_cpu_facto_mpf,$tps_cpu_solve_mpf," >> $OUTPUT echo -n "$error,$symmetry,$h_assembly_accuracy," >> $OUTPUT echo -n "$h_recompression_accuracy,$assembled_size_mb," >> $OUTPUT echo -n "$tps_cpu_facto,$factorized_size_mb,$tps_cpu_solve," >> $OUTPUT echo -n "$mumps_blr,$mumps_blr_accuracy,$mumps_blr_variant," >> $OUTPUT echo -n "$coupled_method,$coupled_nbrhs,$size_schur,$cols_schur," >> $OUTPUT echo -n "$sparse_ooc,$platform,$node_family,$singularity_container," >> $OUTPUT echo -n "$slurm_jobid,$system_kind,$es_cpu,$es_dram,$es_duration," >> $OUTPUT echo -n "$rm_peak,$hdd_peak" >> $OUTPUT for kv in $CUSTOM_KV; do VALUE=$(echo $kv | cut -d '=' -f 2) echo -n ",$VALUE" >> $OUTPUT done for f in $FUNCTIONS; do exec_time=$(cat $TRACE | grep $f | cut -d '|' -f 2 | sed 's/[^0-9.]//g') nb_execs=$(cat $TRACE | grep $f | cut -d '|' -f 3 | sed 's/[^0-9]//g') flops=$(cat $TRACE | grep $f | cut -d '|' -f 4 | sed 's/[^0-9]//g') echo -n ",$exec_time,$nb_execs,$flops" >> $OUTPUT done
At the very end, we print the trailing new line character to the output file, perform cleaning and terminate.
echo >> $OUTPUT rm -rf .input exit 0
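As an illustration, the script could be invoked as follows. Note that the file and directory names as well as the custom key-value pair below are hypothetical and serve only as an example.
# Hypothetical example: parse the test_FEMBEM standard output stored in
# stdout.log together with the resource monitoring logs located in ./logs,
# add a custom 'solver' key and write the result to results.csv.
./parse.sh -s stdout.log -r ./logs -K solver=hmat -o results.csv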
See the complete source file parse.sh 11.
3.9. Database injecting
The Python script inject.py calls a custom parsing script (see Section 3.8) producing a .csv file on output, gathers the results exported in the latter and injects the values into the gcvb SQLite database (see Section 3.1) for a possible result visualization using the gcvb dashboard feature, even if the latter is only experimental for now. See a concrete usage of this script in Section 3.5.
At the beginning, we import necessary Python modules as well as the gcvb
module.
import gcvb import subprocess import csv import sys
The script expects the -h option or at least two arguments:
- DATASET which stands for the .csv file to gather the data from,
- PARSER which represents the path to the parsing script to use.
DATASET should be a .csv file containing two lines: a heading line providing the captions, followed by a line with the associated values.
These may be followed by the arguments to call PARSER with.
The script continues with a help message function that can be triggered with the -h option, and an argument check.
def help(): print("Launch PARSER (with ARGUMENTS if needed) instrumented to produce a ", "comma-separated values .csv output DATASET, deserialize it and ", "inject to the current gcvb database provided by the gcvb module.\n", "Usage: ", sys.argv[0], " DATASET PARSER [ARGUMENTS]\n\n", "Options:\n -h Show this help message.", file = sys.stderr, sep = "") if len(sys.argv) < 2: print("Error: Arguments mismatch!\n", file = sys.stderr) help() sys.exit(1) if sys.argv[1] == "-h": help() sys.exit(0)
In the main function, we parse the arguments
def main(): args = sys.argv args.pop(0) # Drop the script name. dataset = args.pop(0) parser = args.pop(0)
and launch the parsing script.
subprocess.run([parser] + args)
Once it has finished, we open the produced data set, gather the data into a dictionary and inject the key-value pairs one by one into the currently used gcvb database provided by the imported gcvb module.
with open(dataset, mode = 'r') as data: reader = csv.DictReader(data) for row in reader: for item in row: gcvb.add_metric(item, row[item]) if __name__ == '__main__': main()
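For illustration, the script could be invoked as follows; the file names and the parsing options below are hypothetical and only show the expected argument order.
# Hypothetical example: run parse.sh on stdout.log, let it produce results.csv
# and inject the parsed values into the current gcvb database.
python3 inject.py results.csv ./parse.sh -s stdout.log -o results.csv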
See the complete source file inject.py 12.
3.10. Extracting additional results
In some cases, we need to extract more detailed information from the log files produced by one or more selected benchmarks. For example, regarding real memory (RAM) consumption, we gather only the peak usage values by default. However, we may also need to know how the consumption evolves during the entire execution of a given benchmark. Furthermore, some benchmarks may produce additional data which is not taken into account by the parsing script (see Section 3.8).
Therefore, we define here the extract.sh shell script allowing us to extract additional benchmark results for one or more user-specified benchmarks.
The script traditionally begins with a help message function that can be triggered using the -h option.
function help() { echo "Extract additional results for selected benchmark or benchmarks." >&2 echo "Usage: ./extract.sh [options]" >&2 echo >&2 echo "Options:" >&2 echo " -h Show this help message." >&2 echo -n " -B ID[,ID] Select the benchmark matching ID. Multiple " >&2 echo "identifiers may be specified to select multiple benchmarks." >&2 echo " -d PATH Place the extracted files to PATH." >&2 echo " -r Extract detailed real memory (RAM) consumption." >&2 echo " -s PATH Search for benchmark results in PATH:" >&2 echo " -t Extract the execution timeline." >&2 echo " -T PATH Search for the timeline extraction tool in PATH." \ "By default, the script looks in './sources'.">&2 }
We use a generic error message function. The message to print is expected to be the first argument to the function; if it is absent, a generic error message is displayed.
function error() { if test $# -lt 1; then echo "An unknown error occurred!" >&2 else echo "Error: $1" >&2 fi }
Follows the option parsing.
BENCHMARKS="" DESTINATION=$(pwd) RSS=0 SOURCE="" TIMELINE=0 TIMELINE_PATH="./sources" while getopts ":hB:d:rs:tT:" option; do case $option in
The -B option allows specifying one or more benchmarks to extract additional results for. If more than one identifier is present, they must be separated by commas. The commas are then converted to tabulations for easier post-processing.
B)
BENCHMARKS="$BENCHMARKS $(echo $OPTARG | sed 's/,/\t/g')"
if test "$BENCHMARKS" == "";
then
error "Bad usage of the '-B' option (no identifiers specified)!"
exit 1
fi
;;
By default, the extracted files are placed in the current working directory of the script. One can use the -d option to specify another destination directory.
d)
DESTINATION=$OPTARG
if test ! -d "$DESTINATION";
then
error "'$DESTINATION' is not a valid destination directory!"
exit 1
fi
;;
The -r option tells the script to extract detailed real memory (RAM)
consumption data.
r)
RSS=1
;;
The -s option allows specifying the path to the directory to search for benchmark results in. Typically, this represents the path to a gcvb benchmark session directory (see Section 3.1).
s)
SOURCE=$OPTARG
if test ! -d "$SOURCE";
then
error "'$SOURCE' is not a valid source directory!"
exit 1
fi
;;
The -t option allows extracting the execution timelines of the selected benchmarks. We also have to check that the external proprietary extraction tool from Airbus we use is available in the directory specified using the -T option ('./sources' by default).
t)
TIMELINE=1
;;
The -T option allows specifying the path to search for the timeline extraction tool in.
T)
TIMELINE_PATH=$OPTARG
;;
Finally, we take care of unknown options or missing arguments, if any, raising an error and terminating the script in that case.
\?) # Unknown option error "Arguments mismatch! Invalid option '-$OPTARG'." echo help exit 1 ;; :) # Missing option argument error "Arguments mismatch! Option '-$OPTARG' expects an argument!" echo help exit 1 ;; h | *) help exit 0 ;; esac done
The -B and the -s options are mandatory and at least one of the extraction
options, -r or -t, must be specified as well.
if test "$SOURCE" == ""; then error "No source directory was specified! Nothing to do." exit 1 fi if test "$BENCHMARKS" == ""; then error "No benchmark identifier was specified!" exit 1 fi if test $RSS -eq 0 && test $TIMELINE -eq 0; then error "Use at least one of the extraction options! Nothing to do." exit 1 fi
If the -t option was specified, we need to check if the timeline extraction
tool is accessible.
if test $TIMELINE -ne 0 && test ! -f $TIMELINE_PATH/timeline.py; then error "Timeline extraction tool was not found in '$TIMELINE_PATH'!" exit 1 fi
For each benchmark identifier specified using the -B option we:
- extract all of the real memory consumption logs if the -r option is present,
for b in $BENCHMARKS; do if test $RSS -ne 0; then RSS_LOGS=$(ls $SOURCE/$b/rss-*.log) if test "$RSS_LOGS" == ""; then error "There are no real memory (RAM) consumption logs at '$RSS_LOGS'!" exit 1 fi COUNTER=0 for l in $RSS_LOGS; do cp $l $DESTINATION/rss-$COUNTER-$b.log if test $? -ne 0; then error "Failed to extract the real memory (RAM) consumption log '$l'!" exit 1 fi COUNTER=$(expr $COUNTER + 1) done fi
- extract the execution timelines from the standard output log of the selected benchmarks if the -t option is present.
if test $TIMELINE -ne 0; then python $TIMELINE_PATH/timeline.py $SOURCE/$b/stdout.log \ > $DESTINATION/timeline-$b.log if test $? -ne 0 || test ! -f $DESTINATION/timeline-$b.log; then error "Failed to extract the execution timeline of the benchmark '$b'!" exit 1 fi fi done
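As an illustration, the script could be invoked as follows. The session path and the benchmark identifiers below are hypothetical.
# Hypothetical example: extract detailed real memory (RAM) consumption logs and
# execution timelines for two benchmarks of the session stored in
# results/session-1 and place the extracted files into ./extracted.
./extract.sh -s results/session-1 -B bench-1,bench-2 -r -t -d ./extracted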
See the complete source file extract.sh 13.
3.11. Wrapper scripts
Before launching a benchmark there are some preparation actions to perform.
Similarly, there are some completion and cleaning actions to perform once the
benchmark task finishes. For the sake of convenience, especially in case of
distributed parallel execution, we use a wrapper Shell script to perform these
actions. This script has the form of a gcvb template file (see Section
3.1) allowing us to make the script fit the individual parameters of each
benchmark. Although the major part of the template is identical for all kinds of benchmark we define (see Section 3.5), some parts may differ to meet the specific needs of a given benchmark or set of benchmarks, e.g. setting the environment variables that activate out-of-core computation. In this section, we define multiple variations of the wrapper script template.
3.11.1. In-core benchmarks
The wrapper script template wrapper-incore.sh is meant for benchmarks run
in-core, i.e. with the out-of-core computation disabled.
At first, we need to set up a dedicated temporary folder for the benchmark
enabling us to measure the disk space used during the execution of the latter
(see Section 3.6). The GCVB_TEST_ID environment variable is set by
gcvb and represents a unique benchmark identifier.
mkdir -p /tmp/vive-pain-au-chocolat/$GCVB_TEST_ID export TMPDIR=/tmp/vive-pain-au-chocolat/$GCVB_TEST_ID
As a temporary workaround for Singularity containers to work properly, we need
to set the PYTHONPATH environment variable too.
if test "$SINGULARITY_CONTAINER" != ""; then export PYTHONPATH=$GUIX_PYTHONPATH fi
Using the OMPI_MCA_pml environment variable, we instruct OpenMPI not to use the UCX communication library at runtime.
export OMPI_MCA_pml='^ucx'
Then, we set the environment variables specifying the number of threads to use, available through the parallel placeholder (see Section 3.5).
export OMP_NUM_THREADS={parallel[nt]} MKL_NUM_THREADS={parallel[nt]} \ HMAT_NCPU={parallel[nt]}
Also, we need to disable the out-of-core computation feature. For MUMPS, it is disabled by default. For SPIDO and HMAT we need to set a couple of environment variables to disable it explicitly.
export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1
In anticipation of later post-processing of the standard output of the execution, we try to determine the rank of the current MPI process
if test "$OMPI_COMM_WORLD_RANK" != ""; then MPI_RANK=$OMPI_COMM_WORLD_RANK elif test "$MPI_RANKID" != ""; then MPI_RANK=$MPI_RANKID else echo "Failed to get the MPI of the current process! Can not proceed." >&2 exit 1 fi
and if we are the 0th process, we print out:
- selected scheduling information (see the scheduler placeholder in Section 3.5),
  if test $MPI_RANK -eq 0; then echo "<PERFTESTS> Underlying platform = {scheduler[platform]}" echo "<PERFTESTS> Node family = {scheduler[family]}" fi
- the MPI configuration (see the parallel placeholder in Section 3.5),
  if test $MPI_RANK -eq 0; then echo "<PERFTESTS> MPI process count = {parallel[np]}" echo "<PERFTESTS> MPI processes mapped by = {parallel[map]}" echo "<PERFTESTS> MPI processes ranked by = {parallel[rank]}" echo "<PERFTESTS> MPI processes bound to = {parallel[bind]}" fi
- the path to the Singularity container, if the execution is taking place within one of them,
  if test "$SINGULARITY_CONTAINER" != "" && test $MPI_RANK -eq 0; then echo "<PERFTESTS> Singularity image = $SINGULARITY_CONTAINER." fi
- the Slurm job identifier.
  if test "$SLURM_JOBID" != "" && test $MPI_RANK -eq 0; then echo "<PERFTESTS> Slurm job identifier = $SLURM_JOBID" fi
We also need to create symbolic links to industrial test case meshes in the current working directory.
if test $MPI_RANK -eq 0; then MESHES="../../../meshes" for i in $(ls $MESHES); do ln -s $MESHES/$i $i done fi
At this point, we can launch the benchmark command passed as argument.
eval "$*"
After the execution, we have to clean all the files produced during the benchmark execution except for logs, results, FXT traces, configuration files, scheduling and launching scripts, i.e. files matching *.log, *.yaml, es_config.json, *energy_scope*, prof_file*, *.out, *batch, coupled, scalability, *.csv or *.sh.
if test $MPI_RANK -eq 0; then rm -f $(find . -type f ! -name "*.log" ! -name "*.yaml" ! -name "prof_file*" \ ! -name "*.out" ! -name "*batch" ! -name "coupled" \ ! -name "scalability" ! -name "*.sh" ! -name "*energy_scope*" \ ! -name "es_config.json" ! -name "*.csv") fi
3.11.2. Out-of-core benchmarks
Compared to the template for in-core benchmarks (see Section
3.11.1), the template for out-of-core benchmarks wrapper-ooc.sh
makes sure the environment variables disabling the feature for SPIDO and HMAT
are unset if the placeholder {dense[ooc]} is true. For MUMPS, the activation
of out-of-core is handled using a dedicated switch of test_FEMBEM. Also, by
default, HMAT starts to dump data on disk only when the solver's consumption
reaches 55% of available system memory. This leaves only 45% of available memory
for all the other data and computations. Therefore, we need to decrease this
threshold.
<<code:incore-preparation>> <<code:incore-mpi>> if test {dense[ooc]} -eq 1; then unset MPF_SPIDO_INCORE unset HMAT_DISABLE_OOC TOTAL_MEMORY_KiB=$(grep "MemTotal" /proc/meminfo | sed 's/[^0-9]//g') TOTAL_MEMORY_MiB=$(echo "$TOTAL_MEMORY_KiB / 1024 / 4;" | bc -l) export HMAT_LIMIT_MEM=TOTAL_MEMORY_MiB else export MPF_SPIDO_INCORE=1 HMAT_DISABLE_OOC=1 fi <<code:incore-completion>>
3.11.3. FXT execution tracing template
In the template wrapper-fxt.sh for benchmarks with FXT execution tracing enabled, we need to set specific StarPU environment variables related to this functionality and add the path to the StarVZ R library (see Section 4.2) to PATH. Note that the out-of-core computation
remains disabled.
<<code:incore-preparation>> <<code:incore-mpi>> <<code:incore-incore>> export STARPU_FXT_PREFIX=$(pwd)/ STARPU_PROFILING=1 STARPU_WORKER_STATS=1 export PATH=$GUIX_ENVIRONMENT/site-library/starvz/tools/:$PATH <<code:incore-completion>>
3.11.4. Multi-node parallel distributed benchmarks template
In the template wrapper-in-core-distributed.sh for multi-node parallel
distributed benchmarks, the total number of MPI processes is given by the
{scheduler[nodes]} placeholder instead of {parallel[np]}. In this case, we
also print out the number of MPI processes per node.
<<code:incore-preparation>> <<code:incore-incore>> if test $MPI_RANK -eq 0; then echo "<PERFTESTS> MPI process count = {scheduler[nodes]}" echo "<PERFTESTS> MPI processes per node = {scheduler[np]}" echo "<PERFTESTS> MPI processes mapped by = {parallel[map]}" echo "<PERFTESTS> MPI processes ranked by = {parallel[rank]}" echo "<PERFTESTS> MPI processes bound to = {parallel[bind]}" fi <<code:incore-completion>>
See the complete wrapper script templates
wrapper-in-core.sh 14, wrapper-ooc.sh 15 and
wrapper-fxt.sh 16.
3.12. Generate benchmark runs
3.13. Job submission
To simplify the scheduling of benchmarks to run in parallel based on the
associated job names (see Section 3.2), we use the shell
script submit.sh. It determines the list of benchmarks from a given benchmark session, finds the corresponding Slurm batch script configuration headers and schedules the associated jobs.
We begin by defining a help message function that can be triggered if the script
is run with the -h option.
function help() { echo "Submit benchmarks from SESSION." >&2 echo "Usage: $(basename $0) [options]" >&2 echo >&2 echo "Options:" >&2 echo " -h Print this help message." >&2 echo " -E REGEX Submit all benchmarks except those verifying REGEX" \ "(mutually exclusive with '-F')." >&2 echo " -F REGEX Submit only the benchmarks verifying REGEX." >&2 echo " -g Generate job scripts without submitting them." >&2 echo " -i ID[,ID] Submit only the benchmarks having ID." >&2 echo -n " -l Only list the benchmarks to run instead of " >&2 echo "actually submitting the corresponding jobs." >&2 echo " -s SESSION Session to submit the benchmarks from." >&2 echo " -v Validate benchmarks without computing." >&2 echo " -w Make the script wait until all of the submitted" \ "jobs complete." >&2 }
We use a generic error message function. The message to print is expected to be the first argument to the function; if it is absent, a generic error message is displayed.
function error() { if test $# -lt 1; then echo "An unknown error occurred!" >&2 else echo "Error: $1" >&2 fi }
Follows the parsing of options.
REGEX="" EXCLUDE=0 INCLUDE=0 IDs="" SESSION="" SESSION_ID="" GENERATE_ONLY=0 LIST_ONLY=0 VALIDATE=0 WAIT=0 while getopts ":hE:F:gi:ls:vw" option; do case $option in
Using the -E option, one can decide to exclude from scheduling the benchmarks the names of which match a given regular expression.
E)
REGEX=$OPTARG
EXCLUDE=1
if test "$REGEX" == "";
then
error "Bad usage of the '-E' option (empty regular expression)!"
exit 1
fi
;;
Using the -F option, one can decide to schedule only the benchmarks the names of which match a given regular expression.
F)
REGEX=$OPTARG
INCLUDE=1
if test "$REGEX" == "";
then
error "Bad usage of the '-F' option (empty regular expression)!"
exit 1
fi
;;
The -g option can be used to generate job scripts for selected benchmarks
without actually submitting them. The generated scripts are then placed into
results/<session>/jobs.
g)
GENERATE_ONLY=1
;;
The -i option allows selecting the benchmarks to schedule by providing their exact identifiers, separated by commas if multiple identifiers are specified.
i)
IDs="$IDs $(echo $OPTARG | sed 's/,/ /g')"
if test "$IDs" == "";
then
error "Bad usage of the '-i' option (empty list of identifiers)!"
exit 1
fi
;;
The -l option allows listing the benchmarks selected for scheduling without
actually scheduling them. It may be particularly useful, for example, in
combination with the -F option to verify whether the provided regular
expression matches the intended benchmarks.
l)
LIST_ONLY=1
;;
The -s option is used to specify the benchmark session in the results
directory (see Section 3.1) to schedule the benchmarks from.
s)
SESSION=$OPTARG
if test ! -d $SESSION;
then
error "'$SESSION' is not a valid directory!"
exit 1
fi
SESSION_ID=$(basename $SESSION)
;;
To re-run the validation phase without performing the computation of benchmarks,
we can use the -v option.
v)
VALIDATE=1
;;
To make the script wait until all the jobs that it has scheduled finish, the
-w option may be used.
w)
WAIT=1
;;
Finally, we take care of unknown options or missing arguments, if any, raising an error and terminating the script in that case.
\?) # Unknown option error "Arguments mismatch! Invalid option '-$OPTARG'." echo help exit 1 ;; :) # Missing option argument error "Arguments mismatch! Option '-$OPTARG' expects an argument!" echo help exit 1 ;; h | *) help exit 0 ;; esac done
Options -E and -F, -F and -i, -l and -w, -g and -w as well as
-g and -l are mutually exclusive.
if test $EXCLUDE -ne 0 && test $INCLUDE -ne 0; then error "Options '-E' and '-F' must not be used together!" exit 1 fi if test "$REGEX" != "" && test "$IDs" != ""; then error "Options '-E' and '-F' must not be used together with the '-i' option!" exit 1 fi if test $WAIT -ne 0 && test $LIST_ONLY -ne 0; then error "Options '-l' and '-w' must not be used together!" exit 1 fi if test $WAIT -ne 0 && test $GENERATE_ONLY -ne 0; then error "Options '-g' and '-w' must not be used together!" exit 1 fi if test $GENERATE_ONLY -ne 0 && test $LIST_ONLY -ne 0; then error "Options '-g' and '-l' must not be used together!" exit 1 fi
After parsing and checking the arguments, we navigate to the root of the gcvb filesystem containing the given benchmark session folder.
cd $SESSION/../../
We also check whether we are in a valid gcvb filesystem containing a config.yaml file and a results folder (see Section 3.1).
if test ! -f config.yaml || test ! -d results; then error "'$SESSION' is not a correct gcvb session directory!" exit 1 fi
Then, we make sure to clean all the temporary folders we may have used before. As configured in the definition files (see Section 3.5), to isolate our temporary files from those of other users, we use a dedicated temporary folder at /tmp/vive-pain-au-chocolat.
rm -rf /tmp/vive-pain-au-chocolat if test $? -ne 0; then error "Unable to clean any temporary folder(s) previously used!" exit 1 fi
Before submitting, we look for all the sbatch configuration headers. If there
is none, there is no valid benchmark job to submit.
RUNS=$(find $SESSION -maxdepth 2 -type f -name "sbatch") if test "$RUNS" == ""; then error "No sbatch file was found! No valid job for submission." exit 1 fi
The script loops over all the headers found and extracts the job name from each
of them. We use an associative array to store, for each job name corresponding to a set of benchmarks (e.g. multi-solve-1-mumps-spido), the path to the
associated header file.
declare -A BATCH_JOBS
In the session folder, there is one folder per benchmark, not per set of benchmarks. So, the same header file may be present multiple times in different folders. Using an associative array, we manage to keep the path to only one copy of each header and prevent redundant job submissions.
for run in $RUNS do JOB_NAME=$(grep "\-\-job-name" $run | cut -d '=' -f 2)
If the -i option is used to specify the exact identifiers of the benchmarks to be submitted, we need to filter the detected benchmarks.
if test "$IDs" != ""; then for id in $IDs; do if test $(echo $run | grep "$id" | wc -l) -gt 0; then BATCH_JOBS[$JOB_NAME]=$run fi done
Also, when a regular expression is specified through the -F or the -E option, we keep only the jobs the name of which matches, or respectively does not match, the expression.
elif test "$REGEX" == "" || \ (test $INCLUDE -ne 0 && [[ "$JOB_NAME" =~ $REGEX ]]) || \ (test $EXCLUDE -ne 0 && ! [[ "$JOB_NAME" =~ $REGEX ]]); then BATCH_JOBS[$JOB_NAME]=$run fi done
If the user instructs the script, using the -w option, to wait until the submitted jobs complete, we must reinitialize the list of jobs to wait for before actually submitting them.
if test $WAIT -ne 0; then JOB_LIST="" fi
Finally, we call the gcvb submission command for each job name with the
--filter-by-test-id and --header options allowing us to specify a subset of
jobs to submit and prepend the associated shell scripts with the corresponding Slurm
batch script configuration headers. If the -v option is used, we pass the
--validate-only option to gcvb in order to run only the validation phase. If
the -g option is used, we pass the --dry-run option to gcvb in order to
exit before actually submitting the generated job script. The latter is then moved
into results/<session>/jobs. If the -l option is specified, the jobs
selected for scheduling are not submitted; only their identifiers are printed out on the screen.
SET_COUNT=${#BATCH_JOBS[@]} i=1 for unique in "${!BATCH_JOBS[@]}"; do PREFIX="[ $i / $SET_COUNT ]" if test $LIST_ONLY -ne 0; then echo "$PREFIX Listing job '$unique'." else ACTION_ERROR="submit" ACTION_SUCCESS="Submitted" if test $VALIDATE -ne 0; then ACTION_ERROR="validate" ACTION_SUCCESS="Validated" python3 -m gcvb --filter-by-test-id "$unique" compute \ --gcvb-base $SESSION_ID --validate-only --wait-after-submitting elif test $GENERATE_ONLY -ne 0; then ACTION_ERROR="generate job script for" ACTION_SUCCESS="Generated job script for" python3 -m gcvb --filter-by-test-id "$unique" compute \ --gcvb-base $SESSION_ID --dry-run --header ${BATCH_JOBS[$unique]} mkdir -p $SESSION/jobs mv $SESSION/job.sh $SESSION/jobs/$unique.sh else python3 -m gcvb --filter-by-test-id "$unique" compute \ --gcvb-base $SESSION_ID --wait-after-submitting \ --header ${BATCH_JOBS[$unique]} fi if test $? -ne 0; then error "$PREFIX Failed to $ACTION_ERROR batch job '$unique'!" exit 1 else echo "$PREFIX $ACTION_SUCCESS batch job '$unique'." fi
Also, if the -w option is set, we use Slurm's squeue command to collect the job identifiers of the launched jobs, to make the script wait for the latter to complete before exiting itself.
if test $WAIT -ne 0; then LAST_JOB=$(squeue -u $USER -ho "%i" -S -V | head -n 1) JOB_LIST="$LAST_JOB $JOB_LIST" fi fi i=$(expr $i + 1) done
Once the jobs are submitted through the sbatch command and if the -w option
is set, we make the script wait until all the submitted jobs complete. We use
squeue again to retrieve the list of running jobs.
Note that if the compute command of gcvb executes correctly but Slurm
fails to submit the job anyway, JOB_LIST will not be an empty string but will
contain spaces. Therefore, before entering the waiting loop, we have to ensure
the variable does not begin with a space in order to prevent an infinite waiting
loop.
if test $WAIT -ne 0; then if test "${JOB_LIST:0:1}" == " "; then error "No jobs to wait for!" exit 1 fi echo -n "Waiting for submitted jobs to complete... " while test "$JOB_LIST" != ""; do STILL_RUNNING="" for id in $JOB_LIST; do if test $(squeue | grep "$id" | wc -l) -gt 0; then STILL_RUNNING="$STILL_RUNNING $id" fi done JOB_LIST=$STILL_RUNNING sleep 1m done echo "Done" fi exit 0
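As an illustration, the script could be invoked as follows. The session path and the regular expression below are hypothetical.
# Hypothetical example: submit all the benchmarks of the session
# results/session-1 the job names of which match 'multi-solve' and wait for the
# corresponding Slurm jobs to complete.
./submit.sh -s results/session-1 -F "multi-solve" -w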
See the complete source file submit.sh 17.
4. Post-processing results
4.1. Benchmarks
To visualize benchmark results we use R and rely mainly on the plot drawing
library ggplot2 ggplot2. The R script file plot.R we describe in this
section is meant to be used from within plot drawing R source code blocks in
concerned Org documents, e.g. reports, articles and so on.
4.1.1. Prerequisites
At the beginning we need to import several libraries for:
- manipulating data sources, i.e. R data frames, SQLite databases and JSON files,
  library(plyr) library(dplyr) library(readr) library(DBI) library(rjson)
- transforming data frames,
  library(tidyr)
- exporting R objects into Org-formatted ASCII text,
  library(ascii) options(asciiType = "org")
- decompressing gzip files,
  library(R.utils)
- defining and exporting plots.
  library(ggplot2) library(scales) library(grid) library(gridExtra) require(cowplot) library(stringr) library(ggrepel)
We also need to prevent using the cairo library to produce figures. This does
not work on Linux.
set_null_device(cairo_pdf)
4.1.2. Importing data
- Benchmark results
The benchmark tool we rely on, gcvb, stores benchmark results into a local SQLite database (see Section 3.1). We define a global variable having the name gcvb_db_path to store the path to the gcvb database we are currently using.
gcvb_db_path <- "~/benchmarks/gcvb.db"
To extract the results from the database, we define the function gcvb_import which takes up to three arguments:
- session indicating the gcvb benchmark session to extract the results from (see Section 3.1),
- like representing a list or a vector of benchmark name patterns,
- path containing the path to the gcvb SQLite database file, set to gcvb_db_path by default (see above).
Using the like argument, one can extract only the results of the benchmarks the identifiers of which match the given pattern or patterns (see Section 3.5). The latter may contain wildcard characters supported by the SQL operator LIKE sqliteLike.
The function begins by opening a connection to the gcvb SQLite database and setting the LIKE keyword to be case-sensitive.
gcvb_import <- function(session, like = c(), path = gcvb_db_path) { conn <- dbConnect(RSQLite::SQLite(), path) dbExecute(conn, "PRAGMA case_sensitive_like = true;")
A gcvb database contains 6 tables (see Figure 1).
Figure 1: EER diagram of a gcvb database.
At first, we need to select all the benchmarks belonging to session from the test table and optionally restrict the selection to the benchmarks matching the patterns of like.
request <- "" for(p in 1:length(like)) { if(p > 1) { request <- paste(request, "OR ") } request <- paste0(request, "name LIKE '", like[p], "'") } if(length(like) > 0) { request <- paste( "SELECT id, name, start_date FROM test WHERE (", request, ") AND run_id IN", "(SELECT id FROM run WHERE gcvb_id =", session, ")" ) } else { request <- paste( "SELECT id, name, start_date FROM test WHERE run_id IN", "(SELECT id FROM run WHERE gcvb_id =", session, ")" ) } benchmarks <- dbGetQuery(conn, request)
Then, if a given benchmark was executed more than once, we want to keep only the data associated with the latest run. To do this, we sort the data by the execution start timestamp, i.e. the start_date column, and remove duplicates based on the benchmark identifiers in the name column. Note that start_date is imported as a string, so we need to convert it to a datetime format at first. At the end, we drop the start_date column being useless for further treatment.
benchmarks$start_date <- as.POSIXct(benchmarks$start_date) benchmarks <- benchmarks[order(benchmarks$start_date, decreasing = TRUE), ] benchmarks <- benchmarks[!duplicated(benchmarks$name), ] benchmarks <- subset(benchmarks, select = -c(start_date))
Benchmark results are stored in the separate valid table. Each row references a benchmark in the test table and represents one metric and its value (see an example in Table 1). This representation is referred to as long format.
Table 1: Example of the valid table.
| id    | metric    | value      | test_id | task_step |
| 39870 | nbrhs     | 50         | 1260    | NULL      |
| 39871 | step_mesh | 0.02241996 | 1260    | NULL      |
| …     | …         | …          | …       | …         |
In addition to the valid table, the files table holds files potentially associated with a given benchmark, e.g. resource consumption logs (see an example in Table 2). Note that the files themselves are stored in a gzip-compressed binary form.
Table 2: Example of the files table.
| id | filename                            | file | test_id |
| 9  | rss-0.log                           | BLOB | 3666    |
| 10 | hdd-0.log                           | BLOB | 3666    |
| 11 | …/energy_scope_eprofile_1473279.txt | BLOB | 3666    |
| …  | …                                   | …    | …       |
So, for each of the selected benchmarks, we extract the associated rows from the valid and the files tables based on the test_id key. Note that the value column is of mixed type, i.e. it contains both strings and numerical values. To prevent the R dbGetQuery function from doing inappropriate type conversions, we cast the value column to text in the corresponding SQL request. We do the same for the file column in the files table. Note also that for the union operation below to work, we have to rename the selected columns of the files table to match those in the valid table.
for(row in 1:nrow(benchmarks)) { test_id <- benchmarks[row, "id"] metrics <- dbGetQuery( conn, paste( "SELECT metric, CAST(value AS TEXT) AS 'value'", "FROM valid WHERE test_id = :id", "UNION", "SELECT filename AS 'metric', QUOTE(file) AS 'value'", "FROM files WHERE test_id = :id" ), params = list(id = test_id) )
At the same time, we convert the original long format representation (see Table 1) into wide format for easier data manipulation. In the latter, to each metric corresponds a separate column (see an example in Table 3). This way, each row contains all the results of a given benchmark.
Table 3: Example of a table containing benchmark results in wide format.
| id   | name         | nbrhs | step_mesh  | … |
| 1260 | spido-100000 | 50    | 0.02241996 | … |
if(nrow(metrics) > 0) { for(metric in 1:nrow(metrics)) { metric_name <- metrics[metric, "metric"] if(!(metric_name %in% colnames(benchmarks))) { benchmarks[[metric_name]] <- NA } benchmarks[which(benchmarks$id == test_id), metric_name] <- metrics[metric, "value"] } } }
Finally, we convert columns containing numerical values to the appropriate data type, close the database connection and return the resulting data frame. Note that the names of some columns, e.g.
rss-0.log, hdd-1.log or tmp_energy_scope_1473279/energy_scope_eprofile_1473279.txt, may vary from one benchmark to another so we have to determine their names dynamically using a regular expression.
cols_to_numeric <- colnames(benchmarks) dynamic_columns <- cols_to_numeric[ grepl( "^rss.*\\.log$|^hdd.*\\.log$|^flops.*\\.csv$|.*energy_scope_eprofile.*\\.txt$", cols_to_numeric ) ] cols_to_numeric <- cols_to_numeric[! cols_to_numeric %in% c( c("name", "mapping", "ranking", "binding", "coupled_method", "solver", "variation", "platform", "node", "node_family", "symmetry", "singularity_container", "system_kind", "energy_scope", "config_kind"), dynamic_columns )] benchmarks <- benchmarks %>% mutate_at(cols_to_numeric, as.numeric) dbDisconnect(conn) return(benchmarks) }
- Storage resources, power and energy consumption traces
During a benchmark execution, we use a custom Python script (see Section 3.6) to measure the consumption of real memory (RAM) and hard drive (HDD) periodically each second. There is one consumption log file per MPI process. The name of the log files matches
rss-N.log for RAM consumption and hdd-N.log for HDD consumption. N stands for the rank of the corresponding MPI process. In a log file, each line contains the current timestamp and the consumption in mebibytes (MiB). The very last line corresponds to the peak value.
Then, we use energy_scope (see Section 3.7) to measure the power and energy consumption of the processing units and RAM. In this case, the acquisition interval can be defined by the user. As a result, energy_scope produces an archive containing all the measures in a raw form. It is then possible to produce a JSON trace file based on this archive. The output file is stored under tmp_energy_scope_X/energy_scope_eprofile_X.txt where X represents the identifier of the corresponding Slurm job (see Section 3.2).
At the end of a benchmark execution, the above log and trace files are compressed and stored into the SQLite database holding all the benchmark results (see Section 4.1.2.1). Therefore, in the first place, we need to define a function allowing us to extract the files from the database and decompress them.
- Retrieving trace and log files
The function
gcvb_retrieve_files takes a data frame containing the results of one single benchmark (see the benchmark argument), retrieved by the gcvb_import function (see Section 4.1.2.1) and formatted using the gcvb_format function (see Section 4.1.3.1), extracts the corresponding storage resource, power and energy consumption logs and traces, then decompresses them into the current working directory. Note that the energy_scope trace file is renamed to eprofile.txt to ensure a static file name and simplify its later processing.
In the first place, we need to dynamically determine the names of the columns containing the log and trace files based on the patterns discussed above.
gcvb_retrieve_files <- function(benchmark) { rss_columns <- colnames(benchmark) rss_columns <- rss_columns[ grepl( "^rss.*\\.log$|^hdd.*\\.log$|^flops.*\\.csv$|.*energy_scope_eprofile.*\\.txt$", rss_columns ) ]
Then, we iterate over the corresponding columns in
benchmark and:
for(rss_column in rss_columns) {
- retrieve the compressed version of the file as a string,
  rss_file <- benchmark[[rss_column]]
- if the file is not empty, strip the X' prefix and the ' suffix from this string,
  if(!is.na(rss_file)) { rss_file <- substring(rss_file, 3, nchar(rss_file) - 1)
- split the string into a vector of two-character strings representing the hexadecimal binary content of the file,
  rss_file <- strsplit(rss_file, "")[[1]] rss_file <- paste0(rss_file[c(TRUE, FALSE)], rss_file[c(FALSE, TRUE)])
- change the output file name to eprofile.txt, if the file corresponds to an energy_scope trace file,
  if(grepl("energy_scope", rss_column, fixed = TRUE)) { filename <- "eprofile.txt.gz" } else { filename <- paste0(basename(rss_column), ".gz") }
- convert the vector of two-character strings into a raw binary form and restore the corresponding gzip-compressed file,
  writeBin(as.raw(as.hexmode(rss_file)), filename)
- decompress the file into the current working directory and remove its compressed counterpart.
  gunzip(filename, overwrite = TRUE) } } }
- Storage resources consumption traces
The function read_rss constructs an R data frame based on RAM consumption and, optionally, on HDD consumption logs (if the include_hdd switch is set to TRUE) located in logs_path. If a named vector is provided through the optional aesthetics argument, the function adds extra constant columns into the data frame using the names from the named vector and initialized with the values of the named vector. Adding such columns may be useful later for using the aesthetics function (see Section 4.1.4) when drawing plots. If the output of the function is intended to be combined with a power consumption trace (see Section 4.1.2.2.4), the optional override_beginning argument can be used to synchronize the output data frame according to the power consumption trace by considering the datetime given by this argument as the alternative beginning of the acquisition. Note that the power consumption trace always starts first as the energy and power consumption tracing tool energy_scope is always launched before the memory and disk usage monitoring script.
The output data frame shall contain four columns: the timestamp column giving the time of the measure in milliseconds since the beginning of the acquisition, the corresponding consumption value, the kind of the consumption measure, i.e. whether it is the RAM or HDD consumption, and the process rank column.
read_rss <- function(logs_path, include_hdd = FALSE, aesthetics = NA, override_beginning = NA) { columns <- c("timestamp", "consumption", "kind", "process") output <- data.frame(matrix(nrow = 0, ncol = length(columns))) colnames(output) <- columns
We begin by listing all the RAM and, optionally, HDD log files in
logs_path.
pattern <- ifelse(include_hdd, "rss|hdd.*\\.log", "rss.*\\.log") logs <- list.files( logs_path, pattern = pattern, full.names = TRUE, include.dirs = FALSE )
Then, we read each log file, strip its last line containing the peak value, which we do not use here, and append the data into the output data frame
data.for(log in logs) { count <- length(readLines(log, warn = FALSE)) - 1 data <- read.table(log, nrow = count) if(ncol(data) < 2) { # Legacy format colnames(data) <- c("consumption") data$timestamp <- 0:(count - 1) } else { colnames(data) <- c("timestamp", "consumption") data$timestamp <- as.POSIXct(data$timestamp, format = "%Y/%m/%dT%H:%M:%S") start <- min(data$timestamp) if(!is.na(override_beginning)) { beginning <- as.integer(min(data$timestamp) - override_beginning) data$timestamp <- data$timestamp + beginning } data$timestamp <- (as.integer(data$timestamp) - as.integer(start)) * 1000 } data$consumption <- as.numeric(data$consumption / 1024.) base <- basename(log) kind <- ifelse(!include_hdd || startsWith(base, "rss"), "sram", "shdd") data$kind <- rep(kind, nrow(data)) process <- gsub("([a-z]+)-(\\d+)\\.log", "\\2", base) data$process <- rep(as.integer(process), nrow(data)) output <- rbind(output, data) }
Finally, we process the
aestheticsparameter, if provided, and return the output data frame. In case of multiple log files per benchmark, i.e. when a benchmark is run in distributed fashion, we need to sum up the consumption of all the processes.processes <- max(output$process) if(processes > 0) { output <- output %>% group_by(timestamp, kind) %>% summarise( consumption = sum(consumption) ) } if(length(aesthetics) > 0) { for(aesthetic in aesthetics) { name <- names(aesthetics)[aesthetics == aesthetic] output[[name]] <- rep(as.character(aesthetic), nrow(output)) } } return(output) }
- Floating-point operation count traces
The function read_flops constructs an R data frame based on logs of floating-point operation counts located in logs_path. If a named vector is provided through the optional aesthetics argument, the function adds extra constant columns into the data frame using the names from the named vector and initialized with the values of the named vector (see Section 4.1.2.2.2). If the output of the function is intended to be combined with a power consumption trace (see Section 4.1.2.2.4), the optional shift argument can be used to synchronize the output data frame according to the power consumption trace by shifting the output timestamps by the value (in milliseconds) of this argument. As the measuring tool provides only timestamps relative to the beginning of the execution, one can use the beginning of the storage resources consumption trace as t0 in this case (see Section 4.1.2.2.2).
The output data frame shall contain four columns: the timestamp column giving the time of the measure in milliseconds since the beginning of the acquisition, the corresponding consumption value, the kind of the measure, i.e. the floating-point operation count, and the process rank column.
read_flops <- function(logs_path, aesthetics = NA, shift = 0) { columns <- c("timestamp", "consumption", "kind", "process") output <- data.frame(matrix(nrow = 0, ncol = length(columns))) colnames(output) <- columns
We begin by listing all the floating-point operation count log files in logs_path.
pattern <- "flops.*\\.csv" logs <- list.files( logs_path, pattern = pattern, full.names = TRUE, include.dirs = FALSE )
Then, we read each log file and append the data into the output data frame
data.for(log in logs) { input <- read.csv(log, header = FALSE, skip = 2) cpus <- as.integer(input[1, 3]) data <- data.frame( timestamp = input[, c(4)], consumption = rowSums(input[, c((4 + 5 * cpus + 1):ncol(input))]) ) # Convert to milliseconds. data$timestamp <- as.integer(round(data$timestamp, 0) * 1000) if(shift > 0) { data$timestamp <- data$timestamp + shift } data$kind <- rep("flops", nrow(data)) process <- gsub("([a-z]+)-(\\d+)\\.csv", "\\2", basename(log)) data$process <- rep(as.integer(process), nrow(data)) output <- rbind(output, data) }
Finally, we process the
aestheticsparameter, if provided, and return the output data frame. In case of multiple log files per benchmark, i.e. when a benchmark is run in distributed fashion, we need to sum up the consumption of all the processes.processes <- max(output$process) if(processes > 0) { output <- output %>% group_by(timestamp, kind) %>% summarise( consumption = sum(consumption) ) } if(length(aesthetics) > 0) { for(aesthetic in aesthetics) { name <- names(aesthetics)[aesthetics == aesthetic] output[[name]] <- rep(as.character(aesthetic), nrow(output)) } } return(output) }
- Power and energy consumption trace
The function read_es constructs an R data frame based on a given energy_scope JSON data file es_data. The data frame contains power consumption per second. If a named vector is provided through the optional aesthetics argument, the function adds extra constant columns into the data frame using the names from the named vector and initialized with the values of the named vector. Adding such columns may be useful later for using the aesthetics function (see Section 4.1.4) when drawing plots. The data frame also contains tag information. The benchmark executable may print special energy_scope tags to delimit the beginning and the end of a particular portion of computation in the power consumption trace. If the only optional argument is provided, only the tags specified by the latter are treated. The interval argument allows one to set the interval (in milliseconds) the measures were taken in. By default, the function considers an interval of 1000 milliseconds.
The resulting data frame is composed of the following columns:
- tag representing the tag label,
- type indicating whether a line corresponds to a start or a stop tag,
- type_label holding a prettified version of type for plot drawing,
- timestamp representing the time of the measure in milliseconds since the beginning of the acquisition,
- consumption representing the current power consumption of CPU or RAM (in Watts [W]),
- kind indicating whether consumption represents the power consumption of CPU or RAM,
- aesthetics representing an optional named vector describing constant columns to be added to the output data frame in order to simplify plot drawing based on the latter.
Note that the format of the output data frame is compatible, in terms of the columns it contains, with the format of the data frames produced by the function read_rss (see Section 4.1.2.2.2) for reading RAM and HDD consumption traces. This is to allow for combining power and storage resources consumption data. Indeed, one can provide this function with a data frame obtained with read_rss through the rss_data argument and extend the output data frame with RAM and HDD consumption. The timestamp column, also present in rss_data, shall allow us to synchronize the measures.
The first step is to read the data from the JSON data file at es_data
read_es <- function(es_data, aesthetics = c(), only = NA, interval = 1000, rss_data = NA, include_hdd = FALSE, flops_data = NA) { es_raw <- fromJSON(file = es_data)
and extract the timestamp corresponding to the beginning of acquisition as
es_start.
es_start <- as.POSIXct( es_raw$data$data$tags$es_total$start, format = "%Y/%m/%d %H:%M:%S" )
Then, we collect CPU and RAM power consumption data into the respective data frames
es_cpuandes_ram. Thetimestampcolumn is generated based ones_start. Note that the acquisition interval is one second.es_cpu <- data.frame( consumption = as.numeric(es_raw$data$data$`ecpu(W)`), timestamp = as.integer(0) ) for(i in 2:nrow(es_cpu)) { es_cpu$timestamp[i] <- es_cpu$timestamp[i - 1] + interval } es_cpu$kind <- rep("ecpu", nrow(es_cpu)) es_ram <- data.frame( power_total = as.numeric(es_raw$data$data$`etotal(W)`), power_cpu = as.numeric(es_raw$data$data$`ecpu(W)`), timestamp = as.integer(0) ) es_ram$consumption <- es_ram$power_total - es_ram$power_cpu es_ram <- subset(es_ram, select = c("consumption", "timestamp")) for(i in 2:nrow(es_ram)) { es_ram$timestamp[i] <- es_ram$timestamp[i - 1] + interval } es_ram$kind <- rep("eram", nrow(es_ram))
We also make a local copy of
rss_dataif provided.rss_ram <- NA rss_hdd <- NA flops <- NA if(!is.na(rss_data)) { rss <- read_rss(rss_data, include_hdd, aesthetics, es_start) rss_ram <- subset(rss, kind == "sram") if(include_hdd) { rss_hdd <- subset(rss, kind == "shdd") } actual_start <- min(rss_ram$timestamp) actual_end <- max(rss_ram$timestamp) es_cpu <- subset( es_cpu, timestamp >= actual_start & timestamp <= actual_end ) es_ram <- subset( es_ram, timestamp >= actual_start & timestamp <= actual_end ) } if(!is.na(flops_data)) { shift <- 0 if(!is.na(rss_data)) { shift <- min(rss_ram$timestamp) } flops <- read_flops(flops_data, aesthetics, shift) }
Now, we initialize additional columns to store tag information.
es_cpu$tag <- NA es_cpu$type <- NA es_cpu$type_label <- NA es_ram$tag <- NA es_ram$type <- NA es_ram$type_label <- NA if(!is.na(rss_data)) { rss_ram$tag <- NA rss_ram$type <- NA rss_ram$type_label <- NA if(include_hdd) { rss_hdd$tag <- NA rss_hdd$type <- NA rss_hdd$type_label <- NA } } if(!is.na(flops_data)) { flops$tag <- NA flops$type <- NA flops$type_label <- NA }
For each tag considered, we add four new rows into the es_cpu and es_ram data frames, i.e. one row with the start time and CPU power consumption, one with the stop time and CPU power consumption, one with the start time and RAM power consumption and one with the stop time and RAM power consumption.
If, during benchmark program execution, a loop generates more than one occurrence of a given energy_scope tag, a unique identifier is appended to the latter, e.g. tag_0, tag_1, etc. Otherwise, energy_scope would discard all the data related to the tag. However, the presence of such identifiers complicates the post-treatment. Therefore, we strip them off the tag name tag but keep them for later use in instance.
for(entire_tag in names(es_raw$data$data$tags)) { tag <- sub("_[^_]+$", "", entire_tag) instance <- tail(strsplit(entire_tag, "_")[[1]], n = 1) instance <- suppressWarnings(as.integer(instance)) instance <- ifelse(is.na(instance), "i", instance)
Here we use the imported JSON data file
es_rawto get the power consumption values as well as the start and stop timestamps of tags.if(is.na(only) || tag %in% only) { es_start <- as.integer(es_start) start_time <- as.integer( as.POSIXct( es_raw$data$data$tags[[entire_tag]]$start, format = "%Y/%m/%d %H:%M:%S" ) ) stop_time <- as.integer( as.POSIXct( es_raw$data$data$tags[[entire_tag]]$stop, format = "%Y/%m/%d %H:%M:%S" ) ) begin <- (start_time - es_start) * 1000 end <- (stop_time - es_start) * 1000 row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = es_cpu[ es_cpu$timestamp == begin & is.na(as.character(es_cpu$tag)), "consumption" ], kind = "ecpu" ) es_cpu <- rbind(es_cpu, row) row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = es_cpu[ es_cpu$timestamp == end & is.na(as.character(es_cpu$tag)), "consumption" ], kind = "ecpu" ) es_cpu <- rbind(es_cpu, row) row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = es_ram[ es_ram$timestamp == begin & is.na(as.character(es_ram$tag)), "consumption" ], kind = "eram" ) es_ram <- rbind(es_ram, row) row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = es_ram[ es_ram$timestamp == end & is.na(as.character(es_ram$tag)), "consumption" ], kind = "eram" ) es_ram <- rbind(es_ram, row) if(!is.na(rss_data)) { consumption <- rss_ram[ rss_ram$timestamp == begin & is.na(as.character(rss_ram$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = as.numeric(consumption), kind = "sram" ) rss_ram <- rbind.fill(rss_ram, row) } consumption <- rss_ram[ rss_ram$timestamp == end & is.na(as.character(rss_ram$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = as.numeric(consumption), kind = "sram" ) rss_ram <- rbind.fill(rss_ram, row) } if(include_hdd) { consumption <- rss_hdd[ rss_hdd$timestamp == begin & is.na(as.character(rss_hdd$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = as.numeric(consumption), kind = "shdd" ) rss_hdd <- rbind.fill(rss_hdd, row) } consumption <- rss_hdd[ rss_hdd$timestamp == end & is.na(as.character(rss_hdd$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = as.numeric(consumption), kind = "shdd" ) rss_hdd <- rbind.fill(rss_hdd, row) } } } if(!is.na(flops_data)) { consumption <- flops[ flops$timestamp == begin & is.na(as.character(flops$tag)), "consumption" 
] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "start", type_label = paste0( "bold(alpha)~", label_tag_short(c(tag), instance) ), timestamp = begin, consumption = as.numeric(consumption), kind = "flops" ) flops <- rbind.fill(flops, row) } consumption <- flops[ flops$timestamp == end & is.na(as.character(flops$tag)), "consumption" ] consumption <- ifelse(length(consumption) > 0, consumption, NA) if(!is.na(consumption)) { row <- data.frame( tag = as.character(tag), type = "stop", type_label = paste0( "bold(omega)~", label_tag_short(c(tag), instance) ), timestamp = end, consumption = as.numeric(consumption), kind = "flops" ) flops <- rbind.fill(flops, row) } } } }
If the
aestheticsargument is provided, we include constant columns described by the latter to the resulting data frame. Finally, we return a single data frame combininges_cpu,es_ramandes_rssif therss_dataargument was provided. In this case, we also include an additional column to the three data frames to distinguish between power and storage resources consumption. This is useful for plotting figures (see Section 4.1.5.13).if(length(aesthetics) > 0) { for(aesthetic in aesthetics) { name <- names(aesthetics)[aesthetics == aesthetic] es_cpu[[name]] <- rep(as.character(aesthetic), nrow(es_cpu)) es_ram[[name]] <- rep(as.character(aesthetic), nrow(es_ram)) if(!is.na(rss_data)) { rss_ram[[name]] <- rep(as.character(aesthetic), nrow(rss_ram)) if(include_hdd) { rss_hdd[[name]] <- rep(as.character(aesthetic), nrow(rss_hdd)) } } if(!is.na(flops_data)) { flops[[name]] <- rep(as.character(aesthetic), nrow(flops)) } } } es <- rbind(es_cpu, es_ram) es$kind_general <- "e" if(!is.na(flops_data)) { flops$kind_general <- "f" es <- rbind.fill(es, flops) } if(!is.na(rss_data)) { rss_ram$kind_general <- "s" if(include_hdd) { rss_hdd$kind_general <- "s" es <- rbind.fill(es, rbind(rss_ram, rss_hdd)) } else { es <- rbind.fill(es, rss_ram) } es$timestamp <- es$timestamp - min(rss_ram$timestamp) } return(es) }
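To illustrate how read_es is typically called, here is a hedged usage sketch. The JSON file path and the aesthetic values are placeholders, and we assume the packages providing fromJSON (rjson) and rbind.fill (plyr) as well as the helper functions defined above are already loaded.

## Hypothetical usage of read_es; the trace file name and the aesthetic
## values are placeholders for an actual energy_scope acquisition.
es <- read_es(
  "results/energy_scope_eprofile_1234.json",
  aesthetics = c(solver = "mumps/spido", nbpts = "1e+06"),
  interval = 1000
)
## The resulting data frame can be inspected or passed to a plot function.
head(es[, c("timestamp", "consumption", "kind", "kind_general")])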
- Retrieving trace and log files
- Execution timeline traces
The function read_timeline allows us to read a test_FEMBEM timeline trace trace, extract data about selected computation phases from the latter and return it in the form of a data frame suitable for drawing axis-intercepting lines with the associated labels using the rss_by_time plot function (see Section 4.1.5.12). phases represents a data frame of selected computation phases to extract information about. Note that one computation phase label can appear more than once if the corresponding function was called multiple times. This is why the data frame contains the occurrence column (see Table 4) indicating which occurrence of a given phase should be considered. The type column indicates whether a given row in the data frame designates the beginning (ENTER) or the end (EXIT) of the corresponding computation phase. In this data frame, one can also provide additional information, i.e. prettified computation phase labels for better readability (see Table 4). The phases data frame is used to construct the output data frame of this function.

Table 4: Example of a data frame that can be passed to read_timeline through the phases argument.

| type | name            | label                                   | occurrence |
|------|-----------------|-----------------------------------------|------------|
| EXIT | mpf_solvegemm() | Block-wise Schur complement computation | 1          |
| …    | …               | …                                       | …          |

We begin by reading the timeline trace trace.
read_timeline <- function(trace, phases) { complete_timeline <- read.delim(trace, header = FALSE)
For easier column access, we add column names to the newly created data frame.
colnames(complete_timeline) <- c("rank", "time", "type", "duration", "path")
Then, we construct the resulting timeline data frame containing only the data we are interested in, based on the input phases data frame. The new data frame contains three additional columns:
- time, providing the corresponding computation phase delimiter timestamp,
- zero, a constant zero column useful for plotting vertical or horizontal axis intercepts,
- middle, providing the values representing the middles of the intervals defined by the values of the time column, useful to center phase labels in the rss_by_time plot function (see Section 4.1.5.12).
timeline <- phases timeline$time <- NA timeline$middle <- NA timeline$zero <- rep(0, nrow(timeline)) for(i in 1:length(phases)) { time <- complete_timeline[ complete_timeline$type == phases[[i]]["type"] & str_detect(complete_timeline$path, paste0(phases[[i]]["label"], "$")), "time" ] time <- time[[as.integer(phases[[i]]["occurrence"])]] middle <- ifelse( i > 1, time - ((time - timeline[i - 1, 1]) / 2), time / 2 ) timeline[i, "time"] <- as.integer(time) timeline[i, "middle"] <- as.double(middle) }
At the end, we return the timeline data frame.
return(timeline) }
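As a hedged illustration, the phases argument can be built as a small data frame mirroring Table 4. The phase names and the trace file path below are placeholders; the call to read_timeline is shown commented out since the exact trace layout depends on the benchmark run.

## Hypothetical sketch of the phases argument from Table 4.
phases <- data.frame(
  type       = "EXIT",
  name       = "mpf_solvegemm()",
  label      = "Block-wise Schur complement computation",
  occurrence = 1,
  stringsAsFactors = FALSE
)
## timeline <- read_timeline("results/test_FEMBEM_timeline.csv", phases)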
4.1.3. Formatting data
- Benchmark results
Before actually visualizing data, we have to format it. This is the role of the function gcvb_format which takes a single argument, data, an R data frame containing benchmark results imported by the gcvb_import function (see Section 4.1.2).
gcvb_format <- function(data) {
Due to the way certain benchmarks are defined in gcvb, the solver column may contain non-uniform values, i.e., in some cases, coupled solver names use dashes instead of slashes, which are the expected separators. For example, we need to replace the dash in mumps-spido by a slash so it becomes mumps/spido.
data$solver <- ifelse( data$solver == "mumps-spido", "mumps/spido", ifelse(data$solver == "mumps-hmat", "mumps/hmat", data$solver) )
Names of columns holding the same information, such as factorization and solve time, expected result accuracy and so on, vary depending on the solver used during a given benchmark. To ease further data treatment, we merge these into common columns.
data$tps_facto <- ifelse( is.na(data$tps_cpu_facto_mpf), data$tps_cpu_facto, data$tps_cpu_facto_mpf ) data$tps_solve <- ifelse( is.na(data$tps_cpu_solve_mpf), data$tps_cpu_solve, data$tps_cpu_solve_mpf ) data$desired_accuracy <- ifelse( !is.na(data$mumps_blr_accuracy), data$mumps_blr_accuracy, NA ) data$size_schur <- ifelse( is.na(data$size_schur) & data$coupled_method == "multi-facto", data$disk_block_size, data$size_schur ) if("cols_schur" %in% colnames(data)) { data$cols_schur <- ifelse( data$coupled_method == "multi-solve" & data$solver == "mumps/spido", data$coupled_nbrhs, data$cols_schur ) } data$desired_accuracy <- ifelse( (data$solver == "hmat-bem" | data$solver == "hmat-fem" | data$solver == "hmat-fem-nd") & !is.na(data$h_assembly_accuracy), data$h_assembly_accuracy, data$desired_accuracy )
Then, we must remove from the data frame the lines corresponding to unfinished jobs, i.e. lines without computation time information or where the computation time is zero.
data <- subset(data, !is.na(tps_facto)) data <- subset(data, !is.na(tps_solve)) data <- subset(data, tps_facto > 0.0) data <- subset(data, tps_solve > 0.0)
If the dense_ooc column is present, we have to take care of NA values so that we can make the column a factor. NA appears in the case of purely in-core benchmarks that do not set this variable at all. Finally, we replace NA by -1.
if("dense_ooc" %in% colnames(data)) { data$dense_ooc <- ifelse( is.na(data$dense_ooc), -1, data$dense_ooc ) data$dense_ooc <- as.character(data$dense_ooc) data$dense_ooc <- factor(data$dense_ooc, levels = c("-1", "0", "1")) }
The same goes for the energy_scope column. In this case, NA means that the tool was not used.
if("energy_scope" %in% colnames(data)) { data$energy_scope <- ifelse( is.na(data$energy_scope), "off", data$energy_scope ) data$energy_scope <- as.character(data$energy_scope) data$energy_scope <- factor(data$energy_scope, levels = c("off", "on")) }
Some required columns are not present in the data frame by default, so we have to compute them based on existing data.
The total number of cores used for computation is the product of the MPI process count and the thread count.
data$p_units <- data$omp_thread_num * data$processes
The value of n_blocks represents the number of blocks the dense submatrix \(A_{ss}\) is split into during the Schur complement computation in the two-stage multi-factorization implementation. When this value is not explicitly set by the user, n_blocks equals auto. However, for the plot drawing functions to operate correctly, we need to recompute the corresponding numerical values.
if("n_blocks" %in% colnames(data)) { data$n_blocks <- ifelse( data$n_blocks == "auto", ceiling(data$nbem / data$size_schur), as.numeric(data$n_blocks) ) }
Finally, the factorization and solve efficiency computation is split into several steps:
- gathering best sequential times for each solver,
sequentials <- subset( data, select = c("solver", "tps_facto", "tps_solve", "nbpts"), data$p_units == 1 ) if(nrow(sequentials) > 0) { sequentials <- merge( aggregate(tps_facto ~ solver + nbpts, min, data = sequentials), sequentials ) sequentials <- sequentials %>% dplyr::rename( tps_facto_seq = tps_facto, tps_solve_seq = tps_solve )
- adding a column containing sequential time for every run in the data frame,
data <- dplyr::left_join(data, sequentials, by = c("solver", "nbpts"))
- computing the efficiency itself.
data$efficiency_facto <- data$tps_facto_seq / (data$p_units * data$tps_facto) data$efficiency_solve <- data$tps_solve_seq / (data$p_units * data$tps_solve) } return(data) }
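As a small numerical illustration of this last step, consider the following hypothetical values; the numbers are made up for the example.

## Hypothetical numbers: a best sequential factorization of 100 s and a
## parallel run on 24 processing units taking 5 s.
tps_facto_seq <- 100
p_units <- 24
tps_facto <- 5
efficiency_facto <- tps_facto_seq / (p_units * tps_facto)
efficiency_facto  # 0.8333..., i.e. about 83 % parallel efficiency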
4.1.4. Style presets
We also define a common set of data visualization elements to ensure a unified graphic output when drawing plots.
- Label functions
Label functions allow us to format and prettify column names or values when used in plot legends.
label_percentage
This function multiplies value by 100 and returns the result as a string with an appended percentage sign.
label_percentage <- function(value) { return(paste0(value * 100, "%")) }
label_epsilon
This function converts solution accuracy threshold values from numeric values to strings having the form \(1 \times 10^{-n}\) or to ‘unset (no compression)’ if the value is not assigned like in the case of solvers not using data compression.
label_epsilon <- function(labels) { out <- c() for (i in 1:length(labels)) { if(!is.na(labels[i])) { out[i] <- formatC( as.numeric(labels[i]), format = "e", digits = 1 ) } else { out[i] <- "unset (no compression)" } } return(out) }
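Assuming the two functions above are loaded, the following hypothetical calls show the kind of strings they produce.

## Hypothetical calls to the label functions defined above.
label_percentage(0.25)      # returns "25%"
label_epsilon(c(1e-3, NA))  # returns c("1.0e-03", "unset (no compression)")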
label_storage
This function prettifies storage support names.
label_storage <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "rm_peak" = "Real memory (RAM)", "hdd_peak" = "Disk drive" ) } return(out) }
label_ooc
This function prettifies the names of facet grids distinguishing between benchmarks with and without out-of-core computation (column dense_ooc in the data frame).
label_ooc <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( labels[i], "-1" = "All in-core", "0" = "Out-of-core except Schur complement", "1" = "All out-of-core" ) } return(out) }
label_solver
This function prettifies solver names.
label_solver <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "spido" = "SPIDO", "hmat-bem" = "HMAT", "hmat-fem" = "HMAT", "hmat-fem-nd" = "HMAT (proto-ND)", "hmat-oss" = "HMAT-OSS", "chameleon" = "Chameleon", "mumps" = "MUMPS", "mumps-blr" = "MUMPS (compressed)", "mumps-no-blr" = "MUMPS (uncompressed)", "pastix" = "PaStiX", "mumps/spido" = "MUMPS/SPIDO", "pastix/spido" = "PaStiX/SPIDO", "mumps/hmat" = "MUMPS/HMAT", "hmat/hmat" = "HMAT" ) } return(out) }
label_tag
This function prettifies
energy_scope(see Section 3.7) tag names.label_tag <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = expression( italic(S[i] == A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = expression( italic(A[vv]^{-1}*A[sv[i]]^{T}) ~ "(sparse solve)" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = expression( italic(A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = expression( italic(compress * group("(", A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}, ")")) ~ "(SpMM)" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(AXPY)" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = expression( italic(A[ss[i]] - A[sv[i]]*A[vv]^{-1}*A[sv[i]]^{T}) ~ "(compressed AXPY)" ), "MPF_SolveGEMM_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Multi-factorization "CreateSchurComplement" = expression( italic(S[i] == A[ss[ij]] - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "CreateSchurComplement_factorization" = expression( italic(X[ij] == - A[sv[i]]*A[vv]^{-1}*A[sv[j]]^{T}) ~ "(Schur compl. block)" ), "MPF_multifact_MUMPS" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), "MPF_multifact_MUMPS_HMAT" = expression( italic(S == A[ss] - A[sv]*A[vv]^{-1}*A[sv]^{T}) ~ "(Schur complement)" ), # Common "testMPF_factorisation_Schur" = expression( italic(S^{-1}) ~ "(Schur factorization)" ), "testMPF_factorisation" = expression( italic(A^{-1}) ~ "(factorization of" ~ italic(A) * ")" ), "testMPF_solve" = expression( italic(A^{-1}*x) ~ "(system solution)" ) ) } return(out) }
label_tag_short
This function does the same thing as
label_tagbut returns shorter labels than the latter.label_tag_short <- function(labels, instance = "i") { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), # Multi-solve "CreateSchurComplement_MultiSolve_MUMPS" = paste0( "italic(S[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_solve" = paste0( "italic(solve[", instance, "])" ), "CreateSchurComplement_MultiSolve_MUMPS_product" = paste0( "italic(SpMM[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_assembly" = paste0( "italic(compress[", instance, "])" ), "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = paste0( "italic(CAXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = paste0( "italic(AXPY[", instance, "])" ), "MPF_SolveGEMM_MUMPS" = "italic(S)", "CreateSchurComplement" = paste0( "italic(S[", instance, "])" ), # Multi-factorization "CreateSchurComplement_factorization" = paste0( "italic(X[", instance, "])" ), "MPF_multifact_MUMPS" = "italic(S)", "MPF_multifact_MUMPS_HMAT" = "italic(S)", # Common "testMPF_factorisation_Schur" = "italic(S^{-1})", "testMPF_factorisation" = "italic(A^{-1})", "testMPF_solve" = "italic(A^{-1}*x)" ) } return(out) }
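A couple of hypothetical calls illustrate the plotmath strings returned by label_tag_short and the role of the instance argument.

## Hypothetical calls; the returned strings are ggplot2 plotmath expressions.
label_tag_short("MPF_multifact_MUMPS")                    # "italic(S)"
label_tag_short("CreateSchurComplement", instance = "3")  # "italic(S[3])"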
label_kind
This function prettifies power and storage resources consumption kinds (see Section 4.1.2.2.4).
label_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "ecpu" = "CPU", "eram" = "RAM", "sram" = "RAM", "shdd" = "Disk", "flops" = "Flop rate" ) } return(out) }
label_mapping
This function transforms MPI process mapping values into strings giving more detailed information about the underlying parallel configuration.
label_mapping <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "node" = "1 unbound MPI process \U2A09 1 to 24 threads", "socket" = "2 MPI processes bound to sockets \U2A09 1 to 12 threads", "numa" = "4 MPI processes bound to NUMAs \U2A09 1 to 6 threads", "core" = "1 to 24 MPI processes bound to cores \U2A09 1 thread" ) } return(out) }
label_nbpts
This function converts the problem's unknown counts into the form N = <count>, where <count> uses the scientific notation of the associated numerical values.
label_nbpts <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- paste( "\U1D441 =", formatC( as.numeric(labels[i]), format = "e", digits = 1 ) ) } return(out) }
label_scientific
This function converts numeric label values to scientific notation.
label_scientific <- function(labels) { out <- c() for (i in 1:length(labels)) { out[i] <- formatC( as.numeric(labels[i]), format = "e", digits = 1 ) } return(out) }
label_coupling and label_coupling_short
These functions prettify names of the implementation schemes for the solution of coupled linear systems.
label_coupling <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve (two-stage)", "multi-facto" = "Multi-factorization (two-stage)", "full-hmat" = "Partially sparse-aware (single-stage)" ) } return(out) } label_coupling_short <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "multi-solve" = "Multi-solve", "multi-facto" = "Multi-factorization", "full-hmat" = "Partially sparse-aware" ) } return(out) }
label_peak_kind
This function prettifies peak memory usage value kinds.
label_peak_kind <- function(labels) { out <- c() for(i in 1:length(labels)) { out[i] <- switch( as.character(labels[i]), "assembly_estimation" = expression("assembled system"), "schur_estimation" = expression("Schur complement matrix " * italic(S)), "peak_symmetric_factorization" = expression( "symmetric factorization of " * italic(A[vv])), "peak_non_symmetric_factorization" = expression( "non-symmetric factorization of " * italic(A[vv])) ) } return(out) }
- Label maps
A label map is a variation of a label function. It provides alternative names for given columns of a data frame.
phase.labs
This label map provides prettified names for the factorization and solve time columns tps_facto and tps_solve, the corresponding efficiency columns efficiency_facto and efficiency_solve, as well as the column holding the associated execution time of the Schur block creation phase, CreateSchurComplement_exec_time.
phase.labs <- c( "Factorization", "Solve", "Factorization", "Solve", "Schur complement computation" ) names(phase.labs) <- c( "tps_facto", "tps_solve", "efficiency_facto", "efficiency_solve", "CreateSchurComplement_exec_time" )
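As a hedged illustration, a label map is simply a named character vector, so looking up a column name yields its prettified form; it is typically passed to ggplot2's labeller(), as done in the plot functions below.

## Hypothetical lookups in the phase.labs label map defined above.
phase.labs[["tps_facto"]]         # "Factorization"
phase.labs[["efficiency_solve"]]  # "Solve"
## Typical use in a facet specification (see times_by_nbpts below):
## facet_grid(. ~ computation_phase,
##            labeller = labeller(computation_phase = phase.labs))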
- Plot theme
To preserve a consistent color palette as well as consistent sets of point shapes and line types throughout all the plots, we provide each legend entry with the information on its:
- color,
colors <- c( "0.001" = "#F07E26", "1e-06" = "#9B004F", "node" = "#1488CA", "socket" = "#95C11F", "numa" = "#FFCD1C", "core" = "#6561A9", "1 (24 threads)" = "#E63312", "2 (48 threads)" = "#F07E26", "4 (96 threads)" = "#9B004F", "8 (192 threads)" = "#1488CA", "(32, 32)" = "#F07E26", "(48, 48)" = "#9B004F", "(64, 64)" = "#1488CA", "(128, 128)" = "#95C11F", "(256, 256)" = "#E63312", "(256, 512)" = "#FFCD1C", "(512, 512)" = "#800080", "(256, 1024)" = "#6561A9", "(256, 2048)" = "#89CCCA", "(256, 4096)" = "#384257", "1" = "#1488CA", "2" = "#95C11F", "3" = "#FFCD1C", "4" = "#6561A9", "5" = "#384257", "250000" = "#384257", "5e+05" = "#1488CA", "1e+06" = "#E63312", "1200000" = "#95C11F", "1400000" = "#FFCD1C", "1500000" = "#6561A9", "2e+06" = "#F07E26", "2500000" = "#9B004F", "multi-facto" = "#6561A9", "multi-solve" = "#89CCCA", "full-hmat" = "#C7D64F", "spido" = "#384257", "hmat" = "#9B004F", "mumps" = "#F07E26", "mumps-blr" = "#1488CA", "mumps/spido" = "#95C11F", "mumps/hmat" = "#FFCD1C", "ecpu" = "#384257", "eram" = "#E63312", "sram" = "#1488CA", "shdd" = "#95C11F", "flops" = "#9B004F" )
- point shape style,
shapes <- c( "spido" = 16, "hmat-bem" = 15, "hmat-fem" = 15, "hmat-oss" = 15, "hmat-fem-nd" = 23, "chameleon" = 18, "mumps" = 16, "pastix" = 17, "mumps/spido" = 16, "pastix/spido" = 17, "mumps/hmat" = 15, "hmat/hmat" = 18, "rm_peak" = 16, "vm_peak" = 17, "hdd_peak" = 15, "CreateSchurComplement_MultiSolve_MUMPS" = 22, "CreateSchurComplement_MultiSolve_MUMPS_solve" = 22, "CreateSchurComplement_MultiSolve_MUMPS_product" = 21, "MPF_SolveGEMM_MUMPS_HMAT_assembly" = 23, "MPF_SolveGEMM_MUMPS_SPIDO_AXPY" = 24, "MPF_SolveGEMM_MUMPS_HMAT_AXPY" = 25, "MPF_SolveGEMM_MUMPS" = 23, "CreateSchurComplement" = 22, "CreateSchurComplement_factorization" = 22, "MPF_multifact_MUMPS" = 23, "MPF_multifact_MUMPS_HMAT" = 23, "testMPF_factorisation_Schur" = 24, "testMPF_factorisation" = 23, "testMPF_solve" = 24 )
- line type.
linetypes <- c( "spido" = "solid", "hmat-bem" = "dotted", "hmat-fem" = "dotted", "hmat-oss" = "dotted", "hmat-fem-nd" = "longdash", "chameleon" = "longdash", "mumps" = "solid", "pastix" = "dashed", "mumps/spido" = "solid", "pastix/spido" = "dashed", "mumps/hmat" = "dotted", "hmat/hmat" = "longdash", "rm_peak" = "solid", "vm_peak" = "dotted", "hdd_peak" = "longdash", "assembly_estimation" = "solid", "peak_symmetric_factorization" = "dotted", "peak_non_symmetric_factorization" = "longdash", "schur_estimation" = "dashed" )
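Since these three presets are plain named vectors keyed by legend entry, the aesthetic attached to a given entry can be inspected directly; a few hypothetical lookups follow.

## Hypothetical lookups in the shared aesthetic presets defined above.
colors["mumps/spido"]    # "#95C11F"
shapes[["hmat-fem-nd"]]  # 23 (filled diamond)
linetypes[["pastix"]]    # "dashed"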
Furthermore, each plot object uses a set of common theme layers provided by the function
generate_theme. If needed, the latter allows us to provide custom breaks for color, point shape or line type aesthetics, legend title, the number of lines allowed in the legend as well as the legend box placement.generate_theme <- function( color_breaks = waiver(), color_labels = waiver(), color_override_aes = list(), shape_breaks = waiver(), shape_labels = waiver(), shape_override_aes = list(), linetype_breaks = waiver(), linetype_labels = waiver(), linetype_override_aes = list(), legend_title = element_text(family = "CMU Serif", size = 14, face = "bold"), legend_text = element_text(family = "CMU Serif", size = 14), theme_text = element_text(family = "CMU Serif", size = 16), legend_rows = 1, legend_rows_color = NA, legend_rows_fill = NA, legend_rows_shape = NA, legend_rows_linetype = NA, legend_box = "vertical", legend_position = "bottom" ) { return(list( scale_color_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_fill_manual( values = colors, na.value = "#384257", labels = color_labels, breaks = color_breaks ), scale_shape_manual( values = shapes, labels = shape_labels, breaks = shape_breaks, na.translate = F ), scale_linetype_manual( values = linetypes, labels = linetype_labels, breaks = linetype_breaks ), guides( shape = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_shape), legend_rows, legend_rows_shape ), byrow = TRUE, override.aes = shape_override_aes ), linetype = guide_legend( order = 1, nrow = ifelse( is.na(legend_rows_linetype), legend_rows, legend_rows_linetype ), byrow = TRUE, override.aes = linetype_override_aes ), color = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_color), legend_rows, legend_rows_color ), byrow = TRUE, override.aes = color_override_aes ), fill = guide_legend( order = 2, nrow = ifelse( is.na(legend_rows_fill), legend_rows, legend_rows_fill ), byrow = TRUE ) ),
We set the font family to Computer Modern Serif in order to match the font family used in LaTeX typeset PDF documents. Then, we set the font size to 16 points and the legend text size to 14 points.
theme_bw(), theme( text = theme_text, legend.text = legend_text, legend.title = legend_title,
We place the legend at the desired position and surround it with a rectangle.
legend.position = legend_position, legend.background = element_rect(color = "gray40", size = 0.5),We place legend items either side by side or one by line.
legend.box = legend_box,
For better visibility, we make legend symbols longer.
legend.key.width = unit(3, "line"),
Finally, we rotate the X-axis text to avoid long labels overlapping.
axis.text.x = element_text(angle = 60, hjust = 1) ) )) }
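Below is a hedged sketch of how generate_theme is combined with a plot object; the small data frame is made up for the example and we assume ggplot2 as well as the presets and label functions above are loaded.

## Hypothetical data; solver names are chosen to match the preset keys.
df <- data.frame(
  nbpts     = c(1e5, 2e5, 1e5, 2e5),
  tps_facto = c(120, 310, 95, 240),
  solver    = c("spido", "spido", "mumps", "mumps")
)
p <- ggplot(df, aes(x = nbpts, y = tps_facto, color = solver,
                    shape = solver, linetype = solver)) +
  geom_line() +
  geom_point(size = 2.5) +
  generate_theme(
    color_labels    = label_solver,
    shape_labels    = label_solver,
    linetype_labels = label_solver,
    legend_rows     = 2
  )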
4.1.5. Plot functions
In performance analysis, the metrics we study are in general the same. So, we prepared a series of plot functions for each type of data visualization. This way, we can for example reuse the same function to plot disk usage peaks by the problem's unknown count for any combination of solvers.
times_by_nbpts
This function returns a plot of factorization and solve times relative to the linear system's unknown count for a given set of either dense or sparse solvers.
We combine both metrics in one plot in which they are distinguished thanks to a facet grid. However, to be able to use the latter, we need to convert the data frame from wide format into long format (see Section 4.1.2) RwideToLong.
times_by_nbpts <- function(dataframe) { dataframe_long <- gather( dataframe, key = "computation_phase", value = "time", c("tps_facto", "tps_solve") )
Now, we can begin to define the resulting ggplot2 plot object. The column names featured in this plotting function are:
- nbpts, the linear system's unknown count,
- tps_facto, representing factorization time in seconds,
- tps_solve, representing solve time in seconds,
- desired_accuracy, representing the solution accuracy threshold \(\epsilon\),
- computation_phase, telling whether the row contains factorization or solve time (used in the long-formatted data frame dataframe_long),
- time, containing factorization or solve time (used in the long-formatted data frame dataframe_long).
plot <- ggplot(dataframe_long, aes(x = nbpts, y = time))
In case the function is used to plot benchmark results of a solver using various data compression levels, we add a color aesthetic in order to distinguish among different sets of points.
if(!all(is.na(dataframe_long$desired_accuracy))) { plot <- ggplot( dataframe_long, aes( x = nbpts, y = time, color = as.character(desired_accuracy), shape = solver, linetype = solver ) ) }
For better readability, we use log10 scale on both axes.
plot <- plot + geom_line() + geom_point(size = 2.5) + scale_x_continuous( "# Unknowns (\U1D441)", trans = "log10", breaks = dataframe_long$nbpts, labels = scientific ) + scale_y_continuous( "Computation time [s]", trans = "log10", labels = scientific ) +
Finally, using a facet grid, we distinguish between factorization and solve time horizontally and between each solver considered vertically. The plot theme is provided by the generate_theme function (see Section 4.1.4.3).
facet_grid( . ~ computation_phase, labeller = labeller(computation_phase = phase.labs) ) + labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, shape_labels = label_solver, linetype_labels = label_solver ) return(plot) }
multisolve_times_by_nbpts_and_nbrhs
This function returns a plot of factorization time relative to the linear system's unknown count for different counts of right-hand sides used during the Schur complement computation when relying on the two-stage multi-solve implementation scheme. Using the ooc switch, we can include a facet grid to distinguish between runs with and without out-of-core Schur complement computation. The distributed switch adapts the output figure for multi-node benchmarks if set to TRUE. y_break_width allows one to customize the distance between Y-axis breaks. By default, it corresponds to 20% of the maximum Y-axis value, i.e. the maximum value of the tps_facto column.
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts, the linear system's unknown count,
- tps_facto, representing factorization time in seconds,
- tps_solve, representing solve time in seconds,
- processes, giving the number of MPI processes used for the computation,
- p_units, giving the total number of threads used for the computation,
- cols_schur, giving the number of columns in a Schur complement block,
- solver, containing the names of the solvers featured in the coupling.
Note that we create an additional column
p_labels. It contains the legend labels for the color aesthetics combining the values ofprocessesandp_units, e.g.2 (48 threads)for 2 nodes and 48 threads in total.multisolve_times_by_nbpts_and_nbrhs <- function(dataframe, ooc = FALSE, distributed = FALSE, y_break_width = as.integer( 0.2 * max( dataframe$tps_facto ))) { dataframe$cols_schur_labels <- paste0( "(", dataframe$coupled_nbrhs, ", ", dataframe$cols_schur, ")" ) cbreaks <- unique(na.omit(dataframe$cols_schur_labels)) cbreaks <- cbreaks[order( as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)), as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] if(distributed) { dataframe$p_labels <- paste0( dataframe$processes, " (", dataframe$p_units, " threads)" ) plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = p_labels, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe, aes( x = nbpts, y = tps_facto, color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe$nbpts, labels = scientific ) + scale_y_continuous( "Factorization time [s]", breaks = breaks_width(y_break_width), labels = scientific )If the
oocswitch is set toTRUE, include a facet grid based on thedense_ooccolumn values.if(ooc) { plot <- plot + facet_grid( . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are automatically determined based on the values of the
cols_schurcolumn.if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(dataframe$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver, legend_rows_color = 2 ) } return(plot) }
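A hedged usage sketch follows: results stands for a hypothetical data frame produced by gcvb_import and gcvb_format (see Sections 4.1.2 and 4.1.3), and the output file name is a placeholder.

## Hypothetical call; `results` is assumed to come from gcvb_format().
multisolve <- subset(results, coupled_method == "multi-solve")
p <- multisolve_times_by_nbpts_and_nbrhs(multisolve, ooc = TRUE)
ggsave("multisolve-times-by-nbpts-and-nbrhs.pdf", plot = p,
       width = 10, height = 6)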
multisolve_rss_by_nbpts_and_nbrhs
This function returns a plot of real memory (RAM) usage peaks relative to the linear system's unknown count for different counts of right-hand sides used during the Schur complement computation when relying on the two-stage multi-solve implementation scheme. The function can take two extra arguments, limit_max and tick_values, allowing one to redefine the default Y-axis maximum and tick values respectively. Using the ooc switch, we can include a facet grid to distinguish between runs with and without out-of-core Schur complement computation as well as between RAM and disk space consumption.
The column names featured in this plotting function are:
- nbpts, the linear system's unknown count,
- rm_peak, real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- hdd_peak, hard drive usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- processes, giving the number of MPI processes used for the computation,
- p_units, giving the total number of threads used for the computation,
- cols_schur, giving the number of columns in a Schur complement block,
- solver, containing the names of the solvers featured in the coupling.
Note that we create an additional column
p_labels. It contains the legend labels for the color aesthetics combining the values ofprocessesandp_units, e.g.2 (48 threads)for 2 nodes and 48 threads in total.We begin by defining the plot object directly.
multisolve_rss_by_nbpts_and_nbrhs <- function(dataframe, limit_max = 126, tick_values = c( 0, 30, 60, 90, 120), ooc = FALSE, distributed = FALSE ) { df <- dataframe df$cols_schur_labels <- paste0( "(", df$coupled_nbrhs, ", ", df$cols_schur, ")" ) cbreaks <- unique(na.omit(df$cols_schur_labels)) cbreaks <- cbreaks[order(as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\1", cbreaks)),as.integer(gsub("\\(([0-9]+), ([0-9]+)\\)", "\\2", cbreaks)))] y_axis_label <- "Real memory (RAM) usage peak [GiB]" if(distributed) { df$p_labels <- paste0( df$processes, " (", df$p_units, " threads)" ) plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = p_labels, shape = solver, linetype = solver ) ) } else if(ooc) { rss_kinds <- c("rm_peak", "hdd_peak") df <- gather( df, key = "rss_kind", value = "rss_peak", all_of(rss_kinds) ) df$rss_kind <- factor(df$rss_kind, levels = rss_kinds) y_axis_label <- "Storage usage peak [GiB]" plot <- ggplot( df, aes( x = nbpts, y = rss_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } else { plot <- ggplot( df, aes( x = nbpts, y = rm_peak / 1024., color = as.character(cols_schur_labels), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = df$nbpts, labels = scientific ) + scale_y_continuous( y_axis_label, labels = function(label) sprintf("%.0f", label), limits = c(NA, limit_max), breaks = tick_values )If the
oocswitch is set toTRUE, include a facet grid based on therss_kindand thedense_ooccolumn values.if(ooc) { plot <- plot + facet_grid( rss_kind ~ dense_ooc, labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are automatically determined based on the values of the
cols_schurcolumn.if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(df$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(bold(group("(", list(n[italic(c)], n[italic(S)]), ")"))) ) + generate_theme( color_breaks = cbreaks, shape_labels = label_solver, linetype_labels = label_solver, legend_rows_color = 2 ) } return(plot) }
multisolve_memory_aware
This function returns a plot of factorization time relative to real memory (RAM) usage peaks during the Schur complement computation when relying on the two-stage multi-solve implementation scheme. In this case, we show the results for a fixed problem size, i.e. my_nbpts. The narrow switch allows one to optimize the output for a multi-column document layout.
The column names featured in this plotting function are:
- tps_facto, representing factorization time in seconds,
- rm_peak, real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- solver, containing the names of the solvers featured in the coupling.
We begin by shrinking the data frame on input according to
my_nbpts.multisolve_memory_aware <- function(dataframe, my_nbpts, narrow = FALSE, font = "CMU Serif", fontsize = 16) { dataframe_largest <- subset( dataframe, coupled_method == "multi-solve" & nbpts == my_nbpts )
Then, we create an additional column in the data frame to hold prettified labels providing the Schur complement computation phase parameter values.
dataframe_largest$label_content <- ifelse( dataframe_largest$solver != "mumps/spido", paste0( "atop(n[c]==", dataframe_largest$coupled_nbrhs, ", ", "n[S]==", dataframe_largest$cols_schur, ")" ), paste0("n[c]==", dataframe_largest$coupled_nbrhs) )
Now, we define the plot object. We also need to find the minimal and the maximal values for the axes (columns
rm_peakandtps_facto).rm_min <- as.integer(min(dataframe_largest$rm_peak) / 1024.) rm_max <- as.integer(max(dataframe_largest$rm_peak) / 1024.) tps_min <- as.integer(min(dataframe_largest$tps_facto)) tps_max <- as.integer(max(dataframe_largest$tps_facto)) plot <- ggplot( dataframe_largest, aes( x = rm_peak / 1024., y = tps_facto, color = coupled_method, label = label_content, shape = solver, linetype = solver ) ) + geom_line(size = 0.8) + geom_point(size = 2.5) + coord_cartesian(clip = "off") +
We make use of the
geom_label_repel function from the ggrepel R package to obtain a better label layout.
geom_label_repel( parse = TRUE, xlim = c(NA, NA), ylim = c(NA, NA), color = "black", family = font, size = 3, min.segment.length = 0, max.overlaps = Inf ) +
The X-axis shows the RAM usage peaks and the Y-axis shows the computation time. We use custom break and limit values for better readability.
scale_x_continuous( "Random access memory (RAM) usage peak [GiB]", breaks = seq( rm_min - 5, rm_max + 5, by = 5 ), limits = c(NA, NA), trans = "log10" ) + scale_y_continuous( "Computation time [s]", trans = "log10", labels = scientific ) +Finally, we set the legend title, apply the custom theme and return the plot object. Note that we do not show the legend for the color aesthetics as there is only one item, the multi-solve implementation scheme. However, we still define the color aesthetics in order to preserve the general color scheme.
labs( shape = "Solver coupling", linetype = "Solver coupling" ) if(narrow) { plot <- plot + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors["multi-solve"]), linetype_labels = label_solver, linetype_override_aes = list(color = colors["multi-solve"]), legend_rows = 2, legend_box = "horizontal", legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } else { plot <- plot + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors["multi-solve"]), linetype_labels = label_solver, linetype_override_aes = list(color = colors["multi-solve"]), legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } plot <- plot + guides( color = FALSE, fill = FALSE ) return(plot) }multifacto_memory_aware
This function returns a plot of factorization time relative to real memory (RAM) usage peaks during the Schur complement computation when relying on the two-stage multi-factorization implementation scheme. In this case, we show the results for a fixed problem size, i.e. my_nbpts. The narrow switch allows one to optimize the output for a multi-column document layout.
The column names featured in this plotting function are:
- tps_facto, representing factorization time in seconds,
- rm_peak, real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- solver, containing the names of the solvers featured in the coupling.
We begin by shrinking the data frame on input according to
my_nbpts.multifacto_memory_aware <- function(dataframe, my_nbpts, narrow = FALSE, font = "CMU Serif", fontsize = 16) { dataframe_largest = subset( dataframe, coupled_method == "multi-facto" & nbpts == my_nbpts )
Before plotting, we also need to:
- convert the n_blocks column from string to numeric type and use it to create a corresponding prettified label column,
dataframe_largest$n_blocks = as.numeric(dataframe_largest$n_blocks) dataframe_largest$label_content = paste0( "n[b]==", dataframe_largest$n_blocks )
- convert the values of rm_peak to gibibytes (GiB),
dataframe_largest$rm_peak = as.numeric(dataframe_largest$rm_peak / 1024.)
- order the rows of the data frame based on the n_blocks column for the data points to be correctly connected by line segments on the final plot.
dataframe_largest <- dataframe_largest[order(dataframe_largest$n_blocks),]Then, we define the plot object. We also need to find the minimal and the maximal values for the axes (columns
rm_peakandtps_facto).rm_min = as.integer(min(dataframe_largest$rm_peak)) rm_max = as.integer(max(dataframe_largest$rm_peak)) tps_min = as.integer(min(dataframe_largest$tps_facto)) tps_max = as.integer(max(dataframe_largest$tps_facto)) plot = ggplot( dataframe_largest, aes( x = rm_peak, y = tps_facto, color = coupled_method, label = label_content, shape = solver, linetype = solver ) ) + geom_path() + geom_point(size = 2.5) + coord_cartesian(clip = "off") +We make use of the
geom_label_repel function from the ggrepel R package to obtain a better label layout.
geom_label_repel( parse = TRUE, xlim = c(NA, NA), ylim = c(NA, NA), color = "black", family = font, size = 3, min.segment.length = 0, max.overlaps = Inf ) +
The X-axis shows the RAM usage peaks and the Y-axis shows the computation time. We use custom break and limit values for better readability.
scale_x_continuous( "Random access memory (RAM) usage peak [GiB]", breaks = seq( rm_min - 5, rm_max + 5, by = 5 ), limits = c(NA, NA), trans = "log10" ) + scale_y_continuous( "Computation time [s]", trans = "log10", labels = scientific ) +Finally, we set the legend title, apply the custom theme and return the plot object. Note that we do not show the legend for the color aesthetics as there is only one item, the multi-factorization implementation scheme. However, we still define the color aesthetics in order to preserve the general color scheme.
labs( shape = "Solver coupling", linetype = "Solver coupling" ) if(narrow) { plot <- plot + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors["multi-facto"]), linetype_labels = label_solver, linetype_override_aes = list(color = colors["multi-facto"]), legend_rows = 2, legend_box = "horizontal", legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } else { plot <- plot + generate_theme( shape_labels = label_solver, shape_override_aes = list(color = colors["multi-facto"]), linetype_labels = label_solver, linetype_override_aes = list(color = colors["multi-facto"]), legend_title = element_text( family = font, size = fontsize - 2, face = "bold" ), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } plot <- plot + guides( color = FALSE, fill = FALSE ) return(plot) }multifacto_times_by_nbpts_and_schur_size
This function returns a plot of factorization time relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data to the same column, size_schur.
multifacto_times_by_nbpts_and_schur_size <- function( dataframe, ooc = FALSE, distributed = FALSE, y_break_width = as.integer( 0.2 * max( dataframe$tps_facto ))) { local <- dataframe local$size_schur[local$solver == "mumps/spido"] = local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts, the linear system's unknown count,
- tps_facto, representing factorization time in seconds,
- n_blocks, giving the count of blocks the Schur complement matrix was split into,
- size_schur, giving the out-of-core block size in number of matrix rows or columns,
- solver, containing the names of the solvers featured in the coupling.
if(distributed) { local$p_labels <- paste0( local$processes, " (", local$p_units, " threads)" ) plot <- ggplot( local, aes( x = nbpts, y = tps_facto, color = p_labels, shape = solver, linetype = solver ) ) } else { plot <- ggplot( local, aes( x = nbpts, y = tps_facto, color = as.character(n_blocks), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = local$nbpts, labels = scientific ) + scale_y_continuous( "Factorization time [s]", breaks = breaks_width(y_break_width), labels = scientific )If the
oocswitch is set toTRUE, include a facet grid based on thedense_ooccolumn values.if(ooc) { plot <- plot + facet_grid( . ~ dense_ooc, labeller = labeller(dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are automatically determined based on the values of the n_blocks column (or the p_labels column in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(n[italic(b)]) ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
multifacto_rss_by_nbpts_and_schur_size
This function returns a plot of real memory (RAM) usage peaks relative to the linear system's unknown count and the number of blocks \(n_b\) the Schur submatrix is split into during the Schur complement computation phase when relying on the two-stage multi-factorization implementation scheme. The function can take two extra arguments, limit_max and tick_values, allowing one to redefine the default Y-axis maximum and tick values respectively.
The out-of-core block size column is not the same for all the solver couplings we consider (column disk_block_size in the case of MUMPS/SPIDO and schur_size in the case of MUMPS/HMAT), so we need to copy the data to the same column, size_schur.
multifacto_rss_by_nbpts_and_schur_size <- function(dataframe, limit_max = 126, tick_values = c( 0, 30, 60, 90, 120), ooc = FALSE, distributed = FALSE ) { local <- dataframe local$size_schur[local$solver == "mumps/spido"] = local$disk_block_size[local$solver == "mumps/spido"]
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts, the linear system's unknown count,
- rm_peak, real memory usage peaks, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- n_blocks, giving the count of blocks the Schur complement matrix was split into,
- size_schur, giving the out-of-core block size in number of matrix rows or columns,
- solver, containing the names of the solvers featured in the coupling.
if(distributed) { local$p_labels <- paste0( local$processes, " (", local$p_units, " threads)" ) plot <- ggplot( local, aes( x = nbpts, y = rm_peak / 1024., color = as.character(p_labels), shape = solver, linetype = solver ) ) } else if(ooc) { local <- gather( local, key = "rss_kind", value = "rss_peak", c("rm_peak", "hdd_peak") ) plot <- ggplot( local, aes( x = nbpts, y = rss_peak / 1024., color = as.character(n_blocks), shape = solver, linetype = solver ) ) } else { plot <- ggplot( local, aes( x = nbpts, y = rm_peak / 1024., color = as.character(n_blocks), shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = local$nbpts, labels = scientific ) + scale_y_continuous( "Real memory (RAM) usage peak [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, limit_max), breaks = tick_values )If the
oocswitch is set toTRUE, include a facet grid based on therss_kindand thedense_ooccolumn values.if(ooc) { plot <- plot + facet_grid( rss_kind ~ dense_ooc, labeller = labeller(rss_kind = label_storage, dense_ooc = label_ooc) ) }
Finally, we set the legend title, apply the custom theme and return the plot object. The color breaks are automatically determined based on the values of the n_blocks column (or the p_labels column in the distributed case).
if(distributed) { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = "# Nodes" ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$p_labels)))), shape_labels = label_solver, linetype_labels = label_solver ) } else { plot <- plot + labs( shape = "Solver coupling", linetype = "Solver coupling", color = expression(n[italic(b)]) ) + generate_theme( color_breaks = as.character(sort(unique(na.omit(local$n_blocks)))), shape_labels = label_solver, linetype_labels = label_solver ) } return(plot) }
compare_coupled
This function returns a plot of the best factorization times relative to the linear system's unknown count for each implementation scheme for solving coupled systems.
For example, if we want to plot only the results regarding two-stage schemes, we can use the legend_rows argument to adjust the number of rows in the legend. Then, the shorten argument allows one to remove legend titles and use shorter legend key labels in order to reduce the final canvas size.
We begin by restricting the input data frame to the best results of the two-stage and single-stage schemes.
Note that as the latter implies the use of only one solver and not a coupling of solvers, we need to specify the coupling method name manually to enable data frame treatment.
compare_coupled <- function(dataframe, legend_rows = 3, shorten = FALSE, check_epsilon = FALSE, font = "CMU Serif", fontsize = 16) { dataframe_best <- dataframe dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method) dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <- "full-hmat" dataframe_best <- merge( aggregate( tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best ), dataframe_best )
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts, the linear system's unknown count,
- tps_facto, representing factorization time in seconds,
- coupled_method, giving the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver, containing the names of the solvers featured in the coupling.
plot <- NA if(check_epsilon) { plot <- ggplot( dataframe_best, aes( x = nbpts, y = error, color = coupled_method, shape = solver, linetype = solver ) ) } else { plot <- ggplot( dataframe_best, aes( x = nbpts, y = tps_facto, color = coupled_method, shape = solver, linetype = solver ) ) } plot <- plot + geom_line() + geom_point(size = 2.5)
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the computation time.
if(check_epsilon) { vjust <- -0.6 size <- 5 if(shorten) { vjust <- -0.3 size <- 4 } plot <- plot + geom_hline(yintercept = 1.0e-3) + geom_text( aes( 1.0e+6, 1.0e-3, label = as.character(expression(epsilon == 10^{-3})), vjust = vjust, hjust = 0 ), parse = TRUE, color = "black", family = font, size = size ) } plot <- plot + scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = scientific ) if(check_epsilon) { plot <- plot + scale_y_continuous( expression(paste("Relative error (", E[rel], ")")), limits = c(min(dataframe_best$error), 2.0e-3), labels = scientific, breaks = breaks_log(10), trans = "log10" ) } else { plot <- plot + scale_y_continuous( "Factorization time [s]", labels = scientific, trans = "log10", limits = c(NA, NA) ) }
Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are automatically determined based on the values of the
coupled_methodandsolvercolumns.labs( shape = "Solvers", linetype = "Solvers", color = "Implementation\nscheme" ) if(shorten) { plot <- plot + generate_theme( color_breaks = unique(na.omit(dataframe_best$coupled_method)), color_labels = label_coupling_short, color_override_aes = list(linetype = 0, shape = 15, size = 6), shape_breaks = unique(na.omit(dataframe_best$solver)), shape_labels = label_solver, linetype_breaks = unique(na.omit(dataframe_best$solver)), linetype_labels = label_solver, legend_rows = legend_rows, legend_box = "horizontal", legend_title = element_blank(), legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } else { plot <- plot + generate_theme( color_breaks = unique(na.omit(dataframe_best$coupled_method)), color_labels = label_coupling, color_override_aes = list(linetype = 0, shape = 15, size = 8), shape_breaks = unique(na.omit(dataframe_best$solver)), shape_labels = label_solver, linetype_breaks = unique(na.omit(dataframe_best$solver)), linetype_labels = label_solver, legend_rows = legend_rows, legend_box = "horizontal", legend_text = element_text(family = font, size = fontsize - 2), theme_text = element_text(family = font, size = fontsize) ) } return(plot) }compare_rss_coupled
This function returns a plot of the real memory (RAM) usage peaks associated with the best factorization times, relative to the linear system's unknown count, for each implementation scheme for solving coupled systems.
We begin by restricting the input data frame to the best results of the two-stage schemes and the results of the single-stage scheme implemented using HMAT.
Note that, as the latter involves only a single solver, we need to set the coupling method name manually so that the data frame can be processed uniformly.
compare_rss_coupled <- function(dataframe) { dataframe_best <- dataframe dataframe_best$coupled_method <- as.character(dataframe_best$coupled_method) dataframe_best$coupled_method[dataframe_best$solver == "hmat/hmat"] <- "full-hmat" dataframe_best <- merge( aggregate( tps_facto ~ coupled_method + nbpts + solver, min, data = dataframe_best ), dataframe_best )
Now, we can begin to define the plot object. The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- tps_facto: the factorization time in seconds,
- rm_peak: the real memory usage peak, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- coupled_method: the name of the approach used to solve coupled FEM/BEM systems (e.g. multi-solve),
- solver: the names of the solvers featured in the coupling.
plot <- ggplot( dataframe_best, aes( x = nbpts, y = rm_peak / 1024., color = coupled_method, shape = solver, linetype = solver ) ) + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the RAM usage peaks.
scale_x_continuous( "# Unknowns (\U1D441)", breaks = dataframe_best$nbpts, labels = scientific ) + scale_y_continuous( "Real memory (RAM) usage peak [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120) ) +Finally, we set the legend title, apply the default theme and return the plot object. The color, shape and line type breaks are automatically determined based on the values of the
coupled_methodandsolvercolumns.labs( shape = "Solvers", linetype = "Solvers", color = "Implementation\nscheme" ) + generate_theme( color_breaks = length(unique(na.omit(dataframe_best$coupled_method))), color_labels = label_coupling, shape_breaks = length(unique(na.omit(dataframe_best$solver))), shape_labels = label_solver, linetype_breaks = length(unique(na.omit(dataframe_best$solver))), linetype_labels = label_solver, legend_rows = 3, legend_box = "horizontal" ) return(plot) }accuracy_by_nbpts
This function returns a plot of the relative solution error with respect to the linear system's unknown count.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- error: the relative forward error of the solution approximation.
accuracy_by_nbpts <- function(dataframe) { plot <- ggplot( dataframe, aes( x = nbpts, y = error, color = as.character(desired_accuracy), shape = solver, linetype = solver ) ) + geom_line() + geom_point(size = 2.5) + scale_x_continuous( "# Unknowns (\U1D441)", trans = "log10", breaks = dataframe$nbpts, labels = scientific ) +
For the Y-axis, we define the tick values manually to allow for an easier comparison with the associated relative error values.
scale_y_continuous( expression(paste("Relative error (", E[rel], ")")), limits = c(min(dataframe$error), max(dataframe$error)), breaks = c(1e-13, 1e-10, 1e-6, 1e-3), labels = scientific, trans = "log10" ) +
Finally, we set the legend labels, apply the common theme parameters and return the plot object.
labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, shape_labels = label_solver, linetype_labels = label_solver ) return(plot) }
rss_peaks_by_nbpts
This function returns a plot of real memory (RAM) and disk space usage peaks relative to the linear system's unknown count. The function can take two extra arguments, limit_max and tick_values, allowing one to redefine the default Y-axis maximum and tick values, respectively.
The column names featured in this plotting function are:
- nbpts: the linear system's unknown count,
- rm_peak: the real memory usage peak, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- hdd_peak: the disk space usage peak, stored in MiB but converted to GiB.
First, we convert the data frame from wide to long format.
rss_peaks_by_nbpts <- function(dataframe, limit_max = 126, tick_values = c(0, 30, 60, 90, 120)) { dataframe_long <- gather( dataframe, key = "memory_type", value = "memory_usage", c("rm_peak", "hdd_peak") )
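As a quick, self-contained illustration of this wide-to-long conversion (the numbers are made up and the snippet is not tangled), every input row yields one output row per storage metric, the former column name being stored in memory_type and the value in memory_usage:

library(tidyr)
# Toy input in wide format: one run with both storage peaks in MiB.
wide <- data.frame(nbpts = 1.0e+6, rm_peak = 51200, hdd_peak = 10240)
# Gather the two peak columns into key/value pairs.
long <- gather(wide, key = "memory_type", value = "memory_usage",
               c("rm_peak", "hdd_peak"))
print(long) # columns: nbpts, memory_type, memory_usage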
Also, we fix the order of the values in memory_type so that real memory (RAM) usage peaks come before disk usage peaks in the facet grid.
dataframe_long$memory_type <- factor( dataframe_long$memory_type, c('rm_peak', 'hdd_peak') )
Then, we configure the plot object itself as described above and handle the different levels of expected relative solution error, in the case of solvers using data compression, by adding the shape aesthetics.
plot <- ggplot( dataframe_long, aes( x = nbpts, y = memory_usage / 1024., color = as.character(desired_accuracy), shape = solver, linetype = solver ) ) + geom_line() + geom_point(size = 2.5) +
The X-axis shows the count of unknowns in the linear system and the Y-axis shows the storage usage peaks. On the Y-axis, we round the values to 0 decimal places.
scale_x_continuous( "# Unknowns (\U1D441)", trans = "log10", breaks = dataframe_long$nbpts, labels = scientific ) + scale_y_continuous( "Storage usage peak [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, limit_max), breaks = tick_values ) +Storage types are distinguished using a facet grid.
facet_grid( . ~ memory_type, labeller = labeller(memory_type = label_storage) ) +
We end by setting legend titles, applying the common theme parameters and returning the plot object.
labs(color = "\U1D700", shape = "Solver", linetype = "Solver") + generate_theme( color_labels = label_epsilon, shape_labels = label_solver, linetype_labels = label_solver ) return(plot) }
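As an illustrative usage sketch (the results variable and the axis bounds are hypothetical), one could lower the Y-axis ceiling for a machine with less memory as follows:

# Hypothetical call: cap the Y-axis at 64 GiB with ticks every 16 GiB.
plot <- rss_peaks_by_nbpts(results, limit_max = 64, tick_values = c(0, 16, 32, 48, 64))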
rss_by_time
This function returns a plot of real memory (RAM) usage relative to the execution time. The dataframe argument represents the data frame containing RAM consumption data read using the read_rss function (see Section 4.1.2.2.2). Moreover, using limit_max and tick_values, one can redefine the default Y-axis maximum and tick values, respectively, to improve the readability of the plot. The timeline argument is useful when dataframe contains coupled FEM/BEM benchmark results. It represents an additional data frame that can be used to highlight selected computation phases as well as the estimated reference RAM usage peaks of a sparse FEM factorization and the estimated Schur matrix size in the plot (see Section 4.1.2.3).
The column names featured in this plotting function are:
- time: a timestamp in seconds giving the time elapsed since the beginning of the execution,
- rss: the global real memory usage in mebibytes (MiB) but converted to gibibytes (GiB).
We begin directly by defining the plot object.
rss_by_time <- function(dataframe, limit_max = 126, tick_values = c(0, 30, 60, 90, 120), estimated_peak_factorization = NA, estimated_schur = NA, timeline = NA) { plot <- ggplot( dataframe, aes( x = timestamp, y = consumption, fill = coupled_method ) ) +
We draw the RAM usage as an area chart.
geom_area()
If the extra timeline data frame is provided, we draw additional labelled vertical lines corresponding to selected computation phases using the geom_vline and geom_text layers. Note that for the geom_text layer, we also need to adjust the text position and rotation. To enable the processing of the labels as math expressions, we use the parse parameter.
if(is.data.frame(timeline)) { plot <- plot + geom_vline( data = timeline, aes(xintercept = time), linetype = "dotted", size = 0.4 ) + geom_text( data = timeline, aes( x = middle, y = 0.75 * limit_max, label = label ), parse = T, angle = 90, vjust = "center" )
Then, we also draw the horizontal segments corresponding to the estimated RAM usage peaks of a sparse FEM factorization and to the estimated Schur matrix size.
However, we first have to convert the timeline data frame to the long format (see Section 4.1.2). This allows us to use different line types to distinguish the segments based on the kind of value they represent.
columns <- c("assembly_estimation", "peak_symmetric_factorization", "peak_non_symmetric_factorization", "schur_estimation") timeline_long <- gather( timeline, key = "peak_kind", value = "peak_value", columns ) plot <- plot + geom_spoke( data = timeline_long, aes( x = middle - 0.5 * (time - middle), y = peak_value, linetype = peak_kind, radius = time - middle ), angle = 2 * pi ) }
Next, we set up the axes and their scaling. We round the Y-axis values to zero decimal places.
plot <- plot + scale_x_continuous("Execution time [s]") + scale_y_continuous( "Real memory (RAM) usage [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, limit_max), breaks = tick_values )
We end by configuring the visual aspect of the plot.
if(is.data.frame(timeline)) { plot <- plot + labs( fill = "Implementation scheme", linetype = "Estimated peak usage" ) + generate_theme( color_labels = label_coupling, linetype_labels = label_peak_kind, legend_rows_linetype = length(columns) ) } else { plot <- plot + labs(fill = "Implementation scheme") + generate_theme(color_labels = label_coupling) } return(plot) }
eprofile
This function returns a plot of power consumption relative to execution time. The es_data argument corresponds to the data frame containing power consumption data as well as tag information, if any (see Section 4.1.2.2.4). The optional zoom_on, TDP, rss and flops arguments allow one to restrict the plotted time range, draw a horizontal line at the processor's thermal design power, and enable or disable the storage usage and flop rate curves, respectively.
eprofile <- function(es_data, zoom_on = c(NA, NA), TDP = NA, rss = TRUE, flops = TRUE) { zoomed <- FALSE solvers <- unique(na.omit(es_data$solver)) es <- es_data if(!is.na(zoom_on[1]) && !is.na(zoom_on[2])) { zoomed <- TRUE es <- subset( es, timestamp >= zoom_on[1] * 1000 & timestamp <= zoom_on[2] * 1000 ) } edata <- subset(es, kind_general == "e") power <- ggplot( data = subset(edata, is.na(type)) ) sdata <- NA storage <- NA if(rss) { sdata <- subset(es, kind_general == "s") sdata$kind <- factor(sdata$kind, levels = c("sram", "shdd")) storage <- ggplot( data = subset(sdata, is.na(type)) ) } fdata <- NA if(flops) { fdata <- subset(es, kind_general == "f") fdata$kind <- factor(fdata$kind, levels = c("flops")) }
First, we draw the power consumption curve. The tags are represented by colored points surrounded with a black stroke and annotated with labels. Note that the latter are stored as math expressions and need to be parsed by the geom_label_repel function.
if(flops) { power <- power + geom_line( data = fdata, aes(x = timestamp, y = consumption / 1000, color = kind) ) } power <- power + geom_line(aes(x = timestamp, y = consumption, color = kind)) + geom_point( data = subset(edata, !is.na(type) & kind == "ecpu"), aes(x = timestamp, y = consumption, shape = tag), color = "black", fill = "white", size = 3 ) + geom_label_repel( data = subset(edata, !is.na(type) & kind == "ecpu"), aes(x = timestamp, y = consumption, label = type_label), force_pull = 0, segment.color = "black", min.segment.length = 0, max.overlaps = 20, family = "CMU Serif", parse = TRUE, show.legend = FALSE ) if(rss) { storage <- storage + geom_line(aes(x = timestamp, y = consumption, color = kind)) + geom_point( data = subset(sdata, !is.na(type)), aes(x = timestamp, y = consumption, shape = tag), color = "black", fill = "white", size = 3 ) + geom_label_repel( data = subset(sdata, !is.na(type)), aes(x = timestamp, y = consumption, label = type_label), force_pull = 0, segment.color = "black", min.segment.length = 0, max.overlaps = 20, family = "CMU Serif", parse = TRUE, show.legend = FALSE ) }
If the TDP parameter is provided, we add a horizontal line matching its value.
if(!is.na(TDP)) { power <- power + geom_hline(yintercept = TDP) + geom_text( aes( 0, TDP, label = paste("TDP:", as.character(TDP), "W"), vjust = 2, hjust = 0 ), color = "black", family = "CMU Serif", size = 5 ) }
Next, we set up the axes and their scaling. We round the Y-axis values to zero decimal places.
power <- power + scale_x_continuous( "Execution time [s]", labels = function(label) sprintf("%d", as.integer(label / 1000)), limits = c(NA, NA), breaks = breaks_extended(n = 10) ) if(flops) { coeff <- 0.2 if(zoomed) { coeff <- 1.1 } power <- power + scale_y_continuous( "Power consumption [W]", labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(0.25 * max(edata$consumption)) ), sec.axis = sec_axis( ~ . * 1, name = "Flop rate [Gflop/s]", labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(coeff * (max(fdata$consumption) / 1000)) ) ) ) } else { power <- power + scale_y_continuous( "Power consumption [W]", labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(0.25 * max(edata$consumption)) ) ) } if(rss) { y_axis_title <- ifelse( length(unique(na.omit(sdata$kind))) < 2, "RAM usage [GiB]", "Storage usage [GiB]" ) storage <- storage + scale_x_continuous( "Execution time [s]", labels = function(label) sprintf("%d", as.integer(label / 1000)), limits = c(NA, NA), breaks = breaks_extended(n = 10) ) + scale_y_continuous( y_axis_title, labels = function(label) sprintf("%.0f", label), breaks = breaks_width( as.integer(0.2 * max(sdata$consumption)) ) ) }
We end by configuring the visual aspect of the plot. We use a facet grid to draw a separate plot for each solver coupling.
if("coupled_method" %in% colnames(es_data)) { power <- power + facet_grid( . ~ coupled_method + solver, labeller = labeller( coupled_method = label_coupling, solver = label_solver ), scales = "free_x", switch = "y" ) } else { power <- power + facet_grid( . ~ solver, labeller = labeller( solver = label_solver ), scales = "free_x", switch = "y" ) } if(rss) { storage <- storage + facet_grid( . ~ solver, labeller = labeller( solver = label_solver ), scales = "free_x", switch = "y" ) + labs( shape = "Computation phase" ) } power <- power + generate_theme( color_labels = label_kind, color_override_aes = list(size = 8), shape_labels = label_tag, legend_rows_shape = 1, legend_box = "horizontal" ) if(rss) { power <- power + labs( color = "Device" ) + guides( fill = "none", shape = "none", linetype = "none" ) + theme( axis.title.x = element_blank() ) storage <- storage + generate_theme( shape_labels = label_tag, legend_rows_shape = 1, legend_box = "horizontal" ) + guides(color = "none", fill = "none") + theme( strip.text.x = element_blank(), strip.background.x = element_blank() ) return( plot_grid( power, storage, nrow = 2, ncol = 1, align = "v", axis = "b", rel_heights = c(1.1, 1) ) ) } power <- power + labs( color = "Device", shape = "Computation phase" ) + guides( fill = "none" ) return(power) }
es_comparison
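This function compares the benchmarked solvers in terms of total energy consumption, combined factorization and solve time, and real memory (RAM) usage peak, relative to the linear system's unknown count. For each run, the total energy consumption is read from the associated eprofile.txt file retrieved with gcvb_retrieve_files, and the three bar charts are assembled into a single figure using plot_grid.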
es_comparison <- function(dataframe) { extended <- dataframe extended$etotal <- as.numeric(0) for(i in 1:nrow(extended)) { gcvb_retrieve_files(extended[i, ]) eprofile <- fromJSON(file = "./eprofile.txt") extended[i, "etotal"] <- as.numeric( eprofile$data$data$tags$es_total$`joule(J)` ) } extended$solver <- ifelse( extended$solver == "mumps" & extended$error < 10e-12, "mumps-blr", extended$solver ) postamble <- generate_theme( color_labels = label_solver, legend_rows = 2, legend_title = element_blank() ) energy <- ggplot( data = extended, aes( x = nbpts, y = etotal, fill = solver ) ) + geom_bar(position = "dodge", stat = "identity") + scale_x_continuous( "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific ) + scale_y_continuous( "Energy consumption [J]", labels = scientific ) + postamble + guides(fill = "none") + theme(axis.title.x = element_blank()) time <- ggplot( data = extended, aes( x = nbpts, y = tps_facto + tps_solve, fill = solver ) ) + geom_bar(position = "dodge", stat = "identity") + scale_x_continuous( "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific ) + scale_y_continuous( "Computation time [s]" ) + postamble rss <- ggplot( data = extended, aes( x = nbpts, y = rm_peak / 1024., fill = solver ) ) + geom_bar(position = "dodge", stat = "identity") + scale_x_continuous( "# Unknowns (\U1D441)", breaks = extended$nbpts, labels = scientific ) + scale_y_continuous( "RAM usage peak [GiB]" ) + postamble + guides(fill = "none") + theme(axis.title.x = element_blank()) return( plot_grid( energy, time, rss, nrow = 1, ncol = 3, align = "h", axis = "b", rel_widths = c(1, 1.15, 1) ) ) }
scalability
This function returns a plot of factorization and solve computation times relative to the available processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- tps_facto: the factorization time in seconds,
- tps_solve: the solve time in seconds,
- mapping: the mapping configuration of MPI processes.
Then, the data frame is converted to long format (see Section 4.1.2) so that we are able to combine the times of both phases using a facet grid. Also, we force the order of the facet labels for the solver column by redefining the corresponding factor levels.
scalability <- function(dataframe) { dataframe_long <- gather( subset(dataframe, !is.na(dataframe$nbpts)), key = "computation_phase", value = "time", c("tps_facto", "tps_solve") ) dataframe_long$solver <- factor( dataframe_long$solver, levels = unique(na.omit(dataframe_long$solver)) ) plot <- ggplot( dataframe_long, aes( x = p_units, y = time, group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
As there may be a considerable scale difference between factorization and solve time values, we use a log10 scale on the Y-axis so that both metrics can share the same axis without losing readability.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe_long$p_units) ) + scale_y_continuous( "Computation time [s]", labels = scientific, trans = "log10" ) +Finally, using facet grid we distinguish between factorization and solve times on horizontal and between each solver considered on vertical.
For the sake of readability, we specify that scales on Y-axis can be different. It is useful when multiple solvers are considered having each considerably different time values.
Also, before returning the plot we apply the common theme parameters. For scalability and for parallel efficiency plots, we split the legend items to two lines as the labels are too long.
facet_grid( computation_phase ~ solver + nbpts, labeller = labeller( solver = label_solver, computation_phase = phase.labs, nbpts = label_nbpts ), scales = "free" ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }scalability_ram
This function is a variation of scalability (see Section 4.1.5.15) and returns a plot of real memory (RAM) usage peaks relative to the available processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- rm_peak: the real memory usage peak, stored in mebibytes (MiB) but converted to gibibytes (GiB),
- mapping: the mapping configuration of MPI processes.
scalability_ram <- function(dataframe) { plot <- ggplot( dataframe, aes( x = p_units, y = rm_peak / 1024., group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) +
Note that we round the values on the Y-axis to zero decimal places and define custom axis limits and breaks adapted to this type of plot.
scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe$p_units) ) + scale_y_continuous( "Real memory (RAM) usage peaks [GiB]", labels = function(label) sprintf("%.0f", label), limits = c(NA, 126), breaks = c(0, 30, 60, 90, 120) ) +In this case, we use using facet grid exclusively as a mean to better annotate the resulting plot.
facet_grid( . ~ solver + nbpts, labeller = labeller( solver = label_solver, nbpts = label_nbpts ) ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }efficiency
This function returns a plot of parallel efficiency, i.e. the percentage of computational resource usage, relative to the available processor core count.
The column names featured in this plotting function are:
- p_units: the count of cores,
- efficiency_facto: the factorization phase efficiency,
- efficiency_solve: the solve phase efficiency,
- mapping: the mapping configuration of MPI processes.
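For reference, although the exact computation of the efficiency columns is defined elsewhere in this document, parallel efficiency is conventionally obtained as \(E(c) = T(1) / (c \cdot T(c))\), where \(T(c)\) denotes the computation time of a phase on \(c\) cores. An efficiency of 1 thus corresponds to ideal scaling, which is why the function draws a horizontal reference line at 1.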
The function itself follows the exact same logic as the scalability function (see Section 4.1.5.15).
efficiency <- function(dataframe) { dataframe_long <- gather( subset(dataframe, !is.na(dataframe$nbpts)), key = "computation_phase", value = "efficiency", c("efficiency_facto", "efficiency_solve") ) dataframe_long$solver <- factor( dataframe_long$solver, levels = unique(na.omit(dataframe_long$solver)) ) plot <- ggplot( dataframe_long, aes( x = p_units, y = efficiency, group = mapping, color = mapping ) ) + geom_line() + geom_point(size = 2.5, shape = 16) + geom_hline(yintercept = 1) + scale_x_continuous( "# Cores (\U1D450)", breaks = (dataframe_long$p_units) ) +
There is only one additional step, i.e. converting the efficiency values \(x\), with \(0 \leq x \leq 1\), to percentages using the label_percentage label function (see Section 4.1.4.1). Also, we fix our own Y-axis tick values (breaks).
scale_y_continuous( "Parallel efficiency [%]", breaks = c(0.0, 0.2, 0.4, 0.6, 0.8, 1.0), labels = label_percentage ) + facet_grid( computation_phase ~ solver + nbpts, labeller = labeller( solver = label_solver, computation_phase = phase.labs, nbpts = label_nbpts ) ) + labs(color = "Parallel\nconfiguration") + generate_theme( color_breaks = c("node", "socket", "numa", "core"), color_labels = label_mapping, legend_rows = 4 ) return(plot) }
See the complete R script file plot.R 18.
4.2. StarPU execution traces
To visualize StarPU execution traces in FXT format, we also make use of R, the ggplot2 package and the dedicated StarVZ library starvz16,starvz18.
To configure the output plot produced by StarVZ, we use a YAML configuration
file. We let the cosmetic options take their default values, except for the base
font size (see the base_size value), and we override those specifying what kind
of information shall be included in the resulting plots. In our case, we want to
plot only the StarPU worker statistics (items st and starpu).
default:
base_size: 14
starpu:
active: TRUE
legend: TRUE
ready:
active: FALSE
submitted:
active: FALSE
st:
active: TRUE
Showing the critical path time bound is not necessary.
cpb: FALSE
On the other hand, we want to display all the labels, the legend and the application time span.
labels: "ALL"
legend: TRUE
makespan: TRUE
To reduce the size of the resulting plot, we enable the aggregation of visualized tasks in a time step.
aggregation:
active: TRUE
method: "static"
step: 500
See the complete configuration file starvz.yaml 19.
The StarVZ plotting setup begins by determining the path to the latest gcvb
benchmark session directory.
output = try( system("ls -t ~/benchmarks/results | head -n 1", intern = TRUE) ) latest_session_path = paste( "~/benchmarks/results", output, sep = "/" )
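As an aside, the same result can be obtained without shelling out. The following is only a sketch of a pure-R alternative and not what the document tangles:

# Pure-R sketch: pick the most recently modified entry of the results directory.
results_dir <- path.expand("~/benchmarks/results")
entries <- list.files(results_dir, full.names = TRUE)
latest_session_path <- entries[which.max(file.info(entries)$mtime)]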
Then, we import the required R libraries and set the starvz_config variable
holding the path to the YAML configuration file of the StarVZ package.
library(ggplot2) library(starvz) library(grid) starvz_config = "starvz.yaml"
4.2.1. Generating plots
Finally, we generate the plots. To do this, we reuse the code above via
the :noweb reference <<starvz-header>>:
- execution traces of the HMAT solver on a BEM system,
<<starvz-header>> hmat_bem_node = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-bem-node-1x4-25000", sep = "/" ), starvz_config ) ) hmat_bem_socket = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-bem-socket-2x2-25000", sep = "/" ), starvz_config ) ) hmat_bem_core = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-bem-core-4x1-25000", sep = "/" ), starvz_config ) ) hmat_bem_node = hmat_bem_node + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_bem_node$scales$scales[[1]]$range = c(NA, 44000) hmat_bem_node$scales$scales[[1]]$limits = c(NA, 44000) hmat_bem_node$theme$legend.position = "none" hmat_bem_node$theme$axis.title.x = element_blank() hmat_bem_node$theme$axis.text.x = element_blank() hmat_bem_node$theme$axis.ticks.x = element_blank() hmat_bem_socket = hmat_bem_socket + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_bem_socket$scales$scales[[1]]$range = c(NA, 44000) hmat_bem_socket$scales$scales[[1]]$limits = c(NA, 44000) hmat_bem_socket$theme$legend.position = "none" hmat_bem_socket$theme$axis.title.x = element_blank() hmat_bem_socket$theme$axis.text.x = element_blank() hmat_bem_socket$theme$axis.ticks.x = element_blank() hmat_bem_core = hmat_bem_core + theme_bw() + theme( text = element_text(family = "Arial", size = 16), legend.text = element_text(family = "Arial", size = 14) ) hmat_bem_core$scales$scales[[1]]$range = c(NA, 44000) hmat_bem_core$scales$scales[[1]]$limits = c(NA, 44000) hmat_bem_core$theme$legend.title = element_blank() hmat_bem_core$theme$legend.position = "bottom" grid.newpage() grid.draw( rbind( ggplotGrob(hmat_bem_node), ggplotGrob(hmat_bem_socket), ggplotGrob(hmat_bem_core), size = "max" ) )
- execution traces of the HMAT solver on a FEM system.
<<starvz-header>> hmat_fem_node = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-fem-node-1x4-50000", sep = "/" ), starvz_config ) ) hmat_fem_socket = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-fem-socket-2x2-50000", sep = "/" ), starvz_config ) ) hmat_fem_core = panel_st_runtime( starvz_read( paste( latest_session_path, "fxt-scalability-hmat-fem-core-4x1-50000", sep = "/" ), starvz_config ) ) hmat_fem_node = hmat_fem_node + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_fem_node$scales$scales[[1]]$range = c(NA, 101000) hmat_fem_node$scales$scales[[1]]$limits = c(NA, 101000) hmat_fem_node$theme$legend.position = "none" hmat_fem_node$theme$axis.title.x = element_blank() hmat_fem_node$theme$axis.text.x = element_blank() hmat_fem_node$theme$axis.ticks.x = element_blank() hmat_fem_socket = hmat_fem_socket + theme_bw() + theme(text = element_text(family = "Arial", size = 16)) hmat_fem_socket$scales$scales[[1]]$range = c(NA, 101000) hmat_fem_socket$scales$scales[[1]]$limits = c(NA, 101000) hmat_fem_socket$theme$legend.position = "none" hmat_fem_socket$theme$axis.title.x = element_blank() hmat_fem_socket$theme$axis.text.x = element_blank() hmat_fem_socket$theme$axis.ticks.x = element_blank() hmat_fem_core = hmat_fem_core + theme_bw() + theme( text = element_text(family = "Arial", size = 16), legend.text = element_text(family = "Arial", size = 14) ) hmat_fem_core$scales$scales[[1]]$range = c(NA, 101000) hmat_fem_core$scales$scales[[1]]$limits = c(NA, 101000) hmat_fem_core$theme$legend.title = element_blank() hmat_fem_core$theme$legend.position = "bottom" grid.newpage() grid.draw( rbind( ggplotGrob(hmat_fem_node), ggplotGrob(hmat_fem_socket), ggplotGrob(hmat_fem_core), size = "max" ) )
Note that we do not tangle the source code in this section. Each figure can be generated by executing the associated code block from within the Emacs editor.
5. Appendix
5.1. Example standard output of a test_FEMBEM execution
Testing : with MPF matrices.
***
*** mpf version (commit ID )
*** Built on Jan 1 1970 at 00:00:01
***
[mpf] MPI thread level provided = MPI_THREAD_MULTIPLE
[mpf] OpenMP thread number = 24
[mpf] MKL thread number = 24
[mpf] MPF_MAX_MEMORY = 64324 Mb
0 : [ test_FEMBEM@miriel060.plafrim.cluster is online... ]
tmpdir=/tmp/felsoci/p-spido-25000 pid=4614
[mpf] The direct solver has been selected
[mumps] Version : 5.2.1
*** 10 Command Line Parameters not read by MPFinit :
*** -nbrhs 50 -z -radius 2 -height 4 --bem -nbpts 25000
***
*** scab version (commit ID )
*** Built on Jan 1 1970 at 00:00:01
*** Submit bug reports at https://imacs.polytechnique.fr/bugzilla/enter_bug.cgi
***
[Scab] CPUID detection: SSE2=1 AVX=1 AVX2=1
[Scab] D01 structure stored out-of-core
(forced by globalTmp=0, nodeTmp=1 or D01SingleReader=1)
*** 10 Command Line Parameters not read by SCAB_Init :
*** -nbrhs 50 -z -radius 2 -height 4 --bem -nbpts 25000
***
*** test_FEMBEM for scab version (commit ID )
*** Built on Jan 1 1970 at 00:00:01
*** Submit bug reports at https://imacs.polytechnique.fr/bugzilla/enter_bug.cgi
***
Testing: BEM (dense matrix).
Reading nbPts = 25000
Reading radius = 2.000000
Reading height = 4.000000
Reading nbRHS = 50
Setting lambda = 0.448399 (with 10.000000 points per wavelength)
Double Complex
Setting sparseRHS = 71
<PERFTESTS> MPF_Version =
<PERFTESTS> TEST_FEMBEM_Version =
<PERFTESTS> HMAT_Version = 2019.1.0
<PERFTESTS> ScalarType = DOUBLE COMPLEX
Number of points for the test = 25000
Number of right hand sides for the test = 50
<PERFTESTS> StepMesh = 4.483993e-02
<PERFTESTS> NbPts = 25000
<PERFTESTS> NbPtsBEM = 25000
<PERFTESTS> NbRhs = 50
<PERFTESTS> nbPtLambda = 1.000000e+01
<PERFTESTS> Lambda = 4.483993e-01
Computing RHS...
**** Computing Classical product...
BEM: 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done.
<PERFTESTS> TpsCpuClassic = 1.653491
Size of thread block : [358 x 358]
Size of proc block : [3572 x 3572]
Size of disk block : [3572 x 3572]
MatAssembly_SPIDO : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done.
Assembly done. Making the matrix symmetric...
<PERFTESTS> TpsCpuMPFCreation = 6.426428
VecAssembly_EMCP2 : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done.
VecAssembly_EMCP2 : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done.
MatFactorLDLt_SPIDO : 0% ......... 10% ......... 20% ......... 30% ......... 40% ......... 50% ......... 60% ......... 70% ......... 80% ......... 90% ......... 100% Done.
MIN = 2.4028e+00 MAX = 3.5494e+00 COND = 1.4772e+00
<PERFTESTS> TpsCpuFactoMPF = 85.717232
<PERFTESTS> TpsCpuSolveMPF = 5.684338
**** Comparing results...
<PERFTESTS> Error = 6.8930e-15
test_FEMBEM : end of computation
Writing tracing report in traceCall.log
Writing tracing report in traceCall.json
Bibliography
- [Knuth84] Knuth, Literate Programming, Comput. J., 27(2), 97–111 (1984). link. doi.
- [Dominik18] Dominik, The Org Mode 9.1 Reference Manual, 12th Media Services (2018).
- [emacs] GNU Emacs: an extensible, customizable, free/libre text editor. https://www.gnu.org/software/emacs/
- [OrgTangle] Org mode documentation: Extracting source code. https://orgmode.org/manual/Extracting-Source-Code.html
- [OrgEval] Org mode documentation: Evaluating source code. https://orgmode.org/manual/Evaluating-Code-Blocks.html
- [OrgExport] Org mode documentation: Exporting. https://orgmode.org/manual/Exporting.html
- [guix] GNU Guix software distribution and transactional package manager. https://guix.gnu.org
- [Starpu2010] Augonnet, Thibault & Namyst, StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, INRIA, (2010).
- [OpenBLAS] OpenBLAS, an optimized BLAS library. https://www.openblas.net/
- [MKL] Intel(R) Math Kernel Library. https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html
- [Falco2019] Falco, Aurélien, Combler l'écart entre H-Matrices et méthodes directes creuses pour la résolution de systèmes linéaires de grandes tailles, PhD thesis, Université de Bordeaux (2019). https://tel.archives-ouvertes.fr/tel-02183902
- [curta] Cluster Curta, Mésocentre de Calcul Intensif Aquitain. https://www.mcia.fr/projects/cluster-curta
- [gcvb] gcvb: (g)enerate (c)ompute (v)alidate (b)enchmark, a Python 3 module aiming at facilitating non-regression, validation and benchmarking of simulation codes. https://github.com/felsocim/gcvb
- [slurm] Slurm workload manager. https://slurm.schedmd.com
- [slurmGuide] Quick start user's guide for Slurm. https://slurm.schedmd.com/quickstart.html
- [ggplot2] ggplot2, a system for declaratively creating graphics. https://ggplot2.tidyverse.org/
- [sqliteLike] The LIKE, GLOB, REGEXP, and MATCH operators, SQL Language Expressions, SQLite. https://www.sqlite.org/lang_expr.html#like
- [RwideToLong] Cookbook for R: Converting data between wide and long format. http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/
- [starvz16] Pinto, Stanisic, Legrand, Schnorr, Thibault & Danjean, Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach, in: 2016 Third Workshop on Visual Performance Analysis (VPA), 17-24 (2016).
- [starvz18] Garcia Pinto, Mello Schnorr, Stanisic, Legrand, Thibault & Danjean, A visual performance analysis framework for task-based parallel applications running on hybrid clusters, Concurrency and Computation: Practice and Experience, 30(18), e4472 (2018). link. doi.