GitHub Repository: csc-training/csc-env-eff
Path: blob/master/_slides/SRTFiles/10_speed_up_jobs_script.md
---
theme: csc-2019
lang: en
---

Not fast enough? How HPC can help. {.title}

![](https://mirrors.creativecommons.org/presskit/buttons/88x31/png/by-sa.png)
All material (C) 2020-2021 by CSC – IT Center for Science Ltd. This work is licensed under a **Creative Commons Attribution-ShareAlike** 4.0 International License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creativecommons.org/licenses/by-sa/4.0/)

:::info (speech)

:::

The purpose of large computers

  • Typically large computers, like those at CSC, are not faster than others - they are just bigger.

    • For fast computation they utilize parallelism (and typically have special disk and memory solutions, too)

  • Parallelism simplified:

    • You use hundreds of ordinary computers simultaneously to solve a single problem.

:::info (speech)

:::

First steps for fast jobs (1/2)

  • Spend a little time to investigate:

    • Which of all the available software would be best to solve the kind of problem you have?

  • Consider:

    • The software that is the fastest to solve your problem might not always be the best.

      • Issues like ease-of-use and compute-power/memory demands are also highly relevant.

    • Quite often it is useful to start simple and gradually use more complex approaches if needed.

:::info (speech)

:::

First steps for fast jobs (2/2)

  • When you have found the software you want to use, check whether it is available at CSC as a pre-installed, optimized version: docs.csc.fi/apps

    • Spend some time getting familiar with the software's user manual, if available.

  • If you can't find suitable software, consider writing your own code.

:::info (speech)

:::

Optimize the performance of your own code (1/2)

  • If you have constructed your own code, compile it with optimizing compiler options.

  • Construct a small and quick test case and run it in the test queue

    • Docs: Queue options.

    • Use the test case to optimize computations before starting massive ones.
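As a sketch, the steps above might look like the following on a Slurm-based system such as Puhti. The project number, program name, input file, and compiler flags are placeholders; check docs.csc.fi for the actual partition names and limits.

```shell
#!/bin/bash
# test_job.sh - sketch of a small, quick test case for the test queue.
# (project number, program, and input file are placeholders)
#SBATCH --account=project_2001234
#SBATCH --partition=test
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --mem=1G

# The program is assumed to be compiled with optimizing options, e.g.:
#   gcc -O2 -march=native my_program.c -o my_program
srun ./my_program small_input.dat
```

Submit with `sbatch test_job.sh`; once the test case behaves as expected, scale the resource requests up for the real runs.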

:::info (speech)

:::

Optimize the performance of your own code (2/2)

  • Use profiling tools to find out how much time is spent in different parts of the code

  • When the computing bottlenecks are identified, try to figure out ways to improve the code.

    • Again, [email protected] is a channel to ask for help. The more concretely the problem is described, the better.
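As one concrete example, gprof from the GNU toolchain is a common starting point for profiling self-built code (CSC systems also provide other profilers; see docs.csc.fi). The program and input names below are placeholders:

```shell
# Rebuild with profiling instrumentation enabled
gcc -O2 -pg my_program.c -o my_program

# Run the small test case; this writes profile data to gmon.out
./my_program small_input.dat

# Show how much time was spent in each function
gprof ./my_program gmon.out | head -40
```

The flat profile at the top of the output usually points directly at the functions worth optimizing first.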

:::info (speech)

:::

Running your software

  • It is not only how your software is constructed and compiled that affects performance.

  • It can also be run in different ways.

:::info (speech)

:::

HPC parallel jobs

:::info (speech)

:::

Running in parallel

  • A code is typically parallelized with MPI and/or OpenMP standards. They can be run in several different ways.

    • Can you split your work into smaller, fully independent, bits and run them simultaneously?

    • Can you automate setting up, running and analysing your array jobs?

    • Can your software utilize GPUs?
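Fully independent bits of work map naturally onto a Slurm array job; here is a sketch (account, partition, program, and input naming are placeholders):

```shell
#!/bin/bash
#SBATCH --account=project_2001234   # placeholder project
#SBATCH --partition=small
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --array=1-100               # 100 fully independent subtasks

# Slurm sets SLURM_ARRAY_TASK_ID to 1..100, one value per subtask,
# so each subtask picks its own input file
srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat
```

One `sbatch` submission then queues all 100 subtasks, and the scheduler runs them whenever cores are free.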

:::info (speech)

:::

What is MPI?

  • MPI (and OpenMP too!) are widely used standards for writing software that runs in parallel.

  • MPI (Message Passing Interface) is a standard that utilizes compute cores that do not share their memory

    • It passes data-messages back and forth between the cores.
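In a batch script, MPI parallelism shows up as the number of tasks; a sketch (program name is a placeholder, and the node size of 40 cores matches Puhti):

```shell
#SBATCH --ntasks=80     # 80 MPI processes, each with its own private memory
#SBATCH --nodes=2       # spread over two 40-core nodes; messages cross the node boundary

srun ./my_mpi_program   # srun launches all 80 ranks at once
```

Because the ranks do not share memory, they can be spread over as many nodes as needed; the message passing is what ties them together.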

:::info (speech)

:::

What is OpenMP?

  • OpenMP (Open Multi-Processing) is a standard that utilizes compute cores that share memory

    • They do not need to send messages between each other.

  • Basically OpenMP is easier for beginners, but problems quickly arise with so-called 'race conditions'.

    • A race condition appears when different compute cores process and update the same data without proper synchronization.
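Because the threads must share memory, an OpenMP job stays within a single node. A sketch of the corresponding batch settings (program name is a placeholder):

```shell
#SBATCH --nodes=1            # shared memory: all threads on the same node
#SBATCH --ntasks=1           # one process...
#SBATCH --cpus-per-task=8    # ...with 8 cores for 8 threads

# Tell the program how many threads to start
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_openmp_program
```

Contrast this with MPI, where the core count is requested through `--ntasks` instead of `--cpus-per-task`.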

:::info (speech)

:::

Self study materials for OpenMP and MPI

  • There are many tutorials available on the internet.

    • Look with simple searches for e.g. 'MPI tutorial'.

  • Check the documented exercise material and model answers from the CSC course "Introduction to Parallel Programming"

:::info (speech)

:::

Task farming - running multiple independent jobs simultaneously

  • Task farming means that you have a set of, more or less, similar jobs that can be run fully independently of each other.

  • Such jobs are most easily run as so-called array jobs.

    • Individual tasks should take at least 30 minutes - otherwise you are generating too much overhead

    • For such very short tasks, there is likely a more efficient solution

  • If running your jobs becomes slightly more complex, with e.g. some minor dependencies, workflows can be used.

:::info (speech)

:::

Task farming 2.0

  • Task farming can be combined with e.g. OpenMP to speed up the subjobs,

  • and on top of those with MPI to run several jobs in parallel.

    • In this setup you'd have three layers of parallelization: array-MPI-OpenMP

    • Setting this up takes skill and time

    • Always test your setup - a typo can result in a lot of lost resources
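The three layers show up as three separate Slurm settings; a sketch with purely illustrative numbers (program and input names are placeholders):

```shell
#SBATCH --array=1-10        # layer 1: 10 independent subjobs
#SBATCH --ntasks=4          # layer 2: 4 MPI processes per subjob
#SBATCH --cpus-per-task=8   # layer 3: 8 OpenMP threads per MPI process

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_hybrid_program input_${SLURM_ARRAY_TASK_ID}.dat
```

Here each subjob uses 4 × 8 = 32 cores, and all 10 subjobs together can occupy 320 cores - which is exactly why a typo in any one of the three layers can waste a lot of resources.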

:::info (speech)

:::

Things to consider in task farming

  • In a big allocation each computing core should have work to do

    • If the separate jobs are very different, some will end before the others, and some cores will idle - wasting resources

    • A fix would be to use e.g. loops to lump really small and numerous jobs into fewer and bigger ones.

  • As always, try to estimate as exactly as possible the amount of memory and the time the separate runs take to finish.
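The lumping idea can be sketched in plain bash: instead of 100 one-minute array tasks, use 10 array tasks that each loop over 10 inputs. The program call is a placeholder, and the array index is hard-coded here for illustration (a real job would read `$SLURM_ARRAY_TASK_ID`):

```shell
#!/bin/bash
CHUNK=10
TASK_ID=3        # in a real array job: TASK_ID=$SLURM_ARRAY_TASK_ID

# Task 3 handles inputs 21..30
start=$(( (TASK_ID - 1) * CHUNK + 1 ))
end=$(( TASK_ID * CHUNK ))
for i in $(seq "$start" "$end"); do
    echo "processing input_${i}.dat"    # replace with: ./my_program input_${i}.dat
done
```

With `--array=1-10` instead of `--array=1-100`, each subtask now runs long enough to amortize the scheduling overhead.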

:::info (speech)

:::

GPUs can speed up jobs

  • GPUs, or Graphics Processing Units, are extremely powerful processors developed for graphics and gaming.

  • They can be used for science, but are often really tricky to program.

    • Only a small set of algorithms can use the full power of GPUs.

  • Check the software manual to see whether it can utilize GPUs.

  • Do not try to use GPUs, unless you know what you are doing.

:::info (speech)

:::

Tricks of the trade 1/4

  • It is reasonable to try to achieve best performance by using the fastest computers available. This is however far from the only important issue.

  • Different codes may give very different performance.

  • Before launching massive simulations, look for the most efficient algorithms to get the job done.

    • (examples on the next slide)

:::info (speech)

:::

Tricks of the trade 2/4

  • Well known boosters are:

    • Enhanced sampling methods in molecular dynamics (vs. brute force plain MD)

    • Bayesian Optimization Structure Search (BOSS, potential energy mapping)

  • When starting a new project, begin with small and fast test cases, and scale up gradually.

  • When using separate runs to scan a parameter space, start with a coarse scan, and improve resolution where needed.

  • Be careful in submitting large numbers of jobs before you know the results are really what you are looking for.

  • Try to use or implement so-called 'restart options' in your software, and always check results in between restarts.

:::info (speech)

:::

Tricks of the trade 3/4

  • Try to first formulate your scientific results when you have a minimum amount of computational results

    • it often helps to clarify what you still need to compute and what computations would be redundant.

    • and what results you need to store

  • Reserving more memory and more compute cores does not necessarily mean faster computations.

    • Check with seff, sacct and the logs whether the memory was used, and whether the job ran faster

  • Testing for optimal setup regarding compute cores and memory is good practice before performing massive computations.
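After a test run has finished, the check could look like this (the job id is a placeholder):

```shell
seff 1234567      # summary: CPU efficiency and memory utilization of the job
sacct -j 1234567 --format=JobID,Elapsed,ReqMem,MaxRSS,State
```

If `seff` reports low CPU efficiency, or `MaxRSS` is far below `ReqMem`, the next run can request fewer cores or less memory without slowing down.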

:::info (speech)

:::

Tricks of the trade 4/4

  • Running the same job on a laptop may be useful for comparison.

  • Avoid unnecessary reads and writes of data.

    • If you must read and write, do it in big chunks. Avoid reads/writes of huge numbers of small files. If these are necessary, use the NVMe disks in Puhti, not Lustre.

  • Don't run too short jobs.

    • There's a time-overhead in setting up a batch job. Aim for at least 30 minute jobs.

    • Also, don't run too short job steps. They will clutter Slurm accounting.

  • Don't run too long jobs.

    • The longer a job runs, the higher the chance of something going wrong, with the risk of losing time and results. A restart option saves you here.

:::info (speech)

:::