perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
                170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # specified simple
5.988
---------------------

SUBSYSTEM
---------

'sched'::
Scheduler and IPC mechanisms.

'syscall'::
System call performance (throughput).

'mem'::
Memory access performance.

'numa'::
NUMA scheduling and MM benchmarks.

'futex'::
Futex stressing benchmarks.

'epoll'::
Eventpoll (epoll) stressing benchmarks.

'internals'::
Benchmark internal perf functionality.

'uprobe'::
Benchmark overhead of uprobe + BPF.

'all'::
All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Be multi-threaded instead of multi-process.

-g::
--group=::
Specify number of groups.

-l::
--nr_loops=::
Specify number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default
options (20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

-G::
--cgroups=::
Names of cgroups for sender and receiver, separated by a comma.
This is useful to check cgroup context switching overhead.
Note that perf neither creates nor deletes the cgroups, so users should
make sure that the cgroups exist and are accessible before use.

Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec

% perf bench sched pipe -G AAA,BBB
(executing 1000000 pipe operations between cgroups)
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 6.886 [sec]

       6.886208 usecs/op
         145217 ops/sec

---------------------

SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating performance of core system call throughput (both usecs/op and ops/sec metrics).
This uses a single thread simply doing getppid(2), which is a simple syscall where the result is not
cached by glibc.

SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-p::
--page::
Specify page-size for mapping memory buffers (default: 4KB).
Available values are 4KB, 2MB, 1GB (case insensitive).

-k::
--chunk::
Specify the chunk-size for each invocation (default: 0, or full-extent).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-p::
--page::
Specify page-size for mapping memory buffers (default: 4KB).
Available values are 4KB, 2MB, 1GB (case insensitive).

-k::
--chunk::
Specify the chunk-size for each invocation (default: 0, or full-extent).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*mmap*::
Suite for evaluating memory subsystem performance for mmap()'d memory.

Options of *mmap*
^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-p::
--page::
Specify page-size for mapping memory buffers (default: 4KB).
Available values are 4KB, 2MB, 1GB (case insensitive).

-r::
--randomize::
Specify seed to randomize page access offset (default: 0, or not randomized).

-f::
--function::
Specify function to set (default: all).
Available functions are 'demand' and 'populate', with the first
demand faulting pages in the region and the second using an eager
mapping.

-l::
--nr_loops::
Repeat mmap() invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]
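The figures printed by 'perf bench sched pipe' in the 'default' format appear to be related by simple arithmetic: the operation count times usecs/op gives the total time, and the operation count divided by the total time gives ops/sec. The following is a minimal sketch of that relationship, not part of perf. The helper names `parse_sched_pipe` and `derive` are hypothetical, the regular expressions only target the sample text shown in the examples on this page, and truncating ops/sec to an integer is an assumption that happens to match those samples.

```python
import re

# Hypothetical helper (not part of perf): pull the three figures out of the
# 'default' style text shown in the 'sched pipe' examples on this page.
def parse_sched_pipe(text):
    nr_ops = int(re.search(r"(\d+) pipe operations", text).group(1))
    usecs_per_op = float(re.search(r"([\d.]+) usecs/op", text).group(1))
    ops_per_sec = int(re.search(r"(\d+) ops/sec", text).group(1))
    return nr_ops, usecs_per_op, ops_per_sec

# Hypothetical helper: recompute total time and throughput from the
# operation count and the per-operation latency.
def derive(nr_ops, usecs_per_op):
    total_sec = nr_ops * usecs_per_op / 1e6
    # Assumption: the printed ops/sec is truncated, not rounded.
    ops_per_sec = int(nr_ops / total_sec)
    return total_sec, ops_per_sec

# Sample text copied from the 'default' format example above.
sample = """\
(executing 1000000 pipe operations between two tasks)
Total time:5.855 sec
5.855061 usecs/op
170792 ops/sec
"""

nr_ops, usecs_per_op, reported = parse_sched_pipe(sample)
total_sec, derived = derive(nr_ops, usecs_per_op)
print(f"{total_sec:.3f} sec, {derived} ops/sec (reported: {reported})")
```

For scripted use, the 'simple' format prints a single number and avoids this kind of parsing altogether.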