perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe # with no style specified
(executing 1000000 pipe operations between two tasks)
Total time:5.855 sec
5.855061 usecs/op
170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe # specified simple
5.988
---------------------

SUBSYSTEM
---------

'sched'::
Scheduler and IPC mechanisms.

'syscall'::
System call performance (throughput).

'mem'::
Memory access performance.

'numa'::
NUMA scheduling and MM benchmarks.

'futex'::
Futex stressing benchmarks.

'epoll'::
Eventpoll (epoll) stressing benchmarks.

'internals'::
Benchmark internal perf functionality.

'uprobe'::
Benchmark overhead of uprobe + BPF.

'all'::
All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use multiple threads instead of multiple processes.

-g::
--group=::
Specify number of groups.

-l::
--nr_loops=::
Specify number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

Total time:0.308 sec

% perf bench sched messaging -t -g 20 # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

Total time:0.582 sec
---------------------

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

-G::
--cgroups=::
Names of cgroups for sender and receiver, separated by a comma.
This is useful to check cgroup context switching overhead.
Note that perf neither creates nor deletes the cgroups, so users should
make sure that the cgroups exist and are accessible before use.


Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

Total time:8.091 sec
8.091833 usecs/op
123581 ops/sec

% perf bench sched pipe -l 1000 # loop 1000
(executing 1000 pipe operations between two tasks)

Total time:0.016 sec
16.948000 usecs/op
59004 ops/sec

% perf bench sched pipe -G AAA,BBB
(executing 1000000 pipe operations between cgroups)
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

Total time: 6.886 [sec]

6.886208 usecs/op
145217 ops/sec

---------------------

SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating performance of core system call throughput (both usecs/op and ops/sec metrics).
This uses a single thread simply doing getppid(2), which is a simple syscall where the result is not
cached by glibc.


SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]
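
SCRIPTING EXAMPLE
-----------------
The 'simple' format described under COMMON OPTIONS is meant for scripts, so a small consumer may help illustrate it. The following is a minimal sketch, not part of perf itself. It assumes that the simple-format output of 'sched pipe' is a single number giving the total time in seconds, as in the sample above, and that `perf` is on the PATH when `run_sched_pipe` is called. The function names `parse_simple`, `per_op_metrics` and `run_sched_pipe` are hypothetical helpers introduced only for this illustration.

```python
import subprocess


def parse_simple(text):
    """Return the first non-blank, non-comment line of the output as a float.

    Assumption: '--format=simple' prints the total time in seconds on a
    line of its own, as in the sample output in this document.
    """
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return float(line)
    raise ValueError("no numeric line found in benchmark output")


def per_op_metrics(total_sec, nr_ops):
    """Derive the usecs/op and ops/sec figures shown by the default format."""
    usecs_per_op = total_sec * 1e6 / nr_ops
    ops_per_sec = nr_ops / total_sec
    return usecs_per_op, ops_per_sec


def run_sched_pipe(loops=1000000):
    """Run 'perf bench sched pipe' in simple format and return total seconds.

    Requires perf to be installed; not invoked on import.
    """
    result = subprocess.run(
        ["perf", "bench", "--format=simple", "sched", "pipe", "-l", str(loops)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_simple(result.stdout)
```

For instance, feeding the default-format sample above (5.855061 seconds for 1000000 operations) into `per_op_metrics` gives about 5.855061 usecs/op and about 170792 ops/sec, matching the figures in that sample.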