Linux profiling with performance counters

Perf (perf_events) is a performance analysis tool that is released and maintained together with the Linux kernel source. It can be used to analyze the performance of user programs as well as of the Linux kernel itself.

Introduction

Perf is a tool for analyzing program performance. It draws on the PMU, kernel tracepoints, and special counters in the kernel to summarize what a program is doing. With perf we can measure hardware events while a program runs, such as instructions retired and processor clock cycles, as well as software events, such as page faults and context switches.

These features give perf a wide range of uses. For example, you can compute the number of instructions executed per clock cycle, known as IPC; a low IPC means the code is not making good use of the CPU. Perf can also sample a program at function level to show where its bottlenecks are, and it can run benchmarks to measure scheduler performance. The event sources perf can use fall into these categories:

  • Hardware Events

  • Software Events

  • Kernel Tracepoint Events

  • User Statically-Defined Tracing (USDT)

  • Dynamic Tracing

  • Timed Profiling

Background

Many factors affect the performance of a program. For example, a program that does not make good use of the cache will run more slowly.

a. hardware feature - cache

Cache is built from SRAM and is much faster to access than main memory. Making full use of the cache is an important part of improving performance.

PMU = performance monitoring unit

PMCs = performance monitoring counters

PICs = performance instrumentation counters

b. pipeline, superscalar & out-of-order execution

One of the most effective ways to improve performance is parallelism. The CPU does some work in parallel in hardware, for example through pipelining, superscalar issue, and out-of-order execution.

IPC = instructions per cycle

CPI = cycles per instruction

SCPI = stalled cycles per instruction
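The three ratios above are simple quotients of raw counter values. A minimal sketch in C, assuming the counts have already been collected (for instance from perf stat output); the struct and function names are hypothetical and not part of perf:

```c
/* Hypothetical container for raw counter values (names are illustrative). */
struct cpu_counts {
    unsigned long long instructions;   /* instructions retired */
    unsigned long long cycles;         /* CPU clock cycles */
    unsigned long long stalled_cycles; /* cycles spent stalled */
};

/* IPC = instructions / cycles; returns 0 if no cycles were counted. */
double ipc(const struct cpu_counts *c)
{
    return c->cycles ? (double)c->instructions / (double)c->cycles : 0.0;
}

/* CPI = cycles / instructions, the reciprocal of IPC. */
double cpi(const struct cpu_counts *c)
{
    return c->instructions ? (double)c->cycles / (double)c->instructions : 0.0;
}

/* SCPI = stalled cycles / instructions. */
double scpi(const struct cpu_counts *c)
{
    return c->instructions ? (double)c->stalled_cycles / (double)c->instructions : 0.0;
}
```

For example, 2000 instructions over 1000 cycles with 500 stalled cycles gives IPC 2.0, CPI 0.5 and SCPI 0.25.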

c. PMU

The hardware contains a PMU (performance monitoring unit). It lets software program a counter for a particular hardware event, after which the CPU counts occurrences of that event. When the count exceeds the value configured for the counter, an interrupt is raised.
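To make the counting side concrete, here is a minimal sketch that programs one hardware counter through the perf_event_open system call and reads it back. It uses plain counting mode, not the overflow interrupt. The helper name count_loop_instructions is hypothetical, the code is Linux-only, and opening the counter can fail when the PMU is unavailable or access is restricted (for example by /proc/sys/kernel/perf_event_paranoid), in which case the function returns -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: count user-space instructions retired by a busy
 * loop of `iterations` steps. Returns -1 if the counter cannot be used. */
long long count_loop_instructions(unsigned long iterations)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1; /* count user space only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: measure the calling thread on any CPU. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    volatile unsigned long sink = 0; /* volatile keeps the loop alive */
    for (unsigned long i = 0; i < iterations; i++)
        sink += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = 0;
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = -1;
    close(fd);
    return count;
}
```

Calling count_loop_instructions(1000000) on a machine where the counter is accessible should report a count that grows with the iteration count; the perf command-line tool builds on the same system call.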

d. Tracepoints

Tracepoints are hooks placed in the Linux kernel source. Once a tracepoint is enabled, it fires whenever the code around it runs, and many tracing and debugging tools build on this feature. For example, to observe events in the memory management subsystem you can enable the tracepoints in the slab allocator; whenever the kernel reaches one of them, it notifies perf. Tracepoints are grouped by subsystem, for example:

block : block device I/O

ext4 : file system operations

kmem : kernel memory allocation events

random : kernel random number generator events

sched : CPU scheduler events

syscalls : system call enter and exits

task : task events

Dynamic tracing is not a stable interface: a probe may break after a kernel update or patch, so prefer static tracepoints where they exist. Dynamic tracing is still useful because it can instrument an already-running kernel or application.

Basic use of Perf

Install perf before using it. This requires Linux kernel source of version 2.6.31 or later. Enter the tools/perf directory of the source tree, then run make and make install.

perf list

You can use perf list to list all the events that perf can count or sample on.

Here is an example program to measure with perf.
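The listing for test.c is not reproduced in this text, so the following is only a stand-in: a small CPU-bound workload made of nested busy loops. The function names and loop counts are hypothetical. The sketch defines the workload only; to build it as test.c, add a one-line main that calls run_workload().

```c
/* Stand-in workload for test.c (names and loop counts are hypothetical). */

/* Inner busy loop; volatile stops the compiler from removing it. */
void longa(void)
{
    volatile int j = 0;
    for (int i = 0; i < 1000000; i++)
        j = i;
    (void)j;
}

/* Calls longa() 100 times and returns the number of calls made. */
int foo1(void)
{
    int calls = 0;
    for (int i = 0; i < 100; i++) {
        longa();
        calls++;
    }
    return calls;
}

/* Calls longa() 10 times and returns the number of calls made. */
int foo2(void)
{
    int calls = 0;
    for (int i = 0; i < 10; i++) {
        longa();
        calls++;
    }
    return calls;
}

/* Runs the whole workload; returns the total number of longa() calls. */
int run_workload(void)
{
    return foo1() + foo2();
}
```

Because foo1 does ten times the work of foo2, a function-level profile of this program should attribute most samples to longa reached through foo1.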

Programs run slowly for different reasons. Some spend most of their running time on the CPU and are called CPU bound; others spend most of it waiting on I/O and are called I/O bound.

We can compile the file with gcc -o t1 -g test.c and then use perf stat to collect the data.

perf stat

The perf stat output tells us that the program t1 is CPU bound: the task-clock (msec) line reports 0.993 CPUs utilized, which is close to 1, meaning the program spent almost all of its elapsed time running on the CPU.
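The utilization figure is the ratio of CPU time (task-clock) to elapsed wall-clock time. A minimal sketch of that arithmetic, with hypothetical function names and an arbitrary 0.9 threshold chosen only for illustration:

```c
/* CPU utilization = task-clock (CPU time) / elapsed wall-clock time.
 * task_clock_ms is in milliseconds, elapsed_s in seconds. */
double cpus_utilized(double task_clock_ms, double elapsed_s)
{
    if (elapsed_s <= 0.0)
        return 0.0;
    return (task_clock_ms / 1000.0) / elapsed_s;
}

/* Hypothetical rule of thumb: treat utilization of at least 0.9 as
 * CPU bound. The threshold is an assumption, not something perf defines. */
int looks_cpu_bound(double task_clock_ms, double elapsed_s)
{
    return cpus_utilized(task_clock_ms, elapsed_s) >= 0.9;
}
```

With 993 ms of task-clock over 1 second of elapsed time the ratio is 0.993, which this rule classifies as CPU bound; a program that mostly waits on I/O would show a ratio much closer to 0.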

perf top

perf top shows the current performance of the system in real time. For example, we use the following code as process t2 for the test.
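The listing for t2 is not reproduced in this text. As a stand-in, the sketch below burns CPU in a tight loop; unlike an endless loop it stops after a given amount of CPU time, so it terminates on its own. The function name spin is hypothetical, and a main that calls it (for example with 30 seconds) is needed to build it as t2.

```c
#include <time.h>

/* Hypothetical stand-in for t2: spin until roughly `seconds` of CPU time
 * have been consumed, and return the number of loop iterations. */
unsigned long spin(double seconds)
{
    volatile unsigned long i = 0; /* volatile keeps the loop from being removed */
    clock_t end = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);
    do {
        i++;
    } while (clock() < end);
    return i;
}
```

While such a process runs, it should appear at or near the top of the perf top display.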

We may then get:

It is easy to see that the t2 process is taking up most of the CPU. We can use the -e option to select a particular event and list the processes or functions that trigger it.

perf record & perf report

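perf record saves the sampled data to a file (perf.data by default), and perf report reads that file and displays the profile. We can get: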

perf record -g

http://www.brendangregg.com/perf.html

https://www.ibm.com/developerworks/cn/linux/l-cn-perf2/

https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html
