libcpucycles

Here is how libcpucycles decides which cycle counter to use. The underlying principles are as follows:

When cpucycles() is first called, libcpucycles tries running each cycle counter that has been compiled into the library. For example, for 64-bit ARM CPUs, libcpucycles will try arm64-pmc, arm64-vct, default-gettimeofday, default-mach, default-monotonic, and default-perfevent, minus any of those that failed to compile.

Cycle counters that fail at run time with SIGILL (or SIGFPE or SIGBUS or SIGSEGV) are eliminated from the list. For example, arm64-pmc will fail with SIGILL if the kernel does not allow user access to PMCCNTR_EL0. Beware that libcpucycles does not catch SIGILL after its initial tests: if the kernel initially allows user access to PMCCNTR_EL0 but later turns it off then arm64-pmc will crash.
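
The run-time elimination described above can be illustrated with a common POSIX idiom: install temporary signal handlers, call the counter once, and jump back out if a signal arrives. The following is a minimal sketch of that idiom, not necessarily how libcpucycles implements its tests; probe_counter, demo_working, and demo_crashing are hypothetical names introduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target that the handler uses to abandon a crashing probe. */
static sigjmp_buf probe_env;

static void probe_handler(int sig)
{
  (void) sig;
  siglongjmp(probe_env, 1);
}

/* Hypothetical helper: returns 1 if ticks() returned normally, or 0 if
   it raised SIGILL, SIGFPE, SIGBUS, or SIGSEGV. Not thread-safe. */
int probe_counter(long long (*ticks)(void))
{
  static const int sigs[4] = { SIGILL, SIGFPE, SIGBUS, SIGSEGV };
  struct sigaction act, old[4];
  volatile int ok = 0;
  int i;

  memset(&act, 0, sizeof act);
  act.sa_handler = probe_handler;
  sigemptyset(&act.sa_mask);
  for (i = 0; i < 4; ++i) sigaction(sigs[i], &act, &old[i]);

  /* savemask = 1, so jumping out of the handler also unblocks the signal. */
  if (sigsetjmp(probe_env, 1) == 0) {
    (void) ticks();
    ok = 1;
  }

  /* Restore the caller's handlers; later crashes are no longer caught. */
  for (i = 0; i < 4; ++i) sigaction(sigs[i], &old[i], NULL);
  return ok;
}

/* Stand-in counters for trying the helper out. */
long long demo_working(void) { return 12345; }
long long demo_crashing(void) { raise(SIGILL); return 0; }
```

Because the handlers are restored before probe_counter returns, a counter that starts failing later is not protected, which matches the warning above about arm64-pmc.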

Independently of these counters, libcpucycles uses various OS mechanisms to obtain an estimate of the CPU frequency. This estimate is also available to the caller as cpucycles_persecond().

The methods that libcpucycles uses to ask the OS for an estimated CPU frequency fail on some OS-CPU combinations, in which case libcpucycles falls back to a cpucyclespersecond environment variable, or, if that variable does not exist, an estimate of 2399987654 cycles per second. (This estimate is in a realistic range of CPU speeds, and is close to multiples of 24MHz, 25MHz, and 19.2MHz, which are common crystal frequencies.) The sysadmin can create /etc/cpucyclespersecond to override all of the OS mechanisms.
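
The fallback order and the arithmetic behind the default can be sketched as follows. This is an illustration under stated assumptions, not libcpucycles code: fallback_persecond and gap_to_multiple are hypothetical names, the environment variable is assumed to hold a plain decimal integer, and an unparsable value is treated here like a missing one. The /etc/cpucyclespersecond override is not modeled.

```c
#include <stdlib.h>

/* Default estimate quoted in the text above. */
#define DEFAULT_PERSECOND 2399987654LL

/* os_estimate: result of the OS mechanisms, or 0 if they all failed.
   env_value: contents of the cpucyclespersecond environment variable,
   e.g. getenv("cpucyclespersecond"), or NULL if it does not exist.
   Assumption: the variable holds a positive decimal integer. */
long long fallback_persecond(long long os_estimate, const char *env_value)
{
  if (os_estimate > 0) return os_estimate;
  if (env_value) {
    char *end;
    long long v = strtoll(env_value, &end, 10);
    if (end != env_value && *end == '\0' && v > 0) return v;
  }
  return DEFAULT_PERSECOND;
}

/* Distance from est to the nearest multiple of base (both positive). */
long long gap_to_multiple(long long est, long long base)
{
  long long r = est % base;
  return r < base - r ? r : base - r;
}
```

For the default, gap_to_multiple returns 12346 for each of 24000000, 25000000, and 19200000, since 2399987654 = 2400000000 - 12346 and 2400000000 is 100 * 24 MHz, 96 * 25 MHz, and 125 * 19.2 MHz.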

For counters that do not ask for scaling, the estimated CPU frequency is shown in cpucycles-info as a double-check on the counter results. For counters that ask for scaling, libcpucycles uses the estimated CPU frequency to compute the scaling, so this is not a double-check. If a counter asks for scaling and the estimated CPU frequency does not seem close to a multiple of the counter frequency (possibly with a small power-of-2 denominator) then libcpucycles will throw the counter away, except in the case of fixed-resolution OS counters such as gettimeofday and CLOCK_MONOTONIC.
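
One way to picture the "close to a multiple" test is the integer sketch below. The exact thresholds are not given above, so MAX_DENOM and TOLERANCE_PERCENT are assumptions chosen for illustration, and find_scaling is a hypothetical name. The exception for fixed-resolution OS counters is not modeled.

```c
/* Assumed limits; the text above does not give exact values. */
#define MAX_DENOM 4          /* "small power-of-2 denominator" */
#define TOLERANCE_PERCENT 1  /* allowed gap, as a percentage of counter_hz */

/* Looks for num/denom such that cpu_hz is close to
   counter_hz * num / denom, with denom in {1, 2, 4}.
   Returns 1 and fills *num and *denom on success, 0 otherwise. */
int find_scaling(long long cpu_hz, long long counter_hz,
                 long long *num, long long *denom)
{
  long long d;

  if (cpu_hz <= 0 || counter_hz <= 0) return 0;
  for (d = 1; d <= MAX_DENOM; d *= 2) {
    long long scaled = cpu_hz * d;
    /* Nearest integer to scaled / counter_hz. */
    long long m = (scaled + counter_hz / 2) / counter_hz;
    long long gap = scaled - m * counter_hz;
    if (gap < 0) gap = -gap;
    if (m >= 1 && gap * 100 <= counter_hz * TOLERANCE_PERCENT) {
      *num = m;
      *denom = d;
      return 1;
    }
  }
  return 0;
}
```

With these assumed limits, a 24 MHz counter and an estimate of 2394000000 cycles per second give a scaling of 399/4, while an estimate of 2390000000 matches no allowed fraction and the counter would be thrown away.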

libcpucycles computes a precision estimate for each counter (times any applicable scaling) as follows. Call the counter 1000 times. Check that the counter has never decreased, and has increased at least once. (A counter where the decrease/increase checks fail is tried up to 10 times, so up to 10000 calls overall, and is removed if it fails all 10 times.) The precision estimate is then the smallest nonzero difference between adjacent counter results, plus a penalty explained below.

The penalty is 100 cycles for off-core counters (including RDTSC) and default-perfevent, and 200 cycles for fixed-resolution OS counters. For example, an on-core CPU cycle counter will be selected over these even if it actually has a resolution of, say, 8 cycles and 50 cycles of overhead.
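
The precision estimate and penalty can be sketched in C as follows. This is a simplified illustration of the procedure described above, not the library's code: precision_estimate, demo_step8, and demo_stuck are hypothetical names, scaling is left out, and the penalty (100, 200, or presumably 0 for an on-core counter) is passed in by the caller.

```c
#define CALLS 1000  /* counter calls per attempt */
#define TRIES 10    /* attempts before the counter is removed */

/* Returns the smallest nonzero difference between adjacent results
   plus penalty, or -1 if the counter fails the checks on every attempt. */
long long precision_estimate(long long (*ticks)(void), long long penalty)
{
  long long t[CALLS];
  int attempt, i;

  for (attempt = 0; attempt < TRIES; ++attempt) {
    long long best = 0;  /* 0 means "no increase seen yet" */
    int decreased = 0;

    for (i = 0; i < CALLS; ++i) t[i] = ticks();
    for (i = 1; i < CALLS; ++i) {
      long long d = t[i] - t[i - 1];
      if (d < 0) decreased = 1;
      if (d > 0 && (best == 0 || d < best)) best = d;
    }
    if (!decreased && best > 0) return best + penalty;
  }
  return -1;
}

/* Stand-in counters: one advances 8 ticks per call, one never moves. */
long long demo_step8(void) { static long long t = 0; t += 8; return t; }
long long demo_stuck(void) { return 5; }
```

With the stand-ins, demo_step8 yields 8 with no penalty and 108 with the 100-cycle penalty, while demo_stuck never increases and is removed after 10 attempts.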

Finally, libcpucycles selects the counter where the precision estimate is the smallest number of cycles. Note that an inaccurate estimate of CPU frequency can influence the choice between a scaled counter and an unscaled counter.
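
The final step amounts to taking a minimum over the surviving counters. A minimal sketch, assuming eliminated counters are marked with -1 and ties go to the earlier entry (neither detail is stated above); struct candidate, select_counter, and the numbers in demo_candidates are made up for illustration and are not measurements.

```c
struct candidate {
  const char *name;     /* e.g. "arm64-vct" */
  long long precision;  /* estimate in cycles, or -1 if eliminated */
};

/* Returns the index of the surviving candidate with the smallest
   precision estimate, or -1 if no candidate survived. */
int select_counter(const struct candidate *c, int n)
{
  int best = -1;
  int i;

  for (i = 0; i < n; ++i) {
    if (c[i].precision < 0) continue;  /* eliminated earlier */
    if (best < 0 || c[i].precision < c[best].precision) best = i;
  }
  return best;
}

/* Illustrative values only. */
const struct candidate demo_candidates[4] = {
  { "arm64-pmc", -1 },
  { "arm64-vct", 196 },
  { "default-gettimeofday", 2600 },
  { "default-monotonic", 248 },
};
```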

libcpucycles does not carry out its counter selection (typically tens of milliseconds, sometimes even more) as a static initializer; callers are presumed not to want to incur the cost of initialization unless and until they are actually using cpucycles(). A multithreaded caller thus has to place locks around any possibly-first call to cpucycles(), or create its own static initializer (an __attribute__((constructor)) function) with an initial cpucycles() call so that all subsequent cpucycles() calls are thread-safe.
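
The static-initializer approach can be sketched as follows for GCC and Clang. So that the sketch compiles without libcpucycles installed, it uses a hypothetical stand-in, standin_cpucycles; real code would include cpucycles.h and call cpucycles() in the constructor instead. The sketch assumes no other threads have been started by the time the constructor runs.

```c
/* Stand-in for cpucycles(): the first call does one-time setup that is
   not thread-safe, mimicking the lazy counter selection. */
int selection_runs = 0;

long long standin_cpucycles(void)
{
  static long long t = 0;
  if (selection_runs == 0) ++selection_runs;  /* one-time selection */
  return ++t;
}

/* Runs before main() under GCC and Clang, so the one-time setup is
   finished before the program has a chance to create threads. */
__attribute__((constructor))
static void warm_up_counter(void)
{
  (void) standin_cpucycles();
}
```

The cost is that every run of the program pays for counter selection at startup, even if it never goes on to measure anything.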


Version: This is version 2023.01.05 of the "Selection" web page.