libcpucycles is a microlibrary for counting CPU cycles. Cycle counts are not as detailed as Falk diagrams but are the most precise timers available to typical software; they are central tools used in understanding and improving software performance.
The libcpucycles API is simple: include <cpucycles.h>, call cpucycles()
to receive a long long whenever desired, and link with -lcpucycles.
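A common way to use such a counter is to bracket the code of interest with two reads and take the median of many differences, since a single difference is easily disturbed by interrupts and cache effects. The following is a minimal sketch of that pattern, not part of libcpucycles: the names counter_fn, median_cycles, step_counter, and busy_work are hypothetical, and step_counter is a stand-in included only so the sketch compiles without the library installed.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as a cpucycles() call: no arguments, returns a long long. */
typedef long long (*counter_fn)(void);

/* Ascending comparison of long long values for qsort. */
static int cmp_ll(const void *a, const void *b) {
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: time work() `runs` times and return the median
   difference between the two counter reads around each run.
   Returns -1 if memory cannot be allocated. */
long long median_cycles(counter_fn counter, void (*work)(void), int runs) {
    assert(counter != NULL && work != NULL && runs > 0);
    long long *diffs = malloc((size_t)runs * sizeof *diffs);
    if (diffs == NULL) return -1;
    for (int i = 0; i < runs; i++) {
        long long start = counter();
        work();
        diffs[i] = counter() - start;
    }
    qsort(diffs, (size_t)runs, sizeof *diffs, cmp_ll);
    long long median = diffs[runs / 2];
    free(diffs);
    return median;
}

/* Stand-in counter (an assumption for this sketch): advances by 7 per read. */
static long long step_counter(void) {
    static long long t = 0;
    t += 7;
    return t;
}

/* Stand-in workload: a short loop the compiler may not remove. */
static void busy_work(void) {
    volatile int sink = 0;
    for (int i = 0; i < 1000; i++) sink += i;
}
```

With libcpucycles installed, one would add #include <cpucycles.h>, pass a small wrapper such as `static long long read_cycles(void) { return cpucycles(); }` in place of step_counter, and link with -lcpucycles.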
Internally, libcpucycles understands machine-level
cycle counters for amd64 (both PMC and TSC), arm32, arm64 (both PMC and
VCT), mips64, ppc32, ppc64, riscv32, riscv64, s390x, sparc64, and x86.
libcpucycles also understands four OS-level mechanisms, which give
varying levels of accuracy: mach_absolute_time, perf_event,
CLOCK_MONOTONIC, and, as a fallback, microsecond-resolution gettimeofday.
When the program first calls cpucycles(), libcpucycles automatically
benchmarks the available mechanisms and selects the
mechanism that does the best job. Subsequent cpucycles() calls are
thread-safe and very fast. An accompanying cpucycles-info program
prints a summary of cycle-counter accuracy.
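The general idea of benchmarking on first use can be sketched as follows. This is an illustration under stated assumptions, not libcpucycles' actual selection logic: it considers only two of the OS-level mechanisms (CLOCK_MONOTONIC and gettimeofday), reports nanoseconds rather than cycles, and uses "smallest observed step between consecutive reads" as its only quality measure. The names tick_fn, finest_step, select_counter, and my_ticks are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

typedef long long (*tick_fn)(void);

/* Candidate 1: CLOCK_MONOTONIC in nanoseconds, or -1 if unavailable. */
static long long ticks_monotonic(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Candidate 2 (fallback): gettimeofday, microsecond resolution,
   scaled to nanoseconds so both candidates share a unit. */
static long long ticks_gettimeofday(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000000LL + (long long)tv.tv_usec * 1000LL;
}

/* Smallest positive step between consecutive reads; -1 if the
   candidate fails or never advances during the benchmark. */
static long long finest_step(tick_fn f) {
    long long best = -1;
    long long prev = f();
    if (prev < 0) return -1;
    for (int i = 0; i < 100000; i++) {
        long long now = f();
        long long d = now - prev;
        if (d > 0 && (best < 0 || d < best)) best = d;
        prev = now;
    }
    return best;
}

/* Benchmark every candidate and keep the one with the finest step. */
static tick_fn select_counter(void) {
    tick_fn candidates[] = { ticks_monotonic, ticks_gettimeofday };
    tick_fn best = ticks_gettimeofday; /* used if nothing benchmarks well */
    long long best_step = -1;
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        long long step = finest_step(candidates[i]);
        if (step > 0 && (best_step < 0 || step < best_step)) {
            best_step = step;
            best = candidates[i];
        }
    }
    return best;
}

/* Selected mechanism; NULL until the first call has run the benchmark. */
static _Atomic(tick_fn) chosen;

long long my_ticks(void) {
    tick_fn f = atomic_load(&chosen);
    if (f == NULL) {
        /* Threads racing here may each run the benchmark; each one
           stores a valid pointer, so later calls are safe either way. */
        f = select_counter();
        assert(f != NULL);
        atomic_store(&chosen, f);
    }
    return f();
}
```

After the first call, each my_ticks() call costs one atomic load plus the selected read, which is the property that makes the benchmark-once design attractive.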
For comparison, there is a simple-sounding __rdtsc() API provided by
compilers, but this works only on Intel/AMD CPUs and is generally noisier
than PMC. There is a __builtin_readcyclecounter() that works on more
CPUs, but this works only with clang and has the same noise problems.
Both of these mechanisms put the burden on the caller to figure out what
can be done on other CPUs. Various packages include their own more
portable abstraction layers for counting cycles (see, e.g., FFTW's cycle.h,
used to automatically select from among multiple implementations
provided by FFTW), but this creates per-package effort to keep up with
the latest cycle counters. The goal of libcpucycles is to provide
state-of-the-art cycle counting centrally for all packages to use.
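To make the caller's burden concrete, here is a rough sketch of the kind of preprocessor dispatch a package ends up writing when it uses the compiler-provided counters directly. The names raw_counter and HAVE_RAW_COUNTER are hypothetical, and the set of platforms covered is deliberately incomplete.

```c
/* x86 and amd64: __rdtsc() reads the time-stamp counter (TSC). */
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define HAVE_RAW_COUNTER 1
static unsigned long long raw_counter(void) {
    return __rdtsc();
}

/* Other CPUs under clang: the builtin compiles everywhere, but may
   return 0 or trap if the target's counter is missing or not enabled
   for user code. */
#elif defined(__clang__)
#define HAVE_RAW_COUNTER 1
static unsigned long long raw_counter(void) {
    return __builtin_readcyclecounter();
}

/* Everything else: no counter without further per-CPU code. */
#else
#define HAVE_RAW_COUNTER 0
static unsigned long long raw_counter(void) {
    return 0;
}
#endif
```

Every new CPU or compiler adds another branch to a chain like this one, and each package carrying its own copy has to repeat that work.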
Version: This is version 2024.01.14 of the "Intro" web page.