Egon Elbre

hrtime: Benchmarking with Details

I’ve been optimizing Go code for a while and trying to improve my benchmarking game.

Let’s take a look at a simple example:

func BenchmarkReport(b *testing.B) {
    runtime.GC()
    for i := 0; i < b.N; i++ {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
}

Running go test -bench . will give us the result:

BenchmarkReport-32      20000000               107 ns/op

This might be enough to get a rough estimate of where you stand in terms of performance, but more detailed output is needed for optimization. Condensing everything into a single number is bound to be simplistic.

github.com/loov/hrtime

Let me introduce you to the hrtime package, which I wrote to get more extensive output from benchmarks.
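
Under the hood, hrtime uses a high-resolution clock (the CPU’s time stamp counter where available), exposed through hrtime.Now and hrtime.Since. Here is a minimal sketch of a single ad-hoc measurement, timing one Sprintf call purely for illustration:

package main

import (
    "fmt"
    "runtime"

    "github.com/loov/hrtime"
)

func main() {
    // Measure a single operation with the high-resolution clock.
    start := hrtime.Now()
    r := fmt.Sprintf("hello, world %d", 123)
    runtime.KeepAlive(r)
    fmt.Println("took:", hrtime.Since(start))
}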

Histogram #

The first way to use it is hrtime.NewBenchmark. Let’s rewrite the first example as a regular program:

func main() {
    bench := hrtime.NewBenchmark(20000000)
    for bench.Next() {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
    fmt.Println(bench.Histogram(10))
}

Which will print:

avg 372ns;  min 300ns;  p50 400ns;  max 295µs;
p90 400ns;  p99 500ns;  p999 1.8µs;  p9999 4.3µs;
    300ns [ 7332554] ███████████████████████
    400ns [12535735] ████████████████████████████████████████
    600ns [   18955]
    800ns [    2322]
      1µs [   20413]
    1.2µs [   34854]
    1.4µs [   25096]
    1.6µs [   10009]
    1.8µs [    4688]
      2µs+[   15374]

As we can see, p99 is 500ns, which means that 1% of all our measurements take longer than 500ns. We can try to optimize this by allocating fewer strings:

func main() {
    bench := hrtime.NewBenchmark(20000000)
    var back [1024]byte
    for bench.Next() {
        buffer := back[:0]
        buffer = append(buffer, []byte("hello, world ")...)
        buffer = strconv.AppendInt(buffer, 123, 10)
        runtime.KeepAlive(buffer)
    }
    fmt.Println(bench.Histogram(10))
}

And the result:

avg 267ns;  min 200ns;  p50 300ns;  max 216µs;
  p90 300ns;  p99 300ns;  p999 1.1µs;  p9999 3.6µs;
      200ns [ 7211285] ██████████████████████▌
      300ns [12658260] ████████████████████████████████████████
      400ns [   81076]
      500ns [    3226]
      600ns [     343]
      700ns [     136]
      800ns [     729]
      900ns [    8108]
        1µs [   15436]
      1.1µs+[   21401]

We can now see that the p99 has gone from 500ns to 300ns.

If you have a keen eye, you may have noticed that the Go benchmark gave an average of 107ns/op, whereas hrtime gave us 372ns/op. This is an unfortunate side-effect of trying to measure more: measurement always has an overhead, and the final results include it.

Note: depending on the operating system the overhead can be significantly smaller, and hrtime lets you compensate for it by repeating the measured code inside the loop and dividing the results with https://godoc.org/github.com/loov/hrtime#Histogram.Divide.
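
For illustration, here is a minimal sketch of that approach; it assumes Histogram.Divide takes the repetition count and scales the collected measurements down to per-call numbers:

package main

import (
    "fmt"
    "runtime"

    "github.com/loov/hrtime"
)

func main() {
    const repeats = 100 // repetitions per measured iteration (arbitrary choice)

    bench := hrtime.NewBenchmark(200000)
    for bench.Next() {
        // Amortize the measurement overhead over many calls.
        for k := 0; k < repeats; k++ {
            r := fmt.Sprintf("hello, world %d", 123)
            runtime.KeepAlive(r)
        }
    }

    hist := bench.Histogram(10)
    // Assumption: Divide scales the measurements by the repetition count.
    hist.Divide(repeats)
    fmt.Println(hist)
}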

Stopwatch #

Sometimes you also want to measure concurrent operations. For this there is Stopwatch: https://godoc.org/github.com/loov/hrtime#Stopwatch.

Let’s say you want to measure how long a send takes on a highly contended channel. This, of course, is a contrived example, but it roughly shows the idea: we start a measurement in one goroutine (Start returns a lap index), stop it in another, and finally print everything out.

func main() {
    const numberOfExperiments = 1000
    bench := hrtime.NewStopwatch(numberOfExperiments)

    ch := make(chan int32, 10)
    wait := make(chan struct{})

    // start senders
    for i := 0; i < numberOfExperiments; i++ {
        go func() {
            <-wait
            ch <- bench.Start()
        }()
    }

    // start one receiver
    go func() {
        for lap := range ch {
            bench.Stop(lap)
        }
    }()

    // wait for all goroutines to be created
    time.Sleep(time.Second)
    // release all goroutines at the same time
    close(wait)

    // wait for all measurements to be completed
    bench.Wait()
    fmt.Println(bench.Histogram(10))
}

hrtesting #

Of course, writing a separate binary for all of the tests isn’t that convenient. For that there’s github.com/loov/hrtime/hrtesting, which provides wrappers for testing.B.

func BenchmarkReport(b *testing.B) {
    bench := hrtesting.NewBenchmark(b)
    defer bench.Report()

    for bench.Next() {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
}

It will print out the 50th, 90th, and 99th percentiles.

BenchmarkReport-32               3000000               427 ns/op
--- BENCH: BenchmarkReport-32
    benchmark_old.go:11: 24.5µs₅₀ 24.5µs₉₀ 24.5µs₉₉ N=1
    benchmark_old.go:11:  400ns₅₀  500ns₉₀ 12.8µs₉₉ N=100
    benchmark_old.go:11:  400ns₅₀  500ns₉₀  500ns₉₉ N=10000
    benchmark_old.go:11:  400ns₅₀  500ns₉₀  600ns₉₉ N=1000000
    benchmark_old.go:11:  400ns₅₀  500ns₉₀  500ns₉₉ N=3000000

Unfortunately, with Go 1.12 it will print the output for all of the benchmark runs instead of just the last one. However, with Go 1.13, the output will be much nicer:

BenchmarkReport-32   3174566  379 ns/op  400 ns/p50  400 ns/p90 ...

And comparing results with benchstat will work as well.
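
For reference, a typical benchstat workflow looks roughly like this (file names are arbitrary): run the benchmark several times before and after a change, save the outputs, and compare them:

go test -bench . -count 10 > old.txt
# apply the optimization, then:
go test -bench . -count 10 > new.txt
benchstat old.txt new.txt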

hrplot #

Leaving the best for last: github.com/loov/hrtime/hrplot. Using my experimental plotting package, I decided to add a convenient way to plot benchmark results.

func BenchmarkReport(b *testing.B) {
    bench := hrtesting.NewBenchmark(b)
    defer bench.Report()
    defer hrplot.All("all.svg", bench)

    runtime.GC()
    for bench.Next() {
        r := fmt.Sprintf("hello, world %d", 123)
        runtime.KeepAlive(r)
    }
}

This will create an SVG file called all.svg. It contains a line plot showing how much time each iteration took, a density plot showing the distribution of the timing measurements, and finally a percentiles plot.

Conclusion #

Performance optimization is fun, but having tools to help you out makes it even more enjoyable.

Go try out github.com/loov/hrtime and let me know what you think.