Debugging code generation in Go

Last week, I was at dotGo, participating one of the finest Go conferences and catching up with gophers from the other side of the Atlantic. I gave a very short lightning talk about inspection of code generation with the tools already available in the toolchain. This post goes through the talk for those who didn’t have the privilege to be at the conference. Slides are also available at go-talks.

Throughout this article, we will use the following program:

package main

import "fmt"

func main() {
    sum := 1 + 1
    fmt.Printf("sum: %v\n", sum)


Go build is a command that encapsulates tons of things for the user. But, it also provides more detailed information about what it does if you ask for it. -x is a flag that makes go build output what is being invoked. If you want to see what the components of the toolchain are, which sequence they are invoked and which flags being used, use -x.

$ go build -x
mkdir -p $WORK/hello/_obj/
mkdir -p $WORK/hello/_obj/exe/
cd /Users/jbd/src/hello
/Users/jbd/go/pkg/tool/darwin_amd64/compile -o $WORK/hello.a -trimpath $WORK -p main -complete -buildid d934a5702088e0fe5c931a55ff26bec87b80cbdc -D _/Users/jbd/src/hello -I $WORK -pack ./hello.go
cd .
/Users/jbd/go/pkg/tool/darwin_amd64/link -o $WORK/hello/_obj/exe/a.out -L $WORK -extld=clang -buildmode=exe -buildid=d934a5702088e0fe5c931a55ff26bec87b80cbdc $WORK/hello.a
mv $WORK/hello/_obj/exe/a.out hello

Intermediate assembly

In Go, there is an intermediate assembly phase before generating the actual arch-specific assembly. The compiler takes some Go files, generates the intermediate instructions and escalates it to the obj package to generate the machine code. If you are curious about what compiler generates in this phase, -S makes the compiler dump the output.

The intermediate assembly is generally a good reference to understand the cost of a Go line. Or it could be be a great reference if you want to replace, let’s say, a Go function with a more optimized assembly equivalent.

You are seeing the output for the main.main here.

$ go build -gcflags="-S"
# hello
"".main t=1 size=179 args=0x0 locals=0x60
    0x0000 00000 (/Users/jbd/src/hello/hello.go:5)  TEXT    "".main(SB), $96-0
    0x0000 00000 (/Users/jbd/src/hello/hello.go:5)  MOVQ    (TLS), CX
    0x0009 00009 (/Users/jbd/src/hello/hello.go:5)  CMPQ    SP, 16(CX)
    0x000d 00013 (/Users/jbd/src/hello/hello.go:5)  JLS 169
    0x0013 00019 (/Users/jbd/src/hello/hello.go:5)  SUBQ    $96, SP
    0x0017 00023 (/Users/jbd/src/hello/hello.go:5)  MOVQ    BP, 88(SP)
    0x001c 00028 (/Users/jbd/src/hello/hello.go:5)  LEAQ    88(SP), BP
    0x0021 00033 (/Users/jbd/src/hello/hello.go:5)  FUNCDATA    $0, gclocals·69c1753bd5f81501d95132d08af04464(SB)
    0x0021 00033 (/Users/jbd/src/hello/hello.go:5)  FUNCDATA    $1, gclocals·e226d4ae4a7cad8835311c6a4683c14f(SB)
    0x0021 00033 (/Users/jbd/src/hello/hello.go:7)  MOVQ    $2, "".autotmp_1+64(SP)
    0x002a 00042 (/Users/jbd/src/hello/hello.go:7)  MOVQ    $0, "".autotmp_0+72(SP)
    0x0033 00051 (/Users/jbd/src/hello/hello.go:7)  MOVQ    $0, "".autotmp_0+80(SP)
    0x003c 00060 (/Users/jbd/src/hello/hello.go:7)  LEAQ, AX
    0x0043 00067 (/Users/jbd/src/hello/hello.go:7)  MOVQ    AX, (SP)
    0x0047 00071 (/Users/jbd/src/hello/hello.go:7)  LEAQ    "".autotmp_1+64(SP), AX
    0x004c 00076 (/Users/jbd/src/hello/hello.go:7)  MOVQ    AX, 8(SP)
    0x0051 00081 (/Users/jbd/src/hello/hello.go:7)  PCDATA  $0, $1
    0x0051 00081 (/Users/jbd/src/hello/hello.go:7)  CALL    runtime.convT2E(SB)
    0x0056 00086 (/Users/jbd/src/hello/hello.go:7)  MOVQ    16(SP), AX
    0x005b 00091 (/Users/jbd/src/hello/hello.go:7)  MOVQ    24(SP), CX
    0x0060 00096 (/Users/jbd/src/hello/hello.go:7)  MOVQ    AX, "".autotmp_0+72(SP)
    0x0065 00101 (/Users/jbd/src/hello/hello.go:7)  MOVQ    CX, "".autotmp_0+80(SP)
    0x006a 00106 (/Users/jbd/src/hello/hello.go:7)  LEAQ    go.string."sum: %v\n"(SB), AX
    0x0071 00113 (/Users/jbd/src/hello/hello.go:7)  MOVQ    AX, (SP)
    0x0075 00117 (/Users/jbd/src/hello/hello.go:7)  MOVQ    $8, 8(SP)
    0x007e 00126 (/Users/jbd/src/hello/hello.go:7)  LEAQ    "".autotmp_0+72(SP), AX
    0x0083 00131 (/Users/jbd/src/hello/hello.go:7)  MOVQ    AX, 16(SP)
    0x0088 00136 (/Users/jbd/src/hello/hello.go:7)  MOVQ    $1, 24(SP)
    0x0091 00145 (/Users/jbd/src/hello/hello.go:7)  MOVQ    $1, 32(SP)
    0x009a 00154 (/Users/jbd/src/hello/hello.go:7)  PCDATA  $0, $1
    0x009a 00154 (/Users/jbd/src/hello/hello.go:7)  CALL    fmt.Printf(SB)
    0x009f 00159 (/Users/jbd/src/hello/hello.go:8)  MOVQ    88(SP), BP
    0x00a4 00164 (/Users/jbd/src/hello/hello.go:8)  ADDQ    $96, SP
    0x00a8 00168 (/Users/jbd/src/hello/hello.go:8)  RET
    0x00a9 00169 (/Users/jbd/src/hello/hello.go:8)  NOP
    0x00a9 00169 (/Users/jbd/src/hello/hello.go:5)  PCDATA  $0, $-1
    0x00a9 00169 (/Users/jbd/src/hello/hello.go:5)  CALL    runtime.morestack_noctxt(SB)
    0x00ae 00174 (/Users/jbd/src/hello/hello.go:5)  JMP 0

If you would like to learn more insights about the intermediate assembly and why it is significant in Go, I highly recommend Rob Pike’s The Design of the Go Assembler from GopherCon this year.


As I mentioned, -S is only for the intermediate assembly. The actual machine representation is available in the final artifact. You can use a disassembler to examine what’s inside. Use go tool objdump with a binary or library. You probably also want to use -s to focus by symbol name. In this example, I am dumping the main.main. This is the actual assembly generated for darwin/amd64.

$ go tool objdump -s main.main hello
TEXT main.main(SB) /Users/jbd/src/hello/hello.go
    hello.go:5  0x2040  65488b0c25a0080000  GS MOVQ GS:0x8a0, CX
    hello.go:5  0x2049  483b6110            CMPQ 0x10(CX), SP
    hello.go:5  0x204d  0f8696000000        JBE 0x20e9
    hello.go:5  0x2053  4883ec60            SUBQ $0x60, SP
    hello.go:5  0x2057  48896c2458          MOVQ BP, 0x58(SP)
    hello.go:5  0x205c  488d6c2458          LEAQ 0x58(SP), BP
    hello.go:7  0x2061  48c744244002000000  MOVQ $0x2, 0x40(SP)
    hello.go:7  0x206a  48c744244800000000  MOVQ $0x0, 0x48(SP)
    hello.go:7  0x2073  48c744245000000000  MOVQ $0x0, 0x50(SP)
    hello.go:7  0x207c  488d053d4d0800      LEAQ 0x84d3d(IP), AX


Sometimes, all you need is to check the symbols rather than understand the code or data sections. Similar to the general-purpose nm tool, Go distributes an nm which allows you to list the symbols in an artifact with annotations and size. Pretty handy if you want to see what’s in and being exported from a Go binary or library.

$ go tool nm hello
f4760 B __cgo_init
f4768 B __cgo_notify_runtime_init_done
f4770 B __cgo_thread_start
4fb70 T __rt0_amd64_darwin
4e220 T _gosave
4fb90 T _main
ad1e0 R _masks
4fd00 T _nanotime
4e480 T _setg_gcc
ad2e0 R _shifts
624a0 T errors.(*errorString).Error
62400 T errors.New
52470 T fmt.(*buffer).WriteRune


With the contribution of the new SSA backend, the team contributed a tool that visualizes all the SSA passes. Set the GOSSAFUNC env variable with a func name and run the go build command. It will generate an ssa.html file that shows all the steps the compiler has gone through to optimize your code.

$ GOSSAFUNC=main go build && open ssa.html

Here, there is a visualization of all the passes applied to the main function.

SSA optimizations

The Go compiler can also annotate the inlining and escape analysis. If you pass the -m=2 flag to the compiler, it will output the optimizations or annotations related to these two. Here we see the net/context package related inlining operations and escape analysis.

$ go build -gcflags="-m"
../ can inline Background as: func() Context { return background }
../ can inline TODO as: func() Context { return todo }
../ cannot inline WithCancel: non-leaf function
../ cannot inline WithDeadline: non-leaf function
../ cannot inline WithTimeout: non-leaf function
../ inlining call to time.Time.Add method(time.Time) func(time.Duration) time.Time { time.t·2.sec += int64(time.d·3 / time.Duration(1000000000)); var time.nsec·4 int32; time.nsec·4 = <N>; time.nsec·4 = time.t·2.nsec + int32(time.d·3 % time.Duration(1000000000)); if time.nsec·4 >= int32(1000000000) { time.t·2.sec++; time.nsec·4 -= int32(1000000000) } else { if time.nsec·4 < int32(0) { time.t·2.sec--; time.nsec·4 += int32(1000000000) } }; time.t·2.nsec = time.nsec·4; return time.t·2 }
../ cannot inline WithValue: non-leaf function
../ background escapes to heap
../     from ~r0 (return) at ../
../ todo escapes to heap
../     from ~r0 (return) at ../
../ parent escapes to heap
../     from parent (passed to function[unknown]) at ../
../ leaking param: parent
../     from parent (interface-converted) at ../
../     from parent (passed to function[unknown]) at ../
../ parent escapes to heap
../     from parent (passed to function[unknown]) at ../
../ leaking param: parent
../     from parent (interface-converted) at ../
../     from parent (passed to function[unknown]) at ../
../ leaking param: deadline
../     from deadline (passed to function[unknown]) at ../
../ ctx escapes to heap
../     from ~r2 (return) at ../
../ leaking param: parent
../     from parent (passed to function[unknown]) at ../
../ parent escapes to heap
../     from parent (passed to function[unknown]) at ../
../ leaking param: parent
../     from parent (interface-converted) at ../
../     from parent (passed to function[unknown]) at ../
../ leaking param: key
../     from key (passed to function[unknown]) at ../
../ leaking param: val
../     from val (passed to function[unknown]) at ../
../ context.WithValue(parent, key, val) escapes to heap
../     from ~r3 (return) at ../
<autogenerated>:1: leaking param: .this
<autogenerated>:1:  from .this.Deadline() (receiver in indirect call) at <autogenerated>:1
<autogenerated>:2: leaking param: .this
<autogenerated>:2:  from .this.Done() (receiver in indirect call) at <autogenerated>:2
<autogenerated>:3: leaking param: .this
<autogenerated>:3:  from .this.Err() (receiver in indirect call) at <autogenerated>:3
<autogenerated>:4: leaking param: key
<autogenerated>:4:  from .this.Value(key) (parameter to indirect call) at <autogenerated>:4
<autogenerated>:4: leaking param: .this
<autogenerated>:4:  from .this.Value(key) (receiver in indirect call) at <autogenerated>:4

You can use -m to see a less verbose output without reasonings, but David Chase says even though -m=2 is not perfect, it is often useful.

It is worth it to mention that you often need to disable optimizations to have a simpler view of what’s going on, since optimization can change the sequence of operations, add code, remove code or apply transformations to the code. With optimizations, it gets harder to map a Go line to the optimized output and it gets harder to do perf testing where you end up changing more than one thing at a time. Disabling optimizations is possible with -N, and disabling inlining is possible with -l.

$ go build -gcflags="-l -N"

Once optimizations are disabled, you can debug without being affected by transformations or do perf testing without being affected more than one change at a time.


If you are working on the lexer, the compiler provides a flag to debug the lexer as it goes through the source files.

$ go build -gcflags="-x"
# hello
lex: ident main
lex: implicit semi
lex: string literal
lex: implicit semi
lex: FUNC
lex: ident main
./hello.go:5 lex: TOKEN '('
./hello.go:5 lex: TOKEN ')'
./hello.go:5 lex: TOKEN '{'
lex: ident sum
./hello.go:6 lex: TOKEN COLAS
lex: integer literal
./hello.go:6 lex: TOKEN '+'
lex: integer literal
lex: implicit semi
lex: ident fmt
./hello.go:7 lex: TOKEN '.'
lex: ident Printf
./hello.go:7 lex: TOKEN '('
lex: string literal
./hello.go:7 lex: TOKEN ','
lex: ident sum
./hello.go:7 lex: TOKEN ')'
lex: implicit semi
./hello.go:8 lex: TOKEN '}'
lex: implicit semi

If you have any suggestions or comments, please ping @rakyll.