We have a process for diagnosing memory leaks in Go services. Tools such as pprof and Minikube help us find the root cause.
Memory leaks can happen in any programming language.
At Nylas, we build microservices with Golang. We love Golang’s lightweight runtime and concurrency. Our Golang services run in Kubernetes, which makes it easy to scale up and down. We monitor the CPU and memory usage of each Kubernetes pod to make sure services are in good health.
Sometimes, we find pods restarting several times a day without any error. The memory consumption keeps going up, until it reaches the memory limit. Here is a visualization of the memory consumption and pod restarts:
The pod restarts when the memory limit is reached. This potentially impacts service availability, since our API can be temporarily unavailable during a restart.
A “wavy” memory consumption may not be a memory leak. It can be caused by Golang’s built-in garbage collection. During garbage collection, the memory consumption swings periodically. This is a typical memory consumption graph for a Go program:
Golang’s garbage collection strategy is the result of a design trade-off between CPU and memory usage. One round of garbage collection can take many CPU cycles to free up memory. If garbage collection runs too frequently, a program can become slow and unresponsive. Go deliberately delays garbage collection until it is necessary.
By default, the garbage collector starts when the new heap size equals 100% of the live heap size. If the live heap takes 20 MiB, garbage collection only starts after the new heap also reaches 20 MiB. This behavior is configurable by setting the environment variable GOGC. Check out https://tip.golang.org/doc/gc-guide for more on Golang’s garbage collection.
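The rule above can be written as a small calculation. This is a simplified sketch: the helper name gcTargetBytes is hypothetical, and it ignores GC roots (goroutine stacks and globals), which the formula in the GC guide also includes.

```go
package main

import "fmt"

// gcTargetBytes estimates the total heap size at which the next garbage
// collection cycle starts, given the live heap and the GOGC percentage.
// Simplified assumption: GC roots are ignored.
func gcTargetBytes(liveHeap, gogc uint64) uint64 {
	return liveHeap + liveHeap*gogc/100
}

func main() {
	const MiB = 1 << 20
	for _, gogc := range []uint64{50, 100, 200} {
		target := gcTargetBytes(20*MiB, gogc)
		fmt.Printf("GOGC=%d: next GC at about %d MiB total heap\n", gogc, target/MiB)
	}
}
```

With the default GOGC=100 and a 20 MiB live heap, the estimate is 40 MiB of total heap, which matches the 20 MiB of new allocations described above.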
We quickly ruled out garbage collection as the cause of our memory issue. Our active heap is very small (around 20 MiB), but our memory consumption appears to grow without bound (even past 1 GiB). In addition, Go garbage collection does not lead to pod restarts. The restarts are clearly the result of exceeding the Kubernetes memory limit. We even tried running garbage collection manually (via runtime.GC()) at regular intervals to clear memory, but our memory usage still kept growing.
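One way to check whether a forced collection actually reclaims memory is to compare the live heap before and after, as in the sketch below. The names retained and liveHeapAfterGC are hypothetical, and the example only simulates memory that is still referenced. It is not the check we ran in our services.

```go
package main

import (
	"fmt"
	"runtime"
)

// retained simulates memory that is still referenced, so the collector
// cannot free it.
var retained [][]byte

// liveHeapAfterGC forces a collection and reports the bytes of heap objects
// that are still allocated afterwards.
func liveHeapAfterGC() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	before := liveHeapAfterGC()
	for i := 0; i < 16; i++ {
		// 1 MiB per iteration, kept reachable through the global slice.
		retained = append(retained, make([]byte, 1<<20))
	}
	after := liveHeapAfterGC()
	fmt.Printf("live heap before: %d KiB, after: %d KiB\n", before>>10, after>>10)
}
```

If the number keeps climbing after each forced collection, the memory is still reachable from somewhere, and calling runtime.GC() more often will not help.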
This is not normal garbage collection. We are quite certain a memory leak is taking place.
Many of our API services use the Go Fiber framework. Fiber has a built-in middleware called pprof for profiling memory. Here is how you can install pprof in your API service:
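// Import paths assume Fiber v2.
import (
	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/pprof"
)

func main() {
	// Create fiber server
	app := fiber.New()
	// Use pprof to profile memory usage
	app.Use(pprof.New())
	// Start
	app.Listen(":8080")
}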
If your service uses Gorilla/mux, this is how to install pprof:
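import (
	"net/http"
	"net/http/pprof"

	"github.com/gorilla/mux"
)

func main() {
	r := mux.NewRouter()
	AttachProfiler(r)
	http.ListenAndServe(":8080", r)
}

func AttachProfiler(router *mux.Router) {
	router.HandleFunc("/debug/pprof/", pprof.Index)
	router.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	router.HandleFunc("/debug/pprof/profile", pprof.Profile)
	router.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	// mux matches the index route exactly, so the heap profile needs its own route.
	router.Handle("/debug/pprof/heap", pprof.Handler("heap"))
}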
If the service uses neither of the above, you can start a Fiber or Gorilla/mux server on another port.
Once pprof is installed, run the server and go to localhost:8080/debug/pprof, where you will see a UI like this:
Let’s dig a little deeper and understand the different profiles available:
One way of finding a memory leak is to read the heap sample and look for patterns. If a function keeps allocating objects on the heap that are never freed, it is a likely source of a memory leak.
See this example heap:
The function appendSlice is repeatedly allocating memory on the heap. It is the likely source of the memory leak.
However, raw heap samples are not easy to read. It is difficult to spot a memory pattern in a huge text blob. Fortunately, we have a way to visualize memory usage:
# One-time install
go install github.com/google/pprof@latest
# Dump heap to a file
curl http://<HOSTNAME>:<PORT>/debug/pprof/heap > heap.out
# Use pprof to interact with heap
go tool pprof heap.out
# Inside the new command prompt
png
After taking the steps above, pprof will generate a memory allocation diagram named profile001.png:
In the diagram, each box is a function, and each arrow means a function call. The bigger the box, the higher the memory usage. From the graph above, the blame goes to the runtime function “allocm” (the largest box near the bottom).
Once we have found which function is causing the problem, we can check how the memory is leaked. Let’s look at some typical cases.
Let’s take a look at the following program:
var globalSlice = make([]int64, 0)

func appendSlice(c *fiber.Ctx) error {
	globalSlice = append(globalSlice, time.Now().Unix())
	return c.JSON(map[string]int{
		"sliceSize": len(globalSlice),
	})
}
When the server starts, the size of globalSlice is 0. Every time we call the appendSlice function, it appends a number to the global slice. Since it is a global variable, the slice lives on the heap forever. It keeps growing until memory is exhausted.
It is not common to declare a global slice directly. Unbounded slices often hide in global structs. We encourage developers to audit all global variables and make sure each of them has bounded memory allocation.
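One way to bound a global slice is to cap its length and drop the oldest entry when the cap is reached. The sketch below illustrates the idea. The names recent, maxEntries, and appendBounded are hypothetical, and the cap of 3 is only for demonstration.

```go
package main

import "fmt"

// maxEntries is an illustrative cap; choose a limit that fits the service.
const maxEntries = 3

var recent = make([]int64, 0, maxEntries)

// appendBounded adds v and drops the oldest entry once the cap is reached,
// so the backing array never grows beyond maxEntries.
func appendBounded(v int64) {
	if len(recent) == maxEntries {
		copy(recent, recent[1:])       // shift everything left by one
		recent = recent[:maxEntries-1] // make room for the new value
	}
	recent = append(recent, v)
}

func main() {
	for i := int64(1); i <= 10; i++ {
		appendBounded(i)
	}
	fmt.Println(recent, len(recent), cap(recent)) // [8 9 10] 3 3
}
```

Shifting in place keeps the capacity fixed. Reslicing with recent = recent[1:] instead would let append allocate a new backing array over and over.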
Take a look at the function below:
func hangingGoRoutine() {
	go time.Sleep(time.Hour * 24)
}
Every time the function hangingGoRoutine is called, a goroutine is created. The goroutine remains alive for 24 hours. If we call this function 1000 times, there will be 1000 goroutines. A growing number of goroutines means unbounded memory. If a goroutine never exits, it also results in a memory leak.
Usually, a hanging goroutine is not as simple as the sleeper example above. It can be an HTTP client that keeps a connection alive. It can also be an infinite loop. Long-polling and WebSocket clients both keep a connection open forever. If you are going to use a never-ending goroutine, make sure there is only one of them.
Let’s consider the code below:
func openFile() {
	file, err := os.Open("/path/to/file.txt")
	if err != nil {
		log.Fatal(err)
	}
	// Missing: defer file.Close()
	_ = file // the file is never closed, so its handle leaks
}
This code will result in a memory leak, because os.Open opens a file but the function never closes it. If the openFile function is called repeatedly, it keeps allocating memory for new file handles. Make sure you always call defer file.Close() after opening a file and checking the error.
Let’s look at another example:
func makeHttpCall(method, url string) {
	client := &http.Client{}
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		panic(err)
	}
	res, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	// Missing: defer res.Body.Close()
	body, err := ioutil.ReadAll(res.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
The code above also causes a memory leak. The function ioutil.ReadAll reads the response body but never closes it. The caller needs to call defer res.Body.Close().
It turns out that this is exactly what happened in our system. Our service made an HTTP call, but forgot to close the response body after reading it.
We do not suggest using Fiber’s pprof in production. The pprof tool exposes an endpoint, /debug/pprof, that is not protected by authentication. If your API is public, anyone in the world can see your memory allocation.
We encourage profiling memory locally. You can deploy your microservice locally with Minikube and Tilt. For more on deploying services locally, you can check my previous article here: https://www.nylas.com/blog/how-we-test-microservices-locally-at-nylas/
Minikube does not show CPU and memory usage graphs by default. You will need to enable the metrics-server addon:
minikube addons enable metrics-server
minikube start
minikube dashboard
Wait for some time until there is enough data. Then you will see a graph like this:
To simulate API traffic, you can write a simple script like this:
#!/bin/bash
for CALL_I in {1..10000}
do
  curl --location --request GET 'http://localhost:8080/append-slice'
  echo "Calling /append-slice $CALL_I/10000"
done
Watch the memory graph while running this script. If there is a memory leak, the memory usage graph will rise and never go down.
I made a proof-of-concept repository for investigating Go memory leaks: https://github.com/quzhi1/GoMemoryLeak. You can clone this repo, see an example of a service that leaks memory, and investigate using the steps above.
If a service has growing memory usage, check whether it is a memory leak. Consider profiling tools such as pprof to find which function is causing the leak. Audit your code base to find how the memory was leaked. Lastly, try to reproduce the problem and the fix locally using Minikube.
Special thanks to all Nylanauts who helped build and deploy Kong plugins:
Zhi is a staff engineer who leads the Developer Velocity team at Nylas. In 2021, he left Stripe and became a Nylanaut. Zhi is also a history buff who loves traveling.