Recently, I stumbled upon a rather interesting bug in a Go project I’m working on at my job. It wasn’t a complex issue, but it highlighted a subtle pitfall related to how we often handle JSON responses, particularly when dealing with pagination. Let me share the story.
The Setup
Our project involves a microservice, let’s call it the “User Service,” that provides user data. Another microservice, let’s call it the “User Consumer,” needs to replicate users from the User Service. The User Service exposes an API that returns user data in pages, along with a token to fetch the next page, if available. The API response initially looked like this:
{
  "Users": [
    {
      "ID": 2,
      "Name": "Bob"
    }
  ],
  "NextIterator": "some-token-to-next-page"
}
Or, if no more pages were available:
{
  "Users": [
    {
      "ID": 2,
      "Name": "Bob"
    }
  ],
  "NextIterator": null
}
The Consumer would make a request to the User Service, process the returned users, check the NextIterator, and, if it was not null, repeat the process, passing the NextIterator in the next request.
The Go code for this looked something like the following. Note that this is a simplified version and the bug is already fixed. For simplicity, this example uses a plain HTTP request, whereas the original implementation called an AWS Lambda function. Also note that collecting all the data in memory, as shown here, isn’t ideal, but it reflects how the legacy code is structured.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

type User struct {
	ID   int
	Name string
}

type usersPage struct {
	Users        []User
	NextIterator *string
}

type UserClient struct {
	httpClient *http.Client
	baseURL    *url.URL
}

func NewUserClient(httpClient *http.Client, baseURL *url.URL) *UserClient {
	return &UserClient{httpClient: httpClient, baseURL: baseURL}
}
func (u *UserClient) CollectAllUsers() ([]User, error) {
	var allUsers []User
	var iterator *string

	// The bug was here! `page` was defined outside the loop.
	var page usersPage

	for {
		// Build the page URL, passing the iterator from the previous page if we have one.
		userPageURL := *u.baseURL
		query := u.baseURL.Query()
		if iterator != nil {
			query.Set("iterator", *iterator)
		}
		userPageURL.RawQuery = query.Encode()

		resp, err := u.httpClient.Get(userPageURL.String())
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return nil, err
		}
		_ = resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return nil, fmt.Errorf("unexpected status code: %d", resp.StatusCode)
		}

		// Reset the page DTO before unmarshalling the new response.
		// This is one of the ways to fix the bug.
		page = usersPage{}
		if err := json.Unmarshal(body, &page); err != nil {
			return nil, err
		}

		allUsers = append(allUsers, page.Users...)

		iterator = page.NextIterator
		if iterator == nil {
			break
		}
	}
	return allUsers, nil
}
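For completeness, calling this client could look something like the sketch below. It builds on the types above; the base URL is made up for illustration.

func main() {
	// Hypothetical endpoint of the User Service; the real one lives elsewhere.
	baseURL, err := url.Parse("https://user-service.example.internal/users")
	if err != nil {
		fmt.Println("parse base URL:", err)
		return
	}

	client := NewUserClient(http.DefaultClient, baseURL)

	users, err := client.CollectAllUsers()
	if err != nil {
		fmt.Println("collect users:", err)
		return
	}
	fmt.Printf("collected %d users\n", len(users))
}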
The Problem
Everything worked perfectly until our team decided to simplify the response when no more pages were available. We started returning responses like this:
{
  "Users": [
    {
      "ID": 2,
      "Name": "Bob"
    }
  ]
}
Notice the missing NextIterator field.
This seemingly harmless change caused the User Consumer to get stuck in an infinite loop. The problem was in the original code, where the usersPage variable was declared outside the loop and reused with every iteration:
var page usersPage // !!!
for {
	// ...
	if err := json.Unmarshal(body, &page); err != nil {
		// ...
	}
	// ...
	iterator = page.NextIterator
	if iterator == nil {
		break
	}
}
When the User Service stopped sending NextIterator, the previous value of NextIterator in page wasn’t being cleared: json.Unmarshal doesn’t zero the struct before writing new values, it only sets the fields that are actually present in the JSON. As a result, the iterator variable within the loop kept its value from the last non-empty response, preventing the loop from breaking.
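Here’s a minimal, self-contained sketch of that behaviour; the page type below is a stripped-down stand-in for the real DTO, not our actual code:

package main

import (
	"encoding/json"
	"fmt"
)

type page struct {
	NextIterator *string
}

func main() {
	var p page

	// The first response carries a token.
	_ = json.Unmarshal([]byte(`{"NextIterator":"token-1"}`), &p)
	fmt.Println(*p.NextIterator) // token-1

	// The second response omits the field entirely, like the simplified API response.
	_ = json.Unmarshal([]byte(`{}`), &p)
	fmt.Println(p.NextIterator != nil) // true: the stale token is still there
}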
The Fix and a Question
The fix was relatively simple: we moved the declaration of page into the loop:
for {
	var page usersPage // fixed!
	// ...
	if err := json.Unmarshal(body, &page); err != nil {
		// ...
	}
	// ...
	iterator = page.NextIterator
	if iterator == nil {
		break
	}
}
This creates a new usersPage struct in every iteration, effectively resetting NextIterator each time.
However, this fix led me to question something. The initial code likely intended to avoid unnecessary allocations by reusing the same page variable. The updated code now allocates a new usersPage on each iteration, which feels like a small performance regression.
Could we achieve the same result without introducing additional allocations? Yes. We could explicitly zero out the struct before unmarshalling:
for {
	// Reset the page DTO before unmarshalling the new response
	page = usersPage{}
	if err := json.Unmarshal(body, &page); err != nil {
		// ...
	}
	// ...
	iterator = page.NextIterator
	if iterator == nil {
		break
	}
}
Go’s compiler will generate code to zero the struct’s memory. This seems like a nice balance between correctness and performance.
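If you want to check whether the difference is actually measurable, a rough benchmark along these lines is one way to find out. The pageDTO type and the sample payload are stand-ins I made up for illustration; put this in a _test.go file next to the code and run go test -bench . -benchmem:

package main

import (
	"encoding/json"
	"testing"
)

// pageDTO is a made-up stand-in for the real page struct.
type pageDTO struct {
	Users []struct {
		ID   int
		Name string
	}
	NextIterator *string
}

var sample = []byte(`{"Users":[{"ID":2,"Name":"Bob"}],"NextIterator":"token"}`)

// Declares a fresh struct on every iteration.
func BenchmarkFreshStruct(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var page pageDTO
		if err := json.Unmarshal(sample, &page); err != nil {
			b.Fatal(err)
		}
	}
}

// Reuses a single struct and resets it to its zero value before each unmarshal.
func BenchmarkReuseAndReset(b *testing.B) {
	var page pageDTO
	for i := 0; i < b.N; i++ {
		page = pageDTO{}
		if err := json.Unmarshal(sample, &page); err != nil {
			b.Fatal(err)
		}
	}
}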
This got me wondering: why doesn’t encoding/json’s Unmarshal function use reflect.Zero to zero the destination struct by default before reading JSON data?
I haven’t dug deep enough to answer that definitively, but I suspect it would add overhead for the common case of unmarshalling into a clean, freshly declared struct.
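For what it’s worth, a hypothetical “zero first, then unmarshal” wrapper is easy to sketch on top of the standard library. This is only an illustration of what such a default would have to do; zeroThenUnmarshal is a name I made up, not something encoding/json provides:

package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// zeroThenUnmarshal is a made-up helper: it resets *v to its zero value
// with reflect.Zero before delegating to json.Unmarshal.
func zeroThenUnmarshal(data []byte, v any) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Pointer || rv.IsNil() {
		return fmt.Errorf("expected a non-nil pointer, got %T", v)
	}
	rv.Elem().Set(reflect.Zero(rv.Elem().Type()))
	return json.Unmarshal(data, v)
}

type page struct {
	NextIterator *string
}

func main() {
	var p page
	_ = zeroThenUnmarshal([]byte(`{"NextIterator":"token-1"}`), &p)
	_ = zeroThenUnmarshal([]byte(`{}`), &p)
	fmt.Println(p.NextIterator) // <nil>: the stale token is gone
}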
Conclusion
The bug was a good reminder that even seemingly simple changes in API responses can have significant impacts on the code that consumes them. It also highlighted the importance of understanding how json.Unmarshal works and the need to ensure that structs are correctly zeroed/reset before unmarshalling data into them.
I hope this story helps you avoid similar bugs in your projects. Happy coding!