Structure Data with Protocol Buffers in GoLang

When building distributed services, you’re communicating between the services over a network.

To send data (such as your structs) over a network, you need to encode the data in a format to transmit, and lots of programmers choose JSON.

When you’re building public APIs or you’re creating a project where you don’t control the clients, JSON makes sense because it’s accessible—both for humans to read and computers to parse.

But when you're building private APIs or projects where you do control the clients, you can use a mechanism for structuring and transmitting data that, compared to JSON, helps you:

  • be more productive

  • create faster services

  • ship more features

  • have fewer bugs

So what is this mechanism?

Protocol buffers

Protocol buffers (also known as protobuf) are Google's language-neutral, platform-neutral, extensible mechanism for structuring and serializing data.

The advantages of using protobuf:

  • Guarantees type-safety;

  • Prevents schema-violations;

  • Enables fast serialization;

  • Offers backward compatibility.

Yeah, I heard you: what's backward compatibility?

In software, backward compatibility means that new versions or updates still work with older versions or systems. For example, if you upgrade your phone's operating system, backward compatibility ensures that your existing apps keep working without any issues. The new fits seamlessly with the old, so older components and systems can keep running alongside newer ones.

What can you do with Protobuf?

  • define your data structure,

  • compile protobuf into code in many languages,

  • read and write structured data to and from different data streams.

Protocol buffers are good for communicating between two systems (such as microservices), which is why Google used protobuf when building gRPC to develop a high-performance remote procedure call (RPC) framework.

Why Use Protocol Buffers?

Protobuf offers all kinds of useful features:

  • Consistent schemas

    • Define your data schemas once, keep them in a central repository (for example, one named "structs") that your microservices share, and you get a consistent data model throughout your system.

  • Versioning for free

    • You number the fields in your messages, which keeps them backward compatible as new features and changes roll out. You can also mark removed fields as reserved, so that the compiler reports an error if anyone tries to use them again.

  • Less boilerplate

    • The Protobuf libraries handle encoding and decoding for you, which means you don't have to handwrite that code yourself.

  • Extensibility

    • The protobuf compiler supports extensions that can generate your own custom code during compilation, such as common methods shared across multiple structs.

  • Language agnosticism

    • Protobuf is implemented in many languages.

  • Performance

    • Protobuf is highly performant, has smaller payloads, and can serialize up to six times faster than JSON.
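
To make the versioning point concrete, here is a minimal sketch of why numbered fields help with backward compatibility. It is a hand-rolled decoder, not the official Protobuf library, and it handles only two wire types. RecordV1, decodeV1, and the sample bytes are assumptions made up for this illustration. The idea it shows: an older reader that meets a field number it does not know can skip that field and still read the ones it does know.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// RecordV1 is a hypothetical "old" view of a message that only knows
// field 1 (bytes value) and field 2 (uint64 offset).
type RecordV1 struct {
	Value  []byte
	Offset uint64
}

// Wire types handled by this sketch (values from the protobuf encoding spec).
const (
	wireVarint = 0 // integers such as uint64
	wireBytes  = 2 // length-delimited: bytes, string, embedded messages
)

// decodeV1 reads the fields it knows and skips any field number it does not.
func decodeV1(b []byte) (RecordV1, error) {
	var r RecordV1
	for len(b) > 0 {
		tag, n := binary.Uvarint(b)
		if n <= 0 {
			return r, errors.New("bad tag")
		}
		b = b[n:]
		field, wire := tag>>3, tag&7

		switch wire {
		case wireVarint:
			v, n := binary.Uvarint(b)
			if n <= 0 {
				return r, errors.New("bad varint")
			}
			b = b[n:]
			if field == 2 {
				r.Offset = v
			} // any other varint field is unknown here: skipped
		case wireBytes:
			l, n := binary.Uvarint(b)
			if n <= 0 || uint64(len(b)-n) < l {
				return r, errors.New("bad length")
			}
			payload := b[n : n+int(l)]
			b = b[n+int(l):]
			if field == 1 {
				r.Value = append([]byte(nil), payload...)
			} // any other length-delimited field is unknown here: skipped
		default:
			return r, fmt.Errorf("wire type %d not handled in this sketch", wire)
		}
	}
	return r, nil
}

func main() {
	// Bytes a hypothetical newer writer might produce: fields 1 and 2,
	// plus a new field 3 that RecordV1 has never heard of.
	msg := []byte{
		0x0A, 0x02, 'h', 'i', // field 1, length-delimited, "hi"
		0x10, 0x07, // field 2, varint 7
		0x1A, 0x03, 'n', 'e', 'w', // field 3, length-delimited, "new"
	}
	r, err := decodeV1(msg)
	fmt.Println(string(r.Value), r.Offset, err) // hi 7 <nil>
}
```

In a real project the generated code and the Protobuf runtime do this for you; the sketch only shows why adding a field with a fresh number does not break older readers.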

Protobuf vs. JSON

Key differences between Protobuf and JSON

When choosing between JSON and Protobuf, consider the specific needs of your project.

  • JSON is often preferred for its ease of use, human readability, and broad compatibility, making it a good choice for web APIs and configurations.
  • Protobuf, on the other hand, offers advantages in performance, efficiency, and type safety, making it better suited for internal microservices communication, especially in performance-critical applications.

| Aspect | JSON | Protobuf |
| --- | --- | --- |
| Format | Text-based, human-readable | Binary, not human-readable |
| Size | Larger due to text format, which increases payload | Smaller, efficient encoding, which reduces payload |
| Speed | Generally slower serialization/deserialization due to parsing text | Faster serialization/deserialization due to compact binary format |
| Compatibility | Broadly supported across many programming languages and systems | Requires specific support; while widely supported, it's not as universal as JSON |
| Schema | Schema-less, flexible structure | Requires predefined schema, which enforces structure and data types |
| Versioning | Less formal support for versioning; changes in data structure can lead to issues | Strong support for backward and forward compatibility through explicit versioning |
| Ease of Use | Easy to use directly with minimal setup, great for debugging | Requires initial setup to define schema, less straightforward for beginners |
| Interoperability | Excellent for APIs where interoperability with web technologies is crucial | Ideal for internal communication where efficiency is critical and environments are controlled |
| Type Safety | Less type-safe, relies on runtime interpretation | Highly type-safe, with compile-time checks |
| Tooling and Support | Extensive tooling is available due to its ubiquity in web development | Good tooling, especially within systems designed for high efficiency, but can be more complex to set up |

Benchmark Results between Protobuf and JSON

Here's a hypothetical example illustrating the kind of differences you might see in a benchmark comparing JSON and Protobuf:

| Metric | JSON | Protobuf | Improvement with Protobuf |
| --- | --- | --- | --- |
| Serialization Time (ms) | 2.0 | 0.5 | 4x faster |
| Deserialization Time (ms) | 2.5 | 0.6 | ~4.2x faster |
| Payload Size (KB) | 10 | 3 | ~3.3x smaller |
| CPU Usage during Serialization | High | Low | More efficient |
| Memory Usage during Serialization | Moderate | Low | More efficient |

Serialization/Deserialization Speed

  • Protobuf is generally much faster than JSON for both serialization and deserialization, and can outperform it by a factor of roughly 3 to 6. This is because Protobuf uses a binary format and a predefined schema, which allows for more efficient parsing.

Payload Size

  • Protobuf payloads are significantly smaller than JSON, often by 50% to 80%. This reduction in payload size leads to lower bandwidth usage and can be critical in network-constrained environments or for mobile applications where data usage is a concern.

System Resource Utilization

  • Protobuf typically uses fewer CPU resources than JSON because of its binary format and efficient parsing and serialization. This can lead to lower server costs and better scalability in distributed systems.

These results are illustrative and can vary based on the complexity of the data structure, the programming language, and the specific implementation. In general, Protobuf's advantages in speed and efficiency make it a preferred choice for internal service communication in distributed systems where performance and resource utilization are critical concerns. However, for public APIs or scenarios where human readability and ease of debugging are paramount, JSON might still be the preferred format.

Install the Protocol Buffer Compiler

Download and install the compiler from your terminal like so:

$ wget https://github.com/protocolbuffers/protobuf/\
releases/download/v25.3/protoc-25.3-linux-x86_64.zip
$ unzip protoc-25.3-linux-x86_64.zip -d /usr/local/protobuf

Then add the binary to your PATH env var using your shell’s configuration file. If you’re using ZSH for instance, run something like the following to update your configuration:

$ echo 'export PATH="$PATH:/usr/local/protobuf/bin"' >> ~/.zshenv

At this point, the protobuf compiler is installed on your machine. To test the installation, run protoc --version:

$ protoc --version
libprotoc 25.3

If you do see errors, don't worry: installation problems are usually few and well documented, so a quick search should turn up an answer.

Define Your Domain Types as Protocol Buffers

In the previous tutorial, we defined our Record type in Go as this struct:

type Record struct {
    Value []byte `json:"value"`
    Offset uint64 `json:"offset"`
}

Let's turn that into a protobuf message.

Create an api/v1 directory and, inside it, a file called log.proto with the following content:

syntax = "proto3";
package log.v1;
option go_package = "github.com/user/api/log_v1";

message Record {
    bytes value = 1;
    uint64 offset = 2;
}

  • Protobuf messages are the equivalent of Go structs: each field has a type, a name, and a unique field number.

  • Use the repeated keyword to define a slice of some type, so repeated Record records means the records field is a []Record in Go.

  • The field numbers (the = 1 and = 2 above) identify your fields in the marshaled binary format, and you shouldn't change them once your messages are in use in your projects.
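
As a rough illustration of why field numbers must stay stable, the sketch below computes the tag that precedes each field's value in the binary format: the field number shifted left by three bits, combined with the wire type. fieldTag is a hypothetical helper written for this example, not part of any Protobuf library. For field numbers 1 through 15 the tag fits in a single byte.

```go
package main

import "fmt"

// Wire types from the protobuf encoding spec that the Record message needs.
const (
	wireVarint = 0 // uint64 and other integer scalars
	wireBytes  = 2 // bytes, string, embedded messages
)

// fieldTag returns the key written before a field's value:
// the field number shifted left by three bits, OR-ed with the wire type.
func fieldTag(fieldNumber, wireType uint64) uint64 {
	return fieldNumber<<3 | wireType
}

func main() {
	fmt.Printf("value  (field 1, bytes):   0x%02X\n", fieldTag(1, wireBytes))  // 0x0A
	fmt.Printf("offset (field 2, uint64):  0x%02X\n", fieldTag(2, wireVarint)) // 0x10

	// Renumbering offset to 3 changes the byte on the wire, so readers built
	// against the old number would no longer recognize the field.
	fmt.Printf("offset if renumbered to 3: 0x%02X\n", fieldTag(3, wireVarint)) // 0x18
}
```

Field names never appear in the encoded bytes, so renaming a field is safe for the wire format while renumbering it is not.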

Compile Protocol Buffers

To compile protobuf into code for a particular language, you need that language's plugin and runtime. The compiler itself doesn't know how to compile protobuf into every language; for Go it relies on the protoc-gen-go plugin and the Go protobuf runtime.

Add the following two lines to your ~/.bashrc file, then run source ~/.bashrc to apply them:

export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

Then install the Go code generator plugins and, from inside your Go module, add the protobuf runtime:

go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
go get google.golang.org/protobuf

Let's compile your protobuf by running the following command at the root of the project:

protoc api/v1/*.proto \
--go_out=. \
--go_opt=paths=source_relative \
--proto_path=.

Now look at the api/v1 directory and you’ll see a new file called log.pb.go. Open it up to see the Go code that the compiler generated from your protobuf code.

If you get the error "protoc-gen-go: program not found or is not executable", check this Stack Overflow question:

https://stackoverflow.com/questions/57700860/error-protoc-gen-go-program-not-found-or-is-not-executable#:~:text=Go%201.17%2B,file.go

Work with the Generated Code

Although the generated code in log.pb.go is a lot longer than your handwritten code in log.go, you use it as if you had handwritten it. For example, you'll create instances using the & operator (or the built-in new function) and access fields using a dot.
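
Here is a minimal sketch of what that looks like in practice. The Record type below is a simplified, hand-written stand-in for the struct in log.pb.go, so the example runs without the generated package; the real generated struct carries extra internal fields and methods. The getters follow the nil-safe shape that protoc-gen-go produces.

```go
package main

import "fmt"

// Record is a simplified stand-in for the generated struct (an assumption
// made for this example, not the actual contents of log.pb.go).
type Record struct {
	Value  []byte
	Offset uint64
}

// GetValue mirrors the shape of a generated getter: safe on a nil receiver.
func (x *Record) GetValue() []byte {
	if x != nil {
		return x.Value
	}
	return nil
}

// GetOffset returns the zero value when the receiver is nil.
func (x *Record) GetOffset() uint64 {
	if x != nil {
		return x.Offset
	}
	return 0
}

func main() {
	// Create an instance with & and read fields with a dot.
	r := &Record{Value: []byte("hello"), Offset: 42}
	fmt.Println(string(r.Value), r.Offset) // hello 42

	// Getters tolerate a nil pointer, which direct field access would not.
	var missing *Record
	fmt.Println(missing.GetOffset(), missing.GetValue() == nil) // 0 true
}
```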

The compiler generates various methods on the struct, but the only methods you'll use directly are the getters. They are useful when you have multiple messages with the same getter(s) and you want to abstract those method(s) into an interface.

For example, imagine building an e-commerce shop that sells books and games. Every item has a different price, and you want to find the total of the items in the user's cart. You'd make a Pricer interface (the abstraction) and a Total function that takes in a slice of Pricer values and returns their total cost. Here's what the code would look like:

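// book.go
type Book struct {
    Price uint64
}

// GetPrice returns the book's price.
func (b *Book) GetPrice() uint64 { return b.Price }

// game.go
type Game struct {
    Price uint64
}

// GetPrice returns the game's price.
func (g *Game) GetPrice() uint64 { return g.Price }

// calculate-price-service.go
type Pricer interface {
    GetPrice() uint64
}

// Total sums the prices of all items in the cart.
func Total(items []Pricer) uint64 {
    var total uint64
    for _, item := range items {
        total += item.GetPrice()
    }
    return total
}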

This way, you can pass any item to the Total function as long as it implements the GetPrice method, and that's how interfaces work in Go.

Now, what if you want to change the price of all your inventory: books, games, or anything else?

The generated code gives you getters but no setters. If we had setters, we could use an interface like the following to set the price on the different kinds of items in your inventory:

type PriceAdjuster interface {
    SetPrice(price uint64)
}

When the compiled code isn’t quite what you need, you can extend the code and add your customization to it.
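
One way to do that, sketched below under some assumptions, is to handwrite the missing methods in a separate file of the same Go package as the generated code, so that regenerating the .pb.go file does not overwrite them. Book and Game here are plain stand-ins for generated message types, and SetAll is a hypothetical helper added only for illustration.

```go
package main

import "fmt"

// Book and Game stand in for generated message types (assumed for this sketch).
type Book struct{ Price uint64 }
type Game struct{ Price uint64 }

// Handwritten setters. In a real project these would live in their own file
// next to the generated code, in the same package.
func (b *Book) SetPrice(price uint64) { b.Price = price }
func (g *Game) SetPrice(price uint64) { g.Price = price }

// PriceAdjuster abstracts over anything whose price can be set.
type PriceAdjuster interface {
	SetPrice(price uint64)
}

// SetAll applies the same price to every item (hypothetical helper).
func SetAll(items []PriceAdjuster, price uint64) {
	for _, item := range items {
		item.SetPrice(price)
	}
}

func main() {
	book, game := &Book{Price: 10}, &Game{Price: 60}
	SetAll([]PriceAdjuster{book, game}, 5)
	fmt.Println(book.Price, game.Price) // 5 5
}
```

For many message types, a compiler extension that generates such setters would save writing them by hand; the manual version above is the smallest form of the idea.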

References :