When building distributed services, you’re communicating between the services over a network.
To send data (such as your structs) over a network, you need to encode it in a format suitable for transmission, and lots of programmers choose JSON.
When you’re building public APIs or you’re creating a project where you don’t control the clients, JSON makes sense because it’s accessible—both for humans to read and computers to parse.
But when you’re building private APIs or projects where you do control the clients, you can use a mechanism for structuring and transmitting data that, compared to JSON, helps you:
be more productive
create faster services
have more features
have fewer bugs
So what is this mechanism?
Protocol buffers
Protocol buffers (also known as protobuf) are Google’s language- and platform-neutral, extensible mechanism for structuring and serializing data.
The advantages of using protobuf:
Guarantees type safety;
Prevents schema violations;
Enables fast serialization;
Offers backward compatibility.
Yeah, I hear you,
what's backward compatibility?
In software, backward compatibility means that new versions or updates still work with older versions or systems. For example, if you upgrade your phone's operating system, backward compatibility ensures that your existing apps keep working without any issues. It's about making the new fit seamlessly with the old, allowing smooth transitions and continued use of older components or systems alongside newer ones.
What can you do with Protobuf?
define your data structure,
compile protobuf into code in many languages,
read and write structured data to and from different data streams.
Protocol buffers are good for communicating between two systems (such as microservices), which is why Google used protobuf when building gRPC to develop a high-performance remote procedure call (RPC) framework.
Why Use Protocol Buffers?
Protobuf offers all kinds of useful features:
Consistent schemas
- define your data schemas once, keep them in a central repository, and share them across your microservices, ensuring a consistent data model throughout your system
Versioning for free -- maintain backward compatibility
- number the fields in your messages so new features and changes can roll out without breaking existing clients, and mark deprecated field numbers as reserved so the compiler raises an error if anyone tries to reuse them (see the sketch after this list)
Less boilerplate
- the generated code handles encoding and decoding for you, which means you don’t have to handwrite that code yourself
Extensibility
- through compiler-supported extensions, enabling the generation of custom code logic during compilation, such as automatically generating common methods across multiple structs.
Language agnosticism
- Protobuf is implemented in many languages
Performance
- highly performant, has smaller payloads, and serializes up to six times faster than JSON
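Here's a rough sketch of what that versioning story looks like in a .proto file (a hypothetical message and fields, just for illustration): retired field numbers and names are reserved so they can't be reused, and new fields get fresh numbers that old clients simply ignore.

syntax = "proto3";

package shop.v1;

message Item {
  // Field numbers identify fields on the wire; never renumber them once in use.
  reserved 2;       // the retired "sku" field's number can't be reused
  reserved "sku";   // neither can its name

  string name  = 1;
  uint64 price = 3; // new fields get fresh numbers; old clients ignore them
}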
Protobuf vs. JSON
Key differences between Protobuf and JSON:
When choosing between JSON and Protobuf, consider the specific needs of your project.
- JSON is often preferred for its ease of use, human readability, and broad compatibility, making it a good choice for web APIs and configurations.
- Protobuf, on the other hand, offers advantages in performance, efficiency, and type safety, making it better suited for internal microservices communication, especially in performance-critical applications.
| Aspect | JSON | Protobuf |
| --- | --- | --- |
| Format | Text-based, human-readable | Binary, not human-readable |
| Size | Larger due to text format, which increases payload | Smaller, efficient encoding, which reduces payload |
| Speed | Generally slower serialization/deserialization due to parsing text | Faster serialization/deserialization due to compact binary format |
| Compatibility | Broadly supported across many programming languages and systems | Requires specific support; while widely supported, it's not as universal as JSON |
| Schema | Schema-less, flexible structure | Requires predefined schema, which enforces structure and data types |
| Versioning | Less formal support for versioning; changes in data structure can lead to issues | Strong support for backward and forward compatibility through explicit versioning |
| Ease of Use | Easy to use directly with minimal setup, great for debugging | Requires initial setup to define schema, less straightforward for beginners |
| Interoperability | Excellent for APIs where interoperability with web technologies is crucial | Ideal for internal communication where efficiency is critical and environments are controlled |
| Type Safety | Less type-safe, relies on runtime interpretation | Highly type-safe, with compile-time checks |
| Tooling and Support | Extensive tooling is available due to its ubiquity in web development | Good tooling, especially within systems designed for high efficiency, but can be more complex to set up |
Benchmark Results between Protobuf and JSON
Here's a hypothetical example illustrating the kind of differences you might see in a benchmark comparing JSON and Protobuf:
| Metric | JSON | Protobuf | Improvement with Protobuf |
| --- | --- | --- | --- |
| Serialization Time (ms) | 2.0 | 0.5 | 4x faster |
| Deserialization Time (ms) | 2.5 | 0.6 | ~4.2x faster |
| Payload Size (KB) | 10 | 3 | ~3.3x smaller |
| CPU Usage during Serialization | High | Low | More efficient |
| Memory Usage during Serialization | Moderate | Low | More efficient |
Serialization/Deserialization Speed
- Protobuf is generally much faster than JSON for both serialization and deserialization. Protobuf can outperform JSON by a factor of 3x to 6x in speed. This is because Protobuf uses a binary format and a predefined schema, which allows for more efficient parsing.
Payload Size
- Protobuf payloads are significantly smaller than JSON, often by 50% to 80%. This reduction in payload size leads to lower bandwidth usage and can be critical in network-constrained environments or for mobile applications where data usage is a concern.
System Resource Utilization
- Protobuf uses fewer CPU resources than JSON thanks to its binary format and efficient parsing and serialization mechanisms. This can lead to lower server costs and better scalability in distributed systems.
These results are illustrative and can vary based on the complexity of the data structure, the programming language, and the specific implementation. In general, Protobuf's advantages in speed and efficiency make it a preferred choice for internal service communication in distributed systems where performance and resource utilization are critical concerns. However, for public APIs or scenarios where human readability and ease of debugging are paramount, JSON might still be the preferred format.
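If you want to measure this yourself in Go, here's a minimal benchmark sketch. It assumes the generated log_v1.Record type we'll create later in this tutorial (imported via the go_package path, adjust it to your own module) and compares it against an equivalent plain struct encoded with encoding/json:

// bench_test.go -- a rough sketch, not a rigorous benchmark.
package main

import (
    "encoding/json"
    "testing"

    "google.golang.org/protobuf/proto"

    log_v1 "github.com/user/api/log_v1" // generated later in this tutorial
)

// jsonRecord mirrors the protobuf Record for a fair-ish comparison.
type jsonRecord struct {
    Value  []byte `json:"value"`
    Offset uint64 `json:"offset"`
}

func BenchmarkJSONMarshal(b *testing.B) {
    rec := &jsonRecord{Value: []byte("hello world"), Offset: 42}
    for i := 0; i < b.N; i++ {
        if _, err := json.Marshal(rec); err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkProtoMarshal(b *testing.B) {
    rec := &log_v1.Record{Value: []byte("hello world"), Offset: 42}
    for i := 0; i < b.N; i++ {
        if _, err := proto.Marshal(rec); err != nil {
            b.Fatal(err)
        }
    }
}

Run it with go test -bench=. and compare the ns/op numbers; the exact results will vary with your message shapes.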
Install the Protocol Buffer Compiler
To install the compiler, go to the Protobuf releases page on GitHub and download the relevant release for your computer. I'm using Linux (Ubuntu), so I can download and install it from the terminal like so:
$ wget https://github.com/protocolbuffers/protobuf/\
releases/download/v25.3/protoc-25.3-linux-x86_64.zip
$ sudo unzip protoc-25.3-linux-x86_64.zip -d /usr/local/protobuf
Then add the binary to your PATH env var using your shell’s configuration file. If you’re using ZSH for instance, run something like the following to update your configuration:
$ echo 'export PATH="$PATH:/usr/local/protobuf/bin"' >> ~/.zshenv
At this point, the protobuf compiler is installed on your machine. To test the installation, run protoc --version:
$ protoc --version
libprotoc 25.3
If you do see errors, don’t worry: installation problems are common and well documented, and a quick search will usually turn up the answer.
Define Your Domain Types as Protocol Buffers
In the previous tutorial, we defined our Record type in Go as this struct:
type Record struct {
    Value  []byte `json:"value"`
    Offset uint64 `json:"offset"`
}
Let's turn that into a protobuf message. Create an api/v1 directory, and inside it create a file called log.proto:
syntax = "proto3";
package log.v1;
option go_package = "github.com/user/api/log_v1";
message Record {
  bytes value = 1;
  uint64 offset = 2;
}
Protobuf messages are the equivalent of Go structs. Use the repeated keyword to define a slice of some type: repeated Record records means the records field is a []Record in Go.
These field numbers identify your fields in the marshaled binary format, and you shouldn’t change them once your messages are in use in your projects.
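For example, a hypothetical Log message that wraps a slice of records (not part of our log.proto, just an illustration of repeated and field numbering) would look like this:

message Log {
  repeated Record records = 1;
}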
Compile Protocol Buffers
To compile protobuf into the code you need the runtime. The compiler itself doesn’t know how to compile protobuf into every language—it needs a language-specific runtime to do so.
Inside your ~/.bashrc file, add the following two lines, then reload it with source ~/.bashrc:
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
Install the compiler plugins and the Go protobuf runtime:
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
go get google.golang.org/protobuf
Let's compile the protobuf by running this command at the root of the project:
protoc api/v1/*.proto \
--go_out=. \
--go_opt=paths=source_relative \
--proto_path=.
Now look at the api/v1 directory and you’ll see a new file called log.pb.go. Open it up to see the Go code that the compiler generated from your protobuf code.
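The generated struct will look roughly like this (trimmed down; the exact internal fields and struct tags vary with your protoc-gen-go version):

type Record struct {
    state         protoimpl.MessageState
    sizeCache     protoimpl.SizeCache
    unknownFields protoimpl.UnknownFields

    // Your fields, exported and tagged by the compiler.
    Value  []byte `protobuf:"bytes,1,opt,name=value,proto3" json:"value,omitempty"`
    Offset uint64 `protobuf:"varint,2,opt,name=offset,proto3" json:"offset,omitempty"`
}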
By the way, if the protoc command fails with the error "protoc-gen-go: program not found or is not executable", check the related question on Stack Overflow; it usually means $GOPATH/bin (where the plugin was installed) isn't on your PATH.
Work with the Generated Code
Although the generated code in log.pb.go is a lot longer than your handwritten code in log.go, you use it just as if you’d written it by hand. For example, you create instances using the & operator (or the new keyword) and access fields using a dot.
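Here's a minimal sketch, assuming the import path matches the go_package option we set earlier (adjust it to your own module path):

package main

import (
    "fmt"

    "google.golang.org/protobuf/proto"

    log_v1 "github.com/user/api/log_v1" // from the go_package option
)

func main() {
    // Create an instance with & and access fields with a dot,
    // just like a handwritten struct.
    record := &log_v1.Record{
        Value:  []byte("hello world"),
        Offset: 42,
    }
    fmt.Println(record.Offset)

    // The protobuf runtime handles encoding and decoding for you.
    data, err := proto.Marshal(record)
    if err != nil {
        panic(err)
    }

    var decoded log_v1.Record
    if err := proto.Unmarshal(data, &decoded); err != nil {
        panic(err)
    }
    fmt.Println(string(decoded.Value)) // "hello world"
}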
The compiler generates various methods on the struct, but the only methods you’ll use directly are the getters. They're useful when you have multiple messages with the same getter(s) and you want to abstract those methods into an interface.
For example, imagine building an e-commerce shop that sells books and games. Every item has a different price, and you want to find the total of the items in the user’s cart. You’d make a Pricer interface (the abstraction) and a Total function that takes in a slice of Pricer values and returns their total cost. Here’s what the code would look like:
// book.go
type Book struct {
    Price uint64
}

func (b *Book) GetPrice() uint64 { return b.Price }

// game.go
type Game struct {
    Price uint64
}

func (g *Game) GetPrice() uint64 { return g.Price }

// calculate-price-service.go
type Pricer interface {
    GetPrice() uint64
}

func Total(items []Pricer) uint64 {
    var total uint64
    for _, item := range items {
        total += item.GetPrice()
    }
    return total
}
This way, you can pass any item to the Total function as long as it implements the GetPrice method; that's how interfaces work in Go.
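And a quick usage sketch (with made-up prices in cents):

// main.go -- uses the Book, Game, Pricer, and Total definitions above.
package main

import "fmt"

func main() {
    items := []Pricer{
        &Book{Price: 1500},
        &Game{Price: 4000},
    }
    fmt.Println(Total(items)) // 5500
}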
But here's the thing: what if you want to change the price of everything in your inventory, books, games, or anything else? If we had setters, we could use an interface like the following to set the price on the different kinds of items in your inventory:
type PriceAdjuster interface {
    SetPrice(price uint64)
}
When the compiled code isn’t quite what you need, you can extend the code and add your customization to it.
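One way to do that, sketched here with a hypothetical file name and assuming the package name produced by our go_package option, is to add your own file next to the generated one; because it's in the same package, you can attach whatever methods you need to the generated type:

// api/v1/record_ext.go -- hypothetical file, lives alongside the generated log.pb.go.
package log_v1

// SetOffset is a hand-written setter that the generated code doesn't provide.
func (r *Record) SetOffset(offset uint64) {
    r.Offset = offset
}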