Highly Scalable: Changes in Google Protocol Buffers v3

Simplified Usage and Expanded Language Support

Overview

Google Protocol Buffers (Protobuf) are a language-neutral, platform-neutral mechanism for serializing structured data. Version 2 was released in 2008, and the long-awaited version 3 is now in development. As of March 2015 the alpha-2 release is available; here is a summary of the major changes so far.

New Features and Changes

Removal of Default Values and Required Fields
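Required fields and custom default values have been removed, along with field presence for primitive fields. Singular primitive fields (numbers, enums, booleans, and strings) no longer have hasField-style checks: an unset field simply reads as its type's default value (zero, false, the empty string, or the zero enum value), and a field holding that default is not written out during serialization.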
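
A minimal sketch of what this looks like in a .proto file, assuming a hypothetical SearchRequest message (the names are illustrative, not taken from the release): fields carry no required label and no [default = ...] option, and each one reads as its type's zero value when unset.

```proto
syntax = "proto3";

// Hypothetical message for illustration only.
message SearchRequest {
  // No "required" label and no [default = ...] option are allowed here.
  string query = 1;        // unset reads as ""
  int32 page_number = 2;   // unset reads as 0
  bool exact_match = 3;    // unset reads as false
}
```
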
Removal of Unknown Fields
proto3 messages no longer keep unknown fields. Fields the parser does not recognize are discarded instead of being stored in the message, which helps reduce unnecessary overhead.
Removal of Extensions and Introduction of the Any Type
Extensions have been removed and replaced by a new standard type called Any, which holds an arbitrary serialized message together with a URL identifying its type. This makes embedding arbitrary messages simpler and more flexible.
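
As an illustration, here is a hypothetical ErrorStatus message that carries arbitrary detail messages through Any instead of an extension range. The import path google/protobuf/any.proto is the one used by current releases; treat it as an assumption for the alpha builds.

```proto
syntax = "proto3";

import "google/protobuf/any.proto";

// Hypothetical message: instead of declaring an extension range,
// senders pack arbitrary messages into Any fields.
message ErrorStatus {
  string message = 1;
  repeated google.protobuf.Any details = 2;
}
```
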
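Fixed Semantics for Unknown Enum Values
In v2, an enum value that the reader's schema did not define was not kept in the field, so its meaning after parsing was unclear. In v3 the semantics have been fixed: an unrecognized enum value is preserved in the message, although how it is exposed is language-dependent. In addition, the first value of every proto3 enum must be zero, and that value serves as the default.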
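
A small sketch of a proto3 enum, using hypothetical names: the zero value comes first and acts as the default, and the comment notes how an unrecognized number is treated.

```proto
syntax = "proto3";

// Hypothetical enum: the first value must be 0 and is the default.
enum Corpus {
  CORPUS_UNSPECIFIED = 0;
  CORPUS_WEB = 1;
  CORPUS_NEWS = 2;
}

message Query {
  // If a newer writer sends a number this schema lacks (say, 3),
  // proto3 keeps the value instead of dropping it; how it is
  // exposed depends on the language.
  Corpus corpus = 1;
}
```
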
Addition of Maps
Protocol Buffers v3 introduces native support for map fields. Maps are stored as unordered collections in memory, and values can be accessed through generated accessors.
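
A minimal map declaration, with hypothetical message names, using the proto3 map<key, value> syntax.

```proto
syntax = "proto3";

// Hypothetical messages for illustration only.
message Project {
  string name = 1;
}

message Workspace {
  // Keys may be any integral or string type; values may be any
  // type except another map. Map fields cannot be repeated.
  map<string, Project> projects = 1;
}
```
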
Standard Types for Time and Dynamic Data
A small set of standard message types has been added for representing common data such as time values and dynamic (loosely typed) data.
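
For illustration, a hypothetical Event message using the standard Timestamp, Duration, and Struct types as they are named in current releases; whether each of these is already present in alpha-2 is an assumption worth checking.

```proto
syntax = "proto3";

import "google/protobuf/duration.proto";
import "google/protobuf/struct.proto";
import "google/protobuf/timestamp.proto";

// Hypothetical message built from standard types.
message Event {
  google.protobuf.Timestamp occurred_at = 1;  // point in time (seconds + nanos)
  google.protobuf.Duration elapsed = 2;       // signed span of time
  google.protobuf.Struct attributes = 3;      // JSON-like dynamic data
}
```
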
JSON Support as an Alternative to Binary Encoding
In addition to the binary encoding format, Protobuf v3 now supports JSON as an encoding option, which can be useful for certain use cases like web services.
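
A sketch of how a hypothetical User message maps to JSON, following the proto3 JSON mapping as documented for current releases (the exact rules in the alpha may differ): field names are converted to lowerCamelCase, and 64-bit integers are emitted as JSON strings.

```proto
syntax = "proto3";

// Hypothetical message for illustration only.
message User {
  string user_name = 1;
  int64 account_id = 2;
  repeated string tags = 3;
}

// Under the JSON mapping, a User might serialize as:
//   {"userName": "alice", "accountId": "42", "tags": ["admin"]}
// user_name becomes "userName", and the int64 value is a string.
```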

Language Support

As of the alpha-2 version, Ruby is the only language officially supported. Removing field presence, required fields, and default values makes proto3 much easier to implement with open struct representations, as in languages such as Android Java, Objective-C, and Go, so support for those languages is expected to follow. The languages currently discussed on the mailing lists include Ruby, PHP, Node.js, and Objective-C.

Compatibility with v2

In Protobuf v3, you choose whether a .proto file is compiled as proto2 or proto3 by adding a syntax declaration at the top of the file. If no syntax is specified, proto2 is used by default and the compiler prints a warning.
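
A minimal proto3 file, with a hypothetical package and message name, showing where the syntax declaration goes. Leaving that line out would make protoc treat the file as proto2.

```proto
// The syntax statement must be the first non-empty, non-comment line.
syntax = "proto3";

package example;  // hypothetical package name

message Greeting {
  string text = 1;
}
```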

Because proto3 is not API-compatible with proto2, it is recommended mainly for new projects; existing v2 users are not generally encouraged to migrate. v2 will continue to be supported for the foreseeable future.

Support for Map Fields

Protobuf v3 introduces the map field, which is stored in memory as an unordered map, with values accessed through accessors generated by the protocol compiler. This gives first-class handling of key-value pairs, which in v2 had to be modeled by hand as a repeated message of key/value entries.
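
To show what the map field replaces, here is a sketch of the usual v2 workaround with hypothetical Workspace and Project messages: a repeated entry message with key and value fields. On the wire, a map field is encoded the same way as such a repeated entry message, so implementations without map support can still read the data.

```proto
syntax = "proto2";

// Hypothetical messages for illustration only.
message Project {
  optional string name = 1;
}

// v2-style stand-in for map<string, Project>.
message ProjectsEntry {
  optional string key = 1;
  optional Project value = 2;
}

message Workspace {
  repeated ProjectsEntry projects = 1;
}
```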

C++ Arena Allocation Support

Profiling has shown that memory allocation and deallocation account for a significant share of the CPU time spent in Protobuf code. To reduce this cost, v3 introduces arena allocation for C++: new objects are allocated from a large, pre-allocated block of memory, which makes deallocating them almost free and can also reduce memory fragmentation. Early adoption at Google showed improvements of 20% to 50% in some binaries.

To enable arena allocation, add the following option to your .proto file:

option cc_enable_arenas = true;

This option tells the protocol compiler to generate the additional code needed for arena allocation. It does not change the existing message API or the wire format, so existing code should keep working after the option is added. Enabling it is recommended for new projects, and arena allocation is planned to become the default in the future.


Monday, March 9, 2015