- Introduction: The Publish-Subscribe Pattern
- The Goal: ZMQ vs Fast RTPS Performance comparison
- ZMQ vs eProsima Fast RTPS - Latency Benchmarks
- ZMQ vs eProsima Fast RTPS – Throughput Benchmarks
- More Information
Most of the middleware options available today (Web Services, REST, Apache Thrift, RMI...) are based on the request-reply pattern: a client interacts with a server resource, service, or procedure by sending a request and waiting for a reply. This is the most natural model for developers: when the application needs something, it asks a server and gets an answer.
But in many situations it is better to use a different pattern, the publish-subscribe pattern: as soon as a publisher has new data, it publishes that data to the interested subscribers. The main benefits are:
- Performance: Better latency and throughput. The data is sent as soon as it is available.
- Highly decoupled model: subscribers do not need to poll periodically for the data; each subscriber just declares its interest in receiving the data updates.
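To make the pattern concrete, here is a minimal in-process sketch in Python. It is only a toy illustration of the idea (subscribers register once, publishers push data as soon as it exists); the `Bus` class and its method names are our own assumptions and do not correspond to the ZMQ or eProsima Fast RTPS APIs.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[Any], None]


class Bus:
    """Toy in-process publish-subscribe bus (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # topic name -> callbacks registered by subscribers
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Declare interest in a topic once; no polling is needed afterwards."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, sample: Any) -> int:
        """Push a sample to every subscriber of the topic; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(sample)
        return len(callbacks)
```

A subscriber would call `bus.subscribe("temperature", handler)` once, and every later `bus.publish("temperature", value)` reaches it without the subscriber having to ask.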
Examples: ZMQ and eProsima Fast RTPS
The emergence of large and demanding distributed systems in the Internet of Things requires lightweight, high-performance middleware (see our blog post "Protocols for Fog Computing: RTPS/DDS"). Among the options available, we compared ZMQ and eProsima Fast RTPS, both of which are high-performance asynchronous middleware implementing the publish-subscribe pattern.
ZMQ is a messaging middleware that does not require a message broker and implements several communication patterns, including publish-subscribe and request-reply. Serialization and deserialization of the messages must be implemented by the user. The API looks like a socket library.
eProsima Fast RTPS is a high-performance implementation of the Real-Time Publish Subscribe protocol (see our Introduction to RTPS), offering a simple pub-sub API. The product includes serialization support through code generation from an Interface Definition Language (IDL), using an approach similar to Apache Thrift, and an ultra-fast serialization library: eProsima Fast Buffers. The request-reply pattern is also available through our eProsima RPC over DDS.
The goal of these tests is to measure and compare the latency and the throughput of eProsima Fast RTPS and ZMQ using the publish-subscribe pattern. In both cases, we serialize the data using eProsima Fast Buffers, a really fast serialization engine.
The performance benchmarks presented in this article were obtained using two to four computers with the following characteristics:
- Intel Core i3 @3.4GHz
- 4GB RAM
- Intel Gigabit Network adapter at 1Gbps
- Fedora 20 64bit as OS
Middleware Releases and configuration:
- eProsima Fast RTPS 1.0
  - Serialization: eProsima Fast Buffers 0.3.0
  - Mode: Reliable over UDP (unicast & multicast), Automatic Discovery
- ZMQ 4.0.5
  - Serialization: eProsima Fast Buffers 0.3.0
  - Mode: Publish-subscribe communication
Differences to consider
There are some differences between eProsima Fast RTPS and ZMQ that need to be presented and analyzed.
- Transport: eProsima Fast RTPS is an implementation of the Real-Time Publish Subscribe protocol over UDP. This protocol includes its own ACK/NACK-based reliability mechanism for both unicast and multicast. ZMQ uses TCP.
- Protocol Headers: Another important difference, and one that directly affects performance, is the header of each protocol. RTPS is a much more versatile protocol that is designed to run over transports with no reliability. It also offers many other features (keyed topics, ordered delivery, etc.) and thus its header is larger. As we will see later in the test results, ZMQ is slightly better than eProsima Fast RTPS for very small messages, and the most likely explanation is the smaller header.
- Node Discovery: The discovery mechanism is also a factor to take into account. eProsima Fast RTPS comes with a built-in endpoint discovery mechanism. The user only needs to specify the topic name and the topic data type and, if the QoS settings are compatible, the middleware directly matches the publishers and the subscribers. This allows for a much easier setup and configuration. ZMQ, however, does not have such a mechanism, and the user needs to manually set the IP addresses of the publisher and the subscriber in order to achieve communication.
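The matching rule described under Node Discovery can be sketched as follows. This is a simplified illustration, not the Fast RTPS implementation: the `Endpoint` type and its fields are hypothetical, and the only QoS policy modelled is reliability, using the DDS requested/offered convention in which a reader that requests reliable delivery needs a writer that offers it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of a publisher or subscriber endpoint."""
    topic: str
    type_name: str
    reliable: bool  # True = reliable, False = best effort


def matches(writer: Endpoint, reader: Endpoint) -> bool:
    """Return True when a writer and a reader would be matched by discovery."""
    same_topic = writer.topic == reader.topic
    same_type = writer.type_name == reader.type_name
    # The offered reliability must be at least as strong as the requested one.
    qos_compatible = writer.reliable or not reader.reliable
    return same_topic and same_type and qos_compatible
```

With ZMQ there is no equivalent step: the subscriber connects to an explicitly configured address instead.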
One to One Pub-Sub Latency
The comparison for the one subscriber latency case can be observed in the following plot:
The latency of the ZMQ library is slightly better for small message sizes. However, as the message size increases, the latency of Fast RTPS becomes better than that of ZMQ. Both exhibit the same linear behavior, with eProsima Fast RTPS having the smaller slope.
As mentioned before, ZMQ exhibits smaller latencies for message sizes between 16 and 128 bytes. The most likely explanation is that the header of ZMQ messages is smaller than the header of RTPS messages. With larger message sizes the importance of the header size decreases, since it represents a smaller percentage of the transmitted data.
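The header argument can be checked with a small calculation. The sketch below computes the share of each transmitted message that is header. The two header sizes are placeholders chosen only for illustration; they are not the measured ZMQ or RTPS header sizes.

```python
def header_share(header_bytes: int, payload_bytes: int) -> float:
    """Fraction of the transmitted bytes that belongs to the protocol header."""
    return header_bytes / (header_bytes + payload_bytes)


# Hypothetical header sizes, NOT measured values for ZMQ or RTPS.
SMALL_HEADER = 8
LARGE_HEADER = 64

if __name__ == "__main__":
    for payload in (16, 128, 1024, 16384):
        small = header_share(SMALL_HEADER, payload)
        large = header_share(LARGE_HEADER, payload)
        print(f"{payload:>6} B payload: small header {small:6.1%}, large header {large:6.1%}")
```

Whatever the real header sizes are, the share shrinks toward zero as the payload grows, which is consistent with the header only mattering for the smallest messages.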
One to Many Pub-Sub Latency
The same test was also carried out with three subscribers matched to the same publisher:
In this case the latencies of ZMQ and eProsima Fast RTPS are very similar for small messages. As the size increases, the advantage of using Fast RTPS with multicast becomes clearer. With message sizes of 16 KB the difference in latency can be as high as 200 µs.
Here, ZMQ's advantage of having a smaller header is neutralized by having to send the same data separately to each of the subscribers. As you can see, the latency values for smaller message sizes are very similar for the two implementations, which strengthens the position of eProsima Fast RTPS against ZMQ. A higher number of subscribers will likely cause ZMQ's latency to increase, while the latency of Fast RTPS will increase much more slowly.
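A rough estimate shows why sending one copy per subscriber becomes expensive for large messages. The sketch below computes the time a payload occupies a 1 Gbps link (the adapter speed used in these tests) when it is sent once, as with multicast, or once per subscriber, as with unicast. It is an idealized model under our own assumptions: headers, protocol processing, and switch behavior are ignored, so it should be read as an order-of-magnitude check rather than a prediction of the measured values.

```python
LINK_BPS = 1_000_000_000  # 1 Gbps network adapter


def wire_time_us(payload_bytes: int, copies: int, link_bps: int = LINK_BPS) -> float:
    """Microseconds the link is busy sending `copies` copies of the payload."""
    return payload_bytes * 8 * copies / link_bps * 1e6


def extra_unicast_delay_us(payload_bytes: int, subscribers: int) -> float:
    """Extra wire time of one copy per subscriber compared with a single multicast copy."""
    return wire_time_us(payload_bytes, subscribers) - wire_time_us(payload_bytes, 1)


if __name__ == "__main__":
    for size in (16, 1024, 16384):
        print(f"{size:>6} B, 3 subscribers: +{extra_unicast_delay_us(size, 3):.3f} us")
```

In this model the extra time grows linearly with both the message size and the number of subscribers, and it is the delay seen by the last subscriber served.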
The following graph contains the throughput comparison between ZMQ and eProsima Fast RTPS.
This plot shows that ZMQ can achieve higher throughput with smaller message sizes. This is because ZMQ uses TCP, a stream protocol optimized for throughput: several messages are sent in the same packet. This behavior can be simulated in eProsima Fast RTPS by using an array of the type used for the topic.
eProsima Fast RTPS reaches the maximum throughput sooner and surpasses ZMQ for message sizes larger than 1000 bytes.
The latency is usually defined as the amount of time a message takes to traverse a system. In a packet-based network the latency is usually measured either as the one-way latency (the time from the source sending the packet to the destination receiving it) or as the round-trip delay time (the time from source to destination plus the time from the destination back to the source). The latter is more often used since it can be measured from a single point.
In the case of an RTPS communication exchange the latency could be defined as the time it takes a publisher to serialize and send a data message plus the time it takes a matching subscriber to receive and de-serialize it. Applying the same round-trip concept mentioned before, the round-trip latency could be defined as the time it takes a message to be sent by a publisher, received by a subscriber and sent back to the same publisher. For example, in the figure below the round-trip time would be T2-T1 making the latency (T2-T1)/2.
In a multiple-subscriber scenario, the measured latency is obtained with a similar procedure. In this case the publisher sends the data to all subscribers but only one of them responds to the message. The latency is again calculated as (T2-T1)/2.
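The round-trip procedure described above can be sketched in Python as follows. This is not the benchmark code used for the results in this article: `round_trip` is a hypothetical callable standing in for "publish a message and wait for the subscriber to send it back", and the clock is injectable so the arithmetic can be checked without a network.

```python
import time
from statistics import mean
from typing import Callable, List


def measure_latency_us(
    round_trip: Callable[[bytes], bytes],
    payload: bytes,
    samples: int = 1000,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Return one latency estimate per sample, in microseconds, as (T2 - T1) / 2."""
    latencies = []
    for _ in range(samples):
        t1 = clock()                    # T1: just before the publisher sends
        echoed = round_trip(payload)    # send, then wait for the echoed message
        t2 = clock()                    # T2: just after the echo is received
        if echoed != payload:
            raise ValueError("echoed payload does not match the sent payload")
        latencies.append((t2 - t1) / 2 * 1e6)
    return latencies


if __name__ == "__main__":
    # In-process stand-in for a real transport: the "subscriber" echoes immediately.
    results = measure_latency_us(lambda data: data, b"x" * 16, samples=100)
    print(f"mean {mean(results):.3f} us, min {min(results):.3f} us")
```

Because both timestamps are taken on the publisher side, no clock synchronization between machines is needed, which is the reason the round-trip method is usually preferred.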
In communication networks the throughput is usually defined as the rate of successful message delivery over a communication channel, and it is usually expressed in bytes per second. There are different methods to measure the throughput of a communication network. The most common one is to send a large file (or multiple smaller ones), measure the time it takes to transmit it to another point of the network, and then divide the amount of data by that time.
In the case of an RTPS communication, the throughput can be measured by sending groups of messages during a certain amount of time and then obtaining the combined size of the transmitted data. However, to obtain the maximum throughput value, different message demands (D, the number of consecutive messages sent) must be tried in order to find the best value, i.e., the one that fills the available send pipe in the publisher without overflowing the receive queue in the subscriber (which would produce packet losses). A diagram showing the process followed to perform this test is shown next:
Of course, the throughput can be measured both on the publisher side (how much data you send) and on the subscriber side (how much data you receive). If no packets are lost the values will be very similar, and the slight difference between them will be caused by differences in time measurement. However, if packets are lost the throughput values will differ depending on the side. To establish a sound measurement rule, we will assume that the maximum throughput for each message size is the one measured on the publisher side, provided that no packet loss is experienced on the subscriber side.
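The measurement rule above (try several demands, discard any run with packet loss, keep the highest publisher-side rate) can be sketched like this. The names are our own assumptions: `run_burst` is a hypothetical callable that sends D messages and reports what happened, and `BurstResult` is just a container for its report.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class BurstResult:
    """Outcome of sending one burst of D messages (hypothetical container)."""
    bytes_sent: int   # total bytes sent by the publisher
    elapsed_s: float  # time the publisher needed to send them
    lost: int         # messages the subscriber reported as missing


def throughput_bytes_per_s(result: BurstResult) -> float:
    """Publisher-side throughput: amount of data divided by the time to send it."""
    return result.bytes_sent / result.elapsed_s


def best_demand(
    run_burst: Callable[[int], BurstResult],
    demands: Iterable[int],
) -> Optional[Tuple[int, float]]:
    """Return (demand, throughput) for the fastest loss-free run, or None if all runs lost data."""
    best: Optional[Tuple[int, float]] = None
    for demand in demands:
        result = run_burst(demand)
        if result.lost > 0:
            continue  # receive queue overflowed; this demand is too high
        rate = throughput_bytes_per_s(result)
        if best is None or rate > best[1]:
            best = (demand, rate)
    return best
```

The search would be repeated for each message size, so that every point in the throughput plot corresponds to the best loss-free demand for that size.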
- Introduction to RTPS
- eProsima Fast RTPS
- eProsima Fast RTPS Performance
- Protocols for Fog Computing: RTPS/DDS