Benchmarking DDS Implementations - Introduction

The existence of different DDS implementations motivates the execution of performance comparisons across them. In this article we present a comparison between Fast-RTPS, CycloneDDS, and OpenSplice, measuring both latency and throughput in three different scenarios: dual host, inter-process, and intra-process.

In all the tested cases, Fast-RTPS shows lower latency and higher throughput than the analyzed alternatives. As an introduction, we show the graphs corresponding to the dual host case below, and we detail the rest of the experiments in the following sections.

Benchmarking DDS Implementations - INDEX

This article is divided into the following sections:

  • Latency Comparison
  • Throughput Comparison
  • Testing environment
  • Further information

Latency Comparison - Introduction

The existence of different DDS implementations motivates the execution of performance comparisons across them. In this article, a comparison between Fast-RTPS, CycloneDDS, and OpenSplice in terms of latency performance is presented, showing that Fast-RTPS is the fastest message delivery implementation in all the tested cases. Furthermore, Fast-RTPS is the most consistent one when dealing with message delivery delays. This means that, with Fast-RTPS, the experienced latency is the shortest, and it remains almost constant at all times.

Latency Results: Localhost comparison - Inter-process


Back to Benchmarking Index

Latency Results: Localhost comparison - Intra-process

Back to Benchmarking Index

Latency Results: Dual Host comparison


Back to Benchmarking Index

Latency Results: Conclusion

Both the localhost and the dual host comparisons show a clear difference between Fast-RTPS and the other two implementations, in favor of the former. It can be seen that Fast-RTPS’ mean latencies are smaller than the other implementations’ minima at all times. It is also important to note that Fast-RTPS’ latency is stable across more payloads, growing at a lower rate for the larger payloads than CycloneDDS or OpenSplice. Finally, it is notable, especially in the dual host case, that the Fast-RTPS mean follows the minima more closely, meaning that the difference between minima and maxima is smaller at all times (the distributions are narrower around the mean).

Back to Benchmarking Index

Methodology: Measuring Latency

In network computing, latency is a measurement of the amount of time that a message spends traversing a system, i.e. a measurement of how much time passes from the moment the message is sent by the sender (a Fast-RTPS publisher in our case) until it is received by a receiver (a Fast-RTPS subscriber). To avoid clock synchronization issues between the systems where the publisher and the subscriber lie, a common way to approximate latency is by means of the round-trip time. In this scenario, the publisher sends a message and waits for the subscriber to send it back (much like a ping-pong exchange), measuring the elapsed time between the publisher’s send action and the publisher’s receive action. Then, to get a latency approximation, the measured round-trip time is divided by two.
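As an illustration of this ping-pong scheme, here is a minimal C++ sketch (not taken from any of the tested implementations; send_and_wait_for_echo() is a hypothetical stand-in for the publisher’s send plus the blocking wait for the subscriber’s echo):

    #include <chrono>
    #include <iostream>
    #include <thread>

    // Hypothetical stand-in for "publish the message and block until the
    // subscriber echoes it back"; stubbed so the example is self-contained.
    void send_and_wait_for_echo() {
        std::this_thread::sleep_for(std::chrono::microseconds(120)); // simulated round trip
    }

    int main() {
        using clock = std::chrono::steady_clock;

        const auto t0 = clock::now();
        send_and_wait_for_echo(); // ping-pong: send, then wait for the echo
        const auto t1 = clock::now();

        const double round_trip_us =
            std::chrono::duration<double, std::micro>(t1 - t0).count();
        std::cout << "latency ~ " << round_trip_us / 2.0 << " us\n"; // latency = RTT / 2
        return 0;
    }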


Back to Benchmarking Index

Latency Test Details

  1. This comparison has been established by performing experiments with the following tools:
  2. The three applications have been configured with the same DDS quality of service (QoS) parameters, which required applying a code patch to the OpenSplice examples, since their reliability is not configurable.
  3. The experiments have been performed using UDPv4 as transport layer.
  4. The experiments have been performed running both the sender and the reader on the same machine (localhost), and on separate machines connected via Ethernet (dual host).
  5. The experiments are performed for 11 different message sizes: 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, and 16384 bytes.
  6. To perform the experiments, 10000 messages are sent for each message size, and the minimum and average measurements are extracted.
  7. The experiments have been conducted using the software versions listed under Middleware Versions below.
  8. The local host experiment corresponds to execution 2019-09-23_07-28-16 as presented in eProsima’s latency local host experiments’ log.
  9. The dual host experiment corresponds to execution 2019-09-23_08-41-33 as presented in eProsima’s latency dual host experiments’ log.

DDS QoS

A quick overview of the DDS QoS configuration is provided in the following list. For further details about the purpose of each parameter, please refer to the Fast-RTPS documentation.

  • Reliability: RELIABLE
  • History kind: KEEP_LAST
  • History depth: 1
  • Durability: VOLATILE
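
As an illustration, the following is a minimal sketch of how this configuration might be expressed with the Fast-RTPS 1.9 attributes API (attribute and constant names are from that API; participant creation and transport setup are omitted):

    #include <fastrtps/attributes/PublisherAttributes.h>

    int main() {
        using namespace eprosima::fastrtps;

        PublisherAttributes pub_attr;
        pub_attr.qos.m_reliability.kind = RELIABLE_RELIABILITY_QOS; // Reliability: RELIABLE
        pub_attr.qos.m_durability.kind  = VOLATILE_DURABILITY_QOS;  // Durability: VOLATILE
        pub_attr.topic.historyQos.kind  = KEEP_LAST_HISTORY_QOS;    // History kind: KEEP_LAST
        pub_attr.topic.historyQos.depth = 1;                        // History depth: 1
        // pub_attr would then be passed to Domain::createPublisher(...).
        return 0;
    }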

Back to Benchmarking Index

Throughput Comparison - Introduction

The existence of different DDS implementations motivates the execution of performance comparisons across them. In this section, a comparison between Fast-RTPS, CycloneDDS, and OpenSplice in terms of throughput performance is presented, showing that Fast-RTPS is the most capable implementation in terms of delivered samples per second under both best-effort and reliable communication schemes.

Back to Benchmarking Index

Throughput Results: Localhost comparison - Inter-process

Back to Benchmarking Index

Throughput Results: Dual Host comparison

Back to Benchmarking Index

Throughput Results: Conclusion

The localhost comparison shows a clear difference between Fast-RTPS and the other two implementations, in favor of the former. It can be seen that Fast-RTPS’ throughput is the highest for every payload. Regarding the dual host comparison, both Fast-RTPS and CycloneDDS show an almost equivalent throughput up to a payload of 256 bytes, at which point the difference between the two becomes noticeably large, again in favor of Fast-RTPS.

Back to Benchmarking Index

Methodology: Measuring Throughput

In network computing, throughput is a measurement of the amount of information that can be sent/received through the system per unit of time, i.e. a measurement of how many bits traverse the system every second. The usual measuring procedure is for a publisher to send a large group of messages (known as a batch, burst, or demand) within a certain time (the burst interval). If sending finishes before the burst interval has elapsed, the publisher rests until the interval completes (otherwise, the publisher is not allowed to rest). This operation is repeated until the test time is reached. On the receiving side, a subscriber takes note of the time when the first message was received and counts every message that arrives. When the test is complete, the subscriber can compute a receiving sample rate; knowing the size of every message (in bits), the throughput is simply the product of the sample rate and the message size. The following diagram illustrates this process.
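In code form, a minimal sketch of this burst-based scheme could look as follows (illustrative only; the constants are arbitrary and the publish/count calls are stubbed out):

    #include <chrono>
    #include <iostream>
    #include <thread>

    int main() {
        using clock = std::chrono::steady_clock;
        const auto burst_interval = std::chrono::milliseconds(10); // time budget per burst
        const auto test_time      = std::chrono::seconds(1);
        const int  demand         = 1000;     // messages per burst
        const int  payload_bits   = 1024 * 8; // message size in bits

        long long received = 0;
        const auto start = clock::now();
        while (clock::now() - start < test_time) {
            const auto burst_start = clock::now();
            for (int i = 0; i < demand; ++i) {
                ++received; // stand-in for "publish one message / subscriber counts it"
            }
            const auto elapsed = clock::now() - burst_start;
            if (elapsed < burst_interval) // rest only if the burst finished early
                std::this_thread::sleep_for(burst_interval - elapsed);
        }

        const double seconds =
            std::chrono::duration<double>(clock::now() - start).count();
        const double sample_rate = received / seconds;             // samples per second
        std::cout << "throughput ~ " << sample_rate * payload_bits // rate x size
                  << " bit/s\n";
        return 0;
    }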

Back to Benchmarking Index

Throughput Test Details

  1. This comparison has been established by performing experiments with the following tools:
    • Fast-RTPS’ ThroughputTest
    • CycloneDDS’ ThroughputPublisher and ThroughputSubscriber examples
    • OpenSplice’s Throughput/publisher and Throughput/subscriber examples.
  2. The three applications have been configured with the same DDS quality of service (QoS) parameters, which required applying code patches to both the CycloneDDS and OpenSplice examples.
  3. The experiments have been performed using UDPv4 as transport layer.
  4. The experiments have been performed running both the sender and the reader on the same machine (localhost), and on separate machines connected via Ethernet (dual host).
  5. The localhost experiments are performed for 11 different message sizes: 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, and 16384 bytes.
  6. The dual host experiments are performed for 13 different message sizes: 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, and 64000 bytes.
  7. To perform the experiments, demands of 100, 200, 500, 1000, 10000, 20000, 30000, 40000, and 50000 messages are used.
  8. To perform the experiments, burst intervals of 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 ms are used.
  9. The results presented in this article correspond, for each configuration (localhost/dual host), payload, reliability, and implementation, to the maximum throughput found among all the combinations from points 5 to 8 (see the aggregation sketch after this list).
  10. Some kernel buffers have been enlarged to maximize the flow of information:
    • net.core.rmem_default = 21299200
    • net.core.rmem_max = 21299200
    • net.ipv4.udp_mem = 102400 873800 21299200
    • net.core.netdev_max_backlog = 30000
  11. The experiments have been conducted using the following software versions:
    • Fast-RTPS commit: 0bcafbde1c6fa3ef7285819980f932df910dba61
    • CycloneDDS commit: aa5236dea46b82e6db26a0c87b90cedeca465524
    • OpenSplice version: v6.9
  12. The local host experiment corresponds to execution 2019-11-04_15-39-11 as presented in eProsima’s throughput local host experiments’ log.
  13. The dual host experiment corresponds to execution 2019-11-06_15-43-09 as presented in eProsima’s throughput dual host experiments’ log.
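
As referenced in point 9, the following is a minimal sketch (with made-up numbers, not real measurements) of how the best throughput per payload can be selected across all (demand, burst interval) combinations:

    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <vector>

    struct Run { int payload; int demand; int interval_ms; double mbps; };

    int main() {
        // Hypothetical measurements; the real runs cover points 5 to 8 exhaustively.
        std::vector<Run> runs = {
            {1024, 100, 0, 850.0}, {1024, 10000, 10, 910.0}, {1024, 50000, 100, 780.0},
        };

        std::map<int, double> best; // payload -> maximum throughput found
        for (const auto& r : runs)
            best[r.payload] = std::max(best[r.payload], r.mbps);

        for (const auto& [payload, mbps] : best)
            std::cout << payload << " B: " << mbps << " Mbit/s\n";
        return 0;
    }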

DDS QoS

A quick overview of the DDS QoS configuration is provided in the following list. For further details about the purpose of each parameter, please refer to the Fast-RTPS documentation.

  • Reliability: RELIABLE
  • History kind: KEEP_ALL
  • Durability: VOLATILE

Back to Benchmarking Index

Testing environment

The experiments have been performed at eProsima’s facilities using two PowerEdge R330 e34s machines running Ubuntu 18.04.2 LTS (bionic) with a Linux 4.15.0-64-generic kernel. The specifications of the machines are:

  • Architecture: x86_64
  • CPU(s): 8
  • Thread(s) per core: 2
  • Model name: Intel(R) Xeon(R) CPU E3-1230 v6 @ 3.50GHz

Middleware Versions: the latest versions on the master repositories as of September 17th, 2019

  • Fast RTPS 1.9.x: 010ac53
  • Cyclone DDS: 801c4b1
  • OpenSplice DDS: v6.9 

Back to Benchmarking Index

Further information

For any questions about the methodology used to conduct the experiments, the specifics of how to configure Fast-RTPS or how to patch OpenSplice to replicate these results, or any other question you may have, please do not hesitate to contact us.

Back to Benchmarking Index