
Updated 26th of February 2021

This performance testing, carried out by eProsima, focuses on latency and throughput.

  1. Test environment
  2. Latency
    1. Latency performance for different delivery mechanisms
      1. Intra-process Delivery
      2. Inter-process Shared Memory
      3. UDP Transport
  3. Throughput
    1. Throughput performance for different delivery mechanisms
      1. Intra-process Delivery
      2. Inter-process Shared Memory
      3. UDP Transport
  4. Conclusion

Test environment

The following performance experiments were run on a single computer with the following characteristics:

  • PowerEdge R330 e34s
  • Linux 5.4.0-53-generic
  • OS: Ubuntu 20.04.1 LTS

Latency

Latency is usually defined as the amount of time that it takes for a message to traverse a system. In packet-based networking, latency is usually measured either as one-way latency (the time from the source sending the packet to the destination receiving it), or as the round-trip delay time (the time from source to destination plus the time from the destination back to the source). The latter is more often used since it can be measured from a single point.

In the case of a DDS communication exchange, latency could be defined as the time it takes for a DataWriter to serialize and send a data message plus the time it takes for a matching DataReader to receive and deserialize it. Applying the same round-trip concept mentioned before, the round-trip latency could be defined as the time it takes for a message to be sent by a DataWriter, received by a DataReader and sent back to the same DataWriter. For example, in the figure below, the round-trip time would be T2-T1 making the latency estimation (T2-T1)/2.

Latency performance for different delivery mechanisms

Fast DDS has a unique attribute: it supports both synchronous and asynchronous publishing modes, meaning the middleware layer offers two publication modes. For more information about both options, please have a look at the following link.

For these latency performance tests, eProsima focused on the synchronous publication mode, comparing the different delivery mechanisms available in Fast DDS v2.2.0.
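As a rough sketch of how the tested mode is selected through the DataWriter QoS in the Fast DDS v2.2.0 C++ API (header paths may differ between versions):

```cpp
#include <fastdds/dds/publisher/qos/DataWriterQos.hpp>

using namespace eprosima::fastdds::dds;

// Select the synchronous publication mode used in these tests:
// the sample is sent within the context of the user's write() call.
DataWriterQos writer_qos = DATAWRITER_QOS_DEFAULT;
writer_qos.publish_mode().kind = SYNCHRONOUS_PUBLISH_MODE;
```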

Intra-process Delivery

Intra-process delivery is a Fast DDS feature that accelerates communications between entities inside the same process, avoiding the overhead of the transport layer. Intra-process delivery guarantees that the DataReader receives the message by having the DataWriter directly call the DataReader's reception routines.
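A minimal sketch of how intra-process delivery can be enabled process-wide through the library settings (assuming the Fast DDS v2.x C++ API; header paths may vary between versions):

```cpp
#include <fastrtps/attributes/LibrarySettingsAttributes.h>
#include <fastrtps/xmlparser/XMLProfileManager.h>

// Enable full intra-process delivery (both user data and discovery traffic)
// for every participant created in this process.
eprosima::fastrtps::LibrarySettingsAttributes library_settings;
library_settings.intraprocess_delivery = eprosima::fastrtps::INTRAPROCESS_FULL;
eprosima::fastrtps::xmlparser::XMLProfileManager::library_settings(library_settings);
```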

Figures 1.1a and 1.1b show the latency performance of the intra-process feature with and without the zero-copy delivery mechanism. The graphs clearly show a performance improvement for every payload size tested when zero-copy is used. This indicates that the vast majority of the latency overhead is caused by buffer-to-buffer data copies, which are not present in the zero-copy case. Consequently, the larger the data samples, the greater the latency improvement.

Fig 1.1a Fast DDS v2.2.0 Intraprocess Reliable

Fig 1.1b Fast DDS v2.2.0 Intraprocess Reliable - zoom - up to 65 kB payload

Inter-process Shared Memory

Graphs 1.2a and 1.2b show the Fast DDS v2.2.0 Shared Memory Transport (SHM) performance alongside the three newly introduced delivery mechanisms: SHM (Loans), SHM (Data Sharing) and SHM (zero-copy). Due to its architecture, SHM (zero-copy) shows far better latency results than the rest for all payload sizes. As stated previously, making fewer copies of large data samples results in remarkable latency improvements.

Fig 1.2a Fast DDS v2.2.0 SHM

Fig 1.2b Fast DDS v2.2.0 SHM - zoom - up to 65 kB payload

Shared Memory Transport (SHM) is a Fast DDS feature that facilitates communications between entities running on the same processing unit/machine. It provides better performance than the standard UDP transport due to the following factors:

  • Large message support: the only message size limit is the machine's memory capacity.
  • Fewer memory copies: SHM can directly share the same memory buffer with all destination endpoints.
  • Less operating system overhead: a shared memory transfer requires far fewer system calls than other protocols such as UDP.
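As an illustrative sketch, SHM can be registered as a participant's only transport roughly as follows (assuming the Fast DDS v2.x C++ API; the segment size shown is an arbitrary example value):

```cpp
#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastdds/rtps/transport/shared_mem/SharedMemTransportDescriptor.h>
#include <memory>

using namespace eprosima::fastdds::dds;
using namespace eprosima::fastdds::rtps;

// Register SHM as the only user transport for this participant.
DomainParticipantQos participant_qos = PARTICIPANT_QOS_DEFAULT;
auto shm_transport = std::make_shared<SharedMemTransportDescriptor>();
shm_transport->segment_size(2 * 1024 * 1024);  // shared segment size in bytes
participant_qos.transport().user_transports.push_back(shm_transport);
participant_qos.transport().use_builtin_transports = false;
```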

 

With data sharing delivery, Fast DDS accelerates communications between entities on the same machine by sharing the history of the DataWriter with the DataReader through shared memory. In doing so, data sharing avoids both the copy from the DataWriter history to the transport and the copy from the transport to the DataReader.
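Data sharing is configured through the endpoint QoS; a minimal sketch against the Fast DDS v2.x C++ API (here letting the middleware decide automatically whether the data type is compatible):

```cpp
#include <fastdds/dds/publisher/qos/DataWriterQos.hpp>
#include <fastdds/dds/subscriber/qos/DataReaderQos.hpp>

using namespace eprosima::fastdds::dds;

// Let Fast DDS enable data sharing whenever the type allows it.
DataWriterQos writer_qos = DATAWRITER_QOS_DEFAULT;
writer_qos.data_sharing().automatic();

DataReaderQos reader_qos = DATAREADER_QOS_DEFAULT;
reader_qos.data_sharing().automatic();
```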

Furthermore, building on data sharing delivery, it is possible to configure Zero-Copy communication.

Fast DDS Zero-Copy communication saves time and resources: as the name suggests, it transmits data between applications without copying it in memory. This is achieved by combining three extensions:

  • Data sharing delivery
  • DataWriter sample loaning
  • DataReader loans
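The loaning extensions can be sketched as follows. This is an illustrative fragment, not a complete program: `writer`, `reader` and the plain, bounded data type `MyType` (with a generated `index()` setter) are hypothetical placeholders.

```cpp
// Writer side: borrow a sample from the writer's own history instead of
// copying a user-side object into it.
void* sample = nullptr;
if (writer->loan_sample(sample) == ReturnCode_t::RETCODE_OK)
{
    MyType* data = static_cast<MyType*>(sample);
    data->index(42);         // fill the loaned sample in place
    writer->write(sample);   // ownership returns to the writer
}

// Reader side: take loaned samples and return them when done.
FASTDDS_SEQUENCE(MyTypeSeq, MyType);
MyTypeSeq samples;
SampleInfoSeq infos;
if (reader->take(samples, infos) == ReturnCode_t::RETCODE_OK)
{
    // ... process samples without copying ...
    reader->return_loan(samples, infos);
}
```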


For further information about these three mechanisms, please refer to eProsima Shared Memory.

UDP Transport

UDP is a connectionless transport where the receiving DomainParticipant must open a UDP port listening for incoming messages, whereas the sending DomainParticipant sends messages to this port. Fast DDS enables a UDPv4 transport by default. Nevertheless, the application can enable other UDP transports if needed.
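For reference, UDPv4 can also be registered explicitly, e.g. to disable the other builtin transports for a test; a sketch against the Fast DDS v2.x C++ API:

```cpp
#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastdds/rtps/transport/UDPv4TransportDescriptor.h>
#include <memory>

using namespace eprosima::fastdds::dds;
using namespace eprosima::fastdds::rtps;

// Make UDPv4 the participant's only transport.
DomainParticipantQos participant_qos = PARTICIPANT_QOS_DEFAULT;
auto udp_transport = std::make_shared<UDPv4TransportDescriptor>();
participant_qos.transport().user_transports.push_back(udp_transport);
participant_qos.transport().use_builtin_transports = false;
```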

Figures 1.3a and 1.3b show the UDP transport performance with and without loans enabled. It is important to note that latency with loans enabled is considerably better at all times: eliminating the overhead caused by data copies notably improves the latency seen in all tests.

Fig 1.3a  Fast DDS v2.2.0 UDP

Fig 1.3b Fast DDS v2.2.0 UDP - zoom - up to 65 kB payload

 

Throughput

In network computing, throughput is defined as a measurement of the amount of information that can be sent or received through the system per unit of time, i.e. a measurement of how many bits traverse the system every second. The usual measuring procedure is for a publisher to send a large group of messages (known as a batch, burst or demand) within a certain time (the burst interval). If sending finishes before the burst interval has elapsed, the publisher idles until it has; otherwise, the publisher continues without resting. This operation is repeated until the test time is reached. On the receiving side, a subscriber receives the information, noting the time the first message was received and counting every message that arrives. When the test is complete, the subscriber can compute a receiving sample rate. Knowing the size of every message (in bits), the throughput is simply the sample rate times the message size. The following diagram illustrates this process.

Throughput performance for different delivery mechanisms

Fast DDS has a unique attribute: it supports both synchronous and asynchronous publishing modes, meaning the middleware layer offers two publication modes. For more information about both options, please have a look at the following link.

For these throughput performance tests, eProsima focused on the synchronous publication mode, comparing the delivery mechanisms available in Fast DDS v2.2.0 against each other and against Fast DDS v2.1.0.

Intra-process Delivery

Intra-process delivery is a Fast DDS feature that accelerates communications between entities inside the same process, avoiding the overhead of the transport layer. Intra-process delivery guarantees that the DataReader receives the message by having the DataWriter directly call the DataReader's reception routines.

The graphs in figures 2.1a to 2.1d show the throughput of intra-process communication using the newly introduced zero-copy delivery mechanism available in Fast DDS v2.2.0, compared with Fast DDS v2.1.0. The graphs clearly show a performance improvement for every payload size tested when zero-copy is used. This indicates that the vast majority of the throughput overhead is caused by buffer-to-buffer data copies, which are not present in the zero-copy case. Consequently, the larger the data samples, the greater the throughput improvement.

Fig 2.1a Fast DDS v2.2.0 Intra-process Reliable (Fast DDS v2.2.0 and Fast DDS v2.1.0 are not visible here as they are too close to the X-axis to be seen)

Fig 2.1b Fast DDS v2.2.0 Intra-process Reliable - zoom - up to 65 kB data payload

 

Fig 2.1c Fast DDS v2.2.0 Intraprocess Reliable - up to 150 Gb/s throughput

Fig 2.1d Fast DDS v2.2.0 Intraprocess Reliable - zoom - up to 65 kB data payload and 150 Gb/s throughput

Inter-process Shared Memory

Figures 2.2a to 2.2d show the Fast DDS v2.2.0 Shared Memory Transport (SHM) performance alongside the three newly introduced delivery mechanisms: SHM (Loans), SHM (Data Sharing) and SHM (zero-copy). Comparing any of these options to Fast DDS v2.1.0, a clear improvement can be noticed. Due to its architecture, SHM (zero-copy) shows far better throughput results than the rest for all payload sizes. As stated previously, making fewer copies of large data samples results in remarkable throughput improvements.

Fig 2.2a Fast DDS v2.2.0 SHM (Fast DDS v2.2.0, Fast DDS v2.2.0 (loans), Fast DDS v2.2.0 (data sharing) and Fast DDS  v2.1.0 are not visible here as they are too close to the X-axis to be seen)

Fig 2.2b Fast DDS v2.2.0 SHM - zoom - up to 65 kB data payload

Fig 2.2c Fast DDS v2.2.0 SHM - Reliable - up to 125 Gb/s throughput

Fig 2.2d Fast DDS v2.2.0 SHM - Reliable - up to 65 kB data payload and  8 Gb/s throughput

 

For further information about the three mechanisms please refer to eProsima Shared Memory.

UDP Transport

UDP is a connectionless transport where the receiving DomainParticipant must open a UDP port listening for incoming messages, whereas the sending DomainParticipant sends messages to this port. Fast DDS enables a UDPv4 transport by default. Nevertheless, the application can enable other UDP transports if needed.

Figures 2.3a and 2.3b show the UDP transport performance with and without loans, together with Fast DDS v2.1.0. It is important to note that throughput with loans enabled is considerably better at all times and stabilizes for large data above 1 MB. Eliminating the overhead caused by data copies notably improves the throughput seen in all tests.

Fig 2.3a  Fast DDS v2.2.0 UDP

Fig 2.3b  Fast DDS v2.2.0 UDP - zoom - up to 65 kB data payload

Conclusion

In summary, the Fast DDS v2.2.0 testing results show a striking improvement in terms of both latency and throughput. Thanks to a proper data model design, Fast DDS is now capable of keeping latency and throughput stable, no matter the data sample size.

 

MORE INFORMATION ABOUT EPROSIMA FAST DDS AND ITS PERFORMANCE:

For any questions, please contact eProsima.