Useful Tools to Debug DDS Issues

While developing an application using RTI Connext, DDS users may run into a situation where the publisher and subscriber are not communicating. In these situations, we usually get the question: how can I figure out what the issue is, and how can I solve it?

There are several tools and features that can help you debug your DDS issues:

1 – Look for log warnings or error messages.

You can enable log messages in your DDS applications. There are different verbosity levels depending on the kind of messages that you would like to see. We usually recommend that customers use the “Warning” verbosity level to see the most common problems. If you need more detail in a certain situation, you can increase the verbosity level to “All.” However, be aware that the highest verbosity level produces a very large number of messages.

You can set logging via XML or code. To set it via XML – with warning verbosity and sending the output to a file – you need to add the following settings to your XML QoS profile:

         <participant_factory_qos>
           <logging>
             <output_file>log.txt</output_file>
             <verbosity>WARNING</verbosity>
             <print_format>TIMESTAMPED</print_format>
           </logging>
         </participant_factory_qos>

To enable logging via code, add the following lines to your application's main function (C++ example):

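         // Requires the traditional C++ API header ("ndds/ndds_cpp.h") and <cstdio>.
         // Log messages of WARNING verbosity (and more severe) for all categories.
         NDDSConfigLogger::get_instance()->set_verbosity_by_category(
                 NDDS_CONFIG_LOG_CATEGORY_ALL,
                 NDDS_CONFIG_LOG_VERBOSITY_WARNING);

         // Append log output to a file instead of printing it to standard output.
         FILE *myLogFile = fopen("my_logfile.txt", "a+");
         if (myLogFile != NULL) {
             NDDSConfigLogger::get_instance()->set_output_file(myLogFile);
         }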

Note that this code needs to be added before you call any other operations on the RTI Connext DDS API to ensure that any output goes to the file instead of the standard output.

If you need to limit the size of the log output file, follow the steps in this KB.

RTI Log Parser

Once you have the log messages, you can parse them using our RTI Log Parser. This is a command-line tool that processes and enhances Connext DDS log messages, making it easier to debug applications.

Let’s take the following log messages as an example:

COMMENDSrReaderService_onSubmessage:[1484571004,3507893055] reader oid 0x80000004 received DATA of sn(0000000000,00001346), vSn(0000000000,00001346) from writer 0xffffffff.49e6.1.80000003

COMMENDSrReaderService_onSubmessage: accepted sn(0000000000,00001346), dataRcvd.lead(0,1346), nextRelSn(0,1347), reservedCount(1)

COMMENDSrReaderService_onSubmessage:[1484571004,3577888137] reader oid 0x80000004 received DATA of sn(0000000000,00001348), vSn(0000000000,00001348) from writer 0xffffffff.49e6.1.80000003

COMMENDSrReaderService_onSubmessage: accepted sn(0000000000,00001348), dataRcvd.lead(0,1347), nextRelSn(0,1347), reservedCount(2)

MIGInterpreter_parse:HEARTBEAT from 0xffffffff,0X49E6

COMMENDSrReaderService_onSubmessage:[1484571004,3578261799] reader oid 0x80000004 received HB for sn (0000000000,00001345)-(0000000000,00001348), epoch(347) from writer 0xffffffff.49e6.1.80000003

COMMENDSrReaderService_sendAckNacks:[1484571004,3578261799] reader oid 0x80000004 sent NACK of bitmap lead(0000000000,00001347), bitcount(2), epoch(353) to writer 0xffffffff.49e6.1.80000003

COMMENDSrReaderService_onSubmessage:[1484571004,3578751425] reader oid 0x80000004 received DATA of sn(0000000000,00001347), vSn(0000000000,00001347) from writer 0xffffffff.49e6.1.80000003

COMMENDSrReaderService_onSubmessage: accepted sn(0000000000,00001347), dataRcvd.lead(0,1347), nextRelSn(0,1349), reservedCount(-1)

MIGInterpreter_parse:HEARTBEAT from 0xffffffff,0X49E6

COMMENDSrReaderService_onSubmessage:[1484571004,3579107907] reader oid 0x80000004 received HB for sn (0000000000,00001347)-(0000000000,00001348), epoch(348) from writer 0xffffffff.49e6.1.80000003

COMMENDSrReaderService_onSubmessage:[1484571004,3579107907] reader oid 0x80000004 sent ACK of bitmap lead(0000000000,00001349), bitcount(0), epoch(354) to writer 0xffffffff.49e6.1.80000003

This is how the output looks after being parsed by RTI Log Parser:

2017-01-16T12:50:03.919122 | --->  | H2.A2.P1 | R-K_800000 | Received DATA [1346] from writer W-K_800000 (reliable)
2017-01-16T12:50:03.919122 |       |          |            | *Warning: Missing packet for H2.A2.P1.W-K_800000 to R-K_800000*
2017-01-16T12:50:03.919122 | --->  | H2.A2.P1 | R-K_800000 | Received DATA [1348] from writer W-K_800000 (reliable)
2017-01-16T12:50:03.919122 | --->  | H2.A2.P1 | R-K_800000 | Received HB [347] from writer W-K_800000 for samples in [1345, 1348]
2017-01-16T12:50:03.919122 |  <--- | H2.A2.P1 | R-K_800000 | Sent NACK [353] to writer W-K_800000 for 1347 count 2
2017-01-16T12:50:03.919122 | --->  | H2.A2.P1 | R-K_800000 | Received DATA [1347] from writer W-K_800000 (reliable)
2017-01-16T12:50:03.919122 | --->  | H2.A2.P1 | R-K_800000 | Received HB [348] from writer W-K_800000 for samples in [1347, 1348]
2017-01-16T12:50:03.919122 |  <--- | H2.A2.P1 | R-K_800000 | Sent ACK [354] to writer W-K_800000 for 1349 count 0
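
To make the idea behind the “Missing packet” warning concrete, here is a minimal C++ sketch. It is not RTI Log Parser's implementation, and the function names are hypothetical. It extracts the sequence number from raw “received DATA of sn(high,low)” lines and reports sequence numbers that were skipped when a higher one arrived. It assumes both fields are decimal, with the first holding the upper 32 bits and the second the lower 32 bits.

```cpp
#include <cstdint>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Extracts the sequence number from a raw "received DATA of sn(high,low)" line.
// Assumption: both fields are decimal; high holds the upper 32 bits, low the lower 32.
std::optional<uint64_t> parse_data_sn(const std::string& line) {
    static const std::regex pattern(R"(received DATA of sn\((\d+),(\d+)\))");
    std::smatch match;
    if (!std::regex_search(line, match, pattern)) {
        return std::nullopt;  // not a DATA reception line (e.g. a HEARTBEAT line)
    }
    uint64_t high = std::stoull(match[1].str());
    uint64_t low = std::stoull(match[2].str());
    return (high << 32) | low;
}

// Walks the log in order and returns every sequence number that was skipped
// when a DATA with a higher number arrived (a candidate "Missing packet").
std::vector<uint64_t> find_gaps(const std::vector<std::string>& lines) {
    std::vector<uint64_t> missing;
    std::optional<uint64_t> highest;  // highest sequence number seen so far
    for (const std::string& line : lines) {
        std::optional<uint64_t> sn = parse_data_sn(line);
        if (!sn) continue;
        if (highest && *sn > *highest + 1) {
            for (uint64_t s = *highest + 1; s < *sn; ++s) missing.push_back(s);
        }
        if (!highest || *sn > *highest) highest = sn;
    }
    return missing;
}
```

Fed the three DATA lines from the raw excerpt in order (1346, 1348, 1347), find_gaps reports 1347. It still reports the gap even though that sample is repaired later. A real parser would also track the later repair, as the HB/NACK/DATA sequence above shows.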

For more information about how to configure DDS logging, please refer to the User’s Manual.

2 – Check the status of DDS entities

  1. Use RTI Admin Console

    Admin Console is a GUI tool with many features that make it easy to debug your DDS applications. For example, with Admin Console you can:

    • Monitor log messages and view your QoS, as explained in this blog post.
    • Automatically perform match analysis of your DDS entities. The match analysis shows whether any QoS incompatibilities between your DDS entities are preventing them from communicating.
    • Visualize your data.
  2. Enable Monitoring Libraries

    RTI Monitoring Library enables RTI Connext DDS applications to provide monitoring data. The monitoring data can be visualized with RTI Monitor, a separate GUI application that can run on the same host as Monitoring Library or on a different host. Monitoring Library periodically queries the status of all Connext DDS entities and reports entity creations and deletions and QoS changes, among other statistics.

    You can enable the Monitoring Libraries by setting values in the DomainParticipant’s PropertyQosPolicy (via code or through an XML QoS profile). The specific instructions to enable Monitoring Library can be found in the User’s Manual. If you want to configure the behavior and statistics obtained by Monitoring Library, you can refer to this section of the User’s Manual.

  3. Check entity status

    DDS entities have different statuses associated with them. The statuses give information about important changes in the entity state. For example, some statuses are triggered when a sample is lost or rejected, and another detects when an entity loses liveliness. An entity status can be retrieved by:

    • Directly checking the status with the appropriate get API
    • Using listeners
    • Using StatusConditions and WaitSets

    This Community Best Practice article explains the difference between these methods.

    You can find the different statuses associated with an entity in the User’s Manual and in the online API documentation, for example for DataWriters and DataReaders.
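
The counter semantics shared by these three access methods can be sketched in plain C++. The model below is hypothetical and is not the Connext API. Like a DDS count status (for example, a sample-lost status), it keeps a cumulative total_count and a total_count_change that resets whenever the status is read. A triggered() flag plays the role of a StatusCondition's trigger value, and an optional callback stands in for a listener.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a DDS "count" status:
// total_count is cumulative; total_count_change counts events since the last read.
struct CountStatus {
    int32_t total_count = 0;
    int32_t total_count_change = 0;
};

class StatusTracker {
public:
    // Optional listener: invoked on each event; afterwards the change is
    // treated as consumed, as it is when a DDS listener is called.
    std::function<void(const CountStatus&)> listener;

    // Called by the (simulated) middleware whenever the event occurs.
    void on_event() {
        ++status_.total_count;
        ++status_.total_count_change;
        changed_ = true;
        if (listener) {
            listener(status_);
            status_.total_count_change = 0;
            changed_ = false;
        }
    }

    // Polling read, like a get_<status>() call: returns a snapshot and
    // resets the change counter and the "changed" flag.
    CountStatus read() {
        CountStatus snapshot = status_;
        status_.total_count_change = 0;
        changed_ = false;
        return snapshot;
    }

    // Plays the role of a StatusCondition trigger value in a WaitSet loop.
    bool triggered() const { return changed_; }

private:
    CountStatus status_;
    bool changed_ = false;
};
```

Without a listener, two events followed by read() return total_count 2 and total_count_change 2. A second read after one more event returns 3 and 1, which is why polling code usually looks at the change field rather than the total.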

3 – Check what’s happening on the wire

Looking at the traffic generated by your applications can give you a lot of information about what is going on. To do this, you can use Wireshark.

  • You can find documentation about how to use Wireshark with RTPS messages in our documentation section “Wireshark and RTPS.”
  • Wireshark cannot capture traffic from the loopback interface on Windows, so use RawCap to capture that traffic instead. Once the traffic is saved in a pcap file, you can open it in Wireshark for analysis. These are the steps to use RawCap:
    • Run your applications with shared memory disabled, so that traffic between applications on the same host goes over UDPv4 through the loopback interface. You can disable shared memory via XML by adding this setting to your <participant_qos>:

           <transport_builtin>
             <mask>UDPv4</mask>
           </transport_builtin>

    • In a terminal, run RawCap.exe and select the interface on which you wish to capture traffic.
  • You can configure Wireshark to show RTPS packets with specific colors, as explained in this Community KB.

4 – Finally, you can also find useful tips for system-level basic debugging in this KB article.

Binge-Worthy Listening: Announcing the First RTI Podcast for the IIoT

If you knew there was a way to learn something new or be inspired in about 30 minutes, would you say no? What if it could make you better at your job? Keep that in mind.

Seven months ago I started commuting more than 10 minutes to take my kids to school. Before this, I’d spend the 10 minutes driving to school chatting it up with my kids, and the 10 minutes driving back home listening to music or simply enjoying some quiet time before starting my day. When I suddenly had about three times that to spend, I started looking for interesting and productive ways to fill the time. Enter podcasts.

I have listened to so many episodes and learned so much about personal finance, goal setting, marketing, engineering, the IIoT, etc. If you’re interested in a topic, I’d guess there’s a podcast or two out there that you’d enjoy. And this got me thinking: RTI should have a podcast.

We have a ton of content, but it’s mostly in written form (the exception being video tutorials and our large on-demand webinar collection). I believe that our content holds real value for our users and people who are interested in distributed system design, software and system architecture, and the IIoT. If you’re interested in these things, whether you’re a developer, engineer, architect or executive, we have content that you’d learn from and enjoy. What if I could take this written content and produce audio versions so you could listen to it on the go? I know that I don’t always have time to read a whitepaper or an ebook, but I could listen to one during my commute or while I’m working! I’d venture to guess that we have this in common. 🙂

In addition to providing a way to offer up audio versions of our most popular content, a podcast could feature interviews with our customers and industry experts. Our customers do amazing things, and I know that whenever I have the chance to speak with them, I leave feeling inspired by what I’ve learned and what they’ve achieved. And we wouldn’t be talking about just tech – we’d have episodes that discuss leadership, managing distributed teams, market trends and analysis, and more.

Well, four months, 12 interviews, and hours of editing and work later, I’m proud to present The Connext Podcast. We’re kicking off the launch of this new project with four episodes. New episodes will be available every other Wednesday. Head on over to SoundCloud or www.rti.com/podcast to subscribe and listen. I mean, how often can you say you learned something new in ~30 minutes? Well, The Connext Podcast has you covered for the next four days. Happy listening!

Three Simple Steps to Achieving Peak DDS Performance

RTI Connext® DDS provides an order of magnitude performance improvement over most other messaging middleware. But occasionally we run into customers who are trying to improve the performance of their DDS communications. This performance improvement can be achieved in either throughput or latency. In this blog, I will go through the three simple steps required to assess the performance of your system and will also review some of the most common ways customers have improved performance of their DDS communications.

Step 1: What performance should you be getting?

Compare the numbers you are getting with the comprehensive DDS benchmarks that RTI provides here: https://www.rti.com/products/dds/benchmarks.

If you are not getting close to the numbers you see in the DDS benchmarks, there are a couple of things to try:

Use RTI Perftest to make sure you’re comparing apples to apples.

The configuration of the NIC and the network switch, as well as the maximum network throughput and the CPU, all have an impact on the final DDS performance results. So, to make a fair comparison, run the DDS benchmarks on your own hardware. RTI makes the DDS benchmark program, “RTI Perftest,” available in source code format with complete documentation. You can find a copy of Perftest here: https://community.rti.com/downloads/rti-connext-dds-performance-test

Make sure you are running your tests using the network interface you think you are using.

DDS enables the shared memory and UDPv4 transports by default. If shared memory is available between two nodes, DDS will use it by default. But if there are many network interfaces available, DDS will only use the first four. I’ve seen developers who wanted to test a certain network interface, say InfiniBand, but it was not one of the first four listed, so DDS was not using it. Moreover, on Windows systems, the order in which the OS lists network interfaces, and thus the order in which DDS selects them, is random, so the network interface you are actually using can change from run to run. DDS will also send the same data over two paths to the same endpoint if both paths exist, which takes up CPU time and slows throughput. You can explicitly select the interfaces you want (or do not want) using the transport QoS “allow_interfaces” property. Here is a good RTI Community article on the subject: https://community.rti.com/howto/control-or-restrict-network-interfaces-nics-used-discovery-and-data-distribution.

The following XML shows the “allow_interfaces” and “deny_interfaces” properties, which let you explicitly pick the network interfaces you want to use or do not want to use:

<participant_qos>
 <property>
  <value>
   <element>
    <name>dds.transport.UDPv4.builtin.parent.deny_interfaces</name>
    <value>10.15.*</value>
   </element>
   <element>
    <name>dds.transport.UDPv4.builtin.parent.allow_interfaces</name>
    <value>10.10.*,192.168.*</value>
   </element>
  </value>
 </property>
</participant_qos>
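
Here is a rough model of how such allow/deny lists filter candidate interfaces. It is an illustration built on stated assumptions, not Connext's matching code. The function names are hypothetical. The lists are treated as comma-separated patterns where '*' matches any run of characters, and only IP address strings are matched (Connext may also accept interface names). An interface is assumed to be used only if it matches the allow list (or the allow list is empty) and does not match the deny list.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Glob match where '*' matches any run of characters (including none);
// every other character must match literally.
bool glob_match(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0, star = std::string::npos, mark = 0;
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == text[t]) { ++p; ++t; }
        else if (p < pattern.size() && pattern[p] == '*') { star = p++; mark = t; }
        else if (star != std::string::npos) { p = star + 1; t = ++mark; }  // backtrack
        else return false;
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match empty
    return p == pattern.size();
}

// Splits "10.10.*,192.168.*" into individual patterns.
std::vector<std::string> split_csv(const std::string& list) {
    std::vector<std::string> out;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) out.push_back(item);
    }
    return out;
}

// Assumed rule: allowed (or no allow list) and not denied.
bool interface_selected(const std::string& addr, const std::string& allow,
                        const std::string& deny) {
    auto matches_any = [&addr](const std::string& list) {
        for (const std::string& pat : split_csv(list)) {
            if (glob_match(pat, addr)) return true;
        }
        return false;
    };
    bool allowed = allow.empty() || matches_any(allow);
    return allowed && !matches_any(deny);
}
```

Under these assumptions and the XML values above, 10.10.5.2 and 192.168.1.20 would be selected, while 10.15.0.9 and 172.16.0.1 would not.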

Step 2: Use the RTI DDS tools to diagnose your performance issues.

Use RTI Monitor to look at the number of ACKs, NACKs, dropped packets, and duplicate packets. If these numbers are high, there are several possible causes:

  • Transport buffer sizes are too small
  • The MTU is not optimized for the switch
  • There may be too many heartbeats, causing multiple resends for single NACKs, which indicates the reader is not keeping up
  • The process(es) are CPU- or memory-bound

Use RTI Monitor or Admin Console to compare the QoS settings of the DataReaders and DataWriters. Sometimes you are not using the QoS values you think you are using.

A great way to learn about using the Admin Console and the Monitor tools is to watch this video from our tools lead, Ken Brophy.

Step 3: Look at your application to see how you can speed things up by changing the “shape” of the data in motion.

RTI DDS gives you many ways to fine-tune your system using QoS settings. This flexibility is great because you have a lot of control over how DDS works, but all the options can be daunting! I won’t go over every setting (this blog would quickly grow into a textbook), but I will hit on what I feel are the most important settings to check with regard to performance.

First, don’t use strict reliability if you don’t need it. Strict reliability makes sure that every sample reaches every reliable destination, re-sending samples if necessary. Resending samples, and the bookkeeping that supports resends, takes time and memory. Many applications would be fine missing a sample very occasionally or waiting longer for it to be re-transmitted.

If you do need strict reliability, start with the DDS built-in profile “StrictReliable.HighThroughput”. In general, it is a good idea to use the built-in profiles that RTI provides. RTI sets up these profiles with the settings needed for the most common DDS use cases. You can use them as-is, or use them as the basis for your own QoS configuration and then tweak it for your specific needs. You can read about using DDS built-in profiles and get a working example here: https://community.rti.com/examples/built-qos-profiles

Using Extensible Types (XTypes) and sequences of structures can hurt performance. DDS serializes the data it sends and deserializes the data it receives, and this process takes much longer with complicated data types.

Adjust the heartbeat_period/ACKNACK combination. In reliable communications, the DataWriter sends DDS data samples and heartbeats to reliable DataReaders. A DataReader responds to a heartbeat by sending an ACKNACK, which tells the DataWriter what the DataReader has received so far. In addition, the DataReader can request missing DDS samples (by sending an ACKNACK), and the DataWriter will respond by resending them. So the heartbeat_period controls how quickly a DataReader can acknowledge receipt of a sample or ask for a sample to be re-sent, which impacts performance. Here is an article that talks about how the heartbeat_period can impact latency and throughput.
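
The heartbeat/ACKNACK exchange can be illustrated with a simplified model. This is not RTI's implementation or wire format, and the AckNack struct and build_acknack function are hypothetical. A heartbeat announces the range of samples the writer has available. The reader starts its bitmap at the first sequence number in that range it still needs and sets a bit for every sample it is missing. If nothing is missing, the bitmap is empty and the message is a pure ACK.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified ACKNACK content (not RTI's wire format or API).
struct AckNack {
    uint64_t base;              // first sequence number the reader still needs
    uint32_t bitcount;          // bits in the bitmap, starting at base
    std::vector<bool> missing;  // missing[i] == true means "please resend base + i"
};

// Builds the reader's reply to a heartbeat announcing samples [hb_first, hb_last].
// Everything below base counts as acknowledged; a zero bitcount is a pure ACK.
AckNack build_acknack(const std::set<uint64_t>& received,
                      uint64_t hb_first, uint64_t hb_last) {
    uint64_t base = hb_first;
    while (base <= hb_last && received.count(base) != 0) {
        ++base;  // skip the contiguous run of samples already received
    }
    AckNack reply{base, 0, {}};
    for (uint64_t sn = base; sn <= hb_last; ++sn) {
        reply.missing.push_back(received.count(sn) == 0);
    }
    reply.bitcount = static_cast<uint32_t>(reply.missing.size());
    return reply;
}
```

With samples 1345, 1346 and 1348 received and a heartbeat for 1345 to 1348, the model yields base 1347 with bitcount 2 (1347 missing, 1348 present). This matches the “Sent NACK ... for 1347 count 2” line in the Log Parser output in the first post. Once 1347 arrives, a heartbeat for 1347 to 1348 yields base 1349 with bitcount 0, the pure ACK. A shorter heartbeat_period simply runs this exchange more often, so gaps are detected and repaired sooner at the cost of extra traffic.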

Modify the asynchronous publisher configuration to use flow control to lower the data rate. If the writer's data rate is too high, the reader can get swamped, and the resulting dropped samples and resends slow down the system. Lowering the writer’s data rate a little leaves room for repairs and gives DDS time to handle incoming data, which avoids costly resends. A flow controller lets you shape the output traffic your publisher generates, so by using an asynchronous publisher with a custom flow controller, you can lower the data rate. You can see a working example of how to use the asynchronous publisher here: https://community.rti.com/examples/asynchronous-publisher
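
Connext flow controllers are based on a token bucket. The sketch below is a simplified, hypothetical model of that idea. Its struct and parameter names are chosen for illustration and are not taken from the API. Each period adds a fixed number of tokens up to a cap, each token allows a fixed number of bytes, and a queued sample is sent only when enough tokens are available. The effect is that a burst of writes is spread over several periods.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical token-bucket parameters (a simplified model, not the Connext API).
struct TokenBucket {
    uint32_t max_tokens;               // cap on tokens that can accumulate
    uint32_t tokens_added_per_period;  // tokens added at each period tick
    uint32_t bytes_per_token;          // bytes one token allows to be sent
};

// Tokens needed to send a sample of the given size (rounded up).
uint32_t tokens_needed(const TokenBucket& cfg, uint32_t sample_bytes) {
    return (sample_bytes + cfg.bytes_per_token - 1) / cfg.bytes_per_token;
}

// Returns, for each queued sample in order, the 1-based period in which it is sent.
// Stops early if a sample can never be sent (it needs more than max_tokens).
std::vector<uint32_t> schedule(const TokenBucket& cfg,
                               const std::vector<uint32_t>& sample_bytes) {
    std::vector<uint32_t> sent_in_period;
    if (cfg.bytes_per_token == 0 || cfg.tokens_added_per_period == 0) {
        return sent_in_period;  // degenerate configuration: nothing is ever sent
    }
    uint32_t tokens = 0;
    uint32_t period = 0;
    std::size_t next = 0;
    while (next < sample_bytes.size() &&
           tokens_needed(cfg, sample_bytes[next]) <= cfg.max_tokens) {
        ++period;
        tokens = std::min(cfg.max_tokens, tokens + cfg.tokens_added_per_period);
        // Drain the queue while the bucket holds enough tokens.
        while (next < sample_bytes.size() &&
               tokens_needed(cfg, sample_bytes[next]) <= tokens) {
            tokens -= tokens_needed(cfg, sample_bytes[next]);
            sent_in_period.push_back(period);
            ++next;
        }
    }
    return sent_in_period;
}
```

With one 1,024-byte token added per period, three 1,000-byte samples written back to back go out in periods 1, 2 and 3. Doubling tokens_added_per_period (and the cap) sends them in periods 1, 1 and 2. Lowering the refill rate is the knob that “lowers the data rate” in this model.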

For smaller sample sizes, use batching and/or Turbo Mode. Batching groups small samples into a single larger packet, which is more efficient to send and can result in a large throughput increase. Note, however, that while batching increases throughput, it can hurt latency when little data is being sent, because of the added time needed to collect small samples into a batch. In high-throughput cases, though, average latency can actually improve because of all the CPU saved on the subscriber side.
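
To see where the throughput gain comes from, here is a hypothetical C++ sketch. It is not the Connext batching implementation. It greedily packs consecutive samples into batches no larger than a byte limit and counts the resulting network sends. max_batch_bytes is an illustrative stand-in for the byte limit you would configure for a batch. The model ignores the flush-delay timing that determines batching's latency cost.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of batching: consecutive samples are packed greedily into a
// batch until adding the next one would exceed max_batch_bytes; each batch is
// then sent as one network message. Returns the number of messages sent.
std::size_t count_batches(const std::vector<std::size_t>& sample_bytes,
                          std::size_t max_batch_bytes) {
    std::size_t batches = 0;
    std::size_t bytes = 0;    // payload bytes in the batch being filled
    std::size_t samples = 0;  // samples in the batch being filled
    for (std::size_t size : sample_bytes) {
        if (samples > 0 && bytes + size > max_batch_bytes) {
            ++batches;  // flush the full batch
            bytes = 0;
            samples = 0;
        }
        bytes += size;  // an oversized sample simply gets a batch of its own
        ++samples;
    }
    if (samples > 0) {
        ++batches;  // flush the last, partially filled batch
    }
    return batches;
}
```

With 1,000 samples of 100 bytes and an 8,192-byte limit, 81 samples fit in each batch, so 13 sends replace 1,000. That saves per-message overhead on both the publisher and the subscriber.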

Turbo Mode is an experimental feature that uses an intelligent algorithm that adjusts the number of bytes in each batch at runtime according to current system conditions, such as write speed (or write frequency) and sample size. This intelligence gives Turbo Mode the ability to increase throughput at high message rates and avoid negatively impacting message latency at low message rates.

Here is an article that goes into detail on how to use batching and includes a working example: https://community.rti.com/examples/batching-and-turbo-mode

Use multicast for topics with more than a couple of subscribers. Multicast allows a publisher to send to multiple readers with a single write, greatly reducing network and publisher-side processor utilization. Note that this feature is sometimes not available at the network level. Here is a good article on how to implement multicast: https://community.rti.com/best-practices/use-multicast-one-many-data
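
A back-of-the-envelope calculation shows where the savings come from. The hypothetical helper below computes the payload a publisher must put on the wire per second. With unicast, each reader needs its own copy, while a single multicast send reaches every reader in the group. It deliberately ignores protocol headers and reliability traffic.

```cpp
#include <cstdint>

// Publisher-side payload bytes per second needed to deliver every sample to
// every reader. Ignores RTPS headers, fragmentation, and reliability traffic.
uint64_t publisher_bytes_per_sec(uint64_t sample_bytes, uint64_t samples_per_sec,
                                 uint64_t readers, bool multicast) {
    if (readers == 0) return 0;
    // Unicast: one copy per reader. Multicast: one copy reaches the whole group.
    uint64_t copies = multicast ? 1 : readers;
    return sample_bytes * samples_per_sec * copies;
}
```

For example, 1,024-byte samples at 1,000 samples per second to 10 readers need 10,240,000 bytes/s over unicast but 1,024,000 bytes/s over multicast. The gap grows linearly with the number of readers.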

For reliable communications, modify the send window size. When a reliable DataWriter writes a DDS sample, it keeps that sample in its queue until it has received acknowledgments from all of its subscribing DataReaders that the sample was received. The number of outstanding DDS samples allowed is referred to as the DataWriter’s “send window.” Once the number of outstanding DDS samples has reached the send window size, subsequent writes will block until an outstanding DDS sample is acknowledged. Any time the writer blocks, it hurts performance. You can read about adjusting the send window in section 6.5.3.4 of the DDS User’s Manual.
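
This blocking behavior can be modeled with a bounded set of unacknowledged sequence numbers. The SendWindow class below is a hypothetical sketch, not the DataWriter API. try_write() returns false where a real writer would block (for up to the Reliability QoS max_blocking_time). on_ack() models a cumulative acknowledgment that frees every sample below a given sequence number. A real writer needs that acknowledgment from all matched reliable readers.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Simplified model of a reliable writer's send window (not the Connext API).
class SendWindow {
public:
    explicit SendWindow(std::size_t window_size) : window_size_(window_size) {}

    // Returns false where a real DataWriter would block instead of writing.
    bool try_write() {
        if (outstanding_.size() >= window_size_) return false;
        outstanding_.insert(next_sn_++);
        return true;
    }

    // Cumulative ACK: every sample with a sequence number below ack_base
    // has been received, so it no longer occupies the window.
    void on_ack(uint64_t ack_base) {
        outstanding_.erase(outstanding_.begin(), outstanding_.lower_bound(ack_base));
    }

    std::size_t outstanding() const { return outstanding_.size(); }

private:
    std::size_t window_size_;
    uint64_t next_sn_ = 1;
    std::set<uint64_t> outstanding_;  // written but not yet acknowledged
};
```

With a window of 3, the fourth write in a row would block. After an ACK up to sequence number 3, samples 1 and 2 leave the window and two more writes succeed. A window that is too small for the round-trip time between writer and readers therefore turns every ACK delay into writer blocking.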

Modify the transport settings. Whether you are using UDPv4, shared memory, or a custom transport, having the right buffer sizes and message sizes configured is extremely important when trying to optimize performance. The following XML modifies the message size and buffer sizes for the UDPv4 transport:

<participant_qos>
 <property>
  <value>
   <element>
    <name>dds.transport.UDPv4.builtin.parent.message_size_max</name>
    <value>65536</value>
   </element>
   <element>
    <name>dds.transport.UDPv4.builtin.send_socket_buffer_size</name>
    <value>524288</value>
   </element>
   <element>
    <name>dds.transport.UDPv4.builtin.recv_socket_buffer_size</name>
    <value>2097152</value>
   </element>
  </value>
 </property>
</participant_qos>

Note that the sizes used here are suggestions for optimizing performance when using large samples. You can make these values smaller for smaller samples.
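
A quick arithmetic check helps when choosing these values. Dividing each socket buffer size by message_size_max shows how many full-size messages the buffer can hold at once. The helper name below is hypothetical.

```cpp
#include <cstdint>

// Number of maximum-size transport messages a socket buffer can hold at once.
uint32_t messages_per_buffer(uint32_t socket_buffer_bytes, uint32_t message_size_max) {
    return message_size_max == 0 ? 0 : socket_buffer_bytes / message_size_max;
}
```

With the values above, the send buffer holds 8 such messages and the receive buffer holds 32. The larger receive buffer lets the receiver absorb a burst while the application is busy, before the OS starts dropping datagrams. Keep in mind that the OS may cap socket buffer sizes (on Linux, for example, through net.core.rmem_max and net.core.wmem_max).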

I hope this advice helps you get the best performance out of your DDS application. I’ve listed the tips I’ve found most helpful for improving DDS performance, but other methods can also help depending on the circumstances. For more information on improving throughput or latency (or, really, for help with any other Connext DDS issue), I encourage you to check out the RTI Community portal. The RTI Community portal is an excellent source of information and support! And of course, always feel free to contact our great support department or your local Field Application Engineer for further help.