Automatic QoS Enforcement with DDS & Software Defined Networking Reply

Andrew King

Author: Andrew L. King Ph.D. Candidate, Department of Computer and Information Science, University of Pennsylvania

Good abstractions drive progress in computing. The first turing-complete computers such as the ENIAC exposed a register-level programming abstraction enabling programmers to reconfigure the machines for different tasks. In the 1950s – 1970s the maturation of compiled languages enabled programmers to more easily port software from one machine architecture to another. From 1980 to the late 1990s saw the development of modularity abstractions (e.g., object oriented programming) making it easier to distribute the development of large software across large teams.  Throughout the 2000s to the present we’ve seen the development of programming abstractions designed to make it easier to build distributed and concurrent software systems.  DDS is a perfect example and its success proves how useful and necessary good abstractions are.


I’d argue that good abstractions for programming the IIoT must capture timing behavior. For example, DDS lets the programmer specify the maximum interval between updates to a topic data-instance via the DEADLINE parameter. Unfortunately, so far, holistic abstractions for timing have been elusive. While DDS can automatically perform runtime checks to ensure that DataReaders and DataWriters have consistent  DEADLINE QoS settings,  it is still possible for the DEADLINE QoS to be violated if the underlying network fabric delays or drops an update. Indeed, Bringing timing fully into the programming abstraction is difficult because the timing behavior of a software system only emerges when you map the software onto a particular platform (i.e., processor, operating system, and networking fabric).

In the context of a critical distributed hard real-time system, the developer must devote significant effort provisioning and configuring the underlying network fabric to ensure the required timing behavior. The configuration process usually involves setting priorities or transmission schedules for each time critical flow. Sometimes, If the underlying network fabric will be shared by more than one logically independent application, traffic policers and/or special partitioning features are setup to keep the different applications isolated. These network configurations are usually static and setup offline: If you want to add a new node to the network (e.g., a sensor or actuator) you have to bring the system down, manually reconfigure the network, then bring the system back up. Wouldn’t it be much nicer if the programmer could simply specify the logical timing behavior for the system and then trust that the specification would always be satisfied?

DDS and recent advances in Software Defined Networking (SDN) now make it possible to fully lift timing properties for distributed systems into a programming abstraction. The MIDdleware Assurance Substrate (MIDAS) is a research prototype developed in the PRECISE Center at the University of Pennsylvania. MIDAS  uses the OpenFlow SDN protocol to capture the QoS specifications of DDS clients as they come online and then control  the packet switching behavior of the network fabric to ensure those specifications are always satisfied: Based on the captured QoS specifications, MIDAS will generate  new network configurations to support the QoS of the DDS dataflows.  MIDAS will then apply schedulability analysis to verify that the configurations will guarantee the specified QoS. Finally, If the requested QoS can be guaranteed MIDAS will push the new configuration onto the network transparently (The QoS of existing flows will not be disrupted by the reconfiguration process itself).

Technologies like DDS make the development and operation of distributed real-time systems easier, faster, and overall less costly. Adding technologies like MIDAS to the mix takes these benefits to the next level:

  • Reduce Development Time: Automation of network configuration removes a significant chunk of development effort allowing developers to focus more on creating functionality (and hence value for their customers).
  • Reduce Equipment Costs: Many common COTS ethernet switches now support OpenFlow and can be used with MIDAS to support hard real-time systems. Many deployed ethernet switches can be made OpenFlow capable with a simple firmware update.
  • Increased Flexibility and Adaptability: MIDAS lets you add and remove network nodes at runtime. All resource provisioning is dynamic and ensures that no QoS is disrupted during configuration.
  • Increased Reliability: MIDAS’ network configurations ensures traffic bursts won’t overload buffers in networking equipment which means packets will only be dropped if there is a hardware failure.

If you are interested in knowing more about MIDAS and how it works please read the associated technical paper* or contact me (Andrew King, I’d also like to invite you to watch the the following youtube video. It demonstrates a distributed hard real-time system built using RTI’s Connext DDS with real-time behavior enforced by MIDAS:

*The technical paper describes an earlier prototype which used a simple custom publish/subscribe middleware. The current prototype is designed to work with DDS but the fundamental technical details are unchanged.

Speed Your Time to Market with Connext Pro Tools 3

It was two weeks until the demo.

We had this single opportunity to build a working microgrid control system that needed to:

  • Run on Intel and ARM processors
  • Target Linux and Windows platforms
  • Include applications written in C, C++, Java, SCALA, Lua, and LabVIEW
  • Talk to legacy equipment speaking ModBus and DNP3 protocols
  • Perform real-time control while meeting all of the above requirements

In this post, I’ll talk about the real-world problems we faced and how the tools included in RTI Connext DDS Pro helped us solve our integration problems in just a couple of days. Common issues encountered in most projects are highlighted, with specific RTI tools for addressing each. Along the way you’ll find links to supporting videos and articles for those who want a deeper dive. My hope is that you find this a useful jumping off point for learning how to apply RTI tools to make your DDS development quicker and easier.

The Big Demo

This was the first working demo of the Smart Grid Interoperability Panel’s Open Field Message Bus (OpenFMB), a new way of controlling devices at the edge of the power grid in real time by applying IoT technologies like DDS (see this link for more info).

OpenFMB cartoon

Here’s a block diagram of the system showing hardware architectures, operating systems, and languages:

OpenFMB demo net diagram-2

As we brought the individual participants onto the network, we encountered a number of problems. A description of challenges and the tools we used to address each follows. Scan the list of headings and see if you’ve had to debug any of these issues in your DDS system, then check out the links to learn a few new tips.  As you do, think about how you would try to diagnose the problems without the tools mentioned.

Problem: Network configuration problems

Tools: RTI DDS Ping

The team from Oak Ridge National Labs was working on the LabVIEW GUI that would be the main display.  Their laptop could not see data from any of the clients on the network. We checked the basics to make sure their machine was on the same subnet – always check the basics first!  While the standard ping utility can confirm basic reachability between machines, it doesn’t check that the ports necessary for DDS discovery are open.  The rtiddsping utility does exactly that, and it told us in seconds that the firewall installed on their government-issued laptop was preventing DDS discovery traffic.  For a great rundown on how to check the basics, see this community post.

Problem: Is my app sending data?

Tools: Spy, Admin Console

A common question among the vendors using DDS for the first time was whether their application was behaving properly: Was it sending data at the proper intervals, and did the data make sense? For a quick check, we used the RTI DDS Spy utility. Spy provides a simple subscriber that can filter selectively for specific types and topics, and it can print the individual samples it receives, allowing you to quickly see the data your app is writing.  Every vendor used DDS Spy as a sanity check after initially firing up their application.

Sometimes an update to the same topic can come from multiple publishers in the system. Not sure which one wrote the latest update?  A command line switch for Spy (“-showSampleIdentity”) allows you to see where an update originated.

Spy is a console app that can be deployed on embedded targets for basic testing.  Its small size, quick startup, and simplicity are its main advantages.  Details on usage are here.

Problem: Data type mismatch

Tools: Admin Console, Monitor

One vendor reported that in an earlier test they were seeing data from one of the other apps, and now they were not. Admin Console quickly showed us that a data type mismatch was to blame – that is, two topics with the same name but different data types. These types of mismatches can be difficult to diagnose, especially for large types with many members. Admin Console leverages the data-centricity of DDS to introspect the data types as understood by each application in your system. It then presents both a simplified view and an “equivalent IDL” view that makes it easy to compare the types in side-by-side panes. This is especially valuable in situations where you don’t have the source IDL from every application.

In this case, one vendor had not synced up with the GitHub repository for the latest IDL, so they were working from an older version of the file. They pulled the latest files from GitHub, rtiddsgen created new type-specific code for them, and after a quick recompile their app was able to read and write the updated topics.

Data type introspection

Admin Console shows data types

Problem: QoS mismatch

Tools: Admin Console, Monitor

Next to discovery, Quality of Service (QoS) mismatches are the most common problem experienced by DDS users during integration. With so many knobs to turn, how do you make sure that settings are compatible? The OpenFMB project had its fair share of QoS mismatches at first. Admin Console spots these quickly and tells you the specific QoS settings that are in conflict. You can even click on the QoS name and go directly to the documentation. QoS information shared during discovery is used by Admin Console to detect mismatches.

QoS Mismatch

Admin Console identifies a reliability QoS mismatch

Problem: Is the system functioning as expected?

Tools: Admin Console, Monitor

While Spy provides basic text output for live data, you can’t beat a graph for seeing how data changes over time. For more sophisticated data visualization, we turned to Admin Console. The data visualization feature built into Admin Console was a huge help in quickly determining how the system as a whole was working. It even allowed us to scroll through historical data to better understand how we arrived at the current state. To find out more about data visualization, see this short intro video, or this deep dive video.

Data visualization

Visualize your data with Admin Console

Problem: Performance tuning

Tools: Monitor, Admin Console

When it comes to performance tuning, Monitor should be your go-to tool. Monitor works with a special version of the DDS libraries that periodically publish real-time performance data from your application. The debug libraries are minimally intrusive, and the data is collected and presented by Monitor.

Using Monitor, you can learn about:

  • Transmission and reception statistics
  • Missed deadlines
  • High-water marks on caches
  • QoS mismatches
  • Data type conflicts
  • Samples lost or rejected
  • Loss of liveliness

It’s important to note that not every QoS setting is advertised during discovery. Many QoS settings apply to an application’s local resource management and performance tuning, and these are not sent during discovery. With Monitor you can inspect these, too.

Problem: Transforming data in flight

Tools: Prototyper with Lua, DDS Toolkit for LabVIEW

We wanted a large GUI to show what was happening in the microgrid in real time.  The team at Oak Ridge National Labs volunteered to create a GUI in LabVIEW. The DDS Toolkit for Labview allows you to grab data from DDS applications and use it in LabVIEW Virtual Instruments (VIs). There are some limitations however, as we found out. The Toolkit does not handle arrays of sequences, which some types in the OpenFMB data model use. We needed a quick solution that would allow the LabVIEW VI to read these complex data types.

One of the cool new tools in the Connext DDS Pro 5.2 toolbox is Prototyper with Lua. Prototyper allows you to quickly create DDS-enabled apps with little to no programming: define your topics and domain participants in XML, add a simple Lua script, and you can be up on a DDS domain in no time. (Check out Gianpiero’s blog post on Prototyper)

Back at the hotel one evening I wrote a simple Lua script that allows Prototyper to read the complex DDS topics containing arrays of sequences and then republish them to a different, flattened topic for use by the LabVIEW GUI. I was able to test it offline using live data recorded earlier in the lab, which brings us to…

Problem: Disconnected development

Tools: Record, Replay, Prototyper with Lua

A geographically dispersed development team built the OpenFMB demo. With the exception of those few days in Knoxville, no one on the team had access to all the components in the microgrid at one time. So how do you write code for your piece of the puzzle when you don’t have access to the other devices in the system?

When I worked on the Lua bridge for the LabVIEW GUI, I used the Connext Pro Record and Replay services.  In the lab, I had recorded about 10 minutes of live data as we ran the system through all the use cases.  Later that evening in the hotel, I was able to play this data back as I worked on the Lua scripts.  Replay allows you selectively play back topics, looping the playback so it runs continuously.  You can also choose to play the data at an accelerated rate – this is a huge time saver that enables you to simulate days or hours worth of runtime in just a few minutes.

Recording console

Recording Console

One of the really neat things Prototyper does once it’s running is to periodically reload the Lua script.  This made developing the bridge to LabVIEW very quick: Replay played data continuously in an accelerated mode; I had an editor open on the Lua script; and as I made and saved changes they were instantly reflected in Prototyper which was running constantly – no need to restart to see changes to the script.  The conversion script was done in just a couple of hours.

Prototyper also came in handy for quickly creating apps to generate simulated data.  The LabVIEW GUI was developed entirely offline without any of the real-world devices, using some topics generated by the Replay services and others that were bridged or simulated with Prototyper.  I’d email a simulator script to ORNL, they’d do some LabVIEW work and send me an updated VI, and then I’d run that locally to verify it.   ORNL did an amazing job integrating real-time data from the DDS domain along with visual elements from the SGIP cartoons, and the GUI was the centerpiece of the demo.


The final GUI, written in LabVIEW

Main Takeaways

When we showed up in New Orleans a couple weeks later, the entire system was brought up in about 30 minutes, which is remarkable considering some of the applications (like the LabVIEW GUI) had never even been on a network with the actual hardware. Everything just worked.

The rich set of tools provided by RTI Connext DDS Pro allowed us to solve our integration problems quickly during the short week in Knoxville, and to carry on development at many remote locations. Admin Console, Monitor, DDS Ping, and DDS Spy got our system up and running.  Record, Replay, and Prototyper made it possible for remote development teams to work in the absence of real hardware.  DDS Toolkit for LabVIEW enabled us to create a sophisticated GUI quickly.  And even after the event, we can continue to do development and virtual demos using these tools.

Are You Considering Migrating from PrismTech OpenSplice to RTI Connext DDS? 1


With the recent acquisition of PrismTech by the Taiwanese company ADLINK, we are seeing increased demand for porting from OpenSplice to RTI Connext DDS. One of the benefits of going with a standard like Data Distribution Service for Real-Time Systems (DDS) is that you have options if your middleware supplier becomes unreliable for any reason. That’s why the DDS community went to so much trouble to develop more than just a wire protocol standard. The DDS specification also includes API definitions, with the explicit goal of making it easier to port.

In particular, DDS specifications include both a PIM (platform independent model) and a set of language PSMs (platform specific models). The PIM defines all the important user-visible API concepts including the DDS Entities (DomainParticipant, Topic, Publisher, Subscriber, etc.), their operations and behavior, the QoS, the listeners, and so on. All DDS implementations use the PIM. That means that the structure and major options in your code will translate unchanged to a different DDS implementation.

The PSMs are much newer and they are only unambiguously defined for C++ and Java. Other programming languages come from the IDL-derived mappings (also standard). If you used one of the new standard language PSM APIs, it’s even easier to port your application. Even if you didn’t, the common PIM makes it a straightforward process. Of course, in general, there are some differences between versions, both API variability and “non standard” options, features, and configuration. Still, compared to porting to a new middleware architecture, the work required to port is minimal.

How hard is it to port an existing application from OpenSplice DDS to the RTI Connext DDS in practice? 

It’s not trivial, but it’s not that bad, either. One significant difference between these two SDKs is in the IDL syntax. Although most of the IDL syntax is identical, there is a difference in how “key attributes” are indicated. Connext DDS uses the DDS-XTYPES standard format consisting of a “@Key” annotation placed inside a comment. OpenSplice uses a (non-standard) “#pragma keylist” directive. We can see this difference in the example below.

OpenSpliceDDS IDL Equivalent Connext DDS IDL
struct Coordinates {

   float latitude;

   float longitude;

   float altitude;

struct Flight  {

   string<32>  airlineCode;

   long        flightNumber;

   string<3>   departureAirport;

   string<3>   arrivalAirport;

   long long   departureTime;

   Coordinates currentPosition;

#pragma keylist Flight airlineCode flightNumber

struct Coordinates {

   float latitude;

   float longitude;

   float altitude;

struct Flight  {

   string<32>  airlineCode;  //@Key

   long        flightNumber; //@Key

   string<3>   departureAirport;

   string<3>   arrivalAirport;

   long long   departureTime;

   Coordinates currentPosition;



The first step in the migration is to modify the IDL files replacing #pragma keylist with the //@Key annotation, as shown above.

Once the IDL file(s) have been migrated, the rtiddsgen tool can be used to generate all the network marshaling/unmarshaling code. This allows the data to understood by any computer, independent of processor architecture, operating system, and programming language. The rtiddsgen tool is included with the Connext DDS SDK. In addition to generating the network marshaling code, rtiddsgen can generate example code and makefiles / build projects for the programming languages and platforms of your choice.

Beyond this there are a few application code edits that may be required. These affect the initialization of the TypeSupport, the use of the ReturnCode_t type, the use of the length() operation on types String and Sequence, and the way sequences are passed to the DataReader read/take operations.

RTI has experience doing this! Some RTI customers, including many of our largest and happiest customers, started with OpenSplice before they discovered the performance, quality, tools, and top-flight support from RTI. So, before you start this effort, contact us. We have a migration SDK including compatible headers, examples, and more. Better yet, we can offer you a cost-effective service to help you transition quickly and correctly.