Why I Joined RTI Reply

With a fresh perspective, I thought I could write about this small company in Silicon Valley that you probably haven’t heard of: Real-Time Innovations, Inc. (RTI). RTI has been quietly working on a technology called DDS that could be one of the most important and fundamental tools for the industrial internet revolution. If you haven’t heard, the industrial internet, or Industrial Internet of Things (IIoT), is going to change the world in ways we haven’t seen since the industrial revolution. My grandparents saw communications technology change from horse and cart, to the proliferation of the automobile and internet. This next revolution is going to be a much bigger deal.

next revolution

I worked with RTI for 3 years while running LocalGrid technologies. I can’t take the credit for selecting RTI as a vendor. However, I think our CTO did his homework and chose RTI and Connext DDS for the technical benefits and also because of the company’s dedication to the quality and reliability of their product. That level of product care, I’m sure, comes in part from their long history working with the hardest problems like the US DoD. You can read a bit more about this here.

So why did I join RTI? The initial motivation was the relationship I had with the company, and the product. With RTI’s Connext™ DDS toolkit, and Connext DDS Secure, it was clear that DDS would be a fundamental and critically important technology for LocalGrid’s efforts in the Smart Grid market. Having worked on many system integration projects where we struggled to create scalable, simple to use, and robust communications solutions, I knew how important this middleware would be. But it was the influence and leadership of this small company that really attracted me, first to working with them as a partner, and then as an employer.

So when I found myself looking for my next career opportunity, I knew I wanted a company growing in an exciting technology space. The product and the company seemed like a great fit. However, when I started interviewing, what really struck me was how much everyone seemed to be working together. The stories about the company strengths, benefits, and motivation were consistent across all groups in the company, and included everyone. Everyone was ‘pulling together’. Instead of telling me how great their team was, each group in the company talked about how good sales, or engineering, or marketing was doing. The company is small, about 100 people, but clearly punching above their weight in multiple industries. It has a culture of service: “we won’t let our customers fail” is a phrase I heard many times in the interview process. This can only be evidence of the teamwork and a sense of ownership.

2016 RTI Company Photo CKO

My first week on the job happened to coincide with the Company Kick-off week. During this week, every RTI employee from all over the world meets in person at our HQ in Sunnyvale, CA – definitely a good time to join; I got to meet a lot of people at the company in person. This is important in a company where 50% of the staff work remotely. The most important thing I took away from that first week was a great look at the company culture, which I would describe principally as an immense amount of trust between everyone. That translates into a lot of questions and a lot of feedback. Of course there are growing pains as with any small company. And they don’t have unlimited resources or funds. But, clearly with this teamwork, and especially in a new a undefined market, RTI is successfully ‘failing towards success’. That’s great, because they aren’t afraid to try, to put the company and product out in front, and learn about their customers and markets as they go. With teamwork, this works.

As an engineering company with a complex product, I can confirm that they have a very ‘geek’ friendly culture. But more than that, it is a culture where people communicate and express themselves freely. I’ve run my own meetings and attended many others where, when the question is inevitably asked, “are there any questions” and then there is only silence. At RTI, there is never silence. There is always a question or a comment and it comes from all departments in the company: Engineering, Sales, Marketing, Operations, no matter what the topic is. Everyone seems to understand the importance of what RTI is doing and everyone cares. Add the clear respect and trust that exists in the company, and this translates into lots of questions, feedback and collaboration. This is how such a small group can be the most influential company in their space, beating out companies like Google and GE.

Unlike many Silicon Valley companies, people stay here for a long time. I’ve met engineers that have been here for 15+ years as the company’s technology has changed, evolved, and matured. They move from engineering to sales or marketing and to management along the way. The people make a huge impact on the sense of ownership and the culture. I am very happy to be part of this amazing little company, and I suggest that everyone keeps their eye on RTI. The impact on our society, businesses and the economy will be huge, even if you will never know it.

Speed Your Time to Market with Connext Pro Tools 3

It was two weeks until the demo.

We had this single opportunity to build a working microgrid control system that needed to:

  • Run on Intel and ARM processors
  • Target Linux and Windows platforms
  • Include applications written in C, C++, Java, SCALA, Lua, and LabVIEW
  • Talk to legacy equipment speaking ModBus and DNP3 protocols
  • Perform real-time control while meeting all of the above requirements

In this post, I’ll talk about the real-world problems we faced and how the tools included in RTI Connext DDS Pro helped us solve our integration problems in just a couple of days. Common issues encountered in most projects are highlighted, with specific RTI tools for addressing each. Along the way you’ll find links to supporting videos and articles for those who want a deeper dive. My hope is that you find this a useful jumping off point for learning how to apply RTI tools to make your DDS development quicker and easier.

The Big Demo

This was the first working demo of the Smart Grid Interoperability Panel’s Open Field Message Bus (OpenFMB), a new way of controlling devices at the edge of the power grid in real time by applying IoT technologies like DDS (see this link for more info).

OpenFMB cartoon

Here’s a block diagram of the system showing hardware architectures, operating systems, and languages:

OpenFMB demo net diagram-2

As we brought the individual participants onto the network, we encountered a number of problems. A description of challenges and the tools we used to address each follows. Scan the list of headings and see if you’ve had to debug any of these issues in your DDS system, then check out the links to learn a few new tips.  As you do, think about how you would try to diagnose the problems without the tools mentioned.

Problem: Network configuration problems

Tools: RTI DDS Ping

The team from Oak Ridge National Labs was working on the LabVIEW GUI that would be the main display.  Their laptop could not see data from any of the clients on the network. We checked the basics to make sure their machine was on the same subnet – always check the basics first!  While the standard ping utility can confirm basic reachability between machines, it doesn’t check that the ports necessary for DDS discovery are open.  The rtiddsping utility does exactly that, and it told us in seconds that the firewall installed on their government-issued laptop was preventing DDS discovery traffic.  For a great rundown on how to check the basics, see this community post.

Problem: Is my app sending data?

Tools: Spy, Admin Console

A common question among the vendors using DDS for the first time was whether their application was behaving properly: Was it sending data at the proper intervals, and did the data make sense? For a quick check, we used the RTI DDS Spy utility. Spy provides a simple subscriber that can filter selectively for specific types and topics, and it can print the individual samples it receives, allowing you to quickly see the data your app is writing.  Every vendor used DDS Spy as a sanity check after initially firing up their application.

Sometimes an update to the same topic can come from multiple publishers in the system. Not sure which one wrote the latest update?  A command line switch for Spy (“-showSampleIdentity”) allows you to see where an update originated.

Spy is a console app that can be deployed on embedded targets for basic testing.  Its small size, quick startup, and simplicity are its main advantages.  Details on usage are here.

Problem: Data type mismatch

Tools: Admin Console, Monitor

One vendor reported that in an earlier test they were seeing data from one of the other apps, and now they were not. Admin Console quickly showed us that a data type mismatch was to blame – that is, two topics with the same name but different data types. These types of mismatches can be difficult to diagnose, especially for large types with many members. Admin Console leverages the data-centricity of DDS to introspect the data types as understood by each application in your system. It then presents both a simplified view and an “equivalent IDL” view that makes it easy to compare the types in side-by-side panes. This is especially valuable in situations where you don’t have the source IDL from every application.

In this case, one vendor had not synced up with the GitHub repository for the latest IDL, so they were working from an older version of the file. They pulled the latest files from GitHub, rtiddsgen created new type-specific code for them, and after a quick recompile their app was able to read and write the updated topics.

Data type introspection

Admin Console shows data types

Problem: QoS mismatch

Tools: Admin Console, Monitor

Next to discovery, Quality of Service (QoS) mismatches are the most common problem experienced by DDS users during integration. With so many knobs to turn, how do you make sure that settings are compatible? The OpenFMB project had its fair share of QoS mismatches at first. Admin Console spots these quickly and tells you the specific QoS settings that are in conflict. You can even click on the QoS name and go directly to the documentation. QoS information shared during discovery is used by Admin Console to detect mismatches.

QoS Mismatch

Admin Console identifies a reliability QoS mismatch

Problem: Is the system functioning as expected?

Tools: Admin Console, Monitor

While Spy provides basic text output for live data, you can’t beat a graph for seeing how data changes over time. For more sophisticated data visualization, we turned to Admin Console. The data visualization feature built into Admin Console was a huge help in quickly determining how the system as a whole was working. It even allowed us to scroll through historical data to better understand how we arrived at the current state. To find out more about data visualization, see this short intro video, or this deep dive video.

Data visualization

Visualize your data with Admin Console

Problem: Performance tuning

Tools: Monitor, Admin Console

When it comes to performance tuning, Monitor should be your go-to tool. Monitor works with a special version of the DDS libraries that periodically publish real-time performance data from your application. The debug libraries are minimally intrusive, and the data is collected and presented by Monitor.

Using Monitor, you can learn about:

  • Transmission and reception statistics
  • Missed deadlines
  • High-water marks on caches
  • QoS mismatches
  • Data type conflicts
  • Samples lost or rejected
  • Loss of liveliness

It’s important to note that not every QoS setting is advertised during discovery. Many QoS settings apply to an application’s local resource management and performance tuning, and these are not sent during discovery. With Monitor you can inspect these, too.

Problem: Transforming data in flight

Tools: Prototyper with Lua, DDS Toolkit for LabVIEW

We wanted a large GUI to show what was happening in the microgrid in real time.  The team at Oak Ridge National Labs volunteered to create a GUI in LabVIEW. The DDS Toolkit for Labview allows you to grab data from DDS applications and use it in LabVIEW Virtual Instruments (VIs). There are some limitations however, as we found out. The Toolkit does not handle arrays of sequences, which some types in the OpenFMB data model use. We needed a quick solution that would allow the LabVIEW VI to read these complex data types.

One of the cool new tools in the Connext DDS Pro 5.2 toolbox is Prototyper with Lua. Prototyper allows you to quickly create DDS-enabled apps with little to no programming: define your topics and domain participants in XML, add a simple Lua script, and you can be up on a DDS domain in no time. (Check out Gianpiero’s blog post on Prototyper)

Back at the hotel one evening I wrote a simple Lua script that allows Prototyper to read the complex DDS topics containing arrays of sequences and then republish them to a different, flattened topic for use by the LabVIEW GUI. I was able to test it offline using live data recorded earlier in the lab, which brings us to…

Problem: Disconnected development

Tools: Record, Replay, Prototyper with Lua

A geographically dispersed development team built the OpenFMB demo. With the exception of those few days in Knoxville, no one on the team had access to all the components in the microgrid at one time. So how do you write code for your piece of the puzzle when you don’t have access to the other devices in the system?

When I worked on the Lua bridge for the LabVIEW GUI, I used the Connext Pro Record and Replay services.  In the lab, I had recorded about 10 minutes of live data as we ran the system through all the use cases.  Later that evening in the hotel, I was able to play this data back as I worked on the Lua scripts.  Replay allows you selectively play back topics, looping the playback so it runs continuously.  You can also choose to play the data at an accelerated rate – this is a huge time saver that enables you to simulate days or hours worth of runtime in just a few minutes.

Recording console

Recording Console

One of the really neat things Prototyper does once it’s running is to periodically reload the Lua script.  This made developing the bridge to LabVIEW very quick: Replay played data continuously in an accelerated mode; I had an editor open on the Lua script; and as I made and saved changes they were instantly reflected in Prototyper which was running constantly – no need to restart to see changes to the script.  The conversion script was done in just a couple of hours.

Prototyper also came in handy for quickly creating apps to generate simulated data.  The LabVIEW GUI was developed entirely offline without any of the real-world devices, using some topics generated by the Replay services and others that were bridged or simulated with Prototyper.  I’d email a simulator script to ORNL, they’d do some LabVIEW work and send me an updated VI, and then I’d run that locally to verify it.   ORNL did an amazing job integrating real-time data from the DDS domain along with visual elements from the SGIP cartoons, and the GUI was the centerpiece of the demo.

LabVIEW GUI

The final GUI, written in LabVIEW

Main Takeaways

When we showed up in New Orleans a couple weeks later, the entire system was brought up in about 30 minutes, which is remarkable considering some of the applications (like the LabVIEW GUI) had never even been on a network with the actual hardware. Everything just worked.

The rich set of tools provided by RTI Connext DDS Pro allowed us to solve our integration problems quickly during the short week in Knoxville, and to carry on development at many remote locations. Admin Console, Monitor, DDS Ping, and DDS Spy got our system up and running.  Record, Replay, and Prototyper made it possible for remote development teams to work in the absence of real hardware.  DDS Toolkit for LabVIEW enabled us to create a sophisticated GUI quickly.  And even after the event, we can continue to do development and virtual demos using these tools.

Modern Asynchronous Request/Reply with DDS-RPC 2

Complex distributed systems often use multiple styles of communication, interaction patterns, to implement functionality. One-to-many publish-subscribe, one-to-one request-reply, and one-to-one queuing (i.e., exactly-once delivery) are the most common. RTI Connext DDS supports all three. Two of them are available as standard specifications from the Object Management Group (OMG): the DDS specification (duh!) and newer Remote Procedure Call (RPC) over DDS (DDS-RPC) specification.

DDS-RPC specification is a significant generalization of RTI’s own Request-Reply API in C++ and Java. There are two major differences: (1) DDS-RPC includes code generation from IDL interface keyword and (2) asynchronous programming using futures.

Let’s look at an example straight from the DDS-RPC specification.

module robot 
{
  exception TooFast {};
  enum Command { START_COMMAND, STOP_COMMAND };
  struct Status {
    string msg;
  };

  @DDSService
  interface RobotControl 
  {
    void  command(Command com);
    float setSpeed(float speed) raises (TooFast);
    float getSpeed();
    void  getStatus(out Status status);
  };

}; //module robot

It’s pretty obvious what’s going on here. RobotControl is an IDL interface to send commands to a robot. You can start/stop it, get its status, and control its speed. The setSpeed operation returns the old speed when successful. The robot knows its limits and if you try to go too fast, it will throw a TooFast exception right in your face.

To bring this interface to life you need a code generator. As of this writing, however, rtiddsgen coughs at the interface keyword. It does not recognize it. But sometime in not-so-distant future it will.

For now, we’ve the next best thing. I’ve already created a working example of this interface using hand-written code.  The API of the hand-written code matches exactly with the standard bindings specified in DDS-RPC. As a matter of fact, the dds-rpc-cxx repository is an experimental implementation of the DDS-RPC API in C++. Look at the normative files to peak into all the gory API details.

Java folks… did I mention that this is a C++ post? Well, I just did… But do not despair. The dds-rpc-java repository has the equivalent Java API for the DDS-RPC specification. However, there’s no reference implementation. Sorry!

The Request/Reply Style Language Binding

DDS-RPC distinguishes between the Request/Reply and the Function-Call style language bindings. The Request-Reply language binding is nearly equivalent to RTI’s Request/Reply API with some additional asynchronous programming support. More on that later.

Here’s a client program that creates a requester to communicate with a RobotControl service. It asks for the current speed and increases it by 10 via getSpeed/setSpeed operations. The underlying request and reply topic names are “RobotControlRequest” and “RobotControlReply”. No surprise there.

RequesterParams requester_params = 
  dds::rpc::RequesterParams()
    .service_name("RobotControl")
    .domain_participant(...); 

Requester<RobotControl_Request, RobotControl_Reply> 
  requester(requester_params); 

// helper class for automatic memory management 
helper::unique_data<RobotControl_Request> request; 
dds::Sample<RobotControl_Reply> reply_sample; 
dds::Duration timeout = dds::Duration::from_seconds(1); 
float speed = 0; 

request->data._d = RobotControl_getSpeed_Hash; 
requester.send_request(*request); 

while (!requester.receive_reply( 
               reply_sample, 
               request->header.requestId, 
               timeout)); 

speed = reply_sample.data().data._u.getSpeed._u.result.return_; 
speed += 10; 
request->data._d = RobotControl_setSpeed_Hash; 
request->data._u.setSpeed.speed = speed; 

requester.send_request(*request); 

while (!requester.receive_reply( 
                reply_sample, 
                request->header.requestId, 
                timeout)); 

if(reply_sample.data().data._u.setSpeed._d == robot::TooFast_Ex_Hash) 
{
  printf("Going too fast.\n"); 
} 
else 
{ 
  speed = reply_sample.data().data._u.setSpeed._u.result.return_; 
  printf("New Speed = %f", speed); 
}

There’s quite a bit of ceremony to send just two requests to the robot. The request/reply style binding is lower-level than function-call style binding. The responsibility of packing and unpacking data from the request and reply samples falls on the programmer. An alternative, of course, is to have the boilerplate code auto-generated. That’s where the function-call style binding comes into picture.

The Function-Call Style Language Binding

The same client program can be written in much more pleasing way using the function-call style language binding.

robot::RobotControlSupport::Client 
  robot_client(rpc::ClientParams()
                 .domain_participant(...)
                 .service_name("RobotControl"));
float speed = 0;

try
{
  speed = robot_client.getSpeed();
  speed += 10;
  robot_client.setSpeed(speed);
}
catch (robot::TooFast &) {
  printf("Going too fast!\n");
}

Here, a code generator is expected to generate all the necessary boilerplate code to support natural RPC-style programming. The dds-rpc-cxx repository contains the code necessary for the RobotControl interface.

Pretty neat, isn’t it?

Not quite… (and imagine some ominous music and dark skies! Think Mordor to set the mood.)

An Abstraction that Isn’t

The premise of RPC, the concept, is that accessing a remote service can be and should be as easy as a synchronous local function call. The function-call style API tries to hide the complexity of network programming (serialization/deserialization) behind pretty looking interfaces. However, it works well only until it does not…

Synchronous RPC is a very poor abstraction of latency. Latency is a hard and insurmountable problem in network programming. For reference see the latency numbers every programmer should know.

If we slow down a computer to a time-scale that humans understand, it would take more than 10 years to complete the above synchronous program assuming we were controlling a robot placed in a different continent. Consider the following table taken from the Principles of Reactive Programming course on Coursera to get an idea of what human-friendly time-scale might look like.

latency

The problem with synchronous RPC is not only that it takes very long to execute but the calling thread will be blocked doing nothing for that long. What a waste!

Dealing with failures of the remote service is also a closely related problem. But for now let’s focus on the latency alone.

The problem discussed here isn’t new at all. The solution, however, is new and quite exciting, IMO.

Making Latecy Explicit … as an Effect

The DDS-RPC specification uses language-specific future<T> types to indicate that a particular operation is likely going to take very long time. In fact, every IDL interface gives rise to sync and async versions of the API and allows the programmers to choose.

The client-side generated code for the RobotControl interface includes the following asynchronous functions.

class RobotControlAsync
{
public:

  virtual dds::rpc::future<void> command_async(const robot::Command & command) = 0;
  virtual dds::rpc::future<float> setSpeed_async(float speed) = 0;
  virtual dds::rpc::future<float> getSpeed_async() = 0;
  virtual dds::rpc::future<robot::RobotControl_getStatus_Out> getStatus_async() = 0;

  virtual ~RobotControlAsync() { }
};

Note that every operation, including those that originally returned void return a future object, which is a surrogate for the value that might be available in future. If an operation reports an exception, the future object will contain the same exception and user will be able to access it. The dds::rpc::future maps to std::future in C++11 and C++14 environments.

As it turns out, C++11 futures allow us to separate invocation from execution of remote operations but retrieving the result requires waiting. Therefore, the resulting program is barely an improvement.

try
{
  dds::rpc::future<float> speed_fut = 
    robot_client.getSpeed_async(); 
  // Do some other stuff  
  while(speed_fut.wait_for(std::chrono::seconds(1)) == 
        std::future_status::timeout);
  
  speed = speed_fut.get();
  speed += 10;
  
  dds::rpc::future<float> set_speed_fut = 
    robot_client.setSpeed_async(speed);
  // Do even more stuff  
  while(set_speed_fut.wait_for(std::chrono::seconds(1)) == 
        std::future_status::timeout);
  
  set_speed_fut.get();
}
catch (robot::TooFast &) {
  printf("Going too fast!\n");
}

The client program is free to invoke multiple remote operations (say, to different services) back-to-back without blocking but there are at least three problems with it.

  1.  The program has to block in most cases to retrieve the results. In cases where result is available before blocking, there’s problem #2.
  2. It is also possible that while the main thread is busy doing something other than waiting, the result of the asynchronous operation is available. If no thread is waiting on retrieving the result, no progress with respect that chain of computation (i.e., adding 10 and calling setSpeed) won’t proceed. This is essentially what’s known as continuation blocking because the subsequent computation is blocked due to missing resources.
  3. Finally, the programmer must also do correlation of requests with responses because in general, the order in which the futures will be ready is not guaranteed to be the same as the requests were sent. Some remote operations may take longer than the others. To resolve this non-determinism, programmers may have to implement state-machines very carefully.

For the above reasons, this program is not a huge improvement over the request/reply style program at the beginning. In both cases, invocation is separate from retrieval of the result (i.e., both are asynchronous) but the program must wait to retrieve the results.

That reminds me of the saying:

“Blocking is the goto of Modern Concurrent Programming”

—someone who knows stuff

In the multicore era, where the programs must scale to utilize the underlying parallelism, any blocking function call robs the opportunity to scale. In most cases, blocking will consume more threads than strictly necessary to execute the program. In responsive UI applications, blocking is a big no-no. Who has tolerance for unresponsive apps these days?

Composable Futures to the Rescue

Thankfully people have thought of this problem already and proposed improved futures in the Concurrency TS for the ISO C++ standard. The improved futures support

  • Serial composition via .then()
  • Parallel composition via when_all() and when_any()

A lot has been written about improved futures. See Dr. Dobb’s, Facebook’s Folly, and monadic futures. I won’t repeat that here but an example of .then() is in order. This example is also available in the dds-rpc-cxx repository.

for(int i = 0;i < 5; i++)
{
  robot_client
    .getSpeed_async()
    .then([robot_client](future<float> && speed_fut) {
          float speed = speed_fut.get();
          printf("getSpeed = %f\n", speed);
          speed += 10;
          return remove_const(robot_client).setSpeed_async(speed);
      })
    .then([](future<float> && speed_fut) {
        try {
          float speed = speed_fut.get();
          printf("speed set successfully.\n");
        }
        catch (robot::TooFast &)
        {
          printf("Going too fast!\n");
        }
      });
}

printf("Press ENTER to continue\n");
getchar();

The program with .then() sets up a chain of continuations (i.e., closures passed in as callbacks) to be executed when the dependent futures are ready with their results. As soon as the future returned by getSpeed_async is ready, the first closure passed to the .then() is executed. It invokes setSpeed_async, which returns yet another future. When that future completes, the program continues with the second callback that just prints success/failure of the call.

Reactive Programming

This style of chaining dependent computations and constructing a dataflow graph of asynchronous operations is at the heart of reactive programming. It has a number of advantages.

  1. There’s no blocking per request. The program fires a series of requests, sets up a callback for each, and waits for everything to finish (at getchar()).
  2. There’s no continuation blocking because often the implementation of future is such that the thread that sets value of the promise object (the callee side of the future), continues with callback execution right-away.
  3. Correlation of requests with replies isn’t explicit because each async invocation produces a unique future and its result is available right in the callback. Any state needed to complete the execution of the callback can be captured in the closure object.
  4. Requires no incidental data structures, such as state machines and std::map for request/reply correlation. This benefit is a consequence of chained closures.

The advantages of this fluid style of asynchronous programming enabled by composable futures and lambdas is quite characteristic of modern asynchronous programming in languages such as JavaScript and C#. CompletableFuture in Java 8 is also provides the same pattern.

The Rabbit Hole Goes Deeper

While serial composition of futures (.then) looks cool, any non-trivial asynchronous program written with futures quickly get out of hands due to callbacks. The .then function restores some control over a series of asynchronous operations at the cost of the familiar control-flow and debugging capabilities.

Think about how you might write a program that speeds up the robot from its current speed to some MAX in increments of 10 by repeatedly calling getSpeed/setSpeed asynchronously and never blocking.

Here’s my attempt.

dds::rpc::future<float> speedup_until_maxspeed(
  robot::RobotControlSupport::Client & robot_client)
{
  static const int increment = 10;

  return
    robot_client
      .getSpeed_async()
      .then([robot_client](future<float> && speed_fut) 
      {
        float speed = speed_fut.get();
        speed += increment;
        if (speed <= MAX_SPEED)
        {
          printf("speedup_until_maxspeed: new speed = %f\n", speed);
          return remove_const(robot_client).setSpeed_async(speed);
        }
        else
          return dds::rpc::details::make_ready_future(speed);
      })
      .then([robot_client](future<float> && speed_fut) {
        float speed = speed_fut.get();
        if (speed + increment <= MAX_SPEED)
          return speedup_until_maxspeed(remove_const(robot_client));
        else
          return dds::rpc::details::make_ready_future(speed);
      });
}

// wait for the computation to finish asynchronously
speedup_until_maxspeed(robot_client).get();

This program is unusually complex for what little it achieves. The speedup_until_maxspeed function appears to be recursive as the second lambda calls the function again if the speed is not high enough. In reality the caller’s stack does not grow but only heap allocations for future’s shared state are done as successive calls to getSpeed/setSpeed are made.

The next animation might help understand what’s actually happening during execution. Click here to see a larger version.

dot-then-animation

The control-flow in a program with even heavy .then usage is going to be quite hard to understand, especially when there are nested callbacks, loops, and conditionals. We lose the familiar stack-trace because internally .then is stitching together many small program fragments (lambdas) that have only logical continuity but awkward physical and temporal continuity.

Debugging becomes harder too. To understand more what’s hard about it, I fired up Visual Studio debugger and stepped the program though several iterations. The call-stack appears to grow indefinitely while the program is “recursing”. But note there are many asynchronous calls in between. So the stack isn’t really growing. I tested with 100,000 iterations and the stack did not pop. Here’s a screenshot of the debugger.

stacktrace-wo-await

So, .then() seems like a mixed blessing.

Wouldn’t it be nice if we could dodge the awkwardness of continuations, write program like it’s synchronous but execute it fully asynchronously?

Welcome to C++ Coroutines

Microsoft has proposed a set of C++ language and library extensions called Resumable Functions that helps write asynchronous code that looks synchronous with familiar loops and branches. The latest proposal as of this writing (N4402) includes a new keyword await and its implementation relies on improved futures we discussed already.

Update: The latest C++ standard development suggests that the accepted keyword will be coawait (for coroutine await).

The speedup_until_maxspeed function can be written naturally as follows.

dds::rpc::future<void> test_iterative_await(
  robot::RobotControlSupport::Client & robot_client)
{
  static const int inc = 10;
  float speed = 0;

  while ((speed = await robot_client.getSpeed_async())+inc <= MAX_SPEED)
  {
    await robot_client.setSpeed_async(speed + inc);
    printf("current speed = %f\n", speed + inc);
  }
}

test_iterative_await(robot_client).get();

I’m sure C# programmers will immediately recognize that it looks quite similar to the async/await in .NET. C++ coroutines bring a popular feature in .NET to native programmers. Needless to say such a capability is highly desirable in C++ especially because it makes asynchronous programming with DDS-RPC effortless.

The best part is that compiler support for await is already available! Microsoft Visual Studio 2015 includes experimental implementation of resumable functions. I have created several working examples in the dds-rpc-cxx repository. The examples demonstrate await with both Request-Reply style language binding and Function-Call style language binding in the DDS-RPC specification.

Like the example before, I debugged this example too. It feels quite natural to debug because as one would expect, the stack does not appear to grow indefinitely. It’s like debugging a loop except that everything is running asynchronously. Things look pretty solid from what I could see! Here’s another screenshot.

stacktrace-await

Concurrency

The current experimental implementation of DDS-RPC uses thread-per-request model to implement concurrency. This is a terrible design choice but there’s a purpose and it’ very quick to implement. A much better implementation would use some sort of thread-pool and an internal queue (i.e., an executor). The concurrency TS is considering adding executors to C++.

Astute readers will probably realize that thread-per-request model implies that each request completes in its own thread and therefore a question arises regarding the thread that executes the remaining code. Is the code following await required to be thread-safe? How many threads might be executing speedup_until_maxspeed at a time?

Quick test (with rpc::future wrapping PPL tasks) of the above code revealed that the code following await is executed by two different threads. These two threads are never the ones created by the thread-per-request model. This implies that there’s a context switch from the thread that received the reply to the thread that resumed the test_iterative_await function. The same behavior is observed in the program with explicit .then calls. Perhaps the resulting behavior is dependent on the actual future/promise types in use. I also wonder if there is a way to execute code following await in parallel? Any comments, Microsoft?

A Sneak Peek into the Mechanics of await

A quick look into N4402 reveals that the await feature relies on composable futures, especially .then (serial composition) combinator. The compiler does all the magic of transforming the asynchronous code to a state machine that manages suspension and resumption automatically. It is a great example of how compiler and libraries can work together producing a beautiful symphony.

C++ coroutines work with any type that looks and behaves like composable futures. It also needs a corresponding promise type. The requirements on the library types are quite straight-forward, especially if you have a composable future type implemented somewhere else. Specifically, three free functions, await_ready, await_suspend, and await_resume must be available in the namespace of your future type.

In the DDS-RPC specification, dds::rpc::future<T> maps to std::future<T>. As std::future<T> does not have the necessary functionality for await to work correctly, I implemented dds::rpc::future<T> using both boost::future and concurrency::task<T> in Microsoft’s TPL.  Further, the dds::rpc::future<T> type was adapted with its own await_* functions and a promise type.

  template <typename T>
  bool await_ready(dds::rpc::future<T> const & t)
  {
    return t.is_ready();
  }

  template <typename T, typename Callback>
  void await_suspend(dds::rpc::future<T> & t, Callback resume)
  {
    t.then([resume](dds::rpc::future<T> const &)
    {
      resume();
    });
  }

  template <typename T>
  T await_resume(dds::rpc::future<T> & t)
  {
    return t.get();
  }

Adapting boost::future<T> was straight-forward as the N4402 proposal includes much of the necessary code but some tweaks were necessary because the draft and the implementation in Visual Studio 2015 appear slightly different. Implementing  dds::rpc::future<T> and dds::rpc::details::promise<T> using concurrency::task<T>, and concurrency::task_completion_event<T> needed little more work as both types had to be wrapped inside what would be standard types in near future (C++17). You can find all the details in future_adapter.hpp.

There are a number resources available on C++ Coroutines if you want to explore further. See the video recording of the CppCon’15 session on “C++ Coroutines: A Negative Overhead Abstraction”. Slides on this topic are available here and here. Resumable Functions proposal is not limited to just await. There are other closely related capabilities, such as the yield keyword for stackless coroutines, the await for keyword, generator, recursive_generator and more.

Indeed, this is truly an exciting time to be a C++ programmer.