How to Integrate RTI Connext DDS Micro with Container-Based Applications

Container-based microservices are all the rage as software architects and engineers work to bring the flexibility and scalability of the cloud to the edge. To support real-time communication between those microservices with guaranteed Quality of Service (QoS), DDS is the perfect companion. This post covers the steps necessary to integrate RTI Connext DDS Micro with container-based applications. The steps and the benefits of the technology are described in the context of a simple publisher/subscriber example.

Before we get started, a little background is helpful. At an abstract level, containers are used to run individual, isolated applications on your machine. Containers use operating-system-level virtualization, making it possible to run multiple isolated Linux environments on one host. They serve as a lightweight alternative to full machine virtualization, which requires a hypervisor to manage multiple operating systems. Docker is the world's leading software containerization platform, and its name is often used interchangeably with container technology even though alternatives exist. Please ensure you have Docker installed and functioning correctly on your machine by following the getting started documentation.

One of the first considerations when creating an image is which base image to build on. For our purposes, we use Alpine Linux as the base image for the container. Alpine is a very lightweight Linux distribution, weighing in at only about 5 MB. Because it is so minimal, containers built on it are faster to build and ship while still including the essentials. This makes Alpine a good fit for Connext DDS Micro. RTI Connext DDS Micro ships with a large number of pre-built, tested libraries for various operating systems; however, binaries are not available for this configuration. Luckily, Connext DDS Micro is also available in source form and can be built easily, so let's get started with that task.

Building the Example

Our first task is to build the Connext DDS Micro libraries and create a build image, or build-pack. The goal of the build image is to assist in building the runtime image from source code, third-party libraries, and so on. Remember that images are the main component in building containers, and when working with Docker the blueprint is contained in a Dockerfile. Here is the Dockerfile for the build image, or build-pack:

FROM alpine:3.3

# Install Alpine packages to support build of RTI Micro DDS
RUN apk add --update alpine-sdk bash cmake linux-headers openjdk7-jre && rm -rf /var/cache/apk/*

# Extract RTI Micro DDS host tools and point to Alpine JRE for build
COPY RTI_Connext_Micro_Host-2.4.8.zip RTI_Connext_Micro_Host-2.4.8.zip
RUN unzip RTI_Connext_Micro_Host-2.4.8.zip
RUN rm -rf /rti_connext_micro.2.4.8/rtiddsgen/jre/i86Linux
RUN ln -s /usr/lib/jvm/default-jvm/jre /rti_connext_micro.2.4.8/rtiddsgen/jre/i86Linux

# Extract RTI Micro DDS source and patch for build
COPY RTI_Connext_Micro-2.4.8-source.zip RTI_Connext_Micro-2.4.8-source.zip
RUN unzip RTI_Connext_Micro-2.4.8-source.zip
COPY patch/posixMutex.c rti_connext_micro.2.4.8/source/unix/src/osapi/posix/

# Build RTI Micro DDS
# Build RTI Micro DDS
RUN mkdir /build \
    && cd /build \
    && cmake -DRTIMICRO_BUILD_LANG:STRING=C++ /rti_connext_micro.2.4.8/source/unix \
    && make \
    && cp -R /build/lib /rti_connext_micro.2.4.8 \
    && rm -rf /build

It isn't that different from what you'd expect to see in a standard build script. The first line identifies the base image to be used. As previously mentioned, we'll be using Alpine version 3.3, available from the public Docker registry. Next, we install some build dependencies using apk (the Alpine package manager). After that we unzip and patch the source and use traditional cmake and make commands to build the C++ libraries. To build an image from this Dockerfile, we change to the directory containing the file and execute the build command.

$ docker build -t dds-base .

The -t option tags the image with a human-friendly name for future use, rather than leaving it identified only by a randomly generated ID. With that, the build image, or build-pack, has been created. Let's use this image to create the publisher and subscriber images.

The creation of the publisher and subscriber images is similar and is accomplished in two steps. The first step uses the previously created build image to compile the executable; the second takes the generated executable and packages it in a runtime image. This two-phase approach minimizes the size of the final container, since the build tools and intermediate artifacts are discarded when the runtime image is created. The two Dockerfiles used in creating the images are intuitively called Dockerfile.build and Dockerfile.run.

FROM dds-base:latest                                    (Dockerfile.build)

# Add publisher source code for build
COPY /src /src

# Compile sources to executable
RUN set -ex \
    && cd /src \
    && /rti_connext_micro.2.4.8/rtiddsgen/scripts/rtiddsgen -replace -language microC++ HelloWorld.idl \
    && g++ -Wall -DRTI_UNIX -DRTI_LINUX -DRTI_POSIX_THREADS -I. -I/rti_connext_micro.2.4.8/include -I/rti_connext_micro.2.4.8/include/rti_me *.cxx -L/rti_connext_micro.2.4.8/lib/i86Linux2.6gcc4.4.5/ -o HelloWorld_publisher -lrti_me_cppz -lrti_me_rhsmz -lrti_me_whsmz -lrti_me_discdpdez -lrti_mez -ldl -lpthread -lrt \
    && chmod +x HelloWorld_publisher \
    && mv HelloWorld_publisher /bin

# Copy the runtime Dockerfile into the context
COPY Dockerfile.run Dockerfile

# Export the Dockerfile and executable as a tar stream
CMD tar -cf - Dockerfile /bin

FROM alpine:3.3                                          (Dockerfile.run)

# Include Standard C++ Library
RUN apk add --update libstdc++ && rm -rf /var/cache/apk/*

# Add service and application
COPY /bin/HelloWorld_publisher /bin/HelloWorld_publisher
RUN chmod a+x /bin/HelloWorld_publisher

# Start publisher using multicast for discovery
CMD ["/bin/HelloWorld_publisher", "-peer", "239.255.0.1"] 
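
The build step runs rtiddsgen on HelloWorld.idl, which the post does not show. For reference, a minimal type definition consistent with the example's output might look like the following; the member name and string bound are assumptions, not the actual file shipped with the RTI example.

// HelloWorld.idl (illustrative sketch; the shipped example may differ)
struct HelloWorld {
    string<128> msg;  // the text printed as "Hello World (n)"
};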

The two steps are accomplished through a series of docker build and run commands.  

$ docker build --force-rm -t dds-builder -f Dockerfile.build .

$ docker run --rm dds-builder | docker build --force-rm -t dds-publisher -

The docker build using Dockerfile.build copies in the source code and builds the binary. The docker run command then creates a container from that image; when the container executes, it packages the binary into a tar stream along with the resources needed for the runtime image build. The final docker build uses Dockerfile.run and the piped-in tar stream to copy the files to the appropriate locations after installing the C++ standard library runtime.

The subscriber follows the exact same approach. Change into the subscriber directory and repeat the previous docker build and run commands, using the subscriber name rather than the publisher name when tagging the image.

$ docker build --force-rm -t dds-builder -f Dockerfile.build .

$ docker run --rm dds-builder | docker build --force-rm -t dds-subscriber -

The images have now been built, but before proceeding we should verify that by executing the docker images command. This command lists all the images available in your local registry; you should see the two images created in the previous steps.

$ docker images | grep dds
REPOSITORY        TAG                 IMAGE ID            CREATED             SIZE 
dds-subscriber    latest              0442ffc6ca02        2 minutes ago      8.098 MB 
dds-publisher     latest              15fa3c2ed441        4 minutes ago      8.084 MB

Running the Example

Now that we've built the images, we should take them for a test drive and ensure the example runs successfully. Open two terminal windows; we'll use one for the publisher and the other for the subscriber. In one of the windows, start the publisher using the docker run command.

$ docker run -t dds-publisher

If everything is successful, you should see the “Hello World” text followed by a number that increments with every published message.

Hello World (0) 
Hello World (1) 
Hello World (2) 
Hello World (3) 
Hello World (4) 
Hello World (5) 
Hello World (6) 
… 

With the publisher successfully running we can start the subscriber and see if the DDS messages are being received across the two containers over the Docker bridge network.

$ docker run -t dds-subscriber

The output should look similar to this, proving the subscriber is working:

Sample received     msg: Hello World (9)  
Sample received     msg: Hello World (10)  
Sample received     msg: Hello World (11)  
Sample received     msg: Hello World (12)  
Sample received     msg: Hello World (13)  
Sample received     msg: Hello World (14) 
… 

The count starts at the most recently published sample because the example's QoS has no durability or history settings, so a late-joining subscriber receives only new data. Both containers will continue to run until they are manually stopped or the container engine is brought down. Once the container learning curve is overcome, the rest is just DDS the way you've (hopefully) done it in the past.
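
Incidentally, if you want a late-joining subscriber to receive samples published before it started, both endpoints can request transient-local durability. Here is a minimal sketch of the writer side using the classic Connext C++ API; whether Micro 2.4.8 supports these exact QoS settings is something to verify against its documentation.

// Sketch: keep the last 10 samples and deliver them to late joiners.
// Classic Connext C++ API; Micro's supported QoS set may differ.
DDS_DataWriterQos dw_qos;
publisher->get_default_datawriter_qos(dw_qos);
dw_qos.durability.kind = DDS_TRANSIENT_LOCAL_DURABILITY_QOS;
dw_qos.history.kind = DDS_KEEP_LAST_HISTORY_QOS;
dw_qos.history.depth = 10;  // old samples retained per instance
DDSDataWriter *writer = publisher->create_datawriter(
    topic, dw_qos, NULL /* listener */, DDS_STATUS_MASK_NONE);
// A matching reader must also request TRANSIENT_LOCAL durability.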

Next Steps

Linux containers, especially Docker, are providing improvements across the DevOps cycle.   They provide a convenient packaging mechanism and promote a modular, microservice based architecture.  Using DDS as the data bus between container-based microservices provides an asynchronous publish/subscribe data bus for these services to communicate when traditional synchronous REST-based approaches are insufficient.   Together they make a solid choice for the Industrial Internet and Internet of Things software architecture.   Take the next step and start using RTI Connext DDS Micro with your container-based architecture today.

Special thanks to Katelyn Schoedl, Research Intern, GE Global Research and Joel Markham, Senior Research Engineer, GE Global Research for authoring this guest blog post – THANK YOU! 

Databus vs. Database: The 6 Questions Every IIoT Developer Needs to Ask

The Industrial Internet of Things (IIoT) is full of confusing terms.  That’s unavoidable; despite its reuse of familiar concepts in computing and systems, the IIoT is a fundamental change in the way things work.  Fundamental changes require fundamentally new concepts.  One of the most important is the concept of a “databus”.

The soon-to-be-released IIC reference architecture version 2 contains a new pattern called the “layered databus” pattern.  I can’t say much more now about the IIC release, but going through the documentation process has been great for driving crisp definitions.

The databus definition is:

A databus is a data-centric information-sharing technology that implements a virtual, global data space.  Software applications read and update entries in a global data space. Updates are shared between applications via a publish-subscribe communications mechanism.

Key characteristics of a databus are:

  1. the participants/applications directly interface with the data,
  2. the infrastructure understands, and can therefore selectively filter the data, and
  3. the infrastructure imposes rules and guarantees of Quality of Service (QoS) parameters such as rate, reliability, and security of data flow.

Of course, new concepts generate questions. Some of the best came from an architect at a large database company. We usually try to explain the databus concept from the perspective of a networking or software architect, but data science is perhaps a better approach; both databases and databuses are, after all, data science concepts.

Let’s look at the 6 most common questions.

Question 1: How is a databus different from a database (of any kind)?

Short answer: A database implements data-centric storage.  It saves old information that you can later search by relating properties of the stored data.  A databus implements data-centric interaction.  It manages future information by letting you filter by properties of the incoming data.

Long answer: Data centricity can be defined by these properties:

  • The interface is the data. There are no artificial wrappers or blockers to that interface like messages, or objects, or files, or access patterns.
  • The infrastructure understands that data. This enables filtering/searching, tools, & selectivity.  It decouples applications from the data and thereby removes much of the complexity from the applications.
  • The system manages the data and imposes rules on how applications exchange data. This provides a notion of “truth”.  It enables data lifetimes, data model matching, CRUD interfaces, etc.

A relational database is a data-centric storage technology. Before databases, storage systems were files with application-defined (ad hoc) structure.  A database is also a file, but it’s a very special file.  A database knows how to interpret the data and enforces access control.  A database thus defines “truth” for the system; data in the database can’t be corrupted or lost.

By enforcing simple rules that control the data model, databases ensure consistency.  By exposing the data to search and retrieval by all users, databases greatly ease system integration.  By allowing discovery of data and schema, databases also enable generic tools for monitoring, measuring, and mining information.

Like a database, data-centric middleware (a databus) understands the content of the transmitted data. The databus also sends messages, but it sends very special messages: only those needed to maintain state. Clear rules govern access to the data, how data in the system changes, and when participants get updates. Importantly, only the infrastructure sends messages. To the applications, the system looks like a controlled global data space. Applications interact directly with data and data “Quality of Service” (QoS) properties like age and rate. There is no application-level awareness or concept of “message”. Programs using a databus read and write data; they do not send and receive messages.

[Figure: Database vs. Databus]

A database replaces files with data-centric storage that finds the right old data through search. A databus replaces messages with data-centric connectivity that finds the right future data through filtering. Both technologies make system integration much easier, supporting much larger scale, better reliability, and application interoperability.

With knowledge of the structure and demands on data, the databus infrastructure can do things like filter information, selecting when or even if to do updates.  The infrastructure itself can control QoS like update rate, reliability, and guaranteed notification of peer liveliness.  The infrastructure can discover data flows and offer those to applications and generic tools alike.  This knowledge of data status, in a distributed system, is a crisp definition of “truth”.  As in databases, the infrastructure exposes the data, both structure and content, to other applications.  This accessible source of truth greatly eases system integration.  It also enables generic tools and services that monitor and view information flow, route messages, and manage caching.

Question 2: “Software applications read and update entries in a global data space. Updates are shared between applications via a publish-subscribe communications mechanism.”  Does that mean that this is a database that you interact with via a pub-sub interface?

Short answer: No, there is no database.  A database implies storage: the data physically resides somewhere.  A databus implements a purely virtual concept called a “global data space”.

Long answer: The databus data space defines how to interact with future information.  For instance, if “you” are an intersection controller, you can subscribe to updates of vehicles within 200m of your position.  Those updates will then be delivered to you, should a vehicle ever approach.  Delivery is guaranteed in many ways (start within .01 secs, updated 100x/sec, reliable, etc.).  Note that the data may never be stored at all.  (Although some QoS settings like reliability may require some local storage.)  You can think of a data space as a set of specially-controlled data objects that will be filled with information in the exact way you specify, although that information is not (in general) saved by the databus…it’s just delivered.

Question 3: “The participants/applications directly interface with the data.”  Could you elaborate on what that means?

With “message-centric” middleware, you write an application that sends data, wrapped in messages, to another application.  You may do that by having clients send data to servers, for instance.  Both ends need to know something about the other end, usually including things like the schema, but also likely assumed properties of the data like “it’s less than .01 seconds old”, or “it will come 100x/second”, or at least that there is another end alive, e.g. the server is running.  All these assumed properties are completely hidden in the application code, making reuse, system integration, and interoperability really hard.

With a databus, you don't need to know anything about the source applications. You make your data needs clear, and the databus delivers the data. Thus, with a databus, each application interacts only with the data space. As an application, you simply write to or read from the data space with a CRUD interface. Of course, you may require some QoS from that data space, e.g. you need your data updated 100x per second. The data space itself (the databus) will guarantee you get that data (or flag an error). You don't need to know whether there is one source of that data or 27 redundant ones, whether it comes over a network or shared memory, or whether it's a C program on Linux or a C# program on Windows. All interactions are with your own view of the data space. It also makes sense, for instance, to write data to a space with no recipients. In this case, the databus may do absolutely nothing, or it may cache information for later delivery, depending on your QoS settings.
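
To make that concrete, here is a minimal sketch of the write side in the classic Connext C++ API, reusing the HelloWorld type from the container example above. The entity setup (participant, topic, writer) is elided, and the generated names are assumptions.

#include <cstring>
#include "HelloWorld.h"         // generated by rtiddsgen (names assumed)
#include "HelloWorldSupport.h"

// Writing directly to the global data space: the application hands a
// typed value to the databus and never constructs or addresses a message.
void update_data_space(HelloWorldDataWriter *writer)
{
    HelloWorld *sample = HelloWorldTypeSupport::create_data();
    strncpy(sample->msg, "Hello World (0)", 128);
    DDS_ReturnCode_t retcode = writer->write(*sample, DDS_HANDLE_NIL);
    if (retcode != DDS_RETCODE_OK) {
        // handle the error (e.g., the requested QoS cannot be met)
    }
    HelloWorldTypeSupport::delete_data(sample);
}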

Note that both database and databus technologies replace the application-application interaction with application-data-application interaction.  This abstraction is absolutely critical.  It decouples applications and greatly eases scaling, interoperability, and system integration.  The difference is really one of old data stored in a (likely centralized) database, vs future data sent directly to the applications from a distributed data space.

Question 4: “The infrastructure understands, and can therefore selectively filter the data.” Isn’t that true of all pub-sub, where you can register for “events” of interest to you?

Most pub-sub is very primitive.  An application “registers interest”, and then everything is simply sent to that application.  So, for instance, an intersection collision detection algorithm could subscribe to “vehicle positions”.   The infrastructure then sends messages from any sensor capable of producing positions, with no knowledge of the data inside that message.  Even “content filtering” pub-sub offers only very simple specs and requires the system to pre-select what’s important for all.  There’s no real control of flow.

A databus is much more expressive.  That intersection could say “I am interested only in vehicle positions within 200m, moving at 10m/s towards me.  If a vehicle falls into my specs, I need to be updated 200 times a second.  You (the databus) need to guarantee me that all sensors feeding this algorithm promise to deliver data that fast…no slower or faster.  If a sensor updates 1000 times a second, then only send me every 5th update.  I also need to know that you actually are in touch with currently-live sensors (which I define as producing in the last 0.01secs) on all possible roadway approaches at all times.  Every sensor must be able to store 600 old samples (3 seconds worth), and update me with that old data if I need it.”   (These are a few of the 20+ QoS settings in the DDS standard.)

Note that a subscribing application in the primitive pub-sub case is very dependent on the actual properties of its producers.  It has to somehow trust that they are alive (!), that they have enough buffers to save the information it may need, that they won’t flood it with information nor provide it too slowly.  If there are 10,000 cars being sensed 1000x/sec, but only 3 within 200m, it will have to receive 10,000*1000 = 10m samples every second just to find the 3*200 = 600 it needs to pay attention to.  It will have to ping every single sensor 100x/second just to ensure it is active.  If there are redundant sensors on different paths, it has to ping them all independently and somehow make sure all paths are covered.  If there are many applications, they all have to ping all the sensors independently.  It also has to know the schema of the producers, etc.

The application in the second case will, by contrast, receive exactly the 600 samples it cares about, comfortable in the knowledge that at least one sensor for each path is active.  The rate of flow is guaranteed.  Sufficient reliability is guaranteed.  The total dataflow is reduced by 99.994% (we only need 600/10m samples, and smart middleware does filtering at the source).  For completeness, note that the collision algorithm is completely independent of the sensors themselves.  It can be reused on any other intersection, and it will work with one sensor per path or 17.  If during runtime, the network gets too loaded to meet the data specs (or something fails), the application will be immediately notified.
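
Much of that specification maps directly onto DDS constructs such as content-filtered topics and data-reader QoS. The sketch below uses the classic Connext C++ API with an assumed Vehicle type and field names; it is illustrative only, and Connext DDS Micro in particular supports a reduced subset of these features.

// Subscribe only to nearby, approaching vehicles, at most 200 updates/sec.
// Classic Connext C++ API; topic, type, and field names are assumptions.
DDS_StringSeq no_params;  // this filter uses no runtime parameters
DDSContentFilteredTopic *nearby = participant->create_contentfilteredtopic(
    "NearbyVehicles",
    vehicle_topic,                    // DDSTopic* for the Vehicle type
    "distance < 200 AND speed > 10",  // SQL-like filter expression
    no_params);

DDS_DataReaderQos dr_qos;
subscriber->get_default_datareader_qos(dr_qos);
// Expect a matching update at least every 5 ms from producers...
dr_qos.deadline.period.sec = 0;
dr_qos.deadline.period.nanosec = 5 * 1000 * 1000;
// ...but deliver updates to this reader no faster than every 5 ms.
dr_qos.time_based_filter.minimum_separation.sec = 0;
dr_qos.time_based_filter.minimum_separation.nanosec = 5 * 1000 * 1000;
DDSDataReader *reader = subscriber->create_datareader(
    nearby, dr_qos, NULL /* listener */, DDS_STATUS_MASK_NONE);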

Question 5: How does a databus differ from a CEP engine?

Short answer: a databus is a fundamentally distributed concept that selects and delivers data from local producers that match a simple specification.  A CEP engine is a centralized executable service that is capable of much more complex specifications, but must have all streams of data sent to one place.

Long answer: A Complex Event Processing (CEP) engine examines an incoming stream of data, looking for patterns you program it to identify.  When it finds one of those patterns, you can program it to take action. The patterns can be complex combinations of past and incoming future data.  However, it is a single service, running on a single CPU somewhere.  It transmits no information.

A databus also looks for patterns of data.  However, the specifications are simpler; it makes decisions about each data item as it’s produced.  The actions are also simpler; the only action it may take is to send that data to a requestor.  The power of a databus is that it is fundamentally distributed.  The looking happens locally on potentially hundreds, thousands, or even millions of nodes.  Thus, the databus is a very powerful way to select the right data from the right sources and send them to the right places.  A databus is sort of like a distributed set of CEP engines, one for every possible source of information, that are automatically programmed by the users of that information.  Of course, the databus has many other properties beyond pattern matching, such as schema mediation, redundancy management, transport support, an interoperable protocol, etc.

Question 6: What application drove the DDS standard and databuses?

The early applications were in intelligent robots, “information superiority”, and large coordinated systems like navy combat management.  These systems needed reliability even when components fail, data fast enough to control physical processes, and selective discovery and delivery to scale.  Data centricity really simplified application code and controlled interfaces, letting teams of programmers work on large software systems over time.  The DDS standard is an active, growing family of standards that was originally driven by both vendors and customers.  It has significant use across many verticals, including medical, transportation, smart cities, and energy.

If you'd like to learn about how intelligent software is sweeping the IIoT, be sure to download our whitepaper on the future of the automotive industry, "The Secret Sauce of Autonomous Cars".

More Reasons to Love Eddy

If you follow RTI blogs, you may remember that Eddy was our project code name for Connext DDS Professional version 5.2.0, and how much we loved Eddy when it was released last summer. Over the cold winter and spring we put a great deal of effort into making Eddy even better. Now Eddy has matured into version 5.2.3, which we are announcing this week!

One of the reasons we loved 5.2.0 was its support for unbounded sequences and strings. This feature enables our customers to efficiently manage memory when dealing with samples containing sequence or string members whose maximum size is unknown or quite large. A good example of a system that can benefit from this feature is video surveillance, where a developer may not know in advance the maximum size of the video frames sent on the wire. When we released Eddy, the feature was available for C++, C and .NET developers. In version 5.2.3, we added Java support for this feature. If you code in Java, check it out. You can learn more about unbounded sequences and strings in this Eddy blog.
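
To illustrate, here is a hypothetical IDL type using unbounded members; it is not taken from the product examples. Note that rtiddsgen requires its unbounded support option (-unboundedSupport in this release) to generate code for such types.

// Hypothetical IDL: no maximum sizes declared, so the middleware
// manages memory for these members dynamically at runtime.
struct VideoFrame {
    long frame_id;
    sequence<octet> payload;  // unbounded sequence of bytes
    string description;       // unbounded string
};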

Another reason we loved 5.2.0 was its ability to serialize/deserialize samples into/from a buffer. Applications can use this feature for many needs; for example, customers can save the serialized data in a database, on disk, or in memory and access it offline to perform data analytics. In version 5.2.3, we added support for this feature to DynamicData in Java and .NET through the following APIs (a C++ sketch of the pattern follows the list):

  • DynamicData.to_cdr_buffer
  • DynamicData.from_cdr_buffer
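
For compiled C++ types, analogous TypeSupport calls have been available since 5.2.0. Below is a rough sketch of the pattern in the classic C++ API; the NULL-buffer sizing idiom and the exact signatures should be verified against the API reference for your release.

// Serialize a sample to a CDR buffer and restore it later (classic
// Connext C++ typed API; verify signatures for your release).
HelloWorld *sample = HelloWorldTypeSupport::create_data();
unsigned int length = 0;
// First call with a NULL buffer to learn the required size.
HelloWorldTypeSupport::serialize_data_to_cdr_buffer(NULL, length, sample);
char *buffer = new char[length];
HelloWorldTypeSupport::serialize_data_to_cdr_buffer(buffer, length, sample);
// ... persist 'buffer' in a database, on disk, or in memory ...
HelloWorld *restored = HelloWorldTypeSupport::create_data();
HelloWorldTypeSupport::deserialize_data_from_cdr_buffer(restored, buffer, length);
delete [] buffer;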

Also, don't forget to check out Connext Tools. With version 5.2.3, you can now replay multiple files as easily as you recorded them, with simple configuration and no extra steps. If you are new to Connext Tools, take a look at this video explaining our data visualization feature in detail, with great tips on leveraging it for debugging and developing your applications.

You can never have too much security! Version 5.2.3 continues the mission of securing our customers' systems with the latest and greatest OpenSSL. Well, almost the latest: just a few days before we released 5.2.3, which supports OpenSSL 1.0.2g, OpenSSL announced version 1.0.2h.

More good news: The 5.2.3 release is even more “supportive” than 5.2.0. It adds support for the latest Visual Studio version (VS2015) and the latest Mac OS X (version 10.11, El Capitan).

Lastly, we have special news for mobile developers. The 5.2.3 release introduces Connext DDS to one more mobile operating system in addition to Android. Yes, iOS! Version 5.2.3 supports iOS 8.2. We look forward to your innovative iOS DDS applications soon!

So, if you loved Eddy, I am sure you will love our latest and greatest 5.2.3 release of RTI Connext DDS even more. Download the free trial and give it a try today!