Databus vs. Database: The 6 Questions Every IIoT Developer Needs to Ask


The Industrial Internet of Things (IIoT) is full of confusing terms.  That’s unavoidable; despite its reuse of familiar concepts in computing and systems, the IIoT is a fundamental change in the way things work.  Fundamental changes require fundamentally new concepts.  One of the most important is the concept of a “databus”.

The soon-to-be-released IIC reference architecture version 2 contains a new pattern called the “layered databus” pattern.  I can’t say much more now about the IIC release, but going through the documentation process has been great for driving crisp definitions.

The databus definition is:

A databus is a data-centric information-sharing technology that implements a virtual, global data space.  Software applications read and update entries in a global data space. Updates are shared between applications via a publish-subscribe communications mechanism.

Key characteristics of a databus are:

  1. the participants/applications directly interface with the data,
  2. the infrastructure understands, and can therefore selectively filter the data, and
  3. the infrastructure imposes rules and guarantees of Quality of Service (QoS) parameters such as rate, reliability, and security of data flow.

Of course, new concepts generate questions.  Some of the best questions came from an architect from a large database company.  We usually try to explain the databus concept from the perspective of a networking or software architect.  But, data science is perhaps a better approach.  Both databases and databuses are, after all, data science concepts.

Let’s look at the 6 most common questions.

Question 1: How is a databus different from a database (of any kind)?

Short answer: A database implements data-centric storage.  It saves old information that you can later search by relating properties of the stored data.  A databus implements data-centric interaction.  It manages future information by letting you filter by properties of the incoming data.

Long answer: Data centricity can be defined by these properties:

  • The interface is the data. There are no artificial wrappers or blockers to that interface like messages, or objects, or files, or access patterns.
  • The infrastructure understands that data. This enables filtering/searching, tools, & selectivity.  It decouples applications from the data and thereby removes much of the complexity from the applications.
  • The system manages the data and imposes rules on how applications exchange data. This provides a notion of “truth”.  It enables data lifetimes, data model matching, CRUD interfaces, etc.

A relational database is a data-centric storage technology. Before databases, storage systems were files with application-defined (ad hoc) structure.  A database is also a file, but it’s a very special file.  A database knows how to interpret the data and enforces access control.  A database thus defines “truth” for the system; data in the database can’t be corrupted or lost.

By enforcing simple rules that control the data model, databases ensure consistency.  By exposing the data to search and retrieval by all users, databases greatly ease system integration.  By allowing discovery of data and schema, databases also enable generic tools for monitoring, measuring, and mining information.

Like a database, data-centric middleware (a databus) understands the content of the transmitted data.  The databus also sends messages, but it sends very special messages.  It sends only messages specifically needed to maintain state.  Clear rules govern access to the data, how data in the system changes, and when participants get updates.  Importantly, only the infrastructure sends messages.  To the applications, the system looks like a controlled global data space.  Applications interact directly with data and data “Quality of Service” (QoS) properties like age and rate.  There is no application-level awareness or concept of “message”.  Programs using a databus read and write data; they do not send and receive messages.

Database vs Databus

A database replaces files with data-centric storage that finds the right old data through search. A databus replaces messages with data-centric connectivity that finds the right future data through filtering. Both technologies make system integration much easier, supporting much larger scale, better reliability, and application interoperability.

With knowledge of the structure and demands on data, the databus infrastructure can do things like filter information, selecting when or even if to do updates.  The infrastructure itself can control QoS like update rate, reliability, and guaranteed notification of peer liveliness.  The infrastructure can discover data flows and offer those to applications and generic tools alike.  This knowledge of data status, in a distributed system, is a crisp definition of “truth”.  As in databases, the infrastructure exposes the data, both structure and content, to other applications.  This accessible source of truth greatly eases system integration.  It also enables generic tools and services that monitor and view information flow, route messages, and manage caching.

Question 2: “Software applications read and update entries in a global data space. Updates are shared between applications via a publish-subscribe communications mechanism.”  Does that mean that this is a database that you interact with via a pub-sub interface?

Short answer: No, there is no database.  A database implies storage: the data physically resides somewhere.  A databus implements a purely virtual concept called a “global data space”.

Long answer: The databus data space defines how to interact with future information.  For instance, if “you” are an intersection controller, you can subscribe to updates of vehicles within 200m of your position.  Those updates will then be delivered to you, should a vehicle ever approach.  Delivery is guaranteed in many ways (start within 0.01 seconds, updated 100x/sec, reliable, etc.).  Note that the data may never be stored at all.  (Although some QoS settings like reliability may require some local storage.)  You can think of a data space as a set of specially-controlled data objects that will be filled with information in the exact way you specify, although that information is not (in general) saved by the databus…it’s just delivered.
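The intersection example above can be sketched in a few lines of Python. This is a toy model, not the DDS API; the class and field names here (`DataSpaceSubscription`, `VehicleUpdate`, `offer`) are invented for illustration. The point it shows is the key one: the filter runs in the infrastructure, so the application only ever sees matching data.

```python
from dataclasses import dataclass

@dataclass
class VehicleUpdate:
    vehicle_id: str
    distance_m: float   # distance from the intersection
    speed_mps: float

class DataSpaceSubscription:
    """Toy model of a content-filtered subscription: the filter is
    evaluated by the infrastructure, not by the application."""
    def __init__(self, filter_fn, on_update):
        self.filter_fn = filter_fn
        self.on_update = on_update

    def offer(self, sample):
        # The databus evaluates the filter; the application never
        # sees samples that do not match its subscription.
        if self.filter_fn(sample):
            self.on_update(sample)

received = []
sub = DataSpaceSubscription(
    filter_fn=lambda v: v.distance_m < 200.0,   # "vehicles within 200m"
    on_update=received.append,
)
sub.offer(VehicleUpdate("car-1", 150.0, 12.0))  # matches: delivered
sub.offer(VehicleUpdate("car-2", 950.0, 30.0))  # filtered out upstream
```

A real databus adds the QoS side of the contract (start within 0.01 seconds, 100x/sec update rate, reliability) on top of this content filter.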

Question 3: “The participants/applications directly interface with the data.”  Could you elaborate on what that means?

With “message-centric” middleware, you write an application that sends data, wrapped in messages, to another application.  You may do that by having clients send data to servers, for instance.  Both ends need to know something about the other end, usually including things like the schema, but also likely assumed properties of the data like “it’s less than 0.01 seconds old”, or “it will come 100x/second”, or at least that there is another end alive, e.g. the server is running.  All these assumed properties are completely hidden in the application code, making reuse, system integration, and interoperability really hard.

With a databus, you don’t need to know anything about the source applications.  You make clear your data needs, and then the databus delivers it.  Thus, with a databus, each application interacts only with the data space.  As an application, you simply write to the data space or read from the data space with a CRUD interface.  Of course, you may require some QoS from that data space, e.g. you need your data updated 100x per second.  The data space itself (the databus) will guarantee you get that data (or flag an error).  You don’t need to know whether there is one source of that data or 27 redundant ones, or if it comes over a network or shared memory, or if it’s a C program on Linux or a C# program on Windows.  All interactions are with your own view of the data space.  It also makes sense, for instance, to write data to a space with no recipients.  In this case, the databus may do absolutely nothing, or it may cache information for later delivery, depending on your QoS settings.
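The CRUD-style interaction described above can be sketched as a minimal data space in Python. Again, this is a hypothetical toy (the names `DataSpace`, `write`, `read`, `dispose`, `subscribe` are invented, not DDS calls); it shows only the shape of the interface: applications create, read, update, and delete keyed data, and never address each other directly.

```python
class DataSpace:
    """Minimal sketch of a virtual data space: writers update keyed
    instances; readers see the latest value without knowing who wrote it."""
    def __init__(self):
        self._instances = {}
        self._readers = []

    def write(self, key, value):
        self._instances[key] = value        # create or update
        for reader in self._readers:
            reader(key, value)              # push the update to subscribers

    def read(self, key):
        return self._instances.get(key)     # latest value, if any

    def dispose(self, key):
        self._instances.pop(key, None)      # delete

    def subscribe(self, callback):
        self._readers.append(callback)

space = DataSpace()
seen = []
space.subscribe(lambda k, v: seen.append((k, v)))
space.write("sensor/temp", 21.5)    # writer could be C on Linux or C# on Windows
latest = space.read("sensor/temp")  # reader neither knows nor cares
```

What a real databus adds on top of this sketch is the QoS contract: update rates, reliability, liveliness, and the error flags raised when a contract cannot be met.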

Note that both database and databus technologies replace the application-application interaction with application-data-application interaction.  This abstraction is absolutely critical.  It decouples applications and greatly eases scaling, interoperability, and system integration.  The difference is really one of old data stored in a (likely centralized) database, vs future data sent directly to the applications from a distributed data space.

Question 4: “The infrastructure understands, and can therefore selectively filter the data.” Isn’t that true of all pub-sub, where you can register for “events” of interest to you?

Most pub-sub is very primitive.  An application “registers interest”, and then everything is simply sent to that application.  So, for instance, an intersection collision detection algorithm could subscribe to “vehicle positions”.   The infrastructure then sends messages from any sensor capable of producing positions, with no knowledge of the data inside that message.  Even “content filtering” pub-sub offers only very simple specs and requires the system to pre-select what’s important for all.  There’s no real control of flow.

A databus is much more expressive.  That intersection could say “I am interested only in vehicle positions within 200m, moving at 10m/s towards me.  If a vehicle falls into my specs, I need to be updated 200 times a second.  You (the databus) need to guarantee me that all sensors feeding this algorithm promise to deliver data that fast…no slower or faster.  If a sensor updates 1000 times a second, then only send me every 5th update.  I also need to know that you actually are in touch with currently-live sensors (which I define as producing in the last 0.01 seconds) on all possible roadway approaches at all times.  Every sensor must be able to store 600 old samples (3 seconds worth), and update me with that old data if I need it.”   (These are a few of the 20+ QoS settings in the DDS standard.)
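The “every 5th update” part of that contract is just a time-based filter applied at the source. A rough Python sketch of the arithmetic (the function name `downsample` is invented for illustration, not a DDS call):

```python
def downsample(samples, producer_hz, requested_hz):
    """Keep every Nth sample so that a producer_hz stream is delivered
    at requested_hz -- the filtering a databus applies at the source."""
    step = producer_hz // requested_hz   # 1000 // 200 == 5
    return samples[::step]

one_second = list(range(1000))                    # 1000 updates/second from one sensor
delivered = downsample(one_second, 1000, 200)     # subscriber asked for 200x/second
```

Here `delivered` holds 200 of the 1000 samples, matching the subscriber's requested rate; the other 800 never cross the network at all.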

Note that a subscribing application in the primitive pub-sub case is very dependent on the actual properties of its producers.  It has to somehow trust that they are alive (!), that they have enough buffers to save the information it may need, that they won’t flood it with information nor provide it too slowly.  If there are 10,000 cars being sensed 1000x/sec, but only 3 within 200m, it will have to receive 10,000 * 1,000 = 10 million samples every second just to find the 3 * 200 = 600 it needs to pay attention to.  It will have to ping every single sensor 100x/second just to ensure it is active.  If there are redundant sensors on different paths, it has to ping them all independently and somehow make sure all paths are covered.  If there are many applications, they all have to ping all the sensors independently.  It also has to know the schema of the producers, etc.

The application in the second case will, by contrast, receive exactly the 600 samples it cares about, comfortable in the knowledge that at least one sensor for each path is active.  The rate of flow is guaranteed.  Sufficient reliability is guaranteed.  The total dataflow is reduced by 99.994% (we need only 600 of the 10 million samples, and smart middleware does filtering at the source).  For completeness, note that the collision algorithm is completely independent of the sensors themselves.  It can be reused on any other intersection, and it will work with one sensor per path or 17.  If during runtime, the network gets too loaded to meet the data specs (or something fails), the application will be immediately notified.
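The 99.994% figure above follows directly from the numbers in the example; here is the arithmetic, spelled out:

```python
cars, sensor_hz = 10_000, 1000          # 10,000 cars sensed 1000x/sec
relevant_cars, requested_hz = 3, 200    # 3 cars within 200m, wanted at 200x/sec

unfiltered = cars * sensor_hz                   # 10,000,000 samples/sec, primitive pub-sub
filtered = relevant_cars * requested_hz         # 600 samples/sec, databus with source filtering
reduction = 100 * (1 - filtered / unfiltered)   # 99.994% less data on the wire
```

Because the middleware filters at the source, the 9,999,400 irrelevant samples per second are never transmitted in the first place.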

Question 5: How does a databus differ from a CEP engine?

Short answer: A databus is a fundamentally distributed concept that selects and delivers data from local producers that match a simple specification.  A CEP engine is a centralized executable service that is capable of much more complex specifications, but must have all streams of data sent to one place.

Long answer: A Complex Event Processing (CEP) engine examines an incoming stream of data, looking for patterns you program it to identify.  When it finds one of those patterns, you can program it to take action. The patterns can be complex combinations of past and incoming future data.  However, it is a single service, running on a single CPU somewhere.  It transmits no information.

A databus also looks for patterns of data.  However, the specifications are simpler; it makes decisions about each data item as it’s produced.  The actions are also simpler; the only action it may take is to send that data to a requestor.  The power of a databus is that it is fundamentally distributed.  The looking happens locally on potentially hundreds, thousands, or even millions of nodes.  Thus, the databus is a very powerful way to select the right data from the right sources and send them to the right places.  A databus is sort of like a distributed set of CEP engines, one for every possible source of information, that are automatically programmed by the users of that information.  Of course, the databus has many other properties beyond pattern matching, such as schema mediation, redundancy management, transport support, an interoperable protocol, etc.
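The “distributed set of CEP engines” picture can be sketched in Python. This is a toy model with invented names (`Source`, `produce`, `filters`), not a real DDS or CEP API; it shows the key structural difference: each producer node evaluates the subscriber's predicate locally, so only matching samples ever cross the network.

```python
class Source:
    """Toy producer node: runs each subscriber's filter locally,
    like a tiny per-source rule engine programmed by the consumers."""
    def __init__(self):
        self.filters = []   # (predicate, deliver) pairs installed by subscribers

    def produce(self, sample):
        for predicate, deliver in self.filters:
            if predicate(sample):
                deliver(sample)   # only matching data is transmitted

sources = [Source() for _ in range(1000)]   # 1000 distributed producers
inbox = []
for s in sources:
    # The databus "programs" every source with the subscriber's spec.
    s.filters.append((lambda x: x < 200, inbox.append))

for i, s in enumerate(sources):
    s.produce(i)   # each node decides locally whether to send
```

Only the 200 matching samples reach the subscriber's inbox; a centralized CEP engine would instead need all 1000 streams delivered to one place before it could evaluate anything.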

Question 6: What application drove the DDS standard and databuses?

The early applications were in intelligent robots, “information superiority”, and large coordinated systems like navy combat management.  These systems needed reliability even when components fail, data fast enough to control physical processes, and selective discovery and delivery to scale.  Data centricity really simplified application code and controlled interfaces, letting teams of programmers work on large software systems over time.  The DDS standard is an active, growing family of standards that was originally driven by both vendors and customers.  It has significant use across many verticals, including medical, transportation, smart cities, and energy.

If you’d like to learn about how intelligent software is sweeping the IIoT, be sure to download our whitepaper on the future of the automotive industry, “The Secret Sauce of Autonomous Cars”.

How OPC UA and DDS Joined Forces


It all started, appropriately, at National Instruments’ annual show called NIWeek in Austin, Texas. There, Thomas Burke, President & Executive Director at the OPC Foundation, approached me and asked if I was interested in helping build a partnership between the two most important connectivity solutions in the IIoT. Because of RTI’s leadership at the IIC and within DDS, we were well placed to lead.

That was the start of a great journey.

It was easy to agree to Thomas’s proposal. Both communities were struggling with how to differentiate our core value propositions. As everyone now knows, in practice, OPC UA and DDS solve very different problems. They focus on different industries. Even in the same application, they solve different use cases.

Nonetheless, the world thought we were at war. Why?  You need to understand the confusion of a new, very hot market. The Internet changed banking, retail, and travel agencies.  It created huge new companies and ended many others.  But, it didn’t touch most industrial applications.  Factories, plants, hospitals, and power systems operate today the same way they did 20 years ago.

Suddenly that is changing.  Gartner, the analyst firm, predicts that the “smart machine era” will be the most disruptive in the history of IT.  The CEO of General Electric famously said if you go to bed an industrial manufacturer, you will wake up a software and analytics company.  The modernization of the industrial landscape—the “Industrial Internet of Things” (IIoT)—will impact virtually every industry on the planet.

Mega trends that sweep through huge swaths of the economy like that always cause a lot of stress.

In this case, the stress was a perceived clash of industry alliances. The German industrial leadership has been developing a new architecture for manufacturing called Industrie 4.0. The German government invested over a billion Euros in Industrie 4.0 over most of a decade. Then, in 2014, five large US companies founded the Industrial Internet Consortium (IIC). The IIC struck a nerve in the market, and quickly grew to include hundreds of companies. Since both the IIC and Industrie 4.0 are working on “industrial systems” architecture, people assume they compete. A challenging reporter wrote an article on the implications for world dominance, and a conflict was born.

Then, that same reporter posted an opinion that the conflict was really technical, rather than political, and the most important technical conflict was between OPC UA and DDS. Suddenly, both communities were embroiled in controversy that made no real sense.

The rest, as they say, is history. Today, the IIC and Industrie 4.0 announced their cooperation. Their plan is to seek ways to combine Industrie 4.0’s depth in manufacturing with the IIC’s breadth across industries. The core technologies have similar strengths and similar goals.

Our path had its rocky stretches, but we are making great progress. We are working on mapping the architectures. The OMG has an official standards effort to define an OPC UA/DDS bridge. The OPC Foundation is building a “DDS Profile” for OPC UA pubsub. And, the IIC is creating joint testbeds that will prove the integration. We are, together, building the IIoT’s future.

The positioning document and press release going out today are the result of many people’s work. They combine input from the major DDS and OPC UA vendors, from the IIC and Industrie 4.0, and from the OMG and OPC Foundation standards organizations. I would like to particularly thank those most involved: Thomas Burke and Stefan Hoppe from OPCF, Matthias Damm from Unified Automation, and RTI’s Gerardo Pardo-Castellote. Coordinating all these organizations to make any joint statement would be impressive on its own. But, somehow, we managed the deep cooperation required to clarify the markets and design a technical integration. That’s because we all realized how important it is to build a standard, interoperable design that covers the IIoT. By coordinating our political leadership with the leading technologies, we will build, together, the future of the IIoT.

Data Connectivity in the Industrial Internet Reference Architecture

Today, the Industrial Internet Consortium (IIC) released the Industrial Internet Reference Architecture (IIRA). The IIC is the largest of the Internet of Things (IoT) consortia, with over 170 members. More importantly, it’s the only one focused on industrial systems. The first public release of the IIRA is a formal overview of the systems architecture from a high-level perspective. It covers everything from business goals to system interoperability. The architecture establishes many key technical guidelines. Critically, it also eliminates many approaches; an architecture is as much about what you can’t do as what you can do.

We at Real-Time Innovations (RTI) are most excited by one key aspect: the IIRA connectivity architecture. “Connectivity”, or how things communicate, is one of the biggest challenges for the emerging Industrial Internet of Things (IIoT). The IIRA takes an innovative, distributed “databus” approach that eases interoperability while providing top performance, reliability, and security.

The Power of Common Architecture

Ultimately, the IIoT is about building distributed systems.  Connecting all the parts intelligently so the system can perform, scale, evolve, and function optimally is the crux of the IIRA.

To enable the IIoT, we need to develop a common architecture that can span computing capability, interoperate across vendors, and bridge industries. Over time, common technologies that span industries always replace bespoke systems. However, incremental adoption and adapting current technology are also crucial. The IIoT must therefore integrate many standards and connectivity technologies. The IIC architecture explicitly blends the various connectivity technologies into an interconnected future that can enable the sweeping vision of a hugely connected new world.

This is the “interoperability” problem, and it’s really RTI’s specialty. RTI participates in 15 different standards and consortia efforts. They span many industries: naval systems, avionics, power, medical devices, unmanned vehicles, consumer electronics, industrial control, and broadcast television, to name just a few. All focus on how to get systems to work together. The IIC draws on experience from these industries and more.

The Integration Challenge

When you connect many different systems, the fundamental problem is the “N squared” interconnect issue. Connecting two systems requires matching many aspects, including protocol, data model, communication pattern, and quality of service (QoS) parameters like reliability, data rate, or timing deadlines. While connecting two systems is a challenge, it is solvable with a special-purpose “bridge”. But it doesn’t scale; connecting N systems together requires on the order of N-squared bridges. As N gets large, this becomes daunting.
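The arithmetic behind the “N squared” problem is easy to make concrete. Pairwise bridging needs a bridge for every pair of systems, N(N-1)/2 of them, while the core-standard approach described later needs only one gateway per connectivity standard:

```python
def pairwise_bridges(n):
    # A dedicated, special-purpose bridge for every pair of systems.
    return n * (n - 1) // 2

def core_gateways(n):
    # One gateway per connectivity standard onto a common core.
    return n

# 3 systems: 3 bridges; 10 systems: 45; 50 systems: 1,225 bridges
# versus just 3, 10, and 50 gateways respectively.
comparison = {n: (pairwise_bridges(n), core_gateways(n)) for n in (3, 10, 50)}
```

The gap grows quadratically, which is why pairwise bridging works for a handful of systems but becomes daunting at IIoT scale.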

One way to ease this problem is to keep N small. You can do that by dictating all standards and technologies across all systems that interoperate. Many industry-specific standards bodies successfully take this path. For instance, the European Generic Vehicle Architecture (GVA) specifies every aspect of how to build military ground vehicles, from low-level connectors to top-level data models. The German Industrie 4.0 effort takes a similar pass at the manufacturing industry, making choices for ordering and delivery, factory design, technology, and product planning. Only one standard per task is allowed.

This approach eases interoperation. Unfortunately, the result is limited in scope because the rigidly-chosen standards cannot provide all functions and features. There are simply too many special requirements to effectively cross industries this way. Dictating standards also doesn’t address the legacy integration problem. These two restrictions (scope and legacy limits) make this approach unsuited to building a wide-ranging, cross-industry Industrial Internet.

On the other end of the spectrum, you can build a very general bridge point. Enterprise web services work this way, using an “Enterprise Service Bus” (ESB) like Apache Camel. However, despite the “bus” in its name, an ESB is not a distributed concept. All systems must connect to a single point, where each incoming standard is mapped to a common object format. Because everything maps to one format, the ESB requires only “one-way” translation, avoiding the N-squared problem. Camel, for instance, supports hundreds of adapters that each convert one protocol or data source.

Unfortunately, this doesn’t work well for demanding industrial systems. The single ESB service is an obvious choke and failure point. ESBs are large, slow programs. In the enterprise, ESBs connect large-grained systems executing only a few transactions per second. Industrial applications need much faster, reliable, smaller-grained service. So, ESBs are not viable for most IIoT uses.

The IIRA Connectivity Core Standard

The IIRA takes an intermediate approach. The design introduces the concept of a “Connectivity Core Standard”. Unlike an ESB, the core standard is very much a distributed concept. Some endpoints can connect directly to the core standard. Other endpoints and subsystems connect through “gateways”. The core standard then connects them all together. This allows multiple protocols without having to bridge between all possible pairs. Each needs only one bridge to the core.

Like an ESB, this solves the N-squared problem. But, unlike an ESB, it provides a fast, distributed core, replacing the centralized service model. Legacy and less-capable connectivity technologies transform through a gateway to the core standard. There are only N transformations, where N is the number of connectivity standards.


The IIRA connectivity architecture specifies a quality-of-service controlled, secure “core connectivity standard”. All other connectivity standards must only bridge to this one core standard.

Obviously, this design requires a very functional core connectivity standard. Some systems may get by with slow or simple cores. But, most industrial systems need to identify, describe, find, and communicate a lot of data with demands unseen in other contexts. Many applications need delivery in microseconds or the ability to scale to thousands or even millions of data values and nodes. The consequences of a reliability failure can be severe. Since the core standard really is the core of the system, it has to perform.

The IIRA specifies the key functions that the connectivity framework and its core standard should provide: data discovery, exchange patterns, and “quality of service” (QoS). QoS parameters include delivery reliability, ordering, durability, lifespan, and fault tolerance functions. With these capabilities, the core connectivity can implement the reliable, high-speed, secure transport required by demanding applications across industries.

The IIRA outlines several data quality of service capabilities for the connectivity core standard. These ensure efficient, reliable, secure operation for critical infrastructure.

Security is also critical. To make security work correctly, it must be intimately married to the architecture. For instance, the “core” standard may support various patterns and delivery capabilities. The security design must match those exactly. For example, if the connectivity supports publish/subscribe, so must security. If the core supports multicast, so must security. If the core supports dynamic plug-n-play discovery, so must security. Security that is this intimately married to the architecture can be imposed at any time without changing the code. Security becomes just another controlled quality of service, albeit one with more complex configuration. This is a very powerful concept.

The integrated security must extend beyond the core. The IIRA allows for that too; all other connectivity technologies can be secured at the gateways.

DDS as a Core Standard

The IIRA does not specify standards; the IIC will take that step in the next release. However, it’s clear that the DDS (Data Distribution Service) standard is a great fit to the IIRA. DDS provides automated discovery, each of the patterns specified in the IIRA, all the QoS settings, and intimately integrated security.

This is no accident. The IIRA connectivity design draws heavily on industry experience with DDS. DDS has thousands of successful applications in power systems (huge hydropower dams, wind farms, microgrids), medicine (imaging, patient monitoring, emergency medical systems), transportation (air traffic control, vehicle control, automotive testing), industrial control (SCADA, mining systems, PLC communications), and defense (ships, avionics, autonomous vehicles). The lessons learned in these applications were instrumental in the design of the IIRA.

Thank you!

Finally, I would like to close by thanking the teams that built the IIRA. This was a large effort supported by many companies. RTI was most involved on the architecture, connectivity, and distributed data & interoperability teams. Thank you all, and congratulations on your first release.