RTI Goes Parallel

Microprocessors, throughout their history, have tenaciously delivered on their promise of doing more with less: less time, less space and now less power. The ways to do that, however, have changed profoundly in the last decade or so. Modern processors are not faster but fatter and denser. Chances are you are reading this post on a dual-core processor if you’re using a handheld, or on half a dozen cores if you are on a desktop. The web server that is serving this content probably has a core count in the high teens, if not more. Soon, hundreds of cores will cease to be a novelty. Welcome, many-cores!

The most visible upshot of the many-core revolution is that tomorrow’s distributed systems will be smaller and will run cooler. Computer systems that use large rooms and industrial chillers today will fit on a small office desk and need maybe a retail window air-conditioning unit. At the heart of this new breed of computer systems are multi- or many-core processors. Leading chip manufacturers are packing more cores on a silicon die than ever before, turning good old microprocessors into a system-on-a-chip and, here is the key part, doing it for EVERYBODY!

Every once in a while people rediscover what Uncle Ben said long ago: “With great power comes great responsibility.” It could not be more true in this day and age of many-core computers. The onus of unleashing the massively concurrent computing power of the emerging hardware platforms lies almost exclusively in the hands of us, the programmers. Unfortunately, as many [1] [2] [3] have observed, we’re not able to keep up. The gap between software and hardware technology is wider than ever. Multi-threading is notoriously hard, and most programmers lack mechanical sympathy: the intuitive understanding of how a program runs on a machine. Think cache misses!

So what does all this mean for data-centric messaging? Is your colossal computing power, from edge to enterprise, churning through digits without melting the polar ice caps? Are you, or your team of programmers, ready to extract the work of dozens of processors from one? And, let me ask, within budget and on time? RTI has set sail to help you do just that.

RTI Research has secured Phase I STTR research funding from the Air Force Research Laboratory to develop Scalable Communication and Scheduling Techniques for Many-core Systems. The research will be a collaboration between RTI and the University of North Carolina (UNC) Real-Time Systems Group. We are very excited to begin research with Prof. James Anderson, an IEEE Fellow.

With this research, we’re planning to develop a new infrastructure to help application developers design software that scales up in performance as the number of cores increases. To do this, application developers will need to shift their mindset and become developers of “distributed applications” that run on many cores. RTI Connext can provide the fast message-passing between the cores. However, a lot more has to happen before we realize this vision. We will develop three key technologies:

  • High-performance messaging: enhance and tune RTI Connext specifically to optimize performance on multi- and many-core machines. The goal is effective concurrency of the middleware, both for hyper-fast data exchange between cores and for efficient use of multiple cores to send and receive large amounts of data off-board.
  • Smart resource allocation and scheduling algorithms for many-core computers. These will ensure little or no computing power is wasted. We’ll leverage the pivotal research work done at the UNC Real-Time Systems Group.
  • A component framework for developing scalable many-core applications, which will package the new software technology in easy-to-use interfaces and run-times. We want to help you build distributed systems that are robust, simpler to develop, and also ship on time and within budget.

While we’re at it, we also want to investigate the role of data-centric messaging in the now-in-vogue concurrency models, such as Actors. Actors are fundamentally based on asynchronous messaging. Can the standardized DDS API serve as a middleware substrate between actors? Can data-centric messaging make actor programming any easier? These are some research questions we’ll explore along the way.
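
For readers new to the model, here is a minimal sketch of what "asynchronous messaging" means for an actor. It is plain Python using the standard `queue` and `threading` modules, not the DDS or RTI Connext API, and the names `CounterActor` and `demo` are hypothetical, made up for this illustration. The actor owns its state, and the only way to touch that state is to drop a message into its mailbox.

```python
import queue
import threading


class CounterActor:
    """A toy actor (hypothetical): private state, a mailbox, one worker thread."""

    _STOP = object()  # sentinel message that asks the actor to shut down

    def __init__(self):
        self._mailbox = queue.Queue()
        self._total = 0  # touched only by the actor's own thread, so no lock
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def _run(self):
        # Process one message at a time, in arrival order.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            kind, payload = message
            if kind == "add":
                self._total += payload
            elif kind == "get":
                payload.put(self._total)  # reply on the caller's own queue

    def stop(self):
        self.send(self._STOP)
        self._thread.join()


def demo():
    actor = CounterActor()
    for n in range(1, 101):
        actor.send(("add", n))   # fire-and-forget, the sender never blocks
    reply = queue.Queue()
    actor.send(("get", reply))   # a request carries its own reply channel
    total = reply.get(timeout=5)
    actor.stop()
    return total
```

Calling `demo()` sends the numbers 1 through 100 and then asks for the running total. In this sketch the mailbox is an in-process queue; the research question above is, roughly, whether a data-centric middleware could play that role between actors on different cores or machines.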

Want to dig deeper into what we’re up to? Check out this Design West presentation.

Questions? Feedback? Comments? Write in the comments section below or write to us at manycore@rti.com

 

3 comments

  1. Hi Sumant

    I read your blog and reviewed your PPT. I am pleased that RTI and DDS-style data-centric thinking is being applied to the many-core problem. I am convinced there is some value in there.

    You might find it useful to get hold of an old book whose thinking has yet to be properly implemented (available at http://www.usingcsp.com/cspbook.pdf). Charlie Hoare was around at the time of the transputer, the original many-core processor architecture. He identified then that the critical issue is the minimization of communication between cores. Charlie’s ability to mathematically prove his theories is intriguing and valuable.

    There is also some more recent work in tooling by a company called Acumem (Sweden), which was bought by Rogue Wave. They focussed on data as the bottleneck, specifically data creating a memory bandwidth wall. They created an interesting tool called SlowSpotter that identifies code that creates such slowspots, because they have a fundamental understanding of how code, data and caches interact. You may find working with them useful, as their insights are intriguing. I helped them launch their product and wrote a whitepaper for them, so if needed I can introduce you to their founder.

    I’ll also say that I know for sure that there is a European-funded research program looking to use DDS ideas and concepts to implement communication between processors, although their focus is more on safety-critical systems. Unfortunately I cannot give more details due to an NDA.

    Good luck with your research.

  2. @Geoff, Thanks for the links and the wishes. CSP is certainly relevant. I think a good modern example of that would be the Go language developed by Google. I’m not sure to what extent the language captures the CSP formalism. I’m beginning to explore Actors before CSP because I think that’s the only model that deals with concurrency and distribution at the same time. ThreadSpotter from Rogue Wave also looks interesting. Prof. Erik Hagersten has written extensively on caches. I’ll look into it as I explore further.

  3. Yes, Prof. Hagersten was the key founder at what was Acumem. Nice guy and very knowledgeable. ThreadSpotter is what their tool became known as…
