I have explained what real-time multi-process apps are and have gone through the core requirements for them. Now I’m going to explain how our Java framework, ClearConnect, satisfies the requirements and solves common problems.
Data Transport Technology
We chose a point-to-point design to avoid the network stability issues associated with data multicasting. The default implementation is TCP/IP (TCPChannel); however, it wasn’t long before we came across the need to use Solace. So we enhanced our code to support pluggable transport technologies and developed a Solace implementation that preserves the point-to-point semantics.
Records (Data On ClearConnect)
Data that is on ClearConnect has been homogenised into simple but flexible records, so all clients (record consumers) and services (record publishers) that are on ClearConnect ‘speak the same language’. The records contain key-value pairs with a maximum depth of 2. In other words, a record can have a value that is another set of key-value pairs (a sub-map) but the sub-map cannot contain other sub-map values. The record keys can only be text and the values can be any type in the table below:
Type | Java Type | Description |
---|---|---|
TextValue | String | Any arbitrary text. |
LongValue | Long | A whole number, more precisely a 64-bit two’s complement integer. |
DoubleValue | Double | A decimal number, more precisely a double-precision 64-bit IEEE 754 floating point. |
BlobValue | byte[] | Any object serialized into an array of bytes. |
A sub-map | Map | A map that supports text keys and values that can be a TextValue, LongValue, DoubleValue or BlobValue. |
Our keep-it-simple approach to records means:
- they are flexible enough to model objects up to 2 dimensions (like a spreadsheet or database table)
- you are not faced with a varied and confusing choice of types (e.g. a whole number can only be represented by a LongValue)
- they are restrictive enough to force a considered and rational approach to data modelling
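To illustrate the two-level limit described above, here is a minimal sketch in plain Java. It does not use the ClearConnect API: `SketchRecord`, `put` and `putSubMap` are hypothetical names, and plain `String`, `Long`, `Double` and `byte[]` stand in for TextValue, LongValue, DoubleValue and BlobValue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a record; not the ClearConnect API. */
final class SketchRecord {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    /** Accepts only the four scalar types listed in the table. */
    private static Object checkScalar(Object value) {
        if (value instanceof String || value instanceof Long
                || value instanceof Double || value instanceof byte[]) {
            return value;
        }
        throw new IllegalArgumentException("Unsupported value type: "
                + (value == null ? "null" : value.getClass().getName()));
    }

    /** Stores a scalar value under a text key. */
    void put(String key, Object value) {
        fields.put(key, checkScalar(value));
    }

    /** Stores a sub-map; its values must be scalars, so depth never exceeds 2. */
    void putSubMap(String key, Map<String, ?> subMap) {
        Map<String, Object> copy = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : subMap.entrySet()) {
            copy.put(e.getKey(), checkScalar(e.getValue()));
        }
        fields.put(key, copy);
    }

    /** Returns the scalar or sub-map stored under the key, or null. */
    Object get(String key) {
        return fields.get(key);
    }
}
```

In this sketch a sub-map that itself contains a map is rejected, as is an `Integer`, which mirrors the rule that a whole number can only be represented by one type.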
Real-Time Approach
We put a lot of effort into making ClearConnect fast, since being real-time is a key feature. On ClearConnect, records are published using a threading model that is optimised for speed and scales automatically with an increasing number of CPU cores. We also developed an optimised algorithm for transferring records, which uses the image-on-join, followed-by-deltas approach that I have already discussed. Finally, we provide a choice of codecs for encoding records before transfer and decoding them at the other end.
These features result in a very efficient and fast way of sharing data while presenting options for scaling and/or optimising further based on the kind of data that is prevalent in your system. To ensure that the performance of ClearConnect does not degrade as we enhance it and add features, we have a network throughput test that is used to benchmark our releases against each other. We also have other performance metrics that show a latency of 9 microseconds for an average message size of 134 bytes, on an Intel Core i5 machine. These are discussed in more detail on our code wiki.
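The image-on-join, followed-by-deltas idea can be sketched as follows. This is only an illustration under my own assumptions, not ClearConnect's transfer algorithm: `DeltaSketch`, `Delta`, `diff` and `apply` are hypothetical names, and values are assumed to be non-null with a meaningful `equals` (for example `String`, `Long` or `Double`).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of image-then-deltas; not ClearConnect's wire algorithm. */
final class DeltaSketch {
    /** A change set: entries added or updated, plus keys removed. */
    static final class Delta {
        final Map<String, Object> puts = new HashMap<>();
        final Set<String> removes = new HashSet<>();
    }

    /** Publisher side: work out what changed between two versions of a record. */
    static Delta diff(Map<String, Object> previous, Map<String, Object> current) {
        Delta delta = new Delta();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            if (!e.getValue().equals(previous.get(e.getKey()))) {
                delta.puts.put(e.getKey(), e.getValue());
            }
        }
        for (String key : previous.keySet()) {
            if (!current.containsKey(key)) {
                delta.removes.add(key);
            }
        }
        return delta;
    }

    /** Subscriber side: apply a delta to the local copy built from the image. */
    static void apply(Map<String, Object> localCopy, Delta delta) {
        localCopy.putAll(delta.puts);
        localCopy.keySet().removeAll(delta.removes);
    }
}
```

A subscriber would receive the full record once when it joins (the image) and from then on only the output of `diff`, so the amount of data sent tracks the size of each change rather than the size of the record.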
High Availability
High availability is another key feature for a real-time multi-process app: the framework has to be resilient to one or more processes becoming temporarily or permanently unavailable. In ClearConnect, a service can run in one of two redundancy modes: fault tolerant and load balanced. Both rely on multiple instances (processes) being started for a single service.
In fault tolerant mode, one instance is the primary, active one and the others are passive, warm stand-bys. If the active one becomes unavailable, one of the warm stand-bys takes over and all clients re-sync with it. No records are lost.
In load balanced mode, each instance actively participates in record transfer, so the instances share the load. Service instances can be removed or new ones added at runtime without causing any disruption. This means you can scale your real-time multi-process app without stopping it, thereby delivering an uninterrupted service.
In ClearConnect, redundancy modes are possible because we have a single service (the registry) that coordinates connections between services and clients. The registry itself runs in fault tolerant mode, and any services/clients of a running system only depend on it while they establish their connections. This means that the registry can be stopped completely without any effect on existing components. However, when no registry service is available, new connections cannot be established.
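As a rough illustration of how a coordinating component could direct new connections in the two redundancy modes, here is a small sketch. It is not the ClearConnect registry: `RegistrySketch`, `Mode`, `register`, `unregister` and `select` are hypothetical names, and the round-robin policy for load balanced mode is my assumption, since the mechanism is not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of instance selection; not the ClearConnect registry. */
final class RegistrySketch {
    enum Mode { FAULT_TOLERANT, LOAD_BALANCED }

    private final Mode mode;
    private final List<String> instances = new ArrayList<>();
    private int next; // round-robin cursor, used in LOAD_BALANCED mode only

    RegistrySketch(Mode mode) {
        this.mode = mode;
    }

    /** Called when a service instance starts. */
    synchronized void register(String instance) {
        instances.add(instance);
    }

    /** Called when a service instance becomes unavailable. */
    synchronized void unregister(String instance) {
        instances.remove(instance);
    }

    /** Returns the instance a new client connection should use. */
    synchronized String select() {
        if (instances.isEmpty()) {
            throw new IllegalStateException("No instance available");
        }
        if (mode == Mode.FAULT_TOLERANT) {
            // The longest-registered instance is active; the rest are stand-bys.
            return instances.get(0);
        }
        // Assumption: plain round-robin across all registered instances.
        next = next % instances.size();
        return instances.get(next++);
    }
}
```

In the fault tolerant case, removing the active instance makes the next stand-by the answer to every subsequent `select()`, which is the point at which clients would re-sync.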
Operational Use
From an operational point of view, development and support teams will need to be able to quickly see and diagnose issues. For this reason we supply appropriate tooling so that ClearConnect is completely transparent.
Our PlatformDesktop UI is fully featured and has the ability to show services, RPCs, clients, connections and records. It is what is used to check the health of the system as a whole and allows you to drill down into problem areas. These can be analysed in more depth by examining the logs for the problematic service or client.
Our logging has been optimised for speed and is asynchronous, so it will not block normal operation. ClearConnect logs have essential-only entries. By this I mean that the logs contain all the information that is needed to diagnose issues but nothing more. We meticulously went through our log statements to ensure there is no redundant or distracting information.
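The general technique behind asynchronous logging can be sketched with a queue and a background writer thread. This is a generic illustration rather than ClearConnect's logger: `AsyncLogSketch` and its methods are hypothetical names, and an in-memory list stands in for the log file.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of asynchronous logging; not ClearConnect's logger. */
final class AsyncLogSketch implements AutoCloseable {
    // Unique sentinel, compared by identity, that tells the writer to stop.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sink = new CopyOnWriteArrayList<>(); // stands in for a file
    private final Thread writer;

    AsyncLogSketch() {
        writer = new Thread(this::drain, "log-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Called by application threads: enqueue the message and return immediately. */
    void log(String message) {
        queue.add(message);
    }

    /** Runs on the writer thread: move messages from the queue to the sink. */
    private void drain() {
        try {
            for (String m = queue.take(); m != STOP; m = queue.take()) {
                sink.add(m);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Flushes queued messages and stops the writer thread. */
    @Override
    public void close() {
        queue.add(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Messages written so far, in order. */
    List<String> written() {
        return sink;
    }
}
```

The calling thread only pays for an enqueue; the slower write happens on the dedicated thread, which is why logging of this kind does not hold up normal operation.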
Adoption
Greenfield IT projects are rare. Most projects are about enhancing an existing system, so we wanted to make sure that ClearConnect can be easily adopted. We have used the following approach for this:
- No dependency: the platform just needs Java to run, so you cannot get into the situation where you have conflicting dependencies. Also, you don’t need to install anything other than the platform itself.
- Convention over configuration: to start a platform service you don’t have to configure anything because default values are used that can be overridden if required.
- Discoverability: services on the platform advertise themselves and their RPCs. The code provides appropriate callbacks for services and their RPCs coming on-line, so it is easy to discover them.
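The callback style of discovery mentioned in the last point might look something like the sketch below. It is illustrative only and does not reproduce the ClearConnect API: `DiscoverySketch`, `ServiceListener`, `addListener` and `advertise` are hypothetical names, and replaying already-known services to a new listener is my own design assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of service discovery callbacks; not the ClearConnect API. */
final class DiscoverySketch {
    /** Callback invoked when a service becomes available. */
    interface ServiceListener {
        void onServiceAvailable(String serviceName);
    }

    private final Set<String> services = new LinkedHashSet<>();
    private final List<ServiceListener> listeners = new ArrayList<>();

    /** Registers a listener and replays the services that are already on-line. */
    synchronized void addListener(ServiceListener listener) {
        listeners.add(listener);
        for (String s : services) {
            listener.onServiceAvailable(s);
        }
    }

    /** Called when a service advertises itself; duplicates are ignored. */
    synchronized void advertise(String serviceName) {
        if (services.add(serviceName)) {
            for (ServiceListener l : listeners) {
                l.onServiceAvailable(serviceName);
            }
        }
    }
}
```

With this shape a client does not need to know service names or start-up order in advance: it registers a listener once and reacts as services appear.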
The Overall Solution
From a developer’s point of view, the solution that the Fimtra platform offers is quick to start using and easy to pick up. Many of the difficult-to-solve problems are taken care of, so designing and implementing apps becomes easier. You can focus your efforts on the application’s core business logic with the knowledge that data transfer and service management are efficient and rich in functionality.
In an operational environment, ClearConnect is very fast, transparent and has tooling that makes it possible to detect early signs of a problem. ClearConnect services are resilient to restarts, automatically and seamlessly reconnecting and resuming data exchange. There are also various options for scaling your apps: you can use hardware with an increased number of cores, add more load balanced nodes, use a tailored codec or use a faster transport technology.