Now that I have explained what I mean by real-time multi-process apps, I will go into some detail about what a framework needs to provide in order to develop them.
To coordinate multiple processes, a framework has to
- use a communication protocol (data transport technology) because the processes may exist on separate hosts
- enable data to flow between the processes
- provide support for invoking behaviour on the processes
Data Transport Technology
The choice of data transport technology is important. Two core requirements that need to be satisfied are
- it has to be fast because the aim is to be as close to real-time as possible
- it has to be resilient because data cannot be lost
If you want data to be transferred quickly, a good starting point is to ensure that you only send the data that is absolutely necessary, so it is wise to choose a transport that minimises the use of metadata/headers. This means the choice becomes restricted to TCP/IP, UDP or any of the layers below them.
In industry I have never seen the lower layers used directly in an application; they are simply too low level. However, I have seen TCP/IP and UDP used, sometimes even together. I don’t think the choice of either one is obvious, but in the interest of reducing complexity I will discount a hybrid solution for now. Comparing the two does help rationalise the choice:
| |TCP/IP|UDP|
|---|---|---|
|Throughput (in a perfect network)|Very fast|Very fast (theoretically faster than TCP/IP)|
|Data integrity|Guaranteed due to packet acknowledgement.|Lossy. Retransmission logic needs to be implemented.|
|Network stress|Contained|Potentially intense|
TCP/IP satisfies the requirement of guaranteed data at the expense of throughput (in a perfect network). It is also unicast (point to point) which means that the stress it puts on the network is contained. However, in situations where you do want to broadcast data, the logic has to be implemented.
UDP, on the other hand, is faster because no packet acknowledgement takes place; however, it makes situations where you want to target data at a single consumer difficult. Retransmission logic also has to be coded to ensure data integrity, and this is not easy: if you stick to UDP, retransmitting data means even consumers that haven’t missed packets receive them again. These consumers may simply ignore duplicate packets, but they still need the logic to do so. A single slow or faulty consumer that continuously requests retransmissions can put the network and applications under intense stress because they get flooded with data, all of which needs to be processed. Processing a lot of data ties up resources, which can cause more retransmission requests, exacerbating the problem. Ultimately this becomes an ever-increasing spiral that cannot be sustained. In a stressed network, UDP may be a contributing factor and will most likely perform slower than TCP/IP. I have seen major outages in well managed, high-end networks happen because of this problem.
Data Flow Between Processes
For information to be exchanged between processes, the framework needs to support publishing and consumption of data. Doing this in a systematic way means a single implementation can be used, and putting some rules around the data simplifies the problem. I’ll call data that follows these rules a record:
- a record has a unique ID and version
- a record is homogenised into a standard structure (e.g. a set of key-value pairs)
- a record is immutable
- records with the same ID but different versions will contain different data
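The rules above can be sketched in a few lines of code. This is illustrative only, assuming a hypothetical `Record` type; the names are mine, not the ClearConnect API:

```python
from dataclasses import dataclass
from types import MappingProxyType

@dataclass(frozen=True)  # frozen => instances are immutable
class Record:
    id: str                   # unique ID
    version: int              # incremented for each change to the data
    fields: MappingProxyType  # homogenised key-value pairs, read-only view

def make_record(record_id, version, fields):
    # Wrap the fields in a read-only view so the record is effectively immutable.
    return Record(record_id, version, MappingProxyType(dict(fields)))

# Same ID, different version => different data
r1 = make_record("FX.EURUSD", 1, {"bid": 1.0831, "ask": 1.0833})
r2 = make_record("FX.EURUSD", 2, {"bid": 1.0832, "ask": 1.0833})
assert r1.id == r2.id and r1.version != r2.version and r1.fields != r2.fields
```

Immutability is what makes the versioning meaningful: a given (ID, version) pair always refers to exactly one set of key-value pairs.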
Records defined this way bring order to a potentially chaotic landscape. They also mean that optimisations can be put in place that reduce transfer times, so that data flow is real-time and the network doesn’t become flooded.
The aim is to design consumers of data (clients) so that they receive the minimum amount of data necessary. This is possible when a client receives all the key-value pairs of a record when it requests (or joins) it, and then only the changes (or deltas) for that record on subsequent updates. I call this approach image on join, followed by deltas. It requires the client to keep a local copy of the record and merge the changes into it on each update. With this approach it is also a good idea to detect out-of-sequence or missed updates so that a client can resynchronise the data if it needs to.
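The image-on-join-followed-by-deltas approach can be sketched as follows. This is a minimal illustration, not the ClearConnect implementation; the class and method names are assumptions:

```python
class RecordClient:
    """Keeps a local image of a record and merges deltas into it."""

    def __init__(self):
        self.image = {}    # local copy of the record's key-value pairs
        self.version = 0   # version of the last update applied

    def on_join(self, version, image):
        # First response after joining: the full image of the record.
        self.image = dict(image)
        self.version = version

    def on_delta(self, version, delta):
        # Subsequent updates carry only the changed key-value pairs.
        if version != self.version + 1:
            # Out-of-sequence/missed update: the client must resynchronise
            # (i.e. re-request the full image).
            return "RESYNC"
        self.image.update(delta)
        self.version = version
        return "OK"

client = RecordClient()
client.on_join(5, {"bid": 1.0831, "ask": 1.0833, "venue": "EBS"})
assert client.on_delta(6, {"bid": 1.0832}) == "OK"
assert client.image == {"bid": 1.0832, "ask": 1.0833, "venue": "EBS"}
assert client.on_delta(9, {"ask": 1.0840}) == "RESYNC"  # versions 7 and 8 missed
```

The version check is what makes the scheme safe over an unreliable transport: a gap in the sequence is detected immediately, and the client falls back to re-requesting the full image rather than silently carrying a stale copy.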
Controlling Behaviour Using RPCs
In the above section you may have noticed the idea of clients performing actions by requesting or resynchronising data. This implies controlling the behaviour of a record publisher (a service) from the client end, i.e. remotely, which is made possible with remote procedure calls (RPCs). When a client requests a record it invokes an RPC on the service, asking for the record with a given ID and, usually, for any subsequent updates.
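The record-join RPC described above can be sketched like this. For clarity the "remote" call is an in-process method call here (in reality it crosses the network); the names are hypothetical, not the ClearConnect API:

```python
class Service:
    """A record publisher that clients control via a join RPC."""

    def __init__(self):
        self.records = {}      # record ID -> (version, key-value pairs)
        self.subscribers = {}  # record ID -> callbacks to notify with deltas

    def rpc_join(self, record_id, on_delta):
        # RPC invoked by a client: register it for subsequent updates and
        # return the current image of the record.
        self.subscribers.setdefault(record_id, []).append(on_delta)
        version, image = self.records.get(record_id, (0, {}))
        return version, dict(image)

    def publish(self, record_id, version, delta):
        # Apply the delta to the service's copy and notify all subscribers.
        _, image = self.records.get(record_id, (0, {}))
        self.records[record_id] = (version, {**image, **delta})
        for callback in self.subscribers.get(record_id, []):
            callback(version, delta)

service = Service()
service.publish("status", 1, {"state": "RUNNING"})

received = []
version, image = service.rpc_join("status", lambda v, d: received.append((v, d)))
assert (version, image) == (1, {"state": "RUNNING"})  # image on join...

service.publish("status", 2, {"cpu": "12%"})
assert received == [(2, {"cpu": "12%"})]              # ...followed by deltas
```

Note how the RPC does double duty: it is both a request for the current image and a subscription for future deltas, which is exactly the image-on-join-followed-by-deltas pattern from the previous section.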
Summary And Conclusions
The above sections describe the minimum requirements of a framework for real-time multi-process applications. Due to their complex nature, services and clients are loosely coupled and rely on heartbeats and timeouts to infer the state of the various processes; interactions between them are also asynchronous. A framework should seek to conceal this complexity behind a sensible API. This is only possible by applying rules to data, service behaviour and client behaviour.
In ClearConnect, records, services and clients have been defined and you are provided with a complete toolbox for creating your own. The API has a prevalence of callback methods, which are necessary to support its asynchronous nature. We also ensured that the implementation works out of the box with minimal configuration, and added utility methods that give a synchronous feel to many operations. Our higher-order API makes use of our own utilities to really simplify interactions between services and clients so that you can focus on your application’s business logic. Finally, we ensured that there is complete visibility of your real-time multi-process apps with essential-only logging and tooling that allows visualisation of all records, services and clients.