[Blog] Settling the Time- vs. Event-Triggered Debate
In a previous blog post, we briefly touched on the differences between the event-triggered and time-triggered approaches to real-time systems design. In this post, we look at these differences in more depth and explain why the time-triggered paradigm (which is the foundation of the ASTERIOS technology) is better suited to designing complex, safety-critical real-time systems. This post is therefore less practical and more academic than the previous ones, before we dive back into the depths of the ASTERIOS technology in future publications.
Let us recall the main principles for both approaches:
- an event-triggered system reacts asynchronously to the occurrence of events in the system, such as external interrupts (e.g. when a network device receives a packet), or the release of a shared resource (e.g. release of a mutex/semaphore/lock of any kind). The system may typically “react” by suspending a task, and running another one in response to that event;
- conversely, a time-triggered system only reacts on timer interrupts. In addition, the observable state of the system can only change synchronously at specific, physical dates, which are known prior to execution.
Thus, a common view of time-triggered systems is that “(almost) everything is known prior to execution”, which is slightly exaggerated. In a time-triggered system, the developer specifies beforehand when the system will interact with its environment; conversely, in an asynchronous, event-triggered system, interaction can happen at virtually any time. The time-triggered paradigm therefore drastically reduces the number of possible states the system can go through during its lifetime. We’ll show next how this reduced complexity naturally induces “good” properties for the system; then we’ll focus on the drawbacks of this approach, and how they can be mitigated.
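To make the idea of “dates known prior to execution” more concrete, here is a minimal sketch of a time-triggered dispatch table, simulated in Python. The task names, the 10 ms cycle and the offsets are purely hypothetical, and this is in no way the ASTERIOS API: it only illustrates that the activation trace is entirely determined by a table fixed before the system starts.

```python
from typing import List, Tuple

# Hypothetical static schedule, fixed before "execution" starts:
# (offset within the cycle in ms, task name).
CYCLE_MS = 10
SCHEDULE: List[Tuple[int, str]] = [(0, "acquire"), (2, "control"), (6, "actuate")]

def run_time_triggered(n_cycles: int) -> List[Tuple[int, str]]:
    """Simulate n_cycles of the static schedule; return (date in ms, task) pairs."""
    trace = []
    for cycle in range(n_cycles):
        base = cycle * CYCLE_MS
        for offset, task in SCHEDULE:
            # Activation depends only on the date, never on a runtime event.
            trace.append((base + offset, task))
    return trace

trace = run_time_triggered(2)
for date_ms, task in trace:
    print(f"t={date_ms:3d} ms -> {task}")
```

Running the simulation twice yields exactly the same trace, since nothing other than the table and the passage of (simulated) time drives the activations.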
Time-Triggered and Correctness by Construction
The time-triggered approach makes it possible to provide guarantees by construction on the behavior of the system. These guarantees typically address safety requirements:
The first property guaranteed “by default” in time-triggered systems is liveness: no deadlock or task starvation may occur, since the activation of a task is dictated only by the passage of time. In comparison, proving the liveness of a multi-task event-triggered system is intractable in most cases (once again, because there is a drastically larger number of cases to consider).
In a time-triggered system, resource sharing between tasks can be performed without locks, using time division; such a design is only possible because the developer knows prior to execution which task can be executing at any given time. For example, the figure below illustrates the sharing of a resource between two time-triggered tasks using time division:
- the first timeline symbolizes the reservation of the resource over time, as specified by the system designer;
- the next two timelines use blue and green rectangles to symbolize an actual admissible scheduling of both tasks on a single-core processor. Note that both tasks can safely be preempted while holding the resource.
The system designer is able to formally ensure prior to execution that no two tasks may ever simultaneously access a shared resource; this demonstration can be performed either manually or automatically. In comparison, in a preemptive, event-triggered system, the resource would have to be protected with a lock (mutex or semaphore) in order to create critical sections; this would however induce additional, unpredictable synchronization time for both tasks and, in the worst-case scenario (multiple locks, multiple resources, and erroneous design), could lead to a deadlock.
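As a rough illustration of what the “automatic” demonstration could look like, the sketch below checks offline that the reservation windows of a shared resource never overlap between two different tasks. The window format, the task names “A” and “B” and the dates are assumptions made for the example; an actual toolchain would work on a much richer model.

```python
from typing import List, Tuple

# Hypothetical reservation window: (start_ms, end_ms, owner), end exclusive.
Window = Tuple[int, int, str]

def find_overlaps(windows: List[Window]) -> List[Tuple[Window, Window]]:
    """Return every pair of overlapping windows owned by different tasks."""
    ordered = sorted(windows)  # sorted by start date first
    conflicts = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b[0] >= a[1]:
                break  # sorted by start: no later window can overlap a
            if a[2] != b[2]:
                conflicts.append((a, b))
    return conflicts

# Time-division plan for one cycle: tasks A and B alternate on the resource.
reservation = [(0, 3, "A"), (3, 5, "B"), (5, 8, "A"), (8, 10, "B")]
# A faulty plan where B's window starts before A's window ends.
faulty = [(0, 4, "A"), (3, 5, "B")]

print(find_overlaps(reservation))
print(find_overlaps(faulty))
```

Because the check only reads the static plan, it can run at design time; no lock is needed at runtime as long as each task only touches the resource inside its own windows.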
Delay and Temporal Sizing
The time-triggered paradigm also makes it possible to formally bound the worst-case response time of the system (i.e. the delay between an input stimulus and the expected output), a parameter of paramount importance for an Instrumentation & Control loop. Once again, this bound can be computed very easily, since the longest task activation sequence is known prior to execution, and no runtime event can ever interfere with it.
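As a simplified worked example, assume (hypothetically) that the input is sampled once per cycle at a fixed offset, and that the corresponding output is produced at a later fixed offset. The worst case is then a stimulus arriving just after a sampling date: it waits almost one full period before being sampled, then goes through the fixed processing chain. The function and parameter names below are illustrative only.

```python
def worst_case_response_ms(period_ms: int,
                           sample_offset_ms: int,
                           output_offset_ms: int) -> int:
    """Upper bound on stimulus-to-output delay for a fixed sampling schedule.

    Assumes one sampling date per period and an output date that is fixed
    relative to the sampling date of the same activation sequence.
    """
    if period_ms <= 0 or not 0 <= sample_offset_ms <= output_offset_ms:
        raise ValueError("expected period > 0 and 0 <= sample <= output offset")
    # Up to one period waiting to be sampled, plus the fixed chain latency.
    return period_ms + (output_offset_ms - sample_offset_ms)

# 10 ms cycle, input sampled at t=0, output written at t=6 within the cycle.
bound = worst_case_response_ms(10, 0, 6)
print(f"worst-case response time <= {bound} ms")
```

The point is less the formula itself than the fact that every term in it is a design-time constant.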
In an event-triggered system, enforcing such a bound with 100% confidence typically requires the use of high-priority routines; but then a problem arises when the event-triggered system must execute multiple tasks with different response time constraints. This usually leads to complex, multi-priority system architectures, with implicit dependencies between tasks, prone to insidious issues such as priority inversion.
A related issue is the monitoring of the tasks’ actual execution time. In an event-triggered system making heavy use of mutexes, semaphores and critical sections, the CPU time needed by a task to complete is difficult to bound with confidence, as it heavily depends on the synchronizations performed with other tasks.
…and much more
The time-triggered paradigm is a foundation on which complete, formal, multi-task real-time programming models can be built. Based on this principle, the ASTERIOS programming model offers additional guarantees that would be costly to achieve with an event-triggered design. Most notably, communication determinism can be achieved with a synchronous communication pattern, as we’ve shown in detail in our post on the ASTERIOS communication paradigm. This property brings reproducibility and thus better testability, as the system is guaranteed to react in exactly the same manner when subjected to the same set of inputs.
All in all, because they combine a synchronous communication paradigm with a number of admissible states that remains bounded and tractable (either manually or by a tool), time-triggered systems are inherently more predictable than asynchronous ones. In particular, they are more robust to unexpected external events: they are for instance insensitive to interrupt storms, since external events cannot trigger any processing by themselves, whereas an event-triggered system requires additional monitoring to protect itself against them.
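The interrupt storm argument can be illustrated with a back-of-the-envelope model. In the sketch below, all costs and periods are made-up figures (in microseconds): the event-triggered variant pays a handler cost per event, while the time-triggered variant polls at fixed dates, so its input-handling load does not depend on how many events occur.

```python
from typing import List

def event_triggered_cpu_us(event_times_us: List[int], handler_cost_us: int) -> int:
    """CPU time spent in handlers when every event raises an interrupt."""
    return len(event_times_us) * handler_cost_us

def time_triggered_cpu_us(horizon_us: int, poll_period_us: int, poll_cost_us: int) -> int:
    """CPU time spent polling at dates 0, P, 2P, ... strictly before the horizon."""
    n_polls = -(-horizon_us // poll_period_us)  # ceiling division
    return n_polls * poll_cost_us

HORIZON_US = 100_000                          # observe 100 ms
normal = list(range(0, HORIZON_US, 20_000))   # 5 events
storm = list(range(0, HORIZON_US, 10))        # 10,000 events

for name, events in (("normal", normal), ("storm", storm)):
    et = event_triggered_cpu_us(events, handler_cost_us=20)
    tt = time_triggered_cpu_us(HORIZON_US, poll_period_us=10_000, poll_cost_us=50)
    print(f"{name:6s}: event-triggered {et} us, time-triggered {tt} us")
```

In this toy model the storm demands more handler time than the observation window contains, whereas the polling load stays constant. The trade-off is that events occurring between two polling dates are only seen at the next date, and must be buffered or coalesced by the hardware in the meantime.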
About the Usual Pitfalls of the Time-Triggered Paradigm
A common misconception about the time-triggered paradigm is that, because task activation dates must be known at compile time, only periodic task models are admissible. Although many time-triggered programming models are indeed limited to periodic tasks (some of which can be enabled or disabled during execution), there is no theoretical reason for this limitation. Thus, not only does the time-triggered programming model of ASTERIOS allow aperiodic tasks, but the temporal behavior of each task may even be dynamically modified during execution (within a finite set of possible temporal behaviors).
As in any engineering field, though, there is no silver bullet for real-time systems. Because they can only change state at discrete, predefined dates, time-triggered systems have a reputation for being “slower”, or at least less reactive, than event-triggered systems. In addition, synchronous communication paradigms tend to increase the end-to-end latency within the system, in exchange for determinism.
Luckily, there are multiple ways to cope with these drawbacks. With ASTERIOS, it is for instance possible to create fast “jobs” that periodically poll for new input data and immediately feed it to computational tasks at minimal cost. If all the processing needs to be timed by an external event that occurs with a certain periodicity (such as the angular position of a crankshaft), we can even schedule the entire application on that particular event rather than on a real-time timer interrupt, and still provide the same guarantees as those presented above. Last but not least, we have also come up with a new synchronous communication paradigm that enables “immediate” communication between tasks, even across multiple cores, while still preserving determinism and modularity.
We will return in more detail to the answers we bring to these issues in upcoming posts.
Conclusion – Where We Stand
In a sense, the debate between the event-triggered and time-triggered paradigms can be compared to the one between the imperative and functional programming paradigms. The first paradigm is the closest to the way the machine actually works, and thus has historically been taught to generations of computer science students. The second paradigm, however, aims at letting programmers focus on what they actually want to do: for functional languages, that’s focusing on the algorithm rather than the code; for real-time systems, that’s focusing on the real-time constraints of each task, rather than programming interrupt routines and priorities. The comparison goes even further, as in both cases, the newer paradigm offers additional guarantees at compile time.
We believe that, for many of the event-triggered real-time systems developed in the past decades, using the event-triggered paradigm was not an actual design choice, but simply a consequence of the fact that no industrial solution offered an alternative. And don’t get us wrong: for many types of applications, event-triggered is just simpler! It can be designed and implemented quite easily, and if the system can tolerate an occasional delay, or even a reboot if things go seriously wrong, it may be the most cost-efficient way to go (at least in the short term). However, for systems that have “hard” real-time constraints, provide safety-critical features, or simply have real-time constraints at multiple time scales and need to be modular to adapt to future evolutions (adding new tasks, changing the communication scheme, switching to multi-core architectures, etc.), we believe that the time-triggered approach is definitely the right way.
In addition to the key benefits we’ve presented in this post, ASTERIOS alleviates the usual rigidities that come with the time-triggered paradigm by allowing complex temporal behaviors within a single task, and by offering an original and modular way to radically reduce end-to-end delays in task communications, as we intend to show in upcoming posts.
About the author
Emmanuel Ohayon has been a Software Architect at KRONO-SAFE since 2014; he has contributed to the roots of the ASTERIOS technology (compiler and generic part of the Real-Time Kernel) as head of the Core Team. He currently leads some dark, secret R&D projects that aim to make ASTERIOS rule the World of RTOSes, with zero latency. Loves to speak of himself in the third person. Before KRONO-SAFE, he was a Research Engineer at CEA (French Alternative Energies and Atomic Energy Commission).