Concurrency: Race Conditions and Deadlocks
Why Is Concurrency Important?
Engineers are increasingly using multicore architectures and processors for various high-integrity embedded applications such as military avionics or automotive systems, beyond the traditional embedded applications such as networking and communication systems. This trend is being driven in part by the continuing rise in performance requirements for applications such as ADAS and V2V communication. To take advantage of such architectures, you would need to consider multithreading. This brings a new set of complex design and verification challenges, especially when ensuring robustness and safety of such systems. Some of these challenges are present even in systems that only utilize a single core for multitasking.
What Are the Challenges?
One of the most complex challenges of concurrent programming lies in the unpredictability of the interactions between the concurrent threads. This is because thread interleaving is nondeterministic. To test any such system leads to a combinatorial explosion in the number of test cases. This leads to another key challenge: Test cases that uncover bugs, but fail to reproduce the issue because there is no way to determine and repeat the exact sequence of events. This makes it difficult to trace the control and data flow to identify the root cause, and it becomes expensive and time consuming to debug a concurrency issue. Consequently, there have been real life scenarios where software verification and testing have missed a concurrency bug.
Even in single-thread embedded applications, you see some of these complexities play out in the asynchronous interaction with the external sensory inputs in the form of interrupts. When multiple tasks access shared resources such as global variables, you see a new class of bugs, such as race conditions, data races, and deadlocks.
A classic example of a race condition is the scenario where two clients modify the same resource on a server concurrently, as in the case of a simultaneous bank withdrawal. Refer to the concurrent computing wiki section to read more about this example.
The Polyspace Static Analysis Solution
Polyspace® tools leverage the formal methods technique of abstract interpretation for semantic analysis. The Polyspace engine can statically detect concurrency issues and trace the control and data flow to help debug the root cause of an issue. In the following example, you end up with a deadlock as two tasks (bug_deadlock_task1 and bug_deadlock_task2) are waiting on the other to enter the same critical section (acquire_sensor()). The event trace highlights the order of instructions that results in the deadlock.
Refer to the complete list of defects that Polyspace Bug Finder™ can detect. Another emerging concern with respect to concurrency bugs is their impact on security of an embedded system. Security guidelines such as CERT C and CWE accord a high priority to concurrency defects, such as race conditions.
Polyspace Green – the Absence of Race Conditions
Among the concurrency defects, race conditions are probably the most pernicious. Race conditions have been responsible for some serious run-time error failures over the years, such as the Therac-25 radiation therapy machine. Researchers published a detailed investigation of the run-time error failure that was responsible for the deaths of several patients.
Polyspace Code Prover™ is a sound static analysis tool that can help with such complex nondeterministic run-time failures because it exhaustively verifies all possible interleavings. It also highlights potential race conditions by considering the worst-case scenarios. More importantly, it verifies when the shared resources are appropriately protected by critical sections (by highlighting them in green).
For example, if you have shared global variables used by multiple threads, Polyspace Code Prover not only calls out potential race condition scenarios (highlighting them in orange), but it also provides the specific sequence of events (interleaving) in the form of the event trace.
In the following example, the global variable PowerLevel is highlighted in orange, because it is not protected against concurrent access by multiple tasks (server1, server2, and tregulate). The sequence of read and write operations on variable and the corresponding changes to the data ranges in conjuction with the event trace is valuable in debugging such race conditions. Polyspace Code Prover further provides the detailed control flow in the form of concurrent access graph highlighting the sequence of tasks and function calls as shown below.
Using Polyspace products, you can detect and debug concurrency defects and also verify the reliability of multitasking operations to ensure that your software is robust. An example of this is analyzing shared global variable usage, where Polyspace products can provide you with detailed insight and generate artifacts that help in the development and verification activities for multitasking applications.
To learn more about Polyspace products, contact the authors.