For all you late Friday night workers ... A CSP MODEL FOR JAVA MULTITHREADING =================================== Here is a draft CSP model for the sync/wait/notify primitives of Java. It's not quite as bad as I feared! This is as it should be. The Java primitives are not that hard to explain informally ... so the same should be true about any formal explanation. I would be grateful for feedback on anything wrong with this draft. We start with the ending ... CONCLUSIONS: ============ Assuming this model holds water, it gives us a formal semantics for Java multithreading. Despite all the problems encountered and warnings circulated on this topic, I've seen little concern in the Java community about the absence of such a thing. But without it, we will always remain uncomfortable about the security of any product that uses more than one thread of control. The particular semantics given here is a *CSP* one. The importance of the *CSP* is that it is an algebra for concurrent systems -- a formal piece of mathematics with which we can specify requirements precisely (including things like deadlock-freedom) and prove that our implementations satisfy them. Further, some powerful and mature CSP tools (e.g. FDR from Formal Systems Ltd. and Jeremy Martin's deadlock/sat checker) can be applied. What this means is that *any* multithreaded Java code -- not just code using the JCSP or CTJ libraries that give direct access to CSP primitives -- becomes amenable to formal and (partly) automated analysis. If this is of no interest to the Java community, then we will all be in for a lot of trouble. Java designers, implementors, testers and maintainers are running scarred of its multithreading primitives, but applications force their use. In future, if not already, finance-critical and safety-critical applications will be multithreaded. A CSP model of multithreading puts this on a solid engineering foundation. Noone can ignore that. Anyone doing so would, at the very least, be exposed to a risk of litigation should a messy accident get traced back to a system failure arising from some race condition: "So, Bill, you sold your system without really knowing whether it was deadlock-free ... and you never even tried these standard procedures that might have found out?!! Disclaimer notices notwithstanding, would you say that that shows a lack of due care or a lack of diligence towards your customer?" For example, it should now become possible to build a CSP model of the erroneous ALT algorithm: http://www.hensa.ac.uk/parallel/groups/wotug/java/discussion/4.html together with the stress test (described in the "weekend" posting), apply Jeremy's deadlock checker and watch it quickly pop up a trace that exposes the deadlock. Wow -- I wish we had the chance of this a couple of years ago! Of course, we had better repeat this for the `corrected' ALT algorithm and see if anything nasty pops up! Assuming it doesn't ;-), we can move on to *prove* that it is a correct implementation of the ALT. Most Java multithreaded code using the sync/wait/notify primitives -- including my own code -- looks suspicious. As Tony Hoare said in his 1980 Turing Award speech, there are two kinds of computer systems that sell: o those that are obviously correct ... o and those that are not obviously wrong ... and he noted that it's much easier, of course, to produce the latter. Guess which kind we are peddling! I wonder how many surprises will pop up when we start applying CSP tools to our Java codes? SYNTAX: ======= This model is built with the version of CSP in Hoare's book -- so it'll be a little old-fashioned I'm afraid. Also, I don't know the proper ASCII syntax for CSP so I'm making up my own. Subscripts: "A-subscript-i" will be written in array notation as "A[i]" Replicated operators: Performing a external choice (say) indexed over some set - e.g. [] event[i] --> P i in X will be written in one line: [] (event[i] --> P | i in X) Alphabets: The set of events in which a process, "P", is interested is "alpha (P)" OBJECTS AND THREADS: ==================== Let "Objects" be an ennumeration of all Java objects. For any particular Java application being CSP-modelled, this can be restricted to just those objects on which any threads in the application are ever `synchronized'. Usually, this will be finite and small. Let "Threads" be an ennumeration of all Java threads. For any particular Java application being CSP-modelled, this can be restricted to just those threads that are created and started. Sometimes, this may be unbounded or large. SYNCHRONIZED: ============= Let "claim" and "release" be the sets of events: claim = {claim[o][t] | o in Objects, t in Threads} release = {release[o][t] | o in Objects, t in Threads} Let "LOCK" be the set of processes: LOCK = {LOCK[o] | o in Objects} Each "LOCK[o]" process is simply a binary semaphore -- i.e. alpha (LOCK[o]) = {claim[o][t] | t in Threads} union {release[o][t] | t in Threads} LOCK[o] = [] (claim[o][t] --> release[o][t] --> LOCK[o] | t in Threads) So, once a "LOCK[o]" has been claimed by a thread, only a "release[o]" from that same thread will release it. All other events from the alphabet of "LOCK[o]" will be refused. A Java "synchronized" block: synchronized (o) { P } is modelled by: (claim[o][me] --> P) ; (release[o][me] --> SKIP) where "me" is the index of the thread executing this code. The body of the block, "P", is (recursively modelled as) a process. Notice that when there are competing processes, it is undefined which one gets the lock. The "LOCK[o]" process makes no promises as to how competing claimants are managed (e.g. in a FIFO). This is as defined (or, rather, undefined) in Java. That was easy! Actually, that was too easy :-( ... Java allows a lock-owning thread to re-claim/release the lock `atomically' (e.g. when a "synchronized" method invokes another "synchronized" method on the same object). Modelling such a stack of claims/releases in the "LOCK[o]" process would be complicated by having to deal with a lock-owning thread letting it go via a "wait" -- since it must re-claim the lock (if and when it is `notified') at the same stack level and any number of threads, at different stack levels, could have locked it in the interim). So let's leave "LOCK[o]" alone and model the "synchronized" block in the way any sensible implementation would do. Each thread (index "me") maintains a count, "count[o][me]", for any object, "o", on which it may be "synchronized". Each "count[o][me]" is initialised to zero. Then, excusing the occam-style syntax, the Java "synchronized" block: synchronized (o) { P } is modelled by: IF count[o][me] = 0 (claim[o][me] --> P') ; (release[o][me] --> SKIP) count[o][me] > 0 P' where, deeply apologising for the "++"/"--" (but it then fits on one line): P' = count[o][me]++ ; P ; count[o][me]-- i.e. the thread doesn't bother to synchronise on any claim/release events if it already has the lock! WAIT AND NOTIFY: ================ Let "wait-a", "wait-b" and "notify" be the sets of events: wait-a = {wait-a[o][t] | o in Objects, t in Threads} wait-b = {wait-b[o][t] | o in Objects, t in Threads} notify = {notify[o][t] | o in Objects, t in Threads} Let "WAIT" be the set of processes: WAIT = {WAIT[o] | o in Objects} where: alpha (WAIT[o]) = {wait-a[o][t] | t in Threads} union {wait-b[o][t] | t in Threads} union {notify[o][t] | t in Threads} WAIT[o] = WAIT[o][0] where: alpha (WAIT[o][n]) = alpha (WAIT[o]), for all n >= 0 WAIT[o][0] = ([] (wait-a[o][t] --> WAIT[o][1] | t in Threads)) [] ([] (notify[o][t] --> WAIT[o][0] | t in Threads)) and where, for n > 0: WAIT[o][n] = ([] (wait-a[o][t] --> WAIT[o][n + 1] | t in Threads)) [] ([] (notify[o][t] --> [] (wait-b[o][s] --> WAIT[o][n - 1] | s in Threads) | t in Threads)) So, there is a "WAIT[o]" process for each object "o". Initially, this has a count field (represented by a second index to "WAIT") set to zero. This count holds the number of threads that are executing a Java "wait ()" on the object "o". The Java code: o.wait (); is modelled by: release[o][me] --> wait-a[o][me] --> wait-b[o][me] --> claim[o][me] --> SKIP where "me" is the index of the thread executing this code. Note that these operations happen regardless of the value of "count[o]" and do not change it. One of the constraints in Java is that an "o.wait" (or "o.notify") can only be invoked by a thread currently holding the monitor lock for "o" -- i.e. from inside a "synchronized (o)" block. If a thread violates this constraint, the above model for "o.wait ()", running in parallel with "LOCK[o]", means that that invoking thread will be permanently blocked (i.e. deadlocked). This seems a suitable punishment (and is one that should be detectable by CSP analysis tools). The first "release[o][me]" in this sequence releases the "LOCK[o]" semaphore. Assuming the application has not violated the above constraint, this will definitely happen since "LOCK[o]" will be waiting for that "release[o][me]" (which only this thread is going to send). Next, the "wait-a[o][me]" in this sequence is always acceptable to the waiting "WAIT[o][n]" process and just increments its internal count. Note that no guarantee is made that the "WAIT[o][n]" process *will* ever accept that "wait-a[o][me]". Just as in Java, no guarantee is made that a waiting thread will ever be noticed and, subsequently, released. Assuming the "wait-a[o][me]" has happened, the thread now blocks on the "wait-b[o][me]" event. This cannot take place until the "WAIT[o][n]" process receives a "notify[o][t]". See below. Finally, this thread must compete, "claim[o][me]", against all its rivals to reclaim the "LOCK[o]". The Java code: o.notify (); is modelled by: notify[o][me] --> SKIP where "me" is the index of the thread executing this code. Note that this does not involve releasing and, subsequently, re-claiming the "LOCK[o]", which is as per the (till now unformalised) rules of Java. For the moment, the model does allow an "o.notify ()" to be invoked without holding that "LOCK[o]" -- but this is dealt with in the next section. This "notify[o][me]" is always acceptable to a "WAIT[o][n]" process. If "n" is zero, there are no threads waiting on "o" and the notification is accepted but ignored. This is as required by Java. If "n" is more than zero, there is at least one thread waiting on "o". Any thread, "s", waiting on "o" will be trying to synchronise on "wait-b[o][s]". So, when "WAIT[o][n]" accepts a "notify[o][t]", it will accept one of those "wait-b[o][s]" events. If more than one is on offer, an arbitrary choice is made. This is in line with Java's (informal) rules -- no promises are made about `fair' releases of waiting threads upon a notify. After accepting one of the "wait-b[o][s]" events, the "WAIT[o][n]" decrements its count. The lucky thread "s", that has just been released from its "wait-b[o][s]", now has to re-acquire the lock on "o" -- see the final "claim[o][me]" in the "o.wait ()" model. No special privileges are made for getting this lock back ... it has to compete with all other threads trying to claim it ... hence the "What, no chickens?" difficulty! This sad behaviour is properly captured by this CSP. An important observation is that "WAIT[o][n]" can never get stuck waiting for a "wait-b[o][s]" -- one, at least, will always be available. Hence, we can guarantee that "WAIT[o][n]", for *some* value of "n", will always be able to service any "wait-a[o][t]" or "notify[o][t]" events. This is what is meant be the phrase "is always acceptable" used twice above (and below). The above model does not enforce that a "notify[o][me]" will always be accepted -- just that it is always acceptable. A Java thread executing an "o.notify ()" probably should not block indefinitely, but should always terminate having released a waiting thread (if one exists). However, some synchronisation *must* be involved because data structures shared with the "o.wait ()" method must be inspected and updated. It may be that that synchronisation is `fair' and guarantees eventual servicing of the "o.notify ()" ... but I've not seen that promise made in the (informal) Java specification? Given that no promises are made with respect to entry to a "synchronized" block or eventual release from a "wait", such a promise would be inconsistent. If special versions of Java emerge (e.g. for real-time applications) that *do* make promises of eventual entry to "synchronized" blocks and release from "o.wait ()" methods, we will have to model them with an extended CSP that captures the notion of `eventual' or `fair' servicing of events. Adrian Lawrence's CSPP, which addresses the notion of priority, would be able to do this. MONITOR RULES ON SYNC AND WAIT/NOTIFY: ====================================== A Java rule, assumed by the above CSP model, is that "o.wait ()" and "o.notify ()" can only be invoked by threads that currently hold the lock on object "o". This rule is enforced by the Java run-time system. Java applications breaking this rule may compile, but an exception will be thrown at run-time. For checking out Java applications that do not break this rule, the above model is sufficient. However, if we are concerned, the constraints are trivial to model in CSP. As already described, the rule for the "o.wait ()" is already imposed by the current model. To get the rule for "o.notify ()", the simplest way is to modify the "LOCK[o]" process to include all the "notify[o][t]" events as well. Then, it can refuse any `notify' from a thread that has not claimed it ... and accept a `notify' from a thread that has: alpha (LOCK[o]) = {claim[o][t] | t in Threads} union {release[o][t] | t in Threads} union {notify[o][t] | t in Threads} LOCK[o] = [] (claim[o][t] --> LOCKED[o][t] | t in Threads) where: alpha (LOCKED[o][t]) = alpha (LOCK[o]) LOCKED[o][t] = (release[o][t] --> LOCK[o]) [] (notify[o][t] --> LOCKED[o][t]) NOTIFY ALL: =========== To complete the model of Java thread synchronisations, we need to capture the "o.notifyAll ()". The meaning of this is to release *all* threads that are waiting on the "o" -- i.e. have executed an "o.wait ()". If no threads are waiting, that's fine and nothing happens as a result of the invocation. This is easy: define a set of "notifyAll[o][t]" methods, add the relevant subsets to the alphabets of "LOCK[o]" and "WAIT[o]" and make the obvious changes. The "LOCK[o]" extension prevents a "notifyAll[o][t]" when thread "t" does not have the claim. The "WAIT[o]" extension releases all currently waiting threads -- no new threads will be able to execute an "o.wait ()" until this has completed. The complete model is presented in one piece in the next section. SUMMARY OF THE MODEL: ===================== Sets (Ennumerable): ------------------- Objects, Threads Events: ------- claim = { claim[o][t] | o in Objects, t in Threads} release = { release[o][t] | o in Objects, t in Threads} wait-a = { wait-a[o][t] | o in Objects, t in Threads} wait-b = { wait-b[o][t] | o in Objects, t in Threads} notify = { notify[o][t] | o in Objects, t in Threads} notifyAll = {notifyAll[o][t] | o in Objects, t in Threads} `JVM' Processes: ---------------- There are two processes for each Object: LOCK = {LOCK[o] | o in Objects} WAIT = {WAIT[o] | o in Objects} where: alpha (LOCK[o]) = { claim[o][t] | t in Threads} union { release[o][t] | t in Threads} union { notify[o][t] | t in Threads} union {notifyAll[o][t] | t in Threads} LOCK[o] = [] (claim[o][t] --> LOCKED[o][t] | t in Threads) and where: alpha (LOCKED[o][t]) = alpha (LOCK[o]) LOCKED[o][t] = ( release[o][t] --> LOCK[o])] [] ( notify[o][t] --> LOCKED[o][t]) [] (notifyAll[o][t] --> LOCKED[o][t]) Also: alpha (WAIT[o]) = { wait-a[o][t] | t in Threads} union { wait-b[o][t] | t in Threads} union { notify[o][t] | t in Threads} union {notifyAll[o][t] | t in Threads} WAIT[o] = WAIT[o][0] where: alpha (WAIT[o][n]) = alpha (WAIT[o]), for all n >= 0 WAIT[o][0] = ([] (wait-a[o][t] --> WAIT[o][1] | t in Threads)) [] ([] (notify[o][t] --> WAIT[o][0] | t in Threads)) [] ([] (notifyAll[o][t] --> WAIT[o][0] | t in Threads)) and where, for n > 0: WAIT[o][n] = ([] (wait-a[o][t] --> WAIT[o][n + 1] | t in Threads)) [] ([] (notify[o][t] --> [] (wait-b[o][s] --> WAIT[o][n - 1] | s in Threads) | t in Threads)) [] ([] (notifyAll[o][t] --> RELEASE[o][n] | t in Threads)) and where: alpha (RELEASE[o][n]) = alpha (WAIT[o]), for all n >= 0 RELEASE[o][0] = WAIT[o][0] and where, for n > 0: RELEASE[o][n] = [] (wait-b[o][s] --> RELEASE[o][n - 1] | s in Threads) Counts (could be processes): ---------------------------- count = {count[o][t] | o in Objects, t in Threads} User processes: --------------- Each started thread in an application/applet/whateveret is a process. These processes form the ennumerated set: USER = {USER[me] | me in Users} where Users is a subset of (all Java) Threads. The alphabet for USER[me] is: alpha (USER[me]) = { claim[o][me] | o in Objects} union { release[o][me] | o in Objects} union { wait-a[o][me] | o in Objects} union { wait-b[o][me] | o in Objects} union { notify[o][me] | o in Objects} union {notifyAll[o][me] | o in Objects} Each "USER[me]" must initialise "count[o][me]" to zero, for each "o" in Objects. A Java "synchronized" block: synchronized (o) { P } is part of the life of "USER[me]", for some "me", and is modelled by: IF count[o][me] = 0 (claim[o][me] --> P') ; (release[o][me] --> SKIP) count[o][me] > 0 P' where: P' = count[o][me]++ ; P ; count[o][me]-- The Java code: o.wait (); is modelled by: release[o][me] --> wait-a[o][me] --> wait-b[o][me] --> claim[o][me] --> SKIP The Java code: o.notify (); is modelled by: notify[o][me] --> SKIP The Java code: o.notifyAll (); is modelled by: notifyAll[o][me] --> SKIP A Complete System: ------------------ A complete Java multithreaded system is just the parallel composition of all the `JVM' and user processes: (|| (WAIT[o] | o in Objects)) || (|| (LOCK[o] | o in Objects)) || (|| (USER[t] | t in Users)) CONCLUSIONS: ============ See up-front! Peter Welch. (26th. March, 1999)