-.- The 1st intelligent bug-search engine for model checking by APPROVER = The 1st Algorithmic & Proven Protocol Verifier Copyright (C) 1977, 2002 .. 2012-07-22 , Jan Hajek , NL NO part of this document may be published, implemented, programmed, copied or communicated by any means without an explicit & FULL REFerence to this author together with the FULL title and the website www.humintel.com/hajek for the freshest version, or www.matheory.info , plus the CopyRight note in texts and in ALL references to this. An implicit, incomplete, indirect, disconnected or unlinked reference (in your text and/or on www) does NOT suffice. All based on 1st-hand experience. ALL rights reserved. Version 1.20 with lines < 79+CrLf chars in ASCII, submitted to the webmaster of http://www.humintel.com/hajek This epaper may read better outside your email (more of left margin, PgDn/Up). It has new facilities for fast finding and browsing. Please save the last copy and use a file differencer to see only where the versions differ : download Visual Compare VC154.ZIP and run VCOMP vers1 vers2 /k /i which is the best & the brightest colorful line-by-line comparer for plain .txt files. Your comments are welcome, preferably interlaced into this .txt file. New terms: PQ-search , IQ-search = intelligent queueing search , P/S-graph = progress/stagnation graph , tempoblocking , syslock , bugheat , datalgo = datalgorithm , NAK-race , and ALL inflections of ALL these new terms are also subject to my conditions and CopyRight above. ALL based on experience. ALL rights reserved. Browsers may like to search here for the following markers : !!! !! ? { refs } -.- separates sections +Abstract: Constructive descriptions of synergistically designed datalgos (= data structures + algorithmics) for model checking of protocols are provided & richly documented with flowcharts, diagrams and P/S-graphs ( by January 19, 1995, I snailmailed 21 of my viewgraphs to Gerard Holzmann at Bell Labs. 
Now everybody can see them on my www site).
-.-
+Contents : +Mottos +Basic & new terms, abbreviations & symbols +Executive summary = the beef at a model checking party +APPROVER's opeRational aspects +AntiXplosion techniques in APPROVER +APPROVER's perfect data structure +IQ-search = intelligently queued search in model checking +Efficiency of APPROVER's liveness-checking phase +APPROVER's advanced pattern recognition of liveness bugs +The rest added in 2012 : +Pattern recognition of liveness violations ( 10 lines ) +Bug-finding cybernetics of safety checking +Fresh ideas on priority functions for safety checking +Fresh ideas on liveness 35 years later +References
-.-
+Mottos : I leave to the various futures (not to all) my garden of forking paths ... this network of times which approached one another, forked, broke off, or were unaware of another for centuries, embraces all possibilities of time. { Jorge Luis Borges, The Garden of the Forking Paths, The Labyrinths, earlier quoted in { HJ79 } } One ounce of insight is worth one megaton of hardware. Make'em, break'em, store'em, dead or alive. If in doubt, take'em out. Only the collaborators, meek-type opportunists and the dead fish swim with the stream.
-.-
+Basic & new terms, abbreviations & symbols : vugraph = a view-graph from the set of those posted at the end of this www site. := an assignment as in LHS := RHS i.e. left-hand side := right-hand side & is a concatenation, or "and" (depends on the context) asap = as soon as possible (our goal is to find bugs asap ) B7700 = Burroughs mainframe running Burroughs Extended Algol ( BEA ) in 1977; BEA may be characterized as a Pascalized C; word size 48 bits. var = a variable modeling = modelling = specifying the algorithmic essentials of (a set of) computations or communications processes; a model is a simplified ie less hairy hence manageable image of reality.
LIST = Setree on the vugraph of Combined Data Structures, stores all states, DEAD or LIVE; the basic operation on LIST is the set membership testing i.e. a state present/absent. The 2ndary operation is the backtracing of the path from the root of the genealogical tree to an error-state, if found. LIVE = a state to be expanded later; or LIVE on the vugraph of my Combined Data Structures DEAD = a state which has been expanded already (must be remembered anyway), which holds also on the vugraph of Combined Data Structures, but !!! on the vugraph of Liveness Verification phase, when all states in LIST were expanded i.e. DEAD, there : DEAD is a Boolean variable for detection of deadlocks of all kinds; please read DEAD as LOCKUP on the vugraph for Liveness Verification. envelope = all control fields in a message (eg a msg nr, an ACK nr, CRC, etc) ultimate effect = an irreversible, irrevocable effect on an effector variable which in general is NOT a control variable of a program or of a communicating process and is not a part of a state vector. EIR = effective invariant relation (concerns irrevocable ultimate effects only) EPF = effective progress function (concerns irrevocable ultimate effects only) PTC = proper termination condition (if any specified) media = communication channels systematic search = not skipping any possibility ( of interleaving of atomic actions [ a tomos = indivisible in Greek ] ). event = an external event (eg an incoming message) , or an internal event (eg a timeout in a process, or just an advancement of what used to be called an [implicit] "instruction counter"). expansion of a state = by application of several events to an event-driven algorithm, several states (some new, some old) will be generated. 1st-out = to be expanded next (while other states wait to be expanded later; waiting means delay ie stacking or queueing in one way or another). 
priority = a heuristic evaluation function (like for evaluation of a position in chess ie of a situation on a chessboard) FIFO = 1st-in-1st-out ie a queueing discipline or order of expansion. LIFO = last-in-1st-out ie a stacking discipline or order of expansion. PQ = a priority queue (FIFO and LIFO are 2 extreme cases of 2 simplest PQs) PQ-search = a priority queue driven search in general IQ-search = intelligent queueing search for model checking a la APPROVER. P/S-graph = effective progress/stagnation graph; effective means that a mere activity (like eg thrashing) is not enough, some meaningful effects must be effectuated (eg a mere transmission of a message means nothing if it is thrown away as a [pseudo]duplicate; a message must be finally and irrevocably ie irreversibly accepted). SCB = system control block (here the & means a concatenation); = a state vector & shadow variables & auxiliary variables, where : state = state vector = control variables ( CV ) local to and shared by the interacting processes, eg : - process control blocks (PCBs) including STATUS-variables (if a model is specified as procedures mixed with some FSA like TCP was), - BREAKpointers, plus - the message envelopes stored in the media, and also - variables used in the effective invariant relation EIR A state contains bounded variables only (ie vars of a limited range). SCB's name in Approver's code was ALLV = all variables. SHV = shadows = shadow var = variables not belonging to a state, typically: - effectors = MONOTONICALLY NONDECREASING variables used only in EPF (effective progress must be measured by nondecreasing functions only), and in EIR . - data/text parts in the media, if any ; auxies = auxiliary vars = BackTurn , BackEventGen , BackLqual , BackSeqNr , ParcOut ( for no syslock ie system lockup ), TurnField ( for tempoblock ), HashKey ; eg ParcOut = P-arc outgoing from ... 
gotta see dz code :-) The rules of the LHS := RHS game: Rule1: Assertions may contain control vars ( CV ) and shadows ( SHV ). Rule2: SHV := fun( CV, SHV ); Rule3: CV := fun( CV ); a shadow or an auxie must never influence a CV , neither directly, nor indirectly; hence : in CONSTRUCTIVE terms a SHV must never appear: - on an RHS of CV := ...; ie CV := ... SHV is forbidden ; - in a condition (of a loop, or of an if-then-[else] ) - as an actual parameter or its value in procedures or functions which contain CV on the LHS := ... ; If in doubt, this is quite easily checked syntactically (eg via regexps or decent xrefs).
-.-
+Executive summary : On the occasion of the 25th anniversary of the 1st public introduction of APPROVER at the 16th Lake Arrowhead Workshop on Current Issues in Computer Networking and Distributed Computing, August 31, 1977, at the UCLA Conference Center, I have decided to finally disclose the full truth and nothing but the truth about APPROVER, which eg in February 1978 found design bugs in the basic algorithm of the TCP { HJ78b }, { SC78 } on which ARPANET was ticking then, and 25 years later our beloved Internet is still TCP-driven. This epaper reveals for the 1st time certain datalgorithmic secrets which were beating in 1977 (no typo, not in 1877 :-) in the heart of APPROVER = The 1st Algorithmic & Proven Protocol Verifier. My new word "datalgorithm" expresses the synergistic union of a perfectly designed datastructure with algorithms, ultimately data-driven by a priority heuristic. The only change I make to this description is the change of the interpretation of the name APPROVER from Automated ... to Algorithmic ... in order to stress the little known fact that from early on APPROVER accepted models specified as algorithmic procedures in Burroughs Algol { HJ78b } and not just as some simplistic micromodels of finite state automata like eg an FSA-matrix.
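As an aside to the Rule1-Rule3 shadow discipline stated in the terms section above ("a shadow or an auxie must never influence a CV"): two of the three forbidden cases really are easy to check syntactically with regexps, as noted there. Below is a minimal modern sketch in Python, not APPROVER's BEA code; the variable names and the line-oriented `:=` syntax are invented for illustration only:

```python
import re

# Hypothetical name sets; in a real check they would come from the model.
CONTROL_VARS = {"status", "break_ptr", "msg_nr"}   # CV
SHADOW_VARS = {"Nfh", "Nac"}                       # SHV

def check_shv_discipline(lines):
    """Flag two syntactic violations of Rule3 ('a SHV must never
    influence a CV'): an SHV on the RHS of a CV assignment, and an
    SHV inside an if/while condition. Returns (line_nr, reason) pairs."""
    violations = []
    assign = re.compile(r"^\s*(\w+)\s*:=\s*(.*)$")
    cond = re.compile(r"\b(?:if|while)\b(.*)")
    for n, line in enumerate(lines, 1):
        m = assign.match(line)
        if m and m.group(1) in CONTROL_VARS:
            rhs_names = set(re.findall(r"\w+", m.group(2)))
            if rhs_names & SHADOW_VARS:
                violations.append((n, "SHV on RHS of CV assignment"))
        c = cond.search(line)
        if c and set(re.findall(r"\w+", c.group(1))) & SHADOW_VARS:
            violations.append((n, "SHV inside a condition"))
    return violations
```

The third forbidden case (an SHV passed into a procedure which assigns to a CV) needs a cross-reference listing rather than a line-local regexp, exactly as the "decent xrefs" remark above suggests.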
APPROVER has performed a reachability analysis of protocols as a systematic guided ie informed search for errors in the models of algorithms which are traditionally and properly classified as protocols (for interprocess interaction, data communication, networking, mutual exclusion, etc). The search for bugs was: - systematic ie all possible specified interleavings of parallel, distributed processes communicating via (un)reliable media channels were checked for safety, liveness & proper termination PT , if any specified. - Safety was specified by assertions of the EIR kind ie as effective invariant relations, while: - Liveness was specified in terms of EPF = effective progress function automatically translated (see ** ) into & was checked via then new P/S-graphs = progress/stagnation graphs introduced for the 1st time in { HJ77a }, { HJ78b }. The liveness phase searched for : - lockups : - dynamic lockups (I called them deadlocks too, see Fig.3b in { HJ78b } p.754 , where a lockup is visualized ) - static lockups = classical deadlocks Lockups were detected as an S-subgraph from which no path with a P-arc leads , ie a P-transition (meaning an effective progress) is not reachable from such an S-subgraph anymore. - tempo-blockings (coined by me in early 1977) = temporary lockups - STB = system tempo-blocking is a TB-cycle of several processes - ITB = individual tempo-blocking, a TB-cycle of a single process / task See { HJ78b } for Fig.1b on p.753 where a TB is visualized clearly. Tempo-blockings were detected as S-cycles ie stagnation cycles consisting of S-arcs only, from which a path containing a P-arc leads ie effective progress was reachable. - Proper termination = (if specified) an ability to terminate properly from all reachable compound states. "Properness" was specified with assertions called PTC = proper termination condition. An example of a liveness bug is a NAK-race , which is my term for processes which are busily NAK-ing (not ACKing) each other's outputs.
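The two detection rules just stated (a lockup = an S-subgraph from which no P-arc is reachable anymore; a tempo-blocking = an S-cycle from which a P-arc is still reachable) can be sketched on a tiny P/S-graph. This Python illustration is mine, with invented names; it is not APPROVER's datalgo, only the logic of the rules:

```python
def reach_p(graph, start):
    """True iff a P-arc (effective progress) is reachable from `start`.
    graph: state -> list of (successor, label) with label 'P' or 'S'."""
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        for v, lab in graph.get(u, []):
            if lab == 'P':
                return True
            stack.append(v)
    return False

def classify(graph):
    """States from which no P-arc is reachable are lockup states
    (static deadlocks have no arcs at all; dynamic lockups cycle on
    S-arcs forever); the rest can still make effective progress."""
    lockup, live = set(), set()
    for s in graph:
        (live if reach_p(graph, s) else lockup).add(s)
    return lockup, live

def tempo_blocked(graph, start):
    """True iff `start` lies on a cycle of S-arcs (temporary stagnation)
    while a P-arc is nevertheless still reachable: a tempo-blocking."""
    if not reach_p(graph, start):
        return False                      # that would be a lockup
    seen, stack = set(), [start]
    while stack:                          # look for an S-path back to start
        u = stack.pop()
        for v, lab in graph.get(u, []):
            if lab != 'S':
                continue
            if v == start:
                return True
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False
```

For example, in a graph where B and C only shuffle S-arcs between themselves, both are lockup states even though they stay busily active, which is exactly the dynamic-lockup case above.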
- The search was a directed search ie a guided search ie a heuristic search of the best-1st-out kind or BFO . The classical searches FIFO and LIFO are naive because they use the sequential number of the state generation as a naive search-ordering criterion : - the breadth 1st search = queue-driven BFS = FIFO = 1st-in-1st-out, with the following properties: simple, raw speed, voracious use of memory, but guaranteeing the absolutely shortest paths to bugs/errors, if any. - the depth 1st search = stack-driven DFS = LIFO = last-in-1st-out, with the following properties: simplicity, raw speed, uses much less memory than the FIFO-search does, but too long paths to errors, if any found. The longer the path to a bug the more (wo)manhours it takes to interpret it. LIFO might seem ok iff no bugs are present, but aside from the sad fact that there are almost always some bugs, the real catch-22 is that we can never be 100% sure that there are no bugs if the model's combinatorially exploding reachable state space does not fit into available memory, which happens more often than not with most of the realistic models which were not oversimplified. Hence the urgent need to use a smart, informed, intelligent IQ-search to find bugs asap, which is not just a matter of technocratic & economic efficiency, but rather a matter of life or death for bugs :-) Even if no bugs are present, there is a huge difference between wishing there are none and knowing it. Confidence has its price-tag and it is worth that price.
-.-
+APPROVER was (in its opeRational aspects) : - A batch program ie fully automated, no interactions, no messing; when Carl Sunshine, then with the cold-war thinktank RAND Corp. and an ex-PhD student of Stanford professor Vinton Cerf ( Cerf co-designed TCP with Robert Kahn of ARPA , later DARPA ), visited me in February 1978, he was surprised by 100% automated batch runs spitting out deep design bugs in TCP, which he was authorized to fix.
- The model to be checked was specified as procedures, and channels were prepared as "defines" in BEA ie as in-line substitutions in APPROVER ; - There was no precompiler, the Algolic specs were inserted at fixed places between two fixed kernel-parts of APPROVER and compiled as a single fast program (no slow interpreter, no precompiler to program & to maintain :-) - Except for the very 1st versions in early 1977 which used a FIFO-search, APPROVER used searches which can be named as: intelligent search, smart search, informed search, heuristic search, guided search, directed search, best 1st search, MaxFO search = Maximum-priority-1st-out search. I will call it shortly IQ-search = intelligent queue(ing) search, or informed queue search, or a PQ-search = priority queue search. Btw , a MaxFO-heap PQ can be turned into a MinFO-heap PQ (and v.v.) simply by changing priority's sign.
-.-
+AntiXplosion techniques in APPROVER : 1. Few bits were used per variable (most changed modulo small-ranged M ), so that a dozen small-ranged variables were packed in one 48-bit word. Eg a small modulus M of message numbering (hence also ACK-nrs) was used. Packing saved RAM and made comparisons and copying faster. Anti-overflow checking is not needed on unsigned integers operated upon by modulo M. 2. Breaks : by inserting home-made Breaks into the models of protocols, only the necessary atomic ie indivisible actions were specified, hence all necessary points of interleaving. By specifying only the necessary breakpoints (like eg message transmitted or received) I got a coarse !!! grained interleaving, hence only a minimal necessary number of states generated. 3. In the media ie channels: - CRC was represented by only one bit and the rest of such an invalid message was cleared ie its bits were set to 000...000, so that there was only one kind of a CRC-invalid message .
- Nonsequential channels were modelled as sets of messages, thus avoiding all the possible permutations of sequences. Nevertheless all of these messages in a set would be received. Hence all possible I/O sequences were systematically tried out without the combinatorial explosion which would otherwise have taken place due to unnecessary permutations of the contents of the media . 4. Shadow variables were not a part of the SCB . They were used only within EIR , EPF , PTC . 5. The liveness checking phase was (because it had to be) executed with all messages valid (ie with CRC ok). Q: why ? A: read under the **
-.-
+APPROVER's perfectly designed data structure : Three synergistic data structures jointly supported highly efficient storage & execution of all frequent operations : - HashTable HT for instant LIST-membership testing of a state, DEAD or LIVE. - LIST of all unique states with 2 kinds of independent explicit links: - chains of hashing synonyma. Alternatively, an open addressed hashing could do without these particular links (then the LIST would have become an open addressed hashtable). Hence no unresolved collisions, no losses, no blue-haired SPINsters gambling a la Vegas or Reno. - genealogical tree with one link from each state to its 1st predecessor (from which it has been expanded ie generated by an application of an event). Such genealogical links are not strictly necessary for bugfinding, but they are necessary for bugtracing ie for backtracing a bug (no backtracking involved) ie for a reconstruction and human interpretation of a path to a bug, if any found. LIST could be a linkless implementation of a set of states ( DEAD or LIVE ) as an open addressed hashtable, if no bug was expected anymore, thus saving memory. - Priority queue PQ for an IQ-search: each element in a PQ contained two variables : the priority and the index (or pointer) to a LIVE state on the LIST .
So the PQ contained only the Prio-weights of & pointers to the LIVE states waiting to be expanded and thus become DEAD , though staying on the LIST until the end of a model checking run. A DEAD state was immediately deleted from the PQ [this took time O(log(H)) with H elements in the PQ ] including the restoration of the heap's top from which the just expanded ( hence now DEAD ) state was deleted by replacing it with the very last state in the heap and sifting it down until its proper position determined by its priority was reached. Sifting is "hole moving" which is about twice as fast as swapping (similarly in a non-naive insertion sort). An alternative theoreticians' view is that a deletion of the top element from a HeaPQ splits it into two heaps which have to be merged into one heap. Q: which description do you prefer ?
-.-
+IQ-search = intelligently queued search for model checking : Strictly speaking I have told everything about APPROVER, but not always all of it to everybody. For example, by January 19, 1995, Gerard Holzmann at Bell Labs had received from me 21 viewgraphs including all flowcharts of APPROVER's key datalgorithms. Even my IQ-search engine was explained in my courses, but none of the wannabe structured programmers paid much attention to it. But the situation has changed since 1977; there is more code, less structure, more bugs, more mess, so maybe also more attention to non-naive solutions :-) Searches like LIFO or FIFO (= DFS = depth-1st or BFS = breadth-1st) are naive brute force searches which seem to run fast only to naive junior programmers. APPROVER used a priority queue PQ to find safety bugs asap . The PQ was implemented as a heap (hence I called it a HeaPQ ). A heap is a beautiful linkless implementation of a PQ which always has its most important member at its top, hence instantly accessible.
An insertion into a heap or a deletion of the top element from a heap takes O(log(H)) time where H is the number of currently heaped elements (ie not the number of states on the LIST ). Hence we know how to RAM-efficiently (because linklessly) implement a PQ working at a reasonable raw speed slower than a simplistic LIFO, but due to being a "PQ with IQ" with an incomparably higher chance to find a safety bug, if any, in no time. The key question was (and is, until you read this) : Q: How to formulate a priority criterion ie a heuristic which will feel the heat of a bug, if any, and ferret it out much faster than a naive, though seemingly fast brute force LIFO-search aka DFS would ? A: My solution for APPROVER's safety phase was based on an analogy with heat-seeking missiles which occurred to me while watching (on tv) , I guess, Israeli jets throwing away aluminum flares as decoys to mislead infrared heat-seekers. Roughly speaking, a heat-seeking missile glides along the steepest gradient of heat towards its target. On one hand the design of such a missile is easy, because heat is a well understood concept, while bugs are not, otherwise they would not hit us all the time. On the other hand, a missile must hit a moving target, while bugs are sitting ducks just waiting to be debugged (unless their makers are too fast re-typists :-) so I did not have to invent a logical analogon of a Kalman filter to predict a bug's trajectory. But the problem with bugs was and is that they do not emanate any radiation at anybody but (un)lucky me, who has felt their heat back in 1977. Anyway, in my view since 1977, model checkers' targets are the potential bugs to be found in a haystack called the reachable statespace, and the $1M IQuestion proper was & is how to make a bug (if any) emanate some logical equivalent of a radiation, or shortly heat, so that APPROVER could guide the search to such a source of bugheat , which was APPROVER's goal.
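The HeaPQ just described, with its hole-moving sift (insertion sifts the new element up, deletion of the top sifts the very last element down), can be sketched in a few lines. This is a modern Python illustration with names of my own choosing, not the 1977 BEA code:

```python
class HeaPQ:
    """MaxFO heap of (priority, state) pairs. Both operations move a
    'hole' instead of doing pairwise swaps, which roughly halves the
    number of element moves, as with a non-naive insertion sort."""
    def __init__(self):
        self.a = []

    def insert(self, prio, state):
        self.a.append((prio, state))
        i, item = len(self.a) - 1, self.a[-1]
        while i > 0 and self.a[(i - 1) // 2][0] < item[0]:
            self.a[i] = self.a[(i - 1) // 2]   # move parent into the hole
            i = (i - 1) // 2
        self.a[i] = item

    def delete_top(self):
        """Remove and return the max-priority element in O(log(H)):
        the last element fills the hole left at the top and is sifted
        down to its proper position determined by its priority."""
        top = self.a[0]
        last = self.a.pop()
        if self.a:
            i, n = 0, len(self.a)
            while True:
                c = 2 * i + 1                  # left child of the hole
                if c >= n:
                    break
                if c + 1 < n and self.a[c + 1][0] > self.a[c][0]:
                    c += 1                     # pick the larger child
                if self.a[c][0] <= last[0]:
                    break
                self.a[i] = self.a[c]          # move child up into the hole
                i = c
            self.a[i] = last
        return top
```

Note that with the generation number (or its negation) as the priority, the very same HeaPQ degenerates into the naive FIFO or LIFO searches; the whole point of the IQ-search is to plug a bugheat heuristic in instead.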
For checking of the models of communication protocols I have invented and implemented in 1977 the following basic model of model's bugheat :

The enigma of the IQ-search revealed - or -
|------ Bugs are our goals, and bugheat is our leading light to them :
|
! Every orderly interaction is somehow constrained in its ultimate effects.
! Eg a data transfer goes via at least one bounded buffer or queue eg via
! a communication channel of a limited LOGICAL capacity to transfer, and/or
! physical capacity to store, at most Nmax packets or messages, hence the
! effective invariant relation EIR (involving the ultimate ie irrevocable
! effects only, ie no control variables CV in principle) is eg this
! archetypal EIR :
!
! EIR := 0 <= Nfh - Nac <= Nmax
!
! Shadows : Nfh = nr of unique messages fetched (once only) for transmission
!           Nac = nr of unique messages accepted (ie received-and-also-
!                 irrevocably accepted as valid nonduplicate finals)
!           (unique = distinct = nonduplicate = new = fresh)
!
!!! Priority := Nfh - Nac ; Quiz: do you already feel any bugheat ?
!
! While the ideal of correctness guys & gals is to (make others believe
! that they) design algorithms so that an invariant is maintained ie not
! violated, my key idea was to make them stand on their heads, so to speak,
! by actively pushing towards the violation of an EIR. An opeRationalization
! of the meta-metaphor of "making correctness czars stand on their heads"
! (which has not made me particularly popular in their ego-centric circles)
! was to create a POSITIVE FEEDBACK trying hard to "destabilize" asap the
! invariant EIR as much as feasible, so that it will be VIOLATED asap , if
! possible. A negative feedback tries to minimize a DIFFERENCE , while my
! positive feedback tried to INCREASE it, and the above EIR offers
! a READILY AVAILABLE DIFFERENCE in particular. The greater the difference,
! the greater the "bugheat" ie the greater the chances that a protocol will
! go wrong, eg will irrevocably lose a message or will irrevocably accept
! the same message twice. Hence during the safety verification phase, my
! priority queue PQ used the shadow variable Ndif = Nfh - Nac as the
! priority ie the heuristic evaluation function. This way safety bugs were
! found in no time, usually in just a few seconds in sliding window
! protocols on a B7700 mainframe in 1977 (= now an old Pentium ?). So much
! for the basic version of my key idea to steer or push a protocol towards
! its constraints expressed as an EIR. Needless to say, the generality of a
! PQ lends itself to literally infinite flexibility (and potential
! creativity) in formulating all kinds of priorities between both naive
! extremes of a FIFO = BFS and a LIFO = DFS , so that various tradeoffs
! between the use of available memory and the time (in which safety bugs
! are found by steering towards violations of EIR ) are possible.
!
! Example1: EIR , EPF and Priority for a mutual exclusion algorithm with
!           2 critical sections A, B :
! EIR := not( inCsA and inCsB )
! EPF := NcsA & NcsB ie concatenated shadow counters of entries
!        into the respective critical sections.
! Priority := NcsA - NcsB; a heuristic for bugheat
!
!
! Example2: EIR , EPF and Priority for a stop-&-wait protocol for a
!           bidirectional data transfer between processes A and B :
! here nr means a shadow count (ie not any numbering in messages'
! envelopes which belong to the SCB , hence not to effectors )
! AfhA = nr of unique messages fetched & transmitted by A to B.
! AdeA = nr of unique messages (originally fetched & transmitted by A )
!        finally deleted by A so that they cannot be retransmitted.
! AacA = nr of unique messages effectively accepted (not just received)
!        by A from B.
!
! EIR := ( 0 <= AfhA - BacB <= 1 ) and
!        ( 0 <= AfhA - AdeA <= 1 ) and
!        ( 0 <= BfhB - AacA <= 1 ) and
!        ( 0 <= BfhB - BdeB <= 1 ) and
!        ( all AacA and all BacB accepted in proper sequence )
!
!
! EPF := AfhA & AdeA & AacA & BfhB & BdeB & BacB ie all shadows concatenated
!
! Priority := AfhA - BacB; or a similar heuristic for bugheat
!
! Note that if these shadow counters are designed to change in
! a NONDECREASING fashion only, then any change of an EPF during
! a state-transition from a predecessor to its immediate successor
!! ** is necessarily an EFFECTIVE PROGRESS (of a successor wrt **
! its predecessor) and the & is indeed literally just a concatenation.
! Hence the slightest inequality of their EPF-subvectors is sufficient
! to decide that a transition is a P-transition, while their equality
! means an S-transition ie a stagnation (though not necessarily a
! nonactivity) ie no effective progress wrt the predecessor. Hence my
!! constructively opeRational trick was to bytewise compare EPF-subvectors
!! ie the concatenated EPF variables of a successor with its predecessor's,
!! and iff they differed, then to set the P/S-bit as 1 for P, else 0 for S.
!! That was how the P/S-arcs were created already during the safety-phase.
!! It is obvious that when messages are lost or become CRC-invalid, they
!! will necessarily cause effective stagnations (despite busy activity).
!!! Therefore during the liveness-phase no events invalidating or losing
!! messages were applied. After the safety-checking-phase was finished
!! without any safety bugs found, the liveness-checking-phase was run by
!! always copying the single next state from the LIST into another array
!! ALLV (similar to LIST) of states to be checked for liveness properties,
!! and then expanding this state ST by applying to it, and to its successors
!! (to be) stored in ALLV , all events except those which would invalidate
!! or lose a message (as explained above). In contrast with ALLV , on the
!!! LIST there were all states reachable under all events possible, including
!!! the LQUAL-events making media unreliable ( LQUAL stands for "line's
!!! quality"). If a P-arc was made, then ST could not belong to a lockup
!!! cycle ( DEAD:=FALSE ), so the next state ST was copied into ALLV[0] and
!! the just-described search-cycle started again. See the flowcharts
!! enclosed. The number of nested loops which apply events to the state ST
!! is given by the number of subsets of events which are NOT mutually
!! exclusive; eg
!!
!! 0. Expand the best-1st state ST at the top of a priority queue PQ
!!    (this is not an event, just the outermost driving loop)
!! 1. a set of process' turns up to its next Break
!! 2. set of events (eg a timeout, or message propagation in a channel)
!! 3. set of line-ok/KO events ( message unharmed, lost, or
!!    CRC-invalidated )
!! Note that eg a message-propagation step is independent of the line
!! quality event (ok/KO), hence to cover all possible combinations they
!! must be in separate nested loops to provide for mutual NON-exclusivity.
!
! By now only junior programmers will not know how APPROVER's
! liveness-phase was programmed: simple, fast, few parts, easy
! maintenance :-)
!
!
! Example3: EIR, EPF and Priority for a Go-N-back unidirectional B-to-A
!           transfer:
! L = M-1, where M = the modulus of message numbers, hence
! L is a LOGICAL "buffering capacity" of the protocol; here
! LOGICAL means the designed "effective window width".
! Physical buffering capacity would be L+1 messages. Internet
! has a physical capacity virtually unlimited wrt a single
! pair of communicating processes, which nevertheless has a
! limited logical capacity due to a numbering limited by M .
!
! EIR := ( 0 <= BfhB - AacA <= L ) and
!        ( 0 <= BfhB - BdeB <= L ) and
!        ( 0 <= AacA - BdeB <= L ) and
!        ( all XacX accepted in proper sequence )
!
! EPF := AacA & BfhB & BdeB
!
! Priority := BfhB - AacA; or a similar heuristic for bugheat
!
! To my best knowledge, such a deadly effective systematic search for bugs,
! based on my key idea of intentionally trying to violate an EIR in a
! PQ-driven reachability analysis of protocols has not only never been
! published before 1977, but also not until now here. The idea of a positive
! feedback makes it reasonable to call it "bug-finding cybernetics" because
! kubernetes means a steersman in Greek, and the art & science of
! bug-finding is to steer our (wo)man-(wo)machine searches towards likely
! bugs asap .
|-------------
-.-
+Efficiency of APPROVER's liveness-checking phase:
Q: asked by Gerard Holzmann ( Bell Labs ): Is the complexity of the liveness verification phase quadratic in the number of states in the state graph ?
A: Fortunately not :-)
- Let N be the nr of distinct states reached (= generated & stored in LIST ) during the 1st phase = safety-checking phase.
- As said here above and as the flowcharts suggest, during the 2nd phase = liveness-checking phase a single next root-state ST was copied to the array ALLV[0] ( actually to ALLV[1] , the code tells me ), and expanded ie its successors generated and expanded by applying to it all but the bad-line events. When an effective progress was made ie a P-transition ie a P-arc , the root-state ST was no longer a part of an inescapable lockup (= DEADlock in my terminology of 1977), but ST could still be a part of an escapable tempo-blocking cycle ( TB ). Now watch out: QuickSort is theoretically a quadratic sorter, but this has no practical significance. Similarly, it makes little sense to meditate about the worst case scenario for APPROVER's liveness phase because :
1. For the overwhelming nr of states ST copied from LIST to ALLV[0] as roots for liveness-checking, the P-arc occurred after a small nr of transitions.
2. If an ST was a part of a lockup (= DEADLOCK ) then such a stagnation-cycle of inescapable S-arcs was found during expansions of this ST and its successors all happily listed together in ALLV . To find a lockup did take more time than to reach a P-arc in 1.
here, but then we were finished, because One lockup a day, keeps the users away anyway :-) ie when I find a lockup, I either proclaim the protocol to be no good, or I try to fix it in which case I have to re-run APPROVER from the very start. Summarizing 1. & 2. : finding a lockup, if any, didn't take much time, and if found, we were done anyway (with that version). In other words, finding a lockup-cycle when the current ST is part of it might have cost a few seconds extra in 1977, but that didn't matter, because then we should stop anyway. 3. Remains the situation when no lockup was found ie we are in the happy cycle of 1. here, when a small nr of states are expanded per each state loaded from LIST to ALLV[0] as a next-root-for-liveness checking. DEADlock detection causes no slowdown. What remains is APPROVER's search for tempoblockings aka TB (kinda tuberculosis :-). Do they cause a slowdown ? No! Q: why not ? A: because they were always present in large numbers (they are unavoidable, methinks, except perhaps in tightly coupled tasks like eg mutex = mutual exclusion algos), and no one wants to see them all, so I always set a couple of run parameters ( APPROVER was not an interactive prog, it ran in batch-mode and therefore it had to be fully automated ) telling how many DEADlockups and TB's I want to see. As I said, 1,2,3 LOCKUPs and 1,2,3 TBs is more than enough to start fixing, and then a new run of a verifier must begin from scratch anyway. Hope your thirst for arcane algorithmic knowledge is quenched with my no-cheapo answer, so that you don't have to fight with windmills while SPINning. This is NOT a joke! What I want to say is that one can keep oneself busy with solving non-problems, and many theoreticians do so their whole lives. I was done with APPROVER essentially in one year. Now I feel finished by this Total Recall.
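The per-root liveness phase described above (copy the next state ST from the LIST into ALLV , expand it without the bad-line events, and stop for that root as soon as a P-arc proves that escape is possible) can be sketched as follows. This is a loose modern Python analogy with invented names, not the flowcharted BEA code; `expand_good(state)` stands for the expansion under all non-message-harming events:

```python
def liveness_check(states, expand_good):
    """For every root from LIST, expand (bad-line events excluded) until
    the first P-arc shows effective progress is reachable (DEAD:=FALSE,
    next root). Roots exhausted without any P-arc are lockups.
    expand_good(state) yields (successor, is_progress) pairs."""
    lockups = set()
    for root in states:
        allv = {root}            # the per-root store, a la the ALLV array
        work = [root]
        progressed = False
        while work and not progressed:
            st = work.pop()
            for succ, is_p in expand_good(st):
                if is_p:
                    progressed = True      # a P-arc: this root can escape
                    break
                if succ not in allv:
                    allv.add(succ)
                    work.append(succ)
        if not progressed:
            lockups.add(root)
    return lockups
```

This captures why the phase is not quadratic in practice: for the overwhelming majority of roots the inner loop exits after a few transitions. Recognizing tempo-blockings (escapable S-cycles) needs an extra cycle test, omitted here.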
-.- +APPROVER's advanced pattern recognition of liveness bugs ( 22 lines ) : So far we were concerned with bug-finding and backtracing, but what about bug interpretation, which must be performed by the fragile human mind ? Liveness violations are particularly funless to interpret, because unlike the safety-bugs formed by a single state and a path to it, a liveness bug consists of a multi-state stagnation-subgraph ( S-subgraph, or S-cycle ) and a path to it. The only single-state liveness bug is the classical deadlock. The strenuous human task of interpreting liveness bugs was made easier by APPROVER's liveness-bug-pattern recognition algorithmics, which were at the cutting edge in 1977, and probably are still advanced 25 years later in 2002. Here the key idea was to apply a COMMUTATIVE HASHfun to an S-subgraph which represented a lockup or tempo-blocking. Normally a hashfun should be computed by means of non-commutative operations. But if we want to recognize various S-subgraphs as essentially equivalent permutations of the same states, we have to use a COMMUTATIVE HASHfun. When done with this bug-pattern recognition, it was easy to backtrace a path from any of the states in an S-subgraph to the root. Backtracing could be done from all the states in an S-subgraph and the shortest path used for display. This filtering can be combined with displaying only small(est) S-subgraphs. The length of a path to the to-be-displayed S-subgraph can be taken into account, so that the human effort spent on interpretation of a liveness violation can be greatly reduced. Happy bug-reading :-)) -.- +The rest added in 2012 : -.- +Pattern recognition of liveness violations ( 10 lines ) From the 3 facts, namely that : 1. each state serves as a seed for BFS , and 2. a liveness violation is an S-subgraph, and 3.
a path from the root state (of the total state space, not just a seed state of a BFS ) to each state exists, it follows that every violation of liveness is detected as many times as there are states in its S-subgraph, although the paths to it differ. To avoid unnecessary interpretations of identical S-subgraphs, Approver recognized permutationally ISOMORPHIC S-subgraphs. The key idea is to use a COMMUTATIVE HASHing FUNction (based eg on multiplications or additions), so that hashing synonyms will filter all but one isomorph out. -.- +Bug-finding cybernetics of safety checking : The priority heuristic, guiding the systematic search by reordering it, was directly inspired by a heat-seeking missile as-if selecting the trajectory along its maximum gradient of heat. The operational insight is that priority tactics are trying for ever increasing PrF values ie differences via a positive feedback. Yet on the strategic & teleological level, the PQ operates as a cybernetic mechanism with negative feedback, trying to decrease the heuristic distance from targeted as-if bugs. Such heuristic searches discovered bugs in haystacks of reachable state-spaces in seconds. Hence it is reasonable to believe that a protocol is safe even when a search cannot be completed due to exhausted memory filled with combinatorially exploded states. Then more certainty obtains from re-checking safety with various priority functions. The easiest is to re-explore the state-space from the 'opposite side' of its vast universe by simply changing the sign of PrF , thus turning the maximum-1st-out ( MaxFO ) PQ into the minimum-1st-out ( MinFO ). While the direct inspiration was heat-seeking missiles, the meta-inspiration was Hajek's years-long interest in cybernetics in general, and in the works of the leading British cybernetician W. Ross Ashby in particular.
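The commutative hashing of S-subgraphs from the two pattern-recognition sections above can be sketched as follows (a minimal Python sketch; the per-state fingerprints and the prime modulus are my own hypothetical choices, not APPROVER's 1977 code):

```python
def commutative_hash(states, mod=2**61 - 1):
    """Order-independent fingerprint of an S-subgraph (sketch).
    Addition mod a prime commutes, so any permutation of the same
    states yields the same value."""
    h = 0
    for st in states:
        h = (h + hash(st)) % mod   # '+' commutes; '*' would do as well
    return h

# Two detections of the same tempo-blocking cycle, found in different orders:
cycle_a = ["s7", "s3", "s5"]
cycle_b = ["s3", "s5", "s7"]
assert commutative_hash(cycle_a) == commutative_hash(cycle_b)

# Filter: keep only one representative per isomorphism class of S-subgraphs.
seen = {}
for cyc in (cycle_a, cycle_b, ["s1", "s2"]):
    seen.setdefault(commutative_hash(cyc), cyc)
print(len(seen))  # 2 : only 2 distinct S-subgraphs survive the filter
```

Multiplication mod a prime would serve equally well; all that matters is that the operation commutes, so that permuted listings of the same states collide.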
The last two chapters of Ashby's book An Introduction to Cybernetics, and his speculative "Design for an intelligence amplifier" (IA), identified selectivity as the key to "amplifying regulation" in very large systems. Referring to Ashby, Herbert Simon, the only winner of both the Nobel Prize for economics and the Turing Award for AI , agreed. Ashby's imaginary IA was supposed to employ homeostasis, ie a multidimensional negative feedback self-stabilizing towards a solution. Unlike the IA , Approver employed positive feedback to try to destabilize a protocol, with bugs (if present) as solutions. In his book The Sciences of the Artificial [ SI70 p.90-2 & 96-7], with biological evolution in mind, Simon used his parable comparing the (in)efficiencies of two unreliable watchmakers to show in quantitative probabilistic terms the power of "hierarchization" and of what Ashby has called "selection by components". Although Approver amplified selectivity via positive feedback tactics, its PQ-driven heuristic selectivity (of states for expansion) had a similarly huge effect on bug-finding as hierarchization. Unbelieving Thomases may recall the huge selection powers of a binary search on an ordered array or on a naturally ordered interval of real numbers. Since binary search can operate on an ordered domain only, it is virtually ie as-if hierarchical. Even a heuristically (because imperfectly) (p)re-ordered prioritized search has great powers to ferret bugs out of the proverbial haystack. The meta-metaphor is that EIR plays the role of an Archimedean point outside (!) the model (not Earth), while the corresponding priority mechanism serves as an intelligence gathering and amplifying lever for unhinging an unsafe algorithmic model of a protocol (fortunately not the World). -.- +Fresh ideas on priority functions for safety checking If a protocol's state space does not fit into RAM , we should try to disprove it with several PrF's.
The flexibility of a PQ-guided search allows for creativity in devising PrF's. As heuristics, PrF's should be tested on each protocol which was found safe only up to the point where it filled RAM. Then I recommend intentionally introducing a safety error and trying to find it with several PrF's. Repeat this with several safety errors to identify the PrF which has found the bugs in the minimum total time. This way one can learn to devise powerful PrF's. Now some hints on PrF's. A linear combination with a weight 0 <= w <= 1 , eg PrF = -( w*PLcurr + (1-w)*PLrest ) is allowed if its parts are commensurable. The - sign is for a MaxFO PQ and a short path to a possible safety violation. PLcurr is the current path length, ie the number of transitions (ie edges in the graph of states reachable from its root) to the current state; PLrest is the estimated number of transitions expected to be needed to reach a safety violation, if any is present. Now the question arises how to meaningfully estimate PLrest ? I have used similar PrF's already in 1977 (no typo, not in 1877 :-) but here come my fresh ideas of 2012. If , after a connection was established, it took PLcurr steps to reach NfhA - NacB , then a guesstimate is that it may take proportionally as many steps to reach the dangerous NfhA - NacB == Nmax + const. By proportionality PLrest/(Nmax + const - (NfhA - NacB)) == PLcurr/(NfhA - NacB), const > 0, so PLrest = PLcurr*( (Nmax + const +1)/(NfhA - NacB +1) - 1 ) where the extra +1 makes /0 impossible. A PrF with incommensurable parts should be, in general, a weighted geometric mean PrF = ((Part1**w1)*(Part2**w2)*...)**(1/(w1+w2+...)) with weights >= 0, ** as power raising, and all Parts of the kind "the more, the better". With each Part == (Value - its lower bound) as eg (NfhA - NacB), the geomean becomes essentially isomorphic with the Nash product in game theory, when the w's are bargaining powers. For more see [6] for "Ideal Decision Making", section "Idea7: How a consumer should choose". Back to practice.
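The linear PrF with the PLrest guesstimate, and the weighted geometric mean for incommensurable parts, can be sketched as follows (a minimal Python sketch under my own assumptions: the counters NfhA , NacB and Nmax are plain integers here, and Python's heapq, a min-heap, stands in for the MaxFO PQ by pushing -PrF):

```python
import heapq

def pl_rest(pl_curr, nfh_a, nac_b, n_max, const=1):
    """Guesstimated nr of transitions still needed to reach the dangerous
    NfhA - NacB == Nmax + const ; the extra +1's make /0 impossible."""
    return pl_curr * ((n_max + const + 1) / (nfh_a - nac_b + 1) - 1)

def prf_linear(pl_curr, nfh_a, nac_b, n_max, w=0.5):
    """PrF = -( w*PLcurr + (1-w)*PLrest ) for commensurable parts;
    the - sign favours short paths to a possible safety violation."""
    return -(w * pl_curr + (1 - w) * pl_rest(pl_curr, nfh_a, nac_b, n_max))

def prf_geomean(parts, weights):
    """Weighted geometric mean for incommensurable 'the more, the better'
    Parts; essentially a Nash product when the w's are bargaining powers."""
    prod = 1.0
    for p, w in zip(parts, weights):
        prod *= p ** w
    return prod ** (1.0 / sum(weights))

# A MaxFO PQ on top of heapq's min-heap: push -PrF, pop the highest PrF.
pq = []
heapq.heappush(pq, (-prf_linear(4, nfh_a=3, nac_b=1, n_max=8), "stateX"))
heapq.heappush(pq, (-prf_linear(2, nfh_a=7, nac_b=1, n_max=8), "stateY"))
neg_prf, best_state = heapq.heappop(pq)
print(best_state)  # stateY : guessed to be nearest to a safety violation
```

stateY wins because it already reached NfhA - NacB == 6 in only 2 steps, so its guesstimated remaining distance to the dangerous Nmax + const is the smallest.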
To disprove a data transfer protocol with a state space in excess of RAM, I would try the linear PrF above and at least the following three PrF's in the MaxFO PQ: PrF = NfhA - NacB ; PrF = -(Nmax + 2 - (NfhA - NacB)) ; and for a two-way data transfer this : PrF = (NfhA - NacB)*(NfhB - NacA)/((Nmax + 2 - (NfhA - NacB))*(Nmax + 2 - (NfhB - NacA))). -.- +Fresh ideas on liveness checking : Just in case you like the simultaneous search for deadlocks & TB-cycles , you should know that there is a Catch-22 to it, and trade-offs. Idea 1: While checking protocol's safety, among the events causing transitions were three channel/line quality events impacting messages/packets: lost, CRC invalid, CRC valid. The reason for checking liveness with "valid" only was that "lost" and "invalid" would generate too many obviously inevitable TB-cycles not worth our attention and time. Now I realize that with "valid" only I cannot exclude the possibility of a missed deadlock (depending on how an unknown protocol in a black box might react). Recalling Murphy's law Hofstadterized by me into "Murphstadter's law M: Whatever may go wrong it will, even if you take into account Murphstadter's law; goto M;" , it is fair to say, without apologies to Wittgenstein, that "What one cannot exclude, one must see as possible", or even shorter "Whatever cannot be excluded, is possible". Einstein's famous slogan "Der Herr Gott ist raffiniert, aber boshaft ist Er nicht" (The Lord is subtle, but not malicious) expressed his essentially Augustinian worldview aka Weltanschauung. This may hold for the physicist's Nature presumably created by the well meaning God, but not for living matter including mutating viruses and germs [ WI54 p.188] and certainly not for fallible, vicious or wicked human (mis)creations. Our post-modern world of artifacts, including the Internet, is increasingly Machiavellian and Manichaean, ie manipulated by capable sorcerers and powerful adversaries.
Recall counter-countermeasures like misleading radars with clouds of aluminum stripes, heat-seeking missiles with aluminum flares, antiwarhead missiles with metallic balloons, and the Stuxnet, Duqu or Flame/Flamer viruses used by Ahura-Mazda aka Ohrmuzd in his power struggle with Ahriman aka Angra Mainyu. It is up to us to assume a world of Augustinian demons of disorder (chaos and entropy), or a paranoid world of deceptive Manichaean demons [ WI54 p.34-36 & 188-193] and [ HJ78b p.751 ], or a mix of both. Fortunately for us, the possibility of a missed deadlock can be excluded simply by checking the liveness twice with the following switch. First hunt only for deadlocks with "valid", "invalid" and "lost" ; then hunt only for TB-cycles with "valid" only, hence channel/line quality events can be omitted. Even a junior programmer should be able to optionally switch between hunt-for-deadlocks-only and hunt-for-TB-cycles-only runs. The less tightly coupled the protocol (ie the larger its window size), the more TB-cycles will be possible even with "valid" only events. Therefore Approver had various options, also to limit the number of deadlocks and/or TB-cycles to be detected and viewed. Idea 2: When hunting for deadlocks only, substantial runtime savings obtain if already during the safety-checking phase we register in one shadow P-bit with each state whether or not a P-transition was made to at least one of the state's successors. If yes, then the state cannot belong to a deadlocked S-subgraph, hence such a state can be skipped as a seed for a BFS. The positive P-bit value can be propagated along the path back to the root state of the "Big Bang", thus substantially decreasing the number of states to be checked as possible deadlock states later. One ounce of preprocessing may be worth one kiloton of hardware or a megaton of CPU-time.
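Idea 2 can be sketched as follows (a minimal Python sketch under my own assumptions; the parent pointers are the same ones used for backtracing, and all names are hypothetical):

```python
def propagate_p_bits(parent, p_arc_from):
    """Shadow P-bit preprocessing (sketch, names hypothetical).
    parent[s] is the state from which s was first reached (None for root);
    p_arc_from is the set of states with a P-transition to some successor.
    Returns the set of states that may still belong to a deadlocked
    S-subgraph, ie the only seeds worth a liveness BFS later."""
    p_bit = set()
    for s in p_arc_from:
        # propagate the positive P-bit along the path back to the root
        while s is not None and s not in p_bit:
            p_bit.add(s)
            s = parent[s]
    return set(parent) - p_bit

# Toy reachability tree: root 0 -> 1 -> 2 and 0 -> 3 -> 4 ;
# state 2 made a P-transition, the branch 3 -> 4 never did.
parent = {0: None, 1: 0, 2: 1, 3: 0, 4: 3}
seeds = propagate_p_bits(parent, {2})
print(sorted(seeds))  # [3, 4] : only these need deadlock-checking later
```

The `s not in p_bit` guard stops each walk as soon as it meets an already-marked ancestor, so the whole preprocessing is linear in the number of states.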
Idea 3: Finally, it is wise to check for TB-cycles only after we are convinced that the protocol is safe and without a deadlock, otherwise the protocol has to be modified and re-verified before we start spending our attention and time on the far less serious and not completely avoidable TB-cycles. Fortunately, all three ideas, or "facts of liveness", combine perfectly :-) -.- +References { refs } : Q: Which refs are books and which are papers ? A: Unlike in the titles of (e)papers listed here, in the titles of books and periodicals all words start with a CAP, except for the insignificants like eg: a, and, der, die, das, for, from, in, of, or, to, the, with, etc. HJ77a = Jan Hajek : Self-synchronization and blocking in data transfer protocols, report THE-RC29286, April 1977; this was the 1st version of my Kyoto paper HJ78b which had to be submitted a year before the ICCC-78 was scheduled. The local pound-foolish & penny-wise Calvinists have not seen much use in my strange activities, so they wisely let me pay a 1/2 of the costs of the trip myself :-( Btw , it was clear to me at that time that Pentagon's ARPANET will put the squeeze on the Commies , and I was eager to help the beloved DoD to do just that. And history has proven me right: ARPANET's successor the Internet has become the most powerful freedom fighting machine of the Free West. HJ77b = Jan Hajek : Automatic verification and computer aided design of communication protocols, minireport THE-RC29458, May 12, 1977; this was a one page abstract of my talk given at the Lake Arrowhead Workshop, see HJ77c. HJ77c = Jan Hajek : Automatic verification and computer aided design of communication protocols, minireport THE-RC29458a, Sept 22, 1977; this was the final version of my presentation of APPROVER = Automated & Proven PROtocol VERifier at the Lake Arrowhead Workshop, California, August 31, 1977.
From its very beginning, APPROVER was sufficiently advanced to perform reachability analysis of protocols specified as procedures in Burroughs Algol, so that the much more primitive finite state models ( FSA ) had to be translated into Algol and not vice versa. ARPA's (later: DARPA ) TCP = Transmission Control Program (later: Protocol) has been modelled by Carl Sunshine, RAND Corp., and design bugs were found in TCP by APPROVER { Sunshine 1978 }. ARPANET was the mother of the Internet, which is still TCP-driven. HJ78a = Jan Hajek : Progress report on the Automatic and Proven Protocol Verifier, Computer Communication Review, ACM SigComm 8/1, January 1978, 15-16. HJ78b = Jan Hajek : Automatically verified data transfer protocols, Proc. 4th Intl. Computer Communications Conference, Kyoto, Sept. 1978, 749-756, with 4 protocols and their Effective Progress/Stagnation graphs aka P/S-graphs. On p.752, in the last but one line on the left, phaseA should be phasesA. HJ79 = Jan Hajek : Protocols verified by APPROVER, Computer Communication Review, ACM SigComm 9/1, January 1979, 32-34 ( originally INWG Note 169, Sept. 1978 ). SC78 = Carl A. Sunshine : Survey of protocol definition and verification techniques, Computer Communication Review, ACM SigComm, vol.8, nr.3, July 1978, 35-41; see Table 1 for APPROVER & Hajek & ARPA's TCP on p.39. SI70 = Herbert Simon : The Sciences of the Artificial , 1970. TA81 = Andrew Tanenbaum : Computer Networks, 1981; this 1st edition mentions APPROVER on p.182 without saying that it had verified all protocols (sliding windows too) in this book before it went to print. WI54 = Norbert Wiener : The Human Use of Human Beings - Cybernetics and Society , 1954. -.-