-.- The 1st intelligent bug-search engine for model checking by APPROVER = The 1st Algorithmic & Proven Protocol Verifier Copyright (C) 1977, 2002 .. 2012-07-22 , Jan Hajek , NL NO part of this document may be published, implemented, programmed, copied or communicated by any means without an explicit & FULL REFerence to this author together with the FULL title and the website www.humintel.com/hajek for the freshest version, or www.matheory.info , plus the CopyRight note in texts and in ALL references to this. An implicit, incomplete, indirect, disconnected or unlinked reference (in your text and/or on www) does NOT suffice. All based on 1st-hand experience. ALL rights reserved. Version 1.20 with lines < 79+CrLf chars in ASCII, submitted to the webmaster of http://www.humintel.com/hajek This epaper may read better outside your email (more of left margin, PgDn/Up). It has new facilities for fast finding and browsing. Please save the last copy and use a file differencer to see only where the versions differ : download Visual Compare VC154.ZIP and run VCOMP vers1 vers2 /k /i which is the best & the brightest colorful line-by-line comparer for plain .txt files. Your comments are welcome, preferably interlaced into this .txt file. New terms: PQ-search , IQ-search = intelligent queueing search , P/S-graph = progress/stagnation graph , tempoblocking , syslock , bugheat , datalgo = datalgorithm , NAK-race , and ALL inflections of ALL these new terms are also subject to my conditions and CopyRight above. ALL based on experience. ALL rights reserved. Browsers may like to search here for the following markers : !!! !! ? { refs } -.- separates sections +Abstract: Constructive descriptions of synergistically designed datalgos (= data structures + algorithmics) for model checking of protocols are provided & richly documented with flowcharts, diagrams and P/S-graphs ( by January 19, 1995, I snailmailed 21 of my viewgraphs to Gerard Holzmann at Bell Labs. 
Now everybody can see them on my www site).
-.-
+Contents : +Mottos +Basic & new terms, abbreviations & symbols +Executive summary = the beef at a model checking party +APPROVER's opeRational aspects +AntiXplosion techniques in APPROVER +APPROVER's perfect data structure +IQ-search = intelligently queued search in model checking +Efficiency of APPROVER's liveness-checking phase +APPROVER's advanced pattern recognition of liveness bugs +The rest added in 2012 : +Pattern recognition of liveness violations ( 10 lines ) +Bug-finding cybernetics of safety checking +Fresh ideas on priority functions for safety checking +Fresh ideas on liveness 35 years later +References
-.-
+Mottos : I leave to the various futures (not to all) my garden of forking paths ... this network of times which approached one another, forked, broke off, or were unaware of another for centuries, embraces all possibilities of time. { Jorge Luis Borges, The Garden of the Forking Paths, The Labyrinths, earlier quoted in { HJ79 } } One ounce of insight is worth one megaton of hardware. Make'em, break'em, store'em, dead or alive. If in doubt, take'em out. Only the collaborators, meek-type opportunists and the dead fish swim with the stream.
-.-
+Basic & new terms, abbreviations & symbols : vugraph = a view-graph from the set of those posted at the end of this www site. := an assignment as in LHS := RHS i.e. left-hand side := right-hand side & is a concatenation, or "and" (depends on the context) asap = as soon as possible (our goal is to find bugs asap ) B7700 = Burroughs mainframe running Burroughs Extended Algol ( BEA ) in 1977; BEA may be characterized as a Pascalized C; word size 48 bits. var = a variable modeling = modelling = specifying the algorithmic essentials of (a set of) computations or communications processes; a model is a simplified ie less hairy hence manageable image of reality.
LIST = Setree on the vugraph of Combined Data Structures, stores all states, DEAD or LIVE; the basic operation on LIST is the set membership testing i.e. a state present/absent. The 2ndary operation is the backtracing of the path from the root of the genealogical tree to an error-state, if found. LIVE = a state to be expanded later; or LIVE on the vugraph of my Combined Data Structures DEAD = a state which has been expanded already (must be remembered anyway), which holds also on the vugraph of Combined Data Structures, but !!! on the vugraph of Liveness Verification phase, when all states in LIST were expanded i.e. DEAD, there : DEAD is a Boolean variable for detection of deadlocks of all kinds; please read DEAD as LOCKUP on the vugraph for Liveness Verification. envelope = all control fields in a message (eg a msg nr, an ACK nr, CRC, etc) ultimate effect = an irreversible, irrevocable effect on an effector variable which in general is NOT a control variable of a program or of a communicating process and is not a part of a state vector. EIR = effective invariant relation (concerns irrevocable ultimate effects only) EPF = effective progress function (concerns irrevocable ultimate effects only) PTC = proper termination condition (if any specified) media = communication channels systematic search = not skipping any possibility ( of interleaving of atomic actions [ a tomos = indivisible in Greek ] ). event = an external event (eg an incoming message) , or an internal event (eg a timeout in a process, or just an advancement of what used to be called an [implicit] "instruction counter"). expansion of a state = by application of several events to an event-driven algorithm, several states (some new, some old) will be generated. 1st-out = to be expanded next (while other states wait to be expanded later; waiting means delay ie stacking or queueing in one way or another). 
priority = a heuristic evaluation function (like for evaluation of a position in chess ie of a situation on a chessboard) FIFO = 1st-in-1st-out ie a queueing discipline or order of expansion. LIFO = last-in-1st-out ie a stacking discipline or order of expansion. PQ = a priority queue (FIFO and LIFO are 2 extreme cases of 2 simplest PQs) PQ-search = a priority queue driven search in general IQ-search = intelligent queueing search for model checking a la APPROVER. P/S-graph = effective progress/stagnation graph; effective means that a mere activity (like eg thrashing) is not enough, some meaningful effects must be effectuated (eg a mere transmission of a message means nothing if it is thrown away as a [pseudo]duplicate; a message must be finally and irrevocably ie irreversibly accepted). SCB = system control block (here the & means a concatenation); = a state vector & shadow variables & auxiliary variables, where : state = state vector = control variables ( CV ) local to and shared by the interacting processes, eg : - process control blocks (PCBs) including STATUS-variables (if a model is specified as procedures mixed with some FSA like TCP was), - BREAKpointers, plus - the message envelopes stored in the media, and also - variables used in the effective invariant relation EIR A state contains bounded variables only (ie vars of a limited range). SCB's name in Approver's code was ALLV = all variables. SHV = shadows = shadow var = variables not belonging to a state, typically: - effectors = MONOTONICALLY NONDECREASING variables used only in EPF (effective progress must be measured by nondecreasing functions only), and in EIR . - data/text parts in the media, if any ; auxies = auxiliary vars = BackTurn , BackEventGen , BackLqual , BackSeqNr , ParcOut ( for no syslock ie system lockup ), TurnField ( for tempoblock ), HashKey ; eg ParcOut = P-arc outgoing from ... 
gotta see dz code :-) The rules of the LHS := RHS game: Rule1: Assertions may contain control vars ( CV ) and shadows ( SHV ). Rule2: SHV := fun( CV, SHV ); Rule3: CV := fun( CV ); a shadow or an auxie must never influence a CV , neither directly, nor indirectly; hence : in CONSTRUCTIVE terms a SHV must never appear: - on an RHS of CV := ...; ie CV := ... SHV is forbidden ; - in a condition (of a loop, or of an if-then-[else] ) - as an actual parameter or its value in procedures or functions which contain CV on the LHS := ... ; If in doubt, this is quite easily checked syntactically (eg via regexps or decent xrefs).
-.-
+Executive summary : On the occasion of the 25th anniversary of the 1st public introduction of APPROVER at the 16th Lake Arrowhead Workshop on Current Issues in Computer Networking and Distributed Computing, August 31, 1977, at the UCLA Conference Center, I have decided to finally disclose the full truth and nothing but the truth about APPROVER, which eg in February 1978 found design bugs in the basic algorithm of the TCP { HJ78b }, { SC78 } on which ARPANET was ticking then, and 25 years later our beloved Internet is still TCP-driven. This epaper reveals for the 1st time certain datalgorithmic secrets which were beating in 1977 (no typo, not in 1877 :-) in the heart of APPROVER = The 1st Algorithmic & Proven Protocol Verifier. My new word "datalgorithm" expresses the synergistic union of a perfectly designed datastructure with algorithms, ultimately data-driven by a priority heuristic. The only change I make to this description is the change of the interpretation of the name APPROVER from Automated ... to Algorithmic ... in order to stress the little known fact that from early on APPROVER accepted models specified as algorithmic procedures in Burroughs Algol { HJ78b } and not just as some simplistic micromodels of finite state automata like eg an FSA-matrix.
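As an aside to the Rule1-Rule3 shadow discipline stated in the terms section above ("a shadow or an auxie must never influence a CV"): two of the three forbidden cases really are easy to check syntactically with regexps, as noted there. Below is a minimal modern sketch in Python, not APPROVER's BEA code; the variable names and the line-oriented `:=` syntax are invented for illustration only:

```python
import re

# Hypothetical name sets; in a real check they would come from the model.
CONTROL_VARS = {"status", "break_ptr", "msg_nr"}   # CV
SHADOW_VARS = {"Nfh", "Nac"}                       # SHV

def check_shv_discipline(lines):
    """Flag two syntactic violations of Rule3 ('a SHV must never
    influence a CV'): an SHV on the RHS of a CV assignment, and an
    SHV inside an if/while condition. Returns (line_nr, reason) pairs."""
    violations = []
    assign = re.compile(r"^\s*(\w+)\s*:=\s*(.*)$")
    cond = re.compile(r"\b(?:if|while)\b(.*)")
    for n, line in enumerate(lines, 1):
        m = assign.match(line)
        if m and m.group(1) in CONTROL_VARS:
            rhs_names = set(re.findall(r"\w+", m.group(2)))
            if rhs_names & SHADOW_VARS:
                violations.append((n, "SHV on RHS of CV assignment"))
        c = cond.search(line)
        if c and set(re.findall(r"\w+", c.group(1))) & SHADOW_VARS:
            violations.append((n, "SHV inside a condition"))
    return violations
```

The third forbidden case (an SHV passed into a procedure which assigns to a CV) needs a cross-reference listing rather than a line-local regexp, exactly as the "decent xrefs" remark above suggests.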
APPROVER has performed a reachability analysis of protocols as a systematic guided ie informed search for errors in the models of algorithms which are traditionally and properly classified as protocols (for interprocess interaction, data communication, networking, mutual exclusion, etc). The search for bugs was: - systematic ie all possible specified interleavings of parallel, distributed processes communicating via (un)reliable media channels were checked for safety, liveness & proper termination PT , if any specified. - Safety was specified by assertions of the EIR kind ie as effective invariant relations, while: - Liveness was specified in terms of EPF = effective progress function automatically translated (see ** ) into & was checked via then new P/S-graphs = progress/stagnation graphs introduced for the 1st time in { HJ77a }, { HJ78b }. The liveness phase searched for : - lockups : - dynamic lockups (I called them deadlocks too, see Fig.3b in { HJ78b } p.754 , where a lockup is visualized ) - static lockups = classical deadlocks Lockups were detected as an S-subgraph from which no path with a P-arc leads , ie a P-transition (meaning an effective progress) is not reachable from such an S-subgraph anymore. - tempo-blockings (coined by me in early 1977) = temporary lockups - STB = system tempo-blocking is a TB-cycle of several processes - ITB = individual tempo-blocking, a TB-cycle of a single process / task See { HJ78b } for Fig.1b on p.753 where a TB is visualized clearly. Tempo-blockings were detected as S-cycles ie stagnation cycles consisting of S-arcs only, from which a path containing a P-arc leads ie effective progress was reachable. - Proper termination = (if specified) an ability to terminate properly from all reachable compound states. "Properness" was specified with assertions called PTC = proper termination condition. An example of a liveness bug is a NAK-race , which is my term for processes which are busily NAK-ing (not ACKing) each other's outputs.
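The two detection rules just stated (a lockup = an S-subgraph from which no P-arc is reachable anymore; a tempo-blocking = an S-cycle from which a P-arc is still reachable) can be sketched on a tiny P/S-graph. This Python illustration is mine, with invented names; it is not APPROVER's datalgo, only the logic of the rules:

```python
def reach_p(graph, start):
    """True iff a P-arc (effective progress) is reachable from `start`.
    graph: state -> list of (successor, label) with label 'P' or 'S'."""
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        for v, lab in graph.get(u, []):
            if lab == 'P':
                return True
            stack.append(v)
    return False

def classify(graph):
    """States from which no P-arc is reachable are lockup states
    (static deadlocks have no arcs at all; dynamic lockups cycle on
    S-arcs forever); the rest can still make effective progress."""
    lockup, live = set(), set()
    for s in graph:
        (live if reach_p(graph, s) else lockup).add(s)
    return lockup, live

def tempo_blocked(graph, start):
    """True iff `start` lies on a cycle of S-arcs (temporary stagnation)
    while a P-arc is nevertheless still reachable: a tempo-blocking."""
    if not reach_p(graph, start):
        return False                      # that would be a lockup
    seen, stack = set(), [start]
    while stack:                          # look for an S-path back to start
        u = stack.pop()
        for v, lab in graph.get(u, []):
            if lab != 'S':
                continue
            if v == start:
                return True
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False
```

For example, in a graph where B and C only shuffle S-arcs between themselves, both are lockup states even though they stay busily active, which is exactly the dynamic-lockup case above.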
- The search was a directed search ie a guided search ie a heuristic search of the best-1st-out kind or BFO . The classical searches FIFO and LIFO are naive because they use the sequential number of the state generation as a naive search-ordering criterion : - the breadth 1st search = queue-driven BFS = FIFO = 1st-in-1st-out, with the following properties: simple, raw speed, voracious use of memory, but guaranteeing the absolutely shortest paths to bugs/errors, if any. - the depth 1st search = stack-driven DFS = LIFO = last-in-1st-out, with the following properties: simplicity, raw speed, uses much less memory than the FIFO-search does, but too long paths to errors, if any found. The longer the path to a bug the more (wo)manhours it takes to interpret it. LIFO might seem ok iff no bugs are present, but aside from the sad fact that there are almost always some bugs, the real catch-22 is that we can never be 100% sure that there are no bugs if the model's combinatorially exploding reachable state space does not fit into available memory, which happens more often than not with most of the realistic models which were not oversimplified. Hence the urgent need to use a smart, informed, intelligent IQ-search to find bugs asap, which is not just a matter of technocratic & economic efficiency, but rather a matter of life or death for bugs :-) Even if no bugs are present, there is a huge difference between wishing there are none and knowing it. Confidence has its price-tag and it is worth that price.
-.-
+APPROVER was (in its opeRational aspects) : - A batch program ie fully automated, no interactions, no messing; when Carl Sunshine, then with the cold-war thinktank RAND Corp. and an ex-PhD student of Stanford professor Vinton Cerf ( Cerf co-designed TCP with Robert Kahn of ARPA , later DARPA ), visited me in February 1978, he was surprised by 100% automated batch runs spitting out deep design bugs in TCP, which he was authorized to fix.
- The model to be checked was specified as procedures, and channels were prepared as "defines" in BEA ie as in-line substitutions in APPROVER ; - There was no precompiler, the Algolic specs were inserted at fixed places between two fixed kernel-parts of APPROVER and compiled as a single fast program (no slow interpreter, no precompiler to program & to maintain :-) - Except for the very 1st versions in early 1977 which used a FIFO-search, APPROVER used searches which can be named as: intelligent search, smart search, informed search, heuristic search, guided search, directed search, best 1st search, MaxFO search = Maximum-priority-1st-out search. I will call it shortly IQ-search = intelligent queue(ing) search, or informed queue search, or a PQ-search = priority queue search. Btw , a MaxFO-heap PQ can be turned into a MinFO-heap PQ (and v.v.) simply by changing priority's sign.
-.-
+AntiXplosion techniques in APPROVER : 1. Few bits were used per variable (most changed modulo small-ranged M ), so that a dozen small-ranged variables were packed in one 48-bit word. Eg a small modulus M of message numbering (hence also ACK-nrs) was used. Packing saved RAM and made comparisons and copying faster. Anti-overflow checking is not needed on unsigned integers operated upon by modulo M. 2. Breaks : by inserting home-made Breaks into the models of protocols, only the necessary atomic ie indivisible actions were specified, hence all necessary points of interleaving. By specifying only the necessary breakpoints (like eg message transmitted or received) I got a coarse !!! grained interleaving, hence only a minimal necessary number of states generated. 3. In the media ie channels: - CRC was represented by only one bit and the rest of such an invalid message was cleared ie its bits were set to 000...000, so that there was only one kind of a CRC-invalid message .
- Nonsequential channels were modelled as sets of messages, thus avoiding all the possible permutations of sequences. Nevertheless all of these messages in a set would be received. Hence all possible I/O sequences were systematically tried out without the combinatorial explosion which would otherwise have taken place due to unnecessary permutations of the contents of the media . 4. Shadow variables were not a part of the SCB . They were used only within EIR , EPF , PTC . 5. The liveness checking phase was (because it had to be) executed with all messages valid (ie with CRC ok). Q: why ? A: read under the **
-.-
+APPROVER's perfectly designed data structure : Three synergistic data structures jointly supported highly efficient storage & execution of all frequent operations : - HashTable HT for instant LIST-membership testing of a state, DEAD or LIVE. - LIST of all unique states with 2 kinds of independent explicit links: - chains of hashing synonyma. Alternatively, an open addressed hashing could do without these particular links (then the LIST would have become an open addressed hashtable). Hence no unresolved collisions, no losses, no blue-haired SPINsters gambling a la Vegas or Reno. - genealogical tree with one link from each state to its 1st predecessor (from which it has been expanded ie generated by an application of an event). Such genealogical links are not strictly necessary for bugfinding, but they are necessary for bugtracing ie for backtracing a bug (no backtracking involved) ie for a reconstruction and human interpretation of a path to a bug, if any found. LIST could be a linkless implementation of a set of states ( DEAD or LIVE ) as an open addressed hashtable, if no bug was expected anymore, thus saving memory. - Priority queue PQ for an IQ-search: each element in a PQ contained two variables : the priority and the index (or pointer) to a LIVE state on the LIST .
So the PQ contained only the Prio-weights of & pointers to the LIVE states waiting to be expanded and thus become DEAD , though staying on the LIST until the end of a model checking run. A DEAD state was immediately deleted from the PQ [this took time O(log(H)) with H elements in the PQ ] including the restoration of the heap's top from which the just expanded ( hence now DEAD ) state was deleted by replacing it with the very last state in the heap and sifting it down until its proper position determined by its priority was reached. Sifting is "hole moving" which is about twice as fast as swapping (similarly in a non-naive insertion sort). An alternative theoreticians' view is that a deletion of the top element from a HeaPQ splits it into two heaps which have to be merged into one heap. Q: which description do you prefer ?
-.-
+IQ-search = intelligently queued search for model checking : Strictly speaking I have told everything about APPROVER, but not always all of it to everybody. For example, by January 19, 1995, Gerard Holzmann at Bell Labs had received from me 21 viewgraphs including all flowcharts of APPROVER's key datalgorithms. Even my IQ-search engine was explained in my courses, but none of the wannabe structured programmers paid much attention to it. But the situation has changed since 1977; there is more code, less structure, more bugs, more mess, so maybe also more attention to non-naive solutions :-) Searches like LIFO or FIFO (= DFS = depth-1st or BFS = breadth-1st) are naive brute force searches which seem to run fast only to naive junior programmers. APPROVER used a priority queue PQ to find safety bugs asap . The PQ was implemented as a heap (hence I called it a HeaPQ ). A heap is a beautiful linkless implementation of a PQ which always has its most important member at its top, hence instantly accessible.
An insertion into a heap or a deletion of the top element from a heap takes O(log(H)) time where H is the number of currently heaped elements (ie not the number of states on the LIST ). Hence we know how to RAM-efficiently (because linklessly) implement a PQ working at a reasonable raw speed slower than a simplistic LIFO, but due to being a "PQ with IQ" with an incomparably higher chance to find a safety bug, if any, in no time. The key question was (and is, until you read this) : Q: How to formulate a priority criterion ie a heuristic which will feel the heat of a bug, if any, and ferret it out much faster than a naive, though seemingly fast brute force LIFO-search aka DFS would ? A: My solution for APPROVER's safety phase was based on an analogy with heat-seeking missiles which occurred to me while watching (on tv) , I guess, Israeli jets throwing away aluminum flares as decoys to mislead infrared heat-seekers. Roughly speaking, a heat-seeking missile glides along the steepest gradient of heat towards its target. On one hand the design of such a missile is easy, because heat is a well understood concept, while bugs are not, otherwise they would not hit us all the time. On the other hand, a missile must hit a moving target, while bugs are sitting ducks just waiting to be debugged (unless their makers are too fast re-typists :-) so I did not have to invent a logical analogon of a Kalman filter to predict a bug's trajectory. But the problem with bugs was and is that they do not emanate any radiation at anybody but (un)lucky me, who has felt their heat back in 1977. Anyway, in my view since 1977, model checkers' targets are the potential bugs to be found in a haystack called the reachable statespace, and the $1M IQuestion proper was & is how to make a bug (if any) emanate some logical equivalent of a radiation, or shortly heat, so that APPROVER could guide the search to such a source of bugheat , which was APPROVER's goal.
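The HeaPQ just described, with its hole-moving sift (insertion sifts the new element up, deletion of the top sifts the very last element down), can be sketched in a few lines. This is a modern Python illustration with names of my own choosing, not the 1977 BEA code:

```python
class HeaPQ:
    """MaxFO heap of (priority, state) pairs. Both operations move a
    'hole' instead of doing pairwise swaps, which roughly halves the
    number of element moves, as with a non-naive insertion sort."""
    def __init__(self):
        self.a = []

    def insert(self, prio, state):
        self.a.append((prio, state))
        i, item = len(self.a) - 1, self.a[-1]
        while i > 0 and self.a[(i - 1) // 2][0] < item[0]:
            self.a[i] = self.a[(i - 1) // 2]   # move parent into the hole
            i = (i - 1) // 2
        self.a[i] = item

    def delete_top(self):
        """Remove and return the max-priority element in O(log(H)):
        the last element fills the hole left at the top and is sifted
        down to its proper position determined by its priority."""
        top = self.a[0]
        last = self.a.pop()
        if self.a:
            i, n = 0, len(self.a)
            while True:
                c = 2 * i + 1                  # left child of the hole
                if c >= n:
                    break
                if c + 1 < n and self.a[c + 1][0] > self.a[c][0]:
                    c += 1                     # pick the larger child
                if self.a[c][0] <= last[0]:
                    break
                self.a[i] = self.a[c]          # move child up into the hole
                i = c
            self.a[i] = last
        return top
```

Note that with the generation number (or its negation) as the priority, the very same HeaPQ degenerates into the naive FIFO or LIFO searches; the whole point of the IQ-search is to plug a bugheat heuristic in instead.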
For checking of the models of communication protocols I have invented and implemented in 1977 the following basic model of model's bugheat :

The enigma of the IQ-search revealed - or -
|------ Bugs are our goals, and bugheat is our leading light to them :
|
! Every orderly interaction is somehow constrained in its ultimate effects.
! Eg a data transfer goes via at least one bounded buffer or queue eg via
! a communication channel of a limited LOGICAL capacity to transfer, and/or
! physical capacity to store, at most Nmax packets or messages, hence the
! effective invariant relation EIR (involving the ultimate ie irrevocable
! effects only, ie no control variables CV in principle) is eg this
! archetypal EIR :
!
! EIR := 0 <= Nfh - Nac <= Nmax
!
! Shadows : Nfh = nr of unique messages fetched (once only) for transmission
!           Nac = nr of unique messages accepted (ie received-and-also-
!                 irrevocably accepted as valid nonduplicate finals)
!           (unique = distinct = nonduplicate = new = fresh)
!
!!! Priority := Nfh - Nac ; Quiz: do you already feel any bugheat ?
!
! While the ideal of correctness guys & gals is to (make others believe
! that they) design algorithms so that an invariant is maintained ie not
! violated, my key idea was to make them stand on their heads, so to speak,
! by actively pushing towards the violation of an EIR. An opeRationalization
! of the meta-metaphor of "making correctness czars stand on their heads"
! (which has not made me particularly popular in their ego-centric circles)
! was to create a POSITIVE FEEDBACK trying hard to "destabilize" asap the
! invariant EIR as much as feasible, so that it will be VIOLATED asap , if
! possible. A negative feedback tries to minimize a DIFFERENCE , while my
! positive feedback tried to INCREASE it, and the above EIR offers
! a READILY AVAILABLE DIFFERENCE in particular. The greater the difference,
! the greater the "bugheat" ie the greater the chances that a protocol will
! go wrong, eg will irrevocably lose a message or will irrevocably accept
! the same message twice. Hence during the safety verification phase, my
! priority queue PQ used the shadow variable Ndif = Nfh - Nac as the
! priority ie the heuristic evaluation function. This way safety bugs were
! found in no time, usually in just a few seconds in sliding window
! protocols on a B7700 mainframe in 1977 (= now an old Pentium ?). So much
! for the basic version of my key idea to steer or push a protocol towards
! its constraints expressed as an EIR. Needless to say, the generality of a
! PQ lends itself to literally infinite flexibility (and potential
! creativity) in formulating all kinds of priorities between both naive
! extremes of a FIFO = BFS and a LIFO = DFS , so that various tradeoffs
! between the use of available memory and the time (in which safety bugs
! are found by steering towards violations of EIR ) are possible.
!
! Example1: EIR , EPF and Priority for a mutual exclusion algorithm with
!           2 critical sections A, B :
! EIR := not( inCsA and inCsB )
! EPF := NcsA & NcsB ie concatenated shadow counters of entries
!        into the respective critical sections.
! Priority := NcsA - NcsB; a heuristic for bugheat
!
!
! Example2: EIR , EPF and Priority for a stop-&-wait protocol for a
!           bidirectional data transfer between processes A and B :
! here nr means a shadow count (ie not any numbering in messages'
! envelopes which belong to the SCB , hence not to effectors )
! AfhA = nr of unique messages fetched & transmitted by A to B.
! AdeA = nr of unique messages (originally fetched & transmitted by A )
!        finally deleted by A so that they cannot be retransmitted.
! AacA = nr of unique messages effectively accepted (not just received)
!        by A from B.
!
! EIR := ( 0 <= AfhA - BacB <= 1 ) and
!        ( 0 <= AfhA - AdeA <= 1 ) and
!        ( 0 <= BfhB - AacA <= 1 ) and
!        ( 0 <= BfhB - BdeB <= 1 ) and
!        ( all AacA and all BacB accepted in proper sequence )
!
!
! EPF := AfhA & AdeA & AacA & BfhB & BdeB & BacB ie all shadows concatenated
!
! Priority := AfhA - BacB; or a similar heuristic for bugheat
!
! Note that if these shadow counters are designed to change in
! a NONDECREASING fashion only, then any change of an EPF during
! a state-transition from a predecessor to its immediate successor
!! ** is necessarily an EFFECTIVE PROGRESS (of a successor wrt **
! its predecessor) and the & is indeed literally just a concatenation.
! Hence the slightest inequality of their EPF-subvectors is sufficient
! to decide that a transition is a P-transition, while their equality
! means an S-transition ie a stagnation (though not necessarily a
! nonactivity) ie no effective progress wrt the predecessor. Hence my
!! constructively opeRational trick was to bytewise compare EPF-subvectors
!! ie the concatenated EPF variables of a successor with its predecessor's,
!! and iff they differed, then to set the P/S-bit as 1 for P, else 0 for S.
!! That was how the P/S-arcs were created already during the safety-phase.
!! It is obvious that when messages are lost or become CRC-invalid, they
!! will necessarily cause effective stagnations (despite busy activity).
!!! Therefore during the liveness-phase no events invalidating or losing
!! messages were applied. After the safety-checking-phase was finished
!! without any safety bugs found, the liveness-checking-phase was run by
!! always copying the single next state from the LIST into another array
!! ALLV (similar to LIST) of states to be checked for liveness properties,
!! and then expanding this state ST by applying to it, and to its successors
!! (to be) stored in ALLV , all events except those which would invalidate
!! or lose a message (as explained above). In contrast with ALLV , on the
!!! LIST there were all states reachable under all events possible, including
!!! the LQUAL-events making media unreliable ( LQUAL stands for "line's
!!! quality"). If a P-arc was made, then ST could not belong to a lockup
!!! cycle ( DEAD:=FALSE ), so the next state ST was copied into ALLV[0] and
!! the just-described search-cycle started again. See the flowcharts
!! enclosed. The number of nested loops which apply events to the state ST
!! is given by the number of subsets of events which are NOT mutually
!! exclusive; eg
!!
!! 0. Expand the best-1st state ST at the top of a priority queue PQ
!!    (this is not an event, just the outermost driving loop)
!! 1. a set of process' turns up to its next Break
!! 2. set of events (eg a timeout, or message propagation in a channel)
!! 3. set of line-ok/KO events ( message unharmed, lost, or
!!    CRC-invalidated )
!! Note that eg a message-propagation step is independent of the line
!! quality event (ok/KO), hence to cover all possible combinations they
!! must be in separate nested loops to provide for mutual NON-exclusivity.
!
! By now only junior programmers will not know how APPROVER's
! liveness-phase was programmed: simple, fast, few parts, easy
! maintenance :-)
!
!
! Example3: EIR, EPF and Priority for a Go-N-back unidirectional B-to-A
!           transfer:
! L = M-1, where M = the modulus of message numbers, hence
! L is a LOGICAL "buffering capacity" of the protocol; here
! LOGICAL means the designed "effective window width".
! Physical buffering capacity would be L+1 messages. Internet
! has a physical capacity virtually unlimited wrt a single
! pair of communicating processes, which nevertheless has a
! limited logical capacity due to a numbering limited by M .
!
! EIR := ( 0 <= BfhB - AacA <= L ) and
!        ( 0 <= BfhB - BdeB <= L ) and
!        ( 0 <= AacA - BdeB <= L ) and
!        ( all XacX accepted in proper sequence )
!
! EPF := AacA & BfhB & BdeB
!
! Priority := BfhB - AacA; or a similar heuristic for bugheat
!
! To my best knowledge, such a deadly effective systematic search for bugs,
! based on my key idea of intentionally trying to violate an EIR in a
! PQ-driven reachability analysis of protocols has not only never been
! published before 1977, but also not until now here. The idea of a positive
! feedback makes it reasonable to call it "bug-finding cybernetics" because
! kubernetes means a steersman in Greek, and the art & science of
! bug-finding is to steer our (wo)man-(wo)machine searches towards likely
! bugs asap .
|-------------
-.-
+Efficiency of APPROVER's liveness-checking phase:
Q: asked by Gerard Holzmann ( Bell Labs ): Is the complexity of the liveness verification phase quadratic in the number of states in the state graph ?
A: Fortunately not :-)
- Let N be the nr of distinct states reached (= generated & stored in LIST ) during the 1st phase = safety-checking phase.
- As said here above and as the flowcharts suggest, during the 2nd phase = liveness-checking phase a single next root-state ST was copied to the array ALLV[0] ( actually to ALLV[1] , the code tells me ), and expanded ie its successors generated and expanded by applying to it all but the bad-line events. When an effective progress was made ie a P-transition ie a P-arc , the root-state ST was no longer a part of an inescapable lockup (= DEADlock in my terminology of 1977), but ST could still be a part of an escapable tempo-blocking cycle ( TB ). Now watch out: QuickSort is theoretically a quadratic sorter, but this has no practical significance. Similarly, it makes little sense to meditate about the worst case scenario for APPROVER's liveness phase because :
1. For the overwhelming nr of states ST copied from LIST to ALLV[0] as roots for liveness-checking, the P-arc occurred after a small nr of transitions.
2. If an ST was a part of a lockup (= DEADLOCK ) then such a stagnation-cycle of inescapable S-arcs was found during expansions of this ST and its successors all happily listed together in ALLV . To find a lockup did take more time than to reach a P-arc in 1.
here, but then we were finished, because One lockup a day, keeps the users away anyway :-) ie when I find a lockup, I either proclaim the protocol to be no good, or I try to fix it in which case I have to re-run APPROVER from the very start. Summarizing 1. & 2. : finding a lockup, if any, didn't take much time, and if found, we were done anyway (with that version). In other words, finding a lockup-cycle when the current ST is part of it might have cost a few seconds extra in 1977, but that didn't matter, because then we should stop anyway. 3. Remains the situation when no lockup was found ie we are in the happy cycle of 1. here, when a small nr of states are expanded per each state loaded from LIST to ALLV[0] as a next-root-for-liveness checking. DEADlock detection causes no slowdown. What remains is APPROVER's search for tempoblockings aka TB (kinda tuberculosis :-). Do they cause a slowdown ? No! Q: why not ? A: because they were always present in large numbers (they are unavoidable, methinks, except perhaps in tightly coupled tasks like eg mutex = mutual exclusion algos), and no one wants to see them all, so I always set a couple of run parameters ( APPROVER was not an interactive prog, it ran in batch-mode and therefore it had to be fully automated ) telling how many DEADlockups and TB's I want to see. As I said, 1,2,3 LOCKUPs and 1,2,3 TBs is more than enough to start fixing, and then a new run of a verifier must begin from scratch anyway. Hope your thirst for arcane algorithmic knowledge is quenched with my no-cheapo answer, so that you don't have to fight with windmills while SPINning. This is NOT a joke! What I want to say is that one can keep oneself busy with solving non-problems, and many theoreticians do so their whole lives. I was done with APPROVER essentially in one year. Now I feel finished by this Total Recall.
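The per-root liveness phase described above (copy the next state ST from the LIST into ALLV , expand it without the bad-line events, and stop for that root as soon as a P-arc proves that escape is possible) can be sketched as follows. This is a loose modern Python analogy with invented names, not the flowcharted BEA code; `expand_good(state)` stands for the expansion under all non-message-harming events:

```python
def liveness_check(states, expand_good):
    """For every root from LIST, expand (bad-line events excluded) until
    the first P-arc shows effective progress is reachable (DEAD:=FALSE,
    next root). Roots exhausted without any P-arc are lockups.
    expand_good(state) yields (successor, is_progress) pairs."""
    lockups = set()
    for root in states:
        allv = {root}            # the per-root store, a la the ALLV array
        work = [root]
        progressed = False
        while work and not progressed:
            st = work.pop()
            for succ, is_p in expand_good(st):
                if is_p:
                    progressed = True      # a P-arc: this root can escape
                    break
                if succ not in allv:
                    allv.add(succ)
                    work.append(succ)
        if not progressed:
            lockups.add(root)
    return lockups
```

This captures why the phase is not quadratic in practice: for the overwhelming majority of roots the inner loop exits after a few transitions. Recognizing tempo-blockings (escapable S-cycles) needs an extra cycle test, omitted here.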
-.- +APPROVER's advanced pattern recognition of liveness bugs ( 22 lines ) : So far we were concerned with bug-finding and backtracing, but what about bug interpretation, which must be performed by the fragile human mind ? Liveness violations are particularly funless to interpret, because unlike the safety-bugs formed by a single state and a path to it, a liveness bug consists of a multi-state stagnation-subgraph ( S-subgraph, or S-cycle ) and a path to it. The only single-state liveness bug is the classical deadlock. The strenuous human task of interpreting liveness bugs was made easier by APPROVER's liveness-bug-pattern recognition algorithmics, which were at the cutting edge in 1977, and probably are still advanced 25 years later in 2002. Here the key idea was to apply a COMMUTATIVE HASHfun to an S-subgraph which represented a lockup or tempo-blocking. Normally a hashfun should be computed by means of non-commutative operations. But if we want to recognize various S-subgraphs as essentially equivalent permutations of the same states, we have to use a COMMUTATIVE HASHfun. When done with this bug-pattern recognition, it was easy to backtrace a path from any of the states in an S-subgraph to the root. Backtracing could be done from all the states in an S-subgraph and the shortest path used for display. This filtering can be combined with displaying only small(est) S-subgraphs. The length of a path to the to-be-displayed S-subgraph can be taken into account, so that the human effort spent on interpretation of a liveness violation can be greatly reduced. Happy bug-reading :-)) -.- +The rest added in 2012 : -.- +Pattern recognition of liveness violations ( 10 lines ) From the 3 facts, namely that : 1. each state serves as a seed for BFS , and 2. a liveness violation is an S-subgraph, and 3.
a path from the root state (of the total state space, not just a seed state of a BFS ) to each state exists, it follows that every violation of liveness is detected as many times as there are states in its S-subgraph, although the paths to it differ. To avoid unnecessary interpretations of identical S-subgraphs, Approver recognized permutationally ISOMORPHIC S-subgraphs. The key idea is to use a COMMUTATIVE HASHing FUNction (based eg on multiplications or additions), so that hashing synonyms will filter all but one isomorph out. -.- +Bug-finding cybernetics of safety checking : The priority heuristic, guiding the systematic search by reordering it, was directly inspired by a heat-seeking missile as-if selecting the trajectory along its maximum gradient of heat. The operational insight is that priority tactics are trying for ever increasing PrF values ie differences via a positive feedback. Yet on the strategic & teleological level, the PQ operates as a cybernetic mechanism with negative feedback, trying to decrease the heuristic distance from targeted as-if bugs. Such heuristic searches discovered bugs in haystacks of reachable state-spaces in seconds. Hence it is reasonable to believe that a protocol is safe even when a search cannot be completed due to exhausted memory filled with combinatorially exploded states. Then more certainty obtains from re-checking safety with various priority functions. The easiest is to re-explore the state-space from the 'opposite side' of its vast universe by simply changing the sign of PrF , thus turning the maximum-1st-out ( MaxFO ) PQ into the minimum-1st-out ( MinFO ). While the direct inspiration was heat-seeking missiles, the meta-inspiration was Hajek's years-long interest in cybernetics in general, and in the works of the leading British cybernetician W. Ross Ashby in particular.
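The commutative hashing of S-subgraphs from the two pattern-recognition sections above can be sketched as follows (a minimal Python sketch; the per-state fingerprints and the prime modulus are my own hypothetical choices, not APPROVER's 1977 code):

```python
def commutative_hash(states, mod=2**61 - 1):
    """Order-independent fingerprint of an S-subgraph (sketch).
    Addition mod a prime commutes, so any permutation of the same
    states yields the same value."""
    h = 0
    for st in states:
        h = (h + hash(st)) % mod   # '+' commutes; '*' would do as well
    return h

# Two detections of the same tempo-blocking cycle, found in different orders:
cycle_a = ["s7", "s3", "s5"]
cycle_b = ["s3", "s5", "s7"]
assert commutative_hash(cycle_a) == commutative_hash(cycle_b)

# Filter: keep only one representative per isomorphism class of S-subgraphs.
seen = {}
for cyc in (cycle_a, cycle_b, ["s1", "s2"]):
    seen.setdefault(commutative_hash(cyc), cyc)
print(len(seen))  # 2 : only 2 distinct S-subgraphs survive the filter
```

Multiplication mod a prime would serve equally well; all that matters is that the operation commutes, so that permuted listings of the same states collide.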
The last two chapters of Ashby's book An Introduction to Cybernetics, and his speculative "Design for an intelligence amplifier" (IA), identified selectivity as the key to "amplifying regulation" in very large systems. Referring to Ashby, Herbert Simon, the only winner of both the Nobel Prize for economics and the Turing Award for AI , agreed. Ashby's imaginary IA was supposed to employ homeostasis, ie a multidimensional negative feedback self-stabilizing towards a solution. Unlike the IA , Approver employed positive feedback to try to destabilize a protocol, with bugs (if present) as solutions. In his book The Sciences of the Artificial [ SI70 p.90-2 & 96-7], with biological evolution in mind, Simon used his parable comparing the (in)efficiencies of two unreliable watchmakers to show in quantitative probabilistic terms the power of "hierarchization" and of what Ashby has called "selection by components". Although Approver amplified selectivity via positive feedback tactics, its PQ-driven heuristic selectivity (of states for expansion) had a similarly huge effect on bug-finding as hierarchization. Unbelieving Thomases may recall the huge selection powers of a binary search on an ordered array or on a naturally ordered interval of real numbers. Since binary search can operate on an ordered domain only, it is virtually ie as-if hierarchical. Even a heuristically (because imperfectly) (p)re-ordered prioritized search has great powers to ferret bugs out of the proverbial haystack. The meta-metaphor is that EIR plays the role of an Archimedean point outside (!) the model (not Earth), while the corresponding priority mechanism serves as an intelligence gathering and amplifying lever for unhinging an unsafe algorithmic model of a protocol (fortunately not the World). -.- +Fresh ideas on priority functions for safety checking If a protocol's state space does not fit into RAM , we should try to disprove it with several PrF's.
The flexibility of a PQ-guided search allows for creativity in devising PrF's. As heuristics, PrF's should be tested on each protocol which was found safe only up to the point where it filled RAM. Then I recommend intentionally introducing a safety error and trying to find it with several PrF's. Repeat this with several safety errors to identify the PrF which has found the bugs in the minimum total time. This way one can learn to devise powerful PrF's. Now some hints on PrF's. A linear combination with a weight 0 <= w <= 1 , eg PrF = -( w*PLcurr + (1-w)*PLrest ) is allowed if its parts are commensurable. The - sign is for a MaxFO PQ and a short path to a possible safety violation. PLcurr is the current path length, ie the number of transitions (ie edges in the graph of states reachable from its root) to the current state; PLrest is the estimated number of transitions expected to be needed to reach a safety violation, if any is present. Now the question arises how to meaningfully estimate PLrest ? I have used similar PrF's already in 1977 (no typo, not in 1877 :-) but here come my fresh ideas of 2012. If , after a connection was established, it took PLcurr steps to reach NfhA - NacB , then a guesstimate is that it may take proportionally as many steps to reach the dangerous NfhA - NacB == Nmax + const. By proportionality PLrest/(Nmax + const - (NfhA - NacB)) == PLcurr/(NfhA - NacB), const > 0, so PLrest = PLcurr*( (Nmax + const +1)/(NfhA - NacB +1) - 1 ) where the extra +1 makes /0 impossible. A PrF with incommensurable parts should be, in general, a weighted geometric mean PrF = ((Part1**w1)*(Part2**w2)*...)**(1/(w1+w2+...)) with weights >= 0, ** as power raising, and all Parts of the kind "the more, the better". With each Part == (Value - its lower bound) as eg (NfhA - NacB), the geomean becomes essentially isomorphic with the Nash product in game theory, when the w's are bargaining powers. For more see [6] for "Ideal Decision Making", section "Idea7: How a consumer should choose". Back to practice.
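The linear PrF with the PLrest guesstimate, and the weighted geometric mean for incommensurable parts, can be sketched as follows (a minimal Python sketch under my own assumptions: the counters NfhA , NacB and Nmax are plain integers here, and Python's heapq, a min-heap, stands in for the MaxFO PQ by pushing -PrF):

```python
import heapq

def pl_rest(pl_curr, nfh_a, nac_b, n_max, const=1):
    """Guesstimated nr of transitions still needed to reach the dangerous
    NfhA - NacB == Nmax + const ; the extra +1's make /0 impossible."""
    return pl_curr * ((n_max + const + 1) / (nfh_a - nac_b + 1) - 1)

def prf_linear(pl_curr, nfh_a, nac_b, n_max, w=0.5):
    """PrF = -( w*PLcurr + (1-w)*PLrest ) for commensurable parts;
    the - sign favours short paths to a possible safety violation."""
    return -(w * pl_curr + (1 - w) * pl_rest(pl_curr, nfh_a, nac_b, n_max))

def prf_geomean(parts, weights):
    """Weighted geometric mean for incommensurable 'the more, the better'
    Parts; essentially a Nash product when the w's are bargaining powers."""
    prod = 1.0
    for p, w in zip(parts, weights):
        prod *= p ** w
    return prod ** (1.0 / sum(weights))

# A MaxFO PQ on top of heapq's min-heap: push -PrF, pop the highest PrF.
pq = []
heapq.heappush(pq, (-prf_linear(4, nfh_a=3, nac_b=1, n_max=8), "stateX"))
heapq.heappush(pq, (-prf_linear(2, nfh_a=7, nac_b=1, n_max=8), "stateY"))
neg_prf, best_state = heapq.heappop(pq)
print(best_state)  # stateY : guessed to be nearest to a safety violation
```

stateY wins because it already reached NfhA - NacB == 6 in only 2 steps, so its guesstimated remaining distance to the dangerous Nmax + const is the smallest.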
To disprove a data transfer protocol with a state space in excess of RAM, I would try the linear PrF above and at least the following three PrF's in the MaxFO PQ: PrF = NfhA - NacB ; PrF = -(Nmax + 2 - (NfhA - NacB)) ; and for a two-way data transfer this : PrF = (NfhA - NacB)*(NfhB - NacA)/((Nmax + 2 - (NfhA - NacB))*(Nmax + 2 - (NfhB - NacA))). -.- +Fresh ideas on liveness checking : Just in case you like the simultaneous search for deadlocks & TB-cycles , you should know that there is a Catch-22 to it, and trade-offs. Idea 1: While checking protocol's safety, among the events causing transitions were three channel/line quality events impacting messages/packets: lost, CRC invalid, CRC valid. The reason for checking liveness with "valid" only was that "lost" and "invalid" would generate too many obviously inevitable TB-cycles not worth our attention and time. Now I realize that with "valid" only I cannot exclude the possibility of a missed deadlock (depending on how an unknown protocol in a black box might react). Recalling Murphy's law Hofstadterized by me into "Murphstadter's law M: Whatever may go wrong it will, even if you take into account Murphstadter's law; goto M;" , it is fair to say, without apologies to Wittgenstein, that "What one cannot exclude, one must see as possible", or even shorter "Whatever cannot be excluded, is possible". Einstein's famous slogan "Der Herr Gott ist raffiniert, aber boshaft ist Er nicht" (The Lord is subtle, but not malicious) expressed his essentially Augustinian worldview aka Weltanschauung. This may hold for the physicist's Nature presumably created by the well meaning God, but not for living matter including mutating viruses and germs [ WI54 p.188] and certainly not for fallible, vicious or wicked human (mis)creations. Our post-modern world of artifacts, including the Internet, is increasingly Machiavellian and Manichaean, ie manipulated by capable sorcerers and powerful adversaries.
Recall counter-countermeasures like misleading radars with clouds of aluminum stripes, heat-seeking missiles with aluminum flares, antiwarhead missiles with metallic balloons, and the Stuxnet, Duqu or Flame/Flamer viruses used by Ahura-Mazda aka Ohrmuzd in his power struggle with Ahriman aka Angra Mainyu. It is up to us to assume a world of Augustinian demons of disorder (chaos and entropy), or a paranoid world of deceptive Manichaean demons [ WI54 p.34-36 & 188-193] and [ HJ78b p.751 ], or a mix of both. Fortunately for us, the possibility of a missed deadlock can be excluded simply by checking the liveness twice with the following switch. First hunt only for deadlocks with "valid", "invalid" and "lost" ; then hunt only for TB-cycles with "valid" only, hence channel/line quality events can be omitted. Even a junior programmer should be able to optionally switch between hunt-for-deadlocks-only and hunt-for-TB-cycles-only runs. The less tightly coupled the protocol (ie the larger its window size), the more TB-cycles will be possible even with "valid" only events. Therefore Approver had various options, also to limit the number of deadlocks and/or TB-cycles to be detected and viewed. Idea 2: When hunting for deadlocks only, substantial runtime savings obtain if already during the safety-checking phase we register in one shadow P-bit with each state whether or not a P-transition was made to at least one of the state's successors. If yes, then the state cannot belong to a deadlocked S-subgraph, hence such a state can be skipped as a seed for a BFS. The positive P-bit value can be propagated along the path back to the root state of the "Big Bang", thus substantially decreasing the number of states to be checked as possible deadlock states later. One ounce of preprocessing may be worth one kiloton of hardware or a megaton of CPU-time.
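Idea 2 can be sketched as follows (a minimal Python sketch under my own assumptions; the parent pointers are the same ones used for backtracing, and all names are hypothetical):

```python
def propagate_p_bits(parent, p_arc_from):
    """Shadow P-bit preprocessing (sketch, names hypothetical).
    parent[s] is the state from which s was first reached (None for root);
    p_arc_from is the set of states with a P-transition to some successor.
    Returns the set of states that may still belong to a deadlocked
    S-subgraph, ie the only seeds worth a liveness BFS later."""
    p_bit = set()
    for s in p_arc_from:
        # propagate the positive P-bit along the path back to the root
        while s is not None and s not in p_bit:
            p_bit.add(s)
            s = parent[s]
    return set(parent) - p_bit

# Toy reachability tree: root 0 -> 1 -> 2 and 0 -> 3 -> 4 ;
# state 2 made a P-transition, the branch 3 -> 4 never did.
parent = {0: None, 1: 0, 2: 1, 3: 0, 4: 3}
seeds = propagate_p_bits(parent, {2})
print(sorted(seeds))  # [3, 4] : only these need deadlock-checking later
```

The `s not in p_bit` guard stops each walk as soon as it meets an already-marked ancestor, so the whole preprocessing is linear in the number of states.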
Idea 3: Finally, it is wise to check for TB-cycles only after we are convinced that the protocol is safe and without a deadlock, otherwise the protocol has to be modified and re-verified before we start spending our attention and time on the far less serious and not completely avoidable TB-cycles. Fortunately, all three ideas, or "facts of liveness", combine perfectly :-) -.- +References { refs } : Q: Which refs are books and which are papers ? A: Unlike in the titles of (e)papers listed here, in the titles of books and periodicals all words start with a CAP, except for the insignificants like eg: a, and, der, die, das, for, from, in, of, or, to, the, with, etc. HJ77a = Jan Hajek : Self-synchronization and blocking in data transfer protocols, report THE-RC29286, April 1977; this was the 1st version of my Kyoto paper HJ78b which had to be submitted a year before the ICCC-78 was scheduled. The local pound-foolish & penny-wise Calvinists have not seen much use in my strange activities, so they wisely let me pay a 1/2 of the costs of the trip myself :-( Btw , it was clear to me at that time that Pentagon's ARPANET will put the squeeze on the Commies , and I was eager to help the beloved DoD to do just that. And history has proven me right: ARPANET's successor the Internet has become the most powerful freedom fighting machine of the Free West. HJ77b = Jan Hajek : Automatic verification and computer aided design of communication protocols, minireport THE-RC29458, May 12, 1977; this was a one page abstract of my talk given at the Lake Arrowhead Workshop, see HJ77c. HJ77c = Jan Hajek : Automatic verification and computer aided design of communication protocols, minireport THE-RC29458a, Sept 22, 1977; this was the final version of my presentation of APPROVER = Automated & Proven PROtocol VERifier at the Lake Arrowhead Workshop, California, August 31, 1977.
From its very beginning, APPROVER was sufficiently advanced to perform reachability analysis of protocols specified as procedures in Burroughs Algol, so that the much more primitive finite state models ( FSA ) had to be translated into Algol and not vice versa. ARPA's (later: DARPA ) TCP = Transmission Control Program (later: Protocol) has been modelled by Carl Sunshine, RAND Corp., and design bugs were found in TCP by APPROVER { Sunshine 1978 }. ARPANET was the mother of the Internet, which is still TCP-driven. HJ78a = Jan Hajek : Progress report on the Automatic and Proven Protocol Verifier, Computer Communication Review, ACM SigComm 8/1, January 1978, 15-16. HJ78b = Jan Hajek : Automatically verified data transfer protocols, Proc. 4th Intl. Computer Communications Conference, Kyoto, Sept. 1978, 749-756, with 4 protocols and their Effective Progress/Stagnation graphs aka P/S-graphs. On p.752, in the last but one line on the left, phaseA should be phasesA. HJ79 = Jan Hajek : Protocols verified by APPROVER, Computer Communication Review, ACM SigComm 9/1, January 1979, 32-34 ( originally INWG Note 169, Sept. 1978 ). SC78 = Carl A. Sunshine : Survey of protocol definition and verification techniques, Computer Communication Review, ACM SigComm, vol.8, nr.3, July 1978, 35-41; see Table 1 for APPROVER & Hajek & ARPA's TCP on p.39. SI70 = Herbert Simon : The Sciences of the Artificial , 1970. TA81 = Andrew Tanenbaum : Computer Networks, 1981; this 1st edition mentions APPROVER on p.182 without saying that it had verified all protocols (sliding windows too) in this book before it went to print. WI54 = Norbert Wiener : The Human Use of Human Beings - Cybernetics and Society , 1954. -.-