.- Ideal Decision Making & Money Making (much on Kelly criterion/formula/rule) An open letter to Professor T.W. Korner, Cambridge, UK Jan Hajek , NL , CopyRight (C) 2008 - 2012 , v.5.9 , 2012-2-23 Find: invest , stock , horse , fortune , gain , return , risk , loss , loser , win , winner , Kelly strategy , Kelly criterion , Kelly formula , Kelly rule , Kelly fraction =/= fractional Kelly , payoff == payout , odds , optim , likely , probab , visual , path , combinat , multitree == acyclic graph == loopless graph , geomean == weighted geometric mean , estimat ; rebalanc , Markowitz portfolio ; game , Nash , status quo ; comparative evaluation ; Bayes ; Markov ; DP == dynamic programming , greedy , myopic ; vector ; algo == algorithm ; prog == program ; MST == Maximum / minimum spanning tree Additional abbreviations: Afaik = As far as I know ; CC == current capital , arithmean == arithmetic mean == expected value == E[ . ] , harmean == harmonic mean , CLT == central limit theorem , NDM == Naive Decision Making , pdf == probability density function Names: Arrow Bellman Bernstein Boruvka Dijkstra Feynman Graham Haigh Halmos Jarnik Kalaba Kaufmann Kelly Korner Kruskal Mesterton-Gibbons MacLean Markowitz Marschak Mirowski Murphy Paulos Poundstone Prim Roberts Sollin Thorp Vince Wilkinson Ziemba ; RAND == RAND Corp. ( the famous think-tank in Santa Monica, CA, where I have lectured in 1977, no typo, not in 1877 :-) .- Contents : Symbols: Idea1: A minibestseller = Ideal title + more visuals - no-op math Idea2: Money management for an ideal investment Idea3: Murphy's conditions Idea4: Visualization and explanation of Kelly strategy as a tree Idea5: On preferring & decision making Idea6: On Bellman's optimality principle Idea7: How a consumer should choose Idea8: On MST (probabilistic maximum spanning trees too) Idea9: Wisdoms relevant to our themes Errata: Typos old & new References: This letter suggests a coherent cluster of enhancements ( briefly Ideas ) for the 2nd edition of Professor Korner's nice book Naive Decision Making ( NDM ) of 2008. This is an open letter so that it will hopefully trigger others to suggest enhancements for the 2nd edition. However, this letter contains some fresh insights & innovations which should NOT be lifted without a proper and full reference to the author of this letter (a draft of possible epapers). Dear Professor Korner == Koerner , It was a great pleasure to scan in the library your last book of 2008 on Naive Decision Making ( NDM ) for your insights & explanations. Here I suggest Idea1 ... Idea9 for improvements & extensions of the hopefully forthcoming 2nd edition with additional sections ( simply added at the end of the 1st edition, hence without any laborious renumberings; such add-ons can just refer back to the existing sections ). Moreover I have some questions Q1: !!Q2: Q5a -g ... Q9 numbered for easy reference & reply. At the end of this letter you will find a bunch of typos ferreted out of NDM . .- Symbols: == equivalence or synonymity ; ~ negation ie complement, eg ~i = equal ; <> unequal ; =. approximately equal ; := assignment op; (.) an argument of eg the natural logarithm ln(.) or a probability Pr(.) |y| == abs( y ) == absolute value ^ == power ; (.)^(1/2) == (.)^0.5 == sqrt( . ) == square root * . )( ].[ == plain multiplications, except in numbers as eg 2.718 Prod( eg Rj ) == Product_j( . ) == a chain multiplication Sum[ eg Xj ] == Sum_j[ . ] == a chain addition e = 2.718281828 == the base of ln( . ) E[ r.v. 
] == an expected value == arithmetic mean of a random variable E[R] == expected return
{ and } embrace a comment in a prog or algo , or an explicit and often full in-line reference for easy in-context finding without backjumping.
Indexing is kept so simple as to minimize syntactic sugar : Investment periods are indexed j , 0..j..n in Vj , Pj , Rj ; j_1 == [j-1] , n_1 == [n-1] ; j is the preferred index for other (non-investment, non-Kelly) tasks or apps ; possible tree-outcomes: indexed k , 0..k..n in Vk , Pk , Rk within a period ; simultaneous assets are indexed i: fmi, R1i , R2i , Pi . fm , R0 , R1 , R2 , Vn are so named as to be uniquely findable ; Rc , Vo == an original value , are almost uniquely findable. Search for f. f= f = f's will find almost all f's . There are P( , Pr( and Sum[Pr] . @ digit (if any) are marking fixes & new parts (later have higher digits).
.- Idea1: A minibestseller = Ideal title + more visuals - no-op math :
- In the title of the 2nd edition replace the 1+4-letter word Naive with some other word ( Ideal is my ideal replacement ), because a hesitant potential buyer (even a librarian) has not yet read your justification of the word Naive which creates negative associations with trivial or dumb . Dijkstra aka EWD used to say: 1 deadlock a day, keeps the user away. I say: 1 Naive a day, keeps the reader away.
- NDM lacks visuals. The 2nd edition should contain more pics for the post-mod vidiotized masses of Buridan's asses . NDM contains only 3 graphics (on pp.151, 267, 336) which are not enough for a book of 370 pages in our foxy 21st century. Reco to throw in the pics of :
+ My multitrees, at least 2 for P=0.5 hence symmetrical, and 2 for P > 0.5 ie asymmetrical, always one for an optimal Kelly fraction f=fm , and one for f <> fm. See Idea4 for fresh, meaningful ( filled with meanings and highly explanatory ) multitrees , so easy to type.
+ Reward sets like in { M. Mesterton-Gibbons : An Introduction to Game-Theoretical Modelling, 1992, p.91 } , where his fig.3.1 visualizes quantitative relationships of Nash equilibrium vs Pareto-optimal points, etc. His reward sets are not so easy to draw as my easy to calc & type multitrees, but M-G is so right on p.90 where he says that "a picture of a reward set is worth several thousand words."
.- Idea2: Money management for an ideal investment :
I am glad that you spend more than a couple of pages on { John L. Kelly : A new interpretation of information rate , Bell System Technical Journal, July 1956, pp.917-26 } . Alas , like { Kelly 1956 }, { Haigh 2003 }, { Poundstone 2005 } , and many others , NDM shows the Kelly strategy only for a horse race when a loss always means the whole fraction f lost , ie you consider only the loss R2=1 ( in fact |R2| as we build the - sign visibly into our formula ) in my more generally applicable closed form formula for compound return Rc & optimal fm with a loss 0 < R2 <= 1. Let W+L = n be some anticipated or actual counts ( ie absolute frequencies ) of W wins and L losses ( ie W ups and L downs over n investment periods ). Our gain R1 > 0 here is your u-1 ( my 1+R1 = u == payout ratio in NDM p71-2 == payout multiplier on pp.2,7,15,17-8 == payoff factor ).
Then starting with an initial value Vo , after n reinvestments of a fixed fraction f ( hopefully a true fraction ie 0 < f < 1 , not an unrestricted factor, since f <= 0, the smaller the worse, signals high probability of loss and for more reinvestments an almost sure loss , while f = 1 suggests risking the whole farm , and f > 1 suggests leverage ie risking borrowed assets which is not wise and certainly not prudent ) of our current fortune Vj , our final compound value would be : Vn = Vo.gains.losses = Vo.(gain^W)(loss^L) == Vo.(growth^W)(decay^L) = Vo.((1 + f.R1)^W).((1 - f.R2)^L) , 0 < R1 , 0 < R2 <= 1 , so that 1+Return over n successive reinvestments after the initial one Vo is : 1+Rn = (V1/Vo)(V2/V1)...(Vj/Vj_1)...(Vn_1/Vn_2)(Vn/Vn_1) = Vn/Vo , hence the geomean compound return per each of n investment periods (steps or stages) is Rc = (Vn/Vo)^(1/n) -1 = ( (1 + f.R1)^P ).( (1 - f.R2)^(1-P) ) - 1 for 2 returns ( R2 == loss ) = Prod( (1 + f.Rj)^Pj ) - 1 for j returns ( some of them losses with - sign ) as (e^a)(e^b)..= e(a+b+..) , (x^y)^z = x^(y.z) , x^y = (e^ln(x))^y = e^(y.ln(x)), is Vn = Vo.(1+Rc)^n = Vo.((1 + f.R1)^(n.P)).(( 1 - f.R2)^(n(1-P))) = Vo. exp( n.( P.ln(1 + f.R1) + (1-P)ln(1 - f.R2) ) ) = Vo. exp( n. E[ ln(1 + f.Rj) ] ) for more returns ( losses with - sign ) = Vo. exp( n. ln(1 + f.Rc) ) = Vo.(1+Rc)^n as above , P = W/n = W/(W+L) = (n-L)/n = 1 - L/n 1-P = L/n = L/(W+L) = (n-W)/n = 1 - W/n are proportions ie relative frequencies , hopefully approaching probabilities of wins & losses in the long run ( Keynes : In the long run we are all dead ). Rc is MAXimized for the optimal fraction f=fm by my fortune's formula for fm neither shown in { Kelly 1956 }, { Haigh 2003 }, { Poundstone 2005 }, { Korner 2008 }, { Vince }, nor by almost all other authors. Since the same fm which maximizes Rc maximizes any smooth monotonic increasing function of Rc , we prefer to maximize wrt f this : ln(1+Rc) = ln( (Vn/Vo)^(1/n) ) = (1/n)ln(Vn/Vo) = ln((1 + f.R1)^P ) + ln((1 - f.R2)^(1-P)) = ln( 1 + f.R1).P + ln( 1 - f.R2).(1-P) Quiz: Now you know exactly what to do, so use your highschool math : 1. To justify why a logarithmization is advantageous ; if you cannot , then try to get fm in a closed form ie as an analytic formula without using ln( . ); 2. By any means get fm ( for P & R1, R2 ) in a closed form ie as an analytic formula . Try it before you read on. Using Ln(.) has many advantages : Adv1: Ln(.) reduces products . ie * to + , and powers ^ to * ie . , hence : Adv2: Expressions are simple, since an extreme is found by setting a first derivative to 0 while a derivative of a Sum = Sum of derivatives . Adv3: Ln(.)'s have simple Taylor expansions . Adv4: A product of n successive ratios from n+1 successive values is Vn/Vo , so unlike in the usual (here inappropriate) arithmean , the here proper Sum:j=1..n[ ln(Vj/Vj_1) ] = ln(Vn/Vo) does not require to know all the intermediate values; only Vo , Vn , n are needed to compute the geomean (Vn/Vo)^(1/n) = 1+Rc = the mean growth ratio per period . Moreover, only a sum (not a product) expressed as ln(Vn/Vo) can be plugged into CLT : Adv5: A sum (not a product) is needed for estimating probabilities like Pr(V <= D) , 1 - Pr(V > D) , Pr(V <= x.Vo ) , Pr( x.Vo <= Vn <= y.Vo ) which follow from the central limit theorem; do plug ln(Vn/Vo) into CLT. 
Adv6: If ln(X) has a normal distribution ( aka Gaussian pdf ), then X is said to be lognormal , which is skewed ( due to X > 0 only ), hence a pdf of X has a thicker tail , which is a better model of stock prices .
Adv7: Ln(.) fits the Weber-Fechner law of physiological sensation ( dB, pH, Richter scale ), and roughly fits the psychological valuation of eg wealth and earnings ; would doubling your salary make you twice as happy, and if yes, for how long ? People everywhere care much more about their locally relative living standard than about its absolute level. This is confirmed by the top professor of psychology in Princeton and Nobel Prize winner ( economy 2002 ) Daniel Kahneman and his four coauthors of "Would you be happier if you were richer ? A focusing illusion" { Science, vol.312, 2006-6-30, p. 1908-10 }. What matters to Jill & Joe is their vs others mutual economic competitiveness and jealousy when keeping up with the Joneses.
Adv8: No differential equation has to be solved directly. We saw Vn = Vo.exp( n.E[ ln(1 + Rj) ] ) for many returns (losses with - ), which is in fact the general solution of a differential equation when various growths and/or decays are mixed in a sequence. It can be seen from the simplest useful differential equation for either a growth only or for a decay only : y' == dy/dt = k.y , k is a constant , has the solution : y = y0.e^(k.t) , y0 is the initial condition ie y(t=0) . This solution is derived by dividing our difeq by y and then logdif : y'/y = k ; integrate wrt t to get: ln|y| = k.t + C ; exponentiate : |y| = e^(k.t +C) = e^C . e^(k.t) ; since e^any > 0 we get: y = Const.e^(k.t) ; for y=y0 we get: y0 = Const.e^0 = Const , hence y = y0.e^(k.t) is the solution .
Adv9: Ln(.) has more advantages which are of no or limited use here.
At last my fortune's formula for the ideal(ized) investment (in stocks too) : A bit of elementary calculus solving for f the d/d(f)[ ln(1+Rc) ] = 0 yields the wanted optimal fraction f=fm generalized for 0 < R2 <= 1 not shown by most authors including Kelly :
fm = (P.R1 - (1-P).R2)/(R1.R2) = E[R]/(R1.R2) == Expected[ Return ]/(R1.R2)
   = P/R2 - (1-P)/R1 ; let's plug in data from { Paulos 2003 pp.96-99 } :
   = 0.5/0.6 - 0.5/0.8 =. 0.208 =. 0.2 is the optimal fraction ;
consider R1=0.7 & R2=0.2 giving the factor fm = 1.786 > 1 suggesting leverage ;
R1=0.6 & R2=0.8 giving the fraction fm = -0.208 < 0 signaling losses ;
our R1=0.8 & R2=0.6 giving the fraction fm = 0.208 > 0 suggests profits, while
f=1 yields the growth factor 1+Rc = 0.72^0.5 = 0.85 per period < 1 which is LOSSy, like in Paulos , so
V52 = Vo*((1+0.8)*(1 -0.6))^26 = 10000*0.72^26 = Vo*(1+Rc)^52 = 10000*0.85^52 = 1.95 for f=1 of a LOSER , versus :
fm = 0.2 yields 1+Rc = 1.02^0.5 = 1.010 > 1 hence GAINful ( if repeated ) , so
V52 = Vo.(1+Rc)^52 = 10000*1.010^52 = 17000 for fm=0.208 of a WINNER .
Our optimization of f has not just increased gain, it has turned an almost sure LOSER into a likely WINNER !! One sure loser is professor Paulos in his book { A Mathematician Plays the Stock Market, 2003 } where on pp.96-99 he well explains the need to use the geomean , but he is unaware of the real possibility to turn a sure LOSER into a likely WINNER via the optimal Kelly fraction fm .
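For readers who like to recheck such numbers with a tiny prog, here is a minimal Pascal sketch ( a sketch only, assuming a Free Pascal style compiler ; its Math unit is used just for Power ) which recomputes fm and both V52 values from the Paulos data above :

program KellyFraction;                       { recheck of the LOSER vs WINNER example }
uses Math;                                   { Power( base, exponent ) }
var P, R1, R2, fm, G1, Gm : double;
begin
  P := 0.5;  R1 := 0.8;  R2 := 0.6;          { the Paulos-like data used above }
  fm := P/R2 - (1-P)/R1;                     { the closed form fm = E[R]/(R1.R2) }
  writeln('optimal Kelly fraction fm = ', fm:6:3);
  G1 := Power( (1 + 1.0*R1)*(1 - 1.0*R2), 0.5 );   { 1+Rc per period for f = 1 }
  Gm := Power( (1 + fm*R1)*(1 - fm*R2), 0.5 );     { 1+Rc per period for f = fm }
  writeln('f=1  : 1+Rc = ', G1:6:3, '   V52 = ', 10000*Power(G1,52):9:1);
  writeln('f=fm : 1+Rc = ', Gm:6:3, '   V52 = ', 10000*Power(Gm,52):9:1);
end.

Plugging in other P, R1, R2 immediately shows when fm signals losses ( fm <= 0 ) or leverage ( fm > 1 ).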
Hence he is also unaware of the further trade-offs offered by fractional Kelly ( ie using a fraction of the Kelly fraction , only briefly touched upon in NDM p.363 ) f = c.fm , 0 < c <= 1 ( if P=0.5 then for c=2 is Rc=0 ie a break-even point ie the most likely long term outcome is risky Vo ) : IF 0 < c < 1 is chosen properly , eg c = 1/2 or 1/3 or 1/4 , THEN a sure LOSER for f=1 , will for f = c.fm become a likely WINNER with somewhat smaller gain than for f=fm , but with MUCH SMALLER RISK . { Edward O. Thorp : The Kelly criterion in blackjack, sports betting, and the stock market , 1997, 2000, 2006 } reveals a lot of opeRational meaning of the Kelly criterion via two simple looking, though not simplistic, formulas for f = c.fm : Pr( increasing to y.Vo before decreasing to x.Vo ) == Pr(..) in my next table = ( 1 - x^((2/c)-1) )/( 1 - (x/y)^((2/c)-1) ) for 0 < x < 1 < y , = ( 1 - x )/( 1 - x/y ) if c = 1 ie for the full Kelly fraction where c is a fraction of the Kelly fraction ie an investor uses f = c.fm , 0 < c < 1 ( < 2 = break-even point if P=0.5 ), and the gain Gc = c.(2 - c) is a fraction of the full ( ie if c=1 ) Kelly gain . Pr(V <= x.Vo ) = x^((2/c)-1) = the chance of ever decreasing to x.Vo , a complement of which is the chance of never decreasing to x.Vo . IF c=1 ie full Kelly AND x = 1/a THEN Pr(V <= Vo/a) = 1/a = x tells us that full Kelly gives a bumpy ride, hence a fractional Kelly ( c < 1 ie f = c.fm < fm ) is recommended with the following tradeoffs : x < y c (2/c)-1 Pr(..) Gc Pr(V <= x.Vo ) ---------------------------------------------------- for f = c.fm , 0 < c < 1 1/2 2 1 1 2/3 1 x = 1/2 = 0.5 1/2 2 1/2 3 8/9 0.75 x^3 = 1/8 = 0.125 1/2 2 1/3 5 0.97 0.555 x^5 = 1/32 = 0.03125 1/2 2 1/4 7 0.992 0.438 x^7 = 1/128 = 0.00781 1/2 2 1/5 9 0.998 0.36 x^9 = 1/512 = 0.00195 By investing eg only f = fm/3 , our gain will most likely be 55.5 % of the most likely gain from fm . This decrease of gain is traded-off for the much increased probability 0.97 ( > 2/3 = 66.6 % if c=1 ) that our Vo will double ( by the factor y=2 ) before it will be halved ( by x=1/2 ). 1 - Pr(..) = Pr( not( increasing to y.Vo before decreasing to x.Vo ) ), which { Poundstone 2005, p.231 , no formulas } interprets thus : full Kelly has 1/3 chance of halving Vo before doubling it, while half Kelly has 1/9 chance of halving Vo before doubling it. A simpler case : If R1 = R2 then fm = ( 2P -1 )/R1 , as eg in a simple Brownian motion . The simple cases of betting only on one runner ( eg in a horse race ) : If R2=1 then fm = P - (1-P)/R1 = (P.(1+R1) -1)/R1 = (P.u -1)/(u-1) ; if R2=1=R1 then fm = P - (1-P) = 2P -1 , 0.5 < P < 1 for 0 < f < 1, and as P = W/(W+L), fm = (W-L)/(W+L) , which I recognized as the form of the solution (eg via the combinatorial Andre Reflection Method) of the simplest random walk problem ( which is isomorph with the Ballot Problem ). No wonder that even the simplest Kelly fraction has meaningful links to other paths problems (and even to Catalan numbers). In this simplest Case1 we plug fm = 2P -1 into ln(G) = P.ln(1+f) + (1-P)ln(1-f) to get the maximum ln(Gmax) = ln(2) + P.ln(P) + (1-P)ln(1-P) = 0.693 - H(P) in nats , or 1 - entropy in bits ; entropies in Case2 Case3 Case4 are in { Kelly 1956 } . 
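The little tradeoff table above can be regenerated from Thorp's two formulas with a few lines of Pascal ( a sketch only, assuming a Free Pascal style compiler with its Math unit ; x = 1/2 and y = 2 as in the table ) :

program ThorpFractionalKelly;                { regenerates the tradeoff table above }
uses Math;                                   { Power( base, exponent ) }
var x, y, c, e, PrUpFirst, Gc, PrEverDown : double;
    k : integer;
begin
  x := 0.5;  y := 2.0;                       { halve before double, as above }
  for k := 1 to 5 do
  begin
    c := 1.0/k;                              { f = c.fm , a fraction of the Kelly fraction }
    e := 2.0/c - 1.0;                        { the exponent (2/c)-1 }
    PrUpFirst  := (1 - Power(x,e))/(1 - Power(x/y,e));   { Pr(..) }
    Gc         := c*(2 - c);                 { fraction of the full Kelly gain }
    PrEverDown := Power(x,e);                { Pr(V <= x.Vo) }
    writeln('c = 1/', k, '   Pr(..) = ', PrUpFirst:6:3,
            '   Gc = ', Gc:6:3, '   Pr(V <= x.Vo) = ', PrEverDown:8:5);
  end;
end.

Changing x and y shows the same pattern : the smaller the c , the smaller the chance of ever dropping to x.Vo , and the smaller the fraction Gc of the full Kelly gain.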
Even more general is to include a riskless return R0 , with at least one Rj such that 0 < R0 < Rj , and at least one other such that -1 <= Rj < 0 , hence now the + before both terms , as the + allows us to write a single argument in a product of two or more ( Rj & Pj )'s for a single asset :
1+Rs = Prod( ( f.(1+Rj) + (1-f)(1+R0) )^Pj ) = Prod( ( 1 + f.Rj + (1-f)R0 )^Pj )
     = Prod( ( 1+R0 + f.(Rj-R0) )^Pj ) , which for P & 1-P becomes
     = ( (1+R0 + f.(R1-R0))^P )( (1+R0 + f.(R2-R0))^(1-P) ) , -1 <= R2 < 0.
Quiz: I humbly reco to set to 0 the 1st derivative wrt f of ln(1+Rc) & ln( 1+Rs ), and get those more general ( than for horses ) formulas for the optimal Kelly fraction fm which maximizes the growth rate of wealth. Then play with some fractional Kelly ie with fractions c , 0 < c < 1 , of the Kelly fraction fm to see how f = c.fm will change returns & risks ie Vn & Pr in my multitrees shown further below as Idea4 .
Quiz: Try more values of a single asset (a 1-dimensional case) with Sum[Pj]=1 : For 2 pairs of values (R1, P ) & (R2, 1-P) we got an equation linear in f ; for 3 pairs of values (R1, P1)...(R3, P3) you should get A.f^2 + B.f + C = 0 , as the derivative d[fun(f)]/df lowers the powers of f by 1 from 3 to 2 ; for 4 pairs of values (R1, P1)...(R4, P4) you should apply a Taylor expansion about zero :
for -1 < x <= 1 : ln(1+x) = x - (x^2)/2 + (x^3)/3 - (x^4)/4 + ...
for 0 <= x < 1 : ln(1-x) = -x - (x^2)/2 - (x^3)/3 - (x^4)/4 + ...
where x will be f.Rj in our case , and use only the terms not higher than f^3 , the derivative of which will have f^2 and lower, leading to a quadratic again. Try it, as I did.
By now it might seem that enough was said already, after all, the optimal fm is derived. That does not reveal enough of all the various rich MEANINGS ( how you may get rich or become poor ) when taking risks with f's . More INSIGHTS , understanding , explanation , obtains from examples VISUALized in my binomial multitrees , which I believe cannot be made more informative than they are in Idea4 . Can you enhance them ?
Intermezzo: Since 1990, Ralph Vince has published on his "optimal f" fraction (here fv ), inspired by the Kelly criterion or strategy (here fk) of 1956. Although it is commendable that Vince has shown a formula for a fraction ( not an unbounded factor ), I feel compelled to ask a few FUNDAMENTAL QUESTIONS wrt his fv , answers to which I could not find in his writings. He is explicit about his legitimate wish for a fraction 0 <= f <= 1 which maximizes terminal wealth relative TWR (hence also growth factor G = TWR^(1/n) ), but his theory has to be QUESTIONED due to the following SEVERE SHORTCOMINGS in his writings.
Executive summary : (all here is "as far as I can reasonably know") ( == is equivalence or synonymity , eg he == Vince , his == Vince's )
- No explanation of WHY he uses /abs(Worstloss) == /Vw and WHY it is better than other divisors, eg my /abs(Averageloss) == /Va . My answer: 1. rj = Vj/Vw ensures all f.rj >= -1 hence 0 <= TWR = Product[1 + f.rj ] which is Kelly's idea to "bet a fraction of your farm" plugged into the CLASSICAL formula for the COMPOUNDED RETURN eg in the widely used book by Haim Levy & Marshall Sarnat : Capital Investment and Financial Decisions, 1982, ch.12, p.276. 2. rj = Vj/Va can be used too since we MAXimize TWR wrt f , so some f's are ignored anyway :-)
- No proof of his claimed optimality of his "optimal f" based on his chosen divisor /abs(Worstloss) == /Vw . Here follows a counterexample :
- It is NOT true that his /Vw maximizes his TWR & growth G.
You can verify that his favorite data values : +9 +18 +7 +1 +10 -5 -3 -17 -7 eg in his Handbook p.121 etc, expressed as 0.09 0.18 0.07 0.01 0.10 -0.05 -0.03 -0.17 -0.07 ( proportional changes do not change fv & TWRv and fa & TWRa ), lead indeed to :
his fv=0.24 "yields" TWRv = Product[1 + fv.Vj/Vw ] = 1.095698 and
my fa=0.11 "yields" TWRa = Product[1 + fa.Vj/Va ] = 1.095702 ,
but CONTRARY to his claims, my TWRa > his TWRv while my fa << fv ie fa IS MUCH LESS RISKY than his fv . Next I show that both TWRa & TWRv are UNrealistic , ILLusory PSEUDO-"yields" :
- His TWR & growth factor G suggest ILLusory high yields because any REAListic TWR must be computed without any /Vw or /Va or /Velse :
RealTWRv = Product[1 + fv*Rj ] = 1.028951 << 1.095698 = TWRv by Vince
RealTWRa = Product[1 + fa*Rj ] = 1.013835 << 1.095702 = TWRa by me .
On the last 2 lines the ILLUSIONS are << . Unlike TWRv & TWRa , RealTWRv & RealTWRa are changed by proportional changes of Vj's .
- He has NOT defined optimality well, so his fv yields JUST A tradeoff : "bet whole farm" fk=1.00 & RealTWRk = 1.087584 we reject as TOO RISKY , so his fv=0.24 & RealTWRv = 1.028951 is JUST A tradeoff because my fa=0.11 & RealTWRa = 1.013835 is MUCH LESS RISKY than fv . A decent measure of optimality could be :
100.(RealTWRv -1)/fv = 2.8951/0.24 = 12.06
100.(RealTWRa -1)/fa = 1.3835/0.11 = 12.58 , or another one :
fv/fa = 0.24/0.11 = 2.18 times LARGER RISK versus
(RealTWRv -1)/(RealTWRa -1) = 28951/13835 = 2.09 , so we MAY say that the relative gain is NOT worth the relative risk of MUCH LARGER fv .
- His estimates of JOINT probabilities ( in his Handbook pp 305, 306, 309, 310 etc ) are NOT justified and are highly QUESTIONABLE .
- More upon request. The end of executive summary .
{ Ralph Vince } makes a well meant attempt to get a true fraction ie always 0 <= f <= 1, while Kelly-like fm > 1 may result. An f > 1 suggests risking borrowed assets aka leveraging, which is not prudent in general, and if returns are small in particular, so that to make a noticeable profit we have to reinvest many times ( large n ), thus greatly increasing our risks of a sequence of repeated losses of a fraction f (an almost ruinous run, not just a single loss of an invested fraction f of our current capital ). The formula 2 in { Vince 2010 } is readily seen to be isomorphous with our Rc , if our R1 & R2 are replaced by his ratios Rj/Rb (b == index of the biggest loss so far or anticipated ) here denoted as r1 & r2 . Hence the direct way to his fv for 2 outcomes is via my analytic formula for fm . E.g. when we either are paid $2 with P=0.5 or we lose $0.8 as in his Table 1 :
r1 = R1/R2 = 2.0/0.8 , r2 = R2/R2 = 0.8/0.8 = 1 ( for two outcomes is r2=1 , his minus sign of a loss is already built into my fm ) so that my closed form solution for fv ( Vince shows NO analytic formula yielding fv directly ) is
fv = P/r2 - (1-P)/r1 = 0.5*( 1 - 0.8/2.0 ) = 0.300 < 0.375 = fm .
Proof (for 2 outcomes) of fv <= fm if 0 < R2 <= 1 as assumed far above :
fv = P/1 - (1-P)R2/R1 <= fm = [ P.R1 -(1-P)R2 ]/(R1.R2)
fv = [ P.R1 - (1-P)R2 ]/R1 = E[R]/R1 <= fm = E[R]/(R1.R2)
1/R1 <= 1/(R1.R2) , if 0 < R2 <= 1
hence for just 2 outcomes R1 , R2 , is fv/fm = R2 ie fv = fm.R2 in the table below. For me, Pr(V <= x.Vo) is a good measure of risk ( choose your own x ).
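The executive-summary numbers above are easy to recheck with a few lines of Pascal ( a sketch only ; Va is taken here as the mean of the four absolute losses, ie 0.08 ) :

program VinceCheck;                          { rechecks the executive-summary numbers above }
const n = 9;
      R : array[1..n] of double =
        ( 0.09, 0.18, 0.07, 0.01, 0.10, -0.05, -0.03, -0.17, -0.07 );
var Vw, Va, fv, fa, TWRv, TWRa, RealTWRv, RealTWRa : double;
    j : integer;
begin
  Vw := 0.17;                                { abs( worst loss ) }
  Va := (0.05 + 0.03 + 0.17 + 0.07)/4;       { abs( average loss ) = 0.08 }
  fv := 0.24;  fa := 0.11;                   { the two fractions compared above }
  TWRv := 1; TWRa := 1; RealTWRv := 1; RealTWRa := 1;
  for j := 1 to n do
  begin
    TWRv     := TWRv    *(1 + fv*R[j]/Vw);
    TWRa     := TWRa    *(1 + fa*R[j]/Va);
    RealTWRv := RealTWRv*(1 + fv*R[j]);      { no /Vw : the realistic yield }
    RealTWRa := RealTWRa*(1 + fa*R[j]);      { no /Va : the realistic yield }
  end;
  writeln('TWRv     = ', TWRv:9:6, '   TWRa     = ', TWRa:9:6);
  writeln('RealTWRv = ', RealTWRv:9:6, '   RealTWRa = ', RealTWRa:9:6);
end.

The printed values should match 1.095698 , 1.095702 , 1.028951 , 1.013835 quoted above.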
Here are few evaluations (for the simplest P = 0.5 = 1-P used by Ralph Vince) : denote eg P52fm == Pr(Vn <= Vo , if fm & n=52 ) P52fv == Pr(Vn <= Vo , if fv & n=52 ) ; In | Out: fm.R2 = fv ; note that all P..fm > P..fv R1 R2 fm > fv P5fm P5fv P10fm P10fv P52fm P52fv 2.0 0.1 4.750 > 0.475 0.101 0.026 0.036 0.003 0.000 0.000 2.0 0.5 0.75 > 0.375 0.236 0.153 0.155 0.073 0.010 0.0007 2.0 0.6 0.583 > 0.350 0.262 0.195 0.184 0.112 0.02 0.003 2.0 0.8 0.375 > 0.300 0.31 0.278 0.242 0.203 0.055 0.029 1.8 0.6 0.555 > 0.333 0.279 0.213 0.204 0.130 0.029 0.005 0.08 0.06 2.083 > 0.125 0.437 0.378 0.410 0.330 0.302 0.159 0.8 0.6 0.208 > 0.125 0.437 0.412 0.410 0.376 0.302 0.235 so that our last case has R(fm)^1 = 1.010 > 1.0087 = R(fv)^1 per 1 period , R(fm)^5 = 1.053 > 1.0443 = R(fv)^5 per 5 periods , R(fm)^10 = 1.109 > 1.0906 = R(fv)^10 per 10 periods , R(fm)^52 = 1.709 > 1.57 = R(fv)^52 per 52 periods , so fv < fm means a tradeoff of less gain ( with fv than with fm ) for considerably less risk ( with fv than with fm ). Whether a tradeoff is optimal or not, depends on a mathematical definition of optimality . If Worstloss Rb -> 0 then fv -> P = 0.5 here, while fm -> infinity suggests large leveraging ie risking borrowed assets, which should hardly be optimal for a prudent investor . For multiple outcomes I rewrite Ralph Vince's formula with a slight change in our notation here : for more than 2 outcomes it is more convenient to define returns Rj's as signed numbers ie losses with the implicit - sign. Let Rb == worstloss Rj < 0, so at least one Rj/abs(Rb) = -1 ( b == BigBad ) fv = f which Maximizes Prod( 1 + f.Rj/abs(Rb) ) = f which Maximizes Sum[ ln( 1 + f.Rj/abs(Rb) ) ] because the extreme of any function Fun(X) occurs for the same Xe as when extremizing a monotonic function Mono(Fun(X)) ; here Fun(X) == Prod( .., f, ..), Mono( . ) == ln( . ). Caution ! Warning to prevent confusion & illusion of faster growth (higher G) : While Ralph Vince is quite clear & explicit, I do not see him warning that to get the real exponential growth factor per trade Greal == 1 + real geomean return (real == realistic at least for backtesting his fv ie as-if using fv on the data from which fv was obtained ) we MUST plug his fv ( or any f obtained by any means ) into the following formula with Rj only, ie NO Rj/abs(Rb) : Gtrue1 = ( Prod( 1 + fv.Rj ) )^(1/n) without any worstloss Rb . The maximization can serve only for obtaining his fv , but his maximized Product (he calls it GHPR ) is NO way a realistic growth factor per trade, future or past . Btw, from a sequence of n+1 values Vo,V1..Vn (say stock prices ) we easily obtain the sequence of returns R1..Rn ( no Ro == R0 ) thus : R1 = (V1 - Vo)/Vo = V1/Vo - 1 , R2 = (V2 - V1)/V1 = V2/V1 - 1 , ... etc , hence for the superrisky f=1 is Gf1 = ( Prod( 1 + 1*Rj ) )^(1/n) = ( Vn/Vo )^(1/n) so for a geomean return of a sequence of values Vj we need to know just Vo, Vn & n , while for an arithmean all other values are needed ( although an arithmean can be updated sequentially from a recurrence formula through which all values Vj must pass ). End of intermezzo on Ralph Vince's fv ; back to Kelly-like fm . Ideological and philosophical issues : + { Philip Mirowski : Machine Dreams - Economics Becomes a Cyborg Science, 2002 } pp.372-6, 382, 389 are on the ideological impact of Kelly's paper on the Cowles economists Jacob Marschak and the nobelist Kenneth Arrow . 
+ On rationality { Peter Bernstein : Against the Gods , 1996 } :
+ see his Name index for many refs eg to Frank Knight vs Keynes , to Richard Thaler , etc ; - he does not mention Kelly and Latane ;
+ p232-46: on game theory ( 241-3 ! );
+ p246: "Victorian concept of rational behavior", "measurement always dominates intuition", p249: "Victorian faith in measurement" ;
+ p252: "The strategic role of diversification is Markowitz's key insight." I prefer to say that: The nobelist (economy 1990) Harry Markowitz has opeRATIONALized the old folks wisdom of "Don't keep all your eggs in one basket" by quantifying the arithmean-variance trade-offs when diversifying investments for a single investment period. The Kelly-inspired geomean formula generalized for diversified stocks (not just horses ) says how much to invest, if at all, into which asset over next investment periods, so as to optimize an exponential growth over a long term. My multitrees visualize possible & probable returns & risks (see Idea4 below).
+ p253: "close resemblance between diversification and .. games of strategy." "one player is the investor and the other player is the stock market." ... "By making the best of a bad bargain - by diversifying instead of striving to make a killing - the investor at least maximizes the probability of survival."
+ regression to the mean : pp.167, 170-178, 182, 184-5, 206, 249, 264, 271-2, p182: " Dependence on reversion to the mean tends to be perilous when the mean itself is in flux ." Also in : { John Allen Paulos : A Mathematician Plays the Stock Market , 2003 } : pp.107-9 on regression to the mean : increased likelihood that after a peak a downfall will follow.
+ p179: "analysis is 100% hindsight. Worse, even small differences in annual returns over many years produce big differences in the investor's wealth in the long run."
Two generalizations will not be handled here :
1: The easy one: 1+Rc and 1+Rs can be generalized for Sum[Pk]=1 , k >= 2 , each multinomial proportion aka relative frequency Pk having its own return Rk . Although Taylor expansions may help for a single asset , a simple numerical evaluation of Vn or 1+Rn or 1+Rc by trying out many f's between f=0 and f=1 will do even for non-programmers using a calc . Eg for k1+k2+k3 = n :
Vn(k1,k2,k3) = Vo.( [1 + f.R1]^k1 )( [1 + f.R2]^k2 )( [1 + f.R3]^k3 ) with at least one Rk < 0, and at least one Rk > 0.
P(k1,k2,k3) = (P1^k1)(P2^k2)(P3^k3).n!/(k1!k2!k3!) where the last fraction is called multinomial coefficient aka degeneracy factor in physics.
2: Diversification is tough but vital ( assets are indexed i ) : Optimal fractions fmi with 0 <= Sum[ fmi ] <= 1 for a diversification of simultaneous investments into many assets ( each with Pi , R1i , R2i ie not just all R2i = -1 like when betting on multiple horses with only 1 winner ) will be needed for prudent practical use. The fundamental difficulty arises from the hard to get yet needed joint probabilities, the number of which increases exponentially with the number of assets. For Z assets, each simplistically assumed to either jump up by R1i or to fall down by R2i ( ie each asset with marginal probabilities Pi & 1-Pi ) we would need 2^Z joint probabilities :-( Joint probabilities easily obtain if we assume mutually independent returns on our assets . In that case joint probabilities are just products of the marginal P's, as the next little sketch shows.
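A minimal Pascal sketch ( illustration only ; Z and the marginal P's below are made-up numbers, not from NDM ) which lists all 2^Z joint probabilities as products of marginals under the independence assumption just mentioned :

program JointProbs;            { 2^Z joints as products of marginals, independence assumed }
const Z = 3;                                         { Z illustrative assets }
      P : array[1..Z] of double = ( 0.6, 0.5, 0.7 ); { made-up marginal P's }
var joint, total : double;
    mask, i : integer;
begin
  total := 0;
  for mask := 0 to (1 shl Z) - 1 do                  { each bit i : asset i up (1) or down (0) }
  begin
    joint := 1;
    for i := 1 to Z do
      if (mask shr (i-1)) and 1 = 1
        then joint := joint*P[i]
        else joint := joint*(1 - P[i]);
    writeln('outcome ', mask:2, '   joint probability = ', joint:7:4);
    total := total + joint;
  end;
  writeln('sum of all ', 1 shl Z, ' joints = ', total:7:4);  { must be 1 }
end.

Without independence, all 2^Z joints would have to be estimated separately, which is exactly the fundamental difficulty mentioned above.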
Analytic solution is possible for 2 assets under special circumstances like eg: the same returns R1 = R2 = R > 0 and the same losses L1 = L2 = L < R , D = R-L > 0 , all with the same probabilities P1=0.5=P2. From these symmetries it obviously follows that the fractions f1 = f2 = f. For s assets there are 2^s joint probabilities in general, and in our case of s=2 there are 4 joints P = P1*P2 = 0.5*0.5 = 1/4 , hence G = [ (1 + 2Rf) * (1 - 2Lf) * (1 + Df)*(1 + Df) ]^(1/4) ln(G) = [ ln(1 + 2Rf) + ln(1 - 2Lf) + 2*ln(1 + Df) ]/4 d(ln(G))/df = 0 = [ 2R/(1 + 2Rf) -2L/(1 - 2Lf) + 2D/(1 + Df) ]/4 Bringing the rhs to one common denominator ( its value does not matter due to the 0 = rhs ), yields a quadratic equation. It is only a quadratic due to only 3 different ln terms in our ln(G) : 0 = -8DLR.f^2 +(3D(R - L) -4LR).f +(D+R-L) 0 = 8DLR.f^2 -(3DD -4LR).f -2D = A.f^2 + B.f + C The positive root f = (-B + sqrt(B^2 - 4AC))/(2A), if any, is the optimal fraction fm ( fm=f1=f2 for both assets, in our symmetric case ) which yields the maximum geomean Gm . Let's plug in R=1 ie doubling, and L=0.5 ie halving : A = 8(1-0.5)0.5*1 = 2 , B = -(3*0.5*0.5 -4*0.5*1) = 1.250 , C = -2*0.5 = -1, so fm = ( -1.250 + sqrt(1.250^2 -4*2*(-1)) )/(2*2) = 0.46 ie the maximum geomean Gm obtains if f1 = f2 = fm = 0.46 ie if at the beginning of all investment periods we reinvest 46% of our current capital in each asset , so that 100 - 2*46 = 8% of our current capital will stay uninvested during all periods. Gs = the G above specifically with R=1, L=0.5, D = R-L = 0.5 : = [ (1 + 2f) * (1 - f) * (1 + 0.5*f)^2 ]^0.25 Gm = Gs(f=fm=0.46) = 1.119 ie the geomean return = 1.119 - 1 = 0.119 ie 11.9 % Let's plug some other f=f1=f2 into our specific Gs : f=f1=f2 G 100*2f = % of capital invested ie at risk 0.5 1.118 100 % 0.46 1.119 92 % 0.23 1.087 46 % only, while (11.9 - 8.7)/11.9 = only 27 % less than Gm 0.1 1.045 20 % only, while (11.9 - 4.5)/11.9 = 62 % less than Gm 2f G(2f in 2 assets) G(2f in 1 asset, fm = 0.5(1-0.5)/(1*0.5) = 0.5) 0.46 1.087 > 1.06 , 0.46 =. fm for our single asset 0.92 1.119 > 1.018 , 0.92 =. 2fm is extremely risky Due to the symmetries we reinvest the same amount A1=A2=A into our 2 assets. Our fm is a fixed portion ie a constant, so we are supposed to rebalance our portfolio before we reinvest. Rebalancing means that if one asset has ended with a value A1 > A2 of the other asset (one was more profitable or has lost less than the other), we reinvest a part of the last A1 into new A2 so that we start each new period with a new A1=A2=A (and end with A1 <=> A2). Rebalancing means that the winner is subsidizing the loser , because the winner may become a loser in the next period. However, before the next period, the investor is free to update his P's, R's & L's. Bayesian updating of P after n = W + L wins and losses gives P = (W+a)/(W+a + L+b) = (W+a)/(n + a+b) where a,b obtain from a beta pdf B(a+W, b+L). The maximum a posteriori estimate aka MAP ie the posterior mode uses a=b=0.5 , hence P = (W+a-1)/(n+a+b-2) = (W -0.5)/(n-1). Joint probabilities are not needed when betting on horses in a race. This joyful fact is due to only a single winner , hence possible gains are mutually exclusive. 
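The quadratic for the symmetric two-asset example above ( R=1 ie doubling, L=0.5 ie halving, P1=P2=0.5 ) can be solved by this minimal Pascal sketch ( a sketch only ; it just re-evaluates the formulas derived above ) :

program TwoAssetQuadratic;     { the symmetric 2-asset case above }
var R, L, D, A, B, C, fm, Gm : double;
begin
  R := 1.0;  L := 0.5;  D := R - L;        { doubling or halving, P1 = P2 = 0.5 }
  A := 8*D*L*R;                            { 0 = A.f^2 + B.f + C }
  B := -(3*D*D - 4*L*R);
  C := -2*D;
  fm := ( -B + sqrt(B*B - 4*A*C) )/(2*A);  { the positive root }
  Gm := exp( ( ln(1 + 2*R*fm) + ln(1 - 2*L*fm) + 2*ln(1 + D*fm) )/4 );
  writeln('fm = f1 = f2 = ', fm:5:2, '   Gm = ', Gm:6:3);
end.

It prints fm =. 0.46 and Gm =. 1.119 , as in the small f-table above.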
{ Kelly 1956 } has derived formulas for optimal f's and for Gmax in 4 cases : Case1 : Bet on just 1 horse when R1=1=R2 : Vn = Vo.( (1+f)^W )( (1-f)^L ), so fm = P - (1-P) = 2P -1 , 0.5 < P < 1 for 0 < f < 1; due to P = W/(W+L) is fm = (W-L)/(W+L) = W/n - L/n = W/n - (n-W)/n = 2W/n -1 = 2P -1 , hence ln(Gmax) = ln(2) + P.ln(P) + (1-P)ln(1-P) = 0.693 - H(P) in nats ie 1 - entropy in bits. Entropic expressions for less simple Cases 2-4 are in { Kelly 1956 }. Bets on K horses in a race of h horses indexed i ( 0 <= K <= h ) : Case2 : more general case for fair odds R1i and R2i = 1. Kelly does not explain that odds are fair if E[R] = 0 ie neither loss nor win can be expected: 0 = E[R] = Pi.R1i - (1-Pi).R2i = Pi(1+R1i) -1 hence Pi(1+R1i) = 1 so 1+R1i = 1/Pi so Pi = 1/(1+R1i) and Sum[Pi] = Sum[ 1/(1+R1i) ] == Q = 1. For an event i is Pi = 1/(1+Odds(i)) ie Odds(i) = (1-Pi)/Pi ; but for a complementary ~i are Odds(~i) = (1 -(1-Pi))/(1-Pi) = Pi/(1-Pi) used eg in medicine (with i == "negative" == healthy ; ~i == "positive" in medicine ). Even payout == even money == R1=1=R2 == fair odds = 1:1 = 1 with Pi=0.5=P(~i) Case3 : fair odds not required ie 1+R1i <> 1/Pi yet Sum[ 1/(1+R1i) ] == Q = 1 ie as-if no track take. Case4 : in the real world there is a track take ie Sum[ 1/(1+R1i) ] == Q > 1 and Sum[ fmi ] <= 1 ie we don't have to bet the whole farm. Alas , { Kelly 1956 } is hard to follow , has an opaque Gmax formula , no example ; { Korner 2008 pp.81-82 } is a tough sledding, no Gmax formula , no example ; { Haigh 2003 } is intelligible , no Gmax , 2 examples with results, describes Kelly's Case4 computational procedure better (without deriving it); in his 2 examples the horse B has different odds in the race of h = 5 horses A,B,C,D,E. Payoff odds aka payout odds = profit:stake = profit/stake = R1i . When your horse wins, you get back (1 + payout odds) = 1+R1i per every $ betted. Caution : Haigh's payout odds alpha = payout - 1 = our return R1i , while { Kelly 1956 } defines his alpha = payout = 1+R1i . Be reminded that E[R] = P.R1 + (1-P)R2 = Pi.R1i + (1-Pi)(-1) = P.(1+R1i) -1 , where Pi is your private estimated probability of winning for the i-th horse . This Pi will seldom, if ever, equal the pseudoprobability computed from payout odds , ie Pi <> stake/(stake + profit) = 1/(1 + profit/stake) = 1/(1+R1i); normally we bet on profit > stake , and the sum of pseudoprobabilities == Sum[ 1/(1+R1i) ] == Q > 1 like here : I programmed Case4 by Kelly from Haigh and checked his results, but will not describe it here. Let me just say that it starts with sorting the horses in DECreasing order of their e1[.] = P[.]*(1+R1[.]) because E[.] = e1[.] -1 : Given : payout . Computed: fraction . If bet on one horse only [.] 
P odds = R1 1/(1+R1) e1 f fm for a single horse C 1 0.2 9:2 = 4.5 0.182 1.1 0.06 > 0.0222 ie bet 2% A 2 0.4 13:8 = 1.625 0.381 1.05 0.106 > 0.0308 ie bet 3% B 3 0.3 9:4 = 2.25 0.308 0.975 0.0625 >> -0.0111 ie no single bet D 4 0.08 5:1 = 5 0.167 0.48 0 -0.104 ie no single bet E 5 0.02 20:1 = 20 0.048 0.42 0 -0.029 ie no single bet Sum: Q = 1.085 > 1 : IF Sum[ 1/(1+R1i) ] == Q <= 1 ie an UNrealistic race without a track take THEN BEGIN IF Pi is UNknown (ie we use less information than Kelly needs) THEN we cannot use Kelly , yet we can make sure profit if betting fractions fmi = (1/(1+R1i))/Q , 1/Q normalizes to Sum[ fmi ]=1 ; no Pi needed :-) ELSE Kelly's Case4 procedure yields fmi = Pi ( is hard to estimate ), hence Kelly's Gmax changes with estimates of Pi's ; if Q <= 1 then in Kelly's Gmax his 2nd term becomes (1-1)*ln(0) which I define as 0 in a Sum, ie I skip his 2nd term in his Gmax to avoid a crash of my prog. In a Product 0^0 is undefined and must be interpreted as X^0 =1 (not as 0^X =0). END ELSE IF Q > 1 THEN it is a realistic race with a track take, Haigh / Kelly's Case4 ; here we get from Haigh's data : If R1[3] = 2.25 (as above for the horse B ) then Q = 1.085, bet total fraction = Sum[f] = 0.06 + 0.106 + 0.0625 = 0.23 ; if R1[3] = 2.00 (for B ) then Q = 1.110, bet total fraction = Sum[f] = 0.0337 + 0.0515 + 0 = 0.085 . Note that the horse B has f > 0 despite fm < 0 ; fm < 0 because expected gain E[3] = e1[3] -1 = 0.3*(1 + 2.25) -1 = -0.025 < 0 . This counter-intuitive result of f > 0 despite fm < 0 is mentioned in { Kelly 1956 p.925 } and is justified in { Haigh 2003 p.238 }. Depending on the B's odds , Haigh bets on K=3 horses when B has odds 9:4 = 2.25 , or on K=2 horses when B has odds 2:1. I get the same fractions f as Haigh , but I do not get the same Gmax as he gets. Since Haigh does not tell how to compute Gmax , and the formula for Gmax in { Kelly 1956 p.925 } seemed suspicious to me ( strangely enough, Kelly's formula for Gmax does not contain any fraction f[.] == his a(s) ), I wrote my own transparent formula for G == Gmax , here presented in the crystal clear form as a loopless algorithm : { nr. of f's bet = K <= h = nr. of horses in a race } G := 0.0; { K is the CAPital K in Haigh 2003 p.360 } { h horses race ; K = h causes (1-1)*LN(1-1) = 0*LN(0) = UNdefined ; but we know that it is how we compute a product term 0^0 = UNdefined too, so we are free to interpret it as X^0 = 1 ; hence my if 1 > f[1]... ie as-if G:=G + 0 ie as-if finally G:=G * 1 . } case K of { the K is the computed number of optimal fractions f[.] used } 1: begin { payout - 1 = return = R[.] ; P[.] == probability } G := P[1] *LN(1.0 +f[1]*R[1]); { LN(.) == logarithm } if 1 > f[1] then G:=G +(1 -P[1])*LN(1 -f[1]); end; { wins are mutex == mutually exclusive winners eg horses } 2: begin { wins are mutex , hence when K=2 ie betting on 2 horses : } G := P[1] *LN(1.0 +f[1]*R[1] -f[2]) + P[2] *LN(1.0 -f[1] +f[2]*R[2]); if 1 > f[1]+f[2] then G:=G +(1 -P[1]-P[2])*LN(1 -f[1] -f[2]); end; { if 1 > f[... could be replaced by IF K < h THEN , but I believe that if 1 > f[... is numerically safer in preventing LN( nr. <= 0 ) } 3: begin { wins are mutex , hence when K=3 ie betting on 3 horses : } G := P[1] *LN(1.0 +f[1]*R[1] -f[2] -f[3]) + P[2] *LN(1.0 -f[1] +f[2]*R[2] -f[3]) + P[3] *LN(1.0 -f[1] -f[2] +f[3]*R[3]); if 1 > f[1]+f[2]+f[3] then G:=G +(1-P[1]-P[2]-P[3])*LN(1-f[1]-f[2]-f[3]); end; { 4: ... 
etc } end{case}; G := exp( G ); write('optimized growth factor G = ', G ); My formula for G is intuitively clear, even obvious, so it cannot be wrong . My G == Gmax values = growth factors are 1.0051 , and 1.003 ie rate = 0.3% ; { Haigh 2003 p.238 } gets growth factors 1.0035 , and 1.0021 ie rate = 0.21%. By now Kelly's opaque formula for Gmax yields the same growth factors I get. I prefer transparent formulas and algorithms, because too clever opaque ones easily become sources of misunderstandings and errors , especially when changing the ever present assumptions, explicit and implicit. Authors make their assumptions ( mathematical, but also about readers' knowledge of the subject and math ), while readers make other, often wishful, assumptions. Some relationships between Kelly-optimal reinvestments ( Kor ), and in 1990 Nobel prize winning Markowitz-optimal portfolio theory ( Mop ) : R1: Harry Markowitz has been sympathetic to the use of geomean since 1957. R2: Mop is for a single period , while Kor is a multi-period strategy. R3: Markowitz efficient frontier ( of Markowitz-optimal portfolios ) cannot single out the best portfolio , Kor identifies the best . R4: While Kor is based on maximizing a geomean , Mop is based on arithmetic mean-variance optimization which produces efficient portfolios in the sense of best Return/risk ie Return/volatility ratios. We could call it signal-to-noise ratio s/n = 1/( n/s = relative error aka coefficient of variation aka noise-to-signal ratio ) = arithmean/(standard deviation) = A/Sd , not A/(Sd^2) = A/Variance . The essence of the difference between an arithmean A and a geomean G is an approximation via Var = Sd^2 . Any r.v. X with ALL values > 0 has a geomean : G(X) = exp(E[ln(X)]) <= E[X] == A (Jensen's ineq.). Good approximations : G =. A - Var/(2A) = A( 1 - 0.5(Sd/A)^2 ) =.. sqrt( A^2 - Var ) { William J. Bernstein & David Wilkinson : Diversification , rebalancing , and the geometric mean frontier , 1997-11-24 } write : { p.8 } " G =. R - Var/(2(1+R)) " ie their G seems to differ from our G =. A - Var/(2A) , but above we have 1+Rn = Vn/Vo for n reinvestment periods, hence Rc = (Vn/Vo)^(1/n) -1 = the geomean compound return per period . Clearly a geomean G is meaningful only as a product of numbers > 0. Q: Why not >= 0 ? A: Because a computable Sum[ln(.)] requires all (.) > 0. Moreover, G > 0 only if ALL its arguments are > 0 , since a 0 annuls a product. While in an arithmean the Sum hides a 0 , the Prod( . ) in a geomean reveals a 0 , so G=0 always gives a warning on the possibility of a final return = 0 ie possibly total loss . So we assume & check for returns Rj > -1 entering a Product of ALL 1+Rj > 0. Now plug 1+R for A into our G =. A - Var/(2A) = 1+R - Var/(2(1+R)) , so Rc = 1+R - Var/(2(1+R)) -1 = R - Var/(2(1+R)) = their G which requires that their Var is computed from 1+Rj , although they do not say that. R5: { p.15 } "Thus no portfolio which is not Markowitz efficient can be geometric mean efficient; equivalently, every geometric mean efficient portfolio must be Markowitz efficient." This I translate into logic as ( ~Meff implies ~Geff ) == ( Geff implies Meff ), which is a fancy way of saying that Geff portfolios form a subset of Meff ones, ie all Kelly optimal portfolios are always Markowitz optimal ( but not v.v. ). Draw your 1st colorful Venn diagram into your coloring book :-) R6: { p.15 } "Only that part of the Markowitz frontier to the left of this [ Gf(s) ] maximum is geometric mean efficient." 
R7: { p.16 } "Markowitz frontier greatly overestimates the true reward of increasing risk."
.- Idea3: Murphy's conditions :
One Stanford professor has spent a lot of math on Kelly on pp.95-125 of his book { Roy E. Murphy : Adaptive Processes in Economic Systems, 1965 }, where on pp.123-5 he has nicely visualized ( Fig.7.4 on p.123 ) his derivation of his inequalities (7.5.12) & (7.5.19) , ( his P1 & P2 are our P & 1-P ) , which I combined into
0 < E[R] ie R2/(R1+R2) < P < ((1+R1)R2)/(R1+R2) ie E[R] < R1.R2
about which he says "Provided that the probability function of the environmental stochastic process does not violate eqs. (7.5.12) and (7.5.19) , it is quite likely that the subjective probabilities will not violate eq. (7.4.35) in the later periods of the process." His equation (7.4.35) is 0 < p.R1 - (1-p)R2 < R1.R2 where p is an estimate of the next P. Note that his (7.4.35) is just another way of writing our 0 < fm = (P.R1 - (1-P).R2)/(R1.R2) < 1 above. My explanation of Murphy's terminology : "environmental stochastic process" == what actually happens ; "subjective probabilities" == p == user's guesstimate of P.
Q1: Any learned comments on (the practical value of) Murphy's condition for a good P ??
Murphy rightly worries about our lack of knowledge of P, but I do not see him worrying about our lack of knowledge of R1 & R2 in the next period. I do worry, because our compound fortune has the value (P = W/n , W+L = n) :
Vn = Vo.((1+Rc)^n) = Vo.((1 + f.R1)^(n.P))((1 - f.R2)^(n.(1-P))) = Vo.((1 + f.R1)^W )((1 - f.R2)^L) ,
so after each investment period we shall see how much we just won or lost, in a tiny prog :
Vn := our new fortune became known ;
IF our Vn increased THEN BEGIN R1 := the last gain became known; W:=W+1 END
ELSE R2 := the last loss became known;
n := n+1; P := W/n; // 1-P = 1 -W/n = (n-W)/n = L/n
R1 := ??; R2 := ??; // HOW to best adjust for the next period ??
Of course, we can keep separate moving geomeans of few recent ups R1 , and downs R2 , and use these geomeans as estimates of R1, R2, for the next period. Do you know something smarter ?? Assuming only mild, not wild, changes in the next period, we see that it is trivial to update P ( we could do a Bayesian update of P ), but :
!!Q2: How should we best adjust both R1 & R2 so as to best estimate the best fraction f = c.fm for the next period ?? This is a problem of prediction from the (recent) past values. Since the last & past Vn's, R1's, R2's, P's, and f's are known, HOW should we best incorporate all those values ( yes, f's and c's too, because they were and will be decisive for turning a LOSER into a likely WINNER ) into our estimate of the best fm for the next period ?? Any constructive ideas ??
Notes: The worked examples below should surprise, or even shock, the reader twice : 1st surprise by an almost sure Big Loss ( quantified & visualized below ); 2nd surprise by how almost surely a Big Loser ( like prof. Paulos ) may be turned, as if by magic ( no Parrondo paradox needed ), into a likely Tidy Winner ( quantified & visualized below ).
Like Kelly , Murphy was an entropist, no problem with me, but why bother the already perplexed reader with entropy, when it is not necessary for a derivation and understanding of the Kelly strategy ? Btw, when Shannon asked Von Neumann what he should call his key formula , John Von answered : "Call it entropy, because nobody really understands it, so you will always have an advantage in a debate."
:-)
.- Idea4: Visualization and explanation of Kelly strategy as a tree :
Binomial (ie for P & 1-P only) outcomes can be easily visualized by means of Pascal's triangle of POSSIBLE outcomes, which should be thrown into the 2nd edition. You should tell that an m-nary regular tree is a compact way to list and visualize ALL possible m-nary sequences or paths. Pascal's triangle is a binary regular tree showing all possible binary decisions , paths or sequences from 000..00 to 111..11 , countable via combinatorics . I wonder why almost nobody who writes about Kelly shows any tree ?? { Poundstone 2005 p.214 } shows a simple tree for P=0.5 & R2=1 , while I pack more than twice as much info into my multitree . A constructive example of much-telling graphics is my maxi-info and easy to "draw" (by typing only) multitree ie a loopless graph aka acyclic graph which visualizes Kelly strategy as-if in action in POSSIBLE & PROBABLE worlds, not available elsewhere ( not to be confused with the parallel worlds of Hugh Everett III ). Q: Could it be that the outcomes V in my multitrees are coarse-grained evaluations of Feynman's path integrals over random walks in The Garden of Forking Paths by Jorge Luis Borges (and Gell-Mann's "histories" ) ?
1st tree: P=0.5, R1=0.8 = gain, R2=0.6 = loss, the fraction f=1.0 ( all data are the same as in { Paulos 2003, pp.96-99 } ie no fraction ie he/we would invest all we have ( initially Vo set apart, thereafter whatever current capital CC has become of it ) ). Then we get :
(1 - f)CC = the cash we keep, while f.CC we put at risk; both added give
(1 - f) + f*payoff = 1 + f*(payoff -1) = a factor which changes our CC
(1 -1.0) + 1.0 *0.4 = 1 + 1.0*(0.4 - 1) = 1 - 1.0*0.6 = 0.4 = loss factor ,
(1 -1.0) + 1.0 *1.8 = 1 + 1.0*(1.8 - 1) = 1 + 1.0*0.8 = 1.8 = gain factor ;
after n= 4 weeks is V4 = Vo.G^4 = Vo*0.849^4 = 5184 < Vo ,
after n=52 weeks is V52 = Vo.G^52 = Vo*0.849^52 =. 2 << Vo .
The most likely value after n steps can be computed by the general formula Vn = Prod( Vk^Pk ) , eg V1 = (4000^0.5)(18000^0.5) = 8485 , however P = 0.5 allows the simpler Vn = (Prod(Vk))^(1/(n+1)), n+1 because there are 0..k..n nodes on each level 0..n, shown here as a multitree of POSSIBLE outcomes (as nodes), and the most LIKELY outcomes denoted as Vo, V1,..,Vn :
n=0 Vo = ( 10000 )^(1/1) = 10000
10000*0.4 ............... / \ ...................... 10000*1.8 = 18000
n=1 V1 = ( 4000 * 18000 )^(1/2) = 8485 < Vo
4000*0.4 ............ / \ / \ .................. 18000*1.8 = 32400
n=2 V2 = ( 1600 * 7200 * 32400 )^(1/3) = 7200 < Vo
1600*0.4 ....... / \ / \ / \ .............. 32400*1.8 = 58320
n=3 V3 = ( 640 * 2880 * 12960 * 58320 )^(1/4) = 6109 << Vo
/ \ / \ / \ / \
n=4 V4 = ( 256 * 1152 * 5184 * 23328 * 104976 )^(1/5) = 5184 << Vo
nr. of losses L 4=n 3 2 1 0
nr. of wins W 0 1 2 3 4=n
W+L = n ; paths 2^n = 16 = 1 + 4 + 6 + 4 + 1 towards their $$ , hence $$'s
Probab. 0.0625 + 0.25 + 0.375 + 0.25 + 0.0625 of the k-th value ,
Sum[Pr] = <------- 0.688 ------->+<--- 0.312 --> = 1
losses : <------- V < Vo ------>|<-- Vo < V --> : gains
The larger the n , the more (n+1) various outcomes on the n-th level, hence the smaller the probabilities Pr's of outcomes (since Sum[Pr's] = 1 is distributed over n+1 Pr's). For f=1 after n=52 weeks of investments :
- The largest possible outcome is an "exponentially huge" number with Pr =. 0
- The most likely outcome V52 = 1.95 << 10k = Vo, with Pr=0.11 which is the largest Pr, yet quite low, since it is just one out of n+1 = 53 Pr's which always sum to 1.
Even more telling is this : D = 1.95 = V52 10k = Vo etc Pr(V <= D) = 0.5 0.94 a classical homework for the reader 2nd tree: again P=0.5, now f=0.2 =. fm , and with all Pr's shown, has : (1 - f) + f*payoff = 1 + f*(payoff -1) = a factor which changes our CC (1 -0.2) + 0.2*0.4 = 1 + 0.2*(0.4 - 1) = 1 - 0.2*0.6 = 0.88 = loss factor; (1 -0.2) + 0.2*1.8 = 1 + 0.2*(1.8 - 1) = 1 + 0.2*0.8 = 1.16 = gain factor; after n=52 weeks is V52 = Vo*G^52 = Vo*1.010^52 = 17k >> Vo ; after n= 4 weeks is V4 = Vo*G^4 = Vo*1.010^4 = 10420 > Vo , shown as a multitree of POSSIBLE outcomes together with their PROBABILITIES (at nodes), and the most LIKELY outcomes Vn : n=0 Vo = ( 10000 )^(1/1) = 10000 = Vo Pr = 1.0 10000*0.88 ........... / \ ..................... 10000*1.16 = 11600 n=1 V1 = ( 8800 * 11600 )^(1/2) = 10103 > Vo Pr = 0.5 + 0.5 = 1 8800*0.88 ........... / \ / \...................11600*1.16 = 13456 n=2 V2 = ( 7744 * 10208 * 13456 )^(1/3) = 10208 > Vo Pr = 0.25 + 0.5 + 0.25 = 1 7744*0.88 ......... / \ / \ / \ ..............13456*1.16 = 15609 n=3 V3 = ( 6815 * 8983 * 11841 * 15609 )^(1/4) = 10313 > Vo Pr = 0.125 +0.375 + 0.375 + 0.125 = 1 6815*0.88 ..... / \ / \ / \ / \ .........15609*1.16 = 18106 n=4 V4 = ( 5997 * 7905 * 10420 * 13736 * 18106 )^(1/5) = 10420 > Vo Pr = 0.0625 + 0.25 + 0.375 + 0.25 + 0.0625 = 1 Sum[Pr] = <-- 0.312 --->+<------- 0.688 -------> = 1 losses : <-- V < Vo -->|<------ Vo < V -------> : gains The larger the n , the more n+1 of various outcomes, hence the smaller the probabilities Pr's of outcomes (since Sum[Pr's]=1 is distributed over n+1 Pr's). For f=0.2 after n=52 weeks of investments : - The largest possible outcome is 22.5M with Pr=. 0. Note that 22.5M << an "exponentially huge" value for f=1 above. - The most likely outcome V52 =. 17k > 10k = Vo, with Pr=0.11 which is the largest Pr, yet quite low, since it is just one out of n+1 = 53 Pr's which always sum to 1. Even more telling is this : D = 10k = Vo 17k = V52 etc Pr(V <= D) = 0.34 0.5 a classical homework for the reader IF a binomial multitree is SYMMETRICAL wrt Pr's ( due to P=0.5 = 1-P ) THEN it holds : + If 0 < k < n (ie a node has 2 parents) then each Pr = the mean of the 2 parental Pr's on the previous ie (n-1)th level. On each level n with n even (ie odd n+1 of V's), it holds : + The maximum Pr = the median Pr = both central Pr's on the level n-1. + The weighted geomean Vn = the median POSSIBLE Value on the level n , eg: V4 = ( 5997 * 7905 * 10420 * 13736 * 18106 )^(1/5) = 10420 = ((5997^0.0625)(7905^0.25)......(18106^0.0625) = 10420 Combinatorics count : Probabilities in a binomial pdf are computable via a recurrent combinatorial expression. Here is the easiest way how to compute , also with a calc , Pr(k,n) = (n choose k)(P^k)((1-P)^(n-k)) for a single n and for all k : Pkn := (1-P)^n ; write( 0, Pkn ); { == Pr(0,n) find it below } for k:=1 to n do begin Pkn := Pkn*(n-k+1)*P/((1-P)k); write( k, Pkn ) end; Note that we need no (n choose k), hence neither factorials, nor Stirling's approximation, and not even any logarithm, except possibly to implement the single (1-P)^n via exp(n*ln(1-P)) , unless done via a simple loop . Our interpretation of P(k,n) : (P^k)((1-P)^(n-k)) = probability of one particular sequence of n events : k events e1 and n-k complementary events e0 ; it is a joint probability assuming independence; ^ expresses repeated occurrences regardless of their order . (n choose k) = n!/(k!(n-k)!) = n!((n-k)!k!) = (n choose n-k) ie symmetry ; = nr. 
of such sequences ( regardless of the order of e1's and e0's ), all mutually exclusive, hence : P(k,n) = probability of occurrence of any such a sequence ( any mix , regardless of order, of k e1 with n-k of e0 ) = probability of any (ie regardless of order) such a path in a multitree We saw above that POSSIBLE paths are symmetrically distributed on n-th level: (n choose k) = (n choose n-k) = nr. of paths to a k-th node on n-th level in a multitree = nr. of paths to (n-k)th node on n-th level (n choose k) are the node values in Pascal's triangle . (n choose 0) = (n choose n ) = 1 on both extreme branches of a multitree (n choose 1) = (n choose n-1) = n (n choose 2) = (n choose n-2) = n(n-1)/2 Sum:k=0..n[(n choose k)] = 2^n = nr. of paths to all the nodes on n-th level So far on POSSIBILITIES , now more on PROBABILITIES of certain paths : Only for P = 1/2 = (1-P) it holds P(k,n) = (n choose k)/(2^n). For any P , 0 < P < 1, is Sum:k=0..n[ Pr(k,n) ] = 1 on any n-th level , and Pr(0,n) = 1*1*(1-P)^(n-0) = (1-P)^n ; Pr(n,n) = 1* P^n *1 = P^n Pr(1,n) = n*P*(1-P)^(n-1) ; n = W+L = wins + losses , hence : if P = probability of a win in one bet (or step, stage, period) ; then : Pr( W >= 1 of n ) == Pr(at least 1 win in n bets) == Pr(no loss in n bets) = Pr( L = 0 in n ) == Pr(not(loss & loss &...loss) == 1 - (1-P)^n , eg for : n=3 & P=0.5 is Pr(.) = 1 -Pr(0,3) = 1 - (1-0.5)^3 = 0.875 = 1 -0.125 2nd tree n=3 & P=0.7 is Pr(.) = 1 -Pr(0,3) = 1 - (1-0.7)^3 = 0.973 = 1 -0.027 3rd tree Pr( at least m wins in n bets ) = Sum:k=m..n[Pr(k,n)], k ie horizontal; = 1 -Sum:k=0..m-1[Pr(k,n)], k ie horizontal. Note the pattern Pr(at least 1) = 1 - Prod:j=0..n( 1-Pj ), j ie vertically; if all Pj=P then Pr(at least 1) = 1 - (1-P)^n , which follows from the basic knowledge that probability is isomorph with counting in the set theory which is isomorph with logic , and from DeMorgan's rule of logic : not(a or b) = not(a) & not(b) , hence (a or b) = not( not(a) & not(b) ). By common sense: At least one == not(none) = not(not(a) & ...& not(z) ). Here isomorph means a 1-to-1 correspondence so strong that the formulas for aggregate probabilities obtain mechanically ( ie by simple rules of a straight translation ) from the hopefully better known rules of logic : Logic Probabilities not(.) 1 - Pr(.) (a & b) Pr(a)*Pr(b) = joint probability assuming stochastic independence (a or b) = 1 - (1 - Pr(a))(1 - Pr(b)) = Pr(a) + Pr(b) - Pr(a)*Pr(b) (a or b or ... or z) = 1 - (1 - Pr(a))(1 - Pr(b))...(1 - Pr(z)) = 1 - (1 - Pr)^m if all m Pr(.)'s = Pr Why don't they tell it in books ? Why didn't they teach us that at school ? Idea8 contains 1 -Prod(1 - p[i,j]) , with probabilities of failure p[i,j] , as a total value of a MAXimum spanning tree ( MST ) model MAXimizing success ( eg reliability ) ie minimizing the probability of a total failure . 
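The DeMorgan pattern above, combined with the recurrence for Pr(k,n) shown earlier in this Idea4 , gives Pr( at least m wins in n bets ) in a few lines of Pascal ( a sketch only ) :

program AtLeastM;              { Pr( at least m wins in n bets ) , a sketch }
var P, Pkn, Pr : double;
    n, m, k : integer;
begin
  P := 0.5;  n := 3;  m := 1;          { eg Pr(at least 1 win in 3 bets) }
  Pkn := exp( n*ln(1-P) );             { Pr(0,n) = (1-P)^n , as above }
  if m = 0 then Pr := Pkn else Pr := 0;
  for k := 1 to n do
  begin
    Pkn := Pkn*(n-k+1)*P/((1-P)*k);    { the recurrence from Idea4 above }
    if k >= m then Pr := Pr + Pkn;
  end;
  writeln('Pr(at least ', m, ' wins in ', n, ' bets) = ', Pr:7:5);
  writeln('check via DeMorgan: 1 - (1-P)^n = ', 1 - exp(n*ln(1-P)):7:5);
end.

For m=1 , n=3 , P=0.5 both lines print 0.875 , matching the little check above.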
The similar basic pattern can be seen in the worn out Birthday problem ( how many people have to meet for the chances > 50 % that there will be at least one duplicate birthday ) : Let j be the number of people gathered; then the probability of at least one duplicate is for n (eg = 365 days/year) :
Pr(j,n) = 1 - Prod:i=1..j ((n -(i-1))/n) = 1 - Prod:i=0..j-1( 1 - i/n ) this pattern not shown in books
Programmed as a recurrent formula for, say, up to 25 people :
n := 365; Pr := 1.0; { Pr == Pr(no duplicate) }
for i:=0 to 25-1 do { i+1 = j = nr of people gathered }
begin
  Pr := Pr*(1.0 - i/n); { Pr stays 1 while j < 2 ; i = j-1 so j = i+1 }
  writeln('Pr(at least 1 duplicate when j = ', i+1 ,' people meet) = ',
          (1.0 - Pr):7:5 ); { 1 - Pr(no duplicate) = Probab(at least 1) }
end;
The Birthday problem approximated by Paul Halmos for P=0.5 = the probability of pairing over n possible & equiprobable shared values (eg shared birthday) requires 1.18*sqrt(n) = eg 1.18*sqrt(365 days/year) =. 23 people to gather. The Birthday problem can be freshly posed as another Murphy's law for clustering of random events ( eg 2 failures of a pump , or 2 storms in one period , etc ) { Garza 1987 }.
Q: Why don't I employ Markov chains here ?
A: Although I believe that Markov chains could be used here, I do not see any need for or any advantage from using Markov chains. Do you ? Recently in a library I found a thin book { Ronald A. Howard : Dynamic Programming and Markov Processes , 1969 } which you may like to scan.
Back to Kelly : For P=0.5 , R1=0.8 , R2=0.6 , we saw trees with the NAIVE LOSER's f=1 { like in Paulos 2003 , pp.96-99 , no tree } , while our SMART WINNER's f=0.2 =. fm = 0.208 . For a seemingly reasonable yet EXTREMELY RISKY f = 2.fm = 0.41666 we get
Vn = Vo.((1 + f.R1 )^P )((1 - f.R2 )^(1-P)) = 10k.((1 + 0.41666*0.8)^0.5)((1 - 0.41666*0.6)^0.5 ) = 10k.( 1.333*0.75 )^0.5 = 10000 = Vo
ie f = 2.fm = 0.41666 is here (due to P=0.5) a break-even point when the most likely risky Vn = the riskless Vo . Vn = Vo for f = 2.fm holds only if P=0.5 because only then is V a symmetric function of f . This can also be verified by asking & answering the question :
Q: For which f is Vn = Vo ie Vn/Vo = 1 = (1+Rc)^n = 1+Rc ie ln(1+Rc) = 0 ?
A: From the above Vn/Vo = 1 we can drop P only if P=0.5 , and get :
1 = (1 + f.R1)(1 - f.R2) = 1 + f.R1 - f.R2 - R1.R2.f^2 , hence
f = (R1 - R2)/(R1.R2) = 1/R2 - 1/R1 = 1/0.6 - 1/0.8 = 0.41666 = 2.fm
Further illustration of an opeRational meaning of Kelly fractions obtains from my formula for the number of reinvestments n needed for the likely multiplication of the initial capital Vo into Vn = M.Vo . The n matters, because from Keynes we know that "In the long run we are all dead." From above we have W = n.P , L = n.(1-P) , and :
Vn = Vo.((1+Rc)^n) = Vo.((1 + f.R1)^(n.P))((1 - f.R2)^(n.(1-P)))
ln(M) = ln(Vn/Vo) = n.ln(1+Rc) = n.P.ln(1 + f.R1) + n.(1-P).ln(1 - f.R2) so
n = ln(M)/ln(1+Rc) .
Quiz: With which probability does this n assure Vn = M.Vo ??
Here comes my table showing n = fun( f , M=2 , P=0.5 , R1=0.8 , R2=0.6 ) :
f       has ln(1+Rc)   needs n reinvestments to make Vn = M.Vo = 2.Vo :
0.5     < 0            n = -69 < 0 ; it is impossible to make Vn = 2.Vo
0.4166    0            n = oo , n = infinity for f = 2.fm due to P=0.5 ie symmetry
0.41    0.0006556      1057
0.4     0.0016         434 = n reinvestments
0.3     0.00833        83
0.2     0.0103         67 = n is minimized for fm = 0.208 which is optimal
0.1                    check my n = 92
0.05                   check my n = 158
Feel free to compute n = fun( f, etc ) for P=0.7 in the following trees.
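The table above ( and the quiz for P=0.7 ) can be regenerated with a minimal Pascal sketch ( a sketch only ; it simply evaluates n = ln(M)/ln(1+Rc) over a grid of f ) :

program PeriodsToMultiply;     { n = ln(M)/ln(1+Rc) , as in the table above }
var P, R1, R2, M, f, lnG : double;
    i : integer;
begin
  P := 0.5;  R1 := 0.8;  R2 := 0.6;  M := 2.0;    { set P := 0.7 to redo the quiz }
  for i := 1 to 10 do
  begin
    f := i/20;                                    { f = 0.05, 0.10, ..., 0.50 }
    lnG := P*ln(1 + f*R1) + (1-P)*ln(1 - f*R2);   { = ln(1+Rc) per period }
    if lnG > 0
      then writeln('f = ', f:4:2, '   n =. ', ln(M)/lnG:6:1)
      else writeln('f = ', f:4:2, '   ln(1+Rc) <= 0 : Vn = M.Vo is not reachable');
  end;
end.

For P=0.5 it reproduces eg n =. 92 for f=0.1 and n =. 158 for f=0.05 , and it reports when ln(1+Rc) <= 0 ie when the most likely path never reaches M.Vo .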
Asymmetrical multitrees (due to P <> 0.5 hence f=2.fm does not give Vn = Vo) :
Due to P=0.5 , the above multitrees are symmetrical. I reco to show them also
for P <> 0.5 , ie asymmetrical multitrees ( 1st f <> fm, 2nd f=fm ) :
A similar multitree for P changed to P=0.7 ie 1-P = 0.3 is ASYMMETRICAL wrt
the probabilities Pr's , while all the possible values V stay UNchanged.
For f = 0.2 , which is very suboptimal for P=0.7 , we have :

3rd tree: P=0.7 , f=0.2 ( << fm =. 0.8 ), has :
(1 - f) + f*payoff = 1 + f*(payoff -1) = a factor which changes our CC
(1 -0.2) + 0.2*0.4 = 1 + 0.2*(0.4 - 1) = 1 - 0.2*0.6 = 0.88 = loss factor
(1 -0.2) + 0.2*1.8 = 1 + 0.2*(1.8 - 1) = 1 + 0.2*0.8 = 1.16 = gain factor
and has the final V4 = Vo.G^4 = Vo*1.068^4 = Vo*1.3 = 13000 > Vo .
P <> 0.5 makes a multitree ASYMMETRICAL , hence its V1, V2,.., V4 must be
computed as a weighted geomean (of V's weighted by Pr's) :
Vn = Prod( Vk^Pk ) , eg V1 = (8800^0.3)(11600^0.7) = 10677

n=0  Vo = 10000
     Pr = 1.0
10000*0.88 ........... / \ ..................... 10000*1.16 = 11600
n=1  V  = 8800    11600                 give geomean V1 = 10677 > Vo
     Pr = 0.3  +  0.7  = 1
8800*0.88 .......... / \  / \ ...................11600*1.16 = 13456
n=2  V  = 7744   10208   13456          give V2 = 11401 > Vo
     Pr = 0.09 + 0.42 +  0.49 = 1
7744*0.88 ......... / \  / \  / \ ..............13456*1.16 = 15609
n=3  V  = 6815   8983   11841   15609   give V3 = 12173 > Vo
     Pr = 0.027 +0.189 + 0.441 +0.343 = 1
6815*0.88 ..... / \  / \  / \  / \ ..........15609*1.16 = 18106
n=4  V  = 5997   7905   10420   13736   18106 ,  V4 = 12998 > Vo
     Pr = 0.0081 + 0.076 + 0.265 + 0.412 + 0.240 = 1
Sum[Pr] = <--- 0.084 --->+<------- 0.917 ------> = 1
losses  : <--- V < Vo -->|<------ Vo < V ------> : gains

IF a binomial multitree is ASYMMETRICAL wrt Pr's (due to P <> 0.5 <> 1-P)
THEN it holds :
+ If P > 0.5 then the distribution is skewed towards higher Pr's ,
  else it is skewed towards lower Pr's.
+ The weighted geomean Vn <> the median POSSIBLE V on the level n, eg:
  V4 = ((5997^0.0081)(7905^0.076).....(18106^0.240)) = 12998
    <> ( 5997 * 7905 * 10420 * 13736 * 18106 )^(1/5) = 10420 which is !!
  still the median of the POSSIBLE V's but NOT anymore the median of the PROBABLE V's .

The similar multitree for P = 0.7 ie 1-P = 0.3 again, but now with the
OPTIMIZED f = 0.8 =. fm . The change of f has changed the V's while the Pr's
stay as they were since P has not changed :

4th tree: P=0.7 , f=0.8 =. fm , has :
(1 - f) + f*payoff = 1 + f*(payoff -1) = a factor which changes our CC
(1 -0.8) + 0.8*0.4 = 1 + 0.8*(0.4 - 1) = 1 - 0.8*0.6 = 0.52 = loss factor
(1 -0.8) + 0.8*1.8 = 1 + 0.8*(1.8 - 1) = 1 + 0.8*0.8 = 1.64 = gain factor
and has the final V4 = Vo.G^4 = Vo*1.162^4 = Vo*1.8229 = 18229 >> Vo .
P <> 0.5 makes a multitree ASYMMETRICAL , hence its V1, V2,.., V4 must be
computed as a weighted geomean (of V's weighted by Pr's) :
Vn = Prod( Vk^Pk ) , eg V1 = (5200^0.3)(16400^0.7) = 11620

n=0  Vo = 10000
     Pr = 1.0
10000*0.52 ........... / \ ..................... 10000*1.64 = 16400
n=1  V  = 5200    16400                 give geomean V1 = 11620 > Vo
     Pr = 0.3  +  0.7  = 1
5200*0.52 ........... / \  / \ ...................16400*1.64 = 26896
n=2  V  = 2704   8528   26896           give V2 = 13502 > Vo
     Pr = 0.09 + 0.42 +  0.49 = 1
2704*0.52 ......... / \  / \  / \ ..............26896*1.64 = 44109
n=3  V  = 1406   4435   13986   44109 ,  V3 = 15688 >> Vo
     Pr = 0.027 + 0.189 + 0.441 + 0.343 = 1
1406*0.52 .....
                / \  / \  / \  / \ .........44109*1.64 = 72339
n=4  V  = 731   2306   7273   22937   72339 ,  V4 = 18229 >> Vo
     Pr = 0.008 + 0.076 + 0.265 + 0.412 + 0.240 = 1
Sum[Pr] = <------- 0.349 ------>+<-- 0.652 ---> = 1
losses  : <------ V < Vo ------>|<-- Vo < V --> : gains

Comparison of the 3rd and the 4th multitree , both with P=0.7 :
3rd tree: f=0.2 << fm , has Sum[Pr]=0.917 of V > Vo , V4 = 12998
4th tree: f=0.8 =. fm , has Sum[Pr]=0.652 of V > Vo , V4 = 18229
so c.fm allows for tradeoffs between a high chance of a not too high V > Vo
versus a not too high chance of a high V >> Vo . These tradeoffs quantify the
wisdom of "The higher the returns, the higher the risks".

.- Idea5: On preferring & decision making :

Readers should be told that Rc & Rs are in fact weighted geomeans Gw , so
rarely found in books on Probabs & Stats, not even in the big Encyclopedia of
Statistics. Many a perplexed reader will appreciate reading a few facts on
means in general :
A mean is a value m such that a fun(m,..,m) = fun(X1,..,Xn) ; so infinitely
many means can be devised by using powers like Xj^2 , Xj^3 , etc; eg the so
called root mean squared rms = sqrt(Sum[Xj^2]/n).
Harmonic mean <= geomean <= arithmetic mean aka expected value E[ r.v. ]
<= root mean squared aka rms , with the = only in the case of all Xj being equal:
H = (G^2)/A <= G = sqrt(A.H) = exp(Sum[ln(Xj)]/n) <= A = Sum[Xj]/n <= rms
( the equalities H = (G^2)/A and G = sqrt(A.H) are exact for n = 2 positive numbers )
where Sum[ln(Xj)]/n = ln( (Prod(Xj))^(1/n) ) = ln( Prod(Xj^(1/n)) ).
Similar inequalities hold for weighted means (with a mix of positive weights
which need not be just 1/n ).
Q: When to use which mean-operators ?
A: Arithmean reduces commensurable ie directly exchangeable values (in the
same units , physical , monetary or whatever, eg same boxes) to a single mean.
Unlike the geomean and harmean , arithmean allows for negative values, which
are more or less compensated by positive values in a mix , and a zero in a
mix does not cause arithmean = 0.
Geomean reduces either a set of incommensurable values (in Idea7 below), or an
exponential growth (or decay) due to multiplicative factors ( like eg price
indexes CPI , RPI , or an inflation index ), to a single mean factor.
The weights used above, like our counts W & L , and proportions like P & 1-P ,
express repetitions ie frequencies (absolute, and relative, respectively).
Below in the Idea7 , weights have no such frequentist interpretation; they
just express how much we prefer or value an aspect , attribute , feature ,
characteristic .
Harmean reduces ratios to a single mean value, eg to the mean speed S :
Speeds Sj = distance Dj/time Tj , hence Tj = Dj/Sj , so that the total
Time = Sum[Tj] = Sum[Dj/Sj] = Sum[Dj]/S , so the mean speed S is
S = Sum[Dj]/Sum[Dj/Sj] with distances Dj as weights , so for all Dj = D1
S = n.D1/(D1.Sum[1/Sj]) = n/Sum[1/Sj] , and 1/S = Sum[1/Sj]/n .
Resistors Rj in parallel: 1/R = Sum[1/Rj] = Sum[Ij/U] = I/U because all
parallel currents Ij sum up to I , while the voltage U is common to all
resistors Rj in parallel.
Quiz: Compute the mean resistance , and explain what it means .
Hint: Read the above definition of a mean in general.
Capacitors Cj in series : 1/C = Sum[1/Cj] because Q/C = Sum[Q/Cj] where Q is
an electric charge. Quiz: Compute the mean capacity , and explain what it means .
Variance reduction obtained by minimizing the mean squared error MSE of
independent quantities of the same kind combined into a posterior variance V
yields 1/V = Sum[1/Vj] , so V = 1/Sum[1/Vj] , hence in theory even poor
additional estimates will decrease the total variance V.
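The chain harmean <= geomean <= arithmean <= rms above is easy to check
numerically. A tiny Free Pascal sketch of mine ( the sample X is an arbitrary
assumption , any positive numbers will do ) :

  program MeansCheck;
  const n = 4;
        X : array[1..n] of double = (2.0, 4.0, 8.0, 16.0);  { any positive sample }
  var i : integer;
      A, G, H, rms : double;
  begin
    A := 0; G := 0; H := 0; rms := 0;
    for i := 1 to n do
    begin
      A   := A   + X[i]/n;           { arithmean }
      G   := G   + ln(X[i])/n;       { geomean via the mean of logs , exp'd below }
      H   := H   + (1/X[i])/n;       { harmean via the mean of reciprocals , inverted below }
      rms := rms + sqr(X[i])/n;      { root mean squared , sqrt'd below }
    end;
    G := exp(G);  H := 1/H;  rms := sqrt(rms);
    writeln('H = ', H:8:4, ' <= G = ', G:8:4, ' <= A = ', A:8:4, ' <= rms = ', rms:8:4);
  end.

For X = 2, 4, 8, 16 it prints H = 4.2667 <= G = 5.6569 <= A = 7.5000 <= rms = 9.2195 .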
M/V = Sum[ Mj/Vj ] for means , measurements , monies M , where one Vj and one
Mj may be the prior values. An analogon is Millman's theorem with Uj/Rj = Ij's
for electrical circuits. For unbiased Gaussians 1/Vj = 1/MSEj is a measure of
precision aka Fisher information , which is essentially a local measure of
change of a pdf ; it is additive ( like Shannon's information , which is a
global measure invariant wrt local changes like swaps or sorting ).
Afaik , it holds MAXimum(Gw) =. MEDian(Gw) . MEDian in probabilistic terms
means 1/2 = P(X <= MED) or P(MED <= X) = 1/2 , hence for a finely discrete X
it holds P(X <= MED) =. P(MED <= X) . MEDian is also the measure of the
central tendency which minimizes the sum of ABSolute deviations from it.
Hence Rc MAXimized wrt the fraction f MAXimizes the MEDian wealth , growth
rate, end value ( moreover in the shortest time, almost always , albeit in a
long run, when we will already be dead anyway , according to Keynes ).
The MAXed Vn is also the most probable outcome .
A r.v. X has geomean Gw(X) = exp(E[ln(X)]) <= E[X] == A (Jensen's inequality).
Approximate relations between a geomean G, an arithmean A, and the variance of
X Var = Sd^2 are (the best 1st) :
G =. A - Var/(2A) = A( 1 - 0.5(Sd/A)^2 ) =.. sqrt( A^2 - Var ) , hence
G^2 =.. A^2 - Sd^2 is what I call a Pythagorean approximation .
H =. A(1 - (Sd/A)^2 ) estimates a harmean .
My little discovery: for the means of just 2 positive numbers their
A - G = ( Hellinger distance aka Matusita distance )/2 .
That the arithmean is a misleading price index (eg of stocks ) follows from
these approximations, which allow the arithmean A to stay almost unchanged
( or even to grow ) while the geomean G decreases and the variance of X
Var = Sd^2 (ie volatility ie instability) increases in predepression periods
preceding a turning point.

.- Idea6: On Bellman's optimality principle :

In the 2nd edition of NDM add a section explaining Richard Bellman's
OPTIMALITY principle of dynamic programming ( DP ) which is a cute paradigm
for many tasks in both deterministic and stochastic decision making.
Sequential multistage optimizations like Kelly , shortest path , MST , and
many more tasks fit the DP paradigm : An optimal partial solution should be
extended by the best element available so as to stay optimal. DP solutions
are GREEDY aka MYOPIC in the positive sense of not needing more foresight
than 1 period ( ie 1 step or 1 stage ) ahead.
Kelly strategy needs foresight (in terms of P, R1 , R2 ) only 1 period ahead
ie it is MYOPIC aka GREEDY , hence it fits Bellman's principle of optimality
in his paradigm of dynamic programming math . DP was applied to Kelly already
in the very 1st paper on Kelly's paper , { Richard Bellman & R. Kalaba ( both
at RAND ): On the role of dynamic programming in statistical communication
theory, IRE Transactions on Information Theory, Sept. 1957, 197-203 }.
No wonder that in 1960 the dynamic programmer R. Kalaba also rediscovered the
5th MST algorithm ( pp.53 & 54 fig.6 in the superpaper by { Graham & Hell ,
1985 } referred to below ). The 2nd earliest paper on Kelly was { Michael
Marcus ( at RAND ) : The utility of a communication channel and applications
to suboptimal information handling procedures, IRE Trans. on Information
Theory, 1958, 147-151 }. Tell about these semantic links to your info-hungry
readers.
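Idea6's point that only 1-period-ahead foresight ( P , R1 , R2 ) is needed can
be made concrete : a brute-force 1-dimensional search over f which MAXimizes
the 1-period expected log growth already lands on fm . A minimal Free Pascal
sketch of mine ( a grid search instead of the closed-form fm ; the step 0.001
and the restriction of f to 0..1 , as in the trees above , are my choices ) :

  program GreedyKellySearch;
  var P, R1, R2, f, g, bestF, bestG : double;
  begin
    P := 0.7; R1 := 0.8; R2 := 0.6;       { the data of the 3rd & 4th trees above }
    bestF := 0.0; bestG := 0.0;           { f = 0 gives g = ln(1) = 0 ie no growth }
    f := 0.0;
    while f <= 1.0 do
    begin
      g := P*ln(1 + f*R1) + (1 - P)*ln(1 - f*R2);   { 1-period expected log growth ln(1+Rc) }
      if g > bestG then begin bestG := g; bestF := f end;
      f := f + 0.001;
    end;
    writeln('fm =. ', bestF:5:3, '   max ln(1+Rc) =. ', bestG:7:5, '   G =. ', exp(bestG):7:5);
  end.

It prints fm =. 0.792 and G =. 1.162 , the per-period growth factor of the 4th
tree above , found without any foresight deeper than 1 period.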
.- Idea7: How a consumer should choose :

Let's consider a consumer who has to choose the best (eg a PC, house, or
spouse ) from several alternatives, each having more than 1 param (or property
or feature, all quantifiable in incommensurable units, and for all of them the
Pigs' Principle holds : The more of the same the tastier :-) , say, RAM size
in MB and CPU speed in MHz (or IQ & BTW & $$ ; btw, BTW in NLese is VAT in
UKese, but here I mean the breast-to-waist ratio; who wants to maximize VAT
anyway ? :-).
The sum MB+MHz makes no sense, but the product MB*MHz does, and so does a
geomean G = [ (MB * MHz) ]^(1/2). If we value MB twice as much as MHz, then we
should use the weighted geomean :
Gw = [ (MB^2)(MHz^1) ]^(1/(2+1) ) for PC's ;
Gw = [ (IQ^3)(BTW^2)($$^4) ]^(1/(3+2+4)) for spouses .
Q3: Is it necessary to use [...]^(1/SumOfWeights) ??
A3: Methinks not, since the largest weighted product Pw (or Gw ) wins, since
Pw1 > Pw2 makes sense, hence Pw1/Pw2 makes sense too, because in a product of
ratios (like price indexes CPI or RPI ) the incommensurable units or scales
annul (while comparing Sums of values in different units or scales would be
nonsense). Now comes my question :
Q4: In your chap.8 pp.223-228 you discuss (x -x0)(y -y0) which I believe is
traditionally called the NASH PRODUCT ( not in your Index ), where (x0,y0) is
what you call the STATUS QUO point ( aka disagreement point; I would call it
a NO DEAL point ).
When choosing the best, we are making tradeoffs, but there are some
UNacceptable bottom values for each param ie it MUST BE MORE than x0 of IQ ,
more than y0 of BTW , and more than z0 of $$ , ie :
x > x0 > 0, y > y0 > 0, z > z0 > 0, so we could use the formula
F1a = (x -x0)(y -y0)(z -z0) for comparative evaluations .
Ex1a: I can get for free only 1 old PC , and use it as a backup. I am free to
choose from the following ancient crunchers :
PC1 has 32MB & 200MHz; PC2 has 48MB & 150MHz (all reasonable values).
The UNacceptably low values are 16MB & 100MHz , or less.
If I doN't know what is more important (both weights = 1), I compute :
F1a(PC1) = (32 -16)(200 -100) = 16*100 = 1600
F1a(PC2) = (48 -16)(150 -100) = 32* 50 = 1600
So, ceteris paribus , neither of the two PC's is preferred. The fact that both
PC's yield exactly 1600 is just a(n un)lucky coincidence.
Somebody suggested to me to use a product of ratios like (x -x0)/(x9 -x0)
(where the denominator is the range of values) :
F1s = (x -x0)(y -y0)(z -z0)/[ (x9 -x0)(y9 -y0)(z9 -z0) ]
Such a scaling cannot harm, but it is superfluous for MECHANICAL comparative
evaluations and decision making, because in a ratio of such F1s' the
denominators will annul, and the relation < or = or > will hold anyway.
See { Fred Roberts ( at RAND ): Measurement Theory with Applications to
Decisionmaking, Utility Theory, and the Social Sciences, 1979, p.82 } , and in
a game-theoretical setting { Mesterton-Gibbons , 1992, p.102 } .
However, HUMAN decision making is likely to appreciate the scalings .
Although I am no game theorist, I know that the NASH PRODUCT in its more
general form is a product like
F1aw = ((x -x0)^w1)((y -y0)^w2)((z -z0)^w3)
where the wj's are BARGAINING POWERS. For comparative evaluations the wj's can
play the role of the weights. For more on weighting see Q5d below.
So far on x0,y0,z0 as the UNacceptably LOW values of our params. Quite often
we can also define UNachievably HIGH values of our params, say, x9,y9,z9.
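Before using those x9's , here is a tiny Free Pascal sketch of mine which
mechanizes F1a ( via its weighted Nash-product form F1aw ) on the Ex1a data ;
the weights w1 = w2 = 1 reduce it to the plain product, and the function name
F1aw and the constants are my own illustrative choices :

  program CompareByF1a;
  const w1 = 1.0;  w2 = 1.0;                 { both weights = 1 , as in Ex1a }

  function F1aw( x, x0, y, y0 : double ) : double;
  begin                                      { ((x-x0)^w1).((y-y0)^w2) , requires x > x0 , y > y0 }
    F1aw := exp( w1*ln(x - x0) + w2*ln(y - y0) );
  end;

  begin
    writeln('F1a(PC1) = ', F1aw(32, 16, 200, 100):8:1);   { 16*100 = 1600 }
    writeln('F1a(PC2) = ', F1aw(48, 16, 150, 100):8:1);   { 32* 50 = 1600 }
  end.

The same skeleton takes the x9-based formulas F1b or F2b ( defined next ) by
swapping the expression inside F1aw .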
Then an evaluation formula could be a product of ratios like (x -x0)/(x9 -x)
with x9 > x > x0 > 0, y9 > y > y0 > 0, z9 > z > z0 > 0 :
F1b = (x -x0)(y -y0)(z -z0)/[ (x9 -x)(y9 -y)(z9 -z) ]
which can also be re-interpreted as a ratio of Pros/Cons to be MAXimized.
In fact Pros & Cons need not be based on the same variables. Pros can be a
product of M terms, and Cons a product of N other terms, M <> N.
Ex1b: Again PC1 has 32MB & 200MHz ; PC2 has 48MB & 150MHz . Again the
UNacceptably low are : x0 = 16MB & 100MHz = y0, now with the UNreachably good
large x9 = 120MB & 300MHz = y9 .
F1b(PC1) = [(32 -16)/(120 -32)].[(200 -100)/(300 -200)] = 0.182
F1b(PC2) = [(48 -16)/(120 -48)].[(150 -100)/(300 -150)] = 0.148
so PC1 is preferred, ceteris paribus .
Another possibility are ratios like (1 -x0/x)/(1 -x/x9) , hence
F2b = (1 -x0/x)(1 -y0/y)(1 -z0/z)/[ (1 -x/x9)(1 -y/y9)(1 -z/z9) ]
Ex2b: same params as in Ex1b :
F2b(PC1) = [(1 -16/32)/(1 -32/120)].[(1 -100/200)/(1 -200/300)] = 1.023
F2b(PC2) = [(1 -16/48)/(1 -48/120)].[(1 -100/150)/(1 -150/300)] = 0.741
so PC1 is preferred again.
What about an asymmetrically (wrt x0 vs x9) structured
F2c = (x -x0)(y -y0)(z -z0)/[ (1 -x/x9)(1 -y/y9)(1 -z/z9) ] ??
Q5: My question proper now is :
Q5a: What is your learned opinion on the stuff of this Idea7 in general ??
Q5b: Do you prefer F1b over F2b over F2c for other reasons than for its
symmetry and its simplicity (fewer typos when using a calc) ?? IF yes THEN for
which reasons, if any ?? ELSE why NOT ??
Q5c: IF we have a mix of params, some of the kind The LARGER the better, and
some The SMALLER the better , eg we want a spouse with a LOWER body-mass index
(BMI) x, and HIGH y in $$ and HIGH z in IQ , THEN we can swap the x-term in
the numerator with the x-term in the denominator :
F2cMm = (x9 -x)(y -y0)(z -z0)/[ (x -x0) (y9 -y)(z9 -z) ]
Do you know a better way ??
Q5d: (on weighted products) A weight should be applied to the whole subterm, ie
F2w = [((x-x0)/(x9-x))^w1].[((y-y0)/(y9-y))^w2].[((z-z0)/(z9-z))^w3]
Since eg 5^2 > 5 , but eg 0.3^2 = 0.09 < 0.3 , and 5^(1/2) < 5 , but
0.3^(1/2) = 0.55 > 0.3 ,
- I wonder whether a mathematician like you would restrict the magnitudes of
  the terms to be weighted so as to get (term)^w > 1 ?? Since 1^w = 1 , I do
  not like 1 <= . What about you ?? Restrictions could be imposed by rescaling
  by some large enough factors.
- 0 < wj should be used, eg [(x -x0)^2].(y -y0).[(z -z0)^0.5]
- Weights may be normalized to sum up to 1, but I wonder whether that has any
  (DIS)advantage ?? Any suggestions on weighted products ??
Q5e: Which of my F's do you (NOT) like and why (NOT) ??
Q5f: Do you know better formulas ?? Any refs ??
Q5g: The values of the UNachievably HIGH x9,y9,z9 should be chosen so that
x,y,z cannot get too close to x9,y9,z9 . Can you recommend a simple rule (of
thumb) for choosing x9,y9,z9 in practice ??

.- Idea8: On MST (probabilistic maximum spanning trees too) :

I have applied MST as a MAXimum ( not minimum ) spanning tree ( ST ) to
estimations of probabilities of vectors P(y,x1,..,xn) , where y is a
class-event to be guessed, and x1,..,xn are predictor-events. P is used for
automated decision making aka supervised learning. Since the counts ie
absolute frequencies of the complete vectors are always too low, the problem
arises : which subvectors should be used to get the best APPROXimation of
those P(y,x1,..,xn)'s.
I believe that there is still enough room for your creativity which could
improve this Idea8 of mine (implemented and tested by me ages ago), hence this
humble suggestion : An MST can be used for estimations of joint probabilities
P(y,x1,..,xn) used for Bayesian decision making ( the denominator in the
P(y|x1,..,xn) is irrelevant for choosing y because P(x1,..,xn) is the same for
all y's ). My point here is that in practice the counts over the complete
vector (y,x1,..,xn) are always too low, so we have to use counts over
subvectors (ie truncated vectors), and the task is to determine the best
truncations to use in the chain product :
P(y,x1,..,xn) = P(y)P(x1|y)P(x2|x1,y)P(x3|x2,x1,y)..P(xn|xn_1,..,x1,y)
             =. P(x1,y).Product_k:2,..,n[ P(xk|xj,y) ] , j<>k ,
which will best approximate P(y,x1,..,xn). My solution: the resulting approx
should yield the MAXimum P obtainable from the MST-based chain product.
Hope you like it. Although I have hands-on-&-in programming & testing
experience with it, there is enough room for improvements :
Q6: Which pairs (k,j) should be chosen, based on which criteria ??
I choose the probabilistically most dependent pairs (k,j) first (from the pool
of the free pairs ie from the remaining pairs ). How exactly would you choose
the BEST-1st candidates P(xk|xj,y) ?? In which order ie by which priority
formula for dependence ?? Afaik , my solution is invariant wrt the choice of
the root x1 .
MST can be used to implement the Union-Find algorithmics :
+ To construct a spanning forest ( SF ) of an undirected graph ( UG ) ;
+ to decompose a UG (of relations) into its connected components ie into
  several disconnected subgraphs ;
+ to answer questions about 2-way connectivity (not about 1-way reachability
  as Warshall's algo does). Upon request I will supply the list of 7 lucky
  questions for the lucky dragons.
+ to partition a set of equivalence relations into equivalence classes, and
  to answer similar questions about these classes and their elements.
  Equivalence is reflexive, symmetric and transitive. Equivalence can mean a
  kind of sameness like eg connectedness.
{ R.C. Prim , Shortest connection networks , Bell Systems Technical Journal,
36, 1957, 1389-1401 } , where he referred to Kruskal 1956, and they both
referred to Otakar Boruvka 1926 , says on pp.1397-8 about MST :
MiniST minimizes all INCreasing symmetric funs, and MAXimizes all decreasing
symmetric funs of the arc values ( lengths , weights , costs ).
MaxiST MAXimizes all INCreasing symmetric funs, and minimizes all decreasing
symmetric funs of the arc values.
Afaik , a symmetric function of variables x1,..,xn is a function which is
invariant wrt any permutation of those variables.
Since MST is GREEDY ie MYOPIC , it fits Bellman's dynamic programming ( DP )
paradigm. Moreover, the Belgian / French professors A. Kaufmann & Cruon
{ Dynamic Programming, 1967, on pp.265-68 in the sect.40 on using NonAdditive
values in DP } say on p.267 (low) that in DP models we may replace addition by
"a law of internal composition over a subset V of real numbers, provided this
law is ASSOCIATIVE and compatible with the relation of ORDER of these numbers.
The only concrete problems known to us in which the value of a policy is
defined by an operation other than addition are those where the value of a
policy is the product of the present values."
Well, a log(Prod(.)) = Sum of logs.
Q7: Which properties must that ORDER relation have ??
{ Fred Roberts : Measurement Theory , 1979, p.15 } has a table of 7 order
relations : quasi, weak, simple, strict simple, strict weak, partial, strict
partial order ; you choose, we learn :-) Eg a partial order relation is
reflexive & transitive & antisymmetric. Of course, an order cannot be
symmetric ( while eg equivalence must be, find above ).
Q8: How does Kaufmann & Cruon's condition fit Prim's symmetric funs ??
Q9: Does it also hold for non-tree-like non-circular shortest paths ??
This knowledge helps to solve tasks less common than the shortest ST of roads,
pipes, or wires. A nice task is to get the most reliable or safe ST of paths
when probabilities of failure p[i,j] are given for all pairs of nodes i,j .
Assuming independence of failures, the JOINT probability of success
(= nonfailure) is Prod(1 - p[i,j]) , so the JOINT probability of failure
(= nonsuccess) is 1 - Prod(1 - p[i,j]) , which is to be minimized, and which
is an INCreasing fun of the p[i,j]'s. This task is solved by a MiniST of
as-if additive lengths p[i,j] . But we are free to compute a MaxiST of as-if
additive lengths -p[i,j] . See { Prim, 1957, p.1398 }. Here the additivity of
an MST works because the same (.) which extremizes a product Prod(.) also
extremizes its monotonic log(Prod(.)) = Sum of logs.
MST should be named as such in your section on shortest paths; my point is
that when I see something on shortest paths, then I peek into the index to
check for the presence/absence of MST by Kruskal 1956 ( who refs only to
Otakar Boruvka ), Prim 1957 ( who refs to Kruskal and to Boruvka ), and by
{ Dijkstra 1959 , who refs to Kruskal }.
O. Boruvka was a Czech student of math in Paris, who published his MST
algorithm in 1926 when the costs of a power-line net had to be minimized
during the electrification of rural Moravia (the birthplace of Gregor Mendel ,
Kurt Goedel , Sigmund Freud, etc). A different algo of 1930 by Jarnik , also
in Czechoslovakia , is similar to Prim's and Dijkstra's. Otakar Boruvka died
about 1992. The history of MST was well researched by the then chief
mathematician of Bell Labs Ronald Graham & Pavol Hell in Canada, in their
superpaper { On the history of the minimum spanning tree problem, Annals of
the History of Computing, 7/1, January 1985, pp.43-57; fig.6 on p.54 nicely
visualizes historical motivations & relationships among 5 MST algos } where
they tell that O. Boruvka's algo of 1926 was rediscovered in 1960 by G.
Sollin , who published in French, but it is also described in English in yet
another book by the Belgian professor { A. Kaufmann : Graphs , Dynamic
Programming , and Finite Games , 1967, pp.32-5 }. Boruvka's ( hence also
Sollin's ) MST algo is similar to Kruskal's , but it is based on extending
subgraphs rather than nodes as Kruskal does. { Graham & Hell , pp.53-4, fig.6 }
write that the MST algo was also rediscovered in 1960 by an early-hour dynamic
programmer , R. Kalaba at RAND , who already in 1957 coauthored the very 1st
paper on Kelly's paper (find him above & below). Clearly MST fits Bellman's
dynamic programming ( DP ) paradigm already mentioned above.

.- Idea9: Wisdoms relevant to our themes :

General wisdoms :
Math is about insights and proofs. Calculation and computing are about numbers
and results. Here you get insights, derivations, results.
There is no sight without insight .
Man sieht nur, was man weiss. ( One sees only what one knows. )
A picture may be worth 1k words , a formula may be worth infinitely many pictures .
{ JH } It is a simple task to make things complex , but it is a complex task
to make things simple but not simplistic .

Specific wisdoms :
Harry Markowitz ( 1990 Nobel prize for economics ) has opeRATIONALized the old
folks' wisdom of "Don't keep all your eggs in one basket" by quantifying the
arithmean-variance trade-offs when diversifying investments for a single
investment period. The Kelly-inspired geomean formula generalized for
diversified stocks (not just horses) says how much to invest, if at all, into
which asset over the next investment periods, so as to optimize an exponential
growth over the long term, if possible. My multitrees visualize its possible &
probable returns & risks (see Idea4 above).
{ JH } The wisdom of "The higher the returns, the higher the risks" is
quantified (at the end of Idea4 , just above Idea5 ) by the comparison :
3rd tree: f=0.2 << fm , has Sum[Pr]=0.917 of V > Vo , V4 = 12998
4th tree: f=0.8 =. fm , has Sum[Pr]=0.652 of V > Vo , V4 = 18229
Uncertainty is different from risk , when odds are fixed , known , or
calculable . { after Frank Knight 1921 }
You are unlikely to get an edge out of what you see in the news.
{ Edward Thorp quoted on p.142 in Poundstone 2005 }
Most people don't understand that these strategies are statistical and that
winnings of a few percent are possible if you are prepared to play for a long
time, during which you have a good chance of losing all your money .
{ Jeremy Bernstein : Physicists on Wall Street , p.165 }
Dependence on reversion to the mean tends to be perilous when the mean itself
is in flux . { Peter Bernstein : Against the Gods , p.182 }
I can calculate the motions of heavenly bodies, but not the madness of people.
{ Isaac Newton , after losing 20k pounds in the stock market }
In the long run, we're all dead . { John Maynard Keynes , 1929 }

As you see, all the ideas for the enhanced 2nd edition fit nicely together,
and also into your nice NDM . Make the 2nd edition wonderful. The ideal volume
of the Ideal Decision Making would be 444 pages.

.- Errata: Typos old & new :

Typos not yet in Korner's corrections on www.dpmms.cam.ac.uk/~twk :
wrong <- correct :
p.75 that this <- than this
p.75 "quadruple initial fortune and if throw 81 heads and 93" swap(ped)
p.75 by 8 <- by 7.43
p.79 line 9: we have supposed that ... > 1 <- < 1 ( clashes with p.78 )
p.79 there is (i) and then (i) again
p.79 the 2nd (ii) <- (iii)
p.86 de Witt <- De Witt ( in NLese, unless the 1st name precedes, in which
     case we write eg Jan de Witt )
p.252 sleep for half per hour <- an hour
p.283-284 the last eq: numerator and denominator are (to be) swapped ; check
     it against the similar formulas on p.299-300
p.285 & elsewhere: replace l, which looks like 1, by L; this should be easy
     since there aren't too many small l's around
p.333 Let us write q=1-p ... we will reject <- accept
p.362 Huygen's <- Huygens's
p.362 & p.363 a a <- a : try hard not to use a small a to avoid eg a a at
     (ii); this may be hard, since an a is a heavily overloaded symbol

Typos in your errata :
p.6 equality signs should NOT be reversed (see your corrections for p.4+5 )
p.65 ..... footnote 23 <- 21
p.75 ... 2.6.7 line 1. <- (ii)
p.77: Exercise 2.6.10: Are you sure that your fix is ok ??
p.290 Expression <-

Improve your Index :
Add page nrs : to Kelly : XII, 308 ; to Thorp : 331
Missing in the index (while explicitly present in the text) :
- QuickSort 175 ; payout ratio 71-2 ; sorting 175 ; miniMax 204
- status quo point 226-8, 232-3
- disagreement point == status quo point ( I call it: no deal point )
- prisoner's dilemma 213, 232-3, 271-2, 276
- RSA 138 ; Penney 267 ; Chicken 214 ; Hoeffding's ineq. 329-30, 332
Missing in the index (while implicitly present in the text) :
- geometric mean == geomean
- Nash product (x -x0)(y -y0)
- Minimum spanning tree == MST in your sect. on shortest paths

.- References:
{ only books & periodicals have all keywords in their titles starting with
Caps ; not so in the titles of papers. Unlike in the text, our format of the
names is Familyname Firstname; quiz: for which 2 reasons ? }

Bellman Richard, Kalaba R:
+ On the role of dynamic programming in statistical information theory,
  IRE Trans. on Information Theory, 1957/9, 197-203
+ Dynamic programming and statistical communication theory, PNAS, 43, 1957, 749-51
Bernstein Peter L: Against the Gods - The Remarkable Story of Risk , 1996,
  has a separate Name index
Bernstein William J. & Wilkinson David: Diversification , rebalancing , and
  the geometric mean frontier , 1997-11-24 , on www
Dijkstra E.W: A note on two problems in connexion with graphs, Numerische
  Mathematik, 1, 1959, 269-71
Garza Gene: Murphy's law and probability or how to compute your misfortune,
  Mathematics Magazine, 60/3, June 1987, 159-65
Graham Ronald & Hell Pavol: On the history of the minimum spanning tree
  problem, Annals of the History of Computing, 7/1, January 1985, 43-57;
  fig.6 on p.54 nicely visualizes historical motivations & relationships
  among 5 MST algos
Haigh John: Taking Chances, 2nd ed. 2003 ; for Kelly on horses only see
  chap.11 pp.235-8 & Appendix V ; on www are his errata as "Corrections" for
  both editions separately
Howard Ronald A: Dynamic Programming and Markov Processes , 1969
Kahneman Daniel et al: Would you be happier if you were richer? A focussing
  illusion ; Science, vol.312, 2006-6-30, 1908-10
Kaufmann A: Graphs, Dynamic Programming, and Finite Games , 1967, see p.32
Kaufmann A & Cruon : Dynamic Programming, 1967; on pp.265-68 is the sect.40
  on using NonAdditive values in DP
Kelly John L: A new interpretation of information rate , Bell Systems
  Technical Journal, July 1956, 917-26
Korner Tom W: Naive Decision Making, 2008; find his errata on www and more of
  his typos here above.
MacLean Leonard C, Thorp Edward O, Ziemba William T. (editors) : The Kelly
  Capital Growth Investment Criterion : Theory & Practice , 2011, 850pp.
  This book edited by 3 eminent prof(i)s bundles all key papers on the Kelly
  criterion ; 3 good papers are already at :
  www.edwardothorp.com / Publications / New in 2010 :
  1st paper : Good and bad properties of the Kelly criterion, 2010-1-1
  2nd paper : KellySimulationsNew.pdf is on simulations , 2010-1-18
  3rd paper : Understanding_Kelly_New.doc by Thorp (more by Thorp below)
Marcus Michael: The utility of a communication channel and applications to
  suboptimal information handling procedures, IRE Trans.
  on Information Theory, 1958, 147-151
Mesterton-Gibbons M: An Introduction to Game-Theoretical Modelling, 1992,
  p.91, where his fig.3.1 graphically visualizes quantitative relationships of
  Nash equilibrium , Pareto-optimal points, etc ; see also p.102
Mirowski Philip: Machine Dreams - Economics Becomes a Cyborg Science, 2002;
  pp.372-6, 382, 389 are on Kelly strategy
Murphy Roy E: Adaptive Processes in Economic Systems, 1965; on pp.123-5 he has
  nicely visualized (Fig.7.4 on p.123) his derivation of his inequalities
  (7.5.12) & (7.5.19)
Paulos John Allen: A Mathematician Plays the Stock Market, 2003; on pp.96-99
  he well explains the need to use a geomean , but is unaware of the
  possibility to turn a sure LOSER into a WINNER via a Kelly fraction
Poundstone William: Fortune's Formula, 2005; too little math, good notes & refs
Prim R.C: Shortest connection networks , Bell Systems Technical Journal, 36,
  1957, 1389-1401
Roberts Fred: Measurement Theory with Applications to Decisionmaking, Utility
  Theory, and the Social Sciences, 1979, see pp.15 & 82
Thorp Edward O: The Kelly criterion in blackjack, sports betting, and the
  stock market , 1997, 2000, 2006 = the last version of 45 pp at :
  www.edwardothorp.com (more under MacLean above)
Vince Ralph: Optimal f and the Kelly criterion, IFTA Journal, 2010 ; his
  in-line (a)-(j) and 1b[r=0] point to the Notes at the end of his paper .
  Since 1990 he has written 5 books on his "optimal f". More upon request.

I hope that you will bless us with your own deep(er) insights & explanations.
Looking forward to your thoughtful feedback, please don't hesitate to correct
me , especially if I am not wrong ! :-)

Jan Hajek in NL (we are just simple folks here)
.-