.- Ideal Decision Making & Money Making (much on Kelly criterion/formula/rule) An open letter to Professor T.W. Korner, Cambridge, UK Jan Hajek , NL , CopyRight (C) 2008 - 2012 , v.5.9 , 2012-2-23 Find: invest , stock , horse , fortune , gain , return , risk , loss , loser , win , winner , Kelly strategy , Kelly criterion , Kelly formula , Kelly rule , Kelly fraction =/= fractional Kelly , payoff == payout , odds , optim , likely , probab , visual , path , combinat , multitree == acyclic graph == loopless graph , geomean == weighted geometric mean , estimat ; rebalanc , Markowitz portfolio ; game , Nash , status quo ; comparative evaluation ; Bayes ; Markov ; DP == dynamic programming , greedy , myopic ; vector ; algo == algorithm ; prog == program ; MST == Maximum / minimum spanning tree Additional abbreviations: Afaik = As far as I know ; CC == current capital , arithmean == arithmetic mean == expected value == E[ . ] , harmean == harmonic mean , CLT == central limit theorem , NDM == Naive Decision Making , pdf == probability density function Names: Arrow Bellman Bernstein Boruvka Dijkstra Feynman Graham Haigh Halmos Jarnik Kalaba Kaufmann Kelly Korner Kruskal Mesterton-Gibbons MacLean Markowitz Marschak Mirowski Murphy Paulos Poundstone Prim Roberts Sollin Thorp Vince Wilkinson Ziemba ; RAND == RAND Corp. ( the famous think-tank in Santa Monica, CA, where I have lectured in 1977, no typo, not in 1877 :-) .- Contents : Symbols: Idea1: A minibestseller = Ideal title + more visuals - no-op math Idea2: Money management for an ideal investment Idea3: Murphy's conditions Idea4: Visualization and explanation of Kelly strategy as a tree Idea5: On preferring & decision making Idea6: On Bellman's optimality principle Idea7: How a consumer should choose Idea8: On MST (probabilistic maximum spanning trees too) Idea9: Wisdoms relevant to our themes Errata: Typos old & new References: This letter suggests a coherent cluster of enhancements ( briefly Ideas ) for the 2nd edition of Professor Korner's nice book Naive Decision Making ( NDM ) of 2008. This is an open letter so that it will hopefully trigger others to suggest enhancements for the 2nd edition. However, this letter contains some fresh insights & innovations which should NOT be lifted without a proper and full reference to the author of this letter (a draft of possible epapers). Dear Professor Korner == Koerner , It was a great pleasure to scan in the library your last book of 2008 on Naive Decision Making ( NDM ) for your insights & explanations. Here I suggest Idea1 ... Idea9 for improvements & extensions of the hopefully forthcoming 2nd edition with additional sections ( simply added at the end of the 1st edition, hence without any laborious renumberings; such add-ons can just refer back to the existing sections ). Moreover I have some questions Q1: !!Q2: Q5a -g ... Q9 numbered for easy reference & reply. At the end of this letter you will find a bunch of typos ferreted out of NDM . .- Symbols: == equivalence or synonymity ; ~ negation ie complement, eg ~i = equal ; <> unequal ; =. approximately equal ; := assignment op; (.) an argument of eg the natural logarithm ln(.) or a probability Pr(.) |y| == abs( y ) == absolute value ^ == power ; (.)^(1/2) == (.)^0.5 == sqrt( . ) == square root * . )( ].[ == plain multiplications, except in numbers as eg 2.718 Prod( eg Rj ) == Product_j( . ) == a chain multiplication Sum[ eg Xj ] == Sum_j[ . ] == a chain addition e = 2.718281828 == the base of ln( . ) E[ r.v. 
] == an expected value == arithmetic mean of a random variable E[R] == expected return
{ and } embrace a comment in a prog or algo , or an explicit and often full in-line reference for easy in-context finding without backjumping.
Indexing is kept so simple as to minimize syntactic sugar : Investment periods are indexed j , 0..j..n in Vj , Pj , Rj ; j_1 == [j-1] , n_1 == [n-1] ; j is the preferred index for other (non-investment, non-Kelly) tasks or apps ; possible tree-outcomes: indexed k , 0..k..n in Vk , Pk , Rk within a period ; simultaneous assets are indexed i: fmi, R1i , R2i , Pi . fm , R0 , R1 , R2 , Vn are so named as to be uniquely findable ; Rc , Vo == an original value , are almost uniquely findable. Search for f. f= f = f's will find almost all f's . There are P( , Pr( and Sum[Pr] . @ digit (if any) are marking fixes & new parts (later have higher digits).
.- Idea1: A minibestseller = Ideal title + more visuals - no-op math :
- In the title of the 2nd edition replace the 1+4-letter word Naive with some other word ( Ideal is my ideal replacement ), because a hesitant potential buyer (even a librarian) has not yet read your justification of the word Naive which creates negative associations with trivial or dumb . Dijkstra aka EWD used to say: 1 deadlock a day, keeps the user away. I say: 1 Naive a day, keeps the reader away.
- NDM lacks visuals. The 2nd edition should contain more pics for the post-mod vidiotized masses of Buridan's asses . NDM contains only 3 graphics (on pp.151, 267, 336) which are not enough for a book of 370 pages in our foxy 21st century. Reco to throw in the pics of :
+ My multitrees, at least 2 for P=0.5 hence symmetrical, and 2 for P > 0.5 ie asymmetrical, always one for an optimal Kelly fraction f=fm , and one for f <> fm. See Idea4 for fresh, meaningful ( filled with meanings and highly explanatory ) multitrees , so easy to type.
+ Reward sets like in { M. Mesterton-Gibbons : An Introduction to Game-Theoretical Modelling, 1992, p.91 } , where his fig.3.1 visualizes quantitative relationships of Nash equilibrium vs Pareto-optimal points, etc. His reward sets are not so easy to draw as my easy to calc & type multitrees, but M-G is so right on p.90 where he says that "a picture of a reward set is worth several thousand words."
.- Idea2: Money management for an ideal investment :
I am glad that you spend more than a couple of pages on { John L. Kelly : A new interpretation of information rate , Bell System Technical Journal, July 1956, pp.917-26 } . Alas , like { Kelly 1956 }, { Haigh 2003 }, { Poundstone 2005 } , and many others , NDM shows the Kelly strategy only for a horse race when a loss always means the whole fraction f lost , ie you consider only the loss R2=1 ( in fact |R2| as we build the - sign visibly into our formula ) in my more generally applicable closed form formula for compound return Rc & optimal fm with a loss 0 < R2 <= 1. Let W+L = n be some anticipated or actual counts ( ie absolute frequencies ) of W wins and L losses ( ie W ups and L downs over n investment periods ). Our gain R1 > 0 here is your u-1 ( my 1+R1 = u == payout ratio in NDM p71-2 == payout multiplier on pp.2,7,15,17-8 == payoff factor ).
Then starting with an initial value Vo , after n reinvestments of a fixed fraction f ( hopefully a true fraction ie 0 < f < 1 , not an unrestricted factor, since f <= 0, the smaller the worse, signals high probability of loss and for more reinvestments an almost sure loss , while f = 1 suggests risking the whole farm , and f > 1 suggests leverage ie risking borrowed assets which is not wise and certainly not prudent ) of our current fortune Vj , our final compound value would be : Vn = Vo.gains.losses = Vo.(gain^W)(loss^L) == Vo.(growth^W)(decay^L) = Vo.((1 + f.R1)^W).((1 - f.R2)^L) , 0 < R1 , 0 < R2 <= 1 , so that 1+Return over n successive reinvestments after the initial one Vo is : 1+Rn = (V1/Vo)(V2/V1)...(Vj/Vj_1)...(Vn_1/Vn_2)(Vn/Vn_1) = Vn/Vo , hence the geomean compound return per each of n investment periods (steps or stages) is Rc = (Vn/Vo)^(1/n) -1 = ( (1 + f.R1)^P ).( (1 - f.R2)^(1-P) ) - 1 for 2 returns ( R2 == loss ) = Prod( (1 + f.Rj)^Pj ) - 1 for j returns ( some of them losses with - sign ) as (e^a)(e^b)..= e(a+b+..) , (x^y)^z = x^(y.z) , x^y = (e^ln(x))^y = e^(y.ln(x)), is Vn = Vo.(1+Rc)^n = Vo.((1 + f.R1)^(n.P)).(( 1 - f.R2)^(n(1-P))) = Vo. exp( n.( P.ln(1 + f.R1) + (1-P)ln(1 - f.R2) ) ) = Vo. exp( n. E[ ln(1 + f.Rj) ] ) for more returns ( losses with - sign ) = Vo. exp( n. ln(1 + f.Rc) ) = Vo.(1+Rc)^n as above , P = W/n = W/(W+L) = (n-L)/n = 1 - L/n 1-P = L/n = L/(W+L) = (n-W)/n = 1 - W/n are proportions ie relative frequencies , hopefully approaching probabilities of wins & losses in the long run ( Keynes : In the long run we are all dead ). Rc is MAXimized for the optimal fraction f=fm by my fortune's formula for fm neither shown in { Kelly 1956 }, { Haigh 2003 }, { Poundstone 2005 }, { Korner 2008 }, { Vince }, nor by almost all other authors. Since the same fm which maximizes Rc maximizes any smooth monotonic increasing function of Rc , we prefer to maximize wrt f this : ln(1+Rc) = ln( (Vn/Vo)^(1/n) ) = (1/n)ln(Vn/Vo) = ln((1 + f.R1)^P ) + ln((1 - f.R2)^(1-P)) = ln( 1 + f.R1).P + ln( 1 - f.R2).(1-P) Quiz: Now you know exactly what to do, so use your highschool math : 1. To justify why a logarithmization is advantageous ; if you cannot , then try to get fm in a closed form ie as an analytic formula without using ln( . ); 2. By any means get fm ( for P & R1, R2 ) in a closed form ie as an analytic formula . Try it before you read on. Using Ln(.) has many advantages : Adv1: Ln(.) reduces products . ie * to + , and powers ^ to * ie . , hence : Adv2: Expressions are simple, since an extreme is found by setting a first derivative to 0 while a derivative of a Sum = Sum of derivatives . Adv3: Ln(.)'s have simple Taylor expansions . Adv4: A product of n successive ratios from n+1 successive values is Vn/Vo , so unlike in the usual (here inappropriate) arithmean , the here proper Sum:j=1..n[ ln(Vj/Vj_1) ] = ln(Vn/Vo) does not require to know all the intermediate values; only Vo , Vn , n are needed to compute the geomean (Vn/Vo)^(1/n) = 1+Rc = the mean growth ratio per period . Moreover, only a sum (not a product) expressed as ln(Vn/Vo) can be plugged into CLT : Adv5: A sum (not a product) is needed for estimating probabilities like Pr(V <= D) , 1 - Pr(V > D) , Pr(V <= x.Vo ) , Pr( x.Vo <= Vn <= y.Vo ) which follow from the central limit theorem; do plug ln(Vn/Vo) into CLT. 
Adv6: If ln(X) has a normal distribution ( aka Gaussian pdf ), then X is said to be lognormal , which is skewed ( due to X > 0 only ), hence a pdf of X has a thicker tail , which is a better model of stock prices .
Adv7: Ln(.) fits the Weber-Fechner law of physiological sensation ( dB, pH, Richter scale ), and roughly fits the psychological valuation of eg wealth and earnings ; would doubling your salary make you twice as happy, and if yes, for how long ? People everywhere care much more about their locally relative living standard than about its absolute level. This is confirmed by the top professor of psychology in Princeton and Nobel Prize winner ( economy 2002 ) Daniel Kahneman and his four coauthors of "Would you be happier if you were richer ? A focusing illusion" { Science, vol.312, 2006-6-30, p. 1908-10 }. What matters to Jill & Joe is their vs others mutual economic competitiveness and jealousy when keeping up with the Joneses.
Adv8: No differential equation has to be solved directly. We saw Vn = Vo.exp( n.E[ ln(1 + Rj) ] ) for many returns (losses with - ), which is in fact the general solution of a differential equation when various growths and/or decays are mixed in a sequence. It can be seen from the simplest useful differential equation for either a growth only or for a decay only : y' == dy/dt = k.y , k is a constant , has the solution : y = y0.e^(k.t) , y0 is the initial condition ie y(t=0) . This solution is derived by dividing our difeq by y and then logdif : y'/y = k ; integrate wrt t to get: ln|y| = k.t + C ; exponentiate : |y| = e^(k.t +C) = e^C . e^(k.t) ; since e^any > 0 we get: y = Const.e^(k.t) ; for y=y0 we get: y0 = Const.e^0 = Const , hence y = y0.e^(k.t) is the solution .
Adv9: Ln(.) has more advantages which are of no or limited use here.
At last my fortune's formula for the ideal(ized) investment (in stocks too) : A bit of elementary calculus solving for f the d/d(f)[ ln(1+Rc) ] = 0 yields the wanted optimal fraction f=fm generalized for 0 < R2 <= 1 not shown by most authors including Kelly :
fm = (P.R1 - (1-P).R2)/(R1.R2) = E[R]/(R1.R2) == Expected[ Return ]/(R1.R2)
   = P/R2 - (1-P)/R1 ; let's plug in data from { Paulos 2003 pp.96-99 } :
   = 0.5/0.6 - 0.5/0.8 =. 0.208 =. 0.2 is the optimal fraction ;
consider R1=0.7 & R2=0.2 giving the factor fm = 1.786 > 1 suggesting leverage ;
R1=0.6 & R2=0.8 giving the fraction fm = -0.208 < 0 signaling losses ;
our R1=0.8 & R2=0.6 giving the fraction fm = 0.208 > 0 suggests profits, while
f=1 yields the growth factor 1+Rc = 0.72^0.5 = 0.85 per period < 1 which is LOSSy, like in Paulos , so
V52 = Vo*((1+0.8)*(1 -0.6))^26 = 10000*0.72^26 = Vo*(1+Rc)^52 = 10000*0.85^52 = 1.95 for f=1 of a LOSER , versus :
fm = 0.2 yields 1+Rc = 1.02^0.5 = 1.010 > 1 hence GAINful ( if repeated ) , so
V52 = Vo.(1+Rc)^52 = 10000*1.010^52 = 17000 for fm=0.208 of a WINNER .
Our optimization of f has not just increased gain, it has turned an almost sure LOSER into a likely WINNER !! One sure loser is professor Paulos in his book { A Mathematician Plays the Stock Market, 2003 } where on pp.96-99 he well explains the need to use the geomean , but he is unaware of the real possibility to turn a sure LOSER into a likely WINNER via the optimal Kelly fraction fm .
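For readers who like to recheck such numbers with a tiny prog, here is a minimal Pascal sketch ( a sketch only, assuming a Free Pascal style compiler ; its Math unit is used just for Power ) which recomputes fm and both V52 values from the Paulos data above :

program KellyFraction;                       { recheck of the LOSER vs WINNER example }
uses Math;                                   { Power( base, exponent ) }
var P, R1, R2, fm, G1, Gm : double;
begin
  P := 0.5;  R1 := 0.8;  R2 := 0.6;          { the Paulos-like data used above }
  fm := P/R2 - (1-P)/R1;                     { the closed form fm = E[R]/(R1.R2) }
  writeln('optimal Kelly fraction fm = ', fm:6:3);
  G1 := Power( (1 + 1.0*R1)*(1 - 1.0*R2), 0.5 );   { 1+Rc per period for f = 1 }
  Gm := Power( (1 + fm*R1)*(1 - fm*R2), 0.5 );     { 1+Rc per period for f = fm }
  writeln('f=1  : 1+Rc = ', G1:6:3, '   V52 = ', 10000*Power(G1,52):9:1);
  writeln('f=fm : 1+Rc = ', Gm:6:3, '   V52 = ', 10000*Power(Gm,52):9:1);
end.

Plugging in other P, R1, R2 immediately shows when fm signals losses ( fm <= 0 ) or leverage ( fm > 1 ).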
Hence he is also unaware of the further trade-offs offered by fractional Kelly ( ie using a fraction of the Kelly fraction , only briefly touched upon in NDM p.363 ) f = c.fm , 0 < c <= 1 ( if P=0.5 then for c=2 is Rc=0 ie a break-even point ie the most likely long term outcome is risky Vo ) : IF 0 < c < 1 is chosen properly , eg c = 1/2 or 1/3 or 1/4 , THEN a sure LOSER for f=1 , will for f = c.fm become a likely WINNER with somewhat smaller gain than for f=fm , but with MUCH SMALLER RISK . { Edward O. Thorp : The Kelly criterion in blackjack, sports betting, and the stock market , 1997, 2000, 2006 } reveals a lot of opeRational meaning of the Kelly criterion via two simple looking, though not simplistic, formulas for f = c.fm : Pr( increasing to y.Vo before decreasing to x.Vo ) == Pr(..) in my next table = ( 1 - x^((2/c)-1) )/( 1 - (x/y)^((2/c)-1) ) for 0 < x < 1 < y , = ( 1 - x )/( 1 - x/y ) if c = 1 ie for the full Kelly fraction where c is a fraction of the Kelly fraction ie an investor uses f = c.fm , 0 < c < 1 ( < 2 = break-even point if P=0.5 ), and the gain Gc = c.(2 - c) is a fraction of the full ( ie if c=1 ) Kelly gain . Pr(V <= x.Vo ) = x^((2/c)-1) = the chance of ever decreasing to x.Vo , a complement of which is the chance of never decreasing to x.Vo . IF c=1 ie full Kelly AND x = 1/a THEN Pr(V <= Vo/a) = 1/a = x tells us that full Kelly gives a bumpy ride, hence a fractional Kelly ( c < 1 ie f = c.fm < fm ) is recommended with the following tradeoffs : x < y c (2/c)-1 Pr(..) Gc Pr(V <= x.Vo ) ---------------------------------------------------- for f = c.fm , 0 < c < 1 1/2 2 1 1 2/3 1 x = 1/2 = 0.5 1/2 2 1/2 3 8/9 0.75 x^3 = 1/8 = 0.125 1/2 2 1/3 5 0.97 0.555 x^5 = 1/32 = 0.03125 1/2 2 1/4 7 0.992 0.438 x^7 = 1/128 = 0.00781 1/2 2 1/5 9 0.998 0.36 x^9 = 1/512 = 0.00195 By investing eg only f = fm/3 , our gain will most likely be 55.5 % of the most likely gain from fm . This decrease of gain is traded-off for the much increased probability 0.97 ( > 2/3 = 66.6 % if c=1 ) that our Vo will double ( by the factor y=2 ) before it will be halved ( by x=1/2 ). 1 - Pr(..) = Pr( not( increasing to y.Vo before decreasing to x.Vo ) ), which { Poundstone 2005, p.231 , no formulas } interprets thus : full Kelly has 1/3 chance of halving Vo before doubling it, while half Kelly has 1/9 chance of halving Vo before doubling it. A simpler case : If R1 = R2 then fm = ( 2P -1 )/R1 , as eg in a simple Brownian motion . The simple cases of betting only on one runner ( eg in a horse race ) : If R2=1 then fm = P - (1-P)/R1 = (P.(1+R1) -1)/R1 = (P.u -1)/(u-1) ; if R2=1=R1 then fm = P - (1-P) = 2P -1 , 0.5 < P < 1 for 0 < f < 1, and as P = W/(W+L), fm = (W-L)/(W+L) , which I recognized as the form of the solution (eg via the combinatorial Andre Reflection Method) of the simplest random walk problem ( which is isomorph with the Ballot Problem ). No wonder that even the simplest Kelly fraction has meaningful links to other paths problems (and even to Catalan numbers). In this simplest Case1 we plug fm = 2P -1 into ln(G) = P.ln(1+f) + (1-P)ln(1-f) to get the maximum ln(Gmax) = ln(2) + P.ln(P) + (1-P)ln(1-P) = 0.693 - H(P) in nats , or 1 - entropy in bits ; entropies in Case2 Case3 Case4 are in { Kelly 1956 } . 
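The little tradeoff table above can be regenerated from Thorp's two formulas with a few lines of Pascal ( a sketch only, assuming a Free Pascal style compiler with its Math unit ; x = 1/2 and y = 2 as in the table ) :

program ThorpFractionalKelly;                { regenerates the tradeoff table above }
uses Math;                                   { Power( base, exponent ) }
var x, y, c, e, PrUpFirst, Gc, PrEverDown : double;
    k : integer;
begin
  x := 0.5;  y := 2.0;                       { halve before double, as above }
  for k := 1 to 5 do
  begin
    c := 1.0/k;                              { f = c.fm , a fraction of the Kelly fraction }
    e := 2.0/c - 1.0;                        { the exponent (2/c)-1 }
    PrUpFirst  := (1 - Power(x,e))/(1 - Power(x/y,e));   { Pr(..) }
    Gc         := c*(2 - c);                 { fraction of the full Kelly gain }
    PrEverDown := Power(x,e);                { Pr(V <= x.Vo) }
    writeln('c = 1/', k, '   Pr(..) = ', PrUpFirst:6:3,
            '   Gc = ', Gc:6:3, '   Pr(V <= x.Vo) = ', PrEverDown:8:5);
  end;
end.

Changing x and y shows the same pattern : the smaller the c , the smaller the chance of ever dropping to x.Vo , and the smaller the fraction Gc of the full Kelly gain.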
Even more general is to include a riskless return R0 , with at least one Rj such that 0 < R0 < Rj , and at least one other such that -1 <= Rj < 0 , hence now the + before both terms , as the + allows us to write a single argument in a product of two or more ( Rj & Pj )'s for a single asset :
1+Rs = Prod( ( f.(1+Rj) + (1-f)(1+R0) )^Pj ) = Prod( ( 1 + f.Rj + (1-f)R0 )^Pj )
     = Prod( ( 1+R0 + f.(Rj-R0) )^Pj ) , which for P & 1-P becomes
     = ( (1+R0 + f.(R1-R0))^P )( (1+R0 + f.(R2-R0))^(1-P) ) , -1 <= R2 < 0.
Quiz: I humbly reco to set to 0 the 1st derivative wrt f of ln(1+Rc) & ln( 1+Rs ), and get those more general ( than for horses ) formulas for the optimal Kelly fraction fm which maximizes the growth rate of wealth. Then play with some fractional Kelly ie with fractions c , 0 < c < 1 , of the Kelly fraction fm to see how f = c.fm will change returns & risks ie Vn & Pr in my multitrees shown further below as Idea4 .
Quiz: Try more values of a single asset (a 1-dimensional case) with Sum[Pj]=1 : For 2 pairs of values (R1, P ) & (R2, 1-P) we got an equation linear in f ; for 3 pairs of values (R1, P1)...(R3, P3) you should get A.f^2 + B.f + C = 0 , as the derivative d[fun(f)]/df lowers the powers of f by 1 from 3 to 2 ; for 4 pairs of values (R1, P1)...(R4, P4) you should apply a Taylor expansion about zero :
for -1 < x <= 1 : ln(1+x) = x - (x^2)/2 + (x^3)/3 - (x^4)/4 + ...
for 0 <= x < 1 : ln(1-x) = -x - (x^2)/2 - (x^3)/3 - (x^4)/4 + ...
where x will be f.Rj in our case , and use only the terms not higher than f^3 , the derivative of which will have f^2 and lower, leading to a quadratic again. Try it, as I did.
By now it might seem that enough was said already, after all, the optimal fm is derived. That does not reveal enough of all the various rich MEANINGS ( how you may get rich or become poor ) when taking risks with f's . More INSIGHTS , understanding , explanation , obtains from examples VISUALized in my binomial multitrees , which I believe cannot be made more informative than they are in Idea4 . Can you enhance them ?
Intermezzo: Since 1990, Ralph Vince has published on his "optimal f" fraction (here fv ), inspired by the Kelly criterion or strategy (here fk) of 1956. Although it is commendable that Vince has shown a formula for a fraction ( not an unbounded factor ), I feel compelled to ask a few FUNDAMENTAL QUESTIONS wrt his fv , answers to which I could not find in his writings. He is explicit about his legitimate wish for a fraction 0 <= f <= 1 which maximizes terminal wealth relative TWR (hence also growth factor G = TWR^(1/n) ), but his theory has to be QUESTIONED due to the following SEVERE SHORTCOMINGS in his writings.
Executive summary : (all here is "as far as I can reasonably know") ( == is equivalence or synonymity , eg he == Vince , his == Vince's )
- No explanation of WHY he uses /abs(Worstloss) == /Vw and WHY it is better than other divisors, eg my /abs(Averageloss) == /Va . My answer: 1. rj = Vj/Vw ensures all f.rj >= -1 hence 0 <= TWR = Product[1 + f.rj ] which is Kelly's idea to "bet a fraction of your farm" plugged into the CLASSICAL formula for the COMPOUNDED RETURN eg in the widely used book by Haim Levy & Marshall Sarnat : Capital Investment and Financial Decisions, 1982, ch.12, p.276. 2. rj = Vj/Va can be used too since we MAXimize TWR wrt f , so some f's are ignored anyway :-)
- No proof of his claimed optimality of his "optimal f" based on his chosen divisor /abs(Worstloss) == /Vw . Here follows a counterexample :
- It is NOT true that his /Vw maximizes his TWR & growth G.
You can verify that his favorite data values : +9 +18 +7 +1 +10 -5 -3 -17 -7 eg in his Handbook p.121 etc, expressed as 0.09 0.18 0.07 0.01 0.10 -0.05 -0.03 -0.17 -0.07 ( proportional changes do not change fv & TWRv and fa & TWRa ), lead indeed to :
his fv=0.24 "yields" TWRv = Product[1 + fv.Vj/Vw ] = 1.095698 and
my fa=0.11 "yields" TWRa = Product[1 + fa.Vj/Va ] = 1.095702 ,
but CONTRARY to his claims, my TWRa > his TWRv while my fa << fv ie fa IS MUCH LESS RISKY than his fv . Next I show that both TWRa & TWRv are UNrealistic , ILLusory PSEUDO-"yields" :
- His TWR & growth factor G suggest ILLusory high yields because any REAListic TWR must be computed without any /Vw or /Va or /Velse :
RealTWRv = Product[1 + fv*Rj ] = 1.028951 << 1.095698 = TWRv by Vince
RealTWRa = Product[1 + fa*Rj ] = 1.013835 << 1.095702 = TWRa by me .
On the last 2 lines the ILLUSIONS are << . Unlike TWRv & TWRa , RealTWRv & RealTWRa are changed by proportional changes of Vj's .
- He has NOT defined optimality well, so his fv yields JUST A tradeoff : "bet whole farm" fk=1.00 & RealTWRk = 1.087584 we reject as TOO RISKY , so his fv=0.24 & RealTWRv = 1.028951 is JUST A tradeoff because my fa=0.11 & RealTWRa = 1.013835 is MUCH LESS RISKY than fv . A decent measure of optimality could be :
100.(RealTWRv -1)/fv = 2.8951/0.24 = 12.06
100.(RealTWRa -1)/fa = 1.3835/0.11 = 12.58 , or another one :
fv/fa = 0.24/0.11 = 2.18 times LARGER RISK versus
(RealTWRv -1)/(RealTWRa -1) = 28951/13835 = 2.09 , so we MAY say that the relative gain is NOT worth the relative risk of MUCH LARGER fv .
- His estimates of JOINT probabilities ( in his Handbook pp 305, 306, 309, 310 etc ) are NOT justified and are highly QUESTIONABLE .
- More upon request. The end of executive summary .
{ Ralph Vince } makes a well meant attempt to get a true fraction ie always 0 <= f <= 1, while Kelly-like fm > 1 may result. An f > 1 suggests risking borrowed assets aka leveraging, which is not prudent in general, and if returns are small in particular, so that to make a noticeable profit we have to reinvest many times ( large n ), thus greatly increasing our risks of a sequence of repeated losses of a fraction f (an almost ruinous run, not just a single loss of an invested fraction f of our current capital ). The formula 2 in { Vince 2010 } is readily seen to be isomorphous with our Rc , if our R1 & R2 are replaced by his ratios Rj/Rb (b == index of the biggest loss so far or anticipated ) here denoted as r1 & r2 . Hence the direct way to his fv for 2 outcomes is via my analytic formula for fm . E.g. when we either are paid $2 with P=0.5 or we lose $0.8 as in his Table 1 :
r1 = R1/R2 = 2.0/0.8 , r2 = R2/R2 = 0.8/0.8 = 1 ( for two outcomes is r2=1 , his minus sign of a loss is already built into my fm ) so that my closed form solution for fv ( Vince shows NO analytic formula yielding fv directly ) is
fv = P/r2 - (1-P)/r1 = 0.5*( 1 - 0.8/2.0 ) = 0.300 < 0.375 = fm .
Proof (for 2 outcomes) of fv <= fm if 0 < R2 <= 1 as assumed far above :
fv = P/1 - (1-P)R2/R1 <= fm = [ P.R1 -(1-P)R2 ]/(R1.R2)
fv = [ P.R1 - (1-P)R2 ]/R1 = E[R]/R1 <= fm = E[R]/(R1.R2)
1/R1 <= 1/(R1.R2) , if 0 < R2 <= 1
hence for just 2 outcomes R1 , R2 , is fv/fm = R2 ie fv = fm.R2 in the table below. For me, Pr(V <= x.Vo) is a good measure of risk ( choose your own x ).
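The executive-summary numbers above are easy to recheck with a few lines of Pascal ( a sketch only ; Va is taken here as the mean of the four absolute losses, ie 0.08 ) :

program VinceCheck;                          { rechecks the executive-summary numbers above }
const n = 9;
      R : array[1..n] of double =
        ( 0.09, 0.18, 0.07, 0.01, 0.10, -0.05, -0.03, -0.17, -0.07 );
var Vw, Va, fv, fa, TWRv, TWRa, RealTWRv, RealTWRa : double;
    j : integer;
begin
  Vw := 0.17;                                { abs( worst loss ) }
  Va := (0.05 + 0.03 + 0.17 + 0.07)/4;       { abs( average loss ) = 0.08 }
  fv := 0.24;  fa := 0.11;                   { the two fractions compared above }
  TWRv := 1; TWRa := 1; RealTWRv := 1; RealTWRa := 1;
  for j := 1 to n do
  begin
    TWRv     := TWRv    *(1 + fv*R[j]/Vw);
    TWRa     := TWRa    *(1 + fa*R[j]/Va);
    RealTWRv := RealTWRv*(1 + fv*R[j]);      { no /Vw : the realistic yield }
    RealTWRa := RealTWRa*(1 + fa*R[j]);      { no /Va : the realistic yield }
  end;
  writeln('TWRv     = ', TWRv:9:6, '   TWRa     = ', TWRa:9:6);
  writeln('RealTWRv = ', RealTWRv:9:6, '   RealTWRa = ', RealTWRa:9:6);
end.

The printed values should match 1.095698 , 1.095702 , 1.028951 , 1.013835 quoted above.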
Here are few evaluations (for the simplest P = 0.5 = 1-P used by Ralph Vince) : denote eg P52fm == Pr(Vn <= Vo , if fm & n=52 ) P52fv == Pr(Vn <= Vo , if fv & n=52 ) ; In | Out: fm.R2 = fv ; note that all P..fm > P..fv R1 R2 fm > fv P5fm P5fv P10fm P10fv P52fm P52fv 2.0 0.1 4.750 > 0.475 0.101 0.026 0.036 0.003 0.000 0.000 2.0 0.5 0.75 > 0.375 0.236 0.153 0.155 0.073 0.010 0.0007 2.0 0.6 0.583 > 0.350 0.262 0.195 0.184 0.112 0.02 0.003 2.0 0.8 0.375 > 0.300 0.31 0.278 0.242 0.203 0.055 0.029 1.8 0.6 0.555 > 0.333 0.279 0.213 0.204 0.130 0.029 0.005 0.08 0.06 2.083 > 0.125 0.437 0.378 0.410 0.330 0.302 0.159 0.8 0.6 0.208 > 0.125 0.437 0.412 0.410 0.376 0.302 0.235 so that our last case has R(fm)^1 = 1.010 > 1.0087 = R(fv)^1 per 1 period , R(fm)^5 = 1.053 > 1.0443 = R(fv)^5 per 5 periods , R(fm)^10 = 1.109 > 1.0906 = R(fv)^10 per 10 periods , R(fm)^52 = 1.709 > 1.57 = R(fv)^52 per 52 periods , so fv < fm means a tradeoff of less gain ( with fv than with fm ) for considerably less risk ( with fv than with fm ). Whether a tradeoff is optimal or not, depends on a mathematical definition of optimality . If Worstloss Rb -> 0 then fv -> P = 0.5 here, while fm -> infinity suggests large leveraging ie risking borrowed assets, which should hardly be optimal for a prudent investor . For multiple outcomes I rewrite Ralph Vince's formula with a slight change in our notation here : for more than 2 outcomes it is more convenient to define returns Rj's as signed numbers ie losses with the implicit - sign. Let Rb == worstloss Rj < 0, so at least one Rj/abs(Rb) = -1 ( b == BigBad ) fv = f which Maximizes Prod( 1 + f.Rj/abs(Rb) ) = f which Maximizes Sum[ ln( 1 + f.Rj/abs(Rb) ) ] because the extreme of any function Fun(X) occurs for the same Xe as when extremizing a monotonic function Mono(Fun(X)) ; here Fun(X) == Prod( .., f, ..), Mono( . ) == ln( . ). Caution ! Warning to prevent confusion & illusion of faster growth (higher G) : While Ralph Vince is quite clear & explicit, I do not see him warning that to get the real exponential growth factor per trade Greal == 1 + real geomean return (real == realistic at least for backtesting his fv ie as-if using fv on the data from which fv was obtained ) we MUST plug his fv ( or any f obtained by any means ) into the following formula with Rj only, ie NO Rj/abs(Rb) : Gtrue1 = ( Prod( 1 + fv.Rj ) )^(1/n) without any worstloss Rb . The maximization can serve only for obtaining his fv , but his maximized Product (he calls it GHPR ) is NO way a realistic growth factor per trade, future or past . Btw, from a sequence of n+1 values Vo,V1..Vn (say stock prices ) we easily obtain the sequence of returns R1..Rn ( no Ro == R0 ) thus : R1 = (V1 - Vo)/Vo = V1/Vo - 1 , R2 = (V2 - V1)/V1 = V2/V1 - 1 , ... etc , hence for the superrisky f=1 is Gf1 = ( Prod( 1 + 1*Rj ) )^(1/n) = ( Vn/Vo )^(1/n) so for a geomean return of a sequence of values Vj we need to know just Vo, Vn & n , while for an arithmean all other values are needed ( although an arithmean can be updated sequentially from a recurrence formula through which all values Vj must pass ). End of intermezzo on Ralph Vince's fv ; back to Kelly-like fm . Ideological and philosophical issues : + { Philip Mirowski : Machine Dreams - Economics Becomes a Cyborg Science, 2002 } pp.372-6, 382, 389 are on the ideological impact of Kelly's paper on the Cowles economists Jacob Marschak and the nobelist Kenneth Arrow . 
+ On rationality { Peter Bernstein : Against the Gods , 1996 } :
+ see his Name index for many refs eg to Frank Knight vs Keynes , to Richard Thaler , etc ; - he does not mention Kelly and Latane ;
+ p232-46: on game theory ( 241-3 ! );
+ p246: "Victorian concept of rational behavior", "measurement always dominates intuition", p249: "Victorian faith in measurement" ;
+ p252: "The strategic role of diversification is Markowitz's key insight." I prefer to say that: The nobelist (economy 1990) Harry Markowitz has opeRATIONALized the old folks wisdom of "Don't keep all your eggs in one basket" by quantifying the arithmean-variance trade-offs when diversifying investments for a single investment period. The Kelly-inspired geomean formula generalized for diversified stocks (not just horses ) says how much to invest, if at all, into which asset over next investment periods, so as to optimize an exponential growth over a long term. My multitrees visualize possible & probable returns & risks (see Idea4 below).
+ p253: "close resemblance between diversification and .. games of strategy." "one player is the investor and the other player is the stock market." ... "By making the best of a bad bargain - by diversifying instead of striving to make a killing - the investor at least maximizes the probability of survival."
+ regression to the mean : pp.167, 170-178, 182, 184-5, 206, 249, 264, 271-2, p182: " Dependence on reversion to the mean tends to be perilous when the mean itself is in flux ." Also in : { John Allen Paulos : A Mathematician Plays the Stock Market , 2003 } : pp.107-9 on regression to the mean : increased likelihood that after a peak a downfall will follow.
+ p179: "analysis is 100% hindsight. Worse, even small differences in annual returns over many years produce big differences in the investor's wealth in the long run."
Two generalizations will not be handled here :
1: The easy one: 1+Rc and 1+Rs can be generalized for Sum[Pk]=1 , k >= 2 , each multinomial proportion aka relative frequency Pk having its own return Rk . Although Taylor expansions may help for a single asset , a simple numerical evaluation of Vn or 1+Rn or 1+Rc by trying out many f's between f=0 and f=1 will do even for non-programmers using a calc . Eg for k1+k2+k3 = n :
Vn(k1,k2,k3) = Vo.( [1 + f.R1]^k1 )( [1 + f.R2]^k2 )( [1 + f.R3]^k3 ) with at least one Rk < 0, and at least one Rk > 0.
P(k1,k2,k3) = (P1^k1)(P2^k2)(P3^k3).n!/(k1!k2!k3!) where the last fraction is called multinomial coefficient aka degeneracy factor in physics.
2: Diversification is tough but vital ( assets are indexed i ) : Optimal fractions fmi with 0 <= Sum[ fmi ] <= 1 for a diversification of simultaneous investments into many assets ( each with Pi , R1i , R2i ie not just all R2i = -1 like when betting on multiple horses with only 1 winner ) will be needed for prudent practical use. The fundamental difficulty arises from the hard to get yet needed joint probabilities, the number of which increases exponentially with the number of assets. For Z assets, each simplistically assumed to either jump up by R1i or to fall down by R2i ( ie each asset with marginal probabilities Pi & 1-Pi ) we would need 2^Z joint probabilities :-( Joint probabilities easily obtain if we assume mutually independent returns on our assets . In that case joint probabilities are just products of the marginal P's, as the next little sketch shows.
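A minimal Pascal sketch ( illustration only ; Z and the marginal P's below are made-up numbers, not from NDM ) which lists all 2^Z joint probabilities as products of marginals under the independence assumption just mentioned :

program JointProbs;            { 2^Z joints as products of marginals, independence assumed }
const Z = 3;                                         { Z illustrative assets }
      P : array[1..Z] of double = ( 0.6, 0.5, 0.7 ); { made-up marginal P's }
var joint, total : double;
    mask, i : integer;
begin
  total := 0;
  for mask := 0 to (1 shl Z) - 1 do                  { each bit i : asset i up (1) or down (0) }
  begin
    joint := 1;
    for i := 1 to Z do
      if (mask shr (i-1)) and 1 = 1
        then joint := joint*P[i]
        else joint := joint*(1 - P[i]);
    writeln('outcome ', mask:2, '   joint probability = ', joint:7:4);
    total := total + joint;
  end;
  writeln('sum of all ', 1 shl Z, ' joints = ', total:7:4);  { must be 1 }
end.

Without independence, all 2^Z joints would have to be estimated separately, which is exactly the fundamental difficulty mentioned above.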
Analytic solution is possible for 2 assets under special circumstances like eg: the same returns R1 = R2 = R > 0 and the same losses L1 = L2 = L < R , D = R-L > 0 , all with the same probabilities P1=0.5=P2. From these symmetries it obviously follows that the fractions f1 = f2 = f. For s assets there are 2^s joint probabilities in general, and in our case of s=2 there are 4 joints P = P1*P2 = 0.5*0.5 = 1/4 , hence G = [ (1 + 2Rf) * (1 - 2Lf) * (1 + Df)*(1 + Df) ]^(1/4) ln(G) = [ ln(1 + 2Rf) + ln(1 - 2Lf) + 2*ln(1 + Df) ]/4 d(ln(G))/df = 0 = [ 2R/(1 + 2Rf) -2L/(1 - 2Lf) + 2D/(1 + Df) ]/4 Bringing the rhs to one common denominator ( its value does not matter due to the 0 = rhs ), yields a quadratic equation. It is only a quadratic due to only 3 different ln terms in our ln(G) : 0 = -8DLR.f^2 +(3D(R - L) -4LR).f +(D+R-L) 0 = 8DLR.f^2 -(3DD -4LR).f -2D = A.f^2 + B.f + C The positive root f = (-B + sqrt(B^2 - 4AC))/(2A), if any, is the optimal fraction fm ( fm=f1=f2 for both assets, in our symmetric case ) which yields the maximum geomean Gm . Let's plug in R=1 ie doubling, and L=0.5 ie halving : A = 8(1-0.5)0.5*1 = 2 , B = -(3*0.5*0.5 -4*0.5*1) = 1.250 , C = -2*0.5 = -1, so fm = ( -1.250 + sqrt(1.250^2 -4*2*(-1)) )/(2*2) = 0.46 ie the maximum geomean Gm obtains if f1 = f2 = fm = 0.46 ie if at the beginning of all investment periods we reinvest 46% of our current capital in each asset , so that 100 - 2*46 = 8% of our current capital will stay uninvested during all periods. Gs = the G above specifically with R=1, L=0.5, D = R-L = 0.5 : = [ (1 + 2f) * (1 - f) * (1 + 0.5*f)^2 ]^0.25 Gm = Gs(f=fm=0.46) = 1.119 ie the geomean return = 1.119 - 1 = 0.119 ie 11.9 % Let's plug some other f=f1=f2 into our specific Gs : f=f1=f2 G 100*2f = % of capital invested ie at risk 0.5 1.118 100 % 0.46 1.119 92 % 0.23 1.087 46 % only, while (11.9 - 8.7)/11.9 = only 27 % less than Gm 0.1 1.045 20 % only, while (11.9 - 4.5)/11.9 = 62 % less than Gm 2f G(2f in 2 assets) G(2f in 1 asset, fm = 0.5(1-0.5)/(1*0.5) = 0.5) 0.46 1.087 > 1.06 , 0.46 =. fm for our single asset 0.92 1.119 > 1.018 , 0.92 =. 2fm is extremely risky Due to the symmetries we reinvest the same amount A1=A2=A into our 2 assets. Our fm is a fixed portion ie a constant, so we are supposed to rebalance our portfolio before we reinvest. Rebalancing means that if one asset has ended with a value A1 > A2 of the other asset (one was more profitable or has lost less than the other), we reinvest a part of the last A1 into new A2 so that we start each new period with a new A1=A2=A (and end with A1 <=> A2). Rebalancing means that the winner is subsidizing the loser , because the winner may become a loser in the next period. However, before the next period, the investor is free to update his P's, R's & L's. Bayesian updating of P after n = W + L wins and losses gives P = (W+a)/(W+a + L+b) = (W+a)/(n + a+b) where a,b obtain from a beta pdf B(a+W, b+L). The maximum a posteriori estimate aka MAP ie the posterior mode uses a=b=0.5 , hence P = (W+a-1)/(n+a+b-2) = (W -0.5)/(n-1). Joint probabilities are not needed when betting on horses in a race. This joyful fact is due to only a single winner , hence possible gains are mutually exclusive. 
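The quadratic for the symmetric two-asset example above ( R=1 ie doubling, L=0.5 ie halving, P1=P2=0.5 ) can be solved by this minimal Pascal sketch ( a sketch only ; it just re-evaluates the formulas derived above ) :

program TwoAssetQuadratic;     { the symmetric 2-asset case above }
var R, L, D, A, B, C, fm, Gm : double;
begin
  R := 1.0;  L := 0.5;  D := R - L;        { doubling or halving, P1 = P2 = 0.5 }
  A := 8*D*L*R;                            { 0 = A.f^2 + B.f + C }
  B := -(3*D*D - 4*L*R);
  C := -2*D;
  fm := ( -B + sqrt(B*B - 4*A*C) )/(2*A);  { the positive root }
  Gm := exp( ( ln(1 + 2*R*fm) + ln(1 - 2*L*fm) + 2*ln(1 + D*fm) )/4 );
  writeln('fm = f1 = f2 = ', fm:5:2, '   Gm = ', Gm:6:3);
end.

It prints fm =. 0.46 and Gm =. 1.119 , as in the small f-table above.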
{ Kelly 1956 } has derived formulas for optimal f's and for Gmax in 4 cases : Case1 : Bet on just 1 horse when R1=1=R2 : Vn = Vo.( (1+f)^W )( (1-f)^L ), so fm = P - (1-P) = 2P -1 , 0.5 < P < 1 for 0 < f < 1; due to P = W/(W+L) is fm = (W-L)/(W+L) = W/n - L/n = W/n - (n-W)/n = 2W/n -1 = 2P -1 , hence ln(Gmax) = ln(2) + P.ln(P) + (1-P)ln(1-P) = 0.693 - H(P) in nats ie 1 - entropy in bits. Entropic expressions for less simple Cases 2-4 are in { Kelly 1956 }. Bets on K horses in a race of h horses indexed i ( 0 <= K <= h ) : Case2 : more general case for fair odds R1i and R2i = 1. Kelly does not explain that odds are fair if E[R] = 0 ie neither loss nor win can be expected: 0 = E[R] = Pi.R1i - (1-Pi).R2i = Pi(1+R1i) -1 hence Pi(1+R1i) = 1 so 1+R1i = 1/Pi so Pi = 1/(1+R1i) and Sum[Pi] = Sum[ 1/(1+R1i) ] == Q = 1. For an event i is Pi = 1/(1+Odds(i)) ie Odds(i) = (1-Pi)/Pi ; but for a complementary ~i are Odds(~i) = (1 -(1-Pi))/(1-Pi) = Pi/(1-Pi) used eg in medicine (with i == "negative" == healthy ; ~i == "positive" in medicine ). Even payout == even money == R1=1=R2 == fair odds = 1:1 = 1 with Pi=0.5=P(~i) Case3 : fair odds not required ie 1+R1i <> 1/Pi yet Sum[ 1/(1+R1i) ] == Q = 1 ie as-if no track take. Case4 : in the real world there is a track take ie Sum[ 1/(1+R1i) ] == Q > 1 and Sum[ fmi ] <= 1 ie we don't have to bet the whole farm. Alas , { Kelly 1956 } is hard to follow , has an opaque Gmax formula , no example ; { Korner 2008 pp.81-82 } is a tough sledding, no Gmax formula , no example ; { Haigh 2003 } is intelligible , no Gmax , 2 examples with results, describes Kelly's Case4 computational procedure better (without deriving it); in his 2 examples the horse B has different odds in the race of h = 5 horses A,B,C,D,E. Payoff odds aka payout odds = profit:stake = profit/stake = R1i . When your horse wins, you get back (1 + payout odds) = 1+R1i per every $ betted. Caution : Haigh's payout odds alpha = payout - 1 = our return R1i , while { Kelly 1956 } defines his alpha = payout = 1+R1i . Be reminded that E[R] = P.R1 + (1-P)R2 = Pi.R1i + (1-Pi)(-1) = P.(1+R1i) -1 , where Pi is your private estimated probability of winning for the i-th horse . This Pi will seldom, if ever, equal the pseudoprobability computed from payout odds , ie Pi <> stake/(stake + profit) = 1/(1 + profit/stake) = 1/(1+R1i); normally we bet on profit > stake , and the sum of pseudoprobabilities == Sum[ 1/(1+R1i) ] == Q > 1 like here : I programmed Case4 by Kelly from Haigh and checked his results, but will not describe it here. Let me just say that it starts with sorting the horses in DECreasing order of their e1[.] = P[.]*(1+R1[.]) because E[.] = e1[.] -1 : Given : payout . Computed: fraction . If bet on one horse only [.] 
P odds = R1 1/(1+R1) e1 f fm for a single horse C 1 0.2 9:2 = 4.5 0.182 1.1 0.06 > 0.0222 ie bet 2% A 2 0.4 13:8 = 1.625 0.381 1.05 0.106 > 0.0308 ie bet 3% B 3 0.3 9:4 = 2.25 0.308 0.975 0.0625 >> -0.0111 ie no single bet D 4 0.08 5:1 = 5 0.167 0.48 0 -0.104 ie no single bet E 5 0.02 20:1 = 20 0.048 0.42 0 -0.029 ie no single bet Sum: Q = 1.085 > 1 : IF Sum[ 1/(1+R1i) ] == Q <= 1 ie an UNrealistic race without a track take THEN BEGIN IF Pi is UNknown (ie we use less information than Kelly needs) THEN we cannot use Kelly , yet we can make sure profit if betting fractions fmi = (1/(1+R1i))/Q , 1/Q normalizes to Sum[ fmi ]=1 ; no Pi needed :-) ELSE Kelly's Case4 procedure yields fmi = Pi ( is hard to estimate ), hence Kelly's Gmax changes with estimates of Pi's ; if Q <= 1 then in Kelly's Gmax his 2nd term becomes (1-1)*ln(0) which I define as 0 in a Sum, ie I skip his 2nd term in his Gmax to avoid a crash of my prog. In a Product 0^0 is undefined and must be interpreted as X^0 =1 (not as 0^X =0). END ELSE IF Q > 1 THEN it is a realistic race with a track take, Haigh / Kelly's Case4 ; here we get from Haigh's data : If R1[3] = 2.25 (as above for the horse B ) then Q = 1.085, bet total fraction = Sum[f] = 0.06 + 0.106 + 0.0625 = 0.23 ; if R1[3] = 2.00 (for B ) then Q = 1.110, bet total fraction = Sum[f] = 0.0337 + 0.0515 + 0 = 0.085 . Note that the horse B has f > 0 despite fm < 0 ; fm < 0 because expected gain E[3] = e1[3] -1 = 0.3*(1 + 2.25) -1 = -0.025 < 0 . This counter-intuitive result of f > 0 despite fm < 0 is mentioned in { Kelly 1956 p.925 } and is justified in { Haigh 2003 p.238 }. Depending on the B's odds , Haigh bets on K=3 horses when B has odds 9:4 = 2.25 , or on K=2 horses when B has odds 2:1. I get the same fractions f as Haigh , but I do not get the same Gmax as he gets. Since Haigh does not tell how to compute Gmax , and the formula for Gmax in { Kelly 1956 p.925 } seemed suspicious to me ( strangely enough, Kelly's formula for Gmax does not contain any fraction f[.] == his a(s) ), I wrote my own transparent formula for G == Gmax , here presented in the crystal clear form as a loopless algorithm : { nr. of f's bet = K <= h = nr. of horses in a race } G := 0.0; { K is the CAPital K in Haigh 2003 p.360 } { h horses race ; K = h causes (1-1)*LN(1-1) = 0*LN(0) = UNdefined ; but we know that it is how we compute a product term 0^0 = UNdefined too, so we are free to interpret it as X^0 = 1 ; hence my if 1 > f[1]... ie as-if G:=G + 0 ie as-if finally G:=G * 1 . } case K of { the K is the computed number of optimal fractions f[.] used } 1: begin { payout - 1 = return = R[.] ; P[.] == probability } G := P[1] *LN(1.0 +f[1]*R[1]); { LN(.) == logarithm } if 1 > f[1] then G:=G +(1 -P[1])*LN(1 -f[1]); end; { wins are mutex == mutually exclusive winners eg horses } 2: begin { wins are mutex , hence when K=2 ie betting on 2 horses : } G := P[1] *LN(1.0 +f[1]*R[1] -f[2]) + P[2] *LN(1.0 -f[1] +f[2]*R[2]); if 1 > f[1]+f[2] then G:=G +(1 -P[1]-P[2])*LN(1 -f[1] -f[2]); end; { if 1 > f[... could be replaced by IF K < h THEN , but I believe that if 1 > f[... is numerically safer in preventing LN( nr. <= 0 ) } 3: begin { wins are mutex , hence when K=3 ie betting on 3 horses : } G := P[1] *LN(1.0 +f[1]*R[1] -f[2] -f[3]) + P[2] *LN(1.0 -f[1] +f[2]*R[2] -f[3]) + P[3] *LN(1.0 -f[1] -f[2] +f[3]*R[3]); if 1 > f[1]+f[2]+f[3] then G:=G +(1-P[1]-P[2]-P[3])*LN(1-f[1]-f[2]-f[3]); end; { 4: ... 
etc } end{case}; G := exp( G ); write('optimized growth factor G = ', G ); My formula for G is intuitively clear, even obvious, so it cannot be wrong . My G == Gmax values = growth factors are 1.0051 , and 1.003 ie rate = 0.3% ; { Haigh 2003 p.238 } gets growth factors 1.0035 , and 1.0021 ie rate = 0.21%. By now Kelly's opaque formula for Gmax yields the same growth factors I get. I prefer transparent formulas and algorithms, because too clever opaque ones easily become sources of misunderstandings and errors , especially when changing the ever present assumptions, explicit and implicit. Authors make their assumptions ( mathematical, but also about readers' knowledge of the subject and math ), while readers make other, often wishful, assumptions. Some relationships between Kelly-optimal reinvestments ( Kor ), and in 1990 Nobel prize winning Markowitz-optimal portfolio theory ( Mop ) : R1: Harry Markowitz has been sympathetic to the use of geomean since 1957. R2: Mop is for a single period , while Kor is a multi-period strategy. R3: Markowitz efficient frontier ( of Markowitz-optimal portfolios ) cannot single out the best portfolio , Kor identifies the best . R4: While Kor is based on maximizing a geomean , Mop is based on arithmetic mean-variance optimization which produces efficient portfolios in the sense of best Return/risk ie Return/volatility ratios. We could call it signal-to-noise ratio s/n = 1/( n/s = relative error aka coefficient of variation aka noise-to-signal ratio ) = arithmean/(standard deviation) = A/Sd , not A/(Sd^2) = A/Variance . The essence of the difference between an arithmean A and a geomean G is an approximation via Var = Sd^2 . Any r.v. X with ALL values > 0 has a geomean : G(X) = exp(E[ln(X)]) <= E[X] == A (Jensen's ineq.). Good approximations : G =. A - Var/(2A) = A( 1 - 0.5(Sd/A)^2 ) =.. sqrt( A^2 - Var ) { William J. Bernstein & David Wilkinson : Diversification , rebalancing , and the geometric mean frontier , 1997-11-24 } write : { p.8 } " G =. R - Var/(2(1+R)) " ie their G seems to differ from our G =. A - Var/(2A) , but above we have 1+Rn = Vn/Vo for n reinvestment periods, hence Rc = (Vn/Vo)^(1/n) -1 = the geomean compound return per period . Clearly a geomean G is meaningful only as a product of numbers > 0. Q: Why not >= 0 ? A: Because a computable Sum[ln(.)] requires all (.) > 0. Moreover, G > 0 only if ALL its arguments are > 0 , since a 0 annuls a product. While in an arithmean the Sum hides a 0 , the Prod( . ) in a geomean reveals a 0 , so G=0 always gives a warning on the possibility of a final return = 0 ie possibly total loss . So we assume & check for returns Rj > -1 entering a Product of ALL 1+Rj > 0. Now plug 1+R for A into our G =. A - Var/(2A) = 1+R - Var/(2(1+R)) , so Rc = 1+R - Var/(2(1+R)) -1 = R - Var/(2(1+R)) = their G which requires that their Var is computed from 1+Rj , although they do not say that. R5: { p.15 } "Thus no portfolio which is not Markowitz efficient can be geometric mean efficient; equivalently, every geometric mean efficient portfolio must be Markowitz efficient." This I translate into logic as ( ~Meff implies ~Geff ) == ( Geff implies Meff ), which is a fancy way of saying that Geff portfolios form a subset of Meff ones, ie all Kelly optimal portfolios are always Markowitz optimal ( but not v.v. ). Draw your 1st colorful Venn diagram into your coloring book :-) R6: { p.15 } "Only that part of the Markowitz frontier to the left of this [ Gf(s) ] maximum is geometric mean efficient." 
R7: { p.16 } "Markowitz frontier greatly overestimates the true reward of increasing risk."
.- Idea3: Murphy's conditions :
One Stanford professor has spent a lot of math on Kelly on pp.95-125 of his book { Roy E. Murphy : Adaptive Processes in Economic Systems, 1965 }, where on pp.123-5 he has nicely visualized ( Fig.7.4 on p.123 ) his derivation of his inequalities (7.5.12) & (7.5.19) , ( his P1 & P2 are our P & 1-P ) , which I combined into
0 < E[R] ie R2/(R1+R2) < P < ((1+R1)R2)/(R1+R2) ie E[R] < R1.R2
about which he says "Provided that the probability function of the environmental stochastic process does not violate eqs. (7.5.12) and (7.5.19) , it is quite likely that the subjective probabilities will not violate eq. (7.4.35) in the later periods of the process." His equation (7.4.35) is 0 < p.R1 - (1-p)R2 < R1.R2 where p is an estimate of the next P. Note that his (7.4.35) is just another way of writing our 0 < fm = (P.R1 - (1-P).R2)/(R1.R2) < 1 above. My explanation of Murphy's terminology : "environmental stochastic process" == what actually happens ; "subjective probabilities" == p == user's guesstimate of P.
Q1: Any learned comments on (the practical value of) Murphy's condition for a good P ??
Murphy rightly worries about our lack of knowledge of P, but I do not see him worrying about our lack of knowledge of R1 & R2 in the next period. I do worry, because our compound fortune has the value (P = W/n , W+L = n) :
Vn = Vo.((1+Rc)^n) = Vo.((1 + f.R1)^(n.P))((1 - f.R2)^(n.(1-P))) = Vo.((1 + f.R1)^W )((1 - f.R2)^L) ,
so after each investment period we shall see how much we just won or lost, in a tiny prog :
Vn := our new fortune became known ;
IF our Vn increased THEN BEGIN R1 := the last gain became known; W:=W+1 END
ELSE R2 := the last loss became known;
n := n+1; P := W/n; // 1-P = 1 -W/n = (n-W)/n = L/n
R1 := ??; R2 := ??; // HOW to best adjust for the next period ??
Of course, we can keep separate moving geomeans of few recent ups R1 , and downs R2 , and use these geomeans as estimates of R1, R2, for the next period. Do you know something smarter ?? Assuming only mild, not wild, changes in the next period, we see that it is trivial to update P ( we could do a Bayesian update of P ), but :
!!Q2: How should we best adjust both R1 & R2 so as to best estimate the best fraction f = c.fm for the next period ?? This is a problem of prediction from the (recent) past values. Since the last & past Vn's, R1's, R2's, P's, and f's are known, HOW should we best incorporate all those values ( yes, f's and c's too, because they were and will be decisive for turning a LOSER into a likely WINNER ) into our estimate of the best fm for the next period ?? Any constructive ideas ??
Notes: The worked examples below should surprise, or even shock, the reader twice : 1st surprise by an almost sure Big Loss ( quantified & visualized below ); 2nd surprise by how almost surely a Big Loser ( like prof. Paulos ) may be turned, as if by magic ( no Parrondo paradox needed ), into a likely Tidy Winner ( quantified & visualized below ).
Like Kelly , Murphy was an entropist, no problem with me, but why bother the already perplexed reader with entropy, when it is not necessary for a derivation and understanding of the Kelly strategy ? Btw, when Shannon asked Von Neumann what he should call his key formula , John Von answered : "Call it entropy, because nobody really understands it, so you will always have an advantage in a debate."
:-)
.- Idea4: Visualization and explanation of Kelly strategy as a tree :
Binomial (ie for P & 1-P only) outcomes can be easily visualized by means of Pascal's triangle of POSSIBLE outcomes, which should be thrown into the 2nd edition. You should tell that an m-nary regular tree is a compact way to list and visualize ALL possible m-nary sequences or paths. Pascal's triangle is a binary regular tree showing all possible binary decisions , paths or sequences from 000..00 to 111..11 , countable via combinatorics . I wonder why almost nobody who writes about Kelly shows any tree ?? { Poundstone 2005 p.214 } shows a simple tree for P=0.5 & R2=1 , while I pack more than twice as much info into my multitree . A constructive example of much-telling graphics is my maxi-info and easy to "draw" (by typing only) multitree ie a loopless graph aka acyclic graph which visualizes Kelly strategy as-if in action in POSSIBLE & PROBABLE worlds, not available elsewhere ( not to be confused with the parallel worlds of Hugh Everett III ). Q: Could it be that the outcomes V in my multitrees are coarse-grained evaluations of Feynman's path integrals over random walks in The Garden of Forking Paths by Jorge Luis Borges (and Gell-Mann's "histories" ) ?
1st tree: P=0.5, R1=0.8 = gain, R2=0.6 = loss, the fraction f=1.0 ( all data are the same as in { Paulos 2003, pp.96-99 } ie no fraction ie he/we would invest all we have ( initially Vo set apart, thereafter whatever current capital CC has become of it ) ). Then we get :
(1 - f)CC = the cash we keep, while f.CC we put at risk; both added give
(1 - f) + f*payoff = 1 + f*(payoff -1) = a factor which changes our CC
(1 -1.0) + 1.0 *0.4 = 1 + 1.0*(0.4 - 1) = 1 - 1.0*0.6 = 0.4 = loss factor ,
(1 -1.0) + 1.0 *1.8 = 1 + 1.0*(1.8 - 1) = 1 + 1.0*0.8 = 1.8 = gain factor ;
after n= 4 weeks is V4 = Vo.G^4 = Vo*0.849^4 = 5184 < Vo ,
after n=52 weeks is V52 = Vo.G^52 = Vo*0.849^52 =. 2 << Vo .
The most likely value after n steps can be computed by the general formula Vn = Prod( Vk^Pk ) , eg V1 = (4000^0.5)(18000^0.5) = 8485 , however P = 0.5 allows the simpler Vn = (Prod(Vk))^(1/(n+1)), n+1 because there are 0..k..n nodes on each level 0..n, shown here as a multitree of POSSIBLE outcomes (as nodes), and the most LIKELY outcomes denoted as Vo, V1,..,Vn :
n=0 Vo = ( 10000 )^(1/1) = 10000
10000*0.4 ............... / \ ...................... 10000*1.8 = 18000
n=1 V1 = ( 4000 * 18000 )^(1/2) = 8485 < Vo
4000*0.4 ............ / \ / \ .................. 18000*1.8 = 32400
n=2 V2 = ( 1600 * 7200 * 32400 )^(1/3) = 7200 < Vo
1600*0.4 ....... / \ / \ / \ .............. 32400*1.8 = 58320
n=3 V3 = ( 640 * 2880 * 12960 * 58320 )^(1/4) = 6109 << Vo
/ \ / \ / \ / \
n=4 V4 = ( 256 * 1152 * 5184 * 23328 * 104976 )^(1/5) = 5184 << Vo
nr. of losses L 4=n 3 2 1 0
nr. of wins W 0 1 2 3 4=n
W+L = n ; paths 2^n = 16 = 1 + 4 + 6 + 4 + 1 towards their $$ , hence $$'s
Probab. 0.0625 + 0.25 + 0.375 + 0.25 + 0.0625 of the k-th value ,
Sum[Pr] = <------- 0.688 ------->+<--- 0.312 --> = 1
losses : <------- V < Vo ------>|<-- Vo < V --> : gains
The larger the n , the more (n+1) various outcomes on the n-th level, hence the smaller the probabilities Pr's of outcomes (since Sum[Pr's] = 1 is distributed over n+1 Pr's). For f=1 after n=52 weeks of investments :
- The largest possible outcome is an "exponentially huge" number with Pr =. 0
- The most likely outcome V52 = 1.95 << 10k = Vo, with Pr=0.11 which is the largest Pr, yet quite low, since it is just one out of n+1 = 53 Pr's which always sum to 1.
Even more telling is this : D = 1.95 = V52 10k = Vo etc Pr(V <= D) = 0.5 0.94 a classical homework for the reader 2nd tree: again P=0.5, now f=0.2 =. fm , and with all Pr's shown, has : (1 - f) + f*payoff = 1 + f*(payoff -1) = a factor which changes our CC (1 -0.2) + 0.2*0.4 = 1 + 0.2*(0.4 - 1) = 1 - 0.2*0.6 = 0.88 = loss factor; (1 -0.2) + 0.2*1.8 = 1 + 0.2*(1.8 - 1) = 1 + 0.2*0.8 = 1.16 = gain factor; after n=52 weeks is V52 = Vo*G^52 = Vo*1.010^52 = 17k >> Vo ; after n= 4 weeks is V4 = Vo*G^4 = Vo*1.010^4 = 10420 > Vo , shown as a multitree of POSSIBLE outcomes together with their PROBABILITIES (at nodes), and the most LIKELY outcomes Vn : n=0 Vo = ( 10000 )^(1/1) = 10000 = Vo Pr = 1.0 10000*0.88 ........... / \ ..................... 10000*1.16 = 11600 n=1 V1 = ( 8800 * 11600 )^(1/2) = 10103 > Vo Pr = 0.5 + 0.5 = 1 8800*0.88 ........... / \ / \...................11600*1.16 = 13456 n=2 V2 = ( 7744 * 10208 * 13456 )^(1/3) = 10208 > Vo Pr = 0.25 + 0.5 + 0.25 = 1 7744*0.88 ......... / \ / \ / \ ..............13456*1.16 = 15609 n=3 V3 = ( 6815 * 8983 * 11841 * 15609 )^(1/4) = 10313 > Vo Pr = 0.125 +0.375 + 0.375 + 0.125 = 1 6815*0.88 ..... / \ / \ / \ / \ .........15609*1.16 = 18106 n=4 V4 = ( 5997 * 7905 * 10420 * 13736 * 18106 )^(1/5) = 10420 > Vo Pr = 0.0625 + 0.25 + 0.375 + 0.25 + 0.0625 = 1 Sum[Pr] = <-- 0.312 --->+<------- 0.688 -------> = 1 losses : <-- V < Vo -->|<------ Vo < V -------> : gains The larger the n , the more n+1 of various outcomes, hence the smaller the probabilities Pr's of outcomes (since Sum[Pr's]=1 is distributed over n+1 Pr's). For f=0.2 after n=52 weeks of investments : - The largest possible outcome is 22.5M with Pr=. 0. Note that 22.5M << an "exponentially huge" value for f=1 above. - The most likely outcome V52 =. 17k > 10k = Vo, with Pr=0.11 which is the largest Pr, yet quite low, since it is just one out of n+1 = 53 Pr's which always sum to 1. Even more telling is this : D = 10k = Vo 17k = V52 etc Pr(V <= D) = 0.34 0.5 a classical homework for the reader IF a binomial multitree is SYMMETRICAL wrt Pr's ( due to P=0.5 = 1-P ) THEN it holds : + If 0 < k < n (ie a node has 2 parents) then each Pr = the mean of the 2 parental Pr's on the previous ie (n-1)th level. On each level n with n even (ie odd n+1 of V's), it holds : + The maximum Pr = the median Pr = both central Pr's on the level n-1. + The weighted geomean Vn = the median POSSIBLE Value on the level n , eg: V4 = ( 5997 * 7905 * 10420 * 13736 * 18106 )^(1/5) = 10420 = ((5997^0.0625)(7905^0.25)......(18106^0.0625) = 10420 Combinatorics count : Probabilities in a binomial pdf are computable via a recurrent combinatorial expression. Here is the easiest way how to compute , also with a calc , Pr(k,n) = (n choose k)(P^k)((1-P)^(n-k)) for a single n and for all k : Pkn := (1-P)^n ; write( 0, Pkn ); { == Pr(0,n) find it below } for k:=1 to n do begin Pkn := Pkn*(n-k+1)*P/((1-P)k); write( k, Pkn ) end; Note that we need no (n choose k), hence neither factorials, nor Stirling's approximation, and not even any logarithm, except possibly to implement the single (1-P)^n via exp(n*ln(1-P)) , unless done via a simple loop . Our interpretation of P(k,n) : (P^k)((1-P)^(n-k)) = probability of one particular sequence of n events : k events e1 and n-k complementary events e0 ; it is a joint probability assuming independence; ^ expresses repeated occurrences regardless of their order . (n choose k) = n!/(k!(n-k)!) = n!((n-k)!k!) = (n choose n-k) ie symmetry ; = nr. 
of such sequences ( regardless of the order of e1's and e0's ), all mutually exclusive, hence : P(k,n) = probability of occurrence of any such a sequence ( any mix , regardless of order, of k e1 with n-k of e0 ) = probability of any (ie regardless of order) such a path in a multitree We saw above that POSSIBLE paths are symmetrically distributed on n-th level: (n choose k) = (n choose n-k) = nr. of paths to a k-th node on n-th level in a multitree = nr. of paths to (n-k)th node on n-th level (n choose k) are the node values in Pascal's triangle . (n choose 0) = (n choose n ) = 1 on both extreme branches of a multitree (n choose 1) = (n choose n-1) = n (n choose 2) = (n choose n-2) = n(n-1)/2 Sum:k=0..n[(n choose k)] = 2^n = nr. of paths to all the nodes on n-th level So far on POSSIBILITIES , now more on PROBABILITIES of certain paths : Only for P = 1/2 = (1-P) it holds P(k,n) = (n choose k)/(2^n). For any P , 0 < P < 1, is Sum:k=0..n[ Pr(k,n) ] = 1 on any n-th level , and Pr(0,n) = 1*1*(1-P)^(n-0) = (1-P)^n ; Pr(n,n) = 1* P^n *1 = P^n Pr(1,n) = n*P*(1-P)^(n-1) ; n = W+L = wins + losses , hence : if P = probability of a win in one bet (or step, stage, period) ; then : Pr( W >= 1 of n ) == Pr(at least 1 win in n bets) == Pr(no loss in n bets) = Pr( L = 0 in n ) == Pr(not(loss & loss &...loss) == 1 - (1-P)^n , eg for : n=3 & P=0.5 is Pr(.) = 1 -Pr(0,3) = 1 - (1-0.5)^3 = 0.875 = 1 -0.125 2nd tree n=3 & P=0.7 is Pr(.) = 1 -Pr(0,3) = 1 - (1-0.7)^3 = 0.973 = 1 -0.027 3rd tree Pr( at least m wins in n bets ) = Sum:k=m..n[Pr(k,n)], k ie horizontal; = 1 -Sum:k=0..m-1[Pr(k,n)], k ie horizontal. Note the pattern Pr(at least 1) = 1 - Prod:j=0..n( 1-Pj ), j ie vertically; if all Pj=P then Pr(at least 1) = 1 - (1-P)^n , which follows from the basic knowledge that probability is isomorph with counting in the set theory which is isomorph with logic , and from DeMorgan's rule of logic : not(a or b) = not(a) & not(b) , hence (a or b) = not( not(a) & not(b) ). By common sense: At least one == not(none) = not(not(a) & ...& not(z) ). Here isomorph means a 1-to-1 correspondence so strong that the formulas for aggregate probabilities obtain mechanically ( ie by simple rules of a straight translation ) from the hopefully better known rules of logic : Logic Probabilities not(.) 1 - Pr(.) (a & b) Pr(a)*Pr(b) = joint probability assuming stochastic independence (a or b) = 1 - (1 - Pr(a))(1 - Pr(b)) = Pr(a) + Pr(b) - Pr(a)*Pr(b) (a or b or ... or z) = 1 - (1 - Pr(a))(1 - Pr(b))...(1 - Pr(z)) = 1 - (1 - Pr)^m if all m Pr(.)'s = Pr Why don't they tell it in books ? Why didn't they teach us that at school ? Idea8 contains 1 -Prod(1 - p[i,j]) , with probabilities of failure p[i,j] , as a total value of a MAXimum spanning tree ( MST ) model MAXimizing success ( eg reliability ) ie minimizing the probability of a total failure . 
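The DeMorgan pattern above, combined with the recurrence for Pr(k,n) shown earlier in this Idea4 , gives Pr( at least m wins in n bets ) in a few lines of Pascal ( a sketch only ) :

program AtLeastM;              { Pr( at least m wins in n bets ) , a sketch }
var P, Pkn, Pr : double;
    n, m, k : integer;
begin
  P := 0.5;  n := 3;  m := 1;          { eg Pr(at least 1 win in 3 bets) }
  Pkn := exp( n*ln(1-P) );             { Pr(0,n) = (1-P)^n , as above }
  if m = 0 then Pr := Pkn else Pr := 0;
  for k := 1 to n do
  begin
    Pkn := Pkn*(n-k+1)*P/((1-P)*k);    { the recurrence from Idea4 above }
    if k >= m then Pr := Pr + Pkn;
  end;
  writeln('Pr(at least ', m, ' wins in ', n, ' bets) = ', Pr:7:5);
  writeln('check via DeMorgan: 1 - (1-P)^n = ', 1 - exp(n*ln(1-P)):7:5);
end.

For m=1 , n=3 , P=0.5 both lines print 0.875 , matching the little check above.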
The similar basic pattern can be seen in the worn out Birthday problem ( how many people have to meet for the chances > 50 % that there will be at least one duplicate birthday ) : Let j be the number of people gathered; then the probability of at least one duplicate is for n (eg = 365 days/year) :
Pr(j,n) = 1 - Prod:i=1..j ((n -(i-1))/n) = 1 - Prod:i=0..j-1( 1 - i/n ) this pattern not shown in books
Programmed as a recurrent formula for, say, up to 25 people :
n := 365; Pr := 1.0; { Pr == Pr(no duplicate) }
for i:=0 to 25-1 do { i+1 = j = nr of people gathered }
begin
  Pr := Pr*(1.0 - i/n); { Pr stays 1 while j < 2 ; i = j-1 so j = i+1 }
  writeln('Pr(at least 1 duplicate when j = ', i+1 ,' people meet) = ',
          (1.0 - Pr):7:5 ); { 1 - Pr(no duplicate) = Probab(at least 1) }
end;
The Birthday problem approximated by Paul Halmos for P=0.5 = the probability of pairing over n possible & equiprobable shared values (eg shared birthday) requires 1.18*sqrt(n) = eg 1.18*sqrt(365 days/year) =. 23 people to gather. The Birthday problem can be freshly posed as another Murphy's law for clustering of random events ( eg 2 failures of a pump , or 2 storms in one period , etc ) { Garza 1987 }.
Q: Why don't I employ Markov chains here ?
A: Although I believe that Markov chains could be used here, I do not see any need for or any advantage from using Markov chains. Do you ? Recently in a library I found a thin book { Ronald A. Howard : Dynamic Programming and Markov Processes , 1969 } which you may like to scan.
Back to Kelly : For P=0.5 , R1=0.8 , R2=0.6 , we saw trees with the NAIVE LOSER's f=1 { like in Paulos 2003 , pp.96-99 , no tree } , while our SMART WINNER's f=0.2 =. fm = 0.208 . For a seemingly reasonable yet EXTREMELY RISKY f = 2.fm = 0.41666 we get
Vn = Vo.((1 + f.R1 )^P )((1 - f.R2 )^(1-P)) = 10k.((1 + 0.41666*0.8)^0.5)((1 - 0.41666*0.6)^0.5 ) = 10k.( 1.333*0.75 )^0.5 = 10000 = Vo
ie f = 2.fm = 0.41666 is here (due to P=0.5) a break-even point when the most likely risky Vn = the riskless Vo . Vn = Vo for f = 2.fm holds only if P=0.5 because only then is V a symmetric function of f . This can also be verified by asking & answering the question :
Q: For which f is Vn = Vo ie Vn/Vo = 1 = (1+Rc)^n = 1+Rc ie ln(1+Rc) = 0 ?
A: From the above Vn/Vo = 1 we can drop P only if P=0.5 , and get :
1 = (1 + f.R1)(1 - f.R2) = 1 + f.R1 - f.R2 - R1.R2.f^2 , hence
f = (R1 - R2)/(R1.R2) = 1/R2 - 1/R1 = 1/0.6 - 1/0.8 = 0.41666 = 2.fm
Further illustration of an opeRational meaning of Kelly fractions obtains from my formula for the number of reinvestments n needed for the likely multiplication of the initial capital Vo into Vn = M.Vo . The n matters, because from Keynes we know that "In the long run we are all dead." From above we have W = n.P , L = n.(1-P) , and :
Vn = Vo.((1+Rc)^n) = Vo.((1 + f.R1)^(n.P))((1 - f.R2)^(n.(1-P)))
ln(M) = ln(Vn/Vo) = n.ln(1+Rc) = n.P.ln(1 + f.R1) + n.(1-P).ln(1 - f.R2) so
n = ln(M)/ln(1+Rc) .
Quiz: With which probability does this n assure Vn = M.Vo ??
Here comes my table showing n = fun( f , M=2 , P=0.5 , R1=0.8 , R2=0.6 ) :
f       has ln(1+Rc)   needs n reinvestments to make Vn = M.Vo = 2.Vo :
0.5     < 0            n = -69 < 0 ; it is impossible to make Vn = 2.Vo
0.4166    0            n = oo , n = infinity for f = 2.fm due to P=0.5 ie symmetry
0.41    0.0006556      1057
0.4     0.0016         434 = n reinvestments
0.3     0.00833        83
0.2     0.0103         67 = n is minimized for fm = 0.208 which is optimal
0.1                    check my n = 92
0.05                   check my n = 158
Feel free to compute n = fun( f, etc ) for P=0.7 in the following trees.
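The table above ( and the quiz for P=0.7 ) can be regenerated with a minimal Pascal sketch ( a sketch only ; it simply evaluates n = ln(M)/ln(1+Rc) over a grid of f ) :

program PeriodsToMultiply;     { n = ln(M)/ln(1+Rc) , as in the table above }
var P, R1, R2, M, f, lnG : double;
    i : integer;
begin
  P := 0.5;  R1 := 0.8;  R2 := 0.6;  M := 2.0;    { set P := 0.7 to redo the quiz }
  for i := 1 to 10 do
  begin
    f := i/20;                                    { f = 0.05, 0.10, ..., 0.50 }
    lnG := P*ln(1 + f*R1) + (1-P)*ln(1 - f*R2);   { = ln(1+Rc) per period }
    if lnG > 0
      then writeln('f = ', f:4:2, '   n =. ', ln(M)/lnG:6:1)
      else writeln('f = ', f:4:2, '   ln(1+Rc) <= 0 : Vn = M.Vo is not reachable');
  end;
end.

For P=0.5 it reproduces eg n =. 92 for f=0.1 and n =. 158 for f=0.05 , and it reports when ln(1+Rc) <= 0 ie when the most likely path never reaches M.Vo .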
Asymmetrical multitrees (due to P <> 0.5 hence f=2.fm does not give Vn = Vo) :
Due to P=0.5 , the above multitrees are symmetrical. I reco to show them also
for P <> 0.5 , ie asymmetrical multitrees ( 1st f <> fm, 2nd f=fm ) :
A similar multitree for P changed to P=0.7 ie 1-P = 0.3 is ASYMMETRICAL wrt
the probabilities Pr's , while all the possible values V stay UNchanged.
For f = 0.2 , which is very suboptimal for P=0.7 , we have :

3rd tree: P=0.7 , f=0.2 ( << fm =. 0.8 ), has :
(1 - f) + f*payoff = 1 + f*(payoff -1) = a factor which changes our CC
(1 -0.2) + 0.2*0.4 = 1 + 0.2*(0.4 - 1) = 1 - 0.2*0.6 = 0.88 = loss factor
(1 -0.2) + 0.2*1.8 = 1 + 0.2*(1.8 - 1) = 1 + 0.2*0.8 = 1.16 = gain factor
and has the final V4 = Vo.G^4 = Vo*1.068^4 = Vo*1.3 = 13000 > Vo .
P <> 0.5 makes a multitree ASYMMETRICAL , hence its V1, V2,.., V4 must be
computed as a weighted geomean (of V's weighted by Pr's) :
Vn = Prod( Vk^Pk ) , eg V1 = (8800^0.3)(11600^0.7) = 10677

n=0  Vo = 10000
     Pr = 1.0
10000*0.88 ........... / \ ..................... 10000*1.16 = 11600
n=1  V  = 8800    11600                 give geomean V1 = 10677 > Vo
     Pr = 0.3  +  0.7  = 1
8800*0.88 .......... / \  / \ ...................11600*1.16 = 13456
n=2  V  = 7744   10208   13456          give V2 = 11401 > Vo
     Pr = 0.09 + 0.42 +  0.49 = 1
7744*0.88 ......... / \  / \  / \ ..............13456*1.16 = 15609
n=3  V  = 6815   8983   11841   15609   give V3 = 12173 > Vo
     Pr = 0.027 +0.189 + 0.441 +0.343 = 1
6815*0.88 ..... / \  / \  / \  / \ ..........15609*1.16 = 18106
n=4  V  = 5997   7905   10420   13736   18106 ,  V4 = 12998 > Vo
     Pr = 0.0081 + 0.076 + 0.265 + 0.412 + 0.240 = 1
Sum[Pr] = <--- 0.084 --->+<------- 0.917 ------> = 1
losses  : <--- V < Vo -->|<------ Vo < V ------> : gains

IF a binomial multitree is ASYMMETRICAL wrt Pr's (due to P <> 0.5 <> 1-P)
THEN it holds :
+ If P > 0.5 then the distribution is skewed towards higher Pr's ,
  else it is skewed towards lower Pr's.
+ The weighted geomean Vn <> the median POSSIBLE V on the level n, eg:
  V4 = ((5997^0.0081)(7905^0.076).....(18106^0.240)) = 12998
    <> ( 5997 * 7905 * 10420 * 13736 * 18106 )^(1/5) = 10420 which is !!
  still the median of the POSSIBLE V's but NOT anymore the median of the PROBABLE V's .

The similar multitree for P = 0.7 ie 1-P = 0.3 again, but now with the
OPTIMIZED f = 0.8 =. fm . The change of f has changed the V's while the Pr's
stay as they were since P has not changed :

4th tree: P=0.7 , f=0.8 =. fm , has :
(1 - f) + f*payoff = 1 + f*(payoff -1) = a factor which changes our CC
(1 -0.8) + 0.8*0.4 = 1 + 0.8*(0.4 - 1) = 1 - 0.8*0.6 = 0.52 = loss factor
(1 -0.8) + 0.8*1.8 = 1 + 0.8*(1.8 - 1) = 1 + 0.8*0.8 = 1.64 = gain factor
and has the final V4 = Vo.G^4 = Vo*1.162^4 = Vo*1.8229 = 18229 >> Vo .
P <> 0.5 makes a multitree ASYMMETRICAL , hence its V1, V2,.., V4 must be
computed as a weighted geomean (of V's weighted by Pr's) :
Vn = Prod( Vk^Pk ) , eg V1 = (5200^0.3)(16400^0.7) = 11620

n=0  Vo = 10000
     Pr = 1.0
10000*0.52 ........... / \ ..................... 10000*1.64 = 16400
n=1  V  = 5200    16400                 give geomean V1 = 11620 > Vo
     Pr = 0.3  +  0.7  = 1
5200*0.52 ........... / \  / \ ...................16400*1.64 = 26896
n=2  V  = 2704   8528   26896           give V2 = 13502 > Vo
     Pr = 0.09 + 0.42 +  0.49 = 1
2704*0.52 ......... / \  / \  / \ ..............26896*1.64 = 44109
n=3  V  = 1406   4435   13986   44109 ,  V3 = 15688 >> Vo
     Pr = 0.027 + 0.189 + 0.441 + 0.343 = 1
1406*0.52 .....
                / \  / \  / \  / \ .........44109*1.64 = 72339
n=4  V  = 731   2306   7273   22937   72339 ,  V4 = 18229 >> Vo
     Pr = 0.008 + 0.076 + 0.265 + 0.412 + 0.240 = 1
Sum[Pr] = <------- 0.349 ------>+<-- 0.652 ---> = 1
losses  : <------ V < Vo ------>|<-- Vo < V --> : gains

Comparison of the 3rd and the 4th multitree , both with P=0.7 :
3rd tree: f=0.2 << fm , has Sum[Pr]=0.917 of V > Vo , V4 = 12998
4th tree: f=0.8 =. fm , has Sum[Pr]=0.652 of V > Vo , V4 = 18229
so c.fm allows for tradeoffs between a high chance of a not too high V > Vo
versus a not too high chance of a high V >> Vo . These tradeoffs quantify the
wisdom of "The higher the returns, the higher the risks".

.- Idea5: On preferring & decision making :

Readers should be told that Rc & Rs are in fact weighted geomeans Gw , so
rarely found in books on Probabs & Stats, not even in the big Encyclopedia of
Statistics. Many a perplexed reader will appreciate reading a few facts on
means in general :
A mean is a value m such that a fun(m,..,m) = fun(X1,..,Xn) ; so infinitely
many means can be devised by using powers like Xj^2 , Xj^3 , etc; eg the so
called root mean squared rms = sqrt(Sum[Xj^2]/n).
Harmonic mean <= geomean <= arithmetic mean aka expected value E[ r.v. ]
<= root mean squared aka rms , with the = only in the case of all Xj being equal:
H = (G^2)/A <= G = sqrt(A.H) = exp(Sum[ln(Xj)]/n) <= A = Sum[Xj]/n <= rms
( the equalities H = (G^2)/A and G = sqrt(A.H) are exact for n = 2 positive numbers )
where Sum[ln(Xj)]/n = ln( (Prod(Xj))^(1/n) ) = ln( Prod(Xj^(1/n)) ).
Similar inequalities hold for weighted means (with a mix of positive weights
which need not be just 1/n ).
Q: When to use which mean-operators ?
A: Arithmean reduces commensurable ie directly exchangeable values (in the
same units , physical , monetary or whatever, eg same boxes) to a single mean.
Unlike the geomean and harmean , arithmean allows for negative values, which
are more or less compensated by positive values in a mix , and a zero in a
mix does not cause arithmean = 0.
Geomean reduces either a set of incommensurable values (in Idea7 below), or an
exponential growth (or decay) due to multiplicative factors ( like eg price
indexes CPI , RPI , or an inflation index ), to a single mean factor.
The weights used above, like our counts W & L , and proportions like P & 1-P ,
express repetitions ie frequencies (absolute, and relative, respectively).
Below in the Idea7 , weights have no such frequentist interpretation; they
just express how much we prefer or value an aspect , attribute , feature ,
characteristic .
Harmean reduces ratios to a single mean value, eg to the mean speed S :
Speeds Sj = distance Dj/time Tj , hence Tj = Dj/Sj , so that the total
Time = Sum[Tj] = Sum[Dj/Sj] = Sum[Dj]/S , so the mean speed S is
S = Sum[Dj]/Sum[Dj/Sj] with distances Dj as weights , so for all Dj = D1
S = n.D1/(D1.Sum[1/Sj]) = n/Sum[1/Sj] , and 1/S = Sum[1/Sj]/n .
Resistors Rj in parallel: 1/R = Sum[1/Rj] = Sum[Ij/U] = I/U because all
parallel currents Ij sum up to I , while the voltage U is common to all
resistors Rj in parallel.
Quiz: Compute the mean resistance , and explain what it means .
Hint: Read the above definition of a mean in general.
Capacitors Cj in series : 1/C = Sum[1/Cj] because Q/C = Sum[Q/Cj] where Q is
an electric charge. Quiz: Compute the mean capacity , and explain what it means .
Variance reduction obtained by minimizing the mean squared error MSE of
independent quantities of the same kind combined into a posterior variance V
yields 1/V = Sum[1/Vj] , so V = 1/Sum[1/Vj] , hence in theory even poor
additional estimates will decrease the total variance V.
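The chain harmean <= geomean <= arithmean <= rms above is easy to check
numerically. A tiny Free Pascal sketch of mine ( the sample X is an arbitrary
assumption , any positive numbers will do ) :

  program MeansCheck;
  const n = 4;
        X : array[1..n] of double = (2.0, 4.0, 8.0, 16.0);  { any positive sample }
  var i : integer;
      A, G, H, rms : double;
  begin
    A := 0; G := 0; H := 0; rms := 0;
    for i := 1 to n do
    begin
      A   := A   + X[i]/n;           { arithmean }
      G   := G   + ln(X[i])/n;       { geomean via the mean of logs , exp'd below }
      H   := H   + (1/X[i])/n;       { harmean via the mean of reciprocals , inverted below }
      rms := rms + sqr(X[i])/n;      { root mean squared , sqrt'd below }
    end;
    G := exp(G);  H := 1/H;  rms := sqrt(rms);
    writeln('H = ', H:8:4, ' <= G = ', G:8:4, ' <= A = ', A:8:4, ' <= rms = ', rms:8:4);
  end.

For X = 2, 4, 8, 16 it prints H = 4.2667 <= G = 5.6569 <= A = 7.5000 <= rms = 9.2195 .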
M/V = Sum[ Mj/Vj ] for means , measurements , monies M , where one Vj and one
Mj may be the prior values. An analogon is Millman's theorem with Uj/Rj = Ij's
for electrical circuits. For unbiased Gaussians 1/Vj = 1/MSEj is a measure of
precision aka Fisher information , which is essentially a local measure of
change of a pdf ; it is additive ( like Shannon's information , which is a
global measure invariant wrt local changes like swaps or sorting ).
Afaik , it holds MAXimum(Gw) =. MEDian(Gw) . MEDian in probabilistic terms
means 1/2 = P(X <= MED) or P(MED <= X) = 1/2 , hence for a finely discrete X
it holds P(X <= MED) =. P(MED <= X) . MEDian is also the measure of the
central tendency which minimizes the sum of ABSolute deviations from it.
Hence Rc MAXimized wrt the fraction f MAXimizes the MEDian wealth , growth
rate, end value ( moreover in the shortest time, almost always , albeit in a
long run, when we will already be dead anyway , according to Keynes ).
The MAXed Vn is also the most probable outcome .
A r.v. X has geomean Gw(X) = exp(E[ln(X)]) <= E[X] == A (Jensen's inequality).
Approximate relations between a geomean G, an arithmean A, and the variance of
X Var = Sd^2 are (the best 1st) :
G =. A - Var/(2A) = A( 1 - 0.5(Sd/A)^2 ) =.. sqrt( A^2 - Var ) , hence
G^2 =.. A^2 - Sd^2 is what I call a Pythagorean approximation .
H =. A(1 - (Sd/A)^2 ) estimates a harmean .
My little discovery: for the means of just 2 positive numbers their
A - G = ( Hellinger distance aka Matusita distance )/2 .
That the arithmean is a misleading price index (eg of stocks ) follows from
these approximations, which allow the arithmean A to stay almost unchanged
( or even to grow ) while the geomean G decreases and the variance of X
Var = Sd^2 (ie volatility ie instability) increases in predepression periods
preceding a turning point.

.- Idea6: On Bellman's optimality principle :

In the 2nd edition of NDM add a section explaining Richard Bellman's
OPTIMALITY principle of dynamic programming ( DP ) which is a cute paradigm
for many tasks in both deterministic and stochastic decision making.
Sequential multistage optimizations like Kelly , shortest path , MST , and
many more tasks fit the DP paradigm : An optimal partial solution should be
extended by the best element available so as to stay optimal. DP solutions
are GREEDY aka MYOPIC in the positive sense of not needing more foresight
than 1 period ( ie 1 step or 1 stage ) ahead.
Kelly strategy needs foresight (in terms of P, R1 , R2 ) only 1 period ahead
ie it is MYOPIC aka GREEDY , hence it fits Bellman's principle of optimality
in his paradigm of dynamic programming math . DP was applied to Kelly already
in the very 1st paper on Kelly's paper , { Richard Bellman & R. Kalaba ( both
at RAND ): On the role of dynamic programming in statistical communication
theory, IRE Transactions on Information Theory, Sept. 1957, 197-203 }.
No wonder that in 1960 the dynamic programmer R. Kalaba also rediscovered the
5th MST algorithm ( pp.53 & 54 fig.6 in the superpaper by { Graham & Hell ,
1985 } referred to below ). The 2nd earliest paper on Kelly was { Michael
Marcus ( at RAND ) : The utility of a communication channel and applications
to suboptimal information handling procedures, IRE Trans. on Information
Theory, 1958, 147-151 }. Tell about these semantic links to your info-hungry
readers.
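Idea6's point that only 1-period-ahead foresight ( P , R1 , R2 ) is needed can
be made concrete : a brute-force 1-dimensional search over f which MAXimizes
the 1-period expected log growth already lands on fm . A minimal Free Pascal
sketch of mine ( a grid search instead of the closed-form fm ; the step 0.001
and the restriction of f to 0..1 , as in the trees above , are my choices ) :

  program GreedyKellySearch;
  var P, R1, R2, f, g, bestF, bestG : double;
  begin
    P := 0.7; R1 := 0.8; R2 := 0.6;       { the data of the 3rd & 4th trees above }
    bestF := 0.0; bestG := 0.0;           { f = 0 gives g = ln(1) = 0 ie no growth }
    f := 0.0;
    while f <= 1.0 do
    begin
      g := P*ln(1 + f*R1) + (1 - P)*ln(1 - f*R2);   { 1-period expected log growth ln(1+Rc) }
      if g > bestG then begin bestG := g; bestF := f end;
      f := f + 0.001;
    end;
    writeln('fm =. ', bestF:5:3, '   max ln(1+Rc) =. ', bestG:7:5, '   G =. ', exp(bestG):7:5);
  end.

It prints fm =. 0.792 and G =. 1.162 , the per-period growth factor of the 4th
tree above , found without any foresight deeper than 1 period.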
.- Idea7: How a consumer should choose :

Let's consider a consumer who has to choose the best (eg a PC, house, or
spouse ) from several alternatives, each having more than 1 param (or property
or feature, all quantifiable in incommensurable units, and for all of them the
Pigs' Principle holds : The more of the same the tastier :-) , say, RAM size
in MB and CPU speed in MHz (or IQ & BTW & $$ ; btw, BTW in NLese is VAT in
UKese, but here I mean the breast-to-waist ratio; who wants to maximize VAT
anyway ? :-).
The sum MB+MHz makes no sense, but the product MB*MHz does, and so does a
geomean G = [ (MB * MHz) ]^(1/2). If we value MB twice as much as MHz, then we
should use the weighted geomean :
Gw = [ (MB^2)(MHz^1) ]^(1/(2+1) ) for PC's ;
Gw = [ (IQ^3)(BTW^2)($$^4) ]^(1/(3+2+4)) for spouses .
Q3: Is it necessary to use [...]^(1/SumOfWeights) ??
A3: Methinks not, since the largest weighted product Pw (or Gw ) wins, since
Pw1 > Pw2 makes sense, hence Pw1/Pw2 makes sense too, because in a product of
ratios (like price indexes CPI or RPI ) the incommensurable units or scales
annul (while comparing Sums of values in different units or scales would be
nonsense). Now comes my question :
Q4: In your chap.8 pp.223-228 you discuss (x -x0)(y -y0) which I believe is
traditionally called the NASH PRODUCT ( not in your Index ), where (x0,y0) is
what you call the STATUS QUO point ( aka disagreement point; I would call it
a NO DEAL point ).
When choosing the best, we are making tradeoffs, but there are some
UNacceptable bottom values for each param ie it MUST BE MORE than x0 of IQ ,
more than y0 of BTW , and more than z0 of $$ , ie :
x > x0 > 0, y > y0 > 0, z > z0 > 0, so we could use the formula
F1a = (x -x0)(y -y0)(z -z0) for comparative evaluations .
Ex1a: I can get for free only 1 old PC , and use it as a backup. I am free to
choose from the following ancient crunchers :
PC1 has 32MB & 200MHz; PC2 has 48MB & 150MHz (all reasonable values).
The UNacceptably low values are 16MB & 100MHz , or less.
If I doN't know what is more important (both weights = 1), I compute :
F1a(PC1) = (32 -16)(200 -100) = 16*100 = 1600
F1a(PC2) = (48 -16)(150 -100) = 32* 50 = 1600
So, ceteris paribus , neither of the two PC's is preferred. The fact that both
PC's yield exactly 1600 is just a(n un)lucky coincidence.
Somebody suggested to me to use a product of ratios like (x -x0)/(x9 -x0)
(where the denominator is the range of values) :
F1s = (x -x0)(y -y0)(z -z0)/[ (x9 -x0)(y9 -y0)(z9 -z0) ]
Such a scaling cannot harm, but it is superfluous for MECHANICAL comparative
evaluations and decision making, because in a ratio of such F1s' the
denominators will annul, and the relation < or = or > will hold anyway.
See { Fred Roberts ( at RAND ): Measurement Theory with Applications to
Decisionmaking, Utility Theory, and the Social Sciences, 1979, p.82 } , and in
a game-theoretical setting { Mesterton-Gibbons , 1992, p.102 } .
However, HUMAN decision making is likely to appreciate the scalings .
Although I am no game theorist, I know that the NASH PRODUCT in its more
general form is a product like
F1aw = ((x -x0)^w1)((y -y0)^w2)((z -z0)^w3)
where the wj's are BARGAINING POWERS. For comparative evaluations the wj's can
play the role of the weights. For more on weighting see Q5d below.
So far on x0,y0,z0 as the UNacceptably LOW values of our params. Quite often
we can also define UNachievably HIGH values of our params, say, x9,y9,z9.
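Before using those x9's , here is a tiny Free Pascal sketch of mine which
mechanizes F1a ( via its weighted Nash-product form F1aw ) on the Ex1a data ;
the weights w1 = w2 = 1 reduce it to the plain product, and the function name
F1aw and the constants are my own illustrative choices :

  program CompareByF1a;
  const w1 = 1.0;  w2 = 1.0;                 { both weights = 1 , as in Ex1a }

  function F1aw( x, x0, y, y0 : double ) : double;
  begin                                      { ((x-x0)^w1).((y-y0)^w2) , requires x > x0 , y > y0 }
    F1aw := exp( w1*ln(x - x0) + w2*ln(y - y0) );
  end;

  begin
    writeln('F1a(PC1) = ', F1aw(32, 16, 200, 100):8:1);   { 16*100 = 1600 }
    writeln('F1a(PC2) = ', F1aw(48, 16, 150, 100):8:1);   { 32* 50 = 1600 }
  end.

The same skeleton takes the x9-based formulas F1b or F2b ( defined next ) by
swapping the expression inside F1aw .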
Then an evaluation formula could be a product of ratios like (x -x0)/(x9 -x)
with x9 > x > x0 > 0, y9 > y > y0 > 0, z9 > z > z0 > 0 :
F1b = (x -x0)(y -y0)(z -z0)/[ (x9 -x)(y9 -y)(z9 -z) ]
which can also be re-interpreted as a ratio of Pros/Cons to be MAXimized.
In fact Pros & Cons need not be based on the same variables. Pros can be a
product of M terms, and Cons a product of N other terms, M <> N.
Ex1b: Again PC1 has 32MB & 200MHz ; PC2 has 48MB & 150MHz . Again the
UNacceptably low are : x0 = 16MB & 100MHz = y0, now with the UNreachably good
large x9 = 120MB & 300MHz = y9 .
F1b(PC1) = [(32 -16)/(120 -32)].[(200 -100)/(300 -200)] = 0.182
F1b(PC2) = [(48 -16)/(120 -48)].[(150 -100)/(300 -150)] = 0.148
so PC1 is preferred, ceteris paribus .
Another possibility are ratios like (1 -x0/x)/(1 -x/x9) , hence
F2b = (1 -x0/x)(1 -y0/y)(1 -z0/z)/[ (1 -x/x9)(1 -y/y9)(1 -z/z9) ]
Ex2b: same params as in Ex1b :
F2b(PC1) = [(1 -16/32)/(1 -32/120)].[(1 -100/200)/(1 -200/300)] = 1.023
F2b(PC2) = [(1 -16/48)/(1 -48/120)].[(1 -100/150)/(1 -150/300)] = 0.741
so PC1 is preferred again.
What about an asymmetrically (wrt x0 vs x9) structured
F2c = (x -x0)(y -y0)(z -z0)/[ (1 -x/x9)(1 -y/y9)(1 -z/z9) ] ??
Q5: My question proper now is :
Q5a: What is your learned opinion on the stuff of this Idea7 in general ??
Q5b: Do you prefer F1b over F2b over F2c for other reasons than for its
symmetry and its simplicity (fewer typos when using a calc) ?? IF yes THEN for
which reasons, if any ?? ELSE why NOT ??
Q5c: IF we have a mix of params, some of the kind The LARGER the better, and
some The SMALLER the better , eg we want a spouse with a LOWER body-mass index
(BMI) x, and HIGH y in $$ and HIGH z in IQ , THEN we can swap the x-term in
the numerator with the x-term in the denominator :
F2cMm = (x9 -x)(y -y0)(z -z0)/[ (x -x0) (y9 -y)(z9 -z) ]
Do you know a better way ??
Q5d: (on weighted products) A weight should be applied to the whole subterm, ie
F2w = [((x-x0)/(x9-x))^w1].[((y-y0)/(y9-y))^w2].[((z-z0)/(z9-z))^w3]
Since eg 5^2 > 5 , but eg 0.3^2 = 0.09 < 0.3 , and 5^(1/2) < 5 , but
0.3^(1/2) = 0.55 > 0.3 ,
- I wonder whether a mathematician like you would restrict the magnitudes of
  the terms to be weighted so as to get (term)^w > 1 ?? Since 1^w = 1 , I do
  not like 1 <= . What about you ?? Restrictions could be imposed by rescaling
  by some large enough factors.
- 0 < wj should be used, eg [(x -x0)^2].(y -y0).[(z -z0)^0.5]
- Weights may be normalized to sum up to 1, but I wonder whether that has any
  (DIS)advantage ?? Any suggestions on weighted products ??
Q5e: Which of my F's do you (NOT) like and why (NOT) ??
Q5f: Do you know better formulas ?? Any refs ??
Q5g: The values of the UNachievably HIGH x9,y9,z9 should be chosen so that
x,y,z cannot get too close to x9,y9,z9 . Can you recommend a simple rule (of
thumb) for choosing x9,y9,z9 in practice ??

.- Idea8: On MST (probabilistic maximum spanning trees too) :

I have applied MST as a MAXimum ( not minimum ) spanning tree ( ST ) to
estimations of probabilities of vectors P(y,x1,..,xn) , where y is a
class-event to be guessed, and x1,..,xn are predictor-events. P is used for
automated decision making aka supervised learning. Since the counts ie
absolute frequencies of the complete vectors are always too low, the problem
arises : which subvectors should be used to get the best APPROXimation of
those P(y,x1,..,xn)'s.
I believe that there is still enough room for your creativity which could
improve this Idea8 of mine (implemented and tested by me ages ago), hence this
humble suggestion : An MST can be used for estimations of joint probabilities
P(y,x1,..,xn) used for Bayesian decision making ( the denominator in the
P(y|x1,..,xn) is irrelevant for choosing y because P(x1,..,xn) is the same for
all y's ). My point here is that in practice the counts over the complete
vector (y,x1,..,xn) are always too low, so we have to use counts over
subvectors (ie truncated vectors), and the task is to determine the best
truncations to use in the chain product :
P(y,x1,..,xn) = P(y)P(x1|y)P(x2|x1,y)P(x3|x2,x1,y)..P(xn|xn_1,..,x1,y)
             =. P(x1,y).Product_k:2,..,n[ P(xk|xj,y) ] , j<>k ,
which will best approximate P(y,x1,..,xn). My solution: the resulting approx
should yield the MAXimum P obtainable from the MST-based chain product.
Hope you like it. Although I have hands-on-&-in programming & testing
experience with it, there is enough room for improvements :
Q6: Which pairs (k,j) should be chosen, based on which criteria ??
I choose the probabilistically most dependent pairs (k,j) first (from the pool
of the free pairs ie from the remaining pairs ). How exactly would you choose
the BEST-1st candidates P(xk|xj,y) ?? In which order ie by which priority
formula for dependence ?? Afaik , my solution is invariant wrt the choice of
the root x1 .
MST can be used to implement the Union-Find algorithmics :
+ To construct a spanning forest ( SF ) of an undirected graph ( UG ) ;
+ to decompose a UG (of relations) into its connected components ie into
  several disconnected subgraphs ;
+ to answer questions about 2-way connectivity (not about 1-way reachability
  as Warshall's algo does). Upon request I will supply the list of 7 lucky
  questions for the lucky dragons.
+ to partition a set of equivalence relations into equivalence classes, and
  to answer similar questions about these classes and their elements.
  Equivalence is reflexive, symmetric and transitive. Equivalence can mean a
  kind of sameness like eg connectedness.
{ R.C. Prim , Shortest connection networks , Bell Systems Technical Journal,
36, 1957, 1389-1401 } , where he referred to Kruskal 1956, and they both
referred to Otakar Boruvka 1926 , says on pp.1397-8 about MST :
MiniST minimizes all INCreasing symmetric funs, and MAXimizes all decreasing
symmetric funs of the arc values ( lengths , weights , costs ).
MaxiST MAXimizes all INCreasing symmetric funs, and minimizes all decreasing
symmetric funs of the arc values.
Afaik , a symmetric function of variables x1,..,xn is a function which is
invariant wrt any permutation of those variables.
Since MST is GREEDY ie MYOPIC , it fits Bellman's dynamic programming ( DP )
paradigm. Moreover, the Belgian / French professors A. Kaufmann & Cruon
{ Dynamic Programming, 1967, on pp.265-68 in the sect.40 on using NonAdditive
values in DP } say on p.267 (low) that in DP models we may replace addition by
"a law of internal composition over a subset V of real numbers, provided this
law is ASSOCIATIVE and compatible with the relation of ORDER of these numbers.
The only concrete problems known to us in which the value of a policy is
defined by an operation other than addition are those where the value of a
policy is the product of the present values."
Well, a log(Prod(.)) = Sum of logs.
Q7: Which properties must that ORDER relation have ??
{ Fred Roberts : Measurement Theory , 1979, p.15 } has a table of 7 order
relations : quasi, weak, simple, strict simple, strict weak, partial, strict
partial order ; you choose, we learn :-) Eg a partial order relation is
reflexive & transitive & antisymmetric. Of course, an order cannot be
symmetric ( while eg equivalence must be, find above ).
Q8: How does Kaufmann & Cruon's condition fit Prim's symmetric funs ??
Q9: Does it also hold for non-tree-like non-circular shortest paths ??
This knowledge helps to solve tasks less common than the shortest ST of roads,
pipes, or wires. A nice task is to get the most reliable or safe ST of paths
when probabilities of failure p[i,j] are given for all pairs of nodes i,j .
Assuming independence of failures, the JOINT probability of success
(= nonfailure) is Prod(1 - p[i,j]) , so the JOINT probability of failure
(= nonsuccess) is 1 - Prod(1 - p[i,j]) , which is to be minimized, and which
is an INCreasing fun of the p[i,j]'s. This task is solved by a MiniST of
as-if additive lengths p[i,j] . But we are free to compute a MaxiST of as-if
additive lengths -p[i,j] . See { Prim, 1957, p.1398 }. Here the additivity of
an MST works because the same (.) which extremizes a product Prod(.) also
extremizes its monotonic log(Prod(.)) = Sum of logs.
MST should be named as such in your section on shortest paths; my point is
that when I see something on shortest paths, then I peek into the index to
check for the presence/absence of MST by Kruskal 1956 ( who refs only to
Otakar Boruvka ), Prim 1957 ( who refs to Kruskal and to Boruvka ), and by
{ Dijkstra 1959 , who refs to Kruskal }.
O. Boruvka was a Czech student of math in Paris, who published his MST
algorithm in 1926 when the costs of a power-line net had to be minimized
during the electrification of rural Moravia (the birthplace of Gregor Mendel ,
Kurt Goedel , Sigmund Freud, etc). A different algo of 1930 by Jarnik , also
in Czechoslovakia , is similar to Prim's and Dijkstra's. Otakar Boruvka died
about 1992. The history of MST was well researched by the then chief
mathematician of Bell Labs Ronald Graham & Pavol Hell in Canada, in their
superpaper { On the history of the minimum spanning tree problem, Annals of
the History of Computing, 7/1, January 1985, pp.43-57; fig.6 on p.54 nicely
visualizes historical motivations & relationships among 5 MST algos } where
they tell that O. Boruvka's algo of 1926 was rediscovered in 1960 by G.
Sollin , who published in French, but it is also described in English in yet
another book by the Belgian professor { A. Kaufmann : Graphs , Dynamic
Programming , and Finite Games , 1967, pp.32-5 }. Boruvka's ( hence also
Sollin's ) MST algo is similar to Kruskal's , but it is based on extending
subgraphs rather than nodes as Kruskal does. { Graham & Hell , pp.53-4, fig.6 }
write that the MST algo was also rediscovered in 1960 by an early-hour dynamic
programmer , R. Kalaba at RAND , who already in 1957 coauthored the very 1st
paper on Kelly's paper (find him above & below). Clearly MST fits Bellman's
dynamic programming ( DP ) paradigm already mentioned above.

.- Idea9: Wisdoms relevant to our themes :

General wisdoms :
Math is about insights and proofs. Calculation and computing are about numbers
and results. Here you get insights, derivations, results.
There is no sight without insight .
Man sieht nur, was man weiss. ( One sees only what one knows. )
A picture may be worth 1k words , a formula may be worth infinitely many pictures .
{ JH } It is a simple task to make things complex , but it is a complex task
to make things simple but not simplistic .

Specific wisdoms :
Harry Markowitz ( 1990 Nobel prize for economics ) has opeRATIONALized the old
folks' wisdom of "Don't keep all your eggs in one basket" by quantifying the
arithmean-variance trade-offs when diversifying investments for a single
investment period. The Kelly-inspired geomean formula generalized for
diversified stocks (not just horses) says how much to invest, if at all, into
which asset over the next investment periods, so as to optimize an exponential
growth over the long term, if possible. My multitrees visualize its possible &
probable returns & risks (see Idea4 above).
{ JH } The wisdom of "The higher the returns, the higher the risks" is
quantified (at the end of Idea4 , just above Idea5 ) by the comparison :
3rd tree: f=0.2 << fm , has Sum[Pr]=0.917 of V > Vo , V4 = 12998
4th tree: f=0.8 =. fm , has Sum[Pr]=0.652 of V > Vo , V4 = 18229
Uncertainty is different from risk , when odds are fixed , known , or
calculable . { after Frank Knight 1921 }
You are unlikely to get an edge out of what you see in the news.
{ Edward Thorp quoted on p.142 in Poundstone 2005 }
Most people don't understand that these strategies are statistical and that
winnings of a few percent are possible if you are prepared to play for a long
time, during which you have a good chance of losing all your money .
{ Jeremy Bernstein : Physicists on Wall Street , p.165 }
Dependence on reversion to the mean tends to be perilous when the mean itself
is in flux . { Peter Bernstein : Against the Gods , p.182 }
I can calculate the motions of heavenly bodies, but not the madness of people.
{ Isaac Newton , after losing 20k pounds in the stock market }
In the long run, we're all dead . { John Maynard Keynes , 1929 }

As you see, all the ideas for the enhanced 2nd edition fit nicely together,
and also into your nice NDM . Make the 2nd edition wonderful. The ideal volume
of the Ideal Decision Making would be 444 pages.

.- Errata: Typos old & new :

Typos not yet in Korner's corrections on www.dpmms.cam.ac.uk/~twk :
wrong <- correct :
p.75 that this <- than this
p.75 "quadruple initial fortune and if throw 81 heads and 93" swap(ped)
p.75 by 8 <- by 7.43
p.79 line 9: we have supposed that ... > 1 <- < 1 ( clashes with p.78 )
p.79 there is (i) and then (i) again
p.79 the 2nd (ii) <- (iii)
p.86 de Witt <- De Witt ( in NLese, unless the 1st name precedes, in which
     case we write eg Jan de Witt )
p.252 sleep for half per hour <- an hour
p.283-284 the last eq: numerator and denominator are (to be) swapped ; check
     it against the similar formulas on p.299-300
p.285 & elsewhere: replace l, which looks like 1, by L; this should be easy
     since there aren't too many small l's around
p.333 Let us write q=1-p ... we will reject <- accept
p.362 Huygen's <- Huygens's
p.362 & p.363 a a <- a : try hard not to use a small a to avoid eg a a at
     (ii); this may be hard, since an a is a heavily overloaded symbol

Typos in your errata :
p.6 equality signs should NOT be reversed (see your corrections for p.4+5 )
p.65 ..... footnote 23 <- 21
p.75 ... 2.6.7 line 1. <- (ii)
p.77: Exercise 2.6.10: Are you sure that your fix is ok ??
p.290 Expression <-

Improve your Index :
Add page nrs : to Kelly : XII, 308 ; to Thorp : 331
Missing in the index (while explicitly present in the text) :
- QuickSort 175 ; payout ratio 71-2 ; sorting 175 ; miniMax 204
- status quo point 226-8, 232-3
- disagreement point == status quo point ( I call it: no deal point )
- prisoner's dilemma 213, 232-3, 271-2, 276
- RSA 138 ; Penney 267 ; Chicken 214 ; Hoeffding's ineq. 329-30, 332
Missing in the index (while implicitly present in the text) :
- geometric mean == geomean
- Nash product (x -x0)(y -y0)
- Minimum spanning tree == MST in your sect. on shortest paths

.- References:
{ only books & periodicals have all keywords in their titles starting with
Caps ; not so in the titles of papers. Unlike in the text, our format of the
names is Familyname Firstname; quiz: for which 2 reasons ? }

Bellman Richard, Kalaba R:
+ On the role of dynamic programming in statistical information theory,
  IRE Trans. on Information Theory, 1957/9, 197-203
+ Dynamic programming and statistical communication theory, PNAS, 43, 1957, 749-51
Bernstein Peter L: Against the Gods - The Remarkable Story of Risk , 1996,
  has a separate Name index
Bernstein William J. & Wilkinson David: Diversification , rebalancing , and
  the geometric mean frontier , 1997-11-24 , on www
Dijkstra E.W: A note on two problems in connexion with graphs, Numerische
  Mathematik, 1, 1959, 269-71
Garza Gene: Murphy's law and probability or how to compute your misfortune,
  Mathematics Magazine, 60/3, June 1987, 159-65
Graham Ronald & Hell Pavol: On the history of the minimum spanning tree
  problem, Annals of the History of Computing, 7/1, January 1985, 43-57;
  fig.6 on p.54 nicely visualizes historical motivations & relationships
  among 5 MST algos
Haigh John: Taking Chances, 2nd ed. 2003 ; for Kelly on horses only see
  chap.11 pp.235-8 & Appendix V ; on www are his errata as "Corrections" for
  both editions separately
Howard Ronald A: Dynamic Programming and Markov Processes , 1969
Kahneman Daniel et al: Would you be happier if you were richer? A focussing
  illusion ; Science, vol.312, 2006-6-30, 1908-10
Kaufmann A: Graphs, Dynamic Programming, and Finite Games , 1967, see p.32
Kaufmann A & Cruon : Dynamic Programming, 1967; on pp.265-68 is the sect.40
  on using NonAdditive values in DP
Kelly John L: A new interpretation of information rate , Bell Systems
  Technical Journal, July 1956, 917-26
Korner Tom W: Naive Decision Making, 2008; find his errata on www and more of
  his typos here above.
MacLean Leonard C, Thorp Edward O, Ziemba William T. (editors) : The Kelly
  Capital Growth Investment Criterion : Theory & Practice , 2011, 850pp.
  This book edited by 3 eminent prof(i)s bundles all key papers on the Kelly
  criterion ; 3 good papers are already at :
  www.edwardothorp.com / Publications / New in 2010 :
  1st paper : Good and bad properties of the Kelly criterion, 2010-1-1
  2nd paper : KellySimulationsNew.pdf is on simulations , 2010-1-18
  3rd paper : Understanding_Kelly_New.doc by Thorp (more by Thorp below)
Marcus Michael: The utility of a communication channel and applications to
  suboptimal information handling procedures, IRE Trans.
  on Information Theory, 1958, 147-151
Mesterton-Gibbons M: An Introduction to Game-Theoretical Modelling, 1992,
  p.91, where his fig.3.1 graphically visualizes quantitative relationships of
  Nash equilibrium , Pareto-optimal points, etc ; see also p.102
Mirowski Philip: Machine Dreams - Economics Becomes a Cyborg Science, 2002;
  pp.372-6, 382, 389 are on Kelly strategy
Murphy Roy E: Adaptive Processes in Economic Systems, 1965; on pp.123-5 he has
  nicely visualized (Fig.7.4 on p.123) his derivation of his inequalities
  (7.5.12) & (7.5.19)
Paulos John Allen: A Mathematician Plays the Stock Market, 2003; on pp.96-99
  he well explains the need to use a geomean , but is unaware of the
  possibility to turn a sure LOSER into a WINNER via a Kelly fraction
Poundstone William: Fortune's Formula, 2005; too little math, good notes & refs
Prim R.C: Shortest connection networks , Bell Systems Technical Journal, 36,
  1957, 1389-1401
Roberts Fred: Measurement Theory with Applications to Decisionmaking, Utility
  Theory, and the Social Sciences, 1979, see pp.15 & 82
Thorp Edward O: The Kelly criterion in blackjack, sports betting, and the
  stock market , 1997, 2000, 2006 = the last version of 45 pp at :
  www.edwardothorp.com (more under MacLean above)
Vince Ralph: Optimal f and the Kelly criterion, IFTA Journal, 2010 ; his
  in-line (a)-(j) and 1b[r=0] point to the Notes at the end of his paper .
  Since 1990 he has written 5 books on his "optimal f". More upon request.

I hope that you will bless us with your own deep(er) insights & explanations.
Looking forward to your thoughtful feedback, please don't hesitate to correct
me , especially if I am not wrong ! :-)

Jan Hajek in NL (we are just simple folks here)
.-