-.-
Critical prior interval CPI for odds ratio works also for risk ratio, and for likelihood ratios
Copyright (C) 2007 , Jan Hajek , NL , version 3.6 of 2007-6-27

Abstract: Critical prior interval CPI does not replace the standard confidence interval CI. CPI provides an objectively sceptical credibility check of a CI. CPI objectively evaluates a CI and indicates whether or not the CI is wider than reasonable.
1. I have realized that if proper confidence limits aka confidence bounds (L, U) of a confidence interval CI are computed for a ratio RR , LR+ , LR- , then L & U can be plugged directly into the single formula originally intended for the CPI of an odds ratio OR only.
2. For the bounds (L, U) of a CI for RR , LR+ , LR- , I present more accurate formulae than those found in books. Yet my more accurate formulae do not require more data. All the necessary formulae presented here were checked and have been up & running in my program CPI.EXE since 2003.
Motto: If it's not checked, it's wrong. { I.J. Good }

+Contents:
+Computing a CPI of our ratios is easy
+Formulas for 95%CI of OR , RR , LR+ , LR-
+Fine tuning of a Bayesian interval for log-odds ratio
+Why one and the same formula works for more of our ratios
+References

+Abbreviations:
ln   = natural logarithm (logarithmus naturalis)
pdf  = probability density function, or distribution
rhs  = right hand side
sqrt = square root
-.-
+Computing a CPI of our ratios is easy:
The most prolific statistician I.J. Good (born 1916), a codebreaker at Bletchley Park in the UK where he was Alan Turing's statistics assistant during WWII, published a thin but meaty book {0} in 1950. Robert Matthews' CPI is a specific application of Jack Good's general idea in {0} to invert Bayes' theorem so as to estimate a prior probability, or "initial probability" as he called it in his book. Matthews takes the value of a prior odds ratio = 1 as an objective sceptical value. This is a rational prior value for all our ratios, all of which equal 1 under the initial presumption of no statistical dependence, hence no causation. Good's "final probability" is nowadays called "posterior probability".
In 2001 Robert A.J. Matthews aka RAJM published two papers {1} , {2} { also on the www , see my References } on the critical prior interval CPI for an odds ratio OR whose confidence interval CI is known and proper, ie does not include the value 1, which for all our ratios means independent events. I have realized that if we have the proper (L, U) of the standard 95% CI for any of our ratios, OR , RR , LR+ , LR- , then we can compute the corresponding bounds ( Lo , 1/Lo ) of 95% CPI(ratio) from the same formula as for the odds ratio OR :

Lo = exp{ -0.5*square[ ln(U/L) ] / sqrt( square[ ln(U*L) ] - square[ ln(U/L) ] ) }
   = exp{ -0.25*square[ ln(U/L) ] / sqrt[ ln(U)*ln(L) ] }                      (eq.1)
Uo = 1/Lo , so that CPI(ratio) is ( Lo , 1/Lo ) for any of our ratios.

If the CPI overlaps with CI(ratio), then it is a signal that the ratio may not be credible at the 95% level. Note that to obtain the CPI we need only the proper CI, not the ratio itself. Hence a published CI for one of our ratios is enough to do the credibility check.
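For readers who want to try this out, here is a minimal sketch of (eq.1) in Python (not the author's CPI.EXE); the function names cpi_bounds and overlaps and the example CI of (1.2, 5.0) are my own illustrations, assuming a proper CI that excludes 1.

from math import exp, log, sqrt

def cpi_bounds(L, U):
    # 95% CPI bounds (Lo, 1/Lo) from a proper 95% CI (L, U) of a ratio, per (eq.1)
    if L <= 0 or U <= L or L == 1.0 or U == 1.0 or (L < 1.0 < U):
        raise ValueError("CI must be proper: 0 < L < U and 1 not inside [L, U]")
    Lo = exp(-0.25 * log(U / L) ** 2 / sqrt(log(U) * log(L)))
    return Lo, 1.0 / Lo

def overlaps(ci, cpi):
    # True if CI and CPI overlap: per the text, a signal that the ratio
    # may not be credible at the 95% level
    (L, U), (Lo, Uo) = ci, cpi
    return not (U < Lo or L > Uo)

# usage with an invented, proper 95% CI of (1.2, 5.0) for some ratio:
L, U = 1.2, 5.0
Lo, Uo = cpi_bounds(L, U)
print("95%%CPI = ( %.3f , %.3f ), overlaps CI: %s" % (Lo, Uo, overlaps((L, U), (Lo, Uo))))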
I will not repeat Matthews' derivation, but here is some time-saving help for understanding his papers {1} & {2} :
For a symmetric unimodal pdf with a mean M, a variance V, and an nn%CI with bounds (L , U) , it holds that:
L = M - k.sqrt(V) < U = M + k.sqrt(V) ;
we add L + U and get: M = (L + U)/2 ;
we take U - L and get: U - L = 2k.sqrt(V) = 2k.S , where S = standard error ; hence:
V = square[ (U - L)/(2k) ] , with the quasi-constant
k = a function of ( the pdf , and the confidence level nn% required ) ;
k = 1.96 for a Gaussian pdf and a 95%CI , hence
V = square[ (U - L)/(2*1.96) ] = square[ (U - L)/3.92 ] = square[ 0.255*(U - L) ].
Matthews' unexplained constants for a 95% CPI of a Gaussian are:
0.255 = 1/(2*1.96) = 1/3.92 ; 3.8416 = square(1.96) ; 0.5 = 1.96/3.92 ( = 1.96*0.255 ).
For each of our 4 ratios, the key equations are those for 3 means M_i and their 3 variances V_i of 3 normal distributions aka Gaussian pdf's N(M_i, V_i) , each with its own mean M_i and its own variance V_i ( all indexed by _i , with o = original, initial, prior; d = data, evidence; c = combined, final, posterior ); they are the equations for the 95%CI_i bounds:
L_i = M_i - 1.96*sqrt(V_i) ; U_i = M_i + 1.96*sqrt(V_i)        (eq.2)
Objective scepticism is expressed as CPI(ratio) containing 1, because a ratio = 1 , ie ln(ratio) = 0 , means an effect y independent of x , ie no effect caused by x . Ln(ratio) is a transformation of actually binomial data from a 2x2 contingency table. A logarithmic transformation makes for an approximately Gaussian pdf, which is always symmetric unimodal, hence has its single maximum aka mode = its mean = its median in the middle of its supporting range. So we assume the prior value ln(ratio = 1) = 0 , ie the prior mean Mo = ln(1) = 0 . Taking the prior ratio = 1 , ie Mo = 0 , takes out a subjectivity and puts in an unbiased objectivity expressing the initial presumption of independence ie no causation, which should be the standard presumption for double blind clinical trials and for legal trials.
Robert Matthews uses two more equations for Gaussian pdfs from Peter Lee's book {3}, sect. 2.2, p.37 in the 2nd edition, p.35 in the 3rd ed. :
1/Vc  = 1/Vo  + 1/Vd  = (Vd + Vo)/(Vo.Vd)       (eq.3)
Mc/Vc = Mo/Vo + Md/Vd                           (eq.4)
Further below I derive these equations via MAP, an approach different from that of Peter Lee. For those who want deeper insights into the true meaning of these equations, we obtain from them:
0 < Vc = Vo.Vd/(Vo + Vd) < min( Vo , Vd ) ; the < is explained below ;
Mc = Mo.Vd/(Vo + Vd) + Md.Vo/(Vo + Vd) is a weighted average with the weights
Vd/(Vo + Vd) = (1/Vo)/( 1/Vo + 1/Vd ) = Wo < 1 , and
Vo/(Vo + Vd) = (1/Vd)/( 1/Vo + 1/Vd ) = (1 - Wo) < 1 ,
where a "precision" 1/V = Fisher information for a Gaussian pdf.
Clearly, the smaller the variance, the larger the weight. Formally 0 <= Wo <= 1 is allowed ( an = would hold if either Vo = 0 or Vd = 0 , not both ), but in reality it holds that 0 < Wo < 1. The resulting Mc is a compromise:
Mc = Wo.Mo + (1 - Wo).Md        a weighted arithmetic average form
   = Md + (Mo - Md).Wo          an interpolation form
   = Md - (Md - Mo).Wo          a feedback form
   = Mo - (Mo - Md).(1 - Wo)
   = Mo + (Md - Mo).(1 - Wo)
The interpolation forms tell us that min(Mo, Md) < Mc < max(Mo, Md). Clearly Wo = Vd/(Vo + Vd) < 1 , hence
0 < [ Vc = Vo.Vd/(Vo + Vd) = Vo.Wo = Vd.(1 - Wo) ] <= min(Vo, Vd) because of 0 < Wo < 1.
The inequality 0 < Vc < min(Vo, Vd) is surprising & important, as it tells us that the variance of the combined mean Mc is smaller than either of the variances of the two constituting means Mo and Md. We obtained a variance reduction !
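A tiny numerical sketch of (eq.3), (eq.4) and (eq.5) (derived below) may help; the values of Mo, Vo, Md, Vd are invented for illustration, with the sceptical prior mean Mo = ln(1) = 0.

# A numerical check of (eq.3), (eq.4) and (eq.5); the numbers are invented.
Mo, Vo = 0.0, 4.0      # sceptical prior: Mo = ln(1) = 0, with a large variance Vo
Md, Vd = 1.5, 1.0      # "data" mean and variance of ln(ratio)

Vc = Vo * Vd / (Vo + Vd)            # (eq.3): 1/Vc = 1/Vo + 1/Vd
Wo = Vd / (Vo + Vd)                 # weight of the prior mean Mo
Mc = Wo * Mo + (1.0 - Wo) * Md      # (eq.4): Mc/Vc = Mo/Vo + Md/Vd

print("Mc = %.3f lies between Mo = %.1f and Md = %.1f" % (Mc, Mo, Md))
print("Vc = %.3f < min(Vo, Vd) = %.3f : variance reduction" % (Vc, min(Vo, Vd)))

# (eq.5) run in reverse: given Mc, Md, Vd and the sceptical Mo, recover Vo
print("Vo recovered via (eq.5) = %.3f" % (Vd * (Mc - Mo) / (Md - Mc)))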
It is surprising, ie against common sense, that by combining a (very) uncertain value with a (much) more certain value we obtain an Mc which varies less than the better of its two components !
The equations (eq.3) and (eq.4) generalize to:
1/Vc  = 1/V1  + 1/V2  + ... + 1/Vn              (eq.3g)
Mc/Vc = M1/V1 + M2/V2 + ... + Mn/Vn             (eq.4g)
The 1/V's in (eq.3g) combine like 1/R's for resistors (or 1/C's for capacitors) connected in parallel. Much less known is that (eq.4g) has an exact electric circuit analog called Millman's theorem. Despite the perfect analogy, I am not aware of any semantically deep relationship between the derivations of combined variances and the derivations of Ohm's law, Kirchhoff's laws, Norton's theorem or Millman's theorem.
From (eq.3) & (eq.4) above it trivially follows that:
1/Vc = 1/Vo + 1/Vd = ( Mo/Vo + Md/Vd )/Mc
which algebraically reduces to: (1 - Mo/Mc)/Vo = (Md/Mc - 1)/Vd
which simplifies to: ( Mc - Mo )/Vo = ( Md - Mc )/Vd
so that the desired prior variance is:
Vo = Vd.(Mc - Mo)/(Md - Mc)                     (eq.5)
Note that from the interpolation form for Mc shown above it follows that either (Mo < Mc < Md) or (Md < Mc < Mo), so that (eq.5) yields Vo >= 0.
For those who would like still more insights, I add my alternative derivation of (eq.4):
p(.) is a pdf ; pi = 3.14.. ; Mo is a known constant ;
p(Mc)    = 1/sqrt(2pi.Vo) . exp{ -square(Mc - Mo)/(2Vo) } is a Gaussian pdf ;
p(Md|Mc) = 1/sqrt(2pi.Vd) . exp{ -square(Md - Mc)/(2Vd) } is a Gaussian pdf ;
Directly from the definition of conditional probability it follows that:
p(Mc|Md).p(Md) = p(Mc,Md) = p(Md,Mc) = p(Mc).p(Md|Mc) ; hence:
p(Mc|Md) = p(Mc).p(Md|Mc)/p(Md) is the basic Bayes rule.
Elementary calculus tells us that an extreme value of a function obtains if we set to zero the first derivative [ of that function, here p(Mc|Md) ] wrt the sought variable, here Mc. To get an extreme it is allowed, and often advantageous, to find an extreme of a function g(p(.)), provided g(.) is monotonic. We use ln(p(.)) to get rid of exp(.) :
0 = d/d(Mc)[ ln(p(Mc|Md)) ] = d/d(Mc)[ ln(p(Mc)) + ln(p(Md|Mc)) - 0 ]
where the - 0 is the derivative of - ln(p(Md)) , which is constant wrt Mc. Substituting our Gaussians into the rhs, and taking its 1st derivative, we get:
0 = -2.(Mc - Mo)/(2.Vo) + 2.(Md - Mc)/(2.Vd) , from which:
Mc.(1/Vo + 1/Vd) = Mo/Vo + Md/Vd , ie our (eq.4) & (eq.3) far above.
The technique I used to derive the most probable Mc is called MAP, ie maximum a posteriori probability estimation. The MAP derivation is more automatic, ie not dependent on such intuitive steps as "It is now convenient to write ..." or "Adding into the exponent ... a constant" used in Peter Lee's derivation in {3}, p.37 in the 2nd ed., p.35 in the 3rd ed. Although different from Peter Lee's derivation, my MAP-ing also starts with an explicit assumption about the shape of both pdf's.
Yet there is a 3rd way, not assuming any specific pdf's, starting with a linear combination which is justifiable as being simple & robust so that it does not overfit data:
Mc = w.Mo + (1 - w).Md ; then the variance of the weighted sum is:
Vc = Vo.w^2 + Vd.(1-w)^2 + 2w.(1-w).cov(Mo,Md)          = the combined variance
   = (So.w)^2 + (Sd.(1-w))^2 + 2w.(1-w).cov(Mo,Md)
   = (So.w)^2 + (Sd.(1-w))^2 + 2w.(1-w).So.Sd.corr(Mo,Md)
which is an analog of c^2 = a^2 + b^2 + 2.a.b.cos( pi - angle(a,b) in radians ), because the correlation coefficient corr(x,y) = cov(x,y)/[ S(x).S(y) ] , -1 <= corr(.) <= 1 , and -1 <= cos(.) <= 1.
Hence our equation with +2w.(1-w).cov(Mo,Md) is an analog of the equation for the length of a vector Sc resulting from summing up two vectors of lengths w.So and (1-w).Sd . A vector difference, with -2w.(1-w).cov(Mo,Md), is analogous to the cosine law. These equations are generalizations of the Pythagorean theorem to any triangle.
Vc = the chosen measure of error to be minimized. Setting to zero the 1st derivative of Vc wrt w yields the optimal weight w for which Vc is minimized without assuming any specific pdf:
w = [ Vd - cov(Mo,Md) ]/[ Vo + Vd - 2.cov(Mo,Md) ]
For independent (or even just uncorrelated) Mo and Md the covariance = 0 , a frequently made simplifying assumption ; hence
w = Vd/(Vo + Vd) = [ 1/Vo ]/[ 1/Vo + 1/Vd ] , approximately optimal for any pdf.
Quiz: is w < 0 possible ? what does/would it mean ?
Note that the Fisher information F is:
F = 1/V for a Gaussian pdf with variance V ;
F = 1/L for a Poisson pdf with variance L ( L always equals the mean M ).
The Cramer-Rao inequality (the uncertainty principle of mathematical statistics) is:
MSE >= 1/F for an (un)biased estimator ; MSE = Var + square(bias) ;
Var >= 1/F for an unbiased estimator.
We see that it all fits semantically: it makes a lot of common sense to combine estimators (here our means Mo, Md) by weighting each by its precision = 1/imprecision = 1/V_i . The uncommon sense is that the resulting weighted average Mc has variance 0 < Vc <= min( Vo , Vd ).
Independence implies uncorrelatedness but not necessarily vice versa, ie uncorrelatedness does not necessarily imply independence, ie uncorrelated r.v.'s may nevertheless be dependent. However, 2 jointly Gaussian random vectors or r.v.'s which are uncorrelated are also independent. But Gaussian r.v.'s need not be jointly Gaussian. A linear relationship implies a correlation coefficient near to 1 or to -1, but not necessarily vice versa. High correlation is not equivalent (= a 2-way implication) to "most probably a linear relationship". The correct reasoning holds in one direction only:
IF there is a linear relation between X and E[Y|X] , THEN the value of the correlation coefficient will be close to an extreme, and this will be due to a small mean squared error MSE .
IF at least one of 2 r.v.'s has a ZERO mean , eg Mo = 0 , THEN orthogonality & uncorrelatedness mutually imply each other ( a 2-way implication ie equivalence ) ; hence:
IF neither of the two r.v.'s has a ZERO mean , THEN they cannot be orthogonal & uncorrelated simultaneously.
With my explanations, you should better understand Matthews' papers. The rest of this epaper is spent on how to compute the CI of each of our ratios, and on my explanation of why one CPI formula can handle them all.
Extra insights for those who want to become rich: the weighted sum of returns Ri is Y = Sum_i[ Wi.Ri ] , with the exact variance:
Var(Sum_i[ Wi.Ri ]) = Sum_j Sum_k[ Wj.Wk.cov(Rj,Rk) ]
 = Sum_i[ Wi^2 .Var(Ri) ] +   Sum_j Sum_k[ Wj.Wk.cov(Rj,Rk) ]   , j <> k
 = Sum_i[ Wi^2 .Var(Ri) ] + 2.Sum_j Sum_k[ Wj.Wk.cov(Rj,Rk) ]   , j < k
 = Sum_i[ Wi^2 .Var(Ri) ] + 2.Sum_j[ Wj.Sum_k[ Wk.cov(Rj,Rk) ]] , j < k
This formula is the key to the portfolio theory of 1951-9 by Harry Markowitz, who was awarded the Nobel prize for economics in 1990 for the theory of optimal portfolio selection, based on weight optimization in the variance of a weighted sum of investments' risks. The total variance of a portfolio, ie the volatility, hence the risk of the compound investments, will be greatly reduced
- if there is sufficient diversity of investments, and
- if there are negatively correlated investments, ie with cov(Rj,Rk) < 0 , and
- if the weights are chosen so as to minimize var(Y) of the portfolio;
see the numerical sketch below.
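Here is a compact numerical sketch of these formulas in Python; the covariance matrix is invented, and for two assets (or two estimators) the variance-minimizing weight reduces to the w = [ Vd - cov ]/[ Vo + Vd - 2.cov ] derived above.

import numpy as np

# Variance of a weighted sum Y = Sum_i[ Wi.Ri ] as W' C W, where C is the
# covariance matrix of the returns Ri. All numbers are invented.
C = np.array([[ 0.04, -0.01],    # Var(R1), cov(R1,R2)
              [-0.01,  0.09]])   # cov(R2,R1), Var(R2)

def var_weighted_sum(W, C):
    return float(W @ C @ W)

# variance-minimizing weight for two assets/estimators
Vo, Vd, cov = C[0, 0], C[1, 1], C[0, 1]
w = (Vd - cov) / (Vo + Vd - 2.0 * cov)
W_opt = np.array([w, 1.0 - w])

print("optimal w            = %.3f" % w)
print("Var(Y) at optimum    = %.5f" % var_weighted_sum(W_opt, C))
print("Var(Y) at 50/50      = %.5f" % var_weighted_sum(np.array([0.5, 0.5]), C))
print("min(Var(R1),Var(R2)) = %.5f" % min(Vo, Vd))

With the negative covariance above, the optimized Var(Y) comes out below both individual variances, which is the variance reduction claimed in the text.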
-.-
+Formulas for 95%CI of OR , RR , LR+ , LR- :
The recommended standard format of a 2x2 contingency table is:

  a | b  || x  = evidence ( a test result, an exposure, an alleged cause )
  c | d  || ~x = non(x) , a quasi-random event complementary to x
 ---|----||------
  y | ~y || N  = a+b+c+d = the total count

y = effect ( eg a disorder y possibly caused by x )
a, b, c, d are shorthand symbols for the counts of joint events n(u,v) ie n(u&v) :
a = n(x,y) ; b = n(x,~y) ; c = n(~x,y) ; d = n(~x,~y) ;
P(.) are proportions = the simplest estimates of probabilities:
P(x) = n(x)/N = (a+b)/N
P(y) = n(y)/N = (a+c)/N = the prevalence of y, a prior probability.
P(y|x) = n(x,y)/n(x) = a/(a+b) = the positive predictive value, a posterior probability (after the occurrence of x was observed).
P(y|x)*P(x) = P(y,x) = P(x,y) = P(y)*P(x|y) is the basic Bayesian equation.
RR  = P(y|x)/P(y|~x)   = [a/(a+b)]/[c/(c+d)] = relative risk aka risk ratio
LR+ = P(x|y)/P(x|~y)   = sensitivity/(1 - specificity) in a 2x2 table
    = [a/(a+c)]/[b/(b+d)] = positive likelihood ratio
LR- = P(~x|y)/P(~x|~y) = (1 - sensitivity)/specificity in a 2x2 table
    = [c/(a+c)]/[d/(b+d)] = negative likelihood ratio
OR  = a*d/(b*c) = (a/b)*(d/c) = (a/b)/(c/d) = LR+ / LR-  is the odds ratio
Each of our ratios, OR , RR , LR+ , LR- , has the same general formula for its confidence interval CI with bounds (L, U), but each ratio has its own specific formula for its own standard error S[.] = sqrt(Var[.]) , where Var is a variance. The general formula, with the factor 1.96 for a 95% CI, is:
L = exp{ ln(ratio) - 1.96*sqrt( Var[ ln(ratio) ] ) }
U = exp{ ln(ratio) + 1.96*sqrt( Var[ ln(ratio) ] ) }
The CI's bounds L, U can be plugged into the formula for CPI's Lo shown earlier above, so as to obtain the CPI with the bounds ( Lo , 1/Lo ).
Var[ln(OR)] = 1/a + 1/c + 1/b + 1/d is the variance of the log odds ratio.
For a ratio of two proportions of the type p = z/n , ie n.p = z , the commonly published approximation of the variance of the logarithm of p1/p2 is
Var[ln(p1/p2)] = (1 - p1)/(n1.p1) + (1 - p2)/(n2.p2)
               = (1 - z1/n1)/z1 + (1 - z2/n2)/z2
               = (1/z1 - 1/n1) + (1/z2 - 1/n2) = term1 + term2
I have derived more accurate formulae for Var[ln(p1/p2)] ; the simplest one uses no more information than term1 and term2:
Var[ln(p1/p2)] = term1 + term2 + term1*term2 is an improved formula ;
for Var[ln(RR)]  : term1 = 1/a - 1/(a+b) , term2 = 1/c - 1/(c+d) ;
for Var[ln(LR+)] : term1 = 1/a - 1/(a+c) , term2 = 1/b - 1/(b+d) ;
for Var[ln(LR-)] : term1 = 1/c - 1/(a+c) , term2 = 1/d - 1/(b+d) .
My additional term1*term2 makes a difference when the counts are low.
An example: the case of Lucia de Berk aka Lucia B. working in RKZ42 :

   1.0 = a    57.0 = b   |  58.0 = a+b   Lucia has worked
  10.0 = c   271.0 = d   | 281.0 = c+d   Lucia did not work
 ------------------------|----------------
  11.0 = a+c 328.0 = b+d | 339.0

with the corresponding output from my program CPI for this 2x2 table; note that the relative risk aka risk ratio RR < 1 for Lucia at work; also note that the CPI can't be computed because each CI encloses the ratio = 1 :

LR+= 0.523 < 1  95%CPI=(caN`t, 1 in CI) ?1?  ( 0.080 to 3.441)=95%CI
LR+= 0.523 < 1  90%CPI=(caN`t, 1 in CI) ?1?  ( 0.108 to 2.542)=90%CI
OR = 0.475 < 1  95%CPI=(caN`t, 1 in CI) ?1?  ( 0.060 to 3.788)=95%CI
OR = 0.475 < 1  90%CPI=(caN`t, 1 in CI) ?1?  ( 0.083 to 2.714)=90%CI   cOR = 0.250
RR = 0.484 < 1  95%CPI=(caN`t, 1 in CI) ?1?  ( 0.063 to 3.712)=95%CI
RR = 0.484 < 1  90%CPI=(caN`t, 1 in CI) ?1?  ( 0.088 to 2.676)=90%CI   cRR = 0.250
Qsuf=  0.98133 = 98% = RR(~y:~x) = I.J. Good`s x SufFor y
RRR = -0.51552 =-52% = Rel.risk up (if +) or down (if -)
ARR = -0.01835 = -2% = Abs.risk up (if +) or down (if -)
NN1 = -55 = 1/ARR = number needed for 1 effect (NNT if +, NNH if -)
Hyx =  0.01867 =  2% = Hyx = Hajek`s fraction < PF
HyM =  0.83754 = 84% = HyM = causal impact factor if ARR < 0
PF  =  0.51552 = 52% = PF  = prevented fraction > Hyx
for OR : w/ cov : ( 0.0597 to 3.788)=95%CI
for OR : sharper: ( 0.0817 to 2.765)=95%CI
for RR : term1*2: ( 0.0579 to 4.051)=95%CI
for RR : simple : ( 0.0632 to 3.711)=95%CI
for RR : w/ cov : ( 0.0632 to 3.711)=95%CI
for RR : sharper: ( 0.0850 to 2.760)=95%CI
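As a cross-check of the formulas in this section, here is a small Python sketch (independent of CPI.EXE, function names are mine) computing the four ratios and their 95% CIs from a 2x2 table; with improved=False it should reproduce, up to rounding, the simple-variance CIs shown above, while improved=True applies the term1 + term2 + term1*term2 variant (the "term1*2" line for RR).

from math import exp, log, sqrt

def ci95_of_ratio(ratio, var_ln):
    # 95% CI (L, U) of a ratio from the variance of ln(ratio)
    h = 1.96 * sqrt(var_ln)
    return exp(log(ratio) - h), exp(log(ratio) + h)

def ratios_with_ci(a, b, c, d, improved=False):
    # OR, RR, LR+, LR- with 95% CIs from a 2x2 table; improved=True adds the
    # term1*term2 correction to Var[ln(RR)], Var[ln(LR+)], Var[ln(LR-)]
    OR  = (a * d) / (b * c)
    RR  = (a / (a + b)) / (c / (c + d))
    LRp = (a / (a + c)) / (b / (b + d))
    LRn = (c / (a + c)) / (d / (b + d))
    terms = {"RR":  (1/a - 1/(a + b), 1/c - 1/(c + d)),
             "LR+": (1/a - 1/(a + c), 1/b - 1/(b + d)),
             "LR-": (1/c - 1/(a + c), 1/d - 1/(b + d))}
    out = {"OR": (OR, ci95_of_ratio(OR, 1/a + 1/b + 1/c + 1/d))}
    for name, ratio in (("RR", RR), ("LR+", LRp), ("LR-", LRn)):
        t1, t2 = terms[name]
        out[name] = (ratio, ci95_of_ratio(ratio, t1 + t2 + (t1 * t2 if improved else 0.0)))
    return out

# the Lucia de Berk table from above; pass improved=True for the corrected variances
for name, (r, (L, U)) in ratios_with_ci(1.0, 57.0, 10.0, 271.0).items():
    print("%-3s = %.3f   ( %.3f to %.3f )=95%%CI" % (name, r, L, U))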
-.-
+Fine tuning of a Bayesian interval for log-odds ratio :
In Peter Lee's book {3}, p.152-3 in the 2nd & 3rd editions, there is a section 5.6 on "Comparison of two proportions: the 2x2 table", which is relevant to Matthews' papers although not mentioned by him. For a 2x2 table (which is binomial) with Beta priors, Peter Lee derives posterior distributions which are the conjugate Beta(A+a, B+b) and Beta(C+c, D+d), where A, B, C, D are the prior counts. The log-odds ratio of the odds L1, L2 has an approximately normal pdf:
ln(OR) = ln(L1/L2) = ln(L1) - ln(L2) ~
 ~ N( ln[ (A+a-1/2).(D+d-1/2) / ( (B+b-1/2).(C+c-1/2) ) ] ,
      1/(A+a) + 1/(D+d) + 1/(B+b) + 1/(C+c) )
where the 1/2s are not Jeffreys' ; they come from {3}, section 3.1. Unlike Peter Lee I will neither simplify by dropping these 1/2s, nor shall I use the Haldane reference prior counts A=B=C=D=0.0 . I prefer to keep the 1/2s and use Jeffreys' prior, ie A=B=C=D=0.5 , which gets me:
ln(OR) = ln(L1/L2) = ln(L1) - ln(L2) ~
 ~ N( ln( a.d/(b.c) ) , 1/(0.5+a) + 1/(0.5+d) + 1/(0.5+b) + 1/(0.5+c) )
where 1: the 1/2s and the 0.5s cancelled each other in the mean; 2: 0.5s were added to the 4 components of the variance of ln(OR). Many statisticians recommend adding 0.5 to prevent division by zero in case of a=0 or b=0 or c=0 or d=0 ; that was not my motivation, but I am glad to get it by other means. The difference between this and other statisticians' formula is that they would have added 0.5 to each of the 4 entries in a 2x2 table, so that they would have ln( (a+0.5).(d+0.5)/[ (b+0.5).(c+0.5) ] ).
-.-
+Why one and the same formula works for more of our ratios :
So far for the opeRational aspects; now comes a "proof" of my extension.
1. All our ratios have the same general formula for their CI bounds (L, U).
2. RR , LR+ , LR- have the same form P(u|v)/P(u|~v) which equals 1.0 , like OR does, if the (quasi)random events u, v are independent.
3. All the specific formulas are based on the same basic idea of computing the standard error S of ln(ratio).
Q: Why a logarithm ?
A: In general: our ratios are always >= 0 , hence their pdf's are heavily skewed towards zero. Taking the logarithm of a ratio < 1 transforms it to values < 0, while the log of a ratio > 1 has values > 0. So the log-transformed pdf becomes rather symmetrical and approximately Gaussian, which is a unimodal pdf, ie it has only one local maximum = its global maximum. A unimodal symmetrical pdf has a huge advantage: its mode = its mean. In fact, it is the mode, ie the global maximum of a pdf, which does or should matter most, most of the time; however, it is easier to work with the mean, ie with the expected value. Whenever we work with the mean of a unimodal symmetric pdf, we are also working with its mode.
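A quick way to see this log-symmetry is to simulate many 2x2 tables and compare the sampling distribution of OR with that of ln(OR); the sketch below is purely illustrative, with invented group sizes and probabilities and a 0.5 continuity correction to avoid empty cells.

import numpy as np

# The sampling distribution of OR is skewed, while that of ln(OR) is roughly
# symmetric (mean close to median, skewness near 0). Numbers are invented.
rng = np.random.default_rng(1)
n1, p1 = 50, 0.3    # "exposed" group:   a ~ Binomial(n1, p1), b = n1 - a
n2, p2 = 50, 0.3    # "unexposed" group: c ~ Binomial(n2, p2), d = n2 - c

a = rng.binomial(n1, p1, size=100000) + 0.5   # +0.5 keeps every cell > 0
c = rng.binomial(n2, p2, size=100000) + 0.5
b, d = n1 - a + 1.0, n2 - c + 1.0             # ie b = (n1 - a_raw) + 0.5 , etc.

OR = (a * d) / (b * c)
for name, x in (("OR", OR), ("ln(OR)", np.log(OR))):
    skew = float(((x - x.mean()) ** 3).mean() / x.std() ** 3)
    print("%-6s mean=%7.3f median=%7.3f skewness=%6.3f"
          % (name, x.mean(), np.median(x), skew))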
Additional advantages of a Gaussian pdf aka normal distribution:
- It is relatively easy to work with, and much about it can be found in many easily accessible books, not only in specialized monographs.
- Of all continuous distributions with the first two moments, ie the mean and the variance, known (for data generated or collected under fixed ie constant empirical or experimental conditions), a Gaussian pdf has maximum entropy. MaxEnt means that the pdf is the most uncertain one, ie the "least informed", the "least suggestive", the "least assuming" pdf.
On the properties of the above mentioned ratios: let rel stand for exactly one of the relational operators = , < , > ; then:
P(u|v) rel P(u|~v) , ie P(u|v)/P(u|~v) rel 1 ( with rel being = for independent u, v ) ; hence:
Puv/Pv rel (Pu - Puv)/(1 - Pv) , where Puv stands for P(u,v) ie P(u&v) ;
Puv - Puv.Pv rel Pu.Pv - Puv.Pv , ie:
Puv rel Pu.Pv , which for rel being = confirms the mutual independence of the events u, v ; q.e.d.
The following equivalence == always holds:
[ P(y|x) rel P(y|~x) ] == [ P(x|y) rel P(x|~y) ]
Hence for any pair of non-independent events x, y it always holds that:
either: [ P(y|x) > P(y|~x) ] and [ P(x|y) > P(x|~y) ] ,
or:     [ P(y|x) < P(y|~x) ] and [ P(x|y) < P(x|~y) ] , but not both. q.e.d.
-.-
+References :
Hints: search above for the word "help" for my help in deciphering Matthews' derivations. If short of time, start reading his papers from their middle, or read them backwards, ie start with his Appendix. You can copy and paste the http's provided here into your browser to get direct access to the papers by Robert Matthews.
{0} I.J. Good : Probability and the Weighing of Evidence, 1950 ; on the use of Bayes' theorem in reverse see pp. 81, 35, 70.
{1} Robert Matthews : Why should clinicians care about Bayesian methods ? Journal of Statistical Planning and Inference 94 (2001) 43-58, plus discussions on pp. 59-71 ; also on the www at http://ourworld.compuserve.com/homepages/rajm/jspib.htm ; Matthews' unexplained constants are explained here above, ie at http://www.humintel.com/hajek in my epaper on CPI.
{2} Robert Matthews : Methods for assessing the credibility of clinical trial outcomes ; Drug Information Journal 35 (2001) 1469-1478 ; at http://www.diahome.org/content/abstract/2001/dij1740.pdf ; Matthews' unexplained constants are explained here above, ie at http://www.humintel.com/hajek in my epaper on CPI.
{3} Peter Lee : Bayesian Statistics ; 3 printings and 3 editions differ in page numbers and errata (easily found on the www). Matthews {1} used the 2nd ed., 1997, chapter 2. As the relevant parts in the 2nd ed. I identified sections 2.2, pp. 36-38, on the normal prior and likelihood, and 5.6, p. 152, on the log-odds ratio, supported by 3.1, p. 80, on log-odds, and by the two appendices A19, p. 290, and A20, p. 291.
{4} Evidence-Based Medicine - How to Practice and Teach EBM , 3rd ed., 2005 ; do NOT use the 2nd edition, which contains too many errors in general, and too many errors in the formulae in the appendix on confidence intervals in particular.
{5} Joseph Fleiss et al : Statistical Methods for Rates and Proportions, 3rd ed., 2003.
{6} David L. Simel, Gregory P. Samsa, David B. Matchar : Likelihood ratios with confidence : sample size estimation for diagnostic test studies, Journal of Clinical Epidemiology 44/8 (1991) 763-770.
{7} Doug Altman et al : Statistics with Confidence , 2nd ed.
-.-