-.-
Critical prior interval CPI for odds ratio works also for risk ratio, and for likelihood ratios
Copyright (C) 2007 , Jan Hajek , NL , version 3.6 of 2007-6-27

Abstract: Critical prior interval CPI does not replace the standard confidence interval CI. CPI provides an objectively sceptical credibility check of a CI. CPI objectively evaluates a CI and indicates whether or not the CI is wider than reasonable.
1. I have realized that if proper confidence limits aka confidence bounds (L, U) of a confidence interval CI are computed for a ratio RR , LR+ , LR- , then L & U can be plugged directly into the single formula originally intended for the CPI of an odds ratio OR only.
2. For the bounds (L, U) of a CI for RR , LR+ , LR- , I present more accurate formulae than those found in books. Yet my more accurate formulae do not require more data. All the necessary formulae presented here were checked and have been up & running in my program CPI.EXE since 2003.
Motto: If it's not checked, it's wrong. { I.J. Good }

+Contents:
+Computing a CPI of our ratios is easy
+Formulas for 95%CI of OR , RR , LR+ , LR-
+Fine tuning of a Bayesian interval for log-odds ratio
+Why one and the same formula works for more of our ratios
+References

+Abbreviations:
ln   = natural logarithm (logarithmus naturalis)
pdf  = probability density function, or distribution
rhs  = right hand side
sqrt = square root
-.-
+Computing a CPI of our ratios is easy:
The most prolific statistician I.J. Good (born 1916), a codebreaker at Bletchley Park in the UK where he was Alan Turing's statistics assistant during WWII, published a thin but meaty book {0} in 1950. Robert Matthews' CPI is a specific application of Jack Good's general idea in {0} to invert Bayes' theorem so as to estimate a prior probability, or "initial probability" as he called it in his book. Matthews takes the value of a prior odds ratio = 1 as an objective sceptical value. This is a rational prior value for all our ratios, all of which equal 1 under the initial presumption of no statistical dependence, hence no causation. Good's "final probability" is nowadays called "posterior probability".
In 2001 Robert A.J. Matthews aka RAJM published two papers {1} , {2} { also on the www , see my References } on the critical prior interval CPI for an odds ratio OR whose confidence interval CI is known and proper, ie does not include the value 1, which for all our ratios means independent events. I have realized that if we have the proper (L, U) of the standard 95% CI for any of our ratios, OR , RR , LR+ , LR- , then we can compute the corresponding bounds ( Lo , 1/Lo ) of 95% CPI(ratio) from the same formula as for the odds ratio OR :

Lo = exp{ -0.5*square[ ln(U/L) ] / sqrt( square[ ln(U*L) ] - square[ ln(U/L) ] ) }
   = exp{ -0.25*square[ ln(U/L) ] / sqrt[ ln(U)*ln(L) ] }                      (eq.1)
Uo = 1/Lo , so that CPI(ratio) is ( Lo , 1/Lo ) for any of our ratios.

If the CPI overlaps with CI(ratio), then it is a signal that the ratio may not be credible at the 95% level. Note that to obtain the CPI we need only the proper CI, not the ratio itself. Hence a published CI for one of our ratios is enough to do the credibility check.
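For readers who want to try this out, here is a minimal sketch of (eq.1) in Python (not the author's CPI.EXE); the function names cpi_bounds and overlaps and the example CI of (1.2, 5.0) are my own illustrations, assuming a proper CI that excludes 1.

from math import exp, log, sqrt

def cpi_bounds(L, U):
    # 95% CPI bounds (Lo, 1/Lo) from a proper 95% CI (L, U) of a ratio, per (eq.1)
    if L <= 0 or U <= L or L == 1.0 or U == 1.0 or (L < 1.0 < U):
        raise ValueError("CI must be proper: 0 < L < U and 1 not inside [L, U]")
    Lo = exp(-0.25 * log(U / L) ** 2 / sqrt(log(U) * log(L)))
    return Lo, 1.0 / Lo

def overlaps(ci, cpi):
    # True if CI and CPI overlap: per the text, a signal that the ratio
    # may not be credible at the 95% level
    (L, U), (Lo, Uo) = ci, cpi
    return not (U < Lo or L > Uo)

# usage with an invented, proper 95% CI of (1.2, 5.0) for some ratio:
L, U = 1.2, 5.0
Lo, Uo = cpi_bounds(L, U)
print("95%%CPI = ( %.3f , %.3f ), overlaps CI: %s" % (Lo, Uo, overlaps((L, U), (Lo, Uo))))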
I will not repeat Matthews' derivation, but here is some time-saving help for understanding his papers {1} & {2} :
For a symmetric unimodal pdf with a mean M, a variance V, and an nn%CI with bounds (L , U) , it holds that:
L = M - k.sqrt(V) < U = M + k.sqrt(V) ;
we add L + U and get: M = (L + U)/2 ;
we take U - L and get: U - L = 2k.sqrt(V) = 2k.S , where S = standard error ; hence:
V = square[ (U - L)/(2k) ] , with the quasi-constant
k = a function of ( the pdf , and the confidence level nn% required ) ;
k = 1.96 for a Gaussian pdf and a 95%CI , hence
V = square[ (U - L)/(2*1.96) ] = square[ (U - L)/3.92 ] = square[ 0.255*(U - L) ].
Matthews' unexplained constants for a 95% CPI of a Gaussian are:
0.255 = 1/(2*1.96) = 1/3.92 ; 3.8416 = square(1.96) ; 0.5 = 1.96/3.92 ( = 1.96*0.255 ).
For each of our 4 ratios, the key equations are those for 3 means M_i and their 3 variances V_i of 3 normal distributions aka Gaussian pdf's N(M_i, V_i) , each with its own mean M_i and its own variance V_i ( all indexed by _i , with o = original, initial, prior; d = data, evidence; c = combined, final, posterior ); they are the equations for the 95%CI_i bounds:
L_i = M_i - 1.96*sqrt(V_i) ; U_i = M_i + 1.96*sqrt(V_i)        (eq.2)
Objective scepticism is expressed as CPI(ratio) containing 1, because a ratio = 1 , ie ln(ratio) = 0 , means an effect y independent of x , ie no effect caused by x . Ln(ratio) is a transformation of actually binomial data from a 2x2 contingency table. A logarithmic transformation makes for an approximately Gaussian pdf, which is always symmetric unimodal, hence has its single maximum aka mode = its mean = its median in the middle of its supporting range. So we assume the prior value ln(ratio = 1) = 0 , ie the prior mean Mo = ln(1) = 0 . Taking the prior ratio = 1 , ie Mo = 0 , takes out a subjectivity and puts in an unbiased objectivity expressing the initial presumption of independence ie no causation, which should be the standard presumption for double blind clinical trials and for legal trials.
Robert Matthews uses two more equations for Gaussian pdfs from Peter Lee's book {3}, sect. 2.2, p.37 in the 2nd edition, p.35 in the 3rd ed. :
1/Vc  = 1/Vo  + 1/Vd  = (Vd + Vo)/(Vo.Vd)       (eq.3)
Mc/Vc = Mo/Vo + Md/Vd                           (eq.4)
Further below I derive these equations via MAP, an approach different from that of Peter Lee. For those who want deeper insights into the true meaning of these equations, we obtain from them:
0 < Vc = Vo.Vd/(Vo + Vd) < min( Vo , Vd ) ; the < is explained below ;
Mc = Mo.Vd/(Vo + Vd) + Md.Vo/(Vo + Vd) is a weighted average with the weights
Vd/(Vo + Vd) = (1/Vo)/( 1/Vo + 1/Vd ) = Wo < 1 , and
Vo/(Vo + Vd) = (1/Vd)/( 1/Vo + 1/Vd ) = (1 - Wo) < 1 ,
where a "precision" 1/V = Fisher information for a Gaussian pdf.
Clearly, the smaller the variance, the larger the weight. Formally 0 <= Wo <= 1 is allowed ( an = would hold if either Vo = 0 or Vd = 0 , not both ), but in reality it holds that 0 < Wo < 1. The resulting Mc is a compromise:
Mc = Wo.Mo + (1 - Wo).Md        a weighted arithmetic average form
   = Md + (Mo - Md).Wo          an interpolation form
   = Md - (Md - Mo).Wo          a feedback form
   = Mo - (Mo - Md).(1 - Wo)
   = Mo + (Md - Mo).(1 - Wo)
The interpolation forms tell us that min(Mo, Md) < Mc < max(Mo, Md). Clearly Wo = Vd/(Vo + Vd) < 1 , hence
0 < [ Vc = Vo.Vd/(Vo + Vd) = Vo.Wo = Vd.(1 - Wo) ] <= min(Vo, Vd) because of 0 < Wo < 1.
The inequality 0 < Vc < min(Vo, Vd) is surprising & important, as it tells us that the variance of the combined mean Mc is smaller than either of the variances of the two constituting means Mo and Md. We obtained a variance reduction !
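A tiny numerical sketch of (eq.3), (eq.4) and (eq.5) (derived below) may help; the values of Mo, Vo, Md, Vd are invented for illustration, with the sceptical prior mean Mo = ln(1) = 0.

# A numerical check of (eq.3), (eq.4) and (eq.5); the numbers are invented.
Mo, Vo = 0.0, 4.0      # sceptical prior: Mo = ln(1) = 0, with a large variance Vo
Md, Vd = 1.5, 1.0      # "data" mean and variance of ln(ratio)

Vc = Vo * Vd / (Vo + Vd)            # (eq.3): 1/Vc = 1/Vo + 1/Vd
Wo = Vd / (Vo + Vd)                 # weight of the prior mean Mo
Mc = Wo * Mo + (1.0 - Wo) * Md      # (eq.4): Mc/Vc = Mo/Vo + Md/Vd

print("Mc = %.3f lies between Mo = %.1f and Md = %.1f" % (Mc, Mo, Md))
print("Vc = %.3f < min(Vo, Vd) = %.3f : variance reduction" % (Vc, min(Vo, Vd)))

# (eq.5) run in reverse: given Mc, Md, Vd and the sceptical Mo, recover Vo
print("Vo recovered via (eq.5) = %.3f" % (Vd * (Mc - Mo) / (Md - Mc)))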
It is surprising, ie against common sense, that by combining a (very) uncertain value with a (much) more certain value we obtain an Mc which varies less than the better of its two components !
The equations (eq.3) and (eq.4) generalize to:
1/Vc  = 1/V1  + 1/V2  + ... + 1/Vn              (eq.3g)
Mc/Vc = M1/V1 + M2/V2 + ... + Mn/Vn             (eq.4g)
The 1/V's in (eq.3g) combine like 1/R's for resistors (or 1/C's for capacitors) connected in parallel. Much less known is that (eq.4g) has an exact electric circuit analog called Millman's theorem. Despite the perfect analogy, I am not aware of any semantically deep relationship between the derivations of combined variances and the derivations of Ohm's law, Kirchhoff's laws, Norton's theorem or Millman's theorem.
From (eq.3) & (eq.4) above it trivially follows that:
1/Vc = 1/Vo + 1/Vd = ( Mo/Vo + Md/Vd )/Mc
which algebraically reduces to: (1 - Mo/Mc)/Vo = (Md/Mc - 1)/Vd
which simplifies to: ( Mc - Mo )/Vo = ( Md - Mc )/Vd
so that the desired prior variance is:
Vo = Vd.(Mc - Mo)/(Md - Mc)                     (eq.5)
Note that from the interpolation form for Mc shown above it follows that either (Mo < Mc < Md) or (Md < Mc < Mo), so that (eq.5) yields Vo >= 0.
For those who would like still more insights, I add my alternative derivation of (eq.4):
p(.) is a pdf ; pi = 3.14.. ; Mo is a known constant ;
p(Mc)    = 1/sqrt(2pi.Vo) . exp{ -square(Mc - Mo)/(2Vo) } is a Gaussian pdf ;
p(Md|Mc) = 1/sqrt(2pi.Vd) . exp{ -square(Md - Mc)/(2Vd) } is a Gaussian pdf ;
Directly from the definition of conditional probability it follows that:
p(Mc|Md).p(Md) = p(Mc,Md) = p(Md,Mc) = p(Mc).p(Md|Mc) ; hence:
p(Mc|Md) = p(Mc).p(Md|Mc)/p(Md) is the basic Bayes rule.
Elementary calculus tells us that an extreme value of a function obtains if we set to zero the first derivative [ of that function, here p(Mc|Md) ] wrt the sought variable, here Mc. To get an extreme it is allowed, and often advantageous, to find an extreme of a function g(p(.)), provided g(.) is monotonic. We use ln(p(.)) to get rid of exp(.) :
0 = d/d(Mc)[ ln(p(Mc|Md)) ] = d/d(Mc)[ ln(p(Mc)) + ln(p(Md|Mc)) - 0 ]
where the - 0 is the derivative of - ln(p(Md)) , which is constant wrt Mc. Substituting our Gaussians into the rhs, and taking its 1st derivative, we get:
0 = -2.(Mc - Mo)/(2.Vo) + 2.(Md - Mc)/(2.Vd) , from which:
Mc.(1/Vo + 1/Vd) = Mo/Vo + Md/Vd , ie our (eq.4) & (eq.3) far above.
The technique I used to derive the most probable Mc is called MAP, ie maximum a posteriori probability estimation. The MAP derivation is more automatic, ie not dependent on such intuitive steps as "It is now convenient to write ..." or "Adding into the exponent ... a constant" used in Peter Lee's derivation in {3}, p.37 in the 2nd ed., p.35 in the 3rd ed. Although different from Peter Lee's derivation, my MAP-ing also starts with an explicit assumption about the shape of both pdf's.
Yet there is a 3rd way, not assuming any specific pdf's, starting with a linear combination which is justifiable as being simple & robust so that it does not overfit data:
Mc = w.Mo + (1 - w).Md ; then the variance of the weighted sum is:
Vc = Vo.w^2 + Vd.(1-w)^2 + 2w.(1-w).cov(Mo,Md)          = the combined variance
   = (So.w)^2 + (Sd.(1-w))^2 + 2w.(1-w).cov(Mo,Md)
   = (So.w)^2 + (Sd.(1-w))^2 + 2w.(1-w).So.Sd.corr(Mo,Md)
which is an analog of c^2 = a^2 + b^2 + 2.a.b.cos( pi - angle(a,b) in radians ), because the correlation coefficient corr(x,y) = cov(x,y)/[ S(x).S(y) ] , -1 <= corr(.) <= 1 , and -1 <= cos(.) <= 1.
Hence our equation with +2w.(1-w).cov(Mo,Md) is an analog of the equation for the length of a vector Sc resulting from summing up two vectors of lengths w.So and (1-w).Sd . A vector difference, with -2w.(1-w).cov(Mo,Md), is analogous to the cosine law. These equations are generalizations of the Pythagorean theorem to any triangle.
Vc = the chosen measure of error to be minimized. Setting to zero the 1st derivative of Vc wrt w yields the optimal weight w for which Vc is minimized without assuming any specific pdf:
w = [ Vd - cov(Mo,Md) ]/[ Vo + Vd - 2.cov(Mo,Md) ]
For independent (or even just uncorrelated) Mo and Md the covariance = 0 , a frequently made simplifying assumption ; hence
w = Vd/(Vo + Vd) = [ 1/Vo ]/[ 1/Vo + 1/Vd ] , approximately optimal for any pdf.
Quiz: is w < 0 possible ? what does/would it mean ?
Note that the Fisher information F is:
F = 1/V for a Gaussian pdf with variance V ;
F = 1/L for a Poisson pdf with variance L ( L always equals the mean M ).
The Cramer-Rao inequality (the uncertainty principle of mathematical statistics) is:
MSE >= 1/F for an (un)biased estimator ; MSE = Var + square(bias) ;
Var >= 1/F for an unbiased estimator.
We see that it all fits semantically: it makes a lot of common sense to combine estimators (here our means Mo, Md) by weighting each by its precision = 1/imprecision = 1/V_i . The uncommon sense is that the resulting weighted average Mc has variance 0 < Vc <= min( Vo , Vd ).
Independence implies uncorrelatedness but not necessarily vice versa, ie uncorrelatedness does not necessarily imply independence, ie uncorrelated r.v.'s may nevertheless be dependent. However, 2 jointly Gaussian random vectors or r.v.'s which are uncorrelated are also independent. But Gaussian r.v.'s need not be jointly Gaussian. A linear relationship implies a correlation coefficient near to 1 or to -1, but not necessarily vice versa. High correlation is not equivalent (= a 2-way implication) to "most probably a linear relationship". The correct reasoning holds in one direction only:
IF there is a linear relation between X and E[Y|X] , THEN the value of the correlation coefficient will be close to an extreme, and this will be due to a small mean squared error MSE .
IF at least one of 2 r.v.'s has a ZERO mean , eg Mo = 0 , THEN orthogonality & uncorrelatedness mutually imply each other ( a 2-way implication ie equivalence ) ; hence:
IF neither of the two r.v.'s has a ZERO mean , THEN they cannot be orthogonal & uncorrelated simultaneously.
With my explanations, you should better understand Matthews' papers. The rest of this epaper is spent on how to compute the CI of each of our ratios, and on my explanation of why one CPI formula can handle them all.
Extra insights for those who want to become rich: the weighted sum of returns Ri is Y = Sum_i[ Wi.Ri ] , with the exact variance:
Var(Sum_i[ Wi.Ri ]) = Sum_j Sum_k[ Wj.Wk.cov(Rj,Rk) ]
 = Sum_i[ Wi^2 .Var(Ri) ] +   Sum_j Sum_k[ Wj.Wk.cov(Rj,Rk) ]   , j <> k
 = Sum_i[ Wi^2 .Var(Ri) ] + 2.Sum_j Sum_k[ Wj.Wk.cov(Rj,Rk) ]   , j < k
 = Sum_i[ Wi^2 .Var(Ri) ] + 2.Sum_j[ Wj.Sum_k[ Wk.cov(Rj,Rk) ]] , j < k
This formula is the key to the portfolio theory of 1951-9 by Harry Markowitz, who was awarded the Nobel prize for economics in 1990 for the theory of optimal portfolio selection, based on weight optimization in the variance of a weighted sum of investments' risks. The total variance of a portfolio, ie the volatility, hence the risk of the compound investments, will be greatly reduced
- if there is sufficient diversity of investments, and
- if there are negatively correlated investments, ie with cov(Rj,Rk) < 0 , and
- if the weights are chosen so as to minimize var(Y) of the portfolio;
see the numerical sketch below.
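Here is a compact numerical sketch of these formulas in Python; the covariance matrix is invented, and for two assets (or two estimators) the variance-minimizing weight reduces to the w = [ Vd - cov ]/[ Vo + Vd - 2.cov ] derived above.

import numpy as np

# Variance of a weighted sum Y = Sum_i[ Wi.Ri ] as W' C W, where C is the
# covariance matrix of the returns Ri. All numbers are invented.
C = np.array([[ 0.04, -0.01],    # Var(R1), cov(R1,R2)
              [-0.01,  0.09]])   # cov(R2,R1), Var(R2)

def var_weighted_sum(W, C):
    return float(W @ C @ W)

# variance-minimizing weight for two assets/estimators
Vo, Vd, cov = C[0, 0], C[1, 1], C[0, 1]
w = (Vd - cov) / (Vo + Vd - 2.0 * cov)
W_opt = np.array([w, 1.0 - w])

print("optimal w            = %.3f" % w)
print("Var(Y) at optimum    = %.5f" % var_weighted_sum(W_opt, C))
print("Var(Y) at 50/50      = %.5f" % var_weighted_sum(np.array([0.5, 0.5]), C))
print("min(Var(R1),Var(R2)) = %.5f" % min(Vo, Vd))

With the negative covariance above, the optimized Var(Y) comes out below both individual variances, which is the variance reduction claimed in the text.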
-.-
+Formulas for 95%CI of OR , RR , LR+ , LR- :
The recommended standard format of a 2x2 contingency table is:

  a | b  || x  = evidence ( a test result, an exposure, an alleged cause )
  c | d  || ~x = non(x) , a quasi-random event complementary to x
 ---|----||------
  y | ~y || N  = a+b+c+d = the total count

y = effect ( eg a disorder y possibly caused by x )
a, b, c, d are shorthand symbols for the counts of joint events n(u,v) ie n(u&v) :
a = n(x,y) ; b = n(x,~y) ; c = n(~x,y) ; d = n(~x,~y) ;
P(.) are proportions = the simplest estimates of probabilities:
P(x) = n(x)/N = (a+b)/N
P(y) = n(y)/N = (a+c)/N = the prevalence of y, a prior probability.
P(y|x) = n(x,y)/n(x) = a/(a+b) = the positive predictive value, a posterior probability (after the occurrence of x was observed).
P(y|x)*P(x) = P(y,x) = P(x,y) = P(y)*P(x|y) is the basic Bayesian equation.
RR  = P(y|x)/P(y|~x)   = [a/(a+b)]/[c/(c+d)] = relative risk aka risk ratio
LR+ = P(x|y)/P(x|~y)   = sensitivity/(1 - specificity) in a 2x2 table
    = [a/(a+c)]/[b/(b+d)] = positive likelihood ratio
LR- = P(~x|y)/P(~x|~y) = (1 - sensitivity)/specificity in a 2x2 table
    = [c/(a+c)]/[d/(b+d)] = negative likelihood ratio
OR  = a*d/(b*c) = (a/b)*(d/c) = (a/b)/(c/d) = LR+ / LR-  is the odds ratio
Each of our ratios, OR , RR , LR+ , LR- , has the same general formula for its confidence interval CI with bounds (L, U), but each ratio has its own specific formula for its own standard error S[.] = sqrt(Var[.]) , where Var is a variance. The general formula, with the factor 1.96 for a 95% CI, is:
L = exp{ ln(ratio) - 1.96*sqrt( Var[ ln(ratio) ] ) }
U = exp{ ln(ratio) + 1.96*sqrt( Var[ ln(ratio) ] ) }
The CI's bounds L, U can be plugged into the formula for CPI's Lo shown earlier above, so as to obtain the CPI with the bounds ( Lo , 1/Lo ).
Var[ln(OR)] = 1/a + 1/c + 1/b + 1/d is the variance of the log odds ratio.
For a ratio of two proportions of the type p = z/n , ie n.p = z , the commonly published approximation of the variance of the logarithm of p1/p2 is
Var[ln(p1/p2)] = (1 - p1)/(n1.p1) + (1 - p2)/(n2.p2)
               = (1 - z1/n1)/z1 + (1 - z2/n2)/z2
               = (1/z1 - 1/n1) + (1/z2 - 1/n2) = term1 + term2
I have derived more accurate formulae for Var[ln(p1/p2)] ; the simplest one uses no more information than term1 and term2:
Var[ln(p1/p2)] = term1 + term2 + term1*term2 is an improved formula ;
for Var[ln(RR)]  : term1 = 1/a - 1/(a+b) , term2 = 1/c - 1/(c+d) ;
for Var[ln(LR+)] : term1 = 1/a - 1/(a+c) , term2 = 1/b - 1/(b+d) ;
for Var[ln(LR-)] : term1 = 1/c - 1/(a+c) , term2 = 1/d - 1/(b+d) .
My additional term1*term2 makes a difference when the counts are low.
An example: the case of Lucia de Berk aka Lucia B. working in RKZ42 :

   1.0 = a    57.0 = b   |  58.0 = a+b   Lucia has worked
  10.0 = c   271.0 = d   | 281.0 = c+d   Lucia did not work
 ------------------------|----------------
  11.0 = a+c 328.0 = b+d | 339.0

with the corresponding output from my program CPI for this 2x2 table; note that the relative risk aka risk ratio RR < 1 for Lucia at work; also note that the CPI can't be computed because each CI encloses the ratio = 1 :

LR+= 0.523 < 1  95%CPI=(caN`t, 1 in CI) ?1?  ( 0.080 to 3.441)=95%CI
LR+= 0.523 < 1  90%CPI=(caN`t, 1 in CI) ?1?  ( 0.108 to 2.542)=90%CI
OR = 0.475 < 1  95%CPI=(caN`t, 1 in CI) ?1?  ( 0.060 to 3.788)=95%CI
OR = 0.475 < 1  90%CPI=(caN`t, 1 in CI) ?1?  ( 0.083 to 2.714)=90%CI   cOR = 0.250
RR = 0.484 < 1  95%CPI=(caN`t, 1 in CI) ?1?  ( 0.063 to 3.712)=95%CI
RR = 0.484 < 1  90%CPI=(caN`t, 1 in CI) ?1?  ( 0.088 to 2.676)=90%CI   cRR = 0.250
Qsuf=  0.98133 = 98% = RR(~y:~x) = I.J. Good`s x SufFor y
RRR = -0.51552 =-52% = Rel.risk up (if +) or down (if -)
ARR = -0.01835 = -2% = Abs.risk up (if +) or down (if -)
NN1 = -55 = 1/ARR = number needed for 1 effect (NNT if +, NNH if -)
Hyx =  0.01867 =  2% = Hyx = Hajek`s fraction < PF
HyM =  0.83754 = 84% = HyM = causal impact factor if ARR < 0
PF  =  0.51552 = 52% = PF  = prevented fraction > Hyx
for OR : w/ cov : ( 0.0597 to 3.788)=95%CI
for OR : sharper: ( 0.0817 to 2.765)=95%CI
for RR : term1*2: ( 0.0579 to 4.051)=95%CI
for RR : simple : ( 0.0632 to 3.711)=95%CI
for RR : w/ cov : ( 0.0632 to 3.711)=95%CI
for RR : sharper: ( 0.0850 to 2.760)=95%CI
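As a cross-check of the formulas in this section, here is a small Python sketch (independent of CPI.EXE, function names are mine) computing the four ratios and their 95% CIs from a 2x2 table; with improved=False it should reproduce, up to rounding, the simple-variance CIs shown above, while improved=True applies the term1 + term2 + term1*term2 variant (the "term1*2" line for RR).

from math import exp, log, sqrt

def ci95_of_ratio(ratio, var_ln):
    # 95% CI (L, U) of a ratio from the variance of ln(ratio)
    h = 1.96 * sqrt(var_ln)
    return exp(log(ratio) - h), exp(log(ratio) + h)

def ratios_with_ci(a, b, c, d, improved=False):
    # OR, RR, LR+, LR- with 95% CIs from a 2x2 table; improved=True adds the
    # term1*term2 correction to Var[ln(RR)], Var[ln(LR+)], Var[ln(LR-)]
    OR  = (a * d) / (b * c)
    RR  = (a / (a + b)) / (c / (c + d))
    LRp = (a / (a + c)) / (b / (b + d))
    LRn = (c / (a + c)) / (d / (b + d))
    terms = {"RR":  (1/a - 1/(a + b), 1/c - 1/(c + d)),
             "LR+": (1/a - 1/(a + c), 1/b - 1/(b + d)),
             "LR-": (1/c - 1/(a + c), 1/d - 1/(b + d))}
    out = {"OR": (OR, ci95_of_ratio(OR, 1/a + 1/b + 1/c + 1/d))}
    for name, ratio in (("RR", RR), ("LR+", LRp), ("LR-", LRn)):
        t1, t2 = terms[name]
        out[name] = (ratio, ci95_of_ratio(ratio, t1 + t2 + (t1 * t2 if improved else 0.0)))
    return out

# the Lucia de Berk table from above; pass improved=True for the corrected variances
for name, (r, (L, U)) in ratios_with_ci(1.0, 57.0, 10.0, 271.0).items():
    print("%-3s = %.3f   ( %.3f to %.3f )=95%%CI" % (name, r, L, U))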
-.-
+Fine tuning of a Bayesian interval for log-odds ratio :
In Peter Lee's book {3}, p.152-3 in the 2nd & 3rd editions, there is a section 5.6 on "Comparison of two proportions: the 2x2 table", which is relevant to Matthews' papers although not mentioned by him. For a 2x2 table (which is binomial) with Beta priors, Peter Lee derives posterior distributions which are the conjugate Beta(A+a, B+b) and Beta(C+c, D+d), where A, B, C, D are the prior counts. The log-odds ratio of the odds L1, L2 has an approximately normal pdf:
ln(OR) = ln(L1/L2) = ln(L1) - ln(L2) ~
 ~ N( ln[ (A+a-1/2).(D+d-1/2) / ( (B+b-1/2).(C+c-1/2) ) ] ,
      1/(A+a) + 1/(D+d) + 1/(B+b) + 1/(C+c) )
where the 1/2s are not Jeffreys' ; they come from {3}, section 3.1. Unlike Peter Lee I will neither simplify by dropping these 1/2s, nor shall I use the Haldane reference prior counts A=B=C=D=0.0 . I prefer to keep the 1/2s and use Jeffreys' prior, ie A=B=C=D=0.5 , which gets me:
ln(OR) = ln(L1/L2) = ln(L1) - ln(L2) ~
 ~ N( ln( a.d/(b.c) ) , 1/(0.5+a) + 1/(0.5+d) + 1/(0.5+b) + 1/(0.5+c) )
where 1: the 1/2s and the 0.5s cancelled each other in the mean; 2: 0.5s were added to the 4 components of the variance of ln(OR). Many statisticians recommend adding 0.5 to prevent division by zero in case of a=0 or b=0 or c=0 or d=0 ; that was not my motivation, but I am glad to get it by other means. The difference between this and other statisticians' formula is that they would have added 0.5 to each of the 4 entries in a 2x2 table, so that they would have ln( (a+0.5).(d+0.5)/[ (b+0.5).(c+0.5) ] ).
-.-
+Why one and the same formula works for more of our ratios :
So far for the opeRational aspects; now comes a "proof" of my extension.
1. All our ratios have the same general formula for their CI bounds (L, U).
2. RR , LR+ , LR- have the same form P(u|v)/P(u|~v) which equals 1.0 , like OR does, if the (quasi)random events u, v are independent.
3. All the specific formulas are based on the same basic idea of computing the standard error S of ln(ratio).
Q: Why a logarithm ?
A: In general: our ratios are always >= 0 , hence their pdf's are heavily skewed towards zero. Taking the logarithm of a ratio < 1 transforms it to values < 0, while the log of a ratio > 1 has values > 0. So the log-transformed pdf becomes rather symmetrical and approximately Gaussian, which is a unimodal pdf, ie it has only one local maximum = its global maximum. A unimodal symmetrical pdf has a huge advantage: its mode = its mean. In fact, it is the mode, ie the global maximum of a pdf, which does or should matter most, most of the time; however, it is easier to work with the mean, ie with the expected value. Whenever we work with the mean of a unimodal symmetric pdf, we are also working with its mode.
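A quick way to see this log-symmetry is to simulate many 2x2 tables and compare the sampling distribution of OR with that of ln(OR); the sketch below is purely illustrative, with invented group sizes and probabilities and a 0.5 continuity correction to avoid empty cells.

import numpy as np

# The sampling distribution of OR is skewed, while that of ln(OR) is roughly
# symmetric (mean close to median, skewness near 0). Numbers are invented.
rng = np.random.default_rng(1)
n1, p1 = 50, 0.3    # "exposed" group:   a ~ Binomial(n1, p1), b = n1 - a
n2, p2 = 50, 0.3    # "unexposed" group: c ~ Binomial(n2, p2), d = n2 - c

a = rng.binomial(n1, p1, size=100000) + 0.5   # +0.5 keeps every cell > 0
c = rng.binomial(n2, p2, size=100000) + 0.5
b, d = n1 - a + 1.0, n2 - c + 1.0             # ie b = (n1 - a_raw) + 0.5 , etc.

OR = (a * d) / (b * c)
for name, x in (("OR", OR), ("ln(OR)", np.log(OR))):
    skew = float(((x - x.mean()) ** 3).mean() / x.std() ** 3)
    print("%-6s mean=%7.3f median=%7.3f skewness=%6.3f"
          % (name, x.mean(), np.median(x), skew))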
Additional advantages of a Gaussian pdf aka normal distribution:
- It is relatively easy to work with, and much about it can be found in many easily accessible books, not only in specialized monographs.
- Of all continuous distributions with the first two moments, ie the mean and the variance, known (for data generated or collected under fixed ie constant empirical or experimental conditions), a Gaussian pdf has maximum entropy. MaxEnt means that the pdf is the most uncertain one, ie the "least informed", the "least suggestive", the "least assuming" pdf.
On the properties of the above mentioned ratios: let rel stand for exactly one of the relational operators = , < , > ; then:
P(u|v) rel P(u|~v) , ie P(u|v)/P(u|~v) rel 1 ( with rel being = for independent u, v ) ; hence:
Puv/Pv rel (Pu - Puv)/(1 - Pv) , where Puv stands for P(u,v) ie P(u&v) ;
Puv - Puv.Pv rel Pu.Pv - Puv.Pv , ie:
Puv rel Pu.Pv , which for rel being = confirms the mutual independence of the events u, v ; q.e.d.
The following equivalence == always holds:
[ P(y|x) rel P(y|~x) ] == [ P(x|y) rel P(x|~y) ]
Hence for any pair of non-independent events x, y it always holds that:
either: [ P(y|x) > P(y|~x) ] and [ P(x|y) > P(x|~y) ] ,
or:     [ P(y|x) < P(y|~x) ] and [ P(x|y) < P(x|~y) ] , but not both. q.e.d.
-.-
+References :
Hints: search above for the word "help" for my help in deciphering Matthews' derivations. If short of time, start reading his papers from their middle, or read them backwards, ie start with his Appendix. You can copy and paste the http's provided here into your browser to get direct access to the papers by Robert Matthews.
{0} I.J. Good : Probability and the Weighing of Evidence, 1950 ; on the use of Bayes' theorem in reverse see pp. 81, 35, 70.
{1} Robert Matthews : Why should clinicians care about Bayesian methods ? Journal of Statistical Planning and Inference 94 (2001) 43-58, plus discussions on pp. 59-71 ; also on the www at http://ourworld.compuserve.com/homepages/rajm/jspib.htm ; Matthews' unexplained constants are explained here above, ie at http://www.humintel.com/hajek in my epaper on CPI.
{2} Robert Matthews : Methods for assessing the credibility of clinical trial outcomes ; Drug Information Journal 35 (2001) 1469-1478 ; at http://www.diahome.org/content/abstract/2001/dij1740.pdf ; Matthews' unexplained constants are explained here above, ie at http://www.humintel.com/hajek in my epaper on CPI.
{3} Peter Lee : Bayesian Statistics ; 3 printings and 3 editions differ in page numbers and errata (easily found on the www). Matthews {1} used the 2nd ed., 1997, chapter 2. As the relevant parts in the 2nd ed. I identified sections 2.2, pp. 36-38, on the normal prior and likelihood, and 5.6, p. 152, on the log-odds ratio, supported by 3.1, p. 80, on log-odds, and by the two appendices A19, p. 290, and A20, p. 291.
{4} Evidence-Based Medicine - How to Practice and Teach EBM , 3rd ed., 2005 ; do NOT use the 2nd edition, which contains too many errors in general, and too many errors in the formulae in the appendix on confidence intervals in particular.
{5} Joseph Fleiss et al : Statistical Methods for Rates and Proportions, 3rd ed., 2003.
{6} David L. Simel, Gregory P. Samsa, David B. Matchar : Likelihood ratios with confidence : sample size estimation for diagnostic test studies, Journal of Clinical Epidemiology 44/8 (1991) 763-770.
{7} Doug Altman et al : Statistics with Confidence , 2nd ed.
-.-