-.-
(In)commensurable pairs of generative & preventive relative differences
as measures of causation
Copyright (C) 2007 , Jan Hajek , NL , version 2.2 of 2007-3-8

Abstract :
Part 1: Absolute risk reduction ARR is the slope of a probabilistic
regression line, which helps to interpret relative differences.
Part 2: A decomposition of relative risk aka risk ratio shows that it
does NOT measure how much an exposure x is sufficient for an effect y.
Part 3: Since 1997 a professor of psychology, Patricia W. Cheng (UCLA),
and her co-authors, notably prof. Clark Glymour (CMU), have published
book sections and long papers, usually of 30-40 pages, on what has been
known since 1959 as Sheps' relative difference measure of causation
(by Mindel Sheps, Harvard, 1959), here denoted as the generative factor GF.
-.-
Contents: jump to a section by finding the leading part of a +Term :
+Absolute risk reduction ARR is the slope of a probabilistic regression line
+Relative risk RR measures how much is (y SufFor x) or (x NecFor y) ;
 RR does NOT measure how much is (x SufFor y) or (y NecFor x)
+(In)commensurability of Cheng/Clark's pairs of relative differences
-.-
+Absolute risk reduction ARR = slope of a probabilistic regression line :

ARR(~y:~x) = P(~y|~x) - P(~y|x)
           = [ 1 - P(y|~x) ] - [ 1 - P(y|x) ]
           = P(y|x) - P(y|~x) = ARR(y:x) = the standard ARR
           = [ Pxy - Px.Py ]/[ Px.(1-Px) ]
           = cov(x,y)/var(x) = beta(y:x)
           = slope of the probabilistic regression line
             Py = beta(y:x).Px + alpha(y:x)

from which follows my fresh interpretation of GF and of my HF further
below. |ARR| is the probability of a benefit (or harm) when treated
(or exposed), and NNT = 1/|ARR| is the number needed to treat (or to
harm), ie how many must be treated (or exposed) for one to benefit
(or to be harmed).
-.-
+Relative risk RR measures how much is (y SufFor x) or (x NecFor y) ;
 RR does NOT measure how much is (x SufFor y) or (y NecFor x) :

It is well known that Boolean logic is isomorphic with set theory.
Sufficiency and necessity can be expressed in terms of set theory.
Let SufFor abbreviate "is sufficient for" ie "is a subset of", and
let NecFor abbreviate "is necessary for" ie "is a superset of".
From set theory we know that :
 membership of a subset is sufficient for membership of its superset, and
 membership of a superset is necessary for membership of its subset.
From Boolean logic we know that for events x , y holds the equivalence
(x implies y) == (~y implies ~x) , where ~ is a negation, and where
"implies" is isomorphic with "entails" ie "is a subset of" ie "SufFor".
Then :
 (x SufFor y) == (y NecFor x) == (~x NecFor ~y) == (~y SufFor ~x)
 (y SufFor x) == (x NecFor y) == (~y NecFor ~x) == (~x SufFor ~y)
So much for the perfect, deterministic relationships between x and y.
Now for probabilistic ie (quasi)random events x , y :
Let x = exposure , y = effect ; then, based on set theory, we can
interpret the simplest conditional probabilities as the naive ie
simplistic measures :
 P(y|x) = how much is (x SufFor y) == (y NecFor x) == ... as above, hence
 P(x|y) = how much is (y SufFor x) == (x NecFor y) == ... as above.
P(x|y).Py = Pxy = Pyx = Px.P(y|x) is the basic Bayes equation, so
P(x|y)/P(y|x) = Px/Py , and P(y|x)/P(x|y) = Py/Px .

Relative risk aka risk ratio RR :
 RR(y:x) = P(y|x)/P(y|~x) = shortly RR , which I have decomposed to :
         = Pyx.[ 1/(Py - Pyx) ].(1 - Px)/Px
         = Pyx.[ 1/P(y,~x)    ].(1 - Px)/Px
         = Pyx.[ how much y implies x ].( surprise by x )
since 1/P(y,~x) is large iff y rarely occurs without x , ie iff nearly
(y implies x) , and since a big Px is not surprising. Hence RR measures
how much (y SufFor x) or (x NecFor y). This may be a revelation for
those who thought that RR measures (x SufFor y) just because P(y|x) is
a measure of (x SufFor y). My deconstruction shows that it is the
division by P(y|~x) which determines what RR measures.
Note that 0.2/0.1 = 0.000002/0.000001 = 2 , ie that like any simple
ratio, RR loses information on the magnitude of the risk. Therefore
ARR, NNT, and GF & HF are much more informative.
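The ARR-as-slope identity and the RR decomposition above can be checked
numerically. Below is a minimal sketch in Python; the joint distribution
(Pxy, Px, Py) is hypothetical, chosen only for illustration:

```python
# Numeric check of ARR = cov(x,y)/var(x) and of the RR decomposition,
# using a hypothetical joint distribution of binary events x , y.
Pxy = 0.30   # P(x,y)  -- illustrative numbers only
Px  = 0.50   # P(x)
Py  = 0.40   # P(y)

P_y_given_x    = Pxy / Px                # P(y|x)
P_y_given_notx = (Py - Pxy) / (1 - Px)   # P(y|~x)

# ARR as a difference of conditional probabilities:
ARR = P_y_given_x - P_y_given_notx
# ARR as the regression slope cov(x,y)/var(x) of the 0/1 indicators:
slope = (Pxy - Px * Py) / (Px * (1 - Px))
NNT = 1 / abs(ARR)                       # number needed to treat

# RR and its decomposition Pyx . [ 1/P(y,~x) ] . (1-Px)/Px :
RR = P_y_given_x / P_y_given_notx
RR_decomposed = Pxy * (1 / (Py - Pxy)) * (1 - Px) / Px

print(ARR, slope)          # both approx 0.4
print(NNT)                 # approx 2.5
print(RR, RR_decomposed)   # both approx 3.0
```

The two ways of computing ARR agree, as do RR and its decomposition,
which is all the sketch is meant to show.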
-.-
+(In)commensurability of Cheng/Clark's pairs of relative differences :

RR(a:b) = P(a|b)/P(a|~b) denotes relative risk aka risk ratio.
Hence RR(a:b) = 1/RR(a:~b) is a trivial fact for a ratio of likelihoods.
Any Sheps-like relative difference RD(:) can be written as
 RD(u:v) = [ P(u|v) - P(u|~v) ]/[ 1 - P(u|~v) ]
         = [ (1 - P(~u|v)) - (1 - P(~u|~v)) ]/P(~u|~v)
         = 1 - P(~u|v)/P(~u|~v)
         = 1 - RR(~u:v) = 1 - 1/RR(~u:~v)
Because for any s <= t holds 1 - s/t = 1 - 1/(t/s) , the measures
called "attributable risk of the exposed" ( ARX ) and "prevented
fraction" ( PF ) are perfectly commensurable ( due to the previous = )
when we use :
 if RR >= 1 : ARX = 1 - 1/RR(y:x) = 1 - RR(y:~x) = RD(~y:~x) , and
 if RR <= 1 : PF  = 1 - RR(y:x) = 1 - 1/RR(y:~x) = RD(~y:x) .
Similarly, the other pair of commensurable measures is :
 if ARR >= 0 ie RR >= 1 then
  GF = 1 - RR(~y:x) = 1 - 1/RR(~y:~x) = RD(y:x)
     = [ P(y|x) - P(y|~x) ]/[ 1 - P(y|~x) ] = ARR/P(~y|~x)
  is the "generative factor" , else
  HF = 1 - 1/RR(~y:x) = 1 - RR(~y:~x) = RD(y:~x)
     = [ P(y|~x) - P(y|x) ]/[ 1 - P(y|x) ] = -ARR/P(~y|x)
  is my "hindrance factor" .
In my epapers at www.humintel.com/hajek I have defined as commensurable
pairs those measures RD(:) which for RR(:) near 1 both operate on the
same scale and have continuity of values ie are not too different.
Obviously the equality 1 - s/t = 1 - 1/(t/s) guarantees perfect
commensurability. On the other hand, GF and PF are incommensurable, eg:
Let P(y|x) = 39/1000 < P(y|~x) = 40/1000 , so :
 GF = [ 0.039 - 0.040 ]/[ 1 - 0.040 ] = -0.001 approx. , and
 HF = [ 0.040 - 0.039 ]/[ 1 - 0.039 ] =  0.001 approx. , while
 PF = 1 - 39/40 = 1/40 = 0.025 , which is vastly different from GF & HF.
Obviously, a small change of small numbers, such that the sign of the
numerator ARR changes, should not lead to vastly different numbers for
generative vs. preventive causations.
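The incommensurability example above can be recomputed in a few lines of
Python (the variable names are mine; the numbers are those of the text):

```python
# (In)commensurability check: P(y|x) = 39/1000 , P(y|~x) = 40/1000.
# A sign change of the tiny numerator ARR should not jump the measure
# from about 0.001 to 0.025.
p_y_x, p_y_notx = 39/1000, 40/1000

GF = (p_y_x - p_y_notx) / (1 - p_y_notx)   # RD(y:x) , approx -0.00104
HF = (p_y_notx - p_y_x) / (1 - p_y_x)      # RD(y:~x), approx +0.00104
PF = 1 - p_y_x / p_y_notx                  # 1 - RR(y:x) = 1/40 = 0.025

print(GF, HF, PF)
# GF and HF stay on the same tiny scale; PF jumps to 0.025 ,
# illustrating the incommensurability of the pair GF & PF.
```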
Moreover, I have discovered that
 RD(v:u)/RD(u:v) = P(v|u)/P(u|v) = Pv/Pu
holds in general for any events or their negations, so eg:
 GF(y:x)/GF(x:y) = P(y|x)/P(x|y) = Py/Px
which is clear & MEANINGFUL, as it fits with the basic Bayes equation.
Yet another MEANING is given by my interpretation of GF :
GF was designed as a "relative difference", ie as :
 GF = ARR/( fictive MAXimum ARR achievable if Pbase is given )
    = slope/( as-if MAXimal slope thinkable if P(y|~x) is fixed )
    = beta/( as-if MAXimal beta possible if Pbase is known )
where beta is the slope of the probabilistic regression line for events.
Conclusion: two commensurable measures of causation tendency (one for
generative, the other for preventive situations), both expressible as
1 - RR(:) , are either GF & HF , or ARX & PF , but never the pairs of
incommensurable measures GF & PF , or ARX & HF .
-.-
References :
Extensive references are in my longer epapers at www.humintel.com/hajek ,
from where this epaper distils my disproof of Cheng's and Glymour's
pairing of GF with PF, which they adopted as their preventive factor.
www.humintel.com/hajek is ordered so that the newer the epaper, the
nearer to the top. In my earliest epapers on causation I denoted Sheps'
relative difference as RDS(x:y) , while now I prefer to denote it as
RD(y:x) in general, and as the generative factor GF(y:x) in particular.
Also my current ARR(y:x) was denoted as ARR(x:y). These swapped
notations were no errors, just a different notation.
-.-
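As a closing numeric check, the ratio identity RD(v:u)/RD(u:v) = Pv/Pu
stated above can be probed on randomly drawn joint distributions of two
binary events. This is a sketch in Python; the helper RD and all drawn
numbers are mine, for illustration only:

```python
# Probe the identity RD(v:u)/RD(u:v) = P(v|u)/P(u|v) = Pv/Pu on random
# joint distributions of binary events u , v.
import random

def RD(p_uv, pu, pv):
    """Sheps-like RD(v:u) = [ P(v|u) - P(v|~u) ]/[ 1 - P(v|~u) ],
    computed from the joint P(u,v) and the marginals Pu , Pv."""
    p_v_u    = p_uv / pu
    p_v_notu = (pv - p_uv) / (1 - pu)
    return (p_v_u - p_v_notu) / (1 - p_v_notu)

random.seed(7)
for _ in range(1000):
    pu = random.uniform(0.05, 0.95)
    pv = random.uniform(0.05, 0.95)
    # draw a feasible joint probability P(u,v) (Frechet bounds):
    p_uv = random.uniform(max(0.0, pu + pv - 1), min(pu, pv))
    try:
        d_vu = RD(p_uv=p_uv, pu=pu, pv=pv)   # RD(v:u)
        d_uv = RD(p_uv=p_uv, pu=pv, pv=pu)   # RD(u:v), roles swapped
    except ZeroDivisionError:
        continue                             # degenerate P(v|~u) = 1
    if abs(d_uv) < 1e-6:
        continue                             # near-independence: 0/0
    assert abs(d_vu / d_uv - pv / pu) < 1e-6
print("identity RD(v:u)/RD(u:v) = Pv/Pu held on all random cases")
```

The same helper also reproduces the fixed example Pxy = 0.3 , Px = 0.5 ,
Py = 0.4 , where RD(y:x)/RD(x:y) = 0.5/0.625 = 0.8 = Py/Px.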