.- Concise causal insights enhanced
CopyRight (C) 2005-2007, Jan Hajek, Netherlands, 2007-2-16, version 6.3
The word my briefly marks what I have not lifted from anywhere, so it is fresh and may be contested, but must not be lifted from me without a full reference to me plus this website: http://www.humintel.com/hajek contains the latest versions of my e-papers. This concise e-paper compresses some of my causal insights.
.- The concise history of the good old gc formula :
Probabilistic formulae with the general structure g = [P(h) - P(b)]/[1 - P(b)], where P(h) = P(hit) and P(b) = a base chance of hitting (be it a success or a failure, i.e. hitting the fan) discounted as just shown, have been in use for various purposes (with P's tailored to specific tasks) for at least 50 years. Examples of their use are:
In mathematical psychology:
- The corrected probability in signal detection theory used for the evaluation of radar operators { Restle 1961, p.149, eq.7.6 }, { Swets 1964, pp.136, 165 }, { Green & Swets 1966, p.129 }, { Egan 1975, p.142, eq.6.6 }, where P(b) = P(false alarm). Their formulae have exactly the same form as Patricia W. Cheng's { 1997 } "generative causal power" gc has.
- The forced-choice score in multiple-choice tests corrected for guessing, where P(b) = 1/(number of choices per question).
- Jacob Cohen's coefficient of agreement kappa, a.k.a. concordance or interrater agreement adjusted for chance { Fleiss 2003, pp.603,609,620 }, { Feinstein 2002, 20.4.3 }.
In epidemiology:
- The relative difference { Sheps 1959 }, { Fleiss 3rd ed., 2003, pp.123-5,133,152,156,162-3 }, { Rothman & Greenland 1998, p.56 }, { Khoury 1989 }, { Feinstein 2002, p.174 }.
Fleiss derives gc thus:
  P(e|c) = P(e|~c) + gc.P(~e|~c) = P(e|~c) + gc.[ 1 - P(e|~c) ]    (0a)
  gc = [ P(e|c) - P(e|~c) ]/[ 1 - P(e|~c) ]                        (0b)
The (0a) reads: the factor gc is the proportion of the cases unaffected when c is absent which would, if c would be present, raise the probability of occurrence of e above the base chance P(e|~c). The words "would" express so-called counterfactual reasoning which, among other things, distinguishes humans from animals, as far as we know.
My derivation of gc: if all c would cause e then P(e,c) = Pc, i.e. P(e|c) = 1. Then gc is the ratio of (the actual ARR)/(the fictive maximum ARR), i.e.:
  gc = [ P(e|c) - P(e|~c) ]/[ 1 - P(e|~c) ] = ARR / P(~e|~c)
     = [ P(e) - P(e|~c) ]/[ (1 - P(e|~c)).Pc ]   is my new form    (0c)
B.t.w. { Fleiss, 3rd ed., 2003, p.151, eqs.7.23-7.25, and p.156, sect. 7.5, 7.6 } says that
  ge = [ P(c|e) - P(c|~e) ]/[ 1 - P(c|~e) ]                        (0d)
is a good estimate of the population attributable risk. Recall from above that P(e|c) > P(e|~c) is equivalent to P(c|e) > P(c|~e), and similarly for < , = , despite gc =/= ge.
The general structure of all formulae on the pages referenced so far is like the structure of Cheng's generative power gc, and some of them, including those in the classical books on mathematical psychology, have exactly the same form as gc. However none of those references has claimed to measure causation. Already in the early 1960s I.J. Good published what he called a "causal calculus" for "causal nets" in which his "causal strength of a chain" is a product of what he calls "quasiprobability" (identical with Cheng's generative power gc):
  (p - q)/(1 - q) = [ P(e|c) - P(e|~c) ]/[ 1 - P(e|~c) ] = gc
as reprinted with corrections in { Good 1983, pp.201,208,212, his F is our c }. While Good's math is hard to follow, our derivation is easy to follow. In my other e-papers I have shown that Cheng's causal power is just a normalized slope of a probabilistic regression line for two events c, e.
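The equality of Fleiss's form (0b) and my new form (0c) can be checked numerically. A minimal sketch with my own hypothetical probabilities (not taken from any cited data); both functions are named for the equation numbers above:

```python
def gc_0b(p_e_c, p_e_nc):
    """gc = [ P(e|c) - P(e|~c) ] / [ 1 - P(e|~c) ]  -- form (0b)."""
    return (p_e_c - p_e_nc) / (1 - p_e_nc)

def gc_0c(p_e, p_e_nc, p_c):
    """gc = [ P(e) - P(e|~c) ] / [ (1 - P(e|~c)) * P(c) ]  -- form (0c)."""
    return (p_e - p_e_nc) / ((1 - p_e_nc) * p_c)

# Assumed example values: P(c), P(e|c), P(e|~c)
p_c, p_e_c, p_e_nc = 0.3, 0.8, 0.2
# Law of total probability gives P(e) for form (0c)
p_e = p_c * p_e_c + (1 - p_c) * p_e_nc

print(round(gc_0b(p_e_c, p_e_nc), 9))   # prints 0.75
print(round(gc_0c(p_e, p_e_nc, p_c), 9))  # prints 0.75 : the forms agree
```

The agreement follows from P(e) - P(e|~c) = Pc.[ P(e|c) - P(e|~c) ], so dividing by (1 - P(e|~c)).Pc reproduces (0b).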
The reader is free to decide if this fresh fact justifies any (strong) claims w.r.t. causation.
.- Some of my causal insights compressed:
The assumptions:
A1: No effect without a cause (= my Causal Principle).
A2: An event c is a possible cause of an event e if c's occurrence increases the probability of occurrence of an effect e; however, in general, e does not always occur when c occurs.
A3: An effect e may be caused by other causes called x (regardless of c).
A4: When e occurs, c and x may be both jointly present.
The goal Q: How much is a candidate cause event c alone causing the effect event e while other causes x also exist? The task is to separate c's contribution (to the raised probability of e) from that of any other causes (x's contribution).
Hint: Due to our assumption A1 it must be that ~c stands for the joint of all possible causes other than c.
It is always useful to keep in mind the Venn diagram of the universe P(All) = 1, here drawn as a 2x2 table of the joint probabilities and their margins:

             c                   ~c                      | margin
   e   | P(e,c) = Pec       | P(e,~c) = Pe - Pec         | P(e) = Pe
  ~e   | P(~e,c) = Pc - Pec | P(~e,~c) = P(~(e or c))    | P(~e) = 1 - Pe
 margin| P(c) = Pc          | P(~c) = 1 - Pc             | P(All) = 1

Let's start with my fresh explanation of the meaning of the key formula
  gc = [ P(e|c) - P(e|~c) ]/[ 1 - P(e|~c) ] = ARR / Harr
where the numerator is called the absolute risk reduction ARR; it discounts from P(e|c) the conditional probability of those effects caused by the causes other than c. Now my fresh insight how the denominator logically obtains: the division / compares ARR with the hypothesised absolute risk reduction Harr. The 1 in Harr obtains from the plausible assumption that IF c would always cause e THEN P(e,c) = P(c) would hold, hence also P(e|c) = Pc/Pc = 1.
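The path from the 2x2 table to gc = ARR/Harr can be sketched in a few lines. The numbers below are my own assumed example values, not data from any cited source; only Pc, Pe and the joint Pec are needed, since both conditionals follow from them:

```python
# Assumed joint quantities from the 2x2 table: P(c), P(e), P(e,c)
Pc, Pe, Pec = 0.3, 0.38, 0.24

p_e_given_c  = Pec / Pc              # P(e|c)  = P(e,c)/P(c)
p_e_given_nc = (Pe - Pec) / (1 - Pc)  # P(e|~c) = P(e,~c)/P(~c)

ARR  = p_e_given_c - p_e_given_nc    # absolute risk reduction (numerator)
Harr = 1 - p_e_given_nc              # hypothesised maximum ARR (denominator)
gc   = ARR / Harr
print(round(gc, 9))                  # prints 0.75 with these numbers
```

Setting P(e|c) to its hypothesised maximum 1 in the ARR formula yields exactly Harr, which is why the ratio cannot exceed 1.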
By subtracting P(e|~c) from 1 we get our hypothesised absolute risk reduction, called by me Harr.
Q: Is it correct to subtract P(e|~c) from 1?
A: Yes, because both P(e,~c) and P(~c) remain unchanged by our hypothesizing.
A more mathy derivation of gc comes out from the constraints and freedoms provided by the following equations and inequalities:
  P(e) = P(x).P(e|x) + P(~x).P(e|~x) - 0    holds for any x;
       = P(e,x) + P(e,~x) - 0               the -0 means disjoint
       = P(e,c) + P(e,~c) - 0
       = P(c).P(e|c) + P(~c).P(e|~c) - 0
For independently occurring c, x, as causes of an effect e, when P(c), P(e|~c) are known but P(x), P(e|~x) are unknown, we are free (because x is so free) to write
  P(e) = P(e|~x) + P(e|~c) - P(e|~x)*P(e|~c)
       = 1 - [ 1 - P(e|~c) ]*[ 1 - P(e|~x) ]    has the form of a noisy OR-gate
       = P(e|~c) + [ 1 - P(e|~c) ].P(e|~x)
hence for known P(e) we get
  P(e|~x) = [ P(e) - P(e|~c) ]/[ 1 - P(e|~c) ]    is fixed now.
We are even more free to write, for some unknown gc, gx, P(x), this:
  P(e) = P(c).gc + P(x).gx - P(c).gc*P(x).gx
       = 1 - [ 1 - P(x).gx ]*[ 1 - P(c).gc ]    has the form of a noisy OR-gate
       = P(c).gc + [ 1 - P(c).gc ]*P(x).gx
where only P(c) and P(e) are known, hence free are gc, gx, P(x). Due to these freedoms we are free to equate product-by-product thus:
  P(c).gc = P(e|~x) = [ P(e) - P(e|~c) ]/[ 1 - P(e|~c) ]    derived above;
  P(x).gx = P(e|~c)    known from data, but individual P(x), gx stay unknown.
  gc = P(e|~x)/P(c) = [ P(e) - P(e|~c) ]/[ (1 - P(e|~c)).P(c) ] , now check the next =
     = [ P(e|c) - P(e|~c) ]/[ 1 - P(e|~c) ] <= 1    due to P(e|c) <= 1
so now we can also get (in a different form than we already have):
  P(x).gx = [ P(e) - P(c).gc ]/[ 1 - P(c).gc ] , now check the next =
          = P(e|~c)
Summarized:
  Pe = P(e|~x) + P(e|~c) - P(e|~x)*P(e|~c)    equals term by term with:
     = P(c).gc + P(x).gx - P(c).gc*P(x).gx ;
     = P(c).P(e|c) + P(~c).P(e|~c)    does not equal product-by-product with:
     = P(~x).P(e|~x) + P(x).P(e|x)    with unknown P(x), P(e|x).
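The noisy OR-gate factoring and the equating P(c).gc = P(e|~x) can be verified numerically. A small sketch with my own assumed inputs (the same hypothetical values as in the earlier checks, not from any cited data):

```python
# Assumed known quantities: P(c), P(e|c), P(e|~c)
p_c, p_e_c, p_e_nc = 0.3, 0.8, 0.2
p_e = p_c * p_e_c + (1 - p_c) * p_e_nc       # total probability gives P(e)

# P(e|~x) as fixed by the derivation above
p_e_nx = (p_e - p_e_nc) / (1 - p_e_nc)
# The noisy OR-gate form must reproduce P(e)
noisy_or = 1 - (1 - p_e_nc) * (1 - p_e_nx)
# gc from the equating P(c).gc = P(e|~x)
gc = p_e_nx / p_c

print(round(abs(noisy_or - p_e), 12))  # prints 0.0 : the identity holds
print(round(gc, 9))                    # prints 0.75 = [P(e|c)-P(e|~c)]/[1-P(e|~c)]
```

Note that gc here agrees with the direct form (0b), confirming the "now check the next =" step in the derivation.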
We got all values except P(e|x), P(e,x), P(x) and gx, so we cannot split the known value of the product P(x).gx into P(x) and gx. One might be tempted to think that, due to the subtracted product term, it will hold that P(e|c) <= gc and by symmetry P(e|x) <= gx, but it does not hold; my proof:
  gc.[ 1 - P(e|~c) ] = P(e|c) - P(e|~c)    from above, hence
  gc = P(e|c) - (1 - gc).P(e|~c) ;    since gc <= 1 it holds: gc <= P(e|c) <= 1    q.e.d.
gc <= P(e|c) says that e is NOT always caused by c when c is present, in which case some other cause may cause e. Hence this <= shows how meaningful gc is, i.e. that gc is (strongly related with) a probability of e being caused by c and not by some other cause. But <= is not exclusive for gc, as <= holds for any generic formula like
  g = [ (1-Pb) - (1-Pa) ]/(1-Pb)    is what I call the error form
    = 1 - (1-Pa)/(1-Pb)
    = (Pa - Pb)/(1-Pb)              is what I call the canonical form;
  g <= 1 ;    proof: 0 <= P(.) <= 1 ;
  since g <= 1 it holds: g <= Pa <= 1 ;    proof: g - g.Pb = Pa - Pb ; g = Pa - Pb.(1-g) <= Pa <= 1
P(e|c) >= gc == P(e produced by c) in { Cheng 1997, p.372, right, low }, and P(e|c) >= P(e:c) == P(e caused by c) in { Peng & Reggia 1986, p.142, left mid } and { Neapolitan 1990, p.309 mid: <= } and { Szolovits 1995, p.115 mid }, while { Cheng 1997, p.373 left up } writes: "P(e|x) coincides with px [ == gx ] when no other cause is present or exists. They are not however, equal in general. This is because other causes, known or unknown to the reasoner, may be present when x is present. ...". So the assumptions in Cheng and in the sources just quoted agree when Cheng and others speak about P(e caused by c), and also P(e:c) =/= P(e|c) in general. Moreover I just gave a simple proof of gc <= P(e|c), which is in sync with P(e:c) <= P(e|c) by others. What Cheng calls "causal power" has been called "causal strength" before: P(e:c) is called "causal strength" by { Peng & Reggia 1986, p.142 }, adopted by { Neapolitan 1990, p.310 low, 311 mid }; gc was called "causal strength of chain" by I.J.
Good { Good 1983 (orig.1961), pp.201,208,212, his F is our c }, whose "causal calculus" and "causal strength of a chain" were already quoted above. Last but not least, gc is a factor by which P(c) is multiplied, not raised to any power.
.- References : the titles with key words starting with Capital Letters are titles of books, proceedings or periodicals
Cheng Patricia W: From covariation to causation: a causal power theory; Psychological Review, 104/2, 1997, 367-405; on p.373 right mid: P(a|i) =/= P(a|i) should be P(a|i) =/= P(a|~i)
Cheng Patricia W, Novick Laura R: Constraints and nonconstraints in causal learning: reply to White (2005) and to Luhmann and Ahn (2005); Psychological Review, 112/3, 2005, 675-707
Egan James P: Signal Detection Theory and ROC Analysis, 1975, Academic Press
Feinstein Alvan R: Principles of Medical Statistics, 2002, by the late professor of medicine at Yale, who has studied both math & medicine
Fleiss Joseph L, Levin Bruce, Myunghee Cho Paik: Statistical Methods for Rates and Proportions, 3rd ed., 2003; find "relative difference" and Sheps in their Index, also in earlier editions
Glymour Clark, Cheng Patricia W: Causal mechanism and probability: a normative approach, pp.295-313 in Oaksford Mike & Chater Nick (eds): Rational Models of Cognition, 1998
Good I.J.: Good Thinking - The Foundations of Probability and Its Applications, 1983, University of Minnesota Press. It reprints (and lists) a fraction of his 1500 papers and notes written until 1983; on p.160 up: sinh(.) should be tanh(.)
where Kemeny & Oppenheim's degree of factual support F(:) is discussed
Green David M, Swets John A: Signal Detection Theory and Psychophysics, 1966, Wiley
Hajek Jan: Causal insights inside, plus new causal hindrance factor versus P.W. Cheng's preventive causal power; http://www.humintel.com/hajek
Khoury Muin J, Flanders W. Dana, Greenland Sander, Adams Myron J: On the measurement of susceptibility in epidemiologic studies; American Journal of Epidemiology, 129/1, 1989, 183-190
Neapolitan Richard E: Probabilistic Reasoning in Expert Systems, 1990
Novick Laura R, Cheng Patricia W: Assessing interactive causal influence; Psychological Review, 111/2, 2004, 455-485
Peng Yun, Reggia James A: Plausibility of diagnostic hypothesis; Proceedings of the 5th National Conference on Artificial Intelligence, 1986, 140-145
Peng Yun, Reggia James A: Probabilistic causal model for diagnostic problem solving; IEEE Transactions on Systems, Man, and Cybernetics, 17; Part 1, March 1987, 146-162; Part 2, May 1987, 395-406
Restle Frank: Psychology of Judgment and Choice, 1961, Wiley
Rothman Kenneth J, Greenland Sander: Modern Epidemiology, 2nd ed., 1998, Lippincott-Raven
Sheps Mindel C: An examination of some methods of comparing several rates or proportions; Biometrics, 15, 1959, 87-97
Swets John A (ed.): Signal Detection and Recognition by Human Observers, 1964, Wiley
Szolovits Peter: Uncertainty and decisions in medical informatics; Methods of Information in Medicine, 34 (1995), 111-121
-.-