- William M. K. Trochim, “Quasi-Experimental Design”, “Relationships Among Pre-Post Designs”, “Qualitative Measures”, and “Unobtrusive Measures”, Research Methods Knowledge Base (October 20, 2006 Edition) <http://www.socialresearchmethods.net>
- A possible confound in regression-discontinuity designs is a Hawthorne effect. If performance on a post-test improves for the bottom 50% of scorers on a pre-test, the experience of being studied is a confounding explanation when the upper 50% of pre-test scorers are given no treatment. A solution is to introduce a placebo-like (hypothesized to be ineffectual) treatment for the high scorers, so that they have an experience of being studied that is similar to that of the low scorers.
- Trochim’s recoding of the checkmarks into 1s and 0s for the appearance of different themes in an example in the section on “The Qualitative Debate” may not lead to meaningful totals in the addition of 1s and 0s, e.g. if the texts or qualitative data being coded are not a random or representative sample of anything, but are instead chosen to illustrate variety, as is often the case in qualitative analysis.
- OPTIONAL: Noam Scheiber, “Freaks and Geeks: How Freakonomics Is Ruining the Dismal Science”, The New Republic, April 2, 2007 (with a follow-up exchange between Steven Levitt and Noam Scheiber)
Slides: http://www.stanford.edu/class/symsys130/SymSys130-5-8-2013.ppt.pdf. A few points:
- The “example scenarios” can be used to illustrate the appropriateness of different methods of observation. For example, a student who wants to know whether studying with music improves test performance might do a regression-discontinuity study in which the bottom half of scorers on a test are told to listen to music while studying for the next test. If the regression line shifts up or down for the bottom scorers relative to the upper-range ones, this is evidence that the manipulation was effective or harmful, depending on the direction of the shift. Mike Krieger, a SSP alum who co-founded Instagram, recommends that startup entrepreneurs undertake informal ethnographic observations to get an idea of how consumers shop.
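As an illustrative sketch (not from the slides), the music-while-studying scenario can be simulated to show how a regression-discontinuity analysis detects a shift: fit the pre-to-post regression line on the untreated upper half, then measure how far the treated lower half sits above or below that line. All the numbers below (true slope 0.8, treatment effect 5 points, noise levels) are assumptions chosen for the demonstration.

```python
import random

random.seed(0)

# Hypothetical pretest scores; the bottom half receives the "treatment"
# (studying with music), the top half does not.
n = 1000
pre = [random.gauss(70, 10) for _ in range(n)]
cutoff = sorted(pre)[n // 2]           # the median splits the groups
effect = 5.0                           # assumed true treatment effect

post = [0.8 * p + 15
        + (effect if p < cutoff else 0)
        + random.gauss(0, 3)
        for p in pre]

# Fit the pre -> post line on the untreated (upper) group only...
ctrl = [(p, q) for p, q in zip(pre, post) if p >= cutoff]
mx = sum(p for p, _ in ctrl) / len(ctrl)
my = sum(q for _, q in ctrl) / len(ctrl)
slope = (sum((p - mx) * (q - my) for p, q in ctrl)
         / sum((p - mx) ** 2 for p, _ in ctrl))
intercept = my - slope * mx

# ...then measure how far the treated group sits above that line.
treated = [(p, q) for p, q in zip(pre, post) if p < cutoff]
shift = sum(q - (intercept + slope * p) for p, q in treated) / len(treated)
print(round(shift, 1))   # should land near the assumed true effect of 5
```

An estimated shift near zero would suggest the manipulation did nothing; a clearly negative shift would suggest it was harmful.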
- William M. K. Trochim, “Design” (through “Hybrid Experimental Designs”), Research Methods Knowledge Base (October 20, 2006 Edition) <http://www.socialresearchmethods.net>
- Translation for psychology research: “single group” = “within group or within subjects”; “multiple groups” = “between groups or between subjects”
- What is an example of an ethical issue that would make random assignment to groups impossible?
- Translation: “treatments or programs” are the same as “manipulations” and “interventions”
- A “factor” is an independent discrete variable whose values are “levels” in Trochim’s terminology.
- In the section on Randomized Block Designs, Trochim says blocking “is an analysis strategy”. For the most part this is true; however, the decision to analyze by blocks has implications for design, namely in the sizing of samples for each block, so it is good to think about it as part of the design.
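To make the design implication concrete, here is a minimal sketch (my own illustration, not Trochim's) of blocked random assignment: randomizing to treatment and control separately within each block guarantees equal group sizes inside every block, which is exactly the sample-sizing decision that has to be made at design time. The block labels and sizes are hypothetical.

```python
import random

random.seed(0)

# Hypothetical blocks: "low" and "high" pretest scorers, 10 participants each.
blocks = {"low": list(range(10)), "high": list(range(10, 20))}

# Randomize to treatment/control separately *within* each block.
assignment = {}
for label, ids in blocks.items():
    shuffled = ids[:]
    random.shuffle(shuffled)
    half = len(shuffled) // 2
    for pid in shuffled[:half]:
        assignment[pid] = ("treatment", label)
    for pid in shuffled[half:]:
        assignment[pid] = ("control", label)

# By construction, each block contributes 5 participants to each condition.
counts = {(label, cond): sum(1 for c, b in assignment.values()
                             if b == label and c == cond)
          for label in blocks for cond in ("treatment", "control")}
print(counts)
```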
Slides: http://www.stanford.edu/class/symsys130/SymSys130-5-6-2013.ppt.pdf. A few notes:
- The presentation of material in the slides/lecture is specific to experimental psychology. Terminology and the issues people worry about vary somewhat across disciplines, but the underlying principles are mostly the same. Social/behavioral science (especially with human beings) brings in a host of issues that do not really arise in the physical sciences, because people and other animals can learn, make inferences about the experimenter, etc.
- For the Hawthorne effect, in addition to unobtrusive measures, another approach to mitigating the effect is to use a control manipulation, with either the same or a different group, in which a placebo (hypothesized to be ineffective) treatment or program is tested. This allows a comparison to the target treatment, to see if any improvements in outcome that appear with the target case occur also with the placebo. Placebo effects occur often.
- See the Wikipedia article for more on demand characteristics.
- Gerianne de Klerk, “Classical Test Theory (CTT)”, In M. Born, C.D. Foxcroft & R. Butter (Eds.), Online Readings in Testing and Assessment, International Test Commission, 2008
- de Klerk says that squaring a validity coefficient and multiplying it by 100 yields “the explained variance in a relationship. To explain this more fully,” he says, “the following example may assist: when the predictive validity coefficient of an ability test in relation to the criterion of job retention is 0.70 (i.e. the number of years the person will stay in the job) it means that 49% ((0.70)²*100 = 49%) of the differences in job retention (the criterion) can be explained (or predicted) by differences in the test achievements.” This latter statement is suspect. The coefficient explains 49% of the variance in job retention, but this is not the same as 49% of the differences in job retention. See the Wikipedia article on “variance”.
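A short numerical illustration of why the two readings differ: variance is measured in squared units, so with r = 0.70 the residual spread on the criterion's original (unsquared) scale shrinks to about 71% of its original size, i.e. prediction errors shrink by only about 29%, not 49%.

```python
import math

r = 0.70
var_explained = r ** 2                  # 0.49: share of *variance* explained
sd_remaining = math.sqrt(1 - r ** 2)    # ~0.714: residual spread, measured
                                        # in the criterion's original units
print(round(var_explained, 2), round(sd_remaining, 3))
```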
- Jon A. Krosnick, “Survey Research”, Annual Review of Psychology 50:537-567, 1999
- This is an excellent survey (no pun intended) of the research on surveying, but what Krosnick calls “optimizing” versus “satisficing” behavior imposes some spin on the analysis. It seems to assume that when people are doing less than the optimal in responding, they are doing something that is good enough for some purpose (a.k.a. “satisficing”, a term popularized by Herbert Simon). A more neutral perspective would map “satisficing” in his analysis onto Kahneman’s System 1/“thinking fast” and “optimizing” onto System 2/“thinking slow”.
Slides: http://www.stanford.edu/class/symsys130/SymSys130-5-1-2013.ppt.pdf. A few points:
- In addition to Classical Test Theory, which we discussed in class and which was introduced in the de Klerk reading, you might also want to look at Item Response Theory.
- Test theory was developed for intelligence testing, but it can be applied to lots of different types of tests, e.g. those measuring attitudes, personality traits, beliefs, etc.
- Amitav Banerjee and Suprakash Chaudhury, “Statistics Without Tears: Populations and Samples”, Industrial Psychiatry Journal 19(1): 60–65, 2010
- Techniques such as systematic sampling, stratified sampling, and cluster sampling, which this paper classifies as random sampling, are best viewed as nonrandom with respect to the total population from which they draw, though they do use random sampling within defined subpopulations. To achieve some semblance of representativeness in a sample drawn this way, the statistics must be population-adjusted (weighted) so that sampled values contribute in proportion to the size of the subpopulation from which they are drawn.
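A minimal sketch of the weighting point, with made-up numbers: a population where stratum A is 80% of the total and stratum B is 20%, sampled with equal n per stratum. The unweighted mean overrepresents the small stratum; weighting by stratum size recovers the population mean.

```python
import random

random.seed(1)

# Hypothetical population: stratum A is 80% of the population with mean 10,
# stratum B is 20% with mean 20, so the true population mean is 12.
strata = {"A": (0.80, 10.0), "B": (0.20, 20.0)}
true_mean = sum(w * m for w, m in strata.values())   # 12.0

# Stratified sample: equal n per stratum (NOT proportional to stratum size).
n = 500
samples = {k: [random.gauss(m, 2) for _ in range(n)]
           for k, (_, m) in strata.items()}

# Naive pooled mean treats every observation equally: biased toward 15.
naive = sum(sum(v) for v in samples.values()) / (2 * n)

# Population-adjusted mean weights each stratum by its population share.
weighted = sum(w * sum(samples[k]) / n for k, (w, _) in strata.items())

print(round(naive, 1), round(weighted, 1))   # weighted lands near 12
```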
- Russell V. Lenth, “Some Practical Guidelines for Effective Sample Size Determination”, The American Statistician 55(3), August 2001
- Note: the word “reects” in this paper is a typo. It should read as “reflects”.
- This paper discusses the theory of power analysis in a limited way. We will look at this topic in more detail in the session on Data Analysis (May 13th).
- The paper makes clear that actual sample size determination in the real world is often “based on nonstatistical criteria” (p. 188). It provides a good account of these practices, both normatively and descriptively.
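For the statistical side, the textbook normal-approximation formula for a two-sample comparison is easy to sketch. This is the standard z-test formula, not a calculation from the paper; the effect size and defaults below are illustrative.

```python
from math import ceil
from statistics import NormalDist

def two_sample_n(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample z-test (normal approximation):
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    return ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

# e.g. to detect a half-standard-deviation difference between two groups:
print(two_sample_n(delta=0.5, sigma=1.0))   # ~63 per group
```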
Slides: http://www.stanford.edu/class/symsys130/SymSys130-4-29-2013.ppt.pdf. Note:
- There is a difference between an unbiased estimator and an unbiased sample. An unbiased estimator is a formula for computing a sample statistic so that its expected value will equal the value of the population parameter it is designed to estimate. Random sampling from a population is unbiased, and therefore leads to unbiased statistics on the assumption that the statistics (e.g. the sample mean and sample variance) are computed using unbiased estimators.
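The classic example of the estimator side of this distinction is the sample variance: dividing the sum of squared deviations by n underestimates the population variance on average, while dividing by n − 1 (Bessel's correction) gives an unbiased estimator. A small simulation, with an assumed population variance of 4:

```python
import random

random.seed(2)

pop_var = 4.0          # population sd = 2.0
n, trials = 5, 20000   # small samples exaggerate the bias

biased, unbiased = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(0, 2) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    biased += ss / n            # divides by n: underestimates on average
    unbiased += ss / (n - 1)    # Bessel's correction: unbiased estimator

# The /n average comes out near pop_var * (n-1)/n = 3.2; the /(n-1)
# average comes out near the true 4.0.
print(round(biased / trials, 2), round(unbiased / trials, 2))
```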
- Clyde H. Coombs, Robyn M. Dawes, and Amos Tversky, “Psychological Measurement Theory” (Chapter 2), Mathematical Psychology: An Elementary Introduction, Prentice-Hall, Inc., 1970, pp. 7-30
- The authors write (p. 17) that “no measurement theory for intelligence is available.” What reasons do they have for saying this?
- The chapter ends with the following statement: “Only when the assumptions of the theory are satisfied by the data can measurement be obtained.” In practice, empirical researchers often make measurements even when their data do not fit a measurement theory. Can you think of an example? What use are such measurements?
- Jean-Claude Falmagne, “Measurement Theory and the Research Psychologist”, Psychological Science 3(2):88-93, 1992
- One question that came up in class was why statements that are scale-dependent (e.g. about ratios on a Fahrenheit or Celsius scale) are not meaningful. Falmagne gives a good analysis of this in his section on “The Meaningfulness of Scientific Statements and Models” (p. 92). He concludes: “opening the door to nonmeaningful models would unavoidably transform the citadel of science into a clamorous Tower of Babel. Scientists from different circles (or countries) would end up spending precious time in fruitless controversies regarding essentially identical, but very different looking, models for the same data.”
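A concrete illustration of scale-dependence (my example, not Falmagne's): the claim “20° is twice as hot as 10°” is true of the Celsius numerals but false of the same temperatures expressed in Fahrenheit, so the ratio statement is not meaningful on interval scales.

```python
def c_to_f(c):
    """Convert Celsius to Fahrenheit (an affine, not linear, transformation)."""
    return c * 9 / 5 + 32

t1_c, t2_c = 10.0, 20.0
ratio_c = t2_c / t1_c                      # 2.0 on the Celsius scale
ratio_f = c_to_f(t2_c) / c_to_f(t1_c)      # 68/50 = 1.36 on Fahrenheit

# The "twice as hot" claim does not survive the change of scale.
print(ratio_c, ratio_f)
```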
Slides: http://www.stanford.edu/class/symsys130/SymSys130-4-24-2013.ppt.pdf. Some key points:
- Psychologists were challenged in the mid-20th Century to show that their field could admit what Norman R. Campbell called “extensive measurement”, i.e. measurements, grounded in an empirical concatenation operation, that can be represented on a ratio scale. Measurement theory shows that there are psychological quantities, such as loudness, that can be measured in this way.
- For more on proper scoring rules, see the Wikipedia articles on “Scoring Rules” and the “Brier Score”. I mangled the explanation a bit in class. The Brier score applies only to a set of mutually exclusive events to which a probability distribution is assigned by the person from whom the probabilities are being elicited (commonly called a “judge” in the parlance of psychology).
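A minimal sketch of the Brier score in its multi-event form: the sum of squared differences between the judge's stated probabilities and the 0/1 indicator of which event actually occurred. The example probabilities are made up.

```python
def brier_score(probs, outcome):
    """Brier score for a probability distribution over mutually exclusive
    events; `outcome` is the index of the event that actually occurred.
    Lower is better; a perfect forecast scores 0."""
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2
               for i, p in enumerate(probs))

# A judge assigns 70% / 20% / 10% to three exclusive events; event 0 occurs.
print(round(brier_score([0.7, 0.2, 0.1], 0), 2))   # 0.09 + 0.04 + 0.01 = 0.14
```

The rule is “proper” in that a judge minimizes their expected score by reporting their true beliefs rather than hedging or exaggerating.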
- Daniel Kahneman and Amos Tversky, “On the Psychology of Prediction”, Psychological Review 80(4):237-251, 1973
- ‘Prediction’ as a task of judgment or scientific modeling has a slightly different meaning from the common one. It does not always have to be about future events.
- The nominal prediction examples in this paper are illustrations of what became known as “base rate neglect” – ignoring or dramatically underweighting prior probabilities in subjective probability updating under Bayes’s Theorem.
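To see what proper weighting of the base rate looks like, here is Bayes's Theorem applied to numbers in the spirit of the paper's engineer/lawyer problems. The 30% base rate echoes one of the paper's conditions; the likelihoods for the personality sketch are my own illustrative assumptions.

```python
def posterior(prior, like_given_h, like_given_not_h):
    """Bayes's Theorem for a binary hypothesis H:
    P(H|E) = P(H)P(E|H) / (P(H)P(E|H) + P(~H)P(E|~H))."""
    num = prior * like_given_h
    return num / (num + (1 - prior) * like_given_not_h)

# Suppose only 30% of the pool are engineers (the base rate), and a
# stereotypical personality sketch is 3x as likely to fit an engineer
# as a lawyer (assumed likelihoods 0.9 vs 0.3).
p = posterior(prior=0.30, like_given_h=0.9, like_given_not_h=0.3)
print(round(p, 4))   # ~0.5625: far from the near-certainty intuition gives
```

Judges who neglect the base rate respond as if only the 3:1 likelihood ratio mattered, which would put the answer near 0.75 regardless of the composition of the pool.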
- Both base rate neglect and nonregressive prediction are manifestations of the general tendency of intuitive predictions to underweight background knowledge. Intuitive judges also pay insufficient attention to the reliability of evidence for predicting an outcome.
- The paper is important for scientific prediction both because it tells us something about the biases of scientists, and because it showcases the difference between theoretical models (probabilistic, regression) and intuitive judgment. In the problems discussed in this paper, the former are normative/more accurate, and the latter are descriptive.
- Daniel Kahneman and Gary Klein, “Conditions for Intuitive Expertise: A Failure to Disagree”, American Psychologist 64(6):515-526, 2009
- This paper shows the difference between the heuristics and biases (HB) approach to judgment, and the naturalistic decision making (NDM) approach. It also shows how they can be joined in a synthesis, which is summarized in the last section.
- The authors discuss numerous studies that have compared intuitive prediction by experts with algorithmic procedures (p. 523). Studies favoring algorithms vastly outnumber those favoring human experts. But there is a sweet spot for human expertise when the environment has highly valid cues that can be learned by a decision maker with enough practice.
- The premortem method (p. 524) is an example of a debiasing procedure for human judgment.
Slides: http://www.stanford.edu/class/symsys130/SymSys130-4-22-2013.ppt.pdf. An additional note:
- Regression toward the mean is often stated as a mere consequence of imperfect correlation. But it also depends on the assumption of a linear estimate, i.e. a linear model. For nonlinear models (e.g. cubic models), an unbiased estimate of one variable based on another may not imply regression to the mean. Regression to the mean is therefore an empirical phenomenon which is implied by the linear model. It may not occur in reality if a nonlinear model better fits observations than a linear one. See http://www.stanford.edu/class/symbsys170/2010/Induction.pdf.
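The linear case can be sketched with a quick simulation (illustrative numbers, standardized variables, correlation r = 0.6): among extreme scorers on the first measure, the second measure averages only about r times as extreme.

```python
import random

random.seed(3)

r = 0.6       # assumed imperfect correlation between two standardized measures
n = 20000
pairs = []
for _ in range(n):
    x = random.gauss(0, 1)
    # Construct y so that corr(x, y) = r and y is also standard normal.
    y = r * x + random.gauss(0, (1 - r * r) ** 0.5)
    pairs.append((x, y))

# Among extreme scorers on the first measure, the second is less extreme:
top = [(x, y) for x, y in pairs if x > 1.5]
mean_x = sum(x for x, _ in top) / len(top)
mean_y = sum(y for _, y in top) / len(top)
print(round(mean_x, 2), round(mean_y, 2))   # mean_y comes out near r * mean_x
```

Under the linear model this shrinkage toward the mean is guaranteed whenever |r| < 1; a well-fitting nonlinear model need not show it.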
- David Hume, “Of the Idea of Necessary Connexion” – Part I, An Enquiry Concerning Human Understanding, Section VII, Harvard Classics Edition (1910), originally published in 1748
- This is a passage in which Hume applies his ideas about the lack of “necessary connection” between events we perceive as being causally related to the relationship between will or volition and the movement of the body. First he tries to see “whether this idea [of the quality that binds effect to cause] be derived from reflection on the operations of our own minds, and be copied from any internal impression.” Going through this exercise, Hume finds us to have “profound ignorance in both cases.”
- “Philosophy of Science”, Wikipedia, version 01:14, 28 March 2013
- Karl R. Popper, “Science as Falsification”, Conjectures and Refutations, Routledge and Kegan Paul, 1963, pp. 33-39
- Popper’s perspective has been very influential both among philosophers of science and among scientists, especially in the social sciences. His hard-nosed exclusion of theories that had used the word “scientific” as self-descriptors (e.g. Marxism) usefully pointed out the difference between theories that pass tests of falsifiability and those that do not. But beyond taking the label of “science” away from these social and behavioral theories, it is not clear what should replace them. People who study human behavior have a much harder task than physicists when it comes to producing rigorous scientific theories that are both true and falsifiable.
Slides: http://www.stanford.edu/class/symsys130/SymSys130-4-17-2013.pdf. Some additional notes:
- In the slide correcting my previous use of the term “genetic fallacy”, the definition I gave of “naturalistic fallacy/bias” is one that is widely used (e.g. by Steven Pinker), but on further reading in Wikipedia I find that the original meaning of “naturalistic fallacy”, from the philosopher G.E. Moore, was explaining “that which is good reductively, in terms of natural properties such as ‘pleasant’ or ‘desirable’”. The is-ought assumption I have always thought of as the naturalistic fallacy is termed there “appeal to nature”. But you will be in good company if you use “naturalistic fallacy” to refer to the assumption that the way things are in nature is the way they ought to be.
- A point we didn’t discuss in class but which I meant to include: Applying the hypothetico-deductive (H-D) method of making predictions from a theory can lead to what is known as confirmation bias (also known as “congruence bias”) in the search for evidence: preferentially looking to find evidence that would confirm a theory rather than looking for evidence that would disconfirm it. Numerous psychological studies have shown that people have this tendency, and scientists must work to avoid it.
- Was Popper a behaviorist? I looked this up and found the following:
- “Popper rejects both behaviorism and psychologism, and maintains that the content of thought, the meanings of words, the semantics of language, are not determined either by the natural laws of the physical world or by the natural laws of psychology. The world of objective knowledge, which is governed by the laws of logic, is a third world that is autonomous from the world of objective physical nature and also from the world of subjective psychology. In The Self and Its Brain he argues against behaviorism and physicalist reductionism by the display of ambiguous drawings that he emphasizes may be interpreted in different ways by voluntary action, in order to demonstrate the existence of world 2, the world of the mind and of subjective mental experiences. He argues against the psychologistic view by stating that the objects of world 3 are intersubjectively testable. Hence there are the three separate worlds which cannot be reduced to one another: world 1 is the world of objective physical nature, world 2 is the subjective world of psychological experience, and world 3 is the objective world of human artifacts or creations including knowledge. Popper emphasizes that while the three worlds interact through world 2, nevertheless the world of objective knowledge is autonomous of the world of subjective psychological experience including perceptual experiences. Advocates of psychologism and the naturalistic theory of the semantics of language fail to recognize the autonomy of world 3 from the other two worlds.”
- And this: “Sir Karl Popper has claimed that behaviorism is misguided because it holds that conditioning occurs through repetition. According to Popper, there is no such thing as learning through repetition. To the limited extent that philosophers of science have concerned themselves with behaviorism, this attack is one of the most direct and unique in that the battleground is not over the value of mentalism/cognitivism but a bold claim that conditioning—the heart and soul of behaviorism—is fictitious.”