- Many philosophers believe that data from naive survey-takers cannot really inform, and should not be used as a standard for, philosophical theories such as those addressing Gettier’s challenge to the justified-true-belief account of knowledge. See a recent paper on tests of the Gettier intuitions.
- Scientific theories generally need to be tethered to ground truths such as data or logical principles that have wide acceptance. What can tether a philosophical theory?
- Negative results in psychology often arise from initial attempts to test a hypothesis that, at least from the experimenter’s point of view, were not performed correctly. But the perspective of Simmons et al. (from the last session’s reading) is that these prior experiments constitute flexibility in data gathering and analysis, which researchers should disclose so that reviewers can assess the likelihood that effects found later are spurious.
- For a critique of design by usability testing, see the work of Jared Spool.
Readings (with discussion led by Tom Wasow):
- Thomas Wasow and Jennifer Arnold, “Intuitions in Linguistic Argumentation”, Lingua 115(11):1481-1496, 2005
- Tom and Jennifer end their paper by contrasting linguistics to physics and lamenting the fact that experiments relevant to generative grammar have little effect on generative grammar theorists. When I asked Tom whether he thought a more explicit division of labor would be appropriate, he demurred, but did say that twenty years ago, when psychologists and linguists got together to discuss language, psychologists were mostly just interested in the data and linguists were mostly just interested in the theory. That is beginning to change, thanks partly to more curricula that emphasize multiple methods.
- Jon Sprouse and Diogo Almeida, “The empirical status of data in syntax: A reply to Gibson and Fedorenko”, Language and Cognitive Processes 28(3):222-228, 2013
- From the discussion: the competence–performance distinction is useful, but quantitative judgment data are relevant to both, not just to performance, in Tom’s view.
- Edward Gibson, Steven T. Piantadosi, and Evelina Fedorenko, “Quantitative methods in syntax/semantics research: A response to Sprouse and Almeida”, Language and Cognitive Processes 28(3):229-240, 2013
- William M. K. Trochim, “Write-Up”, Research Methods Knowledge Base (October 20, 2006 Edition) <http://www.socialresearchmethods.net>
- The full section on “Write-Up” (including “Key Elements”, “Formatting”, and the “Sample Paper”) provides a good overview of how to write a paper in American Psychological Association (APA) format.
- Oded Goldreich, “How to Write a Paper”, January 18, 2013
- This article is geared toward those writing CS theory papers, so it provides an interesting contrast with Trochim’s advice.
- Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything As Significant”, Psychological Science 22(11):1359-1366, 2011
- The “Requirements for Authors” and “Guidelines for Reviewers” in this paper have already had considerable influence on the field of psychology and seem like the wave of the future in many respects.
- Gary Marcus, “Six Ways to Clean Up Science”, The New Yorker, December 24, 2012
- Marcus mentions “meta-analysis” as one powerful way that research can be advanced.
- OPTIONAL: Richard Feynman, “Cargo Cult Science”, Commencement Address, Caltech, 1974
- Reasons why researchers might present or write in a way that too few in their audience can understand: (1) not being able to imagine what is difficult for others to understand about a subject one is very familiar with; (2) misjudging the audience as being more familiar with the subject than they are; (3) applying the bad model of others’ practices in the field and assuming that is just how research should be presented; (4) wanting to signal that (a) one is smart, (b) one has done a lot of work, and (c) the work was difficult.
- Much of the current debate around how research should be presented in psychology revolves around the deficiencies of p-values, which reflect only P(E|H), the probability of the evidence (or of more extreme evidence) given the null hypothesis, when the real quantity of interest is P(H|E), the probability of a hypothesis given the evidence.
- Norbert Wiener, “Cybernetics in History” (Chapter I) and “Role of the Intellectual and the Scientist” (Chapter VIII), The Human Use of Human Beings (1954 Edition), Free Association Books, 1989, pp. 15-27 and 131-135
- Chapter 1 presents a view of technology as it has developed historically. The basic viewpoint is summarized in the last paragraph: machines and communication technologies have been developed “to hold back nature’s tendency toward disorder by adjusting to various purposive ends.” Wiener views such technology as “the cement which binds its fabric together” and as a parallel to biological processes, with the study of nature, humans, and machines being united in “cybernetics”. This seems to be a positive portrayal of technology, but an understanding of the central role of communication has also shifted people’s focus toward it in ways that have been critical of technology.
- In Chapter 8, Wiener’s analysis of the effects of communication technology turns more critical. The expansion of communication technology has shifted the purposes of communication in more commercial directions, leading to what we might now call “lowest common denominator” media that tries to avoid offense and is focused on profit rather than original expression. So Wiener knew, even before Marshall McLuhan, that changes brought about by the development of communication technology could be negative, and this forms one of the first pillars of the ethics of computing and information technology.
- William M. K. Trochim, “Ethics in Research”, Research Methods Knowledge Base (October 20, 2006 Edition) <http://www.socialresearchmethods.net>
- This brief piece describes the main elements of ethical research on human beings as it grew out of earlier histories of unethical research: voluntary participation, informed consent, avoidance of the risk of harm, confidentiality (or anonymity), and the right to service.
- Dan Jones, “A WEIRD View of Human Nature Skews Psychologists’ Studies”, Science 328:1627, 2010
- Martha J. Farah, “Neuroethics: The Ethical, Legal, and Societal Impact of Neuroscience”, Annual Review of Psychology 63:571-591, 2012
- This paper presents a broad overview of ethical issues related to contemporary neuroscience. The discussion of “neuromarketing” provides one particularly potent example of how neuroscience research methods can lead to ethical issues. This is not unique to neuroscience, of course. Increasingly, science and technology can be used to influence people’s thoughts and desires in ways that may not ultimately benefit them.
- OPTIONAL: Yudhijit Bhattacharjee, “The Mind of a Con Man”, New York Times Magazine, April 26, 2013
- Fascinating case study. Stapel’s fraud has shaken the entire field of experimental psychology and led to new initiatives to change what psychologists do and value. How might you change the incentive systems to prevent cases like this from arising?
- JUST FOR LAUGHS: “Kripke Resigns As Report Alleges He Faked Results of Thought Experiments”, fauxphilnews, February 22, 2012
- We discussed the Milgram Experiment and the Stanford Prison Experiment, both of which contributed to the felt need to regulate psychological research, and both of which took place before the current system of institutional review was in place. Thinking about these experiments helps us to understand the elements of best practice that have developed since then: informed consent, privacy, avoidance of harm, etc.
- Researchers are responsible to many different types of interested parties, and the ethics of the field tries to balance these parties’ interests in research practice.
- Don Norman, “Cognitive Engineering”, in Donald A. Norman and Stephen W. Draper (editors), User Centered System Design: New Perspectives on Human-Computer Interaction, Lawrence Erlbaum Associates, 1986, pp. 31-61
- This article by the cognitive psychologist turned HCI guru Don Norman predated his well-known book The Design of Everyday Things. What he calls “cognitive engineering” is closely related to what others call “cognitive ergonomics”: the application of research about cognition to design. See the slides from this session for the points I think are most salient and enduring in Norman’s article.
- Tim Brown, “Design Thinking”, Harvard Business Review, June 2008
- This is a good summary of the ideas behind “design thinking” as practiced and promoted by the influential Palo Alto-based design firm IDEO, of which Tim Brown is the CEO. Design thinking is also the predominant approach at Stanford’s d.school, whose web site contains a crash course in design thinking.
- The case study described in this article revolves around IDEO’s work with Shimano to develop the “coasting bike”. Unfortunately, the product did not sell well and was soon discontinued, so including that example makes this article seem less persuasive than it might be if the chosen example were a successful product.
- Andreas Holzinger, “Usability Engineering Methods for Software Developers”, Communications of the ACM 48(1):71-74, 2005
- Usability research is connected with evaluation, which would be a worthy topic of its own for this course in future years.
- User testing is another aspect of the relationship between cognitive psychology and HCI. While cognitive engineering/ergonomics focuses on bringing psychological principles into design, usability testing adapts and augments the methods of experimental psychology for the refinement and evaluation of designs.
- The proposals that the two teams came up with for redesigning Stanford’s daily class schedule were both interesting and included valuable ideas. I am planning to write up a proposal of my own that will include these ideas, and will share it with all of you when it is done. The Faculty Senate seems likely to take up the issue of the class schedule in 2013-’14. As an exercise, I feel this could be improved by asking design teams to go through an explicit process: discuss and list problems you want to address, then brainstorm solutions, then develop a single proposed design and do a pro-con chart showing how your design compares to the existing system. In a multiple teams approach, designs could then be compared on pro-con charts in relation to each other as well as the existing system, and the groups together could choose one or develop a synthesizing proposal that combines them. It would be interesting to give different instructions to the groups, e.g. one goes through a step-by-step process and another does not, to see how procedural structuring affects the proposal that each group develops.
- John R. Anderson, “Problem Solving” (Chapter 8), Cognitive Psychology and Its Implications, Sixth Edition, Worth Publishers, 2005, pp. 242-278
- Problem solving skills of the type discussed in the reading, and which have been studied in the past by psychologists, are becoming less important in everyday life as we have more access to answers others have generated, through Internet searches and easy communication with those who have encountered such problems. In the real world, it is usually possible to look up the answer to a problem or ask someone who will know it from experience, and as we acquire more experience, novel problem solving gives way to recognition and memory for techniques we have learned in the past. But firms hiring technical workers still like to give brain teasers as interview questions, perhaps because they think the ability to solve these questions indicates general intelligence, or because they want to see the process of thinking an interviewee employs.
- Jeannette M. Wing, “Computational Thinking”, Communications of the ACM 49(3):33-35, 2006
- This paper is a good summary of the ways in which CS knowledge can be applied in everyday life.
- OPTIONAL: George Polya, Part I. “In the Classroom” and Part II. “How to Solve It – A Dialogue”, How to Solve It : A New Aspect of Mathematical Method, Doubleday Anchor Books, 1957, pp. 1-36
- Polya’s book was a landmark for understanding how mathematicians solve problems heuristically. Is the method he advocates applicable to problem solving more generally?
Slides: http://www.stanford.edu/class/symsys130/SymSys130-5-20-2013.pdf. We talked about the following real world problems and how to solve them:
- How to select and plant mushrooms on a half-acre of land on two plots in a way that helps the ecosystem of a permaculture farm?
- How to use flashcards to learn Japanese grammar?
- How to store excess energy produced but not needed at the time of production?
- How to improve educational outcomes in the U.S.?
- Paul M. Churchland, “Cognitive Activity in Artificial Neural Networks”, in D. Osherson and E. E. Smith (editors), An Invitation to Cognitive Science, Volume 3: Thinking, MIT Press, 1990
- This paper discusses NETtalk, a neural network program that solved the text-to-speech problem. You can see a demo of NETtalk on YouTube. The voice of the demo is like that of a child. Does this bias us toward believing that NETtalk is a truer simulation of child language learning?
- David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty, “Building Watson: An Overview of the DeepQA Project”, AI Magazine 31(3), Fall 2010
- DeepQA uses a “kitchen sink” approach to computing answer probabilities. It allows different techniques to weigh in on the answer, rather than opting for one technique, applying the Yogi Berra adage, “When you come to a fork in the road, take it.” Does this tell us anything about debates over which framework is the correct one for understanding cognition (e.g. Bayesian nets, neural nets, logic, heuristic rules)?
- Jonathan Rauch, “Seeing Around Corners”, The Atlantic Monthly, April 2002
- With agent-based modeling, as in many other types of simulation, it can be hard to tell whether we should take the model seriously just because it duplicates some observed pattern of behavior in the real world, when we also know that it makes simplifying or incorrect assumptions. Nonetheless, the methodology is promising and continues to attract growing interest.
Slides: http://www.stanford.edu/class/symsys130/SymSys130-5-15-2013.pdf. Note:
- I defined “model” circularly here, or as we say in computing, “recursively.” 🙂
- William M. K. Trochim, “Analysis” (through “Randomized Block Analysis”), Research Methods Knowledge Base (October 20, 2006 Edition) <http://www.socialresearchmethods.net>
- “Big Data”, Wikipedia, version 18:03, 26 April 2013
- “Big data” just refers to the amount of data. It does not always imply the use of techniques or concepts different from those used with smaller amounts of data. For example, the Preis et al. study noted in this article, showing that countries which generate more searches for the next year than for the past year tend to have higher GDPs, applies very simple between-groups statistical inference.
- A large sample is not guaranteed to be unbiased. See “Critiques of Big Data execution” in the article.
- OPTIONAL: Gary Marcus, “Steamrolled By Big Data”, The New Yorker, April 3, 2013
Slides: http://www.stanford.edu/class/symsys130/SymSys130-5-13-2013.ppt.pdf. Notes:
- An example of a power analysis computation (adapted from Arthur M. Glenberg, Learning from Data: An Introduction to Statistical Reasoning, HBJ, 1988): Suppose a researcher believes that a given population P’ (say, one that has been given specific tutoring in taking IQ tests) scores higher on an IQ test than the general population P. Average IQ is 100 in P, so the null hypothesis is H0: AvgIQ(P’) = 100, and the alternative hypothesis is one-sided, H1: AvgIQ(P’) > 100. The researcher designs a study to test H0 versus H1. If the significance level is alpha = 0.05, the number of test-takers in the sample from P’ is 64, and the standard deviation of the IQ test is 16, then we can compute the power of the test once the researcher specifies the minimum effect size s/he wants to be able to detect. Suppose the researcher chooses an effect size of 0.25 s.d., or 4 IQ points, i.e. the alternative hypothesis is now H1: AvgIQ(P’) = 104. We can then compute the power of giving an IQ test to 64 people drawn at random from P’: Power = Probability(Rejecting H0 | H1 is true). Since IQ scores are generally assumed to be normally distributed around their mean in a population, we can compute this probability by looking up, in a standard table, the area under the normal curve for sample means from P’ above a threshold value determined by alpha. Since it is a one-sided test and alpha = 0.05, the table tells us that we will reject H0 if the z-score of the sample mean is greater than 1.65 [Can you see how we found this?], where z = [SampleMean(P’) - Mean(P)]/StandardError(P’). In this example, SampleMean(P’) is the average of the scores received by the test takers in our sample from P’, Mean(P) = 100 by assumption, since P is the general population, and StandardError(P’) is the standard deviation of SampleMean(P’), which is calculated by dividing the standard deviation of the distribution (= 16) by the square root of the sample size (= 64), so it is 16/Sqrt(64) = 16/8 = 2. With z = 1.65, we can solve for the value of SampleMean(P’) that will be our cutoff for rejecting H0: we will reject H0 if SampleMean(P’) > 100 + 2(1.65) = 103.3. Applying this to our H1 assumption that AvgIQ(P’) = 104, we compute the area above 103.3 under the normal distribution of sample means from that population, given a standard deviation of 16 and a sample size of 64. The z-score for 103.3 in this distribution is (103.3 - 104)/[16/Sqrt(64)] = -0.7/2 = -0.35. From the normal distribution table, this means the probability that we will get a sample mean > 103.3 given H1 is 0.1368 + 0.5 = 0.6368 [Again, see if you can tell where this comes from in the table], and this is our power. So we have a little less than a 64% chance of rejecting H0 given our revised H1 (AvgIQ(P’) = 104). A common heuristic is that power should be at least 0.8, so this test is somewhat underpowered for the effect size we are aiming to show.
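For those who would rather check the power computation above in code than in a table, here is a minimal Python sketch. It is not part of Glenberg's example: the function name `one_sided_power` and its parameter names are made up for illustration, and it assumes, as the example does, a one-sided z-test with a known standard deviation. It uses `NormalDist` from Python's standard `statistics` module (Python 3.8+) in place of the printed normal table.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_power(mu0, mu1, sd, n, alpha=0.05):
    """Power of an upper-tail z-test of H0: mean = mu0 when the true mean is mu1.

    Hypothetical helper for illustration; assumes a known population sd.
    Returns (cutoff for the sample mean, power).
    """
    standard_error = sd / sqrt(n)  # 16 / sqrt(64) = 2 in the example
    # Critical z-score: the point with probability alpha above it under H0.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Sample means above this cutoff lead us to reject H0.
    cutoff = mu0 + z_crit * standard_error
    # Under H1, sample means are distributed Normal(mu1, standard_error);
    # power is the probability that one lands above the cutoff.
    power = 1 - NormalDist(mu1, standard_error).cdf(cutoff)
    return cutoff, power


cutoff, power = one_sided_power(mu0=100, mu1=104, sd=16, n=64)
print(f"reject H0 if sample mean > {cutoff:.2f}; power = {power:.3f}")
```

The result comes out at roughly 0.64, very slightly above the table-based 0.6368, because the code uses the exact critical value (about 1.645) rather than the rounded 1.65. Increasing `n` in the call shows how large a sample would be needed to reach the 0.8 heuristic.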