Confirmation bias is the tendency to seek out and interpret information in a way that aligns with our preexisting beliefs or hypotheses. It distorts how we conduct research and leads to unreliable results. This unit helps you identify confirmation bias and mitigate its effects through practical approaches like masking.
References:
Confirmation Bias
Bacon, F. (1620). Novum Organum, sive Indicia Vera de Interpretatione Naturae (“New organon, or true directions concerning the interpretation of nature”).
Bebarta, V., Luyten, D., & Heard, K. (2003). Emergency Medicine Animal Research: Does Use of Randomization and Blinding Affect the Results? Academic Emergency Medicine, 10(6), 684–687. https://doi.org/10.1111/j.1553-2712.2003.tb00056.x
Born, R. T. (2024). Stop Fooling Yourself! (Diagnosing and Treating Confirmation Bias). Eneuro, 11(10), ENEURO.0415-24.2024. https://doi.org/10.1523/ENEURO.0415-24.2024
Canli, T., Zhao, Z., Desmond, J. E., Kang, E., Gross, J., & Gabrieli, J. D. E. (2001). An fMRI study of personality influences on brain reactivity to emotional stimuli. Behavioral Neuroscience, 115(1), 33–42. https://doi.org/10.1037/0735-7044.115.1.33
Casad, B. J., & Luebering, J. E. (n.d.). Confirmation bias. Encyclopaedia Britannica. https://www.britannica.com/science/confirmation-bias
Catalog of Bias. (n.d.). https://catalogofbias.org/biases/
Cwiek, A., Rajtmajer, S. M., Wyble, B., Honavar, V., Grossner, E., & Hillary, F. G. (2022). Feeding the machine: Challenges to reproducible predictive modeling in resting-state connectomics. Network Neuroscience, 1–20. https://doi.org/10.1162/netn_a_00212
Darwin, F. (Ed.). (1892). Charles Darwin: his life told in an autobiographical chapter, and in a selected series of his published letters. (abridged edition). https://darwin-online.org.uk/EditorialIntroductions/Freeman_LifeandLettersandAutobiography.html
Ferguson, J., Littman, R., Christensen, G., Paluck, E. L., Swanson, N., Wang, Z., Miguel, E., Birke, D., & Pezzuto, J.-H. (2023). Survey of open science practices and attitudes in the social sciences. Nature Communications, 14(1), 5401. https://doi.org/10.1038/s41467-023-41111-1
Fine, C. (2010). From Scanner to Sound Bite: Issues in Interpreting and Reporting Sex Differences in the Brain. Current Directions in Psychological Science, 19(5), 280–283. https://www.jstor.org/stable/41038586
Gazzaniga, M. S. (2005). Forty-five years of split-brain research and still going strong. Nature Reviews Neuroscience, 6(8), 653–659. https://doi.org/10.1038/nrn1723
Gelman, A., & Loken, E. (2016). The Statistical Crisis in Science. In M. Pitici (Ed.), The Best Writing on Mathematics 2015 (pp. 305–318). Princeton University Press. https://doi.org/10.1515/9781400873371-028
Gregory, R. L. (2015). Eye and Brain: The Psychology of Seeing - Fifth Edition. Princeton University Press. https://doi.org/10.1515/9781400866861
Kaanders, P., Sepulveda, P., Folke, T., Ortoleva, P., & De Martino, B. (2022). Humans actively sample evidence to support prior beliefs. ELife, 11, e71768. https://doi.org/10.7554/eLife.71768
Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103(3), 582–591. https://doi.org/10.1037/0033-295X.103.3.582
Kapoor, S., & Narayanan, A. (2023). Leakage and the reproducibility crisis in machine-learning-based science. Patterns, 4(9), 100804. https://doi.org/10.1016/j.patter.2023.100804
Kappes, A., Harvey, A. H., Lohrenz, T., Montague, P. R., & Sharot, T. (2020). Confirmation bias in the utilization of others’ opinion strength. Nature Neuroscience, 23(1), 130–137. https://doi.org/10.1038/s41593-019-0549-2
Little, M. A., Varoquaux, G., Saeb, S., Lonini, L., Jayaraman, A., Mohr, D. C., & Kording, K. P. (2017). Using and understanding cross-validation strategies. Perspectives on Saeb et al. GigaScience, 6(5). https://doi.org/10.1093/gigascience/gix020
MacLeod, A. K., Coates, E., & Hetherton, J. (2008). Increasing well-being through teaching goal-setting and planning skills: results of a brief intervention. Journal of Happiness Studies, 9(2), 185–196. https://doi.org/10.1007/s10902-007-9057-2
Matthay, E. C., & Glymour, M. M. (2020). A Graphical Catalog of Threats to Validity: Linking Social Science with Epidemiology. Epidemiology, 31(3), 376–384. https://doi.org/10.1097/EDE.0000000000001161
Monaghan, T. F., Agudelo, C. W., Rahman, S. N., Wein, A. J., Lazar, J. M., Everaert, K., & Dmochowski, R. R. (2021). Blinding in Clinical Trials: Seeing the Big Picture. Medicina, 57(7), 647. https://doi.org/10.3390/medicina57070647
Muthukumaraswamy, S. D., Forsyth, A., & Lumley, T. (2021). Blinding and expectancy confounds in psychedelic randomized controlled trials. Expert Review of Clinical Pharmacology, 14(9), 1133–1152. https://doi.org/10.1080/17512433.2021.1933434
Platt, J. R. (1964). Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science, 146(3642), 347–353. https://doi.org/10.1126/science.146.3642.347
Schulz, K. F., Altman, D. G., Moher, D., & for the CONSORT Group. (2010). CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ, 340(mar23 1), c332–c332. https://doi.org/10.1136/bmj.c332
Toga, A. W., & Thompson, P. M. (2003). Mapping brain asymmetry. Nature Reviews Neuroscience, 4(1), 37–48. https://doi.org/10.1038/nrn1009
Tuyttens, F. A. M., De Graaf, S., Heerkens, J. L. T., Jacobs, L., Nalon, E., Ott, S., Stadig, L., Van Laer, E., & Ampe, B. (2014). Observer bias in animal behaviour research: can we believe what we score, if we score what we believe? Animal Behaviour, 90, 273–280. https://doi.org/10.1016/j.anbehav.2014.02.007
Verploegh, I. S. C., Lazar, N. A., Bartels, R. H. M. A., & Volovici, V. (2022). Evaluation of the Use of P Values in Neurosurgical Literature: from Statistical Significance to Clinical Irrelevance. World Neurosurgery, 161, 280-283.e3. https://doi.org/10.1016/j.wneu.2022.02.018
Vesterinen, H. M., Sena, E. S., ffrench-Constant, C., Williams, A., Chandran, S., & Macleod, M. R. (2010). Improving the translational hit of experimental treatments in multiple sclerosis. Multiple Sclerosis Journal, 16(9), 1044–1055. https://doi.org/10.1177/1352458510379612
Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition. Perspectives on Psychological Science, 4(3), 274–290. https://doi.org/10.1111/j.1745-6924.2009.01125.x
Wason, P. C. (1960). On the Failure to Eliminate Hypotheses in a Conceptual Task. Quarterly Journal of Experimental Psychology, 12(3), 129–140. https://doi.org/10.1080/17470216008416717
Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002). Motion illusions as optimal percepts. Nature Neuroscience, 5(6), 598–604. https://doi.org/10.1038/nn0602-858
Ye, J., Li, Y., Lazar, N. A., Schaeffer, D. J., & McDowell, J. E. (2016). Finding common task‐related regions in fMRI data from multiple subjects by periodogram clustering and clustering ensemble. Statistics in Medicine, 35(15), 2635–2651. https://doi.org/10.1002/sim.6906
Zitron-Emanuel, N., & Ganel, T. (2018). Food deprivation reduces the susceptibility to size-contrast illusions. Appetite, 128, 138–144. https://doi.org/10.1016/j.appet.2018.06.006
Instructor guide:
Confirmation Bias
This instructor guide is designed as a flexible framework to help you deliver the Confirmation Bias unit effectively.
- Each lesson is summarized with key takeaways and includes cues for additional, step-by-step directions for activities.
- Use the video and supplementary slide-by-slide annotations with speaker notes provided to familiarize yourself with the material, streamline your lesson preparation, and enhance classroom discussions.
- Feel free to adapt and customize the content to fit your teaching style and your students' needs. Get access to the slides here, then navigate to File -> Make a Copy to get started.
Overview and Introduction
Summary:
This unit on confirmation bias explores the subtle yet powerful ways that our predisposition to privilege our prior beliefs can influence every stage of scientific research. By understanding how confirmation bias can distort experimental design, data collection, analysis, and interpretation, students will learn essential strategies to design more rigorous, transparent, and reproducible studies. This unit is ideal for early-career researchers and advanced students who wish to strengthen their critical thinking skills and safeguard against biased reasoning.
Why use this unit:
- This unit on confirmation bias equips students with a fundamental understanding of cognitive biases, especially confirmation bias, which is often the root of many research errors.
- Each lesson blends theoretical insights with practical activities, ensuring that learners not only recognize bias but can also implement strategies to minimize its impact in their work.
- Real-world examples and interactive activities encourage students to reflect on their own decision-making processes and develop more robust research practices.
Lesson 1: Our biased brains
Lesson summary:
This lesson introduces confirmation bias as a fundamental error in cognition and details how initial beliefs influence the gathering and interpretation of information. The classic Wason task (1960) illustrates the pervasive nature of this bias and highlights the need for critical evaluation in research.
Goal:
Build an intuition around identifying cognitive bias and taking steps to mitigate it.
Activity overview: (~3-5 minutes)
Participants engage in a modified version of the Wason 2-4-6 task to experience firsthand the distinction between trying to falsify and trying to support an initial hypothesis.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-01/
Step-by-Step activity instructions:
- [Instructor] Prompt students to start typing numbers into the 3 input fields. Remind them that they will see either "TRUE" or "FALSE" when they hit the "TEST" button; this indicates whether their number sequence matches the "secret rule" for this session. Note: each user gets a random "secret rule," so if you'd like pairs or teams to work together on the same one, have a single representative pull up the activity and share one device for that group.
- Type numbers into the 3 input boxes, then hit the "TEST" button to see whether your guess is "true" or "false" with respect to the "secret rule."
- Repeat until you're ready to draft a hypothesis.
- Click the "GUESS THE NUMBER RULE" button to reveal an input box.
- Type your hypothesis for what the "secret rule" is into the input box.
- Click "SUBMIT" to discover what the "secret rule" is and if you guessed correctly.
- Click "CONTINUE" to review your guesses in relation to the guesses of others.
- [Instructor] Spend time with students reviewing their guesses. Make sure that by the end of the discussion, there is a conversation about falsifying your hypothesis. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
A hypothesis is only as strong as the effort spent to refute it.
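The logic of the activity can be previewed in a few lines of Python. The rule below is one possible "secret rule" (the classic Wason 2-4-6 rule, used here purely as an illustrative assumption; the activity assigns rules randomly):

```python
def secret_rule(seq):
    """A hypothetical hidden rule: accept any strictly increasing triple.
    (The classic Wason 2-4-6 rule; the activity randomizes its rules.)"""
    a, b, c = seq
    return a < b < c

# Confirming tests: all chosen to FIT the guess "even numbers ascending by 2".
confirming = [(2, 4, 6), (10, 12, 14), (20, 22, 24)]
# Falsifying tests: deliberately chosen to BREAK that guess if it is too narrow.
falsifying = [(1, 2, 3), (3, 2, 1), (2, 4, 100)]

print([secret_rule(s) for s in confirming])  # [True, True, True]: the guess *seems* confirmed
print([secret_rule(s) for s in falsifying])  # [True, False, True]: the rule is broader than the guess
```

Only the falsifying probes reveal that the narrow guess is wrong; confirming probes alone can never distinguish it from the true rule, no matter how many are run.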
Lesson 2: Mitigating bias through masking
Lesson summary:
This lesson explains masking (also called blinding) as a crucial method to reduce observer bias in experiments. Practical examples and step-by-step protocols demonstrate how rigorous masking improves accuracy and reliability of research findings. Students are invited to explore the problematic impact bias has on distorting the effect size of experiment results.
Goal:
Explore the design and implementation of rigorous masking protocols to ensure data collection and analysis remain free from observer bias, a specific type of confirmation bias.
Activity overview: (~1-2 minutes)
Students will explore the impact of bias on spurious results in scientific research.
Heads up!
This activity uses some technical terms. Here are some quick definitions for each of these terms:
- Sample Size: The number of observations or data points included in a study or experiment. Note that sample size should be reported for all levels: typically, the numbers of subjects, assays, and datapoints. Larger sample sizes generally increase the reliability and statistical power of a study.
- Bias Amount: A measure of systematic error introduced into a study, which can distort the results away from the true effect.
- True Effect Size (d): A quantitative measure of the magnitude of the difference or relationship being studied. A larger effect size means a stronger relationship or difference between groups.
- Probability of detecting a significant result: The likelihood that a study will yield a statistically significant result, i.e., one that would be unlikely to arise by chance if there were no true effect. When a true effect exists, this probability is the study's statistical power; when bias is present, it can be inflated even in the absence of any true effect.
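For instructors who want to go deeper, the relationship among these quantities can be sketched with a short simulation. This is a hedged illustration only: the parameter values, the additive-bias model, and the use of a z-test with known variance are simplifying assumptions, not the activity's actual implementation.

```python
import math
import random

def significance_rate(n, true_d=0.0, bias=0.0, trials=2000, seed=0):
    """Fraction of simulated two-group experiments reaching p < 0.05.

    n: sample size per group; true_d: true effect size (Cohen's d, with sd = 1);
    bias: systematic error added to every treated score (e.g. unmasked scoring).
    Uses a two-sample z-test with known sd = 1 for simplicity."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(true_d, 1.0) + bias for _ in range(n)]
        diff = sum(treated) / n - sum(control) / n
        se = math.sqrt(2.0 / n)  # standard error of the mean difference (sd = 1)
        p = math.erfc(abs(diff) / (se * math.sqrt(2.0)))  # two-sided p-value
        hits += p < 0.05
    return hits / trials

# With no true effect, systematic bias alone manufactures "significant" results:
print(significance_rate(n=30, true_d=0.0, bias=0.0))  # close to the nominal 0.05
print(significance_rate(n=30, true_d=0.0, bias=0.5))  # far above 0.05
```

Even with no true effect (d = 0), a modest systematic bias pushes the significance rate far above the nominal 5% false-positive level; this is the pattern the activity's sliders visualize.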
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-04/
Step-by-Step activity instructions:
- [Instructor] It might make the most sense to treat this activity as an extended discussion. Students can still work solo, but the exploration works best when you prompt them to consider how different configurations influence effect size, then segue into experiences students have had in their own labs or when reading published papers.
- You are provided with two sliding bars. One represents "Sample size" and the other represents "Bias" in a given experiment.
- Click on each dot to move it along its respective sliders.
- Try different pairings to see how these factors distort the relationship between true effect size and the probability of statistical significance.
- The relationships you explore will be visualized in the graph below the two sliders.
- [Instructor] If you haven't been a part of the discussion thus far, make sure you lead a brief discussion, and connect this exploration with the next segment of the unit, showcasing the impact of not masking and not randomizing on effect sizes.
Activity takeaway:
A failure to mask can introduce substantial risk of spurious results.
Lesson 3: Data masking in machine learning models
Lesson summary:
This lesson shows that delegating analysis to algorithms does not free us from confirmation bias; machine-learning models readily exploit any hint of data leakage. Using a Parkinson's-disease detector as a case study, the lesson shows how evaluating a model on the same patients used for training produces inflated accuracy, effectively a machine-learning analogue of unmasked treatment groups. The countermeasure is rigorous data partitioning (training, validation, and test sets with zero overlap and representative variability), combined with vigilance against subtler leaks.
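The patient-level leakage mechanism can be sketched with a toy, pure-Python simulation. All names and numbers here are illustrative assumptions, not the activity's actual data or model:

```python
import random

def make_patients(n_patients=40, samples_each=5, seed=1):
    """Toy dataset: each patient has a stable 'signature' feature vector plus
    noise; the label is whether the signature sums above 0 (a stand-in for
    diagnosis)."""
    rng = random.Random(seed)
    data = []  # (patient_id, feature_vector, label)
    for pid in range(n_patients):
        sig = [rng.gauss(0, 1) for _ in range(5)]
        label = int(sum(sig) > 0)
        for _ in range(samples_each):
            x = [s + rng.gauss(0, 0.1) for s in sig]  # near-duplicate samples
            data.append((pid, x, label))
    return data

def nn_accuracy(train, test):
    """1-nearest-neighbour classification accuracy (squared Euclidean distance)."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    correct = 0
    for _, x, y in test:
        pred = min(train, key=lambda t: dist(t[1], x))[2]
        correct += pred == y
    return correct / len(test)

data = make_patients()
# Leaky split: shuffling samples lets the same patient land in both sets.
random.Random(2).shuffle(data)
leaky_train, leaky_test = data[50:], data[:50]
# Proper split: whole patients are held out of training.
clean_train = [d for d in data if d[0] >= 10]
clean_test = [d for d in data if d[0] < 10]
print(nn_accuracy(leaky_train, leaky_test))  # typically near 1.0: same-patient duplicates leak
print(nn_accuracy(clean_train, clean_test))  # usually lower: an honest estimate
```

Because each patient's samples are near-duplicates, the leaky split lets the model effectively memorize patients, so its test accuracy overstates how it would perform on new patients, which is exactly the pattern the activity's Out-of-Sample Accuracy column reveals.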
Goal:
Learn to detect and block data leakage so your machine-learning models deliver honest, generalizable performance rather than echoing built-in bias.
Activity overview: (~3 min)
Students will interact with a basic simulation of a machine learning experiment for detecting a condition like Parkinson's Disease, to understand the impacts of leakage on model results.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-07
Step-by-Step activity instructions:
- [Instructor] Students can work together or solo. Explain that each subset (A, B, C, D) represents data from 10 patients. Remind students that toggling a button assigns that subset to use in the Training or Testing set. Emphasize that their selections directly determine how the model is built and evaluated.
- Under Model Setup, take note of the Training Set and Test Set sections. Each section contains four toggle buttons labeled A, B, C, D. Toggling a button on includes that subset of data in the selected set. Toggling it off removes it from that set.
- In the first phase, your goal is to maximize the model's performance. Select all four subsets (A, B, C, D) under the Training Set. Select at least one subset (A, B, C, or D) under the Test Set. When a valid selection is made, the Train Model button will become enabled.
- Click the Train Model button to build your model. The model's performance results will appear in the Results table.
- You will receive a prompt to click NEXT to proceed. If desired, click Reset to clear your selections. You may make alternative Model Setup selections and train another model. Each additional trained model will add a new entry to the Results table.
- In the second phase, your goal is to perform a proper train/test split. Select at least one subset for the Training Set. Select at least one subset for the Test Set. Ensure there is no overlap between Training and Testing subsets. When a valid, non-overlapping selection is made, the Train Model button will become enabled.
- Click the Train Model button. The new model's performance will be added to the Results table.
- Review the models listed in the Results table. Each model includes its Test Accuracy.
- You will be prompted to answer a multiple-choice question: Select which model's Test Accuracy you believe is most indicative of its true accuracy on new data. The answer choices correspond to the models displayed in the Results table.
- Click Submit to lock in your selection. Your chosen model will be highlighted in the Results table. The Reveal Out-of-Sample Accuracy button will become enabled.
- Click Reveal Out-of-Sample Accuracy. An Out-of-Sample (OOS) Accuracy column will be added to the Results table. A Change column will also appear, indicating how performance differs from the Test Accuracy.
- You will receive a feedback prompt encouraging you to reflect on why some models' test accuracies differed from their out-of-sample performance.
- Click COMPLETE ACTIVITY to finish the activity.
- [Instructor] At this stage, lead discussion and have students share about their experience. Ask students what patterns they observed between Test Accuracy and Out-of-Sample Accuracy. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
Reserving a share of the data for testing is a necessary component of building a reliable model.
Lesson 8: Beyond confirmation bias: additional disruptions to rigor
Lesson summary:
This lesson broadens the discussion to include additional biases that can distort science. It emphasizes the importance of adopting systematic safeguards to minimize the cumulative impact of these biases on research integrity.
Goal:
Build a comprehensive understanding of how various cognitive biases and rigor issues connect to each other and negatively impact research.
Activity overview: (~15 min)
Participants will map out a "bias ecosystem" by connecting different biases and discussing real-life examples where these interactions have led to research pitfalls.
Heads up!
This activity introduces a lot of terms for a variety of biases. If students have questions about any particular bias, you can learn more by searching the Catalog of Bias (see the references).
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-08
Step-by-Step activity instructions:
- [Instructor] Note that there is an option on the landing page for students to work individually or as a group. If you want the class to see only their peers' results, click the "AS A GROUP" button to receive a session link that each member of the class must use so that their results appear together on the results page. If students select the "AS AN INDIVIDUAL" button, the results page at the end will show results from "global users" outside of your class.
- You should see boxes in the playable field with text describing different types of bias.
- You can click on any box (bias) to drag it to a different position within the field.
- You can click the center of any box to start drawing a line that can be connected to another box (bias).
- Once you connect two biases, click "Click to edit" to type in a description of how they are connected.
- You can connect more than one bias to another.
- Once you've completed your "bias map", you can click "EXPORT AS PNG" or "EXPORT AS PDF" to save.
- Click "SUBMIT" to advance to the next page and compare your bias map with other users.
- [Instructor] At this stage, lead a discussion and have students share about their experience and the connections they've made. Segue to the discussion questions in the unit to connect students to the next portion of the lesson.
Activity takeaway:
Different kinds of rigor issues and biases can replicate, cause, inform, or otherwise relate to one another.
Observations & final notes
Note for Instructors:
Each unit is estimated to comprise approximately 3 hours of instructional time (approx. 15-20 minutes per lesson), but variation in discussion length, student needs, experience with the interactive activities, or instructor customization may yield different unit and lesson durations.
Concepts likely to challenge students:
- Differentiating between exploratory and confirmatory analysis and the pitfalls of "double dipping" in data.
- Recognizing subtle instances of unmasking in experiments, especially when biases operate at multiple levels.
- Designing research protocols that adequately control for the myriad ways confirmation bias can infiltrate decision-making.
- Navigating the technical details of machine learning modeling, particularly in parsing out different types of cross-validation.
Handling disagreement:
- Students may have emotional reactions in places where the instructional content differs from their prior experience. This is okay! You're helping them learn how to do more rigorous work.
- We advise letting disgruntled students express their points of disagreement, then gently encouraging them to consider why the materials might disagree with what they've been taught previously.
- Remember: there's nothing wrong with asking a student to hold on to a grievance until you conclude the unit.
- Think we got it wrong? We want to improve! Email us at c4r@seas.upenn.edu.
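On the cross-validation point: a frequent sticking point is the difference between sample-wise and group-wise splitting. The sketch below builds group-wise folds under assumed toy data (the record names are hypothetical); production pipelines might instead use, e.g., scikit-learn's GroupKFold.

```python
from collections import defaultdict

def grouped_folds(records, n_folds=5):
    """Split (group_id, sample) records into folds such that no group spans
    two folds: the grouped analogue of K-fold cross-validation. This keeps
    all of a patient's samples on one side of every train/test boundary."""
    by_group = defaultdict(list)
    for gid, sample in records:
        by_group[gid].append(sample)
    folds = [[] for _ in range(n_folds)]
    # Round-robin whole groups so each fold receives entire patients, never fragments.
    for i, (gid, samples) in enumerate(sorted(by_group.items())):
        folds[i % n_folds].extend((gid, s) for s in samples)
    return folds

# Hypothetical records: 10 patients ("groups"), 3 scans each.
records = [(pid, f"scan-{pid}-{k}") for pid in range(10) for k in range(3)]
for fold in grouped_folds(records):
    # Each fold holds 2 whole patients; no patient appears in two folds.
    assert len({gid for gid, _ in fold}) == 2
```

Contrasting this with a naive shuffle of individual scans makes the leakage discussion from Lesson 3 concrete.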
Final Reminder for Instructors:
This guide is intended as a flexible framework for presenting the Confirmation Bias unit. Instructors are encouraged to adapt the content to best suit their teaching style and the needs of their students. Feel free to expand on any section, incorporate additional examples, or integrate further interactive elements. Remember, this is your presentation: use it as a starting point and customize it to best serve your teaching.