Welcome to Randomization!
Most of us have encountered the idea that randomization is important for proper experimental design, but we may have received less clear guidance on how to randomize or why it matters so much. We may also be misled by everyday appeals to the "randomness" of winning lottery numbers or other patterns that merely feel like pure chance.
Randomization in experimental design is about systematically and thoughtfully imposing an order onto our treatment of variables. Specifically, randomization is a series of actions that reduce the interference of confounding variables and disturbance variables as you study causal relationships. Intentional action like this doesn’t feel very random, and that’s on purpose.
This unit will help you examine the potential ramifications of failing to control for confounding factors, which can skew your results and prevent you from collecting useful data. It also offers an opportunity to explore a few randomization methods that you can apply in your own research.
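As a concrete preview, permuted-block randomization is one method the unit's readings discuss (e.g., Broglio, 2018; Matts & Lachin, 1988). The sketch below is a minimal, illustrative Python implementation (the function name is ours, not from any package): subjects are allocated in small shuffled blocks so that group sizes stay balanced throughout enrollment.

```python
import random

def permuted_block_randomization(n_subjects, block_size=4, groups=("treatment", "control"), seed=42):
    """Assign subjects to groups in shuffled blocks so group sizes stay balanced throughout."""
    assert block_size % len(groups) == 0, "block size must be a multiple of the number of groups"
    rng = random.Random(seed)              # a fixed seed makes the allocation list reproducible
    per_group = block_size // len(groups)
    allocation = []
    while len(allocation) < n_subjects:
        block = list(groups) * per_group   # each block contains every group equally often
        rng.shuffle(block)                 # order within the block is random
        allocation.extend(block)
    return allocation[:n_subjects]

alloc = permuted_block_randomization(20)
```

After every completed block the arms are exactly balanced, which guards against the group-size drift that simple (coin-flip) randomization can produce in small studies.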
Lesson title
Lesson description
Lesson title
Lesson description
Lesson title
Lesson description
Lesson title
Lesson description
Lesson title
Lesson description
Lesson title
Lesson description
Lesson title
Lesson description
Lesson title
Lesson description
References:
Randomization
Updated May 30, 2025.
Alandra, K. (2023). Introductory Statistics. Bentham Science Publishers; eBook Academic Collection (EBSCOhost). https://research.ebsco.com/linkprocessor/plink?id=f370fade-dbfd-34bc-bfe1-164c0d49d614
Albrechet-Souza, L., Cristina De Carvalho, M., Rodrigues Franci, C., & Brandão, M. L. (2007). Increases in plasma corticosterone and stretched-attend postures in rats naive and previously exposed to the elevated plus-maze are sensitive to the anxiolytic-like effects of midazolam. Hormones and Behavior, 52(2), 267–273. https://doi.org/10.1016/j.yhbeh.2007.05.002
APA Dictionary of Psychology. (n.d.). Retrieved May 23, 2025, from https://dictionary.apa.org/
Athey, S., Imbens, G. W., & Wager, S. (2018). Approximate Residual Balancing: Debiased Inference of Average Treatment Effects in High Dimensions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(4), 597–623. https://doi.org/10.1111/rssb.12268
Bobbitt, Z. (2022, August 16). How (And When) to Use set.seed in R. Statology. https://www.statology.org/set-seed-in-r/
Broglio, K. (2018). Randomization in Clinical Trials: Permuted Blocks and Stratification. JAMA, 319(21), 2223–2224. https://doi.org/10.1001/jama.2018.6360
Chan, A.-W., Tetzlaff, J. M., Altman, D. G., Laupacis, A., Gøtzsche, P. C., Krleža-Jerić, K., Hróbjartsson, A., Mann, H., Dickersin, K., Berlin, J. A., Doré, C. J., Parulekar, W. R., Summerskill, W. S. M., Groves, T., Schulz, K. F., Sox, H. C., Rockhold, F. W., Rennie, D., & Moher, D. (2013). SPIRIT 2013 Statement: Defining Standard Protocol Items for Clinical Trials. Annals of Internal Medicine, 158(3), 200–207. https://doi.org/10.7326/0003-4819-158-3-201302050-00583
Cramer, D., & Howitt, D. (2004). The SAGE Dictionary of Statistics. SAGE Publications, Ltd. https://doi.org/10.4135/9780857020123
Create a blocked randomisation list | Sealed Envelope. (n.d.). Retrieved May 13, 2025, from https://www.sealedenvelope.com/simple-randomiser/v1/lists
EQUATOR Network | Enhancing the QUAlity and Transparency Of Health Research. (n.d.). Retrieved May 23, 2025, from https://www.equator-network.org/
Fornai, F., Longone, P., Cafaro, L., Kastsiuchenka, O., Ferrucci, M., Manca, M. L., Lazzeri, G., Spalloni, A., Bellio, N., Lenzi, P., Modugno, N., Siciliano, G., Isidoro, C., Murri, L., Ruggieri, S., & Paparelli, A. (2008). Lithium delays progression of amyotrophic lateral sclerosis. Proceedings of the National Academy of Sciences, 105(6), 2052–2057. https://doi.org/10.1073/pnas.0708022105
Hilgers, R.-D., Manolov, M., Heussen, N., & Rosenberger, W. F. (2020). Design and analysis of stratified clinical trials in the presence of bias. Statistical Methods in Medical Research, 29(6), 1715–1727. https://doi.org/10.1177/0962280219846146
Huang, W., Percie Du Sert, N., Vollert, J., & Rice, A. S. C. (2019). General Principles of Preclinical Study Design. In A. Bespalov, M. C. Michel, & T. Steckler (Eds.), Good Research Practice in Non-Clinical Pharmacology and Biomedicine (Vol. 257, pp. 55–69). Springer International Publishing. https://doi.org/10.1007/164_2019_277
Imbens, G. W. (2020). Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics. Journal of Economic Literature, 58(4), 1129–1179. https://doi.org/10.1257/jel.20191597
Kang, M., Ragan, B. G., & Park, J.-H. (2008). Issues in Outcomes Research: An Overview of Randomization Techniques for Clinical Trials. Journal of Athletic Training, 43(2), 215–221. https://doi.org/10.4085/1062-6050-43.2.215
Kernan, W. N., Viscoli, C. M., Makuch, R. W., Brass, L. M., & Horwitz, R. I. (1999). Stratified randomization for clinical trials. Journal of Clinical Epidemiology, 52(1), 19–26. https://doi.org/10.1016/s0895-4356(98)00138-3
Kilkenny, C., Browne, W. J., Cuthill, I. C., Emerson, M., & Altman, D. G. (2010). Improving Bioscience Research Reporting: The ARRIVE Guidelines for Reporting Animal Research. PLoS Biology, 8(6), e1000412. https://doi.org/10.1371/journal.pbio.1000412
Knippenberg, S., Thau, N., Dengler, R., & Petri, S. (2010). Significance of behavioural tests in a transgenic mouse model of amyotrophic lateral sclerosis (ALS). Behavioural Brain Research, 213(1), 82–87. https://doi.org/10.1016/j.bbr.2010.04.042
Korngreen, A., Ma, W., Priel, Z., & Silberberg, S. D. (1998). Extracellular ATP directly gates a cation‐selective channel in rabbit airway ciliated epithelial cells. The Journal of Physiology, 508(3), 703–720. https://doi.org/10.1111/j.1469-7793.1998.703bp.x
Lachin, J. M. (1988a). Properties of simple randomization in clinical trials. Controlled Clinical Trials, 9(4), 312–326. https://doi.org/10.1016/0197-2456(88)90046-3
Lachin, J. M. (1988b). Statistical properties of randomization in clinical trials. Controlled Clinical Trials, 9(4), 289–311. https://doi.org/10.1016/0197-2456(88)90045-1
Lachin, J. M., Matts, J. P., & Wei, L. J. (1988). Randomization in clinical trials: conclusions and recommendations. Controlled Clinical Trials, 9(4), 365–374. https://doi.org/10.1016/0197-2456(88)90049-9
Law, J., & Martin, E. A. (Eds.). (2010). A Dictionary of Science (6th ed.). Oxford University Press.
Matts, J. P., & Lachin, J. M. (1988). Properties of permuted-block randomization in clinical trials. Controlled Clinical Trials, 9(4), 327–344. https://doi.org/10.1016/0197-2456(88)90047-5
Monaghan, T. F., Agudelo, C. W., Rahman, S. N., Wein, A. J., Lazar, J. M., Everaert, K., & Dmochowski, R. R. (2021). Blinding in Clinical Trials: Seeing the Big Picture. Medicina (Kaunas, Lithuania), 57(7), 647. https://doi.org/10.3390/medicina57070647
Moraes, A. B., Giacomini, A. C. V. V., Genario, R., Marcon, L., Scolari, N., Bueno, B. W., Demin, K. A., Amstislavskaya, T. G., Strekalova, T., Soares, M. C., De Abreu, M. S., & Kalueff, A. V. (2021). Pro-social and anxiolytic-like behavior following a single 24-h exposure to 17β-estradiol in adult male zebrafish. Neuroscience Letters, 747, 135591. https://doi.org/10.1016/j.neulet.2020.135591
Nevalainen, T. (2014). Animal Husbandry and Experimental Design. ILAR Journal, 55(3), 392–398. https://doi.org/10.1093/ilar/ilu035
Percie du Sert, N., Hurst, V., Ahluwalia, A., Alam, S., Avey, M. T., Baker, M., Browne, W. J., Clark, A., Cuthill, I. C., Dirnagl, U., Emerson, M., Garner, P., Holgate, S. T., Howells, D. W., Karp, N. A., Lazic, S. E., Lidster, K., MacCallum, C. J., Macleod, M., … Würbel, H. (2020). The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research. BMC Veterinary Research, 16(1), 242. https://doi.org/10.1186/s12917-020-02451-y
Retraction: “An Overview of Randomization Techniques: An Unbiased Assessment of Outcome in Clinical Research.” (2023). Journal of Human Reproductive Sciences, 16(1), 87. https://doi.org/10.4103/0974-1208.170593
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. https://doi.org/10.1037/h0037350
Schulz, K. F., Altman, D. G., Moher, D., & for the CONSORT Group. (2010). CONSORT 2010 Statement: Updated Guidelines for Reporting Parallel Group Randomised Trials. PLoS Medicine, 7(3), e1000251. https://doi.org/10.1371/journal.pmed.1000251
Schulz, K. F., Chalmers, I., Altman, D. G., Grimes, D. A., Moher, D., & Hayes, R. J. (2018). “Allocation concealment”: the evolution and adoption of a methodological term. Journal of the Royal Society of Medicine, 111(6), 216–224. https://doi.org/10.1177/0141076818776604
Schulz, K. F., & Grimes, D. A. (2002). Generation of allocation sequences in randomised trials: chance, not choice. The Lancet, 359(9305), 515–519. https://doi.org/10.1016/S0140-6736(02)07683-3
Silcocks, P. (2012). How many strata in an RCT? A flexible approach. British Journal of Cancer, 106(7), 1259–1261. https://doi.org/10.1038/bjc.2012.84
Sorge, R. E., Martin, L. J., Isbester, K. A., Sotocinal, S. G., Rosen, S., Tuttle, A. H., Wieskopf, J. S., Acland, E. L., Dokova, A., Kadoura, B., Leger, P., Mapplebeck, J. C. S., McPhail, M., Delaney, A., Wigerblad, G., Schumann, A. P., Quinn, T., Frasnelli, J., Svensson, C. I., … Mogil, J. S. (2014). Olfactory exposure to males, including men, causes stress and related analgesia in rodents. Nature Methods, 11(6), 629–632. https://doi.org/10.1038/nmeth.2935
Suresh, K. (2011). An overview of randomization techniques: An unbiased assessment of outcome in clinical research. Journal of Human Reproductive Sciences, 4(1), 8. https://doi.org/10.4103/0974-1208.82352
Verhave, P. S., van Eenige, R., & Tiebosch, I. (2024). Methods for applying blinding and randomisation in animal experiments. Laboratory Animals, 58(5), 419–426. https://doi.org/10.1177/00236772241272991
Instructor guide:
Randomization
This instructor guide is designed as a flexible framework to help you deliver this unit effectively.
- Each lesson is summarized with key takeaways and includes cues for additional, step-by-step directions for activities.
- Use the video and supplementary slide-by-slide annotations with speaker notes provided to familiarize yourself with the material, streamline your lesson preparation, and enhance classroom discussions.
- Feel free to adapt and customize the content to fit your teaching style and your students' needs. Get access to the slides here, then navigate to File -> Make a Copy to get started.
Overview and Introduction
Summary:
This unit on confirmation bias explores the subtle yet powerful ways that our predisposition to privilege our prior beliefs can influence every stage of scientific research. By understanding how confirmation bias can distort experimental design, data collection, analysis, and interpretation, students will learn essential strategies to design more rigorous, transparent, and reproducible studies. This unit is ideal for early-career researchers and advanced students who wish to strengthen their critical thinking skills and safeguard against biased reasoning.
Why use this unit:
- This unit on confirmation bias equips students with a fundamental understanding of cognitive biases, especially confirmation bias, which is often the root of many research errors.
- Each lesson blends theoretical insights with practical activities, ensuring that learners not only recognize bias but can also implement strategies to minimize its impact in their work.
- Real-world examples and interactive activities encourage students to reflect on their own decision-making processes and develop more robust research practices.
Lesson 1: Our biased brains
Lesson summary:
This lesson introduces confirmation bias as a fundamental error in cognition. It details how initial beliefs influence the gathering and interpretation of information. The classic Wason task (1960) illustrates the pervasive nature of this bias, and highlights the need for critical evaluation in research.
Goal:
Build an intuition around identifying cognitive bias and taking steps to mitigate it.
Activity overview: (~3-5 minutes)
Participants engage in a modified version of the Wason 2-4-6 task to experience firsthand the important distinction between trying to falsify vs trying to support an initial hypothesis.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-01/
Step-by-Step activity instructions:
- [Instructor] Prompt students to start typing numbers into the 3 input fields. Remind them that they will either see "TRUE" or "FALSE" when they hit the "TEST" button — this indicates if their number sequences match the "secret rule" for this session. Note: Each user will have a random "secret rule," so if you'd like pairs or teams to work together on the same one, have a representative be the only one who pulls up the activity/have them share a single device for that group.
- Type numbers into the 3 input boxes, then hit the "TEST" button to see if your guess is "true" or "false" in regards to the "secret rule."
- Repeat until you're ready to draft a hypothesis.
- Click the "GUESS THE NUMBER RULE" button to reveal an input box.
- Type your hypothesis for what the "secret rule" is into the input box.
- Click "SUBMIT" to discover what the "secret rule" is and if you guessed correctly.
- Click "CONTINUE" to review your guesses in relation to the guesses of others.
- [Instructor] Spend time with students reviewing their guesses. Make sure that by the end of the discussion, there is a conversation about falsifying your hypothesis. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
A hypothesis is only as strong as the effort spent to try and refute it.
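The takeaway can be made concrete with a toy version of the 2-4-6 task. In this illustrative sketch (the hidden rule and hypothesis below are ours, not the activity's actual rules), a confirming strategy only ever tests triples the hypothesis predicts to be true, so an overly narrow hypothesis survives every test; a single falsifying test exposes it.

```python
def secret_rule(a, b, c):
    """A stand-in hidden rule: any strictly ascending triple (as in Wason's 2-4-6 task)."""
    return a < b < c

def my_hypothesis(a, b, c):
    """An overly narrow guess: numbers increasing by exactly 2."""
    return b - a == 2 and c - b == 2

# Confirming strategy: only test triples the hypothesis predicts to be TRUE.
confirming_tests = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]
# Every such test also satisfies the hidden rule, so the (wrong) hypothesis survives.
hypothesis_survives = all(secret_rule(*t) for t in confirming_tests)

# Falsifying strategy: test a triple the hypothesis predicts to be FALSE.
# The hidden rule says TRUE where the hypothesis says FALSE -> hypothesis refuted.
refuted = secret_rule(1, 2, 3) and not my_hypothesis(1, 2, 3)
```

No number of confirming tests could have revealed that the hypothesis was too narrow; only the attempt to refute it did.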
Lesson 2: Formulating rigorous hypotheses
Lesson summary:
This lesson focuses on the pitfalls of designing experiments that fail to rigorously test hypotheses. Instructors will discuss the importance of developing falsifiable and specific hypotheses, in order to minimize the opportunities for confirmation bias to unduly influence hypothesis testing.
Goal:
Learn to design specific, falsifiable, and contextual hypotheses so that experiments provide evidence that helps support well-justified conclusions.
Activity overview: (~10 minutes)
Students will work solo or in small groups to improve a hypothesis and study design. The activity guides students to make choices to assess the conditions under which a hypothesized effect will be true.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-02a/
Step-by-Step activity instructions:
- [Instructor] Students will see the research study they are going to improve. Encourage students to read the displayed Hypothesis and Study Design carefully before proceeding.
- The screen displays the original Hypothesis and Study Design. Click NEXT to advance to Screen 2.
- You will see that you are revising the Hypothesis of the research study. Multiple revised hypothesis options are provided. Guide students to select the hypothesis that best demonstrates the principles of specificity and falsifiability.
- Click Submit to confirm the selection. Feedback will appear explaining the strengths or weaknesses of the selected hypothesis. If you selected the best choice, the selection is disabled and the NEXT button is enabled. If you selected a different choice, you can revise your choice and then re-submit.
- Once the best hypothesis is selected, click NEXT to advance to Screen 3.
- You will now revise the Study Design of the research study. The original hypothesis will be replaced with the improved hypothesis selected on Screen 2, with an indicator that the hypothesis was improved.
- Select an improved Study Design.
- Click Submit to confirm the selection. Feedback about your choice will appear. If you selected the best choice, the selection is disabled and the NEXT button is enabled. If you selected a different choice, you can revise your choice and then re-submit.
- Click NEXT to advance to the Congrats screen.
- [Instructor] Lead discussion while on the results screen. Discuss how changes to the Study Design influenced alignment with the hypothesis. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
Rigorous research requires a specific, falsifiable hypothesis and a study design that tests it.
Lesson 3: Researcher degrees of freedom
Lesson summary:
This lesson unpacks researcher degrees of freedom. These are the myriad design and analysis choices that quietly steer findings toward a favored conclusion. Using an fMRI case study, it shows how post-hoc region selection ("double dipping"), ill-matched controls, and participant pre-screening each bias results toward supporting a localized neural-correlate hypothesis (H₁). The discussion then broadens to general researcher choices and argues that, without forethought, each becomes a conduit for confirmation bias. Countermeasures include formulating specific, falsifiable hypotheses, pre-registering protocols, masking and randomizing, and transparently reporting all results.
Goal:
Recognize and constrain researcher degrees of freedom so experimental choices illuminate truth instead of amplifying confirmation bias.
Activity overview: (~10 minutes)
In this session, learners are given a case study illustrating various choices that can lead to bias. They will then select the choice that produces the largest bias in favor of the provided hypothesis, H1. If you have a small group, consider having a single person show their screen to all participants so that the group can debate these "would you rather" prompts as they go. If you have a larger group, consider dividing them into small groups.
Heads up!
This activity uses some technical language that may be unfamiliar to those without experience in fMRI research. This specific example isn't meant to intimidate you or your students; it's meant to show that we can still spot rigor issues even without deep familiarity with a subject area. Here is a glossary of terms and how they affect hypothesis testing:
- Multiple comparison correction does not favor H1, but may not be ideal in obtaining sufficient power for the study if many voxels will be examined.
- Analyzing the most correlated voxel strongly favors H1 because it double-dips, using the same data to determine the parameters of the analysis and to perform the analysis.
- Stopping data collection early favors H1, because it conditions on the outcome; the amount of data collected will depend on the interim results.
- A simple control seems like a good idea, but adding in another kind of picture (landscapes), could be an unnecessary source of noise.
- Analysis Parameter: Smoothing favors H1 because it loosens the parameters of the analysis to find a promising result.
- Personality screen strongly favors H1 because it selects a biased sample that seems aligned with that hypothesis.
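The "stopping data collection early" item in the glossary above can be demonstrated with a short simulation. The sketch below is purely illustrative (our own toy z-test, not the activity's model): when a researcher peeks at the data every 10 observations and stops as soon as the result is "significant," the false-positive rate climbs well above the nominal 5%, even though the true effect is zero by construction.

```python
import math
import random

def z_significant(xs, z_crit=1.96):
    """Crude z-test of 'mean differs from 0', assuming a known SD of 1 (illustration only)."""
    n = len(xs)
    return abs(sum(xs) / n) * math.sqrt(n) > z_crit

def false_positive_rate(n_sims=2000, n_max=100, peek_every=10, peeking=True, seed=1):
    """Fraction of simulated null experiments (true effect = 0) declared 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        xs = []
        significant = False
        for _ in range(n_max):
            xs.append(rng.gauss(0, 1))   # the null hypothesis is true by construction
            if peeking and len(xs) % peek_every == 0 and z_significant(xs):
                significant = True       # stop as soon as the result "works"
                break
        if not peeking:
            significant = z_significant(xs)
        hits += significant
    return hits / n_sims

fixed_rate = false_positive_rate(peeking=False)   # one test at n = 100: near 5%
peeked_rate = false_positive_rate(peeking=True)   # up to 10 interim looks: noticeably higher
```

Conditioning the sample size on interim results multiplies the chances of a spurious "finding," which is exactly why the glossary flags early stopping as favoring H1.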
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-03
Step-by-Step activity instructions:
- [Instructor] You will have the option of sharing the activity with your students via QR Code or a direct link.
- Read the H1 for the fMRI case study: "Emotional pictures produce stronger BOLD responses than neutral pictures in the amygdala."
- Then read the 2 choices provided and click on the option you believe would most strongly bias the results towards supporting H1.
- Then type out your reasoning for why this action would bias the experiment in favor of H1 in the input box and click the "ROUND 2" button.
- Repeat, then click the "ROUND 3" button.
- Repeat, then click the "SUBMIT" button.
- Compare your results with those of your peers on the final page. You can toggle through everyone's answers and discuss your thought process.
- [Instructor] Lead discussion while on the results screen. Be sure to toggle between the different responses provided in each round. Hover over the visualization to see a pop-up with the text description of the choices. Scroll up and down to reveal student responses. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
Some well-intended research practices introduce bias into our experiments.
Lesson 4: Mitigating bias through masking
Lesson summary:
This lesson explains masking (also called blinding) as a crucial method to reduce observer bias in experiments. Practical examples and step-by-step protocols demonstrate how rigorous masking improves accuracy and reliability of research findings. Students are invited to explore the problematic impact bias has on distorting the effect size of experiment results.
Goal:
Explore the design and implementation of rigorous masking protocols to ensure data collection and analysis remain free from observer bias, a specific type of confirmation bias.
Activity overview: (~1-2 minutes)
Students will explore the impact of bias on spurious results in scientific research.
Heads up!
This activity uses some technical terms. Here are some quick definitions for each of these terms:
- Sample Size: The number of observations or data points included in a study or experiment. Note that sample size should be reported for all levels: typically, the numbers of subjects, assays, and datapoints. Larger sample sizes generally increase the reliability and statistical power of a study.
- Bias Amount: A measure of systematic error introduced into a study, which can distort the results away from the true effect.
- True Effect Size (d): A quantitative measure of the magnitude of the difference or relationship being studied. A larger effect size means a stronger relationship or difference between groups.
- Probability of detecting a significant result: The likelihood that a study will yield a statistically significant result. When a true effect exists, this is the study's power; bias, however, can produce "significant" results even when no true effect exists.
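The relationship among these four terms can be sketched with a normal approximation. This is an illustrative calculation of our own, not the activity's actual model: systematic bias shifts the observed effect away from the true effect, so even with no true effect at all, a modest bias can make a "significant" result more likely than not.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_significant(n, true_d, bias=0.0, z_crit=1.96):
    """Approximate probability that a one-sample test of n observations comes out
    'significant' when systematic bias shifts the observed effect to true_d + bias."""
    shift = (true_d + bias) * math.sqrt(n)
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)

# No true effect, no bias: only the nominal 5% false-positive rate remains.
p_null = prob_significant(n=30, true_d=0.0)
# Still no true effect, but unmasked observers add bias: spurious "findings" become likely.
p_biased = prob_significant(n=30, true_d=0.0, bias=0.5)
```

Note that larger samples make a biased study *more* likely to report a spurious effect, not less; sample size amplifies whatever signal is present, real or not.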
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-04/
Step-by-Step activity instructions:
- [Instructor] It might make the most sense to treat this activity as an extended discussion. Students can still work solo, but the exploration works best with prompting about how different configurations influence effect size while segueing into experiences students have had in their labs or when reading published papers.
- You are provided with two sliding bars. One represents "Sample size" and the other represents "Bias" in a given experiment.
- Click on each dot to move it along its respective sliders.
- Try different pairings to see how these factors distort the relationship between true effect size and the probability of statistical significance.
- The relationships you explore will be visualized in the graph below the 2 sliders.
- [Instructor] If you haven't been a part of the discussion thus far, make sure you lead a brief discussion, and connect this exploration with the next segment of the unit, showcasing the impact of not masking and not randomizing on effect sizes.
Activity takeaway:
A failure to mask can introduce substantial risk of spurious results.
Lesson 5: How good is your mask?
Lesson summary:
This lesson explores the challenges associated with accidental unmasking and the assessment of masking effectiveness in experimental settings. Strategies for identifying subtle cues and preventing unmasking are discussed to ensure that research outcomes remain as unbiased as possible. Students are given the opportunity to develop a take-home tool for use in their own research.
Goal:
Develop an awareness of potential unmasking cues and how to safeguard experimental outcomes through effective masking.
Activity overview: (~12-15 minutes)
Participants will review an experimental scenario and identify practices with the potential to lead to accidental unmasking. They will then propose solutions to improve masking procedures. Make sure you give all participants ample time to read the experiment, and encourage them to use the hint function to proceed if they get stuck.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-05/
Step-by-Step activity instructions:
- [Instructor] Students will have the option to complete this activity as an individual or group. If you want to run it for a class to only see their peers' results, click the "AS A GROUP" button to receive a session link that each member of the class will need to use to have their results coordinate on the results page. If students select the "AS AN INDIVIDUAL" button, the results page at the end will show results from "global users" outside of your class.
- Review the details of the behavioral study on sleep patterns and, solo or with your group, think through all the ways this study might be accidentally unmasked.
- Type your observations into the input box.
- [Instructor] Prompt students to use the hint feature and chat about options with each other to continue thinking about risk points to unmask.
- Click "Get Hint" whenever you feel stuck to help you think of different ways the mask might fail.
- When you are done, click the "NEXT" button.
- Next, along the left side of the screen, you will see the responses of other users along with your own.
- [Instructor] Use this results page as a brief opportunity for discussion where students share about their choices and notes. How and why did they make those decisions?
- On the right side of the screen, you can supply advice about different facets of the study, the Study Team, the Participants, the Data Accumulation, or the Environmental Conditions. You can also skip giving advice and click "Open Checklist" directly.
- [Instructor] Remind students that offering advice is optional, but be sure to give students more time if they choose to do this.
- If you do provide advice, you will be taken to a screen to review advice you and other users have given. You can then click the "Open Checklist" button.
- [Instructor] For the checklist, prompt students to take a moment to consider what they will really need to remember/self-check in the future for their own experiments.
- On the checklist page, feel free to add custom checklist items to help you evaluate your own research using the text entry and "Add" button.
- When you are ready, click the "Download PDF" button to save this list as a PDF.
- [Instructor] This activity is fairly discussion-heavy, but feel free to wrap up the activity with any final thoughts or observations before segueing back into the lesson that picks up on the question of formally assessing if an experiment was properly masked.
Activity takeaway:
Masking issues can arise at multiple places throughout studies, but careful critique and creative solutions can overcome these issues.
Lesson 6: Analytical practices to mitigate bias
Lesson summary:
This lesson shifts the focus from data collection to analysis, and reveals how confirmation bias lurks in every post-collection decision. Using the StudentLife dataset as a sandbox, learners experience how easy it is to unearth a "significant" correlation and then face the subtle distinction between a descriptive pattern and a causal claim. The lesson contrasts exploratory and confirmatory research, highlights questionable practices (like HARKing, p-hacking, and the garden of forking paths), and offers antidotes like clear labeling of exploratory findings, pre-registered analysis plans, and transparent reporting.
Goal:
Master analytic discipline (distinguishing exploratory curiosity from confirmatory testing) to prevent your data-driven insights from drifting into bias-driven illusions.
Activity overview: (~5 min)
Students will work with a sample dataset and one of two preset hypotheses. They will evaluate the dataset to see if it supports their given hypothesis. They will ultimately discuss the implications of making analytic choices with a pre-determined hypothesis.
Heads up!
All participants will be randomly assigned to one of two groups. Half will be asked to demonstrate that daily activities do influence student outcomes, while the other half will be asked to demonstrate that they do not.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-06/
Step-by-Step activity instructions:
- [Instructor] Students will be given 1 of 2 possible hypotheses to inform their thinking. If possible, let students stay in the dark about the existence of a counter-hypothesis their peers might have. Part of the "twist" of the experience is to see how far students will go from exploratory thinking to confirmatory thinking.
- You'll see a hypothesis at the top of the screen in relation to the broader research question: "Do daily activities affect student outcomes?"
- Looking at the visualization of the data provided, use the radio buttons in the 2 categories (Student outcomes and daily activities) to see what types of relationships you notice between a given activity and an outcome.
- [Instructor] While students are working, gently model/prompt the students to explore different choices and notice how selecting various combinations reveals different relationships.
- When you've noticed a relationship that connects to your given hypothesis, leave the radio dials on those 2 items, and then click the "SUBMIT" button.
- Next, you will be able to compare your input with your peers.
- [Instructor] When you lead a discussion, point out that different students had conflicting hypotheses. Discuss what connections they saw. If they selected choices that deliberately "demonstrated" that their hypothesis was "true," discuss how they decided to bypass selections that revealed data that countered their hypothesis. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
It is easy to find a pattern in data when one goes looking for it. Pattern-finding is valuable in exploratory analysis, but it becomes problematic when confirmatory conclusions are drawn from it.
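The takeaway above is easy to demonstrate numerically. In this illustrative sketch (our own, not part of the activity), twenty mutually independent variables stand in for "daily activities" and "student outcomes"; testing every pairwise correlation at the 5% level reliably turns up several "significant" relationships even though none exist by construction.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(7)
n_students, n_vars = 50, 20
# Twenty mutually independent variables: by construction there are NO real relationships.
data = [[rng.gauss(0, 1) for _ in range(n_students)] for _ in range(n_vars)]

CUTOFF = 0.279  # roughly the two-sided 5% critical |r| for n = 50
spurious_hits = sum(
    1
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
    if abs(pearson_r(data[i], data[j])) > CUTOFF
)
n_pairs = n_vars * (n_vars - 1) // 2  # 190 pairs tested -> expect ~9-10 false "findings"
```

With 190 tests at a 5% threshold, roughly nine or ten "discoveries" are expected by chance alone, which is why exploratory findings need to be labeled as such and confirmed on fresh data.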
Lesson 7: Data masking in machine learning models
Lesson summary:
This lesson shows that delegating analysis to algorithms does not free us from confirmation bias. In fact, machine-learning models can easily exploit any hint of data leakage. Using a Parkinson's-disease detector as a case study, the lesson shows how evaluating a model on the same patients used for training produces inflated accuracy. It is effectively a machine-learning analogue of unmasked treatment groups. The counter is rigorous data partitioning (i.e., training, validation, and test sets with zero overlap and representative variability) in addition to vigilance against subtler leaks (like iterative hyper-parameter tuning on the test set, duplicated samples, temporal dependence, or information-rich data augmentations). Without such safeguards, models seem to confirm our hopes in the lab yet falter in reality.
Goal:
Learn to detect and block data leakage so your machine-learning models deliver honest, generalizable performance rather than echoing built-in bias.
Activity overview: (~3 min)
Students will interact with a basic simulation of a machine learning experiment for detecting a condition like Parkinson's Disease, to understand the impacts of leakage on model results.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-07
Step-by-Step activity instructions:
- [Instructor] Students can work together or solo. Explain that each subset (A, B, C, D) represents data from 10 patients. Remind students that toggling a button assigns that subset to use in the Training or Testing set. Emphasize that their selections directly determine how the model is built and evaluated.
- Under Model Setup, take note of the Training Set and Test Set sections. Each section contains four toggle buttons labeled A, B, C, D. Toggling a button on includes that subset of data in the selected set. Toggling it off removes it from that set.
- In the first phase, your goal is to maximize the model's performance. Select all four subsets (A, B, C, D) under the Training Set. Select at least one subset (A, B, C, or D) under the Test Set. When a valid selection is made, the Train Model button will become enabled.
- Click the Train Model button to build your model. The model's performance results will appear in the Results table.
- You will receive a prompt to click NEXT to proceed. If desired, click Reset to clear your selections. You may make alternative Model Setup selections and train another model. Each additional trained model will add a new entry to the Results table.
- In the second phase, your goal is to perform a proper train/test split. Select at least one subset for the Training Set. Select at least one subset for the Test Set. Ensure there is no overlap between the Training and Test subsets. When a valid, non-overlapping selection is made, the Train Model button will become enabled.
- Click the Train Model button. The new model's performance will be added to the Results table.
- Review the models listed in the Results table. Each model includes its Test Accuracy.
- You will be prompted to answer a multiple-choice question: Select which model's Test Accuracy you believe is most indicative of its true accuracy on new data. The answer choices correspond to the models displayed in the Results table.
- Click Submit to lock in your selection. Your chosen model will be highlighted in the Results table. The Reveal Out-of-Sample Accuracy button will become enabled.
- Click Reveal Out-of-Sample Accuracy. An Out-of-Sample (OOS) Accuracy column will be added to the Results table. A Change column will also appear, indicating how performance differs from the Test Accuracy.
- You will receive a feedback prompt encouraging you to reflect on why some models' test accuracies differed from their out-of-sample performance.
- Click COMPLETE ACTIVITY to finish the activity.
- [Instructor] At this stage, lead a discussion and have students share their experiences. Ask students what patterns they observed between Test Accuracy and Out-of-Sample Accuracy. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
Reserving a share of the data that the model never sees during training is a necessary component of building a reliable model.
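For instructors who want to demonstrate the effect outside the activity, the contrast between the two phases can be reproduced in a few lines. The sketch below is purely illustrative: it uses synthetic, made-up data and a hand-rolled nearest-neighbor classifier rather than the activity's actual model. Each synthetic patient's recordings carry an identifying "fingerprint," and the same classifier is scored under a leaky recording-level split and a proper patient-level split:

```python
# Illustrative sketch only (synthetic data; not the activity's real model).
import numpy as np

rng = np.random.default_rng(0)

def make_patients(n_patients, samples_each=20):
    """Each patient contributes several recordings. A patient-specific
    offset acts like an identifying fingerprint in the features."""
    X, y, groups = [], [], []
    for pid in range(n_patients):
        has_disease = pid % 2                  # half the patients are positive
        offset = rng.normal(0.0, 10.0)         # patient fingerprint (leak source)
        for _ in range(samples_each):
            signal = 0.3 * has_disease + rng.normal(0.0, 1.0)  # weak true signal
            X.append([signal + offset, offset + rng.normal(0.0, 0.1)])
            y.append(has_disease)
            groups.append(pid)
    return np.array(X), np.array(y), np.array(groups)

def nn1_accuracy(X_tr, y_tr, X_te, y_te):
    """1-nearest-neighbor: predict the label of the closest training sample."""
    dists = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    preds = y_tr[dists.argmin(axis=1)]
    return float((preds == y_te).mean())

X, y, groups = make_patients(30)

# Leaky split: shuffle individual recordings, so the same patients end up
# in both the training and test sets (the activity's first phase).
idx = rng.permutation(len(y))
leaky_acc = nn1_accuracy(X[idx[:450]], y[idx[:450]], X[idx[450:]], y[idx[450:]])

# Proper split: hold out whole patients (IDs 20-29), mimicking truly new
# data (the activity's second phase).
held_in = groups < 20
clean_acc = nn1_accuracy(X[held_in], y[held_in], X[~held_in], y[~held_in])

print(f"same-patient test accuracy : {leaky_acc:.2f}")  # optimistically inflated
print(f"held-out patient accuracy  : {clean_acc:.2f}")  # closer to reality
```

Because each patient's fingerprint appears on both sides of the leaky split, the nearest neighbor of a test recording is usually another recording from the same patient, so the leaky test accuracy is inflated. The patient-level split stays near chance, reflecting the weak underlying signal, which mirrors the Test Accuracy versus Out-of-Sample Accuracy gap students see in the Results table.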
Lesson 8: Beyond confirmation bias: additional disruptions to rigor
Lesson summary:
This lesson broadens the discussion to include additional biases that can distort science. It emphasizes the importance of adopting systematic safeguards to minimize the cumulative impact of these biases on research integrity.
Goal:
Build a comprehensive understanding of how various cognitive biases and rigor issues connect to each other and negatively impact research.
Activity overview: (~15 min)
Participants will map out a "bias ecosystem" by connecting different biases and discussing real-life examples where these interactions have led to research pitfalls.
Heads up!
This activity introduces a lot of terms for a variety of biases. If students have questions about any particular bias, you can learn more by searching the catalogue of biases.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-08
Step-by-Step activity instructions:
- [Instructor] Note that the landing page offers the option to have students work individually or as a group. If you want the class to see only their peers' results, click the "AS A GROUP" button to receive a session link that each member of the class must use so their results are pooled on the results page. If students select the "AS AN INDIVIDUAL" button, the results page at the end will show results from "global users" outside of your class.
- You should see boxes in the playable field with text describing different types of bias.
- You can click on any box (bias) to drag it to a different position within the field.
- You can click the center of any box to start drawing a line that can be connected to another box (bias).
- Once you connect two biases, click "Click to edit" to type in a description of how these biases are connected.
- A single bias can be connected to more than one other bias.
- Once you've completed your "bias map", you can click "EXPORT AS PNG" or "EXPORT AS PDF" to save.
- Click "SUBMIT" to advance to the next page and compare your bias map with those of other users.
- [Instructor] At this stage, lead a discussion and have students share their experiences and the connections they've made. Segue to the discussion questions in the unit to connect students to the next portion of the lesson.
Activity takeaway:
Different kinds of rigor issues and biases can reinforce, cause, inform, or otherwise relate to one another.
Observations & final notes
Note for Instructors:
Each unit is estimated to comprise approximately 3 hours of instructional time (approximately 15-20 minutes per lesson), but variation in discussion length, student needs, experience with the interactive activities, or instructor customization may yield different unit and lesson durations.
Concepts likely to challenge students:
- Students may have emotional reactions in places where the instructional content differs from their prior experience. This is okay! You're helping them to learn how to do more rigorous work.
- We advise letting disgruntled students express their points of disagreement, then gently encouraging them to consider why the materials might disagree with what they've been taught previously.
- Remember: There's nothing wrong with asking a student to set their grievance aside until you've concluded the unit.
- Think we got it wrong? We want to improve! Email us at c4r@seas.upenn.edu.
- Differentiating between exploratory and confirmatory analysis and the pitfalls of "double dipping" in data.
- Recognizing subtle instances of unmasking in experiments, especially when biases operate at multiple levels.
- Designing research protocols that adequately control for the myriad ways confirmation bias can infiltrate decision-making.
- Navigating the technical details of machine learning modeling, particularly in parsing out different types of cross-validation.
Final Reminder for Instructors:
This guide is intended as a flexible framework for presenting the Confirmation Bias unit. Instructors are encouraged to adapt the content to best suit their teaching style and the needs of their students. Feel free to expand on any section, incorporate additional examples, or integrate further interactive elements. Remember, this is your presentation: use it as a starting point and customize it to best serve your teaching.