Welcome to Randomization!
Most of us have encountered the idea that randomization is important for proper experimental design, but we may have received less clear guidance on how to randomize or why it matters so much. We can also be misled by everyday images of "randomness," such as winning lottery numbers, that have little to do with what randomization means in research.
Randomization in experimental design is about systematically and thoughtfully imposing an order onto our treatment of variables. Specifically, randomization is a series of actions that reduce the interference of confounding variables and disturbance variables as you study causal relationships. Intentional action like this doesn’t feel very random, and that’s on purpose.
This unit will help you examine the potential ramifications of failing to control for confounding factors, which can skew your results and prevent you from collecting useful data. It also offers an opportunity to explore a few randomization methods that you can apply in your own research.
Instructor Guide: Randomization Unit
This instructor guide is designed as a flexible framework to help you deliver the Randomization unit effectively.
- Each lesson is summarized with key takeaways and includes cues for additional, step-by-step directions for activities.
- Use the video and supplementary slide-by-slide annotations with speaker notes provided to quickly familiarize yourself with the material, streamline your lesson preparation, and enhance classroom discussions.
- Feel free to adapt and customize the content to fit your teaching style and your students' needs. Get access to the slides here, then navigate to File -> Make a Copy to get started.
Overview & Introduction
Summary:
This unit on randomization explores the critical role that randomization plays in experimental design, examining how proper randomization techniques can reduce bias and improve study validity. By understanding how to implement randomization correctly at every stage of research, students will learn essential strategies to design more rigorous, transparent, and reproducible studies. This unit is ideal for early-career researchers and advanced students who wish to strengthen their methodological skills and safeguard against experimental bias.
Why use this unit:
- This unit on randomization equips students with a fundamental understanding of how randomization helps to contain bias and strengthen causal inference in research.
- Each lesson blends theoretical insights with practical activities, ensuring that learners not only recognize the importance of randomization but can also implement various randomization strategies in their work.
- Real-world examples and interactive activities encourage students to reflect on their own research practices and develop more robust experimental designs.
Lesson Overviews
Lesson 1: If you don't randomize you don't know if it is real
Lesson summary:
This lesson introduces the fundamental concept of randomization through an example study of lithium treatment in an ALS mouse model. Students learn how non-random assignment led to confounding between disease stage and treatment groups, compromising the study's conclusions.
Key takeaway:
Students should come away understanding that without proper randomization, confounding variables (both known and unknown) can invalidate experimental conclusions, making it impossible to determine if observed effects are "real."
Activity overview: (~5 minutes)
Students analyze experimental data to identify potential relationships between treatment assignment, survival, and other variables in the ALS mouse model example. They'll discover how disease stage became a confounding variable when mice were selected based on ease of catching.
Link to activity:
https://smi-ran-why-ran-v4.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Direct students to open the activity link to analyze the experimental data from the ALS mouse study.
- Students are assigned different variables to explore their relationships with treatment assignment and survival.
- Encourage students to look for patterns that might explain the apparent treatment effect.
- Guide students to discover that mice assigned to the lithium treatment group were predominantly in earlier disease stages (longer "time to fall" on rotarod).
- Discuss how this confounding occurred because mice in earlier disease stages were harder to catch, so they were more likely to be assigned to treatment after easier-to-catch mice were assigned to control.
- [Instructor] Lead a discussion about how proper randomization would have distributed mice of all disease stages evenly between treatment groups.
Activity takeaway:
Students should recognize how non-random assignment created systematic differences between treatment groups that confounded the results, making it impossible to know if the treatment effect was real.
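The confounding mechanism in this lesson can be sketched in a few lines of Python. This is a hypothetical simulation with invented numbers, not the actual study data: when assignment tracks catchability, and catchability tracks disease stage, the groups differ before any treatment is given.

```python
import random

rng = random.Random(0)

# Hypothetical mice: a higher rotarod "time to fall" means an earlier disease stage
mice = [round(rng.uniform(10, 60), 1) for _ in range(40)]  # time-to-fall scores

# Non-random assignment: sicker mice (low scores) are easier to catch,
# so they fill the control group first
by_catchability = sorted(mice)
control, treated = by_catchability[:20], by_catchability[20:]
mean = lambda group: sum(group) / len(group)
print(f"non-random: control {mean(control):.1f} vs treated {mean(treated):.1f}")

# Proper randomization: shuffle, then split, so disease stages spread evenly
rng.shuffle(mice)
control_r, treated_r = mice[:20], mice[20:]
print(f"randomized: control {mean(control_r):.1f} vs treated {mean(treated_r):.1f}")
```

Because the non-randomly "treated" group starts at an earlier disease stage, any survival difference could reflect stage rather than the drug; shuffling removes that systematic gap, leaving only chance variation.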
Lesson 2: Randomization in the wild: avoiding common mistakes
Lesson summary:
This lesson examines common misconceptions about randomization and reveals how methods that seem random may actually introduce systematic bias. Students learn to distinguish between true randomization and other allocation approaches like alternation or manual allocation.
Key takeaway:
Students should understand that truly random allocation sequences appear different from human-generated "random" sequences, and that proper randomization implementation is critical for eliminating bias.
Activity #1 overview: (~5 minutes)
Students will examine three different allocation sequences, all claimed to be "randomized" in published studies, and determine which one represents true randomization.
Link to activity:
https://smi-ran-all-seqa-v1.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Ask students to open the activity link and examine the three different allocation sequences.
- Have students analyze each sequence to determine which one represents true randomization.
- Discuss the characteristics of each sequence (pattern, balance, runs of the same value).
- Reveal to students that Sequence 3 shows genuine randomization, despite having runs and imbalances that might seem "non-random" to human intuition.
- Point out that Sequence 1 (perfect alternation) and Sequence 2 (human-generated) both contain patterns that make them predictable.
- [Instructor] Discuss real-world consequences of non-random allocation shown in examples from the lesson.
Activity takeaway:
Students should recognize that their intuitions about what "looks random" may be incorrect, and understand the key features that distinguish true randomization from other allocation methods.
Activity #2 overview: (~3 minutes)
Students will examine three different allocation methods: alternation, manual allocation, and simple randomization. They will compare patterns in the effect size, p-values, and balance between treatment groups.
Link to activity:
https://smi-ran-all-seqb-v1.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Ask students to open the activity link and examine the three different allocation sequences.
- Have students analyze each sequence to determine how the effect size, p-values, and group balance patterns differ.
- Discuss the characteristics of each sequence: how does each estimate compare to the true effect, and is that cause for concern?
- Ask how the patterns change at larger sample sizes: are long runs in the simple randomization sequence still as prominent?
- Ask students whether they expect a larger N to reduce the bias of the alternating and manual allocations.
- [Instructor] Discuss real-world consequences of non-random allocation shown in examples from the lesson.
Activity takeaway:
Students should recognize that alternation and manual allocation can introduce bias, such as inflated effect sizes or false positive findings. They may also notice that the long runs they flagged in Activity #1 become less consequential at larger sample sizes.
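Two of the allocation approaches from these activities can be sketched directly (a simplified illustration; genuinely human-generated "manual" sequences can't be simulated faithfully, so only alternation and simple randomization are shown):

```python
import random

def alternation(n):
    """Perfectly alternating allocation: predictable, not random."""
    return ["T" if i % 2 == 0 else "C" for i in range(n)]

def simple_randomization(n, seed=None):
    """A fair coin flip per subject: genuinely random."""
    rng = random.Random(seed)
    return [rng.choice("TC") for _ in range(n)]

def longest_run(seq):
    """Length of the longest streak of identical assignments."""
    best = cur = 1
    for prev, nxt in zip(seq, seq[1:]):
        cur = cur + 1 if prev == nxt else 1
        best = max(best, cur)
    return best

print(longest_run(alternation(50)))           # always 1
print(longest_run(simple_randomization(50)))  # runs of several are expected
```

This is the counterintuitive point of the activity: the genuinely random sequence is the one with runs and local imbalance, while perfect alternation or suspiciously perfect balance is a red flag.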
Lesson 3: Choosing the best randomization method
Lesson summary:
This lesson guides students through the decision-making process for selecting the most appropriate randomization method based on study characteristics. It introduces three main approaches: simple randomization, block randomization, and stratified randomization.
Key takeaway:
Students should learn to assess their study design, sample size, and potential sources of variation to select the optimal randomization approach for their specific research context.
Activity overview: (~5 minutes)
Students will work through a flowchart to determine which randomization method is most appropriate for their research or a case study.
Link to activity:
https://smi-ran-ran-flo-v1.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Introduce the study options: students use a study of their own, a study they know well, or the provided case study.
- Have students consider the study characteristics and potential sources of bias for their chosen study.
- Guide students through the flowchart, answering questions about sample size, important covariates, or sources of bias.
- Based on their answers, students will arrive at a recommendation for which randomization approach is most appropriate.
- [Instructor] Facilitate a discussion about why the recommended method is suitable for the specific study characteristics.
Activity takeaway:
Students should understand the factors that influence the choice of randomization method and be able to apply this decision framework to their own research designs.
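The flowchart's logic can be approximated by a small decision function. This is an illustrative simplification; the thresholds and question wording here are assumptions, not the activity's exact criteria:

```python
def recommend_method(small_sample, known_covariates, batch_or_time_effects):
    """Illustrative decision sketch for choosing a randomization method."""
    if known_covariates:
        # Known prognostic factors should be balanced explicitly
        return "stratified randomization"
    if small_sample or batch_or_time_effects:
        # Small N risks imbalance; batches or time drift call for blocks
        return "block randomization"
    return "simple randomization"

print(recommend_method(small_sample=True, known_covariates=["sex"], batch_or_time_effects=False))
print(recommend_method(small_sample=True, known_covariates=[], batch_or_time_effects=False))
print(recommend_method(small_sample=False, known_covariates=[], batch_or_time_effects=False))
```

The real flowchart asks more nuanced questions, but the ordering above captures the lesson's core heuristic: covariates first, then sample size and logistics.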
Lesson 4: Simple randomization
Lesson summary:
This lesson explores simple randomization, which is the most basic randomization technique and is analogous to flipping a coin for each subject. Students learn about the benefits and limitations of simple randomization, particularly in smaller studies.
Key takeaway:
Students should understand how simple randomization provides the highest level of unpredictability but may result in unbalanced group sizes, especially in smaller studies.
Activity #1 overview: (~3 minutes)
Students will simulate the outcomes of simple randomization using a roulette wheel analogy to see the probability of achieving balanced group sizes with 24 mice.
Link to activity:
https://smi-ran-rou-whe-v0.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Have students open the activity link to simulate spinning a roulette wheel 24 times (representing 24 mice in the study).
- Guide students to observe how many "black" spins they get out of 24, representing one treatment group.
- Repeat the simulation multiple times to show the variation in outcomes.
- Discuss the probability of getting exactly 12 black spins (balanced groups) versus more unbalanced outcomes.
- Extend to simulations with larger sample sizes (100+) to demonstrate the law of large numbers.
- [Instructor] Lead discussion about what problems the students predict for studies with uneven group sizes.
Activity takeaway:
Students should understand that simple randomization cannot guarantee balanced group sizes, especially in smaller studies, which is an important consideration when choosing a randomization method.
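The roulette-wheel result can also be checked analytically. With 24 fair "coin flips," a perfectly balanced 12/12 split is the single most likely outcome, yet it still occurs only about 16% of the time. A minimal sketch:

```python
import math
import random

# Exact probability that simple randomization of 24 mice yields exactly 12 per group
p_exact = math.comb(24, 12) / 2**24
print(f"P(12/12 split): {p_exact:.3f}")  # about 0.161

def simple_randomization(n, rng):
    """Coin-flip assignment; returns the size of the treatment group."""
    return sum(rng.random() < 0.5 for _ in range(n))

# Simulate: how often are groups noticeably unbalanced (a 16/8 split or worse)?
rng = random.Random(0)
trials = [simple_randomization(24, rng) for _ in range(10_000)]
print(sum(t <= 8 or t >= 16 for t in trials) / len(trials))  # roughly 0.15
```

With 100+ subjects the relative imbalance shrinks (the law of large numbers from the extension step), which is why simple randomization is mainly a concern in small studies.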
Activity #2 overview: (~5 minutes)
Students will determine which research tasks should be handled by masked versus unmasked team members to maintain the integrity of randomization in the NMDA receptor antagonist study.
Link to activity:
https://smi-ran-simple-ran-v1.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Introduce the concept of masking (blinding) in research and explain the two teams:
- Team A (Access): Has access to treatment allocation information
- Team B (Masked): Remains masked to which mouse receives which treatment
- Direct students to the activity where they'll assign various research tasks to the appropriate team.
- Have students work individually or in small groups to consider each task and the potential biases that could arise if performed by the wrong team.
- For each task, students should:
- Consider the type of bias it might introduce (selection, performance, observer)
- Decide which team should handle it to minimize bias
- Provide reasoning for their choice
- Key tasks students will assign include:
- Generating the random treatment allocation sequence
- Preparing coded syringes with drugs or saline
- Assessing and recording maze performance of mice
- [Instructor] After completion, facilitate a discussion about why certain tasks must be performed by specific teams:
- Why must Team A prepare the syringes? (to maintain masking)
- Why must Team B conduct behavioral assessments? (to prevent observer bias)
- Why separation of duties is crucial even in small lab settings
- Discuss practical implementation challenges researchers might face in maintaining proper masking procedures.
Activity takeaway:
Students should understand that proper masking requires thoughtful assignment of research tasks to different personnel. They should recognize that effective randomization benefits are preserved only when combined with appropriate masking procedures that prevent selection, performance, and observer biases from influencing the results.
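The task-to-team split at the heart of this activity can be summarized as a simple lookup table (task names are paraphrased from the activity; the rationales use the lesson's three bias categories):

```python
# Who does what in a masked design:
# Team A sees the allocation, Team B stays masked
TASK_ASSIGNMENTS = {
    "generate the random allocation sequence": ("Team A", "allocation concealment"),
    "prepare coded syringes (drug or saline)": ("Team A", "keeps Team B masked"),
    "administer injections by syringe code":   ("Team B", "prevents performance bias"),
    "assess and record maze performance":      ("Team B", "prevents observer bias"),
}

for task, (team, rationale) in TASK_ASSIGNMENTS.items():
    print(f"{team} | {task} | {rationale}")
```

The pattern generalizes: any task that requires knowing the allocation goes to Team A; any task that could be influenced by knowing it goes to Team B.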
Lesson 5: Block randomization
Lesson summary:
This lesson introduces block randomization as a method to ensure balance in group sizes throughout the study. Students learn how blocking by time, space, or procedural factors can mitigate various sources of environmental variation.
Key takeaway:
Students should understand how block randomization creates "mini-experiments" that maintain balance between treatment groups while controlling for temporal, spatial, or procedural sources of variation.
Activity overview: (~3 minutes)
Students will practice generating block randomization sequences and understand how different block sizes and sample sizes impact treatment allocation.
Link to activity:
https://smi-ran-blk-ran-v4.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Guide students to open the activity link.
- Have students experiment with different parameters:
- Number of treatment groups
- Block size
- Target sample size
- Encourage students to observe how the generated sequences maintain balance within each block.
- [Instructor] Lead a discussion about how small versus large block sizes or small versus large sample sizes might impact balance or predictability.
Activity takeaway:
Students should understand how to implement block randomization and recognize its benefits for controlling environmental variation while maintaining balanced group sizes.
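The permuted-block scheme students generate in the activity can be sketched as follows (the group labels and block size are illustrative choices, not the activity's fixed parameters):

```python
import random

def block_randomization(n_subjects, block_size, groups=("T", "C"), seed=None):
    """Permuted-block randomization: each block contains every group equally
    often in a random order, so group sizes stay balanced throughout the study."""
    assert block_size % len(groups) == 0, "block size must be a multiple of group count"
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_subjects:
        block = list(groups) * (block_size // len(groups))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_subjects]

seq = block_randomization(12, block_size=4, seed=1)
print(seq)
# Every block of 4 is exactly balanced, so after any complete block |T - C| == 0
for i in range(0, len(seq), 4):
    assert seq[i:i + 4].count("T") == 2
```

Smaller blocks keep the running balance tighter but make upcoming assignments easier to guess if the block size is known; the discussion prompt above turns on exactly this trade-off.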
Lesson 6: Stratified randomization
Lesson summary:
This lesson focuses on stratified randomization as a technique to balance important covariates across treatment groups. Students learn how stratification can minimize the influence of known prognostic factors on study outcomes.
Key takeaway:
Students should understand how stratified randomization ensures balance of important baseline characteristics across treatment groups, improving the precision of treatment effect estimates.
Activity overview: (~5 minutes)
Students will explore different stratification approaches for a zebrafish aggression study and see how various stratification choices impact the estimated treatment effect.
Link to activity:
https://smi-ran-str-ran-v1.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Have students open the activity link to explore stratification strategies for the zebrafish study.
- Students randomize the zebrafish using a block randomization approach to create two equally sized treatment groups.
- Guide students to consider which variables (sex, family ID, size, swimming speed) are most important to balance.
- Students play with stratification choices and pick one they believe is the most appropriate approach.
- Have students explore outputs showing how different stratification approaches affect the study results.
- [Instructor] Lead a discussion about the trade-offs between not stratifying and overstratification. Guide them to select a stratification choice they think strikes a balance.
Activity takeaway:
Students should understand how to identify and prioritize stratification variables, recognize the dangers of overstratification, and appreciate how stratification affects treatment effect estimates.
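Stratified randomization as used in the zebrafish activity can be sketched by combining stratification with small permuted blocks (hypothetical fish and a single stratification variable, invented for illustration):

```python
import random
from collections import defaultdict

def stratified_randomization(subjects, stratum_of, seed=None):
    """Split subjects into strata, then run permuted blocks of 2
    (one T, one C, in random order) inside each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[stratum_of(s)].append(s)
    arms = {}
    for members in strata.values():
        for i in range(0, len(members) - 1, 2):
            block = ["T", "C"]
            rng.shuffle(block)
            arms[members[i]["id"]] = block[0]
            arms[members[i + 1]["id"]] = block[1]
    return arms

# Hypothetical zebrafish, stratified on sex (8 males, 8 females)
fish = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(16)]
arms = stratified_randomization(fish, stratum_of=lambda f: f["sex"], seed=2)
females_in_T = sum(1 for f in fish if f["sex"] == "F" and arms[f["id"]] == "T")
print(f"females in treatment: {females_in_T} of 8")  # always 4: sex is balanced
```

Stratify only on the few covariates you are confident matter: every added stratification variable multiplies the number of strata and shrinks each one, which is how overstratification leaves strata too small to balance.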
Lesson 7: Randomization beyond treatment assignment
Lesson summary:
This lesson expands the concept of randomization beyond just treatment allocation to encompass all aspects of experimental design. Students learn how randomizing time-based, space-based, and personnel-based factors can reduce various sources of bias.
Key takeaway:
Students should understand that comprehensive randomization strategies should address temporal, spatial, and personnel sources of variation to strengthen study validity.
Activity overview: (~5 minutes)
Students will identify additional opportunities for randomization in a spatial memory study with mice, beyond treatment allocation.
Link to activity:
https://smi-ran-ran-lab-v0.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Have students read about the spatial memory in mice study and review the study flowchart.
- Guide students to identify potential time effects, space effects, and personnel effects that could influence study outcomes.
- Encourage students to brainstorm randomization strategies to address each source of variation.
- Guide students to prioritize their randomization possibilities according to how impactful the randomization step would be in countering bias and how challenging the randomization step would be to implement.
- Compare their answers to the rest of the class and discuss practical implementation challenges and how to prioritize randomization efforts.
- [Instructor] Lead a discussion about balancing methodological rigor with practical constraints in laboratory settings.
Activity takeaway:
Students should recognize that randomization extends beyond treatment assignment and develop strategies to implement comprehensive randomization in their own research.
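The time, space, and personnel factors students identify in the activity can all be randomized with the same shuffle primitive. This is a hypothetical sketch; the cage labels, rack positions, and technician names are invented for illustration:

```python
import random

rng = random.Random(7)
cages = [f"cage_{i}" for i in range(1, 7)]
technicians = ["tech_1", "tech_2"]

# Time: shuffle the daily testing order so time-of-day effects don't track one group
testing_order = cages[:]
rng.shuffle(testing_order)

# Space: shuffle rack positions so light/temperature gradients hit groups evenly
positions = [f"rack_{row}{col}" for row in "AB" for col in "123"]
rng.shuffle(positions)
cage_position = dict(zip(cages, positions))

# Personnel: randomize which technician handles each cage
cage_handler = {cage: rng.choice(technicians) for cage in cages}
print(testing_order, cage_position, cage_handler, sep="\n")
```

In practice these shuffles would be regenerated on a schedule (e.g., a fresh testing order each day) and logged, so the randomization itself stays auditable.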
Lesson 8: Randomization in the literature
Lesson summary:
This final lesson examines common pitfalls in how randomization methods are reported in published research. Students learn about reporting guidelines and best practices for transparent documentation of randomization procedures.
Key takeaway:
Students should understand the importance of detailed reporting of randomization techniques for transparency, reproducibility, and proper evaluation of research quality.
Activity overview: (~5 minutes)
Students will evaluate methods sections from published papers and identify missing or inadequate details about randomization procedures.
Link to activity:
https://smi-ran-ran-lit-v1-ten.vercel.app/
Step-by-Step activity instructions:
- [Instructor] Have students open the activity link to review published methods sections.
- Guide students to identify methodology questions they have for the authors about their randomization procedures.
- Have students drag and drop their questions to the corresponding buckets that relate to the reporting items recommended by the ARRIVE Guidelines.
- Review the improved methods sections that now include all items from ARRIVE.
- [Instructor] Have students discuss which missing items were most important to their interpretation of the study. Discuss how inadequate reporting affects the ability to evaluate study quality and reproducibility.
- [Instructor] Introduce established reporting guidelines like ARRIVE and CONSORT that set standards for thorough documentation of randomization.
Activity takeaway:
Students should recognize common deficiencies in randomization reporting and learn to document their own randomization procedures thoroughly for transparency and reproducibility.
Supplementary Video Presentation
Video recording overview:
A video recording of this unit being presented is provided via the link below. This resource includes detailed, slide-by-slide annotations and relevant speaker notes designed to help you understand the presentation flow, key discussion points, and ensure you have all the insights needed to guide your class effectively.
Link to video:
[Insert Video URL]
Link to slides:
[Insert Slides URL]
Note for Instructors:
Each unit is estimated to comprise approximately 3 hours of instructional time (approx. 15 minutes per lesson), but variation in discussion length, student needs, experience with the interactive activities, or instructor customization may yield different unit and lesson durations.
Observations & Final Notes
Concepts likely to challenge students:
- Understanding that true randomization can produce sequences that don't "look random" to human intuition, including runs and imbalances.
- Distinguishing between different randomization methods and selecting the most appropriate one for a specific research context.
- Recognizing the need to account for blocking and stratification in statistical analysis.
- Implementing comprehensive randomization strategies that address all potential sources of bias beyond just treatment allocation.
- Balancing methodological rigor with practical constraints in laboratory settings.
Think we got it wrong? We want to improve! Email us at c4r@seas.upenn.edu.
Key terms to emphasize:
- Confounding variables/confounders
- Selection bias
- Performance bias
- Observer bias
- Simple randomization
- Block randomization
- Stratified randomization
- Allocation concealment
- Overstratification
- Masking/blinding
Final reminder for instructors:
This guide is intended as a flexible framework for presenting the Randomization unit. Instructors are encouraged to adapt the content to best suit their teaching style and the needs of their students. Feel free to expand on any section, incorporate additional examples, or integrate further interactive elements. Remember, this is your presentation: use it as a starting point and customize it to best serve your teaching.
References: Randomization
Updated May 30, 2025.
Alandra, K. (2023). Introductory Statistics. Bentham Science Publishers; eBook Academic Collection (EBSCOhost). https://research.ebsco.com/linkprocessor/plink?id=f370fade-dbfd-34bc-bfe1-164c0d49d614
Albrechet-Souza, L., Cristina De Carvalho, M., Rodrigues Franci, C., & Brandão, M. L. (2007). Increases in plasma corticosterone and stretched-attend postures in rats naive and previously exposed to the elevated plus-maze are sensitive to the anxiolytic-like effects of midazolam. Hormones and Behavior, 52(2), 267–273. https://doi.org/10.1016/j.yhbeh.2007.05.002
APA Dictionary of Psychology. (n.d.). Retrieved May 23, 2025, from https://dictionary.apa.org/
Athey, S., Imbens, G. W., & Wager, S. (2018). Approximate Residual Balancing: Debiased Inference of Average Treatment Effects in High Dimensions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(4), 597–623. https://doi.org/10.1111/rssb.12268
Bobbitt, Z. (2022, August 16). How (And When) to Use set.seed in R. Statology. https://www.statology.org/set-seed-in-r/
Broglio, K. (2018). Randomization in Clinical Trials: Permuted Blocks and Stratification. JAMA, 319(21), 2223–2224. https://doi.org/10.1001/jama.2018.6360
Chan, A.-W., Tetzlaff, J. M., Altman, D. G., Laupacis, A., Gøtzsche, P. C., Krleža-Jerić, K., Hróbjartsson, A., Mann, H., Dickersin, K., Berlin, J. A., Doré, C. J., Parulekar, W. R., Summerskill, W. S. M., Groves, T., Schulz, K. F., Sox, H. C., Rockhold, F. W., Rennie, D., & Moher, D. (2013). SPIRIT 2013 Statement: Defining Standard Protocol Items for Clinical Trials. Annals of Internal Medicine, 158(3), 200–207. https://doi.org/10.7326/0003-4819-158-3-201302050-00583
Cramer, D., & Howitt, D. (2004). The SAGE Dictionary of Statistics. SAGE Publications, Ltd. https://doi.org/10.4135/9780857020123
Create a blocked randomisation list | Sealed Envelope. (n.d.). Retrieved May 13, 2025, from https://www.sealedenvelope.com/simple-randomiser/v1/lists
EQUATOR Network | Enhancing the QUAlity and Transparency Of Health Research. (n.d.). Retrieved May 23, 2025, from https://www.equator-network.org/
Fornai, F., Longone, P., Cafaro, L., Kastsiuchenka, O., Ferrucci, M., Manca, M. L., Lazzeri, G., Spalloni, A., Bellio, N., Lenzi, P., Modugno, N., Siciliano, G., Isidoro, C., Murri, L., Ruggieri, S., & Paparelli, A. (2008). Lithium delays progression of amyotrophic lateral sclerosis. Proceedings of the National Academy of Sciences, 105(6), 2052–2057. https://doi.org/10.1073/pnas.0708022105
Hilgers, R.-D., Manolov, M., Heussen, N., & Rosenberger, W. F. (2020). Design and analysis of stratified clinical trials in the presence of bias. Statistical Methods in Medical Research, 29(6), 1715–1727. https://doi.org/10.1177/0962280219846146
Huang, W., Percie Du Sert, N., Vollert, J., & Rice, A. S. C. (2019). General Principles of Preclinical Study Design. In A. Bespalov, M. C. Michel, & T. Steckler (Eds.), Good Research Practice in Non-Clinical Pharmacology and Biomedicine (Vol. 257, pp. 55–69). Springer International Publishing. https://doi.org/10.1007/164_2019_277
Imbens, G. W. (2020). Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics. Journal of Economic Literature, 58(4), 1129–1179. https://doi.org/10.1257/jel.20191597
Kang, M., Ragan, B. G., & Park, J.-H. (2008). Issues in Outcomes Research: An Overview of Randomization Techniques for Clinical Trials. Journal of Athletic Training, 43(2), 215–221. https://doi.org/10.4085/1062-6050-43.2.215
Kernan, W. N., Viscoli, C. M., Makuch, R. W., Brass, L. M., & Horwitz, R. I. (1999). Stratified randomization for clinical trials. Journal of Clinical Epidemiology, 52(1), 19–26. https://doi.org/10.1016/s0895-4356(98)00138-3
Kilkenny, C., Browne, W. J., Cuthill, I. C., Emerson, M., & Altman, D. G. (2010). Improving Bioscience Research Reporting: The ARRIVE Guidelines for Reporting Animal Research. PLoS Biology, 8(6), e1000412. https://doi.org/10.1371/journal.pbio.1000412
Knippenberg, S., Thau, N., Dengler, R., & Petri, S. (2010). Significance of behavioural tests in a transgenic mouse model of amyotrophic lateral sclerosis (ALS). Behavioural Brain Research, 213(1), 82–87. https://doi.org/10.1016/j.bbr.2010.04.042
Korngreen, A., Ma, W., Priel, Z., & Silberberg, S. D. (1998). Extracellular ATP directly gates a cation‐selective channel in rabbit airway ciliated epithelial cells. The Journal of Physiology, 508(3), 703–720. https://doi.org/10.1111/j.1469-7793.1998.703bp.x
Lachin, J. M. (1988a). Properties of simple randomization in clinical trials. Controlled Clinical Trials, 9(4), 312–326. https://doi.org/10.1016/0197-2456(88)90046-3
Lachin, J. M. (1988b). Statistical properties of randomization in clinical trials. Controlled Clinical Trials, 9(4), 289–311. https://doi.org/10.1016/0197-2456(88)90045-1
Lachin, J. M., Matts, J. P., & Wei, L. J. (1988). Randomization in clinical trials: conclusions and recommendations. Controlled Clinical Trials, 9(4), 365–374. https://doi.org/10.1016/0197-2456(88)90049-9
Law, J., & Martin, E. A. (Eds.). (2010). A Dictionary of Science (6th ed.). Oxford University Press.
Matts, J. P., & Lachin, J. M. (1988). Properties of permuted-block randomization in clinical trials. Controlled Clinical Trials, 9(4), 327–344. https://doi.org/10.1016/0197-2456(88)90047-5
Monaghan, T. F., Agudelo, C. W., Rahman, S. N., Wein, A. J., Lazar, J. M., Everaert, K., & Dmochowski, R. R. (2021). Blinding in Clinical Trials: Seeing the Big Picture. Medicina (Kaunas, Lithuania), 57(7), 647. https://doi.org/10.3390/medicina57070647
Moraes, A. B., Giacomini, A. C. V. V., Genario, R., Marcon, L., Scolari, N., Bueno, B. W., Demin, K. A., Amstislavskaya, T. G., Strekalova, T., Soares, M. C., De Abreu, M. S., & Kalueff, A. V. (2021). Pro-social and anxiolytic-like behavior following a single 24-h exposure to 17β-estradiol in adult male zebrafish. Neuroscience Letters, 747, 135591. https://doi.org/10.1016/j.neulet.2020.135591
Nevalainen, T. (2014). Animal Husbandry and Experimental Design. ILAR Journal, 55(3), 392–398. https://doi.org/10.1093/ilar/ilu035
Percie du Sert, N., Hurst, V., Ahluwalia, A., Alam, S., Avey, M. T., Baker, M., Browne, W. J., Clark, A., Cuthill, I. C., Dirnagl, U., Emerson, M., Garner, P., Holgate, S. T., Howells, D. W., Karp, N. A., Lazic, S. E., Lidster, K., MacCallum, C. J., Macleod, M., … Würbel, H. (2020). The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research. BMC Veterinary Research, 16(1), 242. https://doi.org/10.1186/s12917-020-02451-y
Retraction: “An Overview of Randomization Techniques: An Unbiased Assessment of Outcome in Clinical Research.” (2023). Journal of Human Reproductive Sciences, 16(1), 87. https://doi.org/10.4103/0974-1208.170593
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. https://doi.org/10.1037/h0037350
Schulz, K. F., Altman, D. G., Moher, D., & for the CONSORT Group. (2010). CONSORT 2010 Statement: Updated Guidelines for Reporting Parallel Group Randomised Trials. PLoS Medicine, 7(3), e1000251. https://doi.org/10.1371/journal.pmed.1000251
Schulz, K. F., Chalmers, I., Altman, D. G., Grimes, D. A., Moher, D., & Hayes, R. J. (2018). “Allocation concealment”: the evolution and adoption of a methodological term. Journal of the Royal Society of Medicine, 111(6), 216–224. https://doi.org/10.1177/0141076818776604
Schulz, K. F., & Grimes, D. A. (2002). Generation of allocation sequences in randomised trials: chance, not choice. The Lancet, 359(9305), 515–519. https://doi.org/10.1016/S0140-6736(02)07683-3
Silcocks, P. (2012). How many strata in an RCT? A flexible approach. British Journal of Cancer, 106(7), 1259–1261. https://doi.org/10.1038/bjc.2012.84
Sorge, R. E., Martin, L. J., Isbester, K. A., Sotocinal, S. G., Rosen, S., Tuttle, A. H., Wieskopf, J. S., Acland, E. L., Dokova, A., Kadoura, B., Leger, P., Mapplebeck, J. C. S., McPhail, M., Delaney, A., Wigerblad, G., Schumann, A. P., Quinn, T., Frasnelli, J., Svensson, C. I., … Mogil, J. S. (2014). Olfactory exposure to males, including men, causes stress and related analgesia in rodents. Nature Methods, 11(6), 629–632. https://doi.org/10.1038/nmeth.2935
Suresh, K. (2011). An overview of randomization techniques: An unbiased assessment of outcome in clinical research. Journal of Human Reproductive Sciences, 4(1), 8. https://doi.org/10.4103/0974-1208.82352
Verhave, P. S., van Eenige, R., & Tiebosch, I. (2024). Methods for applying blinding and randomisation in animal experiments. Laboratory Animals, 58(5), 419–426. https://doi.org/10.1177/00236772241272991
Instructor Guide: Confirmation Bias Unit
This instructor guide is designed as a flexible framework to help you deliver the Confirmation Bias unit effectively.
- Each lesson is summarized with key takeaways and includes cues for additional, step-by-step directions for activities.
- Use the video and supplementary slide-by-slide annotations with speaker notes provided to familiarize yourself with the material, streamline your lesson preparation, and enhance classroom discussions.
- Feel free to adapt and customize the content to fit your teaching style and your students' needs. Get access to the slides here, then navigate to File -> Make a Copy to get started.
Overview and Introduction
Summary:
This unit on confirmation bias explores the subtle yet powerful ways that our predisposition to privilege our prior beliefs can influence every stage of scientific research. By understanding how confirmation bias can distort experimental design, data collection, analysis, and interpretation, students will learn essential strategies to design more rigorous, transparent, and reproducible studies. This unit is ideal for early-career researchers and advanced students who wish to strengthen their critical thinking skills and safeguard against biased reasoning.
Why use this unit:
- This unit on confirmation bias equips students with a fundamental understanding of cognitive biases, especially confirmation bias, which is often the root of many research errors.
- Each lesson blends theoretical insights with practical activities, ensuring that learners not only recognize bias but can also implement strategies to minimize its impact in their work.
- Real-world examples and interactive activities encourage students to reflect on their own decision-making processes and develop more robust research practices.
Lesson 1: Our biased brains
Lesson summary:
This lesson introduces confirmation bias as a fundamental error in cognition. It details how initial beliefs influence the gathering and interpretation of information. The classic Wason task (1960) illustrates the pervasive nature of this bias and highlights the need for critical evaluation in research.
Goal:
Build an intuition around identifying cognitive bias and taking steps to mitigate it.
Activity overview: (~3-5 minutes)
Participants engage in a modified version of the Wason 2-4-6 task to experience firsthand the important distinction between trying to falsify and trying to support an initial hypothesis.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-01/
Step-by-Step activity instructions:
- [Instructor] Prompt students to start typing numbers into the 3 input fields. Remind them that they will see either "TRUE" or "FALSE" when they hit the "TEST" button; this indicates whether their number sequences match the "secret rule" for this session. Note: Each user will have a random "secret rule," so if you'd like pairs or teams to work together on the same one, have a representative be the only one who pulls up the activity, or have the group share a single device.
- Type numbers into the 3 input boxes, then hit the "TEST" button to see if your guess is "true" or "false" with respect to the "secret rule."
- Repeat until you're ready to draft a hypothesis.
- Click the "GUESS THE NUMBER RULE" button to reveal an input box.
- Type your hypothesis for what the "secret rule" is into the input box.
- Click "SUBMIT" to discover what the "secret rule" is and if you guessed correctly.
- Click "CONTINUE" to review your guesses in relation to the guesses of others.
- [Instructor] Spend time with students reviewing their guesses. Make sure that by the end of the discussion, there is a conversation about falsifying your hypothesis. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
A hypothesis is only as strong as the effort spent trying to refute it.
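If you'd like a concrete way to preview the activity's core idea, the falsification dynamic can be sketched in a few lines of Python. This is our own illustration, not part of the activity, and it assumes Wason's classic "any ascending triple" rule; the activity's actual secret rules are randomized per session:

```python
# Hypothetical sketch of the classic Wason 2-4-6 dynamic. The activity's
# secret rules are randomized; here we assume Wason's original rule.
def secret_rule(a, b, c):
    """Wason's actual rule: any strictly ascending triple."""
    return a < b < c

# A student who believes the rule is "even numbers increasing by 2"
# and only runs confirming tests will see nothing but TRUE...
confirming = [(2, 4, 6), (6, 8, 10), (20, 22, 24)]
results_confirming = [secret_rule(*t) for t in confirming]

# ...while a single attempt to falsify the guess reveals it is too narrow.
results_falsifying = {
    (1, 2, 3): secret_rule(1, 2, 3),  # True: the guessed rule was too narrow
    (6, 4, 2): secret_rule(6, 4, 2),  # False: the real rule requires ascent
}
```

Every confirming test passes, so the student's guess feels verified; only the falsifying probes expose that the true rule is far broader.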
Lesson 2: Formulating rigorous hypotheses
Lesson summary:
This lesson focuses on the pitfalls of designing experiments that fail to rigorously test hypotheses. Instructors will discuss the importance of developing falsifiable and specific hypotheses, in order to minimize the opportunities for confirmation bias to unduly influence hypothesis testing.
Goal:
Learn to design specific, falsifiable, and contextual hypotheses so that experiments provide evidence that helps support well-justified conclusions.
Activity overview: (~10 minutes)
Students will work solo or in small groups to improve a hypothesis and study design. The activity guides students to make choices to assess the conditions under which a hypothesized effect will be true.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-02a/
Step-by-Step activity instructions:
- [Instructor] Students will see the research study they are going to improve. Encourage students to read the displayed Hypothesis and Study Design carefully before proceeding.
- The screen displays the original Hypothesis and Study Design. Click NEXT to advance to Screen 2.
- You will see that you are revising the Hypothesis of the research study. Multiple revised hypothesis options are provided. Guide students to select the hypothesis that best demonstrates the principles of specificity and falsifiability.
- Click Submit to confirm the selection. Feedback will appear explaining the strengths or weaknesses of the selected hypothesis. If you selected the best choice, the selection is disabled and the NEXT button is enabled. If you selected a different choice, you can revise your choice and then re-submit.
- Once the best hypothesis is selected, click NEXT to advance to Screen 3.
- You will now revise the Study Design of the research study. The original hypothesis will be replaced with the improved hypothesis selected on Screen 2, with an indicator that the hypothesis was improved.
- Select an improved Study Design.
- Click Submit to confirm the selection. Feedback about your choice will appear. If you selected the best choice, the selection is disabled and the NEXT button is enabled. If you selected a different choice, you can revise your choice and then re-submit.
- Click NEXT to advance to the Congrats screen.
- [Instructor] Lead discussion while on the results screen. Discuss how changes to the Study Design influenced alignment with the hypothesis. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
Rigorous research requires a specific, falsifiable hypothesis and a study design that tests it.
Lesson 3: Researcher degrees of freedom
Lesson summary:
This lesson unpacks researcher degrees of freedom. These are the myriad design and analysis choices that quietly steer findings toward a favored conclusion. Using an fMRI case study, it shows how post-hoc region selection ("double dipping"), ill-matched controls, and participant pre-screening each bias results toward supporting a localized neural-correlate hypothesis (H1). The discussion then broadens to general researcher choices and argues that, without forethought, each becomes a conduit for confirmation bias. Countermeasures include formulating specific, falsifiable hypotheses, pre-registering protocols, masking and randomizing, and transparently reporting all results.
Goal:
Recognize and constrain researcher degrees of freedom so experimental choices illuminate truth instead of amplifying confirmation bias.
Activity overview: (~10 minutes)
In this session, learners are given a case study illustrating various choices that can lead to bias. They will then select the choice that produces the largest bias in favor of the provided hypothesis, H1. If you have a small group, consider having a single person show their screen to all participants so that the group can debate these "would you rather" prompts as they go. If you have a larger group, consider dividing them into small groups.
Heads up!
This activity uses some technical language that may be unfamiliar to those without experience in fMRI research. The point of using this specific example isn't to intimidate you or your students; it's to drive home the point that we can still find rigor issues even if we aren't deeply familiar with a subject area. Here is a glossary of terms and how they affect hypothesis testing:
- Multiple comparison correction does not favor H1, but it may make it harder to obtain sufficient power for the study if many voxels will be examined.
- Analyzing the most correlated voxel strongly favors H1 because it double-dips, using the same data to determine the parameters of the analysis and to perform the analysis.
- Stopping data collection early favors H1, because it conditions on the outcome; the amount of data collected will depend on the interim results.
- A simple control seems like a good idea, but adding in another kind of picture (landscapes) could be an unnecessary source of noise.
- Analysis Parameter: Smoothing favors H1 because it loosens the parameters of the analysis to find a promising result.
- Personality screen strongly favors H1 because it selects a biased sample that seems aligned with that hypothesis.
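If students want to see why double dipping is so seductive, a short simulation (our own illustrative sketch, not part of the activity) shows that picking the most correlated "voxel" out of pure noise manufactures an impressive-looking result:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 500

# Pure noise: by construction, no voxel truly relates to behavior.
behavior = rng.normal(size=n_subjects)
voxels = rng.normal(size=(n_subjects, n_voxels))

# Double dip: use the data to pick the best voxel, then report that voxel.
corrs = np.array([np.corrcoef(behavior, voxels[:, v])[0, 1]
                  for v in range(n_voxels)])
best_voxel = int(np.argmax(np.abs(corrs)))
print(f"selected voxel r = {corrs[best_voxel]:.2f}")  # looks like a finding
```

Because the same data chose the voxel and measured it, the reported correlation is large even though every relationship here is noise; an independent sample would show nothing.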
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-03
Step-by-Step activity instructions:
- [Instructor] You will have the option of sharing the activity with your students via QR Code or a direct link.
- Read the H1 for the fMRI case study: "Emotional pictures produce stronger BOLD responses than neutral pictures in the amygdala."
- Then read the 2 choices provided and click on the option you believe would most strongly bias the results towards supporting H1.
- In the input box, type out your reasoning for why this action would bias the experiment in favor of H1, then click the "ROUND 2" button.
- Repeat, then click the "ROUND 3" button.
- Repeat, then click the "SUBMIT" button.
- Compare your results with those of your peers on the final page. You can toggle through everyone's answers and discuss your thought process.
- [Instructor] Lead discussion while on the results screen. Be sure to toggle between the different responses provided in each round. Hover over the visualization to see a pop-up with the text description of the choices. Scroll up and down to reveal student responses. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
Some well-intended research practices introduce bias into our experiments.
Lesson 4: Mitigating bias through masking
Lesson summary:
This lesson explains masking (also called blinding) as a crucial method to reduce observer bias in experiments. Practical examples and step-by-step protocols demonstrate how rigorous masking improves the accuracy and reliability of research findings. Students are invited to explore how bias distorts the effect size of experimental results.
Goal:
Explore the design and implementation of rigorous masking protocols to ensure data collection and analysis remain free from observer bias, a specific type of confirmation bias.
Activity overview: (~1-2 minutes)
Students will explore how bias increases the risk of spurious results in scientific research.
Heads up!
This activity uses some technical terms. Here are some quick definitions for each of these terms:
- Sample Size: The number of observations or data points included in a study or experiment. Note that sample size should be reported for all levels: typically, the numbers of subjects, assays, and datapoints. Larger sample sizes generally increase the reliability and statistical power of a study.
- Bias Amount: A measure of systematic error introduced into a study, which can distort the results away from the true effect.
- True Effect Size (d): A quantitative measure of the magnitude of the difference or relationship being studied. A larger effect size means a stronger relationship or difference between groups.
- Probability of detecting a significant result: The likelihood that a study yields a statistically significant result, i.e., one unlikely to have occurred by random chance alone. Bias can inflate this probability even when no true effect exists.
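For a concrete feel for how these quantities interact, the slider exploration can be approximated with a short simulation. This is a sketch we supply under simplified assumptions: bias is modeled as a constant shift an unmasked rater adds to the treated group's scores, and significance is judged with a plain two-sample t-test:

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_significant(n, true_d, bias, sims=2000):
    """Fraction of simulated two-group comparisons reaching |t| > 2.02
    (the two-sided 5% critical value for ~38 degrees of freedom).
    `bias` is a constant shift added to the treated group's scores,
    one simple way to model observer bias from a failure to mask."""
    hits = 0
    for _ in range(sims):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(true_d, 1.0, n) + bias
        sp2 = (control.var(ddof=1) + treated.var(ddof=1)) / 2  # pooled variance
        t = (treated.mean() - control.mean()) / np.sqrt(sp2 * 2 / n)
        hits += abs(t) > 2.02
    return hits / sims

p_no_bias = prob_significant(n=20, true_d=0.0, bias=0.0)  # near the nominal 5%
p_biased = prob_significant(n=20, true_d=0.0, bias=0.5)   # inflated by bias alone
```

With a true effect size of zero, the unbiased experiment flags a "significant" result about 5% of the time, while the biased one does so far more often, which is exactly the distortion the activity's sliders visualize.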
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-04/
Step-by-Step activity instructions:
- [Instructor] It might make the most sense to treat this activity as an extended discussion. Students can still work solo, but the exploration works best with prompting about how different configurations influence effect size while segueing into experiences students have had in their labs or when reading published papers.
- You are provided with two sliding bars. One represents "Sample size" and the other represents "Bias" in a given experiment.
- Click on each dot to move it along its respective slider.
- Try different pairings to see how these factors distort the relationship between true effect size and the probability of statistical significance.
- The relationships you explore will be visualized in the graph below the 2 sliders.
- [Instructor] If you haven't been part of the discussion thus far, make sure you lead a brief one now, and connect this exploration with the next segment of the unit, which showcases the impact of not masking and not randomizing on effect sizes.
Activity takeaway:
A failure to mask can introduce substantial risk of spurious results.
Lesson 5: How good is your mask?
Lesson summary:
This lesson explores the challenges associated with accidental unmasking and the assessment of masking effectiveness in experimental settings. Strategies for identifying subtle cues and preventing unmasking are discussed to ensure that research outcomes remain as unbiased as possible. Students are given the opportunity to develop a take-home tool for use in their own research.
Goal:
Develop an awareness of potential unmasking cues and how to safeguard experimental outcomes through effective masking.
Activity overview: (~12-15 minutes)
Participants will review an experimental scenario and identify practices with the potential to lead to accidental unmasking. They will then propose solutions to improve masking procedures. Make sure you give all participants ample time to read the experiment, and encourage them to use the hint function to proceed if they get stuck.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-05/
Step-by-Step activity instructions:
- [Instructor] Students will have the option to complete this activity as an individual or group. If you want the class to see only their peers' results, click the "AS A GROUP" button to receive a session link that each member of the class will need to use so that their results appear together on the results page. If students select the "AS AN INDIVIDUAL" button, the results page at the end will show results from "global users" outside of your class.
- Review the details of the behavioral study on sleep patterns and, solo or with your group, think through all the ways this study might be accidentally unmasked.
- Type your observations into the input box.
- [Instructor] Prompt students to use the hint feature and chat about options with each other to continue thinking about risk points to unmask.
- Click "Get Hint" whenever you feel stuck to help you think of different ways the mask might fail.
- When you are done, click the "NEXT" button.
- Next, along the left side of the screen, you will see the responses of other users along with your own.
- [Instructor] Use this results page as a brief opportunity for discussion where students share about their choices and notes. How and why did they make those decisions?
- On the right side of the screen, you can supply advice about different facets of the study: the Study Team, the Participants, the Data Accumulation, or the Environmental Conditions. You can also skip giving advice and click "Open Checklist" directly.
- [Instructor] Remind students that offering advice is optional, but be sure to give students more time if they choose to do this.
- If you do provide advice, you will be taken to a screen to review advice you and other users have given. You can then click the "Open Checklist" button.
- [Instructor] For the checklist, prompt students to take a moment to consider what they will really need to remember/self-check in the future for their own experiments.
- On the checklist page, feel free to add custom checklist items to help you evaluate your own research using the text entry and "Add" button.
- When you are ready, click the "Download PDF" button to save this list as a PDF.
- [Instructor] This activity is fairly discussion-heavy, but feel free to wrap up the activity with any final thoughts or observations before segueing back into the lesson that picks up on the question of formally assessing if an experiment was properly masked.
Activity takeaway:
Masking issues can arise at multiple places throughout studies, but careful critique and creative solutions can overcome these issues.
Lesson 6: Analytical practices to mitigate bias
Lesson summary:
This lesson shifts the focus from data collection to analysis, and reveals how confirmation bias lurks in every post-collection decision. Using the StudentLife dataset as a sandbox, learners experience how easy it is to unearth a "significant" correlation and then face the subtle distinction between a descriptive pattern and a causal claim. The lesson contrasts exploratory and confirmatory research, highlights questionable practices (like HARKing, p-hacking, and the garden of forking paths), and offers antidotes like clear labeling of exploratory findings, pre-registered analysis plans, and transparent reporting.
Goal:
Master analytic discipline (distinguishing exploratory curiosity from confirmatory testing) to prevent your data-driven insights from drifting into bias-driven illusions.
Activity overview: (~5 min)
Students will work with a sample dataset and one of two preset hypotheses. They will evaluate the dataset to see if it supports their given hypothesis. They will ultimately discuss the implications of making analytic choices with a pre-determined hypothesis.
Heads up!
All participants will be randomly assigned to one of two groups. Half will be asked to demonstrate that daily activities do influence student outcomes, while the other half will be asked to demonstrate that daily activities do not.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-06/
Step-by-Step activity instructions:
- [Instructor] Students will be given 1 of 2 possible hypotheses to inform their thinking. If possible, let students stay in the dark about the existence of a counter-hypothesis their peers might have. Part of the "twist" of the experience is to see how far students will go from exploratory thinking to confirmatory thinking.
- You'll see a hypothesis at the top of the screen in relation to the broader research question: "Do daily activities affect student outcomes?"
- Looking at the visualization of the data provided, use the radio buttons in the 2 categories (Student outcomes and daily activities) to see what types of relationships you notice between a given activity and an outcome.
- [Instructor] While students are working, gently model/prompt the students to explore different choices and notice how selecting various combinations reveals different relationships.
- When you've noticed a relationship that connects to your given hypothesis, leave the radio buttons on those 2 items, and then click the "SUBMIT" button.
- Next, you will be able to compare your input with your peers.
- [Instructor] When you lead a discussion, point out that different students had conflicting hypotheses. Discuss what connections they saw. If they selected choices that deliberately "demonstrated" that their hypothesis was "true," discuss how they decided to bypass selections that revealed data that countered their hypothesis. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
It is easy to find a pattern in data when one goes looking for it. This practice is very important for exploratory analysis, but becomes problematic when conclusions are drawn.
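The ease of finding a pattern when one goes looking can be demonstrated with a quick simulation (our own sketch with hypothetical variable names, not the StudentLife dataset itself). With enough activity-outcome pairs to scan, pure noise reliably yields an apparent relationship:

```python
import numpy as np

rng = np.random.default_rng(2)
n_students = 50

# Ten random "daily activities" and ten random "student outcomes":
# by construction there are no real relationships anywhere.
activities = {f"activity_{i}": rng.normal(size=n_students) for i in range(10)}
outcomes = {f"outcome_{j}": rng.normal(size=n_students) for j in range(10)}

# Scanning all 100 pairs for the strongest correlation still "finds" one.
best_pair = max(
    ((a, o, np.corrcoef(x, y)[0, 1])
     for a, x in activities.items() for o, y in outcomes.items()),
    key=lambda t: abs(t[2]),
)
print(best_pair)  # a pattern worth exploring, but not a confirmed conclusion
```

Treating that strongest pair as exploratory is fine; presenting it as a confirmed finding without testing it on fresh data is the garden of forking paths in action.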
Lesson 7: Data masking in machine learning models
Lesson summary:
This lesson shows that delegating analysis to algorithms does not free us from confirmation bias. In fact, machine-learning models can easily exploit any hint of data leakage. Using a Parkinson's-disease detector as a case study, the lesson shows how evaluating a model on the same patients used for training produces inflated accuracy. It is effectively a machine-learning analogue of unmasked treatment groups. The counter is rigorous data partitioning (i.e., training, validation, and test sets with zero overlap and representative variability) in addition to vigilance against subtler leaks (like iterative hyper-parameter tuning on the test set, duplicated samples, temporal dependence, or information-rich data augmentations). Without such safeguards, models seem to confirm our hopes in the lab yet falter in reality.
Goal:
Learn to detect and block data leakage so your machine-learning models deliver honest, generalizable performance rather than echoing built-in bias.
Activity overview: (~3 min)
Students will interact with a basic simulation of a machine learning experiment for detecting a condition like Parkinson's Disease, to understand the impacts of leakage on model results.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-07
Step-by-Step activity instructions:
- [Instructor] Students can work together or solo. Explain that each subset (A, B, C, D) represents data from 10 patients. Remind students that toggling a button assigns that subset to use in the Training or Testing set. Emphasize that their selections directly determine how the model is built and evaluated.
- Under Model Setup, take note of the Training Set and Test Set sections. Each section contains four toggle buttons labeled A, B, C, D. Toggling a button on includes that subset of data in the selected set. Toggling it off removes it from that set.
- In the first phase, your goal is to maximize the model's performance. Select all four subsets (A, B, C, D) under the Training Set. Select at least one subset (A, B, C, or D) under the Test Set. When a valid selection is made, the Train Model button will become enabled.
- Click the Train Model button to build your model. The model's performance results will appear in the Results table.
- You will receive a prompt to click NEXT to proceed. If desired, click Reset to clear your selections. You may make alternative Model Setup selections and train another model. Each additional trained model will add a new entry to the Results table.
- In the second phase, your goal is to perform a proper train/test split. Select at least one subset for the Training Set. Select at least one subset for the Test Set. Ensure there is no overlap between Training and Testing subsets. When a valid, non-overlapping selection is made, the Train Model button will become enabled.
- Click the Train Model button. The new model's performance will be added to the Results table.
- Review the models listed in the Results table. Each model includes its Test Accuracy.
- You will be prompted to answer a multiple-choice question: Select which model's Test Accuracy you believe is most indicative of its true accuracy on new data. The answer choices correspond to the models displayed in the Results table.
- Click Submit to lock in your selection. Your chosen model will be highlighted in the Results table. The Reveal Out-of-Sample Accuracy button will become enabled.
- Click Reveal Out-of-Sample Accuracy. An Out-of-Sample (OOS) Accuracy column will be added to the Results table. A Change column will also appear, indicating how performance differs from the Test Accuracy.
- You will receive a feedback prompt encouraging you to reflect on why some models' test accuracies differed from their out-of-sample performance.
- Click COMPLETE ACTIVITY to finish the activity.
- [Instructor] At this stage, lead discussion and have students share about their experience. Ask students what patterns they observed between Test Accuracy and Out-of-Sample Accuracy. Use the discussion questions in the unit to help facilitate discussion.
Activity takeaway:
Reserving a share of the data for testing is a necessary component of building a reliable model.
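The patient-overlap leakage at the heart of this activity can be reproduced with a toy "memorizing" classifier (a sketch we supply under simplified assumptions, not the activity's actual model). Labels are assigned at random here, so any above-chance test accuracy can only come from leakage:

```python
import numpy as np

rng = np.random.default_rng(3)

# Each patient contributes 5 noisy copies of a patient-specific "signature".
# Labels are random, so no model can honestly beat chance.
n_patients, samples_per = 30, 5
X, y, patient_id = [], [], []
for p in range(n_patients):
    signature, label = rng.normal(size=8), p % 2
    for _ in range(samples_per):
        X.append(signature + rng.normal(scale=0.1, size=8))
        y.append(label)
        patient_id.append(p)
X, y, patient_id = np.array(X), np.array(y), np.array(patient_id)

def nn_accuracy(train_idx, test_idx):
    """1-nearest-neighbour accuracy: a stand-in for a model that memorizes."""
    hits = sum(
        y[train_idx][np.argmin(np.linalg.norm(X[train_idx] - X[i], axis=1))] == y[i]
        for i in test_idx)
    return hits / len(test_idx)

# Leaky split: shuffling samples puts every test patient in training too.
shuffled = rng.permutation(len(X))
leaky_acc = nn_accuracy(shuffled[:105], shuffled[105:])

# Proper split: hold out whole patients, with zero overlap.
honest_acc = nn_accuracy(np.where(patient_id < 20)[0], np.where(patient_id >= 20)[0])
```

The leaky split scores near-perfectly while the patient-level split falls back toward chance, mirroring the gap between Test Accuracy and Out-of-Sample Accuracy in the activity's Results table.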
Lesson 8: Beyond confirmation bias: additional disruptions to rigor
Lesson summary:
This lesson broadens the discussion to include additional biases that can distort science. It emphasizes the importance of adopting systematic safeguards to minimize the cumulative impact of these biases on research integrity.
Goal:
Build a comprehensive understanding of how various cognitive biases and rigor issues connect to each other and negatively impact research.
Activity overview: (~15 min)
Participants will map out a "bias ecosystem" by connecting different biases and discussing real-life examples where these interactions have led to research pitfalls.
Heads up!
This activity introduces a lot of terms for a variety of biases. If students have questions about any particular bias, you can learn more by searching the catalogue of biases.
Link to activity:
https://monolith-test-1.vercel.app/activities/hms-cbi-08
Step-by-Step activity instructions:
- [Instructor] Note that there is an option on the landing page to have students work individually or as a group. If you want the class to see only their peers' results, click the "AS A GROUP" button to receive a session link that each member of the class will need to use so that their results appear together on the results page. If students select the "AS AN INDIVIDUAL" button, the results page at the end will show results from "global users" outside of your class.
- You should see boxes in the playable field with text describing different types of bias.
- You can click on any box (bias) to drag it to a different position within the field.
- You can click the center of any box to start drawing a line that can be connected to another box (bias).
- Once you connect 2 biases, click "Click to edit" to type in a description of how these biases are connected.
- You can connect more than one bias to another.
- Once you've completed your "bias map", you can click "EXPORT AS PNG" or "EXPORT AS PDF" to save.
- Click "SUBMIT" to advance to the next page and compare your bias map with other users.
- [Instructor] At this stage, lead a discussion and have students share about their experience and the connections they've made. Segue to the discussion questions in the unit to connect students to the next portion of the lesson.
Activity takeaway:
Different kinds of rigor issues and biases can replicate, cause, inform, or otherwise relate to one another.
Observations & final notes
Note for Instructors:
Each unit is estimated to comprise approximately 3 hours of instructional time (approx. 15-20 minutes per lesson), but variation in discussion length, student needs, experience with the interactive activities, or instructor customization may yield different unit and lesson durations.
Concepts likely to challenge students:
- Differentiating between exploratory and confirmatory analysis and the pitfalls of "double dipping" in data.
- Recognizing subtle instances of unmasking in experiments, especially when biases operate at multiple levels.
- Designing research protocols that adequately control for the myriad ways confirmation bias can infiltrate decision-making.
- Navigating the technical details of machine learning modeling, particularly in parsing out different types of cross-validation.
Handling disagreement:
- Students may have emotional reactions in places where the instructional content differs from their prior experience. This is okay! You're helping them to learn how to do more rigorous work.
- We advise letting disgruntled students express their points of disagreement, then gently encouraging them to consider why the materials might disagree with what they've been taught previously.
- Remember: There's nothing wrong with asking a student to hold on to their grievance until you conclude the unit.
- Think we got it wrong? We want to improve! Email us at c4r@seas.upenn.edu.
Final Reminder for Instructors:
This guide is intended as a flexible framework for presenting the Confirmation Bias unit. Instructors are encouraged to adapt the content to best suit their teaching style and the needs of their students. Feel free to expand on any section, incorporate additional examples, or integrate further interactive elements. Remember, this is your presentation: use it as a starting point and customize it to best serve your teaching.