Scientific Replication

Author: Sophia

what's covered

In this lesson, you will explore how social psychology evaluates the reliability of its findings by examining different forms of replication, how replication results are interpreted, and the barriers that make replication challenging. Specifically, this lesson will cover:

1. The Replication Crisis
2. Types of Replication
3. Interpreting Replication Outcomes
4. Barriers to Replication in Social Psychology

1. The Replication Crisis

Transparency practices such as preregistration, replication, and open sharing emerged in response to growing concerns about the reliability of psychological research. As researchers began to look more closely at how often published findings could be reproduced, troubling patterns started to appear. Many influential results failed to hold up when tested again, raising fundamental questions about how knowledge in the field is generated and evaluated. These concerns coalesced into what is now widely known as the replication crisis.

The replication crisis refers to the discovery that many published findings in psychology, especially social psychology, are not reliably reproduced when other researchers repeat the same studies. In one large collaborative effort, the Open Science Collaboration’s Reproducibility Project, researchers attempted to replicate 100 psychology studies and found that only about 36% produced statistically significant results the second time around. Even the effects that did replicate were often much smaller than those originally reported (Open Science Collaboration, 2015).

Social psychology was hit particularly hard because many of its classic findings involved subtle, context-dependent effects that are highly sensitive to small changes in mood, setting, or instructions. The crisis revealed how easily results can be distorted by questionable research practices (QRPs) such as optional stopping (collecting data until a desired result appears), selectively reporting only significant outcomes, or presenting post hoc explanations as if they were planned in advance. Although these behaviors are usually motivated by the pressure to publish rather than intentional fraud, they greatly increase the risk of false positives, making effects seem stronger or more reliable than they really are.
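The effect of optional stopping on false positives can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies; it assumes pure-noise data (a true effect of zero), a z-test with a known standard deviation of 1, and hypothetical settings of 100 participants with a check after every 10. Under those assumptions, it compares how often a "significant" result turns up when the researcher tests once at the end versus stopping at the first significant check.

```python
import math
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold, about 1.96


def is_significant(sample):
    """z-test of 'mean == 0', assuming a known standard deviation of 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return abs(z) > Z_CRIT


def run_study(rng, max_n=100, peek_every=None):
    """Return True if the study ends with a 'significant' result.

    The data are pure noise, so every True is a false positive.
    """
    sample = []
    for i in range(1, max_n + 1):
        sample.append(rng.gauss(0.0, 1.0))
        # Optional stopping: test after each batch and stop at the first hit.
        if peek_every and i % peek_every == 0 and is_significant(sample):
            return True
    return is_significant(sample)


def false_positive_rate(peek_every, n_studies=5000, seed=1):
    rng = random.Random(seed)
    hits = sum(run_study(rng, peek_every=peek_every) for _ in range(n_studies))
    return hits / n_studies


fixed_rate = false_positive_rate(peek_every=None)
peeking_rate = false_positive_rate(peek_every=10)
print(f"test once at n=100:      {fixed_rate:.3f}")
print(f"test after every 10:     {peeking_rate:.3f}")
```

In this setup, the single planned test stays close to the nominal 5% error rate, while repeated checking pushes the false-positive rate well above it, even though no individual test is carried out incorrectly.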

Study: “A Multilab Preregistered Replication of the Ego-Depletion Effect”

One of the most famous examples tied to the replication crisis is the ego depletion theory, the idea that self-control works like a limited resource that can be “used up.” Early studies suggested that people who exerted effort on one task (like resisting cookies) would perform worse on a second self-control task (like solving puzzles). These studies were widely cited and became a core idea in textbooks. But when an international team of labs conducted a large preregistered replication with more than 2,000 participants, they found no evidence for the effect. This led researchers to question whether willpower truly “runs out” or whether earlier findings were inflated by small samples, publication bias, and flexible analytic decisions. The case is now taught as a powerful example of why replication matters and how the field learns and evolves when results don’t hold up (Hagger et al., 2016).

Several structural issues in the field amplified the problem. Many classic studies relied on small sample sizes (often fewer than 100 participants), which lead to low statistical power and inflated effect sizes. Publication bias further distorted the scientific record because journals tended to publish only positive, surprising findings, while “null results” were quietly filed away, creating what researchers call the file-drawer problem. This helped early reports of social psychological phenomena appear far more robust than later evidence supported. When teams attempted direct replications of these effects using identical materials and procedures, many effects failed to appear, and the confidence intervals around the replication estimates often included zero.
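The way small samples and publication bias combine to inflate effect sizes can be sketched with a simulation. Everything here is an illustrative assumption rather than data from the lesson's sources: a true effect of d = 0.2, groups with a known standard deviation of 1, and a "journal" that only publishes results that are significant in the predicted direction. To keep it short, the observed effect is drawn directly from its sampling distribution instead of simulating each participant.

```python
import math
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # about 1.96


def mean_published_effect(true_d, n_per_group, n_studies=20000, seed=7):
    """Average effect size among 'published' studies, and the share published.

    Assumption: with a known SD of 1 in both groups, the observed mean
    difference is normally distributed around true_d with SE = sqrt(2 / n).
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)
    published = []
    for _ in range(n_studies):
        d_observed = rng.gauss(true_d, se)
        # File drawer: only significant results in the predicted direction survive.
        if d_observed / se > Z_CRIT:
            published.append(d_observed)
    return sum(published) / len(published), len(published) / n_studies


small_mean, small_share = mean_published_effect(true_d=0.2, n_per_group=20)
large_mean, large_share = mean_published_effect(true_d=0.2, n_per_group=400)
print(f"n=20 per group:  published mean d = {small_mean:.2f}, share published = {small_share:.2f}")
print(f"n=400 per group: published mean d = {large_mean:.2f}, share published = {large_share:.2f}")
```

With 20 participants per group, a study can only reach significance if it happens to observe an effect several times larger than the true one, so the published record overstates the effect. With 400 per group, most studies are published and the average published effect sits close to the true value.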

IN CONTEXT
The Bem Precognition Case

In 2011, social psychologist Daryl Bem published a series of experiments suggesting that people could anticipate future events, a claim of “precognition.” His findings generated enormous interest, yet when other researchers attempted to replicate his results using the same materials and statistical methods, they failed. These replication attempts revealed how easily research can produce false positives when sample sizes are small, analyses are flexible, or only significant results get published.

The inability to reproduce Bem’s effects prompted reflection and reform in psychology, helping catalyze what’s now known as the replication crisis. This moment, though unsettling, has been one of growth for the field. Replication is a normal and essential part of science’s self-correcting process, and failures are opportunities to refine theories, improve methods, and clarify when and where effects truly occur.

terms to know

Questionable Research Practices: Research behaviors that fall short of misconduct but still bias results.
Ego Depletion Theory: The idea that self-control works like a limited resource that becomes “used up,” making it harder to exert willpower on later tasks.
Publication Bias: The tendency for journals to publish studies with significant or exciting results while ignoring studies that find no effect, which skews the scientific record.
File-Drawer Problem: When studies with null or nonsignificant findings remain unpublished.

2. Types of Replication

The replication crisis brought new attention to how social psychologists evaluate whether their findings are reliable, making replication a central part of modern research reform. A direct replication attempts to repeat an earlier study as closely as possible, using the same materials, instructions, procedures, and participant type, to determine whether the original effect was a true, stable finding rather than a statistical fluke or an artifact of unique conditions (Diener & Biswas-Diener, 2021). This strategy works especially well for classic studies with clear protocols, such as Asch’s conformity experiments, where direct replications have historically produced similar error rates of about one third, supporting the effect’s reliability (Diener & Biswas-Diener, 2021). However, direct replications also have limits: If cultural norms or social attitudes shift over time, an effect may be harder to reproduce even when methods are identical.

To complement direct tests, psychologists also rely on conceptual replications, which aim to reproduce the underlying idea of a study rather than its exact procedures. Conceptual replications modify nonessential elements, such as switching from photographs to videos or from a lab task to an online platform, while keeping the theoretical mechanism intact, which helps researchers learn whether an effect generalizes across different settings or stimuli (Diener & Biswas-Diener, 2021). These are especially important in social psychology, where behavior is highly sensitive to context, norms, and subtle cues.

IN CONTEXT
Many Labs Replication of the Facial Trustworthiness Effect

A well-known example comes from the Many Labs projects, which included attempts to replicate the “facial trustworthiness” effect (the finding that people make rapid, consistent judgments about strangers based on very subtle facial cues). In the original studies, participants rated neutral faces on trust, and results suggested strong agreement and predictable patterns. Many Labs teams across dozens of international sites performed direct replications using the same facial stimuli and rating scales, while others conducted conceptual replications using different faces, video clips, or cross-cultural samples.

The result? Some consistency appeared across labs, but the effect was far weaker than originally reported, and in several countries, the patterns were not replicated at all. Conceptual replications revealed that cultural differences in emotion norms and unfamiliarity with certain facial types influenced judgments, showing that the effect was not a universal psychological process but a culturally moderated one. This example illustrates how both replication strategies are essential: Direct replications test reliability, while conceptual replications reveal why results differ across settings (Klein et al., 2014).

Despite the initial concern it created, the replication crisis ultimately strengthened scientific practice. Researchers now place greater emphasis on external replications (conducted by independent teams) to reduce laboratory-specific biases and use sequential evidence-building, where studies accumulate until statistical criteria indicate sufficient confidence.
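One way to picture sequential evidence-building is a cumulative meta-analysis, in which the pooled estimate is updated each time a new study arrives. The sketch below uses inverse-variance (fixed-effect) pooling, which is one common choice and not necessarily the method used by any project cited here. The four studies and the stopping criterion (a 95% confidence interval with a half-width of 0.15 or less) are hypothetical.

```python
import math


def pooled_estimate(studies):
    """Inverse-variance (fixed-effect) pooled effect size and its standard error."""
    weights = [1 / se**2 for _, se in studies]
    pooled = sum(w * d for w, (d, _) in zip(weights, studies)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical (effect size d, standard error) pairs, in the order they were run.
studies = [(0.60, 0.30), (0.25, 0.20), (0.30, 0.15), (0.20, 0.10)]
TARGET_HALF_WIDTH = 0.15  # hypothetical criterion for "sufficient confidence"

history = []
enough_at = None
for k in range(1, len(studies) + 1):
    d, se = pooled_estimate(studies[:k])
    half_width = 1.96 * se
    history.append((k, d, se))
    print(f"after study {k}: d = {d:.3f}, 95% CI [{d - half_width:.3f}, {d + half_width:.3f}]")
    if enough_at is None and half_width <= TARGET_HALF_WIDTH:
        enough_at = k

print("criterion met after study:", enough_at)
```

Each added study narrows the confidence interval, and the large first estimate is pulled toward the smaller values found by the later, more precise studies.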

terms to know

External Replications: Replications carried out by independent researchers or labs that were not part of the original study.
Sequential Evidence-Building: A research approach where scientists accumulate evidence across multiple studies.

3. Interpreting Replication Outcomes

Understanding replication results requires moving beyond a simple “passed” or “failed” judgment. Social psychology, in particular, emphasizes patterns in effect sizes, confidence intervals, and contextual sensitivity. Even successful replications often show smaller effect sizes than the originals because early studies, especially small-sample studies, tend to inflate effects, a pattern known as small-study bias.

EXAMPLE

If an original effect was d = 0.80 but a replication finds d = 0.40, the smaller estimate may still support the phenomenon as long as the replication’s confidence interval overlaps the original study’s interval and excludes zero, indicating that the effect is real but weaker than first claimed. Replication outcomes must therefore be interpreted by comparing these quantitative details rather than relying on p-values alone (Diener & Biswas-Diener, 2021).
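The comparison in this example can be worked through numerically. The effect sizes (d = 0.80 and d = 0.40) come from the example; the group sizes (20 per group originally, 100 per group in the replication) are hypothetical. The standard error uses a widely used large-sample approximation for Cohen's d, so the intervals are approximate.

```python
import math
from statistics import NormalDist


def d_standard_error(d, n1, n2):
    """Large-sample approximation to the standard error of Cohen's d."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))


def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate confidence interval for Cohen's d with two independent groups."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = d_standard_error(d, n1, n2)
    return d - z * se, d + z * se


# Effect sizes are from the example; the sample sizes are hypothetical.
original = d_confidence_interval(0.80, n1=20, n2=20)
replication = d_confidence_interval(0.40, n1=100, n2=100)

intervals_overlap = replication[0] <= original[1] and original[0] <= replication[1]
excludes_zero = replication[0] > 0 or replication[1] < 0

print(f"original:    [{original[0]:.2f}, {original[1]:.2f}]")
print(f"replication: [{replication[0]:.2f}, {replication[1]:.2f}]")
print("overlap:", intervals_overlap, "| replication excludes zero:", excludes_zero)
```

Under these assumed sample sizes, the small original study has a very wide interval, while the larger replication gives a narrower interval that overlaps it and stays above zero. This is the pattern the example describes as a real but weaker effect.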

Large-scale replication attempts, such as the Open Science Collaboration’s project with 100 psychology studies, suggest that roughly one third to one half of published findings replicate, depending on the subfield (Open Science Collaboration, 2015). However, replication “failures” do not always mean a finding is false. Context sensitivity (differences in participant populations, cultural norms, motivation levels, and timing) can substantially influence whether an effect appears, especially in social psychology, where effects are subtle and situation dependent.

Interpreting replications also means assessing practical significance. A replication result may be statistically significant but so small that it lacks real-world importance, while a nonsignificant result might still suggest a meaningful pattern if the effect aligns with the predicted direction and falls within a theoretically expected range. Many researchers now rely on prediction intervals, which evaluate how compatible a replication’s effect is with the range of plausible true effects rather than treating significance as a yes/no criterion. The focus shifts to cumulative evidence across multiple studies, not the outcome of any single experiment (Diener & Biswas-Diener, 2021).
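A prediction interval for a replication can be sketched as follows. This is one common formulation, not necessarily the one used in the lesson's sources: the interval is centered on the original estimate and widened to account for sampling error in both the original study and the replication. The effect sizes and sample sizes are hypothetical (an original d = 0.80 with 20 per group, a replication d = 0.40 with 100 per group), and the standard error is a large-sample approximation for Cohen's d.

```python
import math
from statistics import NormalDist


def d_standard_error(d, n_per_group):
    """Large-sample approximation to the SE of Cohen's d with equal group sizes."""
    total = 2 * n_per_group
    return math.sqrt(total / n_per_group**2 + d**2 / (2 * total))


def prediction_interval(d_orig, n_orig, n_rep, level=0.95):
    """Range in which a replication estimate is expected to fall.

    Assumes the original and replication estimate the same true effect.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_orig = d_standard_error(d_orig, n_orig)
    # The replication's sampling error is approximated from the original estimate.
    se_rep = d_standard_error(d_orig, n_rep)
    half_width = z * math.sqrt(se_orig**2 + se_rep**2)
    return d_orig - half_width, d_orig + half_width


# All numbers below are hypothetical.
low, high = prediction_interval(d_orig=0.80, n_orig=20, n_rep=100)
d_rep = 0.40
print(f"95% prediction interval: [{low:.2f}, {high:.2f}]")
print("replication is compatible with the original:", low <= d_rep <= high)
```

Here the replication estimate falls inside the interval, so it is statistically compatible with the original even though it is only half as large. The interval is wide mainly because the original study was small.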

terms to know

Small-Study Bias: The tendency for studies with small sample sizes to report larger or more exaggerated effects.
Context Sensitivity: The idea that a psychological effect depends on the specific conditions under which a study is conducted, so the effect may appear in some situations but not in others.
Prediction Intervals: A statistical range that indicates where future replication results are expected to fall.

4. Barriers to Replication in Social Psychology

Replicating studies is essential for scientific progress, yet multiple structural, cultural, and practical barriers make replication especially difficult in social psychology. Academic publishing systems tend to reward novel, surprising, or “exciting” findings over verification, meaning that journals rarely accept straightforward replications unless they overturn a major theory. This creates strong incentives for researchers to pursue original studies rather than invest time in replications, which often receive fewer citations and less visibility, contributing little to career advancement. Career pressures, such as tenure requirements and the value placed on high-impact publications, discourage early-career researchers from conducting replications because they are seen as “low-reward, high-effort” projects.

Replication also demands substantial resources. Many social psychological effects require large sample sizes (often hundreds of participants) to ensure adequate statistical power, which is costly for a single lab to achieve. Underpowered studies were common historically, contributing both to inflated initial effects and later replication failures. Funding agencies likewise tend to prioritize new, innovative projects over replications, leaving few grants available for verification work. Some replications require recreating highly specific experimental contexts with exact timing, specialized stimuli, and particular settings, which can be difficult or impossible in a new lab (Diener & Biswas-Diener, 2021).
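The link between effect size and required sample size can be made concrete with a standard power calculation. The sketch below uses the normal approximation for a two-sided, two-group comparison; an exact t-test calculation gives slightly larger numbers. The alpha level of .05, the power target of 80%, and the three effect sizes are conventional illustrative choices, not values from the lesson's sources.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect effect size d.

    Normal approximation for a two-sided comparison of two independent groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d**2)


sample_sizes = {d: n_per_group(d) for d in (0.8, 0.5, 0.2)}
for d, n in sample_sizes.items():
    print(f"d = {d}: about {n} per group ({2 * n} total)")
```

Because the required sample grows with the inverse square of the effect size, a subtle effect of d = 0.2 needs several hundred participants per group, which is far more than the fewer-than-100 samples typical of many classic studies.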

try it

Replication is essential for confirming research findings, but in social psychology, it can be hard to carry out. Several barriers make researchers less likely to prioritize replication work.

Why do social psychologists often struggle to conduct replication studies, even though replication is crucial for validating scientific findings?

Replication studies are difficult in social psychology because the academic system prioritizes novel and high-impact research over verification, leaving little career incentive or funding for replications. Large sample requirements, resource constraints, and the technical difficulty of exactly reproducing experimental conditions further discourage researchers from investing in replication work.

Social and interpersonal dynamics also create obstacles. Replicating a mentor’s or colleague’s study (and especially failing to replicate it) can damage professional relationships. Some researchers fear being labeled part of the “replication police,” a stigma that can deter them from attempting high-profile replications, particularly in tight academic subfields. Additionally, questionable research practices (QRPs) such as optional stopping or selective reporting have historically yielded publishable but fragile effects that are hard to reproduce, and replicators often lack full access to original materials or analytic code, making exact reproduction difficult.

Diagram of four barriers to replication: focus on novel findings, career disincentives, resource constraints, professional stigmas.

To address these barriers, psychology has begun implementing structural reforms that shift incentives toward transparency and verification. Registered Replication Reports (RRRs) are multi-lab replications conducted using a shared, preregistered protocol that is peer-reviewed before data collection. Their results are published together regardless of outcome, increasing rigor and strengthening the credibility of key psychological findings (Diener & Biswas-Diener, 2021). Journals now award open science badges for sharing materials, datasets, and preregistrations to encourage reproducibility. Funding bodies, such as the Templeton Foundation, have created grants dedicated to large-scale replication projects. Collaborative models like Many Labs reduce the resource burden by distributing data collection across dozens of institutions, demonstrating how shared labor makes otherwise massive replications possible (Klein et al., 2014). Together, these efforts aim to build what some scholars call an “antifragile science,” a system that becomes stronger, not weaker, when challenged by failures or scrutiny.

term to know

Registered Replication Reports (RRR): A publication format in which many independent research teams follow the same, preregistered protocol to replicate an important psychological finding, producing one combined report that offers a highly reliable test of whether the original effect is real.

summary

In this lesson, you learned that the replication crisis has pushed social psychology to rethink how it evaluates scientific claims by emphasizing different types of replication, including direct attempts to closely repeat a study and conceptual tests that examine whether the same idea holds under new conditions. Because interpreting replication outcomes requires looking beyond simple “success or failure,” researchers now compare effect sizes, confidence intervals, and contextual factors to understand when findings are stable, weaker than originally reported, or sensitive to specific situations. At the same time, barriers to replication in social psychology make it challenging to conduct the rigorous follow-ups needed to strengthen the field.

SOURCE: THIS TUTORIAL HAS BEEN ADAPTED FROM 1. “PSYCHOLOGY 2E” BY SPIELMAN, R. M., JENKINS, W. J., & LOVETT, M. D., ACCESS FOR FREE AT OPENSTAX.ORG/DETAILS/BOOKS/PSYCHOLOGY-2E. 2. “SOCIAL PSYCHOLOGY” BY CROYLE, J., & SIGNORELLA, M. L. (N.D.), AT PENNSTATE. ACCESS FOR FREE AT PSU.PB.UNIZIN.ORG/SOCIALPSYCHMETHODSJMC948/CHAPTER/INTRODUCTION/. LICENSING: CREATIVE COMMONS ATTRIBUTION 4.0 INTERNATIONAL.

REFERENCES

Diener, E., & Biswas-Diener, R. (2021). The replication crisis in psychology. In R. Biswas-Diener & E. Diener (Eds.), Noba textbook series: Psychology. DEF Publishers. nobaproject.com/modules/the-replication-crisis-in-psychology

Hagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., Brand, R., Brandt, M. J., Brewer, G., Bruyneel, S., Calvillo, D. P., Campbell, B. C., Cannon, P. R., Carlucci, M., Carruth, N., Cheung, T., Crowell, A., De Ridder, D. T. D., Dewitte, S., … Zwienenberg, M. (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11(4), 546–573. doi.org/10.1177/1745691616652873

Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Jr., Bahník, Š., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W., Cicero, D. C., Coleman, J. A., Conway, M. A., Corker, K. S., Curran, P. G., Cushman, F., … Nosek, B. A. (2014). Investigating variation in replicability: A “Many Labs” replication project. Social Psychology, 45(3), 142–152. doi.org/10.1027/1864-9335/a000178

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), Article aac4716. doi.org/10.1126/science.aac4716
