Aug 31, 2025
This article provides an overview of the ongoing replication crisis in science. I will continue to add relevant links and interesting cases as I find them.
TL;DR:
Research indicates that up to 50% of scientific publications (even more in some fields) are fundamentally flawed or unreliable. This issue is more prevalent in the humanities and social sciences than in hard sciences like physics.
Evidence also reveals networks of "scientists," publication houses, and reviewers who collaborate to publish fraudulent research, exchange favorable reviews, and cite each other's work to artificially boost their credibility and impact metrics.
Don't just take my word for it—I encourage you to read the actual studies on metascience and the particular cases yourself. I list some references at the end of this article. You can also explore the Wikipedia page on the replication crisis and examine its sources, beginning with Ioannidis (2005).
The state of science today is concerning. Members of both the scientific community and civil society should address these issues instead of blindly cheering for a non-existent "Team Science"—something all too common in science enthusiast circles influenced by popular science content creators on YouTube and TikTok.
The Long Story
The replication crisis is a massive, ongoing issue, widely known and actively discussed in the scientific community. I work closely with scientists, read numerous scientific papers, and have family members who are hard science researchers and professors at major universities. This gives me insight into the underside of academic research—and it's not pretty.
In theory, the scientific process is subject to rigorous peer review and cross-checks by independent teams (replication). However, this is NOWHERE close to day-to-day reality. Independent replications are all but nonexistent outside a few niches (multi-center clinical trials, applied physics, and a few others).
The reason is simple: there's no incentive.
- Funding for straight replications is thin. Many programs won't reward a thesis that is only a direct replication, or they demand an "extension."
- Prestige journals prefer novelty. They won't waste precious publication real estate on boring replication studies.
- Replication papers usually collect a fraction of the citations that originals get, and even failed replications barely dent the original paper's citation count. If you replicate a well-known study successfully, no one will ever cite you; people will continue to cite the original work.
- However, if you attempt a replication and get a negative result, you risk coming under fire from the original author and their allies (one reason it often takes the original authors dying before anyone dares to question certain studies). Academia is EXTREMELY politicized. There are factions, alliances, and constant behind-the-scenes struggles. You don't want to make enemies unnecessarily.
Try putting yourself in the shoes of an average Joel Jr. the Scientist. Why would you choose to do a replication study that won't further your career even in the best-case scenario, and can hurt you a lot in the worst case?
Some initiatives to improve the situation do exist (for example, Registered Reports), but they remain a small minority, and few people read or cite them. Most research goes untested and unreplicated for years, decades, or forever. Even famous studies with thousands of citations go unchecked.
There are counterexamples, e.g., the 2023 room-temperature superconductor claim (LK-99), which multiple labs immediately tried to replicate and quickly found to be false. But that's the exception, not the rule, and it happened only because the claim was gigantic, with a lot of lucrative commercial applications.
Notorious Cases
This section is a work in progress; check out the sources for more info.
The Aerosol/Droplet Confusion and Its Impact on the COVID-19 Pandemic
A striking recent example with far-reaching consequences is the aerosol/droplet confusion story. For decades, "scientific consensus" held that seasonal colds spread through droplets. Textbooks universally stated that microbial particles must be smaller than 5 µm (microns) to form aerosols and qualify for airborne transmission. This led to recommendations like "social distancing" and "don't touch your nose and eyes"—measures that proved ineffective against COVID-19. Epidemiologists were baffled by reports of people becoming infected without obvious contact with carriers, developing theories about "superspreaders" and "asymptomatic carriers." Eventually, someone asked the fundamental question: "Why do we actually think COVID spreads by droplets?"
A brief but intense struggle followed before the consensus shifted and recommendations were updated to emphasize FFP2-level masks for protection from airborne aerosols, rather than social distancing or disinfecting grocery packaging.
What was the source of this decades-long misunderstanding? The opinion of Alexander Langmuir, a highly influential CDC chief epidemiologist. He fiercely opposed the aerosol theory for years before conceding slightly, acknowledging that some diseases do spread by aerosols—but then arbitrarily setting the cutoff at 5 µm diameter in 1963. This figure was then mindlessly copied from textbook to textbook for nearly 60 years without anyone bothering to verify the data.
The situation becomes even more absurd when you consider that industrial protection and hygiene standards had long defined inhalable particles as being up to 100 µm (20 times larger than the medical "consensus") and included appropriate protective guidelines. No one bothered to cross-check and notice this glaring discrepancy. Public health agencies eventually updated their guidance, but only after months of contradictory messaging based on the incorrect cutoff. It ultimately required a dedicated investigation to trace the error back to its 1963 source.
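To see how little physical sense the 5 µm cutoff makes, here's a back-of-the-envelope Stokes-settling estimate (my own illustration, not taken from the investigation): how long a water droplet of a given diameter takes to fall from mouth height (~1.5 m) in still air. Stokes drag is only a rough approximation above a few tens of microns, so treat the larger sizes as order-of-magnitude.

```python
# Rough Stokes-law settling times for respiratory droplets in still air.
# Assumptions (mine, for illustration): spherical water droplets, standard
# air viscosity, no evaporation or air currents.

G = 9.81         # gravitational acceleration, m/s^2
RHO = 1000.0     # droplet density (water), kg/m^3
MU = 1.81e-5     # dynamic viscosity of air at ~20 C, Pa*s
HEIGHT = 1.5     # fall height from mouth level, m

def settling_time(diameter_um: float) -> float:
    """Seconds for a droplet of the given diameter (microns) to fall HEIGHT."""
    d = diameter_um * 1e-6                # microns to meters
    v = RHO * G * d ** 2 / (18 * MU)      # Stokes terminal velocity, m/s
    return HEIGHT / v

for d_um in (5, 20, 50, 100):
    t = settling_time(d_um)
    print(f"{d_um:4d} um: {t:8.1f} s ({t / 60:5.1f} min)")
```

A 5 µm droplet hangs in still air for over half an hour, and even a 50 µm droplet takes about 20 seconds to settle, plenty of time to drift and be inhaled. Nothing special happens at 5 µm, which is consistent with the ~100 µm inhalable-particle boundary used in occupational hygiene.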
“Universe 25” by John Calhoun (Mouse Utopia, Behavioral Sink)
The claim: Mice living in a "paradise" with all their needs met supposedly first multiplied, then developed anxiety, depression, aggression, and suicidal behaviors before their "society" ultimately collapsed and they all died off. (The experiment is often retold as a "rat utopia," but Universe 25 housed mice; Calhoun's earlier behavioral-sink experiments used rats.)
The popularity, citations, interpretations, implications: The experiment functioned as a Rorschach test. Calhoun actively encouraged human analogies, heavily anthropomorphizing the rats. The 1960s–70s fears about overpopulation and urban decay amplified its impact. The term "behavioral sink" spread through newspapers, textbooks, and politics. The story has recently resurfaced online as a cautionary tale about social media, declining fertility rates, and "civilizational rot," despite historians and science writers highlighting the contrived experimental setup and its questionable relevance to humans.
The replication: Multiple attempts failed to support the original claims. The experiment itself was flawed—the rodent "utopia" was more of a torture chamber. Kessler (1966) achieved high, stable population densities without collapse by using different founder diversity and enclosure designs. Hammock (1971) observed behavioral changes from crowding but not the extinction pattern Calhoun reported. Historical analyses show that Calhoun's enclosure architecture and restricted movement pathways likely intensified aggression and segregation, demonstrating that density alone doesn't inevitably cause pathology. Human crowding research (e.g., Freedman 1975) similarly shows that effects depend on context and control factors.
“Rat Park” by Bruce Alexander
The claim: Rats housed in a large, enriched social environment ("Rat Park") consume far less oral morphine than isolated, caged rats. The experiment suggested that environment and social connection significantly reduce drug-seeking behavior, and that addiction is caused primarily by isolation rather than the drugs themselves.
The popularity and impact: This study became a pop-science staple, frequently cited in textbooks and anti-addiction campaigns, and continuously recycled in media explanations and comics. Academic perspectives are more measured, acknowledging that environmental enrichment matters while recognizing the original experiment's methodological limitations.
The replications: No successful direct replications exist. A 1996 internal attempt by Petrie failed to reproduce the effect. Methodological critiques highlight several problems: small sample sizes, the strong taste of oral morphine creating a confounding variable, inconsistent measurement methods between housing conditions, and equipment failures leading to data loss. A 2020 review found no successful direct replication, though broader conceptual research across species does support the more modest conclusion that social and physical enrichment can reduce certain drug-seeking behaviors.
Stanford Prison Experiment (SPE)
Claim: ordinary people rapidly become abusive guards; situation trumps disposition.
Replication record: no ethical way to run a direct replication. A conceptual test (BBC Prison Study, 2006) produced very different dynamics and emphasized leadership and group identity over automatic brutality.
Core criticisms: archival audio and documents indicate experimenter demand and coaching; non‑blinded roles; small N; unclear hypotheses; theatrical framing. A 2019 American Psychologist paper details archival findings and textbook distortions. Zimbardo disputes the critiques, but the evidentiary weight has shifted.
Dunning–Kruger Effect (DKE)
Claim: the least skilled dramatically overestimate their ability; experts underestimate.
Replication record: the pattern appears in many datasets, but newer analyses show it can arise from regression to the mean, measurement noise, and scale compression. Several papers model DKE as a mainly statistical artifact; others argue for a smaller, residual psychological effect once you control for artifacts.
Core criticisms: plotting methods inflate the effect; self‑assessment measures are noisy; range restriction; alternative models fit as well or better.
Current read: overconfidence exists, but the iconic DKE graph is not proof of a metacognitive deficit; much of the pattern can be explained by the statistical artifacts above, as the toy simulation below illustrates.
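Here is a minimal sketch of the artifact argument (a toy model I made up, not any paper's actual analysis): one latent skill, measured twice with independent noise, once by a test and once by a self-assessment. No metacognitive deficit is built in, yet binning by test quartile reproduces the iconic plot.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy model: everyone self-assesses with the same unbiased noise.
skill = rng.normal(size=n)             # latent ability
test = skill + rng.normal(size=n)      # noisy test score
self_est = skill + rng.normal(size=n)  # equally noisy self-estimate

def percentile(x: np.ndarray) -> np.ndarray:
    """Rank-transform values to 0-100 percentiles."""
    return x.argsort().argsort() * 100.0 / (len(x) - 1)

test_pct = percentile(test)
self_pct = percentile(self_est)

# Group by test-score quartile, as in the classic DKE figure.
quartile = np.minimum(test_pct // 25, 3).astype(int)
for q in range(4):
    m = quartile == q
    print(f"Q{q + 1}: actual mean {test_pct[m].mean():5.1f}, "
          f"self-assessed mean {self_pct[m].mean():5.1f}")
```

The bottom quartile "overestimates" itself by roughly 20 percentile points and the top quartile "underestimates" by about the same, even though every simulated person self-assesses with identical, unbiased noise. That's regression to the mean, not a cognitive quirk of the unskilled.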
Pop-Sci Claims
This section draws primarily from the claims made in Daniel Kahneman's "Thinking, Fast and Slow." I focus on his work mainly because of his condescending tone that suggests, "You thought it works this way? Ha, you fool! The actual science shows otherwise! You can't argue with Science!" Well, let's see who's mistaken now.
Marshmallow Test (Delay of Gratification)
Replications: in a large, more diverse sample, the effect on later outcomes shrank substantially after controlling for socioeconomic and cognitive covariates.
Verdict: the simple causal story does not hold; delay behavior tracks broader family and cognitive factors.
Social Priming (e.g., elderly‑walking)
Replications: multiple failures, including a high‑visibility 2012 PLOS ONE replication with tighter controls.
Verdict: broad social‑priming claims are unreliable at scale.
Ego Depletion
Replications: Registered Replication Report across 23 labs found an effect indistinguishable from zero (d ≈ 0.04). Later efforts using different protocols remain mixed, with meta‑analyses noting bias in earlier literature.
Verdict: the classic “willpower as a depleting resource” story is not supported by high‑powered direct replications.
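To get a feel for how small d ≈ 0.04 is, here's a standard two-sample power calculation (a textbook normal approximation, not something from the RRR itself): the per-group sample size needed to detect a given effect at 80% power and α = 0.05.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sample comparison of means:
    n = 2 * ((z_(1-alpha/2) + z_(power)) / d)^2."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

for d in (0.5, 0.2, 0.04):
    print(f"d = {d:4.2f}: about {n_per_group(d):,} participants per group")
```

Reliably detecting d ≈ 0.04 would take roughly ten thousand participants per group; typical ego-depletion studies ran a few dozen per condition, which is why an effect this small is practically indistinguishable from zero.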
Power Posing
Replications: large preregistered replication failed on hormones and behavior, but consistently found increases in felt power; a coauthor (Carney) publicly disavowed the original claims.
Verdict: physiological and behavioral claims failed; a small, subjective “feel more powerful” effect remains plausible.
Sources
The Good
- — a compound claimed to be a room-temperature superconductor in 2023
The Bad
- — the story of aerosol/droplet confusion
- — Debunking the Stanford Prison Experiment
- — on social priming effect (or lack thereof)
- — Assessing the Robustness of Power Posing: No Effect on Hormones and Risk Tolerance in a Large Sample of Men and Women
- https://journals.sagepub.com/doi/10.1177/1745691616652873 — A Multilab Preregistered Replication of the Ego-Depletion Effect
- https://journals.sagepub.com/doi/10.1177/2515245918810225 — Many Labs 2: Investigating Variation in Replicability Across Samples and Settings (replication of 28 widely cited experiments in psychology)
The [Ugly] Metascience
- — Wikipedia doesn’t count as a reliable source, but it gives a good overview of the problem, and you can scroll to the sources section to read the publications that do count.
- — the 2005 study by John P. A. Ioannidis that started it all
- — this manifesto from 2017 proposes a path out of the dark place we’re in.
- — a study of the industrial-scale fraud circles in science