the science of journaling
twelve gratitude rcts ranked by control rigour
twelve gratitude rcts ranked by what they controlled for. the effect collapses as rigour rises. the honest read on gratitude journaling research.
The gratitude prescription has hardened into a slogan. Three things, every morning, for happiness. Behind the slogan are roughly thirty years of randomised trials and three serious meta-analyses, and the meta-analyses do not say what the slogan says. Gratitude works, somewhat, against weak controls. Set it against any other writing exercise of the same length and most of the effect goes away. This post ranks twelve gratitude RCTs by the rigour of what they controlled for, then watches the effect size collapse as the controls get harder. The size of the claim should match the rigour of the test that produced it.
The story descends from Emmons and McCullough's 2003 study of counting blessings versus burdens. [5] Three short trials with modest effects. The number that travelled was Study 1's d ≈ 0.42 against a daily-hassles control. Two decades of consumer wellness writing has cited Emmons and stopped there.
The two meta-analyses that should have ended the conversation are Davis 2016 with thirty-two samples and Cregg and Cheavens 2021 with twenty-seven RCTs and N of 3,675. [3][2] They ask one methodological question that almost no consumer writing engages with. Effective compared to what? The phrasing is from Wood, Froh and Geraghty's 2010 review, the canonical critique. [8] If the comparison is do nothing, the effect is medium. If the comparison is do anything else of the same length, the effect is small. If the comparison is do another positive-psychology exercise of equal expectancy, the effect is roughly zero.
Trials get ranked by what gratitude was tested against.
The progression matters because the marketing claim, gratitude rewires your brain, is a tier-one claim made on tier-one evidence and quietly extended beyond what tier-three and tier-four trials support.
Twelve trials drawn from Cregg and Cheavens's 2021 corpus, ordered roughly by control rigour. Effect size is the absolute value of Hedges' g on depressive symptoms, so larger values favour gratitude.
| study (control type) | absolute Hedges' g |
|---|---|
| Cheng 2015 (waitlist) | 0.64 |
| Booker 2017 (waitlist) | 0.45 |
| Southwell 2017 (waitlist) | 0.33 |
| O'Leary 2015 (waitlist) | 0.28 |
| Watkins 2015 (matched) | 0.6 |
| Lambert 2012 (matched) | 0.36 |
| Jackowska 2016 (matched) | 0.35 |
| Kerr 2015 (matched) | 0.34 |
| Manthey 2016 (active) | 0.22 |
| Mongrain 2012 (active) | 0.21 |
| Sergeant 2011 (active) | 0.05 |
| Lyubomirsky 2011 (active) | 0.02 |
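To see the tier pattern as numbers, here is a minimal sketch that averages the table's values by control type. It is an unweighted mean of the twelve listed figures, not the pooled meta-analytic estimate, and the names `TRIALS` and `tier_means` are illustrative only.

```python
from statistics import mean

# Absolute Hedges' g values copied from the table above, grouped by control type.
TRIALS = {
    "waitlist": [0.64, 0.45, 0.33, 0.28],
    "matched": [0.60, 0.36, 0.35, 0.34],
    "active": [0.22, 0.21, 0.05, 0.02],
}


def tier_means(trials):
    """Return the unweighted mean |g| for each control tier."""
    return {tier: mean(values) for tier, values in trials.items()}


if __name__ == "__main__":
    for tier, value in tier_means(TRIALS).items():
        print(f"{tier:>8}: mean |g| = {value:.2f}")
```

On these twelve values the unweighted means work out to 0.425 for waitlist, 0.4125 for matched and 0.125 for active, so in this particular sample the drop shows up mainly at the active-control tier.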
The four largest trials by sample size sit in the active-control tier. Sergeant 2011 with N of 514, Manthey 2016 with N of 300, Lyubomirsky 2011 with N of 208, Mongrain 2012 with N of 190. All four return an absolute g of at most 0.22, and two return effects indistinguishable from zero. The smaller trials with weaker controls produce the headline numbers consumer wellness sites quote.
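A rough way to combine those four trials is to weight each effect by its sample size. The sketch below does that as a stand-in for the inverse-variance weighting a real meta-analysis would use, so treat the result as illustrative; `ACTIVE_TRIALS` and `n_weighted_mean` are names made up for this example.

```python
# (study, N, absolute Hedges' g) for the four active-control trials named above.
ACTIVE_TRIALS = [
    ("Sergeant 2011", 514, 0.05),
    ("Manthey 2016", 300, 0.22),
    ("Lyubomirsky 2011", 208, 0.02),
    ("Mongrain 2012", 190, 0.21),
]


def n_weighted_mean(trials):
    """Sample-size-weighted mean of |g|. A simplification: proper pooling
    would weight by inverse variance, which is not reported here."""
    total_n = sum(n for _, n, _ in trials)
    return sum(n * g for _, n, g in trials) / total_n


if __name__ == "__main__":
    print(f"N-weighted mean |g| = {n_weighted_mean(ACTIVE_TRIALS):.3f}")
```

With a combined N of 1,212, the weighted mean comes to roughly 0.11, pulled down by the two largest near-zero results.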
Cregg's pooled finding follows the chart exactly. Against waitlist controls, gratitude reduced depressive symptoms by g of −0.51, a medium effect. Against active controls matched on time and structure, the effect collapsed to g of −0.18. Removing two outliers (Geraghty 2010, Ki 2009) shrank the depression effect by 26% and rendered the anxiety effect non-significant.
Davis 2016 reached the same conclusion five years earlier, with cleaner numbers. Across thirty-two samples, gratitude beat measurement-only controls on psychological well-being by d of 0.31. Against psychologically active comparisons it was d of −0.03. After trim-and-fill correction for publication bias, the matched-activity edge collapsed to d of 0.02. The authors wrote, in plain prose, that gratitude interventions may operate primarily through placebo effects.
> gratitude interventions had a medium effect when compared with waitlist-only conditions, but only a trivial effect when compared with putatively inert control conditions involving any kind of activity.
Dickens's 2017 series of fifty-six meta-analyses, drawing on a different study set, lands on the same conclusion. Well-being effects of d ≈ 0.31 against neutral controls drop to d ≈ 0.17 against active controls. [4] Three meta-analyses published between 2016 and 2021, with overlapping but not identical inclusion criteria, all converge on the same moderator. Control type explains most of the variance the consumer literature attributes to gratitude itself.
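One way to read the three pooled results side by side is to ask what share of the weak-control effect is still there under the stronger control. The sketch below uses the figures quoted above, with signs flipped where needed so that positive means "favours gratitude". The outcomes differ across papers (depressive symptoms for Cregg and Cheavens, well-being for the other two), so the ratios are a rough comparison only, and `POOLED` and `share_retained` are hypothetical names.

```python
# (effect vs weak control, effect vs stronger control), signed so that
# positive values favour gratitude. Figures are those quoted in the text.
POOLED = {
    "Cregg and Cheavens 2021": (0.51, 0.18),   # g, depressive symptoms
    "Davis 2016": (0.31, -0.03),               # d, well-being
    "Dickens 2017": (0.31, 0.17),              # d, well-being
}


def share_retained(weak, strong):
    """Fraction of the weak-control effect that survives the stronger control.
    A negative result means the comparison condition did slightly better."""
    return strong / weak


if __name__ == "__main__":
    for paper, (weak, strong) in POOLED.items():
        print(f"{paper}: {share_retained(weak, strong):+.0%} retained")
```

On these inputs roughly a third of the effect survives in Cregg and Cheavens, a little over half in Dickens, and none in Davis.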
The strongest non-psychological signal is sleep. Boggiss's 2020 review found subjective sleep quality improved in five of eight RCTs that measured it. [1] The other physical-health outcomes (inflammation, blood pressure, glycaemic control) were equivocal or under-powered. The sleep finding is the one corner of the gratitude literature where a competent placebo control would still leave a genuine signal, and it points at the mechanism. A short bedtime list of three things, gratitude or otherwise, displaces pre-sleep cognitive arousal. Scullin's 2018 polysomnography trial randomised young adults to spend five minutes writing either a specific to-do list or a list of tasks already completed. [6] The to-do group fell asleep about nine minutes faster, a mean of 15.8 minutes against 25.1, Cohen's d of 0.63. What does the work is structuring attention before bed. Gratitude at bedtime evicts whatever was about to loop, the same job a planning list does.
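For a sense of scale on the Scullin numbers, Cohen's d is the difference in means divided by the pooled standard deviation, so the two means and the d quoted above imply a spread. The sketch below back-calculates it. The pooled SD is inferred from the quoted figures, not taken from the paper, and `implied_pooled_sd` is a name invented for this example.

```python
def implied_pooled_sd(mean_a, mean_b, d):
    """Back out the pooled SD implied by two group means and a Cohen's d,
    using d = |mean_a - mean_b| / pooled_sd."""
    return abs(mean_a - mean_b) / abs(d)


if __name__ == "__main__":
    # Minutes to fall asleep: to-do list group vs completed-tasks group.
    sd = implied_pooled_sd(15.8, 25.1, 0.63)
    print(f"implied pooled SD = {sd:.1f} minutes")
```

The implied spread is a little under 15 minutes, which means the nine-minute difference is sizeable relative to how much sleep onset varied between participants.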
The honest reading of thirty years of trials is not that gratitude journaling does nothing. Against a do-nothing baseline, almost any structured positive writing exercise produces a small, real benefit. Gratitude is one structured exercise among several. It does not justify a daily ritual on the strength of a meta-analytic effect that disappears under a competent placebo.
What survives is humbler. Two minutes of attention to something that went well is worth doing. Keep asking: compared to what? With the marketing claim quieter, the practice itself is what is left to argue about.