the science of journaling
the pennebaker effect at forty
the canonical journaling claim shrank as the methods got better. an honest read on forty years of expressive-writing meta-analyses, from smyth to reinhold.
Forty years ago a small randomised trial in the Journal of Abnormal Psychology asked forty-six undergraduates to write about a personal trauma for fifteen minutes on four consecutive evenings. Six months later, students in the trauma-and-feelings condition had visited the campus health centre roughly half as often as the trivial-topic controls. Pennebaker and Beall called the result promising rather than definitive, F(3, 42) = 2.74, p = .055 by their own report. [3] The qualifier did not survive the next four decades. By the time the protocol reached the wellness press, "promising" had become "expressive writing improves your immune system, processes trauma, rewires depression." Forty years and four serious meta-analyses later, the careful summary is narrower and more interesting than either the headline or its sceptical mirror.
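How borderline was that F? The sketch below is a rough standard-library check, not anything from the original paper: it integrates the F(3, 42) density with Simpson's rule and reports the upper-tail probability above 2.74. The function names are hypothetical; in practice a statistics package such as SciPy's `scipy.stats.f.sf(2.74, 3, 42)` computes the same tail area in one call.

```python
import math

def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F(d1, d2) distribution at x (assumes d1 > 2)."""
    if x <= 0.0:
        return 0.0  # the density is 0 at x = 0 whenever d1 > 2
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_num = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))
    return math.exp(log_num - log_beta)

def f_upper_tail(x: float, d1: int, d2: int, steps: int = 20_000) -> float:
    """P(F > x): one minus a Simpson's-rule integral of the density over [0, x]."""
    if d1 <= 2:
        raise ValueError("this sketch assumes d1 > 2 so the density is finite at 0")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3

# Pennebaker and Beall's reported interaction: F(3, 42) = 2.74.
p = f_upper_tail(2.74, 3, 42)
print(f"p = {p:.3f}")
```

If the arithmetic holds, the tail area lands just above .05, consistent with the reported p = .055 and with the authors' own caution.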
the headline number that nobody quotes
The most-cited single number in the journaling literature is Smyth's 1998 average effect size, d = 0.47, pooled across thirteen randomised studies and 806 participants. [6] Wellness writers shorthand this as a "medium effect" and stop. The number almost no one quotes comes from the much larger pooling that followed. Frattaroli's 2006 random-effects meta-analysis of one hundred forty-six experimental disclosure studies and 10,994 participants reported an overall correlation of r = .075, equivalent to d ≈ 0.151. [1] More than eleven times the studies, thirteen times the sample, roughly a third of the effect.
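The conversion and the ratios are easy to check. The sketch assumes the standard formula for turning a correlation into Cohen's d, d = 2r / √(1 − r²); Frattaroli's exact conversion procedure may differ slightly, and the dictionary names are illustrative.

```python
import math

def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d via d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r * r)

# Figures as reported in the text for the two meta-analyses.
smyth = {"studies": 13, "n": 806, "d": 0.47}
frattaroli = {"studies": 146, "n": 10_994, "d": r_to_d(0.075)}

print(f"Frattaroli d from r = .075: {frattaroli['d']:.3f}")
print(f"studies ratio: {frattaroli['studies'] / smyth['studies']:.1f}x")
print(f"sample ratio:  {frattaroli['n'] / smyth['n']:.1f}x")
print(f"effect ratio:  {frattaroli['d'] / smyth['d']:.2f}")
```

The formula gives d ≈ 0.150 from r = .075, a hair under the published 0.151, presumably because r was rounded to three decimals. The study ratio comes out near eleven, the sample ratio near thirteen and a half, and the effect ratio near a third.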
from a promising fluke to a canonical claim
The arc from Pennebaker 1986 to Smyth 1998 is the standard early-career trajectory of a paradigm. A small first study with one borderline interaction opens a question the field had not asked. Can a brief writing assignment move a health outcome at all? Through the late eighties and nineties the protocol was repeated against trivial-topic controls in healthy undergraduates. By Smyth's pooling, thirteen of those trials produced an unweighted d of 0.47, with the largest individual buckets in physiological functioning (d = 0.68) and psychological wellbeing (d = 0.66). [6] Neither session count nor session length moderated the effect, both p > .10. The number was real, the protocol was simple, and the claim that writing about emotion produces measurable health effects hardened into the canonical version a decade before the field had pooled enough studies to test it properly.
the shrinkage
Then the literature did what literatures do. Methods improved. Sample sizes grew. The pool widened to include populations and outcomes the original protocol had not been validated for. The pooled effect compressed.
| study | pooled effect (absolute d-equivalent) |
|---|---|
| Smyth 1998 | 0.47 |
| Frisina 2004 | 0.19 |
| Frattaroli 2006 | 0.15 |
| Travagin 2015 | 0.13 |
| Reinhold 2018 | 0.03 |
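The table sets Cohen's d (Smyth, Frisina, and Frattaroli's converted r) beside Hedges' g (Travagin, Reinhold). Both sit on the same standardised-mean-difference scale; g multiplies d by a small-sample correction factor J. A minimal sketch using the common approximation J = 1 − 3 / (4·df − 1), with df = n₁ + n₂ − 2; the group sizes below are hypothetical, chosen only to show how quickly J approaches 1.

```python
def small_sample_j(n1: int, n2: int) -> float:
    """Approximate Hedges' correction factor J = 1 - 3 / (4*df - 1), df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    return 1 - 3 / (4 * df - 1)

def d_to_g(d: float, n1: int, n2: int) -> float:
    """Convert Cohen's d to Hedges' g for two groups of size n1 and n2."""
    return small_sample_j(n1, n2) * d

# Hypothetical per-arm sizes, with an illustrative d of 0.30.
for n in (10, 20, 50, 200):
    print(f"n = {n:>3} per arm: J = {small_sample_j(n, n):.4f}, "
          f"d = 0.30 -> g = {d_to_g(0.30, n, n):.3f}")
```

By twenty participants per arm the correction shrinks d by about two percent, and less as samples grow, which is why the rows can be read side by side as absolute d-equivalents.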
The cleanest apples-to-apples comparison sits behind two of those rows. Frattaroli's 2006 depression-specific bucket reported r = .073, equivalent to d ≈ 0.15. Reinhold and colleagues, twelve years later, ran a multilevel meta-analysis of thirty-nine RCTs of expressive writing on depressive symptoms in physically healthy adults. The long-term effect at a mean six-month follow-up was g = −0.03, ninety-five percent confidence interval [−0.16, 0.09]. [5] A small significant effect at immediate post-test (g = −0.09, in the direction favouring writing) faded by the first follow-up. Trim-and-fill imputed no missing studies, so a publication-bias artefact does not appear to be inflating the estimates; the depression effect in healthy adults simply did not survive long-term follow-up under stricter pooling.
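The null is visible from the interval alone. A rough sketch, assuming a normal approximation and a symmetric 95% interval (the reported bounds are not perfectly centred on −0.03, presumably from rounding, and a multilevel model may use a t reference rather than z): back out the standard error from the interval width, then the two-sided p. The helper names are hypothetical.

```python
import math

def se_from_ci(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)

def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value under a normal approximation: 2 * (1 - Phi(|z|))."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))

# Reinhold et al. (2018) long-term effect on depressive symptoms, as reported.
g, lower, upper = -0.03, -0.16, 0.09
se = se_from_ci(lower, upper)
p = two_sided_p(g, se)
print(f"SE ~ {se:.3f}, z ~ {g / se:.2f}, p ~ {p:.2f}")
```

The implied standard error is about 0.064, and the estimate sits less than half a standard error from zero, with p in the neighbourhood of .6.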
the heterogeneity the headline number hides
Frattaroli's overall r of .075 is a poor description of any individual outcome. Subjective impact ratings, the participant's own retrospective judgement of the writing experience, ran r = .159. Reported health symptoms ran r = .072. Physiological functioning ran r = .060. Psychological health ran r = .056. Health behaviours, the only outcome bucket where someone might check whether you smoked less or exercised more, ran r = .007 and did not clear significance across ten studies. [1] The protocol moves how participants feel about the writing exercise more than it moves their measurable health.
"Spending 20 min a day for 3 days on an independent writing activity producing an effect halfway between small and medium is, in this author's opinion, quite impressive."
The author of the largest meta-analysis ever conducted on the protocol describes the effect as halfway between small and medium under optimal conditions on the largest of her outcome buckets. Read at that size, the literature is neither the wellness slogan nor the dismissal.
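Cohen's conventional benchmarks for a correlation are r = .10 for small, .30 for medium and .50 for large. A minimal sketch placing each of Frattaroli's buckets, as reported above, against those cut-offs; the labels are a reading aid, not a verdict on clinical importance, and `cohen_label` is a hypothetical helper.

```python
def cohen_label(r: float) -> str:
    """Place |r| relative to Cohen's benchmarks (.10 small, .30 medium, .50 large)."""
    r = abs(r)
    if r < 0.10:
        return "below small"
    if r < 0.30:
        return "small to medium"
    if r < 0.50:
        return "medium to large"
    return "large or above"

# Frattaroli (2006) outcome buckets as quoted in the text.
buckets = {
    "subjective impact": 0.159,
    "reported health": 0.072,
    "physiological functioning": 0.060,
    "psychological health": 0.056,
    "health behaviours": 0.007,
    "overall": 0.075,
}

for name, r in sorted(buckets.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26} r = {r:.3f}  {cohen_label(r)}")
```

Only subjective impact clears the small benchmark; every other bucket, and the overall r, falls below it.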
the population the recent meta-analyses excluded
Reinhold's null long-term result on depression looks at first like a clean refutation. The footnote complicates it. The 2018 review explicitly excluded participants with PTSD and studies of physical illness, on the principled ground that treating those groups is a clinical-intervention question in its own right and pools poorly with self-help-style designs. [5] Those are exactly the subgroups, trauma populations and somatic-illness samples, where Smyth and Frattaroli reported their largest effects. Part of the shrinkage is genuine methodological progress, and part of it is the deliberate exclusion of the subgroups where the original signal was loudest.
The shrinkage, read carefully, is not a single story. It is a clarification of the question. Smyth and Frattaroli pooled across the corpus the field had been studying since 1986: healthy undergraduates writing about trauma, cancer patients writing about diagnosis, caregivers, patients with sleep-disordered breathing, fibromyalgia samples. The pooled d describes what happens on average across that mixed corpus, with the heavy-hitting clinical subgroups doing much of the arithmetic work. Reinhold restricted the sample to physically healthy adults, treated PTSD as a clinical-intervention question in its own right, and asked a narrower, contemporary question. Does brief self-directed writing reduce depressive symptoms in physically healthy adults at six-month follow-up? The answer was null. The earlier answers ran from small to medium. The two are not in contradiction. They answer different questions about different samples, and an honest reading of the literature has to keep both in view.
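The composition effect is simple arithmetic. The sketch below uses invented numbers, not data from any of the meta-analyses, and a fixed-effect inverse-variance mean, which is simpler than the random-effects and multilevel models Frattaroli and Reinhold used. It shows how a corpus mixing low-effect healthy samples with higher-effect clinical samples pools to a middling average, and how dropping the clinical subgroup shrinks the estimate without any individual study changing.

```python
def fixed_effect_mean(studies):
    """Inverse-variance weighted mean: sum(w * d) / sum(w), with w = 1 / variance."""
    weights = [1 / var for _, _, var in studies]
    return sum(w * d for w, (_, d, _) in zip(weights, studies)) / sum(weights)

# Hypothetical corpus of (subgroup, effect size d, sampling variance).
# These numbers are invented purely to illustrate the arithmetic.
corpus = [
    ("healthy", 0.10, 0.01),
    ("healthy", 0.10, 0.01),
    ("clinical", 0.40, 0.02),
    ("clinical", 0.40, 0.02),
]

mixed = fixed_effect_mean(corpus)
healthy_only = fixed_effect_mean([s for s in corpus if s[0] == "healthy"])
print(f"mixed corpus: d = {mixed:.2f}")
print(f"healthy only: d = {healthy_only:.2f}")
```

With these toy inputs the mixed corpus pools to d = 0.20 and the healthy-only subset to d = 0.10: half the pooled effect disappears purely through the choice of which studies enter the pool.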
The adolescent and clinical-population meta-analyses point the same way. Travagin and colleagues pooled twenty-one expressive-writing studies in youth aged ten to eighteen and found an overall g of 0.127, with the largest signal where the dosage was highest and the writing topic most specific. [7] Frisina's earlier meta-analysis of clinical populations reported d = 0.19, with a planned contrast showing physical-health outcomes (d = 0.21) outpacing psychological ones (d = 0.07, non-significant). [2] The pooled effect is small across the populations the field has studied, and what survives most reliably is the somatic signal rather than the mood one.
what survived forty years
The cognitive-mechanism side of the literature held up better than the effect sizes. Pennebaker's 1997 Psychological Science review reported a measurable shift in language during writing, a rising use of causal words such as "because" and insight words such as "understand", tracked by independent judges who saw poorly organised descriptions become coherent narratives by the final day. [4] What carried the effect, on Pennebaker's own reading, was the translation of experience into structured language rather than the catharsis the original protocol had been built around. That mechanism does not require a thirty-minute trauma protocol. It requires an attempt to put a specific concrete thing into specific concrete words.
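The language-shift finding rests on counting words by category. A toy version of that dictionary-based counting follows; the word lists are hypothetical stand-ins (the LIWC dictionaries Pennebaker's group developed are far larger and are not reproduced here), and the two diary entries are invented for illustration.

```python
import re

# Hypothetical stand-in word lists; real category dictionaries are far larger.
CAUSAL = {"because", "cause", "effect", "hence", "reason", "why"}
INSIGHT = {"understand", "realize", "know", "think", "meaning", "see"}

def category_rates(text: str) -> dict[str, float]:
    """Percentage of words in the text that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"causal": 0.0, "insight": 0.0}
    n = len(words)
    return {
        "causal": 100 * sum(w in CAUSAL for w in words) / n,
        "insight": 100 * sum(w in INSIGHT for w in words) / n,
    }

# Invented day-one and day-four entries, for illustration only.
day1 = "It happened and it was bad and I was angry and sad all the time."
day4 = "I think I understand now why I was angry: because I never said it."
for label, entry in [("day 1", day1), ("day 4", day4)]:
    rates = category_rates(entry)
    print(label, {k: round(v, 1) for k, v in rates.items()})
```

On these invented entries the day-one text scores zero in both categories and the day-four text scores about fourteen percent in each, the direction Pennebaker described, though real analyses work across whole essays and much larger dictionaries.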
What the science of journaling loses with the shrinkage is the slogan. What it keeps is the floor finding from Burton and King's two-minute miracle, the immune branch with its small but persistent signal on the body, and the cognitive translation Pennebaker described in 1997. None of those depend on d = 0.47. They depend on the act of putting one specific sentence onto a page. The same shrinkage pattern appears in the gratitude literature, where the twelve RCTs ranked by control rigour show the effect collapsing as the controls tighten. The forty-year arc of the literature is the slow correction of an early estimate that was always going to compress under serious meta-analysis. Read at the size it actually is, expressive writing is one of the more peculiar small effects in psychology that did not vanish.
references.
- 1. Frattaroli, J. (2006). Experimental disclosure and its moderators: A meta-analysis. Psychological Bulletin 132(6), 823–865. doi:10.1037/0033-2909.132.6.823
- 2. Frisina, P.G. et al. (2004). A meta-analysis of the effects of written emotional disclosure on the health outcomes of clinical populations. Journal of Nervous and Mental Disease 192(9), 629–634. doi:10.1097/01.nmd.0000138317.30764.63
- 3. Pennebaker, J.W. & Beall, S.K. (1986). Confronting a traumatic event: Toward an understanding of inhibition and disease. Journal of Abnormal Psychology 95(3), 274–281. doi:10.1037/0021-843X.95.3.274
- 4. Pennebaker, J.W. (1997). Writing about emotional experiences as a therapeutic process. Psychological Science 8(3), 162–166. doi:10.1111/j.1467-9280.1997.tb00403.x
- 5. Reinhold, M. et al. (2018). Effects of expressive writing on depressive symptoms: A meta-analysis. Clinical Psychology: Science and Practice 25(1), e12224. doi:10.1111/cpsp.12224
- 6. Smyth, J.M. (1998). Written emotional expression: Effect sizes, outcome types, and moderating variables. Journal of Consulting and Clinical Psychology 66(1), 174–184. doi:10.1037/0022-006X.66.1.174
- 7. Travagin, G. et al. (2015). How effective are expressive writing interventions for adolescents? A meta-analytic review. Clinical Psychology Review 36, 42–55. doi:10.1016/j.cpr.2015.01.003