Power analysis: bridging theory and practice

Andrés Cruz

UT GOV Methods Workshop, 2024-09-26

A power crisis? [1]


Why is power analysis important?

  1. Low-powered studies are more likely to produce unduly optimistic estimates and fail to replicate [2, 3, 4]

  2. “Power is for you, not for Reviewer Two” –C. Rainey [5]

    • When you get a null result, you want to be able to distinguish a true null from low power

    • We need justifiable [6] and cost-effective sample sizes to get research done

Outline

  • We’ll cover:
    • 🔋 Theory: What is power?
    • 🧮 Mechanics: How to calculate power?
    • 📚 Practice: How to use existing information?

What is power?

  • The probability of rejecting the null hypothesis when it is false
    • This is what we want! Ideally, \(\text{Power} = 1\).
    • But there are usually practical considerations…

Example

One-sided z-test:

\(\color{red}{h_0: \mu = 2}\); \(\color{blue}{h_1: \mu > 2}\)

Consider the specific value \(\color{blue}{\mu_1 = 4}\) under the alternative hypothesis.

  • Sampling distribution: \(\bar X \sim N(\mu, \sigma / \sqrt n)\)
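
As a quick sanity check (a minimal simulation sketch, not part of the original example), the spread of many simulated sample means should match \(\sigma / \sqrt n\):

# simulate many sample means under h0 (mu = 2, sigma = 5, n = 30)
set.seed(123)
sim_means <- replicate(1e4, mean(rnorm(n = 30, mean = 2, sd = 5)))
mean(sim_means)  # close to 2
sd(sim_means)    # close to 5 / sqrt(30), i.e., ~0.91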

Power increases with:

  • Stronger hypothesized effects

  • Lower population variance

  • More observations

Calculating power analytically (I)

  1. Determine the threshold for the rejection region under \(h_0\)
  2. Calculate the probability of falling in the rejection region under \(h_1\)

Calculating power analytically (II)

Our ingredients: \(\mu_1=4\), \(\alpha=.05\), \(\sigma=5\), \(n=30\).

# get rejection threshold (under h0)
thr <- qnorm(p = .95, mean = 2, sd = 5 / sqrt(30))
# get power (under h1)
1 - pnorm(thr, mean = 4, sd = 5 / sqrt(30))
[1] 0.7074796


pwr::pwr.norm.test(d = (4 - 2) / 5, n = 30, 
                   sig.level = 0.05, alternative = "greater")

     Mean power calculation for normal distribution with known variance 

              d = 0.4
              n = 30
      sig.level = 0.05
          power = 0.7074796
    alternative = greater

Calculating power with sims. (I)

# run one million simulations
k <- 1e06

v_rejections <- sapply(1:k, \(i){
  # sample values and take mean (under h1)
  sampled_mean <- rnorm(n = 30, mean = 4, sd = 5) |> mean()
  # check whether this sampled mean is above the rejection threshold
  as.integer(sampled_mean > thr)
})

# get proportion rejected
mean(v_rejections)
[1] 0.707352

Calculating power with sims. (II) [7]

library(DeclareDesign)
library(BSDA)   # provides z.test()
library(broom)  # provides tidy()

declaration <- 
  declare_model(N = 30, Y = rnorm(n = 30, mean = 4, sd = 5)) +
  declare_test(handler = (\(d) {
    z.test(d$Y, alternative = "greater", mu = 2, sigma.x = 5) |> tidy()
  }))

declaration |> diagnose_designs(sims = 1000)

Research design diagnosis based on 1000 simulations. Diagnosis completed in 6 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

      Design N Sims Mean Estimand Mean Estimate Bias SD Estimate RMSE Power Coverage
 declaration   1000            NA          3.97   NA        0.90   NA  0.71       NA

Calculating power (I)

  • Analytic approach
    • Pros: Fast, off-the-shelf
    • Cons: Restricted to some (simple) designs
  • Simulations approach
    • Pros: Flexible!
    • Cons: Slower, harder to code
  • For example: power in conjoint experiments can be obtained analytically [8] or via simulations [9, 10]

Calculating power (II)

  • Takeaway: mechanically, calculating power is straightforward once we have the necessary ingredients:

    1. Significance level
    2. Sample size
    3. Hypothesized effect
    4. Population variance*
  • Can also solve for the required sample size at a given level of power (0.8? [11]); see the snippet below

  • The elephant in the room: how to set (3) and (4)?
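
For instance, a minimal sketch reusing the ingredients from the running example: leave n unspecified in pwr::pwr.norm.test() and supply the target power instead.

# solve for n: supply the target power, leave n unspecified
pwr::pwr.norm.test(d = (4 - 2) / 5, power = 0.8,
                   sig.level = 0.05, alternative = "greater")

With these inputs it should report \(n \approx 38.6\), i.e., about 39 observations for 80% power.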

Gathering ingredients: variance* (I)

  • What we need is information to anticipate our standard error

  • Sources:

    • Existing or pilot studies12 13
    • Estimates of the variance in the population of interest

Gathering ingredients: variance* (II)14

  • For example, if we have an SE from an existing study, we can rescale it: \(\widehat{SE}_{\text{planned}} = \widehat{SE}_{\text{existing}} \times \sqrt{\dfrac{n_{\text{existing}}}{n_{\text{planned}}}}\).

  • If our SE comes from a pilot study, we might want to be more conservative: \(\widehat{SE}_{\text{planned}} = \widehat{SE}_{\text{pilot}} \times \sqrt{\dfrac{n_{\text{pilot}}}{n_{\text{planned}}}} \times \left(\sqrt{\dfrac{1}{n_{\text{pilot}}}} + 1\right)\). Both adjustments are sketched in code below.

  • Using an estimate of the variance of the outcome usually requires stronger assumptions
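
A minimal sketch of both adjustments in R; the SEs and sample sizes below are hypothetical, chosen only to show the arithmetic.

# hypothetical planned sample size
n_planned <- 1200

# rescale an SE from an existing study (hypothetical: SE = 0.5, n = 400)
se_existing <- 0.5; n_existing <- 400
se_existing * sqrt(n_existing / n_planned)

# more conservative rescaling from a pilot study (hypothetical: SE = 0.5, n = 50)
se_pilot <- 0.5; n_pilot <- 50
se_pilot * sqrt(n_pilot / n_planned) * (sqrt(1 / n_pilot) + 1)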

Gathering ingredients: hyp. effect

  • What should we aim for?
    • When possible, “smallest substantively meaningful effect” [15]
    • Oftentimes, “expected effect size” [16]
  • Sources
    • Previous studies with similar designs and outcomes
    • Meta-analyses or literature reviews [17]
    • DO NOT use a pilot study for this [18, 19, 20]
  • If unsure, we can always run a sensitivity analysis (a sketch follows below)
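
A minimal sketch of such a sensitivity analysis, reusing the running example (\(\sigma = 5\), \(n = 30\), \(\alpha = .05\)) and sweeping the hypothesized mean under \(h_1\):

# sensitivity: power across a grid of hypothesized means (under h1)
mu1_grid <- seq(2.5, 5, by = 0.5)
power_grid <- sapply(mu1_grid, \(mu1) {
  pwr::pwr.norm.test(d = (mu1 - 2) / 5, n = 30,
                     sig.level = 0.05, alternative = "greater")$power
})
round(setNames(power_grid, mu1_grid), 2)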

Reasoning with power graphs [21]

[Figure: power graphs created with the cjpowr online tool]
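
The original slide displays power graphs made with the cjpowr tool cited below; as a stand-in, here is a sketch of a generic power graph for the running z-test example (power as a function of n, with \(d = 0.4\) and \(\alpha = .05\)).

# stand-in power graph: power as a function of sample size
n_grid <- seq(5, 100, by = 5)
power_n <- sapply(n_grid, \(n) {
  pwr::pwr.norm.test(d = 0.4, n = n,
                     sig.level = 0.05, alternative = "greater")$power
})
plot(n_grid, power_n, type = "b", ylim = c(0, 1),
     xlab = "Sample size (n)", ylab = "Power")
abline(h = 0.8, lty = 2)  # conventional 80% target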

Takeaway points

  • When designing a study, you want to run a power analysis

  • Use analytical approaches for simple designs and simulation approaches for more complex designs

  • Power depends on sample size, significance level, population variance*, and effect size

    • Get info from previous studies, pilots*, or the reference population
  • Use rules of thumb, but remain skeptical of them. Embrace the trade-offs!

Thank you!

✉️ <andres.cruz@utexas.edu>

Footnotes

  1. Arel-Bundock, Vincent, Ryan C. Briggs, Hristos Doucouliagos, Marco M. Aviña, and T.D. Stanley. 2023. “Quantitative Political Science Research Is Greatly Underpowered.” OSF Preprints. doi:10.31219/osf.io/7vy2f.

  2. Gelman, Andrew, and John Carlin. “Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors.” Perspectives on Psychological Science 9, no. 6 (2014): 641-651. https://doi.org/10.1177/1745691614551642

  3. Button, Katherine S., John PA Ioannidis, Claire Mokrysz, Brian A. Nosek, Jonathan Flint, Emma SJ Robinson, and Marcus R. Munafò. “Power failure: Why small sample size undermines the reliability of neuroscience.” Nature Reviews Neuroscience 14, no. 5 (2013): 365-376. https://doi.org/10.1038/nrn3475

  4. Tressoldi, Patrizio E. “Replication unreliability in psychology: elusive phenomena or “elusive” statistical power?” Frontiers in Psychology 3 (2012): 218. https://doi.org/10.3389/fpsyg.2012.00218

  5. Rainey, Carlisle. “Power, Part I: Power Is for You, Not for Reviewer Two.” Blog post. 2023. https://www.carlislerainey.com/blog/2023-05-22-power-1-for-you-not-reviewer-2/.

  6. Lakens, Daniël. “Sample size justification.” Collabra: Psychology 8, no. 1 (2022): 33267. https://doi.org/10.1525/collabra.33267

  7. Blair, Graeme, Alexander Coppock, and Macartan Humphreys. Research design in the social sciences: Declaration, diagnosis, and redesign. Princeton University Press, 2023. https://book.declaredesign.org/

  8. Schuessler, Julian, and Markus Freitag. 2020. “Power Analysis for Conjoint Experiments.” SocArXiv. https://doi.org/10.31235/osf.io/9yuhp.

  9. Stefanelli, Alberto, and Martin Lukac. 2020. “Subjects, Trials, and Levels: Statistical Power in Conjoint Experiments.” SocArXiv. https://doi.org/10.31235/osf.io/spkcy.

  10. Blair, Graeme, Alexander Coppock, and Macartan Humphreys. Research design in the social sciences: Declaration, diagnosis, and redesign. Princeton University Press, 2023. https://book.declaredesign.org/library/experimental-descriptive.html#sec-ch17s3. Sec. 17.3: Conjoint experiments.

  11. Lakens, Daniël. “Sample size justification.” Collabra: Psychology 8, no. 1 (2022): 33267. https://doi.org/10.1525/collabra.33267

  12. Rainey, Carlisle. 2024. “Statistical Power from Pilot Data: Simulations to Illustrate.” Blog post. https://www.carlislerainey.com/blog/2024-06-03-pilot-power/

  13. DeclareDesign Team. 2019. “Should a pilot study change your study design decisions?” Blog post. https://declaredesign.org/blog/posts/pilot-studies.html

  14. Rainey, Carlisle. 2024. “Power Rules: Practical Statistical Power Calculations.” https://github.com/carlislerainey/power-rules/blob/main/power-rules.pdf

  15. Rainey, Carlisle. 2024. “Power Rules: Practical Statistical Power Calculations.” https://github.com/carlislerainey/power-rules/blob/main/power-rules.pdf

  16. Lakens, Daniël. “Sample size justification.” Collabra: Psychology 8, no. 1 (2022): 33267. https://doi.org/10.1525/collabra.33267

  17. Stefanelli, Alberto, and Martin Lukac. 2020. “Subjects, Trials, and Levels: Statistical Power in Conjoint Experiments.” SocArXiv. https://doi.org/10.31235/osf.io/spkcy.

  18. Leon, Andrew C., Lori L. Davis, and Helena C. Kraemer. “The role and interpretation of pilot studies in clinical research.” Journal of Psychiatric Research 45, no. 5 (2011): 626-629. https://doi.org/10.1016/j.jpsychires.2010.10.008

  19. DeclareDesign Team. 2019. “Should a pilot study change your study design decisions?” Blog post. https://declaredesign.org/blog/posts/pilot-studies.html

  20. Rainey, Carlisle. 2024. “Statistical Power from Pilot Data: Simulations to Illustrate.” Blog post. https://www.carlislerainey.com/blog/2024-06-03-pilot-power/

  21. Schuessler, Julian, and Markus Freitag. 2020. “Power Analysis for Conjoint Experiments.” SocArXiv. https://doi.org/10.31235/osf.io/9yuhp. Created with their online tool: https://markusfreitag.shinyapps.io/cjpowr/.