Power analysis: bridging theory and practice

Andrés Cruz

UT GOV Methods Workshop, 2024-09-26

A power crisis? [1]


Why is power analysis important?

  1. Low-powered studies are more likely to produce unduly optimistic estimates and fail to replicate [2, 3, 4]

  2. “Power is for you, not for Reviewer Two” –C. Rainey [5]

    • When you get a null result, you want to be able to distinguish a true null from low power

    • We need justifiable [6] and cost-effective sample sizes to get research done

Outline

  • We’ll cover:
    • 🔋 Theory: What is power?
    • 🧮 Mechanics: How to calculate power?
    • 📚 Practice: How to use existing information?

What is power?

  • The probability of rejecting the null hypothesis when it is false
    • This is what we want! Ideally, \(\text{Power} = 1\).
    • But there are usually practical considerations…

Example

One-sided z-test:

\(\color{red}{h_0: \mu = 2}\); \(\color{blue}{h_1: \mu > 2}\)

Consider the specific value \(\color{blue}{\mu_1 = 4}\) under the alternative hypothesis.

  • Sampling distribution: \(\bar X \sim N(\mu, \sigma / \sqrt n)\)
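
As a quick sanity check (a minimal simulation sketch, not part of the original example), the spread of many simulated sample means should match \(\sigma / \sqrt n\):

# simulate many sample means under h0 (mu = 2, sigma = 5, n = 30)
set.seed(123)
sim_means <- replicate(1e4, mean(rnorm(n = 30, mean = 2, sd = 5)))
mean(sim_means)  # close to 2
sd(sim_means)    # close to 5 / sqrt(30), i.e., ~0.91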

Power increases with:

  • Stronger hypothesized effects

  • Lower population variance

  • More observations

Calculating power analytically (I)

  1. Determine the threshold for the rejection region under \(h_0\)
  2. Calculate the probability of falling in the rejection region under \(h_1\)

Calculating power analytically (II)

Our ingredients: \(\mu_1=4\), \(\alpha=.05\), \(\sigma=5\), \(n=30\).

# get rejection threshold (under h0)
thr <- qnorm(p = .95, mean = 2, sd = 5 / sqrt(30))
# get power (under h1)
1 - pnorm(thr, mean = 4, sd = 5 / sqrt(30))
[1] 0.7074796


pwr::pwr.norm.test(d = (4 - 2) / 5, n = 30, 
                   sig.level = 0.05, alternative = "greater")

     Mean power calculation for normal distribution with known variance 

              d = 0.4
              n = 30
      sig.level = 0.05
          power = 0.7074796
    alternative = greater

Calculating power with sims. (I)

# run one million simulations
k <- 1e06

v_rejections <- sapply(1:k, \(i){
  # sample values and take mean (under h1)
  sampled_mean <- rnorm(n = 30, mean = 4, sd = 5) |> mean()
  # check whether this sampled mean is above the rejection threshold
  as.integer(sampled_mean > thr)
})

# get proportion rejected
mean(v_rejections)
[1] 0.707352

Calculating power with sims. (II) [7]

library(DeclareDesign)
library(BSDA)   # provides z.test()
library(broom)  # provides tidy()

declaration <- 
  declare_model(N = 30, Y = rnorm(n = 30, mean = 4, sd = 5)) +
  declare_test(handler = (\(d) {
    z.test(d$Y, alternative = "greater", mu = 2, sigma.x = 5) |> tidy()
  }))

declaration |> diagnose_designs(sims = 1000)

Research design diagnosis based on 1000 simulations. Diagnosis completed in 6 secs. Diagnosand estimates with bootstrapped standard errors in parentheses (100 replicates).

      Design N Sims Mean Estimand Mean Estimate Bias SD Estimate RMSE Power Coverage
 declaration   1000            NA          3.97   NA        0.90   NA  0.71       NA

Calculating power (I)

  • Analytic approach
    • Pros: Fast, off-the-shelf
    • Cons: Restricted to some (simple) designs
  • Simulations approach
    • Pros: Flexible!
    • Cons: Slower, harder to code
  • For example: power in conjoint experiments can be obtained analytically [8] or via simulations [9, 10]

Calculating power (II)

  • Takeaway: mechanically, calculating power is straightforward once we have the necessary ingredients:

    1. Significance level
    2. Sample size
    3. Hypothesized effect
    4. Population variance*
  • Can also solve for the required sample size at a given level of power (0.8? [11]); see the snippet below

  • The elephant in the room: how to set (3) and (4)?
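
For instance, a minimal sketch reusing the ingredients from the running example: leave n unspecified in pwr::pwr.norm.test() and supply the target power instead.

# solve for n: supply the target power, leave n unspecified
pwr::pwr.norm.test(d = (4 - 2) / 5, power = 0.8,
                   sig.level = 0.05, alternative = "greater")

With these inputs it should report \(n \approx 38.6\), i.e., about 39 observations for 80% power.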

Gathering ingredients: variance* (I)

  • What we need is information to anticipate our standard error

  • Sources:

    • Existing or pilot studies12 13
    • Estimates of the variance in the population of interest

Gathering ingredients: variance* (II)14

  • For example, if we have an SE from an existing study, we can rescale it: \(\widehat{SE}_{\text{planned}} = \widehat{SE}_{\text{existing}} \times \sqrt{\dfrac{n_{\text{existing}}}{n_{\text{planned}}}}\).

  • If our SE comes from a pilot study, we might want to be more conservative: \(\widehat{SE}_{\text{planned}} = \widehat{SE}_{\text{pilot}} \times \sqrt{\dfrac{n_{\text{pilot}}}{n_{\text{planned}}}} \times \left(\sqrt{\dfrac{1}{n_{\text{pilot}}}} + 1\right)\). Both adjustments are sketched in code below.

  • Using an estimate of the variance of the outcome usually requires stronger assumptions
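
A minimal sketch of both adjustments in R; the SEs and sample sizes below are hypothetical, chosen only to show the arithmetic.

# hypothetical planned sample size
n_planned <- 1200

# rescale an SE from an existing study (hypothetical: SE = 0.5, n = 400)
se_existing <- 0.5; n_existing <- 400
se_existing * sqrt(n_existing / n_planned)

# more conservative rescaling from a pilot study (hypothetical: SE = 0.5, n = 50)
se_pilot <- 0.5; n_pilot <- 50
se_pilot * sqrt(n_pilot / n_planned) * (sqrt(1 / n_pilot) + 1)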

Gathering ingredients: hyp. effect

  • What should we aim for?
    • When possible, “smallest substantively meaningful effect” [15]
    • Oftentimes, “expected effect size” [16]
  • Sources
    • Previous studies with similar designs and outcomes
    • Meta-analyses or literature reviews [17]
    • DO NOT use a pilot study for this [18, 19, 20]
  • If unsure, we can always run a sensitivity analysis (a sketch follows below)
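
A minimal sketch of such a sensitivity analysis, reusing the running example (\(\sigma = 5\), \(n = 30\), \(\alpha = .05\)) and sweeping the hypothesized mean under \(h_1\):

# sensitivity: power across a grid of hypothesized means (under h1)
mu1_grid <- seq(2.5, 5, by = 0.5)
power_grid <- sapply(mu1_grid, \(mu1) {
  pwr::pwr.norm.test(d = (mu1 - 2) / 5, n = 30,
                     sig.level = 0.05, alternative = "greater")$power
})
round(setNames(power_grid, mu1_grid), 2)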

Reasoning with power graphs [21]

[Figure: power graphs created with the cjpowr online tool]
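
The original slide displays power graphs made with the cjpowr tool cited below; as a stand-in, here is a sketch of a generic power graph for the running z-test example (power as a function of n, with \(d = 0.4\) and \(\alpha = .05\)).

# stand-in power graph: power as a function of sample size
n_grid <- seq(5, 100, by = 5)
power_n <- sapply(n_grid, \(n) {
  pwr::pwr.norm.test(d = 0.4, n = n,
                     sig.level = 0.05, alternative = "greater")$power
})
plot(n_grid, power_n, type = "b", ylim = c(0, 1),
     xlab = "Sample size (n)", ylab = "Power")
abline(h = 0.8, lty = 2)  # conventional 80% target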

Takeaway points

  • When designing a study, you want to run a power analysis

  • Use analytical approaches for simple designs and simulation approaches for more complex designs

  • Power depends on sample size, significance level, population variance*, and effect size

    • Get info from previous studies, pilots*, or the reference population
  • Use rules of thumb, but remain skeptical of them. Embrace the trade-offs!

Thank you!

✉️ <andres.cruz@utexas.edu>

Footnotes

  1. Arel-Bundock, Vincent, Ryan C. Briggs, Hristos Doucouliagos, Marco M. Aviña, and T.D. Stanley. 2023. “Quantitative Political Science Research Is Greatly Underpowered.” OSF Preprints. doi:10.31219/osf.io/7vy2f.

  2. Gelman, Andrew, and John Carlin. “Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors.” Perspectives on Psychological Science 9, no. 6 (2014): 641-651. https://doi.org/10.1177/1745691614551642

  3. Button, Katherine S., John PA Ioannidis, Claire Mokrysz, Brian A. Nosek, Jonathan Flint, Emma SJ Robinson, and Marcus R. Munafò. “Power failure: Why small sample size undermines the reliability of neuroscience.” Nature Reviews Neuroscience 14, no. 5 (2013): 365-376. https://doi.org/10.1038/nrn3475

  4. Tressoldi, Patrizio E. “Replication unreliability in psychology: elusive phenomena or “elusive” statistical power?” Frontiers in Psychology 3 (2012): 218. https://doi.org/10.3389/fpsyg.2012.00218

  5. Rainey, Carlisle. “Power, Part I: Power Is for You, Not for Reviewer Two.” Blog post. 2023. https://www.carlislerainey.com/blog/2023-05-22-power-1-for-you-not-reviewer-2/.

  6. Lakens, Daniël. “Sample size justification.” Collabra: Psychology 8, no. 1 (2022): 33267. https://doi.org/10.1525/collabra.33267

  7. Blair, Graeme, Alexander Coppock, and Macartan Humphreys. Research design in the social sciences: Declaration, diagnosis, and redesign. Princeton University Press, 2023. https://book.declaredesign.org/

  8. Schuessler, Julian, and Markus Freitag. 2020. “Power Analysis for Conjoint Experiments.” SocArXiv. https://doi.org/10.31235/osf.io/9yuhp.

  9. Stefanelli, Alberto, and Martin Lukac. 2020. “Subjects, Trials, and Levels: Statistical Power in Conjoint Experiments.” SocArXiv. https://doi.org/10.31235/osf.io/spkcy.

  10. Blair, Graeme, Alexander Coppock, and Macartan Humphreys. Research design in the social sciences: Declaration, diagnosis, and redesign. Princeton University Press, 2023. https://book.declaredesign.org/library/experimental-descriptive.html#sec-ch17s3. Sec. 17.3: Conjoint experiments.

  11. Lakens, Daniël. “Sample size justification.” Collabra: Psychology 8, no. 1 (2022): 33267. https://doi.org/10.1525/collabra.33267

  12. Rainey, Carlisle. 2024. “Statistical Power from Pilot Data: Simulations to Illustrate.” Blog post. https://www.carlislerainey.com/blog/2024-06-03-pilot-power/

  13. DeclareDesign Team. 2019. “Should a pilot study change your study design decisions?” Blog post. https://declaredesign.org/blog/posts/pilot-studies.html

  14. Rainey, Carlisle. 2024. “Power Rules: Practical Statistical Power Calculations.” https://github.com/carlislerainey/power-rules/blob/main/power-rules.pdf

  15. Rainey, Carlisle. 2024. “Power Rules: Practical Statistical Power Calculations.” https://github.com/carlislerainey/power-rules/blob/main/power-rules.pdf

  16. Lakens, Daniël. “Sample size justification.” Collabra: Psychology 8, no. 1 (2022): 33267. https://doi.org/10.1525/collabra.33267

  17. Stefanelli, Alberto, and Martin Lukac. 2020. “Subjects, Trials, and Levels: Statistical Power in Conjoint Experiments.” SocArXiv. https://doi.org/10.31235/osf.io/spkcy.

  18. Leon, Andrew C., Lori L. Davis, and Helena C. Kraemer. “The role and interpretation of pilot studies in clinical research.” Journal of Psychiatric Research 45, no. 5 (2011): 626-629. https://doi.org/10.1016/j.jpsychires.2010.10.008

  19. DeclareDesign Team. 2019. “Should a pilot study change your study design decisions?” Blog post. https://declaredesign.org/blog/posts/pilot-studies.html

  20. Rainey, Carlisle. 2024. “Statistical Power from Pilot Data: Simulations to Illustrate.” Blog post. https://www.carlislerainey.com/blog/2024-06-03-pilot-power/

  21. Schuessler, Julian, and Markus Freitag. 2020. “Power Analysis for Conjoint Experiments.” SocArXiv. https://doi.org/10.31235/osf.io/9yuhp. Created with their online tool: https://markusfreitag.shinyapps.io/cjpowr/.