Friday, February 15, 2013

The Will to (FMRI) Power

Power is like the sun: Everybody wants it, everybody finds in it a pleasant burning sensation, and yet nobody really knows what it is or how to get more of it. In my introductory statistics courses, this was the one concept - in addition to a few other small things, like standard error, effect size, sampling distributions, t-tests, and ANOVAs - that I completely failed to comprehend. Back then, I spoke as a child, I understood like a child, I reasoned like a child; but then I grew older, and I put away childish things, and resolved to learn once and for all what power really was.

1. What is Power?

The textbook definition of statistical power is rejecting the null hypothesis when it is, in fact, false; and everyone has a vague sense that, as the number of subjects increases, power increases as well. But why is this so?

To illustrate the concept of power, consider two partially overlapping distributions, shown in blue and red:

The blue distribution is the null distribution, stating that there is no effect, or difference; the red distribution, on the other hand, represents the alternative hypothesis that there is some effect or difference. The red dashed line represents our rejection region, beyond which we would reject the null hypothesis; and we can see that the more density of the alternative distribution that lies outside of this cutoff region, the greater probability we have of randomly drawing a sample that leads to a rejection of the null hypothesis, and therefore the greater our statistical power.

However, the sticking point is this: How do we determine where to place our alternative distribution? Potentially, it could be anywhere. So how do we decide where to put it?

One approach is to make an educated guess; and there is nothing wrong with this approach, given that it is solidly based on theory, and this may be appropriate if you do not have the time or resources to run an adequate pilot sample to do a power calculation. Another approach may be to estimate the mean of the alternative distribution based on the results from other studies; but, assuming that those results were significant, they have a greater probability of being sampled from the upper tail of the alternative distribution, and therefore have a larger probability of being greater than the true mean of the alternative distribution.

A third approach is to estimate the mean of the alternative distribution based on a sample - which is the logic behind doing a pilot study. This is often the best estimate we can make of the alternative distribution, and, given that you have the time and resources to carry out such a pilot study, is the best option for estimating power.

Once the mean of the alternative distribution has been established, the next step is to determine how power can be affected by changing the sample size. Recall that the standard error, or standard deviation of your sampling distribution of means, is inversely related to the square root of the number of subjects in your sample; and, critically, that the standard error is assumed to be the same for both the null distribution and the alternative distribution. Thus, increasing the sample size leads to a reduction in the spread of both distributions, which in turn leads to less overlap between the two distributions and again increases power.

Result of increasing the sample size from 4 to 10. Note that there is now less overlap between the distributions, and that more of the alternative distribution now lies to the right of the cutoff threshold, increasing power.

2. Power applied to FMRI

This becomes an even trickier issue when dealing with neuroimaging data, when gathering a large number of pilot subjects can be prohibitively expensive, and the funding of grants depends on reasonable estimates from a power analysis.

Fortunately, a tool called fmripower allows the researcher to calculate power estimates for a range of potential future subjects, given a small pilot sample. The interface is clean, straightforward, and easy to use, and the results are useful not only for grant purposes, but also for a sanity check of whether your effect will have enough power to warrant going through with a full research study. If achieving power of about 80% requires seventy or eighty subjects, you may want to rethink your experiment, and possibly collect another pilot sample that includes more trials of interest or a more powerful design.

A few caveats about using fmripower:

  1. This tool should not be used for post-hoc power analyses; that is, calculating the power associated with a sample or full dataset that you already collected. This type of analysis is uninformative (since we cannot say with any certainty whether our result came from the null distribution or a specific alternative distribution), and can be misleading (see Hoenig & Heisey, 2001).
  2. fmripower uses a default atlas when calculating power estimates, which parcellates cortical and subcortical regions into dozens of smaller regions of interest (ROIs). While this is useful for visualization and learning purposes, it is not recommended to use every single ROI; unless, of course, you correct for the number of ROIs used by applying a method such as Bonferroni correction (e.g., dividing your Type I error rate by the number of ROIs used).
  3. When selecting an ROI, make sure that it is independent (cf. Kriegeskorte, 2009). This means choosing an ROI based on either anatomical landmarks or atlases, or from an independent contrast (i.e., a contrast that does not share any variance or correlate with your contrast of interest). Basing your ROI on your pilot study's contrast of interest - that is, the same contrast that you will examine in your full study - will bias your power estimate, since any effect leading to significant activation in a small pilot sample will necessarily be very large.
  4. For your final study, do not include your pilot sample, as this can lead to an inflated Type I error rate (Mumford, 2012). A sample should be used for power estimates only; it should not be included in the final analysis. 

Once you've experimented around with fmripower and gotten used to its interface, either with SPM or FSL, simply load up your group-level analysis (either FSL's cope.feat directory and corresponding cope.nii.gz file for the contrast of interest, or SPM's .mat file containing the contrasts in that second-level analysis), choose an unbiased ROI, your Type I error rate, and whether you want to estimate power for a specific subject number or across a range of subjects. I find the range much more useful, as it gives you an idea of the sweet spot between the number of subjects and power estimate.

Thanks to Jeanette Mumford, whose obsession with power is an inspiration to us all.


  1. Hi,

    I have a problem when using fmripower.
    I got an error as below when I click 'select design matrix'

    Undefined function or variable "new_design".

    Error in fmripower_spm/design_push_Callback (line 284)
    if ~isnumeric(new_design)

    Error while evaluating uicontrol Callback

    1. Hey Izz, are you using a 2nd-level SPM.mat file? It should work on any file that includes multiple subjects.


  2. Hi - I'm getting the same error also and am definitely using a 2nd level SPM.mat file. I don't seem to be able to manually enter a design even though instructions indicate this should be an option...?

    1. It's hard to tell; maybe it's an SPM version issue? fmripower was developed a while ago, and maybe it only works up through SPM8. I'm not sure; I haven't used it in a while, and when I was using it, it was with SPM5.

      An alternative option is to use upload your second-level map, give it a mask (optional), and fill out a few other fields and you should be good to go.


  3. Hi Andy,
    I'm having a hard time using fmripower with fsl. But you didn't show the steps for FSL use in your video. Are you considering to do an update?

    1. Hey Liliana, I probably won't do another update on fMRIpower; I'm using instead. It's more current, and it doesn't require loading any design matrices (you just need a second-level contrast map and a mask).



  4. Hey Andy,
    does the Type I error rate (in your example 0.001) refer to the SPM single voxel threshold or the cluster threshold or FWE-correction? I am wondering because Mumford describes a Type I error of 5% in her paper on fMRIpower.
    Thanks in advance!

    1. Hi Aylin,

      I'm actually not entirely sure; it's been a while since I used the software package, and I don't think it's being updated anymore. I think it refers to the FWE correction for the clusters.


    2. Hi Andy,
      thanks for your fast reply. Do you recommend another software for power calculation? I think the site of Neuropower is down already for a while.