Tuesday, May 26, 2015

K-Means Analysis with FMRI Data

Clustering, or finding subgroups of data, is an important technique in biostatistics, sociology, neuroscience, and dowsing, allowing one to condense what would be a series of complex interaction terms into a straightforward visualization of which observations tend to cluster together. The following graph, taken from the online Introduction to Statistical Learning in R (ISLR), shows this in a two-dimensional space with a random scattering of observations:

Different colors denote different groups, and the number of groups can be decided by the researcher before performing the k-means clustering algorithm. To visualize how these groups are being formed, imagine an "X" being drawn in the center of mass of each cluster; also known as a centroid, this can be thought of as exerting a gravitational pull on nearby data points - those closer to that centroid will "belong" to that cluster, while other data points will be classified as belonging to the other clusters they are closer to.

This can be applied to FMRI data, where several different columns of data extracted from an ROI, representing different regressors, can be assigned to different categories. If, for example, we are looking for only two distinct clusters and we have several different regressors, then a voxel showing high values for half of the regressors but low values for the other regressors may be assigned to cluster 1, while a voxel showing the opposite pattern would be assigned to cluster 2. The label itself is arbitrary, and is interpreted by the researcher.

To do this in Matlab, all you need is a matrix with data values from your regressors extracted from an ROI (or the whole brain, if you want to expand your search). This is then fed into the kmeans function, which takes as arguments the matrix and the number of clusters you wish to partition it into; for example, kmeans(your_matrix, 3).

This will return a vector of numbers classifying a particular row (i.e., a voxel) as belonging to one of the specified clusters. This vector can then be prefixed to a matrix of the x-, y-, and z-coordinates of your search space, and then written into an image for visualizing the results.

There are a couple of scripts to help out with this: One, createBlankNIFTI.m, which will erase a standardized space image (I suggest a mask output by SPM at its second level) and replace every voxel with zeros, and the other script, createNIFTI.m, will fill in those voxels with your cluster numbers. You should see something like the following (here, I am visualizing it in the AFNI viewer, since it automatically colors in different numbers):

Sample k-means analysis with k=3 clusters.

The functions are pasted below, as well as a couple of explanatory videos.

function createBlankNIFTI(imageFile)

%Note: Make sure that the image is a copy, and retain the original

X = spm_read_vols(spm_vol(imageFile));
X(:,:,:) = 0;
spm_write_vol(spm_vol(imageFile), X);


function createNIFTI(imageFile, textFile)

hdr = spm_vol(imageFile);
img = spm_read_vols(hdr);

fid = fopen(textFile);
nrows = numel(cell2mat(textscan(fid,'%1c%*[^\n]')));

fid = 0;

for i = 1:nrows
    if fid == 0
        fid = fopen(textFile);
    Z = fscanf(fid, '%g', 4);
    img(Z(2), Z(3), Z(4)) = Z(1);
    spm_write_vol(hdr, img);



  1. Hi Andrew,

    Thank you very much for explanations and the video. I found it very useful. Can I also apply kmeans to rs-fMRI data? How can I construct my input matrix if I have several groups of subjects?Thank you and have a nice day!

    Best regards,

    1. Hi Josephine, I'm not sure if that would work; k-means requires multiple values at each voxel, and then analyzes which voxels tend to cluster together (e.g., whether certain values go up and down together). As there are no regressors of interest in resting state, I don't see how you could apply it to that data.


    2. Hi andy,

      Thanks for you blog it's saved me several times over. Following up Josephine's question supposing you were interested in your physiologic regressors could you use them for k means analysis in rsfmri?



    3. Hi Austin,

      I haven't done k-means clustering on resting state data, but I think it should work. It depends what your question is: Are you testing what voxels cluster on physiological signal to separate them from resting state signal? If so, then entering all of those values into a k-means analysis should be able to tease them apart. Again, I haven't done it, but the logic seems sound.


    4. Hi Andrew,
      Thank you for all your tutos. I was wondering how you can get a group result, after running kmean clustering analysis on several subject? Could you get something like a 'probability map' of clustering? And how to do that? Thank you again!

    5. Hi Carole,

      My guess is that you will need to do a chi-square test. In other words, for a given voxel and a given number of clusters, what is the probability of seeing a certain percentage of your subjects all show the same cluster for that voxel?

      I'm not positive about this, so double check it with someone else. However, that's the test that comes to mind.


    6. Hi Andrew,
      Many thanks for your reply!.

  2. Hi Andy

    Thank you very much for this post and for the interesting and informative blog! Could you perhaps comment about the differences between (unsupervised) machine learning and non machine learning clustering strategies?

    I read a definition saying that ML are iterative ("optimization solutions"), while non-ML are "analytic". However, if I understand correctly, K-means is iterative as well.

    Specifically, if you could comment on where processes like ICA and K-means are in relation to the terminology (would it be to much of a stretch to refer to them as a form of ML)? If there's a reference/source you could direct me to it would be very much appreciated!

    Many thanks!!