Andy's Brain Blog: K-Means Analysis with FMRI Data

Tuesday, May 26, 2015

K-Means Analysis with FMRI Data

Clustering, or finding subgroups of data, is an important technique in biostatistics, sociology, neuroscience, and dowsing, allowing one to condense what would be a series of complex interaction terms into a straightforward visualization of which observations tend to cluster together. The following graph, taken from the online Introduction to Statistical Learning in R (ISLR), shows this in a two-dimensional space with a random scattering of observations:

Different colors denote different groups, and the number of groups can be decided by the researcher before performing the k-means clustering algorithm. To visualize how these groups are being formed, imagine an "X" being drawn in the center of mass of each cluster; also known as a centroid, this can be thought of as exerting a gravitational pull on nearby data points - those closer to that centroid will "belong" to that cluster, while other data points will be classified as belonging to the other clusters they are closer to.

This can be applied to FMRI data, where several different columns of data extracted from an ROI, representing different regressors, can be assigned to different categories. If, for example, we are looking for only two distinct clusters and we have several different regressors, then a voxel showing high values for half of the regressors but low values for the other regressors may be assigned to cluster 1, while a voxel showing the opposite pattern would be assigned to cluster 2. The label itself is arbitrary, and is interpreted by the researcher.

To do this in Matlab, all you need is a matrix with data values from your regressors extracted from an ROI (or the whole brain, if you want to expand your search). This is then fed into the kmeans function, which takes as arguments the matrix and the number of clusters you wish to partition it into; for example, kmeans(your_matrix, 3).

This will return a vector of numbers classifying a particular row (i.e., a voxel) as belonging to one of the specified clusters. This vector can then be prefixed to a matrix of the x-, y-, and z-coordinates of your search space, and then written into an image for visualizing the results.

There are a couple of scripts to help out with this: One, createBlankNIFTI.m, which will erase a standardized space image (I suggest a mask output by SPM at its second level) and replace every voxel with zeros, and the other script, createNIFTI.m, will fill in those voxels with your cluster numbers. You should see something like the following (here, I am visualizing it in the AFNI viewer, since it automatically colors in different numbers):

Sample k-means analysis with k=3 clusters.

The functions are pasted below, as well as a couple of explanatory videos.

function createBlankNIFTI(imageFile)

%Note: Make sure that the image is a copy, and retain the original

X = spm_read_vols(spm_vol(imageFile));
X(:,:,:) = 0;
spm_write_vol(spm_vol(imageFile), X);

=================================

function createNIFTI(imageFile, textFile)


hdr = spm_vol(imageFile);
img = spm_read_vols(hdr);

fid = fopen(textFile);
nrows = numel(cell2mat(textscan(fid,'%1c%*[^\n]')));
fclose(fid);

fid = 0;



for i = 1:nrows
    if fid == 0
        fid = fopen(textFile);
    end
    
    Z = fscanf(fid, '%g', 4);
    
    img(Z(2), Z(3), Z(4)) = Z(1);
    spm_write_vol(hdr, img);
end

8 comments:

UnknownDecember 6, 2016 at 11:41 AM
Hi Andrew,

Thank you very much for explanations and the video. I found it very useful. Can I also apply kmeans to rs-fMRI data? How can I construct my input matrix if I have several groups of subjects?Thank you and have a nice day!

Best regards,
Josephine
ReplyDelete
Replies
UnknownMarch 15, 2018 at 1:08 PM
Hi Andy

Thank you very much for this post and for the interesting and informative blog! Could you perhaps comment about the differences between (unsupervised) machine learning and non machine learning clustering strategies?

I read a definition saying that ML are iterative ("optimization solutions"), while non-ML are "analytic". However, if I understand correctly, K-means is iterative as well.

Specifically, if you could comment on where processes like ICA and K-means are in relation to the terminology (would it be to much of a stretch to refer to them as a form of ML)? If there's a reference/source you could direct me to it would be very much appreciated!

Many thanks!!

Leer.
ReplyDelete
Replies

Add comment

Tuesday, May 26, 2015

K-Means Analysis with FMRI Data

8 comments:

Total Pageviews

Blog Archive