Notes on MPPCA denoising
Introduction
This note is on the paper titled "Denoising of diffusion MRI using random matrix theory", published in NeuroImage in 2016.
Considering the publication date, the paper compares the proposed method against only two denoising approaches then in common use for diffusion MRI: adaptive non-local means (ANLM) and total (generalized) variation (TV/TGV). The authors summarized the shortcomings of these two approaches:
ANLM: loss of spatial resolution of the image (blur) and introduction of additional partial volume effects that lead to complications in further quantitative analyses or to biases in diffusion modeling.
TV: dependency on a regularization term, introduction of reconstruction artifacts, and the fact that thermal noise is not the sole source of local variations. Indeed, fine anatomical details might be removed as well by this non-selective technique.
Then the authors pointed out the main problem of PCA denoising. They wrote:
The number of signal-carrying principal components, i.e. the number of components that significantly contribute to the description of the underlying diffusion process, is unknown and is expected to depend on imaging factors such as resolution, b-value and SNR. Hence, an objective criterion to discriminate between the signal-carrying and noise-only components has been missing... Commonly used criteria include thresholding of the eigenvalues associated with the principal components by an empirically set value.
Based on that, they objectified the threshold for PCA denoising by exploiting the fact that noise-only eigenvalues are expected to obey the universal Marchenko-Pastur law, a result of random matrix theory for noisy covariance matrices. Terence Tao has a blog covering topics in random matrix theory.
The question is: what is the Marchenko-Pastur law, and how do we use it to objectify the threshold for PCA?
Method
Marchenko-Pastur distribution
A redundant $M \times N$ data matrix $X$ is one that can be synthesized by a combination of a few, $P \ll M$, linearly independent sources, or principal components, derived via the singular value decomposition of $X$:
$$X = U S V^{\dagger},$$
where $U$ and $V$ are unitary matrices whose columns are the left-singular and right-singular vectors of $X$. Without loss of generality, we assume $M < N$. The diagonal elements $s_i$ of the matrix $S$ are the singular values, with $\lambda_i$ being the $i$-th eigenvalue of the covariance matrix:
$$\lambda_i = \frac{s_i^2}{N} = \operatorname{eig}_i\!\left(\frac{1}{N} X X^{\dagger}\right).$$
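This relation between singular values and covariance eigenvalues is easy to check numerically. A minimal NumPy sketch, assuming the normalization $\lambda_i = s_i^2 / N$ for the eigenvalues of $X X^{\dagger}/N$ (the matrix here is random, not diffusion data):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 10                                       # M < N, as assumed above
X = rng.normal(size=(M, N))

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U @ diag(s) @ Vt
lam_from_svd = np.sort(s**2 / N)                   # lambda_i = s_i^2 / N
lam_from_cov = np.sort(np.linalg.eigvalsh(X @ X.T / N))

print(np.allclose(lam_from_svd, lam_from_cov))     # True
```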
Marchenko-Pastur law: If $X$ denotes an $M \times N$ random matrix whose entries are independent identically distributed (iid) random variables with mean $0$ and variance $\sigma^2 < \infty$, let $Y = \tfrac{1}{N} X X^{\dagger}$ and let $\lambda_1, \dots, \lambda_M$ be the eigenvalues of $Y$. Finally, consider the random measure
$$\mu_M(A) = \frac{1}{M}\, \#\{\lambda_j \in A\}, \quad A \subset \mathbb{R},$$
counting the number of eigenvalues in the subset $A$ included in $\mathbb{R}$.
Assume that $M, N \to \infty$ so that the ratio $M/N \to \gamma \in (0, 1]$. Then $\mu_M \to \mu$ in distribution, where
$$\mathrm{d}\mu(\lambda) = p(\lambda)\,\mathrm{d}\lambda, \qquad p(\lambda) = \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi \gamma \lambda \sigma^2} \ \text{ for } \lambda \in [\lambda_-, \lambda_+], \text{ and } 0 \text{ otherwise},$$
with the edges $\lambda_\pm = \sigma^2 (1 \pm \sqrt{\gamma})^2$.
Hence, the smallest $M - P$ nonzero eigenvalues, i.e. the noise-only ones, obey the MP distribution, with $\lambda_-$ and $\lambda_+$ bounding the noise bulk and $\sigma$ being the noise level. Note that the width of the MP bulk spectrum equals:
$$\lambda_+ - \lambda_- = 4\sqrt{\gamma}\,\sigma^2,$$
and the expectation value of an MP distribution is given by:
$$\langle \lambda \rangle = \int \lambda\, p(\lambda)\,\mathrm{d}\lambda = \sigma^2.$$
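Both facts are easy to verify numerically for a pure-noise matrix. A short NumPy sketch, assuming edges $\lambda_\pm = \sigma^2(1 \pm \sqrt{\gamma})^2$ and mean $\sigma^2$ (the sizes are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma = 100, 1000, 1.0                 # gamma = M / N = 0.1
X = rng.normal(0.0, sigma, size=(M, N))      # pure iid noise

lam = np.linalg.eigvalsh(X @ X.T / N)        # eigenvalues of the covariance

gamma = M / N
lam_minus = sigma**2 * (1 - np.sqrt(gamma))**2
lam_plus = sigma**2 * (1 + np.sqrt(gamma))**2

# The empirical spectrum sits inside the MP bulk [lam_minus, lam_plus]
# up to finite-size fluctuations, and its mean approaches sigma^2.
print(lam.min(), lam.max(), lam.mean())
```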
The distribution edge $\lambda_+$ distinguishes between noise-only and significant signal-carrying principal components. Nullifying all $\lambda_i < \lambda_+$ and reconstructing the matrix results in a denoised matrix:
$$\hat{X} = \sum_{i=1}^{P} s_i\, u_i v_i^{\dagger},$$
where $P$ is the number of eigenvalues exceeding $\lambda_+$ and $u_i$, $v_i$ are the corresponding singular vectors.
Using the expectation value, the variance accumulated in the omitted eigenvalues is given by:
$$\tilde{\sigma}^2 = \frac{1}{M} \sum_{i=P+1}^{M} \lambda_i \approx \frac{M - P}{M}\,\sigma^2.$$
As the omitted and residual principal components are linearly uncorrelated, the variance of the residual noise, contained within the significant components, is given by:
$$\sigma_r^2 = \sigma^2 - \tilde{\sigma}^2 = \frac{P}{M}\,\sigma^2.$$
The SNR after denoising should thus scale with $\sqrt{M/P}$.
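The $P/M$ scaling of the residual noise variance can be illustrated by projecting pure noise onto a fixed $P$-dimensional subspace. A sketch under my own setup, where a random orthonormal basis stands in for the $P$ signal components:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, P, sigma = 80, 800, 10, 1.0
noise = rng.normal(0.0, sigma, size=(M, N))

# Random orthonormal M x P basis standing in for the P signal components.
Q, _ = np.linalg.qr(rng.normal(size=(M, P)))
residual = Q @ (Q.T @ noise)            # noise retained in those P components

print(residual.var())                   # ~ (P / M) * sigma**2 = 0.125
```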
Denoising algorithm
Now it is clear that we need to know $P$, or equivalently the noise level $\sigma$; the noise level is obtained as a by-product. In this paper, the number of significant components $\hat{P}$ is found by incrementally increasing $P$ until the remaining $M - P$ eigenvalues are consistent with an MP bulk. Two estimates of $\sigma^2$ are compared: one from the spectrum-width equation above,
$$\hat{\sigma}^2_{\mathrm{width}}(P) = \frac{\lambda_{P+1} - \lambda_M}{4\sqrt{(M-P)/N}},$$
and one from the expectation value,
$$\hat{\sigma}^2_{\mathrm{mean}}(P) = \frac{1}{M - P} \sum_{i=P+1}^{M} \lambda_i,$$
with the eigenvalues sorted in descending order. As long as signal-carrying eigenvalues remain in the putative bulk, the width estimate overshoots the mean estimate, so $P$ is increased until $\hat{\sigma}^2_{\mathrm{width}}(P) \le \hat{\sigma}^2_{\mathrm{mean}}(P)$. Once $\hat{P}$ is determined, the estimated noise level is $\hat{\sigma}^2 = \hat{\sigma}^2_{\mathrm{mean}}(\hat{P})$.
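Putting the pieces together, the whole procedure fits in a few lines. A minimal NumPy sketch of the idea; `mppca_denoise` is my own name, and this is not the authors' reference implementation (the exact stopping rule in the published code may differ):

```python
import numpy as np

def mppca_denoise(X):
    """Denoise an M x N matrix (M <= N) by thresholding PCA eigenvalues
    against the Marchenko-Pastur bulk. Returns (X_hat, sigma2_hat, P_hat).
    A sketch of the MPPCA idea, not the reference implementation."""
    M, N = X.shape
    assert M <= N
    # Eigen-decomposition of the sample covariance X X^T / N.
    lam, U = np.linalg.eigh(X @ X.T / N)   # ascending order
    lam, U = lam[::-1], U[:, ::-1]         # make it descending

    P_hat, sigma2_hat = M, 0.0
    for P in range(M):
        bulk = lam[P:]                     # putative noise-only eigenvalues
        # Noise level from the MP expectation: mean of the bulk.
        sigma2_mean = bulk.mean()
        # Noise level from the MP bulk width, lam_+ - lam_- = 4 sqrt(gamma) sigma^2,
        # with gamma = (M - P) / N for the remaining components.
        gamma = (M - P) / N
        sigma2_width = (bulk[0] - bulk[-1]) / (4.0 * np.sqrt(gamma))
        if sigma2_width <= sigma2_mean:    # bulk consistent with the MP law
            P_hat, sigma2_hat = P, sigma2_mean
            break

    # Nullify the noise-only components and reconstruct.
    W = U[:, :P_hat]
    X_hat = W @ (W.T @ X)
    return X_hat, sigma2_hat, P_hat

# Synthetic check: rank-5 signal plus iid noise.
rng = np.random.default_rng(1)
M, N, P_true, sigma = 60, 600, 5, 0.1
signal = rng.normal(size=(M, P_true)) @ rng.normal(size=(P_true, N))
X = signal + rng.normal(0.0, sigma, size=(M, N))
X_hat, sigma2_hat, P_hat = mppca_denoise(X)
```

On this synthetic example, `P_hat` recovers the true rank (up to a component), `sigma2_hat` is close to $\sigma^2$, and the reconstruction error of `X_hat` against the clean signal drops well below that of `X`.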
Discussion
The assumption that the omitted and residual principal components are linearly uncorrelated may fail for diffusion-weighted images with non-Cartesian readout. There is a manuscript on arXiv attempting to solve this problem.
Denoising must be the first step of the post-processing pipeline, because data interpolation or smoothing changes the noise characteristics on which MPPCA relies.
A tensor-based MPPCA was recently published in MRM to address the issue of denoising high-dimensional data.
A question still in my mind: what is the difference between low-rank approximation and PCA? There is a PDF I found online about it.