Loss Functions

Negative Log-Likelihood

QUiNN assumes a Gaussian likelihood for regression:

\[p(\mathcal{D} \mid w) = \prod_{i=1}^{N} \mathcal{N}(y_i \mid M(x_i; w),\, \sigma^2),\]

where \(M(x_i; w)\) is the model output for input \(x_i\) with weights \(w\), and \(\sigma\) is the standard deviation of the data noise. The negative log-likelihood is

\[-\log p(\mathcal{D} \mid w) = \frac{N}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2} \sum_{i=1}^{N} \|y_i - M(x_i; w)\|^2.\]
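As a rough illustration of the formula above, the following framework-free sketch evaluates the Gaussian negative log-likelihood for scalar outputs. The function name `gaussian_nll` and the toy data are assumptions made for this example; they are not part of the QUiNN API.

```python
import math


def gaussian_nll(y, y_pred, sigma):
    """Negative log-likelihood of scalar targets y under N(y_pred, sigma^2).

    Hypothetical helper for illustration; assumes scalar outputs.
    """
    n = len(y)
    # Sum of squared residuals between targets and model outputs M(x_i; w).
    sq_err = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    # Normalization term plus the scaled squared error.
    return 0.5 * n * math.log(2.0 * math.pi * sigma**2) + sq_err / (2.0 * sigma**2)


# Toy data (assumed): three targets and the corresponding model outputs.
y = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]
nll = gaussian_nll(y, y_pred, sigma=0.5)
```

For fixed \(\sigma\), the first term is constant in \(w\), so minimizing this quantity over the weights is equivalent to minimizing the sum of squared errors.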

Gaussian Prior

When a prior is used, QUiNN employs an isotropic Gaussian centered at an anchor \(w_0\):

\[p(w) = \mathcal{N}(w \mid w_0,\, \sigma_{\text{prior}}^2 I_K).\]

Here \(K\) is the number of weights, that is, the dimension of \(w\). The negative log-prior is

\[-\log p(w) = \frac{1}{2\sigma_{\text{prior}}^2} \|w - w_0\|^2 + \frac{K}{2}\log(2\pi\sigma_{\text{prior}}^2).\]
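The negative log-prior can be sketched in the same way. The function name `neg_log_gaussian_prior` and the example values are assumptions for illustration only, with the weights represented as a flat list of floats.

```python
import math


def neg_log_gaussian_prior(w, w0, sigma_prior):
    """Negative log-density of N(w | w0, sigma_prior^2 * I_K).

    Hypothetical helper for illustration; w and w0 are flat lists of length K.
    """
    k = len(w)
    # Squared Euclidean distance between the weights and the anchor.
    sq_dist = sum((wi - ai) ** 2 for wi, ai in zip(w, w0))
    return sq_dist / (2.0 * sigma_prior**2) + 0.5 * k * math.log(
        2.0 * math.pi * sigma_prior**2
    )


# Toy weights and anchor (assumed values).
w = [0.5, -1.0]
w0 = [0.0, 0.0]
nlp = neg_log_gaussian_prior(w, w0, sigma_prior=1.0)
```

The quadratic term acts as an L2 penalty that pulls the weights toward the anchor \(w_0\); a smaller \(\sigma_{\text{prior}}\) makes that pull stronger.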

Negative Log-Posterior

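Combining the likelihood and the prior through Bayes' rule, and dropping the evidence term \(\log p(\mathcal{D})\), which does not depend on \(w\), the negative log-posterior used for training is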

\[-\log p(w \mid \mathcal{D}) = \frac{1}{2\sigma^2} \sum_{i=1}^{N} \|y_i - M(x_i; w)\|^2 + \frac{N}{2}\log(2\pi\sigma^2) + \frac{N}{N_{\text{full}}} \left( \frac{1}{2\sigma_{\text{prior}}^2}\|w - w_0\|^2 + \frac{K}{2}\log(2\pi\sigma_{\text{prior}}^2) \right),\]

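where \(N\) is the number of samples in the current batch and \(N_{\text{full}}\) is the size of the full dataset. The factor \(N/N_{\text{full}}\) scales the prior for mini-batch training, so that the prior is counted once per pass over the data. For full-batch training, \(N = N_{\text{full}}\) and the factor equals 1.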
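To make the prior scaling concrete, here is a minimal sketch that evaluates the expression above on a single batch. It assumes scalar outputs and flat weight lists; the function name `neg_log_posterior` and all numeric values are hypothetical and do not reflect QUiNN's actual implementation.

```python
import math


def neg_log_posterior(y, y_pred, w, w0, sigma, sigma_prior, n_full):
    """Batch negative log-posterior with the prior scaled by N / N_full.

    Hypothetical helper for illustration; N is taken as the length of y.
    """
    n = len(y)  # batch size N
    k = len(w)  # number of weights K
    # Gaussian negative log-likelihood on the batch.
    sq_err = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    nll = 0.5 * n * math.log(2.0 * math.pi * sigma**2) + sq_err / (2.0 * sigma**2)
    # Negative log-prior of the weights around the anchor w0.
    sq_dist = sum((wi - ai) ** 2 for wi, ai in zip(w, w0))
    nlp = sq_dist / (2.0 * sigma_prior**2) + 0.5 * k * math.log(
        2.0 * math.pi * sigma_prior**2
    )
    # The prior is down-weighted by the fraction of the data seen in this batch.
    return nll + (n / n_full) * nlp


# Toy batch of three samples (assumed values).
y = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]
w, w0 = [0.5, -1.0], [0.0, 0.0]

# Full-batch case: N == N_full, so the prior enters with weight 1.
full_batch = neg_log_posterior(y, y_pred, w, w0, 0.5, 1.0, n_full=3)
# Mini-batch case: the same 3 samples drawn from 30, so the prior weight is 0.1.
mini_batch = neg_log_posterior(y, y_pred, w, w0, 0.5, 1.0, n_full=30)
```

The two values differ only in the prior contribution: the likelihood term is identical, while the mini-batch call counts one tenth of the negative log-prior.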