This website accompanies the ICLR 2024 paper "Input-gradient space particle inference for neural network ensembles."

TL;DR: We introduce First-order Repulsive Deep Ensembles (FoRDEs), a method that trains an ensemble of neural networks to be diverse with respect to their input gradients.

Please cite our work if you find it useful:

@inproceedings{trinh2024inputgradient,
    title={Input-gradient space particle inference for neural network ensembles},
    author={Trung Trinh and Markus Heinonen and Luigi Acerbi and Samuel Kaski},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=nLWiR5P3wr}
}

Repulsive deep ensembles (RDEs) [1]

Description: Train an ensemble \(\{\boldsymbol{\theta}_i\}_{i=1}^M\) using Wasserstein gradient descent [2], which employs a kernelized repulsion term to diversify the particles so that they cover the Bayes posterior \(p(\boldsymbol{\theta} \mid \mathcal{D})\).

[Figure: illustration of repulsive deep ensembles]
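To make the update concrete, here is a minimal sketch of one repulsive update step, assuming each particle is a flattened parameter vector. The names `wgd_update`, `kernel_fn`, and `log_post_grads` are illustrative, not the paper's code:

```python
import torch

def wgd_update(particles, log_post_grads, kernel_fn, lr=1e-3):
    """One repulsive update step for an ensemble of particles.

    particles: list of flattened parameter vectors [theta_1, ..., theta_M]
    log_post_grads: grad log p(theta_i | D) for each particle
    kernel_fn: differentiable kernel k(theta_i, theta_j) -> scalar tensor
    """
    M = len(particles)
    updated = []
    for i in range(M):
        theta_i = particles[i].detach().clone().requires_grad_(True)
        # Kernel values between particle i and all particles (a kernel
        # density estimate of the current particle distribution).
        k_vals = torch.stack([kernel_fn(theta_i, particles[j].detach())
                              for j in range(M)])
        # Repulsion: grad_{theta_i} log sum_j k(theta_i, theta_j),
        # which pushes particle i away from its neighbors.
        repulsion = torch.autograd.grad(torch.log(k_vals.sum()), theta_i)[0]
        # Driving force (posterior gradient) minus repulsion.
        updated.append(particles[i].detach() + lr * (log_post_grads[i] - repulsion))
    return updated
```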

Problem: It is unclear how to define the repulsion term for neural networks:

- Weight-space repulsion is ineffective, since two networks with very different weights can still represent the same function.
- Function-space repulsion can hurt predictive performance, since pushing the outputs on the training data apart conflicts with fitting the labels.

First-order Repulsive Deep Ensembles (FoRDEs)

[Figure: illustration of FoRDEs]

Possible advantages:

- Two networks with different input gradients are guaranteed to represent different functions, so input-gradient repulsion avoids the symmetries of weight space.
- Input gradients have a much lower dimensionality than the weights.
- Diversifying input gradients encourages the ensemble members to generalize differently, while each member can still fit the training data well.

Defining the input-gradient kernel \(k\)

Given a base kernel \(\kappa\), we define the kernel in the input-gradient space for a minibatch of training samples \(\mathcal{B}=\{(\mathbf{x}_b, y_b)\}_{b=1}^B\) as follows:

\[ k(\boldsymbol{\theta}_i, \boldsymbol{\theta}_j) = \frac{1}{B} \sum_{b=1}^{B} \kappa\big(\mathbf{s}(\mathbf{x}_b, y_b; \boldsymbol{\theta}_i),\ \mathbf{s}(\mathbf{x}_b, y_b; \boldsymbol{\theta}_j)\big), \qquad \mathbf{s}(\mathbf{x}, y; \boldsymbol{\theta}) = \frac{\nabla_{\mathbf{x}} [f(\mathbf{x}; \boldsymbol{\theta})]_y}{\lVert \nabla_{\mathbf{x}} [f(\mathbf{x}; \boldsymbol{\theta})]_y \rVert_2}, \]

where \([f(\mathbf{x}; \boldsymbol{\theta})]_y\) is the output of the network for the true class \(y\).
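A minimal PyTorch sketch of this minibatch estimate, assuming classifiers that output logits of shape (B, C); `input_gradient_kernel` and `base_kernel` are illustrative names:

```python
import torch

def input_gradient_kernel(model_i, model_j, x, y, base_kernel):
    """Estimate k(theta_i, theta_j) on a minibatch (x, y) by comparing the
    two networks' normalized input gradients under a base kernel."""
    def unit_input_grads(model):
        x_req = x.clone().requires_grad_(True)
        logits = model(x_req)
        # Gradient of the true-class output w.r.t. the input, per sample.
        true_class_sum = logits.gather(1, y.unsqueeze(1)).sum()
        # create_graph=True keeps the graph so the kernel value can later
        # be differentiated w.r.t. the model parameters for the repulsion.
        g = torch.autograd.grad(true_class_sum, x_req, create_graph=True)[0]
        g = g.flatten(1)
        return g / g.norm(dim=1, keepdim=True)  # project onto unit sphere
    s_i = unit_input_grads(model_i)
    s_j = unit_input_grads(model_j)
    # Average the base-kernel values over the minibatch.
    return base_kernel(s_i, s_j).mean()
```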

We choose the RBF kernel on a unit sphere as the base kernel \(\kappa\):

\[ \kappa(\mathbf{s}, \mathbf{s}') = \exp\Big( -\tfrac{1}{2} (\mathbf{s} - \mathbf{s}')^\top \boldsymbol{\Sigma}^{-1} (\mathbf{s} - \mathbf{s}') \Big), \qquad \lVert\mathbf{s}\rVert_2 = \lVert\mathbf{s}'\rVert_2 = 1, \]

where \(\boldsymbol{\Sigma}\) is a diagonal matrix of squared lengthscales.
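As a sketch, with the diagonal of \(\boldsymbol{\Sigma}\) passed as a vector of squared lengthscales (the function name is illustrative):

```python
import torch

def rbf_on_sphere(s1, s2, sq_lengthscales):
    """RBF base kernel with per-dimension lengthscales, applied to
    unit-norm gradient vectors s1, s2 of shape (B, D)."""
    diff = s1 - s2
    sq_dists = (diff.pow(2) / sq_lengthscales).sum(dim=1)
    return torch.exp(-0.5 * sq_dists)  # one kernel value per sample
```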

Tuning the lengthscale \(\boldsymbol{\Sigma}\)

Each lengthscale is inversely proportional to the strength of the repulsion force in the corresponding input dimension:

[Figure: effect of the lengthscales on the repulsion forces]

Proposition: One should apply strong forces in high-variance dimensions (more in-between uncertainty) and weak forces in low-variance dimensions (less in-between uncertainty).

[Figure: illustration of the proposition]
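A sketch of one way to implement this, building \(\boldsymbol{\Sigma}^{-1}\) from the eigendecomposition of the training data (see takeaway 2 below). The function name and the eigenvalue normalization are assumptions, not the paper's exact scheme; with a full \(\boldsymbol{\Sigma}^{-1}\), the base kernel uses the quadratic form \((\mathbf{s} - \mathbf{s}')^\top \boldsymbol{\Sigma}^{-1} (\mathbf{s} - \mathbf{s}')\) instead of the diagonal version above:

```python
import torch

def pca_precision(train_x):
    """Build Sigma^{-1} from the eigendecomposition of the training data,
    so repulsion is strong along high-variance principal directions and
    weak along low-variance ones (lengthscale inversely proportional to
    repulsion strength). The normalization here is an assumption."""
    x = train_x.flatten(1)
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.T @ x / (x.shape[0] - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)  # ascending eigenvalues
    scales = eigvals / eigvals.sum()           # normalized variances
    # Sigma^{-1} = U diag(scales) U^T: large precision (small lengthscale,
    # hence strong repulsion) along high-variance eigendirections.
    return eigvecs @ torch.diag(scales) @ eigvecs.T
```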

Illustrative experiments

[Figure: 1D regression results]

[Figure: 2D classification results]

On a 1D regression task (top) and a 2D classification task (bottom), FoRDEs capture higher uncertainty than the baselines in all regions outside the training data. For the 2D classification task, we visualize the entropy of the predictive posterior.

Lengthscale tuning experiments

[Figure: lengthscale tuning results]

Benchmark comparison

[Figures: benchmark comparison results]

Main takeaways

  1. Input-gradient-space repulsion can perform better than weight- and function-space repulsion.
  2. Better corruption robustness can be achieved by configuring the repulsion kernel using the eigen-decomposition of the training data.

References

[1] F. D’Angelo and V. Fortuin, “Repulsive deep ensembles are Bayesian,” Advances in Neural Information Processing Systems, vol. 34, pp. 3451–3465, 2021.

[2] C. Liu, J. Zhuo, P. Cheng, R. Zhang, and J. Zhu, “Understanding and Accelerating Particle-Based Variational Inference,” in International Conference on Machine Learning, 2019.