FLBI

Feature Learning for Bayesian Inference

Started: September 1, 2022
Status: In Progress

Abstract

The goal of this project is to use machine learning to find low-dimensional features in high-dimensional, noisy data generated by stochastic models or real systems. In the former case, we are interested in the features imprinted on simulated data by the parameters of the stochastic model. In the latter, the interesting features depend on the particular system. In hydrology, one of the domains considered in this project, they are fingerprints of catchment properties in observed time series of river runoff. In astrophysics, another domain considered, the features are the parameters of solar dynamo models that govern the magnetic field strength of the sun. In each case, the problem is to disentangle the effect of high-dimensional disturbances from the effects of the relevant characteristics (i.e., parameters, fingerprints, etc.). This problem is reminiscent of finding collective quantities that characterize the states of interacting particle systems in statistical mechanics. Variational autoencoders have been shown to be capable of learning such quantities in the form of order parameters or collective variables, which can then be used, e.g., to enhance Markov chain Monte Carlo samplers or molecular dynamics simulations.

People

Collaborators

SDSC Team:
Simon Dirmeier
Fernando Perez-Cruz

PI | Partners:

Swiss Federal Institute of Aquatic Science and Technology, Mathematical Methods in Environmental Research:

  • Dr. Carlo Albert
  • Dr. Simone Ulzega
  • Alberto Bassi


University of Lugano, Institute of Computational Science:

  • Prof. Dr. Antonietta Mira



Motivation

Computational Bayesian inference of parameters or summary statistics is difficult in high-dimensional parameter or data spaces. In these settings, conventional methods, such as approximate Bayesian computation or its sequential Monte Carlo variants, fail to infer correct posterior distributions for a measurement, which necessitates the development of efficient methods for these scenarios. A recent line of work uses deep learning to approximate posterior distributions with neural networks. However, these approaches still do not scale well to high-dimensional data, such as time series data of the magnetic field over the solar cycle.
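
For readers unfamiliar with approximate Bayesian computation, the following is a minimal sketch of its simplest variant, rejection ABC, on a one-dimensional toy problem. It is an illustration only and not a method of this project: the function `abc_rejection`, the Gaussian toy simulator, the uniform prior, and the tolerance `eps` are all assumptions chosen for the example. The sampler accepts a parameter draw only when the summary of the simulated data lands within `eps` of the summary of the observed data, so the acceptance rate generally shrinks quickly as the number of parameters or summaries grows.

```python
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, summary, eps, n_draws, rng):
    """Rejection ABC: keep prior draws whose simulated summary is close to the observed one."""
    observed_summary = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                       # draw a candidate parameter from the prior
        simulated_summary = summary(simulate(theta, rng))
        if abs(simulated_summary - observed_summary) <= eps:
            accepted.append(theta)                      # candidate is compatible with the data
    return accepted


# Toy setup (hypothetical): data are 50 draws from a normal with unknown mean.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,                           # hand-chosen scalar summary statistic
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
```

In this toy setting only a small fraction of the 5,000 prior draws is accepted, even though there is a single parameter and a single summary statistic.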

Proposed Approach / Solution

For this project, we develop novel methods for high-dimensional approximate inference, both in parameter and data space, utilizing recent developments in neural density estimation and dimensionality reduction. We a) propose several theoretical approaches to high-dimensional approximate inference and b) develop high-quality Python libraries implementing these algorithms for applied researchers to use.
One of our recent methods, which we call surjective sequential neural likelihood estimation (SSNL), uses dimensionality-reducing normalizing flows to estimate probability densities more efficiently, which allows for increased accuracy in inferring posterior distributions. On several experimental benchmarks, our method outperforms baseline methods while being more computationally efficient in terms of the number of trainable parameters and computational speed (Figures 1 and 2).
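
The idea of a dimensionality-reducing density model can be illustrated with a small sketch. It assumes one simple form of reduction, in which a data vector is split into a kept part and a dropped part: the kept part is scored under a base density and the dropped part under a conditional density given the kept part. This is not the project's implementation; the function names, the standard-normal base density, and the linear-Gaussian conditional (a stand-in for a learned conditional network) are hypothetical choices made for the example.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def slice_surjection_logpdf(y, n_keep, weights, cond_std=1.0):
    """Log density of y under a toy dimensionality-reducing model.

    y is split into a kept part y[:n_keep] and a dropped part y[n_keep:].
    The kept part is scored under a standard-normal base density. Each dropped
    coordinate is scored under a Gaussian whose mean is a linear function of
    the kept part (hypothetical stand-in for a conditional neural network).
    """
    kept, dropped = y[:n_keep], y[n_keep:]
    log_base = sum(gaussian_logpdf(v, 0.0, 1.0) for v in kept)
    log_cond = 0.0
    for row, v in zip(weights, dropped):
        mean = sum(w * k for w, k in zip(row, kept))    # conditional mean given the kept part
        log_cond += gaussian_logpdf(v, mean, cond_std)
    return log_base + log_cond


# Four-dimensional input reduced to two kept dimensions.
y = [0.0, 0.0, 0.0, 0.0]
weights = [[1.0, 0.0], [0.0, 1.0]]                      # one row of weights per dropped coordinate
log_density = slice_surjection_logpdf(y, n_keep=2, weights=weights)
```

In a full flow, further transformations would act only on the kept part, so the layers after the reduction operate in the lower-dimensional space.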

Impact

We expect our methods to be of great value for Bayesian inference, since current methodology is not suitable for high-dimensional time series data, for instance time series observations from solar dynamos or other ODE/PDE models.

Figure 1: Assessing the performance of SSNL on several experimental benchmarks. We compared our method, SSNL, against several baseline methods (SNL, SNRE-C, SNPE-C) on four benchmark models. The x-axis shows the sample size used for training the model, while the y-axis shows the divergence to the true posterior (lower is better). SSNL either outperforms the baselines or is on par with them while requiring fewer trainable parameters due to the dimensionality reduction.
Figure 2: Assessing the performance of SSNL on a complicated solar dynamo model. Applying our dimensionality-reducing method, SSNL, to a solar dynamo shows its advantage over a dimensionality-preserving method (SNL). With only 1,000 data points (called “round 1” above), SSNL correctly locates the posterior distribution around the true parameter value (black dot), while SNL infers a biased posterior distribution. When using 15,000 data points (called “round 15”), both methods achieve similar results.

Bibliography

  1. Albert, Carlo, Hans R. Künsch, and Andreas Scheidegger. "A simulated annealing approach to approximate Bayes computations." Statistics and computing 25 (2015): 1217-1232.
  2. Greenberg, David, Marcel Nonnenmacher, and Jakob Macke. "Automatic posterior transformation for likelihood-free inference." International Conference on Machine Learning. PMLR, 2019.
  3. Papamakarios, George, David Sterratt, and Iain Murray. "Sequential neural likelihood: Fast likelihood-free inference with autoregressive flows." The 22nd international conference on artificial intelligence and statistics. PMLR, 2019.

Publications

Dirmeier, S.; Albert, C.; Perez-Cruz, F. "Simulation-based Inference for High-dimensional Data using Surjective Sequential Neural Likelihood Estimation" Proceedings of the 41st Conference on Uncertainty in Artificial Intelligence, 2025
Dirmeier, S.; Mira, A. "Causal Posterior Estimation" Preprint, 2025
Albert, C.; Ulzega, S.; Dirmeier, S.; Scheidegger, A.; Bassi, A.; Mira, A. "Simulated Annealing ABC with multiple summary statistics" Preprint, 2025
Dirmeier, S.; Ulzega, S.; Mira, A.; Albert, C. "Simulation-based inference with the Python Package sbijax" Preprint, 2024
Bassi, A.; Höge, M.; Mira, A.; Fenicia, F.; Albert, C. "Learning Landscape Features from Streamflow with Autoencoders" Preprint, 2024
Di Noia, A.; Macocco, I.; Glielmo, A.; Laio, A.; Mira, A. "Beyond the noise: intrinsic dimension estimation with optimal neighbourhood identification" Preprint, 2024
Dirmeier, S. "Surjectors: surjection layers for density estimation with normalizing flows" Journal of Open Source Software 9(94), 6188, 2024
