
Synthetic Data for Biomedical Applications


Before joining SDSC, Arshjot Khehra received his MSc in Artificial Intelligence from USI Lugano, where he completed his thesis on hierarchical graph reinforcement learning. Previously, he worked for 4+ years across India and Singapore, gaining data science experience in the insurance, logistics, and manufacturing sectors. He also holds a BSc in Industrial Engineering from PEC Chandigarh. Over the course of his career, Arshjot has worked on a wide array of projects, such as handwritten text recognition and generation, voice matching across phone call recordings, policy lapse rate prediction for customer retention, and automated insurance claim processing.


Matthias Galipaud obtained his PhD in evolutionary biology in 2012 from the University of Burgundy in Dijon (France), and held postdoctoral positions as a mathematical biologist at the University of Bielefeld (Germany) and the University of Zurich, where he researched the evolutionary theories of aging and mate choice. In 2020, he became a data scientist, developing machine learning solutions for startups in Switzerland and Australia before joining the SDSC Innovation Team in November 2022.


Valerio started his career working for 7 years as a particle-physics researcher at CERN. There, he used state-of-the-art techniques to extract information from data, especially to search for traces of dark matter in particle collisions. Since 2016, he has worked in consulting, applying data science in several industries. First, he joined the Quant team of Ernst & Young in Geneva. Later, he created his own company, SamurAI sàrl, providing consulting services for his clients. He also has a passion for teaching very complex subjects in simple terms. That is why he particularly enjoys offering training programs to private companies and universities. Valerio joined the SDSC in May 2022 as a Principal Data Scientist with the mission of accompanying industrial partners and other institutions through their data science journey.

Presentation
Overview
Synthetic data has recently attracted growing interest from the biomedical sector. Synthetic patient data helps mitigate privacy issues. Augmenting datasets with synthetic records can also improve the performance of classification models trained on scarce health data or on rare minority classes (e.g. rare diseases).
During this one-day workshop, organized by CHUV and SDSC, we will review available tools for synthetic data generation and use cases in the biomedical and pharmaceutical sectors.
Details
Target Audience
Experienced professionals, executives, and data scientists in the biomedical and pharmaceutical sectors wishing to acquire hands-on knowledge on synthetic data generation and usage.
As the workshop involves hands-on sessions, prior experience with the programming language Python is required. The workshop will be held in English.
Programme
Objectives
By the end of the day, participants will:
- Have a grasp of currently available methods for synthetic tabular and image data generation.
- Have identified use cases and challenges of synthetic data in the biomedical and pharmaceutical sectors.
- Have hands-on experience with generating synthetic data with Python and evaluating its quality.
Agenda
09:00
Welcome coffee
09:30
Welcome & introduction
09:40
Synthetic data: How it works and where it is currently used
10:10
GANs, VAEs and diffusion: a deeper dive
10:55
Break
11:15
Applications in healthcare
11:45
Towards the use of synthetic data in biomedical applications: Evaluation of privacy and utility tradeoff
12:15
Lunch
13:15
Applications in the pharmaceutical industry
14:00
Hands-on (part 1): Understanding synthetic data generation (e.g. generating synthetic medical images for image classification)
14:50
Break
15:10
Hands-on (part 2): Understanding synthetic data evaluation (e.g. sharing survival data: evaluating the utility and privacy of tabular synthetic data)
16:40
Panel discussion
17:10
Concluding remarks, Apéro & Networking
Instructors
Jeremie Despraz, MS, Principal Data Scientist in Clinical AI, CHUV
Matthias Galipaud, PhD, Senior Data Scientist, SDSC, ETHZ
Beyrem Kaabachi, MS, Data Scientist in Health Data Privacy, CHUV
Arshjot Khehra, Data Scientist, SDSC, ETHZ
Jean-Louis Raisaro, PhD, Tenure-Track Assistant Professor in Biomedical data science, CHUV
Alena Simalatsar, PhD, Assistant Professor, HES-SO
Practical Information
Price
Non-members: 150/pers
Ongoing collaborations with SDSC or BDSC: free
Availability & Registration
52 registered participants - Registration closed.
Other events

Data-Driven Control Methods for Energy and Manufacturing


Roberto holds an M.Sc. and a Ph.D. in Particle Physics from the University of Torino, Italy. He has worked for several years in fundamental research as a senior fellow and data scientist at the CERN Experimental Physics division and on a research project supported by the Belgian National Fund for Scientific Research (FNRS). In 2018 he moved to EPFL to work on data mining and Machine Learning techniques for the built environment and renewable energies. He has started and led multiple collaborations with academic and industry partners in the energy domain. Roberto joined the SDSC in September 2021 as a Principal Data Scientist with the mission of accompanying industries, NGOs and international organizations through their data science journey.


Carl holds a Ph.D. in Mathematics from École des Ponts ParisTech and Université Gustave Eiffel in Paris. He has broad interests in statistics and stochastic control, and works on reinforcement learning, generative methods and time series forecasting, with applications in various domains such as energy, finance and physics. He worked with EDF R&D and the Finance des Marchés de l’Energie (FiME) laboratory on applications of machine learning to risk management, including time series generation and deep hedging. He joined the SDSC in 2022 as a senior data scientist in the academic team at École Polytechnique Fédérale de Lausanne (EPFL).


Victor joined as a Data Scientist in the SDSC Innovation team in 2023. He holds a Bachelor's degree in Mechanical Engineering (B.Eng.) from the University of Pretoria in South Africa, as well as Master's degrees in Robotics and Mechatronics (M.Sc.) and Artificial Intelligence (M.Sc.) from KU Leuven in Belgium. Prior to joining SDSC, he worked for several years as a consultant at Capgemini Engineering and as an R&D Engineer at Toyota Motor Europe. Within the Advanced Powertrain and Target Setting team at Toyota, Victor played a crucial role in the pre-development of innovative electric and fuel-cell vehicles. His responsibilities included leading the development and deployment of Natural Language Processing (NLP) tools and pipelines, data science and machine learning, building data analytics dashboards, statistical forecasting, powertrain design, optimal control system design, and strategic technical target setting. He is passionate about leveraging his combined Engineering and Data Science knowledge to solve complex problems in the industry.


After earning an MSc in Theoretical Physics at the University of Padua, Giulio graduated in Quantitative Finance from Bocconi University. Before joining the SDSC industry cell in June 2021, he spent a few years working in the financial sector, where he mainly dealt with the application of machine learning to financial risk management. When not coding, Giulio spends his free time playing bass guitar, hiking and cooking.

ENID | Enabling Innovation with Data Science at ETH Zurich


Dr. Olivier Verscheure is the director and founder of the Swiss Data Science Center (SDSC). Olivier also co-leads a joint training program between EPFL and HEC Lausanne, specifically designed for senior executives. Since 2018, Olivier has been a member of the Board of Directors of Lonza, a global leader in the life sciences sector. This company provides products and services to the pharmaceutical, biotechnology, and specialized healthcare industries. Olivier began his career at IBM Research after earning his Ph.D. in computer science from EPFL. He held several research and leadership positions at the IBM T. J. Watson Research Center in New York and co-created and co-directed the IBM Research center in Dublin, Ireland, before joining EPFL in 2016.


Silvia holds an MSc in Computer Science from EPFL and a PhD in Computer Science from the University of York, UK. She has been a senior research fellow at the University of Trento and later at Politecnico di Milano, Italy. There, she had the chance to work on Marie Curie and ERC projects relating to natural language processing. From 2012 to 2019, she was a Senior Manager and NLP expert at ELCA Informatique Switzerland, whose AI department she helped create and expand. Silvia joined the Swiss Data Science Center in 2019 and is currently its Chief Transformation Officer, in charge of the team leading organizations to digital transformation.


Anna joined SDSC as a Data Scientist focusing on industry collaborations in July 2019. She completed her PhD in Bioinformatics at the University of Luxembourg, where she analysed large-scale heterogeneous datasets and leveraged multiple disciplines: Statistics, Network Analysis, and Machine Learning. Before joining SDSC, Anna worked as a Data Scientist at Deloitte Luxembourg, with a focus on computer vision and time-series analysis. Currently, Anna is a Principal Data Scientist based at the ETH Zurich office, where she leads biomedical collaborations with industry partners. Anna works on a range of projects: protein properties prediction, biomanufacturing optimization, statistical model evaluation and others.


Matthias Galipaud obtained his PhD in evolutionary biology in 2012 from the University of Burgundy in Dijon (France), and held postdoctoral positions as a mathematical biologist at the University of Bielefeld (Germany) and the University of Zurich, where he researched the evolutionary theories of aging and mate choice. In 2020, he became a data scientist, developing machine learning solutions for startups in Switzerland and Australia before joining the SDSC Innovation Team in November 2022.


Dan received an MSc in civil and environmental engineering from UC Berkeley and a Ph.D. from EPFL, where he developed models combining machine learning and geographic information systems to estimate renewable energy potentials on a large scale. After serving as a researcher/data scientist at Unisanté (Lausanne) and completing a one-year postdoc at the Quebec Artificial Intelligence Institute (Mila) in Montréal, Dan joined the SDSC Innovation team. His work has generally focused on crafting and tailoring machine learning methods and deep learning architectures for a variety of domains, most notably the spatio-temporal modeling and forecasting of environmental and energy-related variables, as well as multiple applications in public health research.

Data Science for the Sciences


Guillaume Obozinski graduated with a PhD in Statistics from UC Berkeley in 2009. He then completed a postdoc and held a researcher position until 2012 in the Willow and Sierra teams at INRIA and Ecole Normale Supérieure in Paris. He was then Research Faculty at Ecole des Ponts ParisTech until 2018. Guillaume has broad interests in statistics and machine learning and has worked over time on sparse modeling, optimization for large-scale learning, graphical models, relational learning and semantic embeddings, with applications in various domains from computational biology to computer vision.