Citizen-controlled Data Science for Multiple Sclerosis Research

April 1, 2018


Multiple Sclerosis (MS) is a complex chronic disease whose manifestation depends on clinical, environmental and individual factors. Prediction of individual progression is poor, and treatment decisions are often hampered by the lack of objective parameters (e.g., related to fatigue).

MS data served as the use case within the MIDATA project, which aims to develop an ethically fair and secure data infrastructure that permits the collection, integration and analysis of diverse types of data under the control of the citizen/patient.

The task of SDSC within this project was to extract data from the doctor's reports collected and stored with the hospital software kisim. A doctor's report is a semi-structured text of a few to several dozen lines, where each line is associated with a topic such as diagnosis, current state, history, MRI or medication. The neurology clinic at the University Hospital of Zurich (USZ) has developed and maintains the database seantis to store MS patients' records in a structured manner. So far, the seantis database has been filled manually by transcribing information from the doctor's reports into the corresponding fields.
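As a minimal sketch of the line-per-topic structure described above, a report could be represented as a list of text lines that each carry an optional topic label. The class name `ReportLine`, the function `split_report`, the report identifier and the example text are all hypothetical and invented for illustration; they do not reflect the actual kisim export format or the seantis schema.

```python
from dataclasses import dataclass
from typing import List, Optional

# Topic names taken from the examples in the text; the real label set is an assumption.
TOPICS = ("diagnosis", "current state", "history", "MRI", "medication")


@dataclass
class ReportLine:
    report_id: str
    line_no: int                  # 1-based position in the raw report
    text: str
    topic: Optional[str] = None   # None until labelled manually or predicted


def split_report(report_id: str, raw_text: str) -> List[ReportLine]:
    """Turn a raw report into one ReportLine per non-empty text line."""
    lines = []
    for line_no, line in enumerate(raw_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped:
            lines.append(ReportLine(report_id, line_no, stripped))
    return lines


# Invented placeholder text, not real report content.
example = """Diagnosis: relapsing-remitting MS
MRI: cranial MRI, no new lesions

Medication: treatment continued"""

report_lines = split_report("report-001", example)
report_lines[0].topic = "diagnosis"  # a manually assigned label
```

Keeping the original line number alongside each line makes it possible to trace a predicted field value back to the sentence it came from.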



SDSC Team:
Fernando Perez-Cruz
Lilian Gasser
Luis Salamanca

PI | Partners:

Institute of Molecular Systems Biology:

  • Dr. Ernst Hafen


Department of Computer Science:

  • Prof. Dr. Gunnar Rätsch
  • Prof. Dr. Christian Holz
  • Dr. Cristobal Esteban Aizpiri
  • Liliana Barrios


Klinik für Neurologie:

  • Dr. med. Andreas Lutterotti
  • Marc Hilty
  • Dr. med. Roland Martin


Institute for Medical Informatics:

  • Dr. François von Kaenel


Scientific IT Services:

  • Bräunlich Gerhard




Semi-automatic update of the MS database seantis using the doctor's reports.


  • Build an embedding of the doctor's reports using Doc2Vec, where one text line corresponds to one document.
  • Perform multi-class classification of the text lines, using the embedding vectors as features and manually assigned labels as targets. This intermediate step makes it possible to predict text line labels for new, unseen doctor's reports.
  • For specific parts of seantis (MS diagnosis, MRI information, ...), tailored classification procedures were developed to predict columns of interest, e.g. the MS diagnosis type, the type of MRI (spinal or cranial) and whether new and/or contrast-medium-enhancing lesions were detected.


Facilitate the update of the seantis database by providing predictions for fields of interest based on extracted information from doctor's reports.


Figure 1: General overview
Figure 2: Applied methodology


