The creation of avalanche bulletins is still a largely expert-driven and manual task. Forecasters manually inspect vast amounts of spatio-temporal data describing the condition of the snowpack and the local weather in the Alps. Based on their intuition and knowledge, they then assign danger levels on an ordinal scale from 1 to 5 (low to high avalanche danger) to create the avalanche bulletin for the Swiss Alps. This labour-intensive task is carried out once or twice a day during the snow season and is vulnerable to errors and biases, not least because forecasters can hardly explore all of the relevant data. In this SDSC collaborative project, we aim to explore the feasibility of using data-driven statistical models to support avalanche danger forecasting, to explore the relevant data, and ultimately to move one step closer to an automated decision tool supporting human experts.
This blog post introduces the research project "DEAPSnow: Improving snow avalanche forecasting by data-driven automated predictions", a joint collaboration between the Swiss Data Science Center and the WSL Institute for Snow and Avalanche Research SLF.
Avalanche danger bulletins are essential in Switzerland
Timely and accurate prediction of avalanche danger is not only crucial for winter activities and ski resorts, but is also an important source of information for land-use planning and for the mapping of natural hazard areas. On a shorter time scale, the avalanche bulletin is an important source of information for assessing the risk to public and private infrastructure after heavy snowfall. For all these reasons, an accurate and timely avalanche danger forecast is crucial for most Alpine villages and their related winter economic activities.
The creation of a bulletin is still a task largely driven by expert knowledge. Highly trained and skilled avalanche forecasters gather and parse a tremendous amount and variety of data. This information forms a massive, multi-faceted spatio-temporal data cube that must ultimately be reduced to a danger level on a five-level scale (1 to 5), together with the elevations and slope aspects (N, E, S, W) to which that danger level applies. This summarizes snowpack instability and avalanche susceptibility in a format that both laypeople and experts can read and interpret. For instance, in Fig. 1 below, the danger level for part of the southern Prealps (the area in yellow) corresponds to moderate danger (level 2) from 1400 m upwards, on all slope aspects.
Supporting automated delineation of danger levels
As one can guess, avalanche forecasting and the creation of an avalanche bulletin form a difficult, data-intensive process with little automation of decision making. Forecasting danger levels is very complicated because assigning them requires experience and situational interpretation that go beyond the data themselves. Further compounding the complexity, the danger level is not strictly defined in terms of physical properties of the snowpack; it is defined only qualitatively, as a set of levels on an ordinal scale. For instance, danger level 1 is defined as:
"The snowpack is well bonded and stable in general. Triggering is generally possible only from high additional loads (e.g. several skiers) in isolated areas of very steep, extreme terrain. Only small and medium natural avalanches are possible"
while danger level 5 is defined as:
"The snowpack is poorly bonded and largely unstable in general. Numerous very large and often extremely large natural avalanches can be expected, even in moderately steep terrain."
This poses two main issues. First, the danger level itself has to be assigned by interpreting the situation on a given day. This raises questions about the temporal consistency of bulletins, since a change in the forecasting team, in the sensors, or in the climate and snow models used could change how danger levels are assigned: a danger level of 3 assigned in 1992 could correspond to a danger level of 4 after reanalysis in 2020. Crucially, since the danger level is the result of expert interpretation, there is no way to measure it directly from physical parameters or to verify it accurately post hoc. Depending on the overall conditions at the time the bulletin is created, forecasters have to take data from point measurement locations and generalize the estimated danger level over all the warning regions. This process involves some smoothing: some areas might be assigned a danger level that differs from the one indicated by local measurements, in particular if neighboring stations point to a higher danger level. The elevation of each measurement station must also be taken into account, since the amount of snow, and conditions favorable to avalanche formation in general, vary with elevation.
The second issue is that each forecaster, in order to interpret the situation optimally, has to parse a massive variety of data. These include manual observations, visual estimates, subjective judgments by individual external observers, meteorological data from automated weather stations, output from numerical weather prediction and snow cover models, and avalanche occurrence and snow stratigraphy data. Each of these datasets comes with its own spatio-temporal resolution, validity and accuracy, as well as a history of subjective preference by the forecasters.
The "DEAPSnow" research project aims to answer several important questions that would lead to the creation of a tool able to predict local danger levels and therefore support forecasters in their task:
Can a danger level be estimated automatically using machine learning models?
What data is required and at what temporal resolution? How much historical data should be taken into account for each prediction?
What family of methods is best at predicting the danger level?
How would such a pipeline work for real-time prediction using snowpack and weather forecast data?
Creation of unique datasets enabling the use of machine learning
Ultimately, avalanche danger is a function of snowpack stability, which in turn is affected by weather and climatology. These data are collected by the Intercantonal Measurement and Information System (IMIS) network. IMIS stations record a range of measurements every 30 minutes and send them to a receiving server at the SLF centre in Davos, Graubünden. These data are fed into a numerical model, the SNOWPACK model, which estimates a large set of features related to several snow characteristics.
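As a rough illustration of how half-hourly station records can be condensed into one row per day before any modelling, here is a minimal pandas sketch. The column names (`air_temp`, `wind_speed`, `snow_height`), the chosen statistics and the synthetic data are hypothetical assumptions, not the actual IMIS variables or the project's pipeline.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one station: two days of half-hourly records.
# Column names are hypothetical, not real IMIS variable names.
index = pd.date_range("2020-01-01 00:00", periods=96, freq="30min")
raw = pd.DataFrame(
    {
        "air_temp": np.linspace(-10.0, -2.0, 96),
        "wind_speed": np.tile([2.0, 6.0], 48),
        "snow_height": np.concatenate([np.full(48, 120.0), np.full(48, 135.0)]),
    },
    index=index,
)


def daily_features(half_hourly: pd.DataFrame) -> pd.DataFrame:
    """Reduce half-hourly station records to one row per calendar day."""
    daily = half_hourly.resample("1D").agg(
        {
            "air_temp": ["mean", "min", "max"],
            "wind_speed": ["mean", "max"],
            "snow_height": ["last"],
        }
    )
    # Flatten the (variable, statistic) column MultiIndex into flat names.
    daily.columns = [f"{var}_{stat}" for var, stat in daily.columns]
    # Day-to-day change in end-of-day snow height, a crude new-snow proxy.
    daily["snow_height_diff_1d"] = daily["snow_height_last"].diff()
    return daily


daily = daily_features(raw)
```

Which statistics to keep (means, extremes, end-of-day values) is a modelling choice; the sketch only shows the mechanics of the aggregation.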
To approach the research questions outlined above, we compiled a large dataset containing measurements from weather stations, measurements of the snow state, and the output of a numerical model simulating the snowpack and the evolution of its layers. The numerical model is fit to the observed snow conditions at the locations shown in Fig. 2. The measurement network is dense, but not dense enough to represent all local weather and snow characteristics. Whether an avalanche occurs is dominated by a series of local processes, but the regional avalanche danger is related to larger-scale snow cover and weather characteristics, which makes it possible to use such a dataset for automated processing as well.
The avalanche bulletin is published daily at 17:00 and forecasts the danger for the next day. Note that the danger level is forecast for so-called danger regions (black polygon boundaries in Fig. 1), whereas we attach it to time series of point measurements representative of local conditions. We compiled data from the past 22 seasons and attempt to predict the daily danger level from the measurements and physical model outputs.
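Attaching region-level bulletins to station time series amounts to a lagged join: a bulletin issued on day d is valid for day d + 1. A minimal pandas sketch, assuming a hypothetical station-to-region lookup table and made-up identifiers and column names (`station`, `region`, `issue_date`, `danger_level`, `new_snow_24h`):

```python
import pandas as pd

# Hypothetical lookup: which warning region each station lies in.
station_region = pd.DataFrame(
    {"station": ["STN1", "STN2"], "region": ["R100", "R200"]}
)

# Hypothetical bulletin archive: one row per region and issue date.
# A bulletin issued at 17:00 on issue_date forecasts the following day.
bulletins = pd.DataFrame(
    {
        "region": ["R100", "R100", "R200", "R200"],
        "issue_date": pd.to_datetime(["2020-01-01", "2020-01-02"] * 2),
        "danger_level": [2, 3, 1, 2],
    }
)

# Daily station features (synthetic values).
features = pd.DataFrame(
    {
        "station": ["STN1", "STN1", "STN2", "STN2"],
        "date": pd.to_datetime(["2020-01-02", "2020-01-03"] * 2),
        "new_snow_24h": [10.0, 25.0, 0.0, 5.0],
    }
)


def attach_labels(features, station_region, bulletins):
    """Label each station-day with the danger level forecast for that day."""
    # Shift each bulletin to the day it is valid for.
    labels = bulletins.assign(
        valid_date=bulletins["issue_date"] + pd.Timedelta(days=1)
    )
    out = features.merge(station_region, on="station", how="left")
    out = out.merge(
        labels[["region", "valid_date", "danger_level"]],
        left_on=["region", "date"],
        right_on=["region", "valid_date"],
        how="inner",
    )
    return out.drop(columns="valid_date")


labelled = attach_labels(features, station_region, bulletins)
```

Getting this one-day offset wrong would silently train the model to predict the bulletin for the wrong day, which is why the shift is made explicit.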
Machine learning to the rescue
A basic machine learning task can be phrased as a standard supervised classification problem: given the measurements on a given day, we aim to predict the danger level forecast for that day (i.e. issued the previous day). In a real setting we do not have access to the next day's actual measurements, but we do have access to simulated forecast values provided by a numerical weather prediction model and the snowpack numerical model, the same sources used by forecasters.
We first focused only on the prediction of dry snow avalanches. This subset of the data can be obtained by parsing the ancillary information on the type of expected avalanches provided by the forecasters. Many preprocessing steps are needed to filter the data and select the stations that measure parameters related to actual avalanche formation processes (e.g. based on elevation, amount of measured snow, etc.).
This supervised classification problem is extremely imbalanced, reflecting how often each danger level is actually forecast in the Alps every year. Fig. 3 shows a bar plot of the counts of danger level forecasts for all measurement stations over 22 years. Note that danger level 5 does not even appear in the plot, since it accounts for only 0.06% (N=236) of all events.
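One common way to counter such imbalance, shown here purely as an illustration and not necessarily what the project used, is to weight each training sample inversely to the frequency of its class. The formula below is the same heuristic scikit-learn applies for `class_weight="balanced"`; the label vector is a toy example, not the real danger level distribution.

```python
import numpy as np


def balanced_class_weights(labels):
    """Inverse-frequency weights: w_c = n_samples / (n_classes * n_c)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Toy labels, NOT the real distribution: level 5 is deliberately rare.
y = np.array([1] * 40 + [2] * 35 + [3] * 20 + [4] * 4 + [5] * 1)
weights = balanced_class_weights(y)

# Per-sample weights, e.g. for a classifier's fit(..., sample_weight=...).
sample_weight = np.array([weights[c] for c in y.tolist()])
```

With these weights every class contributes the same total weight to training, so a single level-5 sample counts as much as 40 level-1 samples. Up-weighting very rare classes also amplifies their label noise, which matters given how uncertain the labels are.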
We trained several models, ranging from linear regressors and classifiers to boosted decision trees, random forests, recurrent neural networks and convolutional neural networks. All the models have access to some form of historical information: the last two access previous days' measurements directly, as sequences, while all the others access smoothed measurements through additional variables summarizing multi-day statistics (e.g. means over 3 days, 7 days, etc.). It turns out that, given the vast heterogeneity of the input data, random forests perform best. We train the models on 20 years of data and use the winters of 2018/2019 and 2019/2020 as independent test sets.
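The multi-day summary features and the random forest setup can be sketched as follows. Everything specific here is an assumption for illustration: the two variables, the 3- and 7-day windows, the synthetic data, the toy labels and the hyperparameters (`n_estimators=200`). Only the general approach (trailing-window statistics fed to a random forest, with the most recent period held out for testing) follows the text. The last line extracts scikit-learn's impurity-based feature importances, one way of obtaining a feature ranking.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)


def add_rolling_features(daily, columns, windows=(3, 7)):
    """Append trailing multi-day means; rows lacking full history stay NaN."""
    out = daily.copy()
    for col in columns:
        for w in windows:
            out[f"{col}_mean_{w}d"] = out[col].rolling(window=w, min_periods=w).mean()
    return out


# Synthetic daily series for one station; variables are made up.
days = pd.date_range("2019-11-01", periods=200, freq="D")
daily = pd.DataFrame(
    {
        "new_snow_24h": rng.gamma(1.0, 5.0, size=len(days)),
        "wind_speed": rng.uniform(0.0, 15.0, size=len(days)),
    },
    index=days,
)
feat = add_rolling_features(daily, ["new_snow_24h", "wind_speed"]).dropna()

# Toy target (levels 1-4) derived from 3-day mean new snow, purely illustrative.
y = np.digitize(feat["new_snow_24h_mean_3d"], bins=[3.0, 6.0, 9.0]) + 1

# Chronological split: train on earlier days, test on the most recent ones,
# a miniature analogue of holding out whole later winters.
split = int(0.8 * len(feat))
X_train, X_test = feat.iloc[:split], feat.iloc[split:]
y_train, y_test = y[:split], y[split:]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Impurity-based importances, sorted from most to least important.
ranking = pd.Series(model.feature_importances_, index=feat.columns).sort_values(
    ascending=False
)
```

A chronological split rather than a random one matters here: consecutive days are strongly correlated, so shuffling would leak near-duplicate days into the test set and inflate the scores.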
These preliminary results are encouraging:
The F-score averaged over classes (macro F1) indicates an agreement of >70% with the official bulletins. Although the ground-truth labels are uncertain and potentially biased in ways we cannot detect, this score is very high, and the models have proven to be temporally and spatially consistent.
Random forests naturally return a ranking of features based on their importance in the model. This ranking is consistent with the variables analysed by forecasters. These variables mostly relate to fresh snow and wind-driven snow accumulation over the last several days, and to several indices and profile parameters related to the stability of the snow cover.
When the baseline model errs, it usually predicts a danger level close to the official forecast. In particular, a "real" danger level 4 is never mistaken for danger level 1 or 2. This indicates that the classification problem is well posed, and that errors that are potentially costly (in terms of real-world consequences) are not being made. Fig. 4 shows the confusion matrix: the diagonal clearly dominates, as one would hope.
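The class-averaged F-score, the confusion matrix and the "how far off are the errors" check can all be computed with a few standard calls, as in the sketch below. The labels and predictions are toy values, not project results; danger levels are kept as integers so that the absolute difference measures the ordinal distance of each error.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

levels = [1, 2, 3, 4, 5]
# Toy official levels and predictions for illustration only.
y_true = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4])
y_pred = np.array([1, 2, 2, 2, 3, 3, 3, 2, 4, 3])

# Macro F1: F1 per danger level, then the unweighted mean, so rare levels
# count as much as frequent ones. Level 5 is absent from this toy sample.
macro_f1 = f1_score(y_true, y_pred, labels=[1, 2, 3, 4], average="macro")

# Rows = official level, columns = predicted level.
cm = confusion_matrix(y_true, y_pred, labels=levels)

# Largest ordinal distance between a prediction and the official level.
max_level_error = int(np.max(np.abs(y_true - y_pred)))
# Fraction of predictions within one level of the official one.
within_one = float(np.mean(np.abs(y_true - y_pred) <= 1))
```

Because the scale is ordinal, the distance-based checks complement the F-score: plain F1 treats confusing level 4 with level 3 exactly like confusing it with level 1.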
Lessons learned so far
Although machine learning models seem to provide encouraging results, their outputs will always have to be verified by expert forecasters. The labels on which the models are trained are very imbalanced and, in some respects, ultimately uncertain, which could lead to biased predictions in uncommon snow and weather conditions. The integration of such models into the operational decision process has to be done with care, involving all the people concerned.
The Swiss dataset we are using is large, rich and well curated. Even so, many forecast regions (white boundary areas in the bulletin in Fig. 1) do not contain any IMIS measurement station, so their danger level must be extrapolated. In principle, for regions within the Alpine massif, simple nearest-neighbor interpolation could be accurate enough. But to assign danger levels to regions at the edge of the Alps (e.g. Prealpine slopes or the Jura), forecasters rely on their experience and some intuition, which is very hard to translate into a globally optimal machine learning algorithm. For that reason, methods providing a full mapping of danger levels across the whole of Switzerland still need to be explored.
Finally, we have so far focused on dry snow avalanches, which make up the bulk of avalanches during winter seasons in Switzerland. However, particularly toward the end of the winter season, wet snow avalanches are a common phenomenon, although rare compared to all events. As the meteorological factors and physical mechanisms that lead to wet snow avalanche formation are completely different from those behind dry snow avalanches, specific models should be developed and assessed.
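The nearest-neighbor idea for regions without stations can be sketched in a few lines of NumPy. The planar coordinates (in km), the station predictions and the region centroids are all hypothetical. As discussed above, such a naive rule is at best plausible inside the Alpine massif and does not capture how forecasters treat regions at the edge of the Alps.

```python
import numpy as np


def nearest_station_level(region_centroids, station_xy, station_levels):
    """Give each region the danger level predicted at its nearest station."""
    # Pairwise Euclidean distances, shape (n_regions, n_stations).
    diff = region_centroids[:, None, :] - station_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    nearest = dist.argmin(axis=1)
    return station_levels[nearest]


# Hypothetical planar station coordinates (km) and predicted levels.
station_xy = np.array([[0.0, 0.0], [50.0, 0.0], [0.0, 40.0]])
station_levels = np.array([2, 3, 1])

# Hypothetical centroids of regions without any station.
region_centroids = np.array([[5.0, 5.0], [45.0, 10.0], [2.0, 30.0]])
region_levels = nearest_station_level(region_centroids, station_xy, station_levels)
```

A realistic version would at least account for elevation differences between a station and the region it represents, which plain horizontal distance ignores.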
Ongoing project directions
There are important avenues to pursue in order to make machine learning models credible in an operational avalanche forecasting setting. First and foremost, real-time validation by experts, in addition to statistical accuracy assessment on the annotated test set, is paramount. To simulate an operational setting, we set up, with the help of SLF IT, a live server making predictions every 3 hours for the whole 2020/2021 season. All the models tested in this setting (random forests and convolutional neural networks) are extremely fast at inference on CPU, so computational load is not an issue. The results were then sent to a forecaster, who inspected them carefully after the official bulletin had been made and discussed. Overall, the performance matched that obtained on the 2018/2019 and 2019/2020 test sets, underlining the consistency of the model for the 2020/2021 season as well. This experiment was very successful, and we plan to extend it to the next season.
As mentioned above, further models will be developed for wet snow avalanches so that they can be included in the live prediction setting. It will be important for such models to provide additional feedback in terms of interpretability, since the mechanics of wet snow avalanches are less well understood and depend strongly on subtle temporal changes in the liquid water content of the snowpack.
To conclude, the aim of this large, open-ended project is to explore the possibility of using statistical machine learning to support avalanche forecasting in Switzerland. We aim to answer several questions by developing models that predict different key quantities: we first tackled danger level prediction directly, and we are currently exploring the possibility of predicting the likelihood of wet snow avalanches.
We consider models for dry snow avalanche conditions that are fast at inference, since being able to use them in real time is a crucial requirement.
Figures and plots in this blog post are taken from an upcoming publication (full reference will be updated when published).