Using machine learning to correct model error in data assimilation and forecast applications

The idea of using machine learning (ML) methods to reconstruct the dynamics of a system is the topic of recent studies in the geosciences, in which the key output is a surrogate model meant to emulate the dynamical model. In order to treat sparse and noisy observations in a rigorous way, ML can be combined with data assimilation (DA). This yields a class of iterative methods in which, at each iteration, a DA step that assimilates the observations alternates with an ML step that learns the underlying dynamics of the DA analysis. In this article, we propose to use this method to correct the error of an existing, knowledge-based model. In practice, the resulting surrogate model is a hybrid model between the original (knowledge-based) model and the ML model. We demonstrate the feasibility of the method numerically using a two-layer, two-dimensional, quasi-geostrophic channel model. Model error is introduced by means of perturbed parameters. The DA step is performed using the strong-constraint 4D-Var algorithm, while the ML step is performed using deep learning tools. The ML models are able to learn a substantial part of the model error and the resulting hybrid surrogate models produce better short- to mid-range forecasts. Furthermore, using the hybrid surrogate models for DA yields a significantly better analysis than using the original model.
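To make the idea of a hybrid surrogate model concrete, here is a minimal toy sketch in Python. It is not the setup of the paper, and every choice in it is an assumption made for illustration: the Lorenz-63 system stands in for the quasi-geostrophic model, the truth plus small Gaussian noise stands in for the 4D-Var analysis, and a linear least-squares fit stands in for the deep learning step. Only a single DA-then-ML pass is shown, not the full iteration. The names `lorenz63_step`, `features`, and `hybrid_step` are hypothetical.

```python
import numpy as np

RHO_TRUE, RHO_MODEL = 28.0, 26.0  # model error comes from a perturbed parameter


def lorenz63_step(state, rho, dt=0.01, sigma=10.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 system by one RK4 step."""
    def tendency(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    k1 = tendency(state)
    k2 = tendency(state + 0.5 * dt * k1)
    k3 = tendency(state + 0.5 * dt * k2)
    k4 = tendency(state + dt * k3)
    return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def trajectory(x0, n_steps, rho):
    """Integrate n_steps from x0 and return all states, shape (n_steps + 1, 3)."""
    out = np.empty((n_steps + 1, 3))
    out[0] = x0
    for k in range(n_steps):
        out[k + 1] = lorenz63_step(out[k], rho)
    return out


def features(states):
    """Regression inputs: a constant term followed by the state components."""
    states = np.atleast_2d(states)
    return np.hstack([np.ones((states.shape[0], 1)), states])


rng = np.random.default_rng(0)
truth = trajectory(np.array([1.0, 1.0, 1.0]), 3000, RHO_TRUE)[500:]  # drop spin-up

# Assumption: a noisy copy of the truth plays the role of the DA analysis.
analysis = truth + rng.normal(scale=0.01, size=truth.shape)

# "ML step" stand-in: learn the one-step error of the knowledge-based model,
# i.e. the difference between the next analysis and the model forecast.
train = analysis[:2000]
model_forecast = np.array([lorenz63_step(s, RHO_MODEL) for s in train[:-1]])
residual = train[1:] - model_forecast
weights, *_ = np.linalg.lstsq(features(train[:-1]), residual, rcond=None)


def hybrid_step(state):
    """Hybrid surrogate: knowledge-based model plus the learned correction."""
    return lorenz63_step(state, RHO_MODEL) + (features(state) @ weights)[0]


# Compare one-step forecast errors on held-out true states.
test = truth[2000:]


def one_step_rmse(step):
    forecasts = np.array([step(s) for s in test[:-1]])
    return float(np.sqrt(np.mean((forecasts - test[1:]) ** 2)))


rmse_physical = one_step_rmse(lambda s: lorenz63_step(s, RHO_MODEL))
rmse_hybrid = one_step_rmse(hybrid_step)
print(f"one-step RMSE, original model: {rmse_physical:.4f}")
print(f"one-step RMSE, hybrid model:   {rmse_hybrid:.4f}")
```

In this toy case the parameter error produces a model error that is close to linear in the state, which is why a linear fit is enough to illustrate the principle. Nothing here says anything about how the approach behaves on the quasi-geostrophic model used in the paper.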

The paper, entitled "Using machine learning to correct model error in data assimilation and forecast applications", was led by my colleague Alban Farchi and initiated during a stay at the ECMWF. It is published (open access) in the Quarterly Journal of the Royal Meteorological Society.