Data Generated by Polyphemus

Ensemble Simulations for the Full Year 2001

Quick Description

We provide over 200 simulations generated with Polyphemus for the full year 2001 across Western Europe. The generation of the ensemble is explained and discussed in this paper:
Automatic generation of large ensembles for air quality forecasting using the Polyphemus system; Garaud and Mallet; Geosci. Model Dev., 3, 69-85, 2010


You may use the data for any purpose, without restriction. The data is distributed in the hope that it will be useful, but without warranty of any kind.

The simulations rely on different data sources that are cited in the above paper. As required by ECMWF, we mention on this page that the simulations used ECMWF meteorological fields. You will have to cite ECMWF as well if you use the data.

If you publish material using the ensemble simulations, we let you decide whether or not to cite us. Of course, we hope that you will cite the above paper if you think it is appropriate.

Downloading Data

The 107-member ensemble described in the paper is made of two parts. Six members, referred to as reference members in the paper, were generated "by hand". The other 101 members were randomly generated with the approach detailed in the paper. The six reference members can be found here, and the randomly generated members here.

In addition, we provide an extension with another 100 members (randomly generated), available here.

Data Format

In each directory, there are sub-directories whose names are sequences of the digits 0, 1 and 2. Each sub-directory contains the output of one member of the ensemble. The sequence of digits encodes the model formulation and the perturbations of its inputs. In each results/ directory, you will find the concentrations of ozone (O3.bin), nitric oxide (NO), nitrogen dioxide (NO2) and sulfur dioxide (SO2).
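The directory layout described above can be explored with a few lines of Python. This is only a sketch: the helper name list_members is illustrative and not part of Polyphemus, and it assumes that a member is any sub-directory whose name contains only the digits 0, 1 and 2 and that holds results/O3.bin.

```python
import os
import re

def list_members(root, species_file="O3.bin"):
    """Return the sorted names of the member directories found under root.

    A directory is kept if its name is made only of the digits 0, 1 and 2
    and if it contains results/<species_file>.
    """
    members = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "results", species_file)
        if re.fullmatch(r"[012]+", name) and os.path.isfile(path):
            members.append(name)
    return members

# Example, assuming the six reference members were downloaded into "6-members":
# print(list_members("6-members"))
```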

The binary files store single-precision (4-byte) floating-point numbers. They do not contain any header. Each binary file stores the concentrations of a single species for a single member of the ensemble. The values are written in the following loop order, so that longitude varies fastest:

Loop on time t
   Loop on latitude y
      Loop on longitude x

The time step is one hour. The first date is 2001-01-01 at 01:00 UTC. The last date is 2001-12-31 at 00:00 UTC. In total, there are 8736 time steps.
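The time axis described above can be converted to dates with a small helper. The following is a minimal sketch: the function names index_to_date and date_to_index are illustrative, and the time indices are assumed to be 0-based, as in Python.

```python
from datetime import datetime, timedelta

FIRST_DATE = datetime(2001, 1, 1, 1)  # 2001-01-01 at 01:00 UTC
N_STEPS = 8736                        # number of hourly time steps

def index_to_date(t):
    """Return the UTC date of the 0-based time index t."""
    if not 0 <= t < N_STEPS:
        raise IndexError("time index out of range")
    return FIRST_DATE + timedelta(hours=t)

def date_to_index(date):
    """Return the 0-based time index of a UTC date given on a full hour."""
    hours = (date - FIRST_DATE).total_seconds() / 3600.0
    t = int(hours)
    if t != hours or not 0 <= t < N_STEPS:
        raise ValueError("date is not one of the stored time steps")
    return t

print(index_to_date(0))            # first stored date
print(index_to_date(N_STEPS - 1))  # last stored date
```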

The domain is discretized on a regular grid with a fixed step of 0.5° in both latitude and longitude. At each time step, the first grid point stored in a file is the lower left corner. The coordinates of its center, given as (longitude, latitude), are (-10.5°, 35°). There are 46 points along latitude and 67 points along longitude. The center of the last grid point (upper right corner) is therefore located at (22.5°, 57.5°).
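The grid description can be turned into coordinate arrays and index helpers. This is a sketch under the assumptions stated on this page (cell centers, 0.5° step, time-latitude-longitude storage order); the names nearest_cell and flat_offset are illustrative.

```python
import numpy as np

N_T, N_Y, N_X = 8736, 46, 67            # time, latitude, longitude
X_MIN, Y_MIN, DELTA = -10.5, 35.0, 0.5  # first cell center and grid step (degrees)

# Coordinates of the cell centers along each axis.
longitude = X_MIN + DELTA * np.arange(N_X)
latitude = Y_MIN + DELTA * np.arange(N_Y)

def nearest_cell(lon, lat):
    """Return the indices (y, x) of the cell whose center is closest to (lon, lat)."""
    x = int(round((lon - X_MIN) / DELTA))
    y = int(round((lat - Y_MIN) / DELTA))
    if not (0 <= x < N_X and 0 <= y < N_Y):
        raise ValueError("location is outside the domain")
    return y, x

def flat_offset(t, y, x):
    """Return the position of value (t, y, x) in the file, counted in floats."""
    return (t * N_Y + y) * N_X + x

print(longitude[-1], latitude[-1])  # center of the upper right corner
print(flat_offset(100, 20, 30))     # multiply by 4 to get a byte offset
```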

Loading the Data in Python

Using NumPy, you may load the data in Python with the following lines:

from numpy import fromfile
# Read the raw single-precision values (the file has no header).
concentration = fromfile("6-members/0000000000000000000/results/O3.bin", dtype="float32")
concentration = concentration.reshape(8736, 46, 67)  # (time, latitude, longitude)
print(concentration[100, 20, 30])  # about 41.0507
print(concentration.astype(float).mean())  # about 77.4630367294
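Each file holds 8736 × 46 × 67 values of 4 bytes, that is 107,697,408 bytes, so loading many members at once can require a lot of memory. One possible approach, sketched below, is to check the file size and then map the file with numpy.memmap so that only the requested values are read from disk. The helper name open_field is illustrative, and little-endian storage is assumed.

```python
import os
import numpy as np

SHAPE = (8736, 46, 67)  # (time, latitude, longitude)

def open_field(path, shape=SHAPE):
    """Map a header-less float32 file as a read-only array of the given shape."""
    expected = int(np.prod(shape)) * 4  # 4 bytes per single-precision value
    actual = os.path.getsize(path)
    if actual != expected:
        raise ValueError("%s: expected %d bytes, found %d" % (path, expected, actual))
    # "<f4" means little-endian single precision (an assumption, see above).
    return np.memmap(path, dtype="<f4", mode="r", shape=shape)

# Example, assuming the reference members were downloaded into "6-members":
# field = open_field("6-members/0000000000000000000/results/O3.bin")
# print(field[100, 20, 30])     # a single value
# print(field[:, 20, 30].max()) # maximum of the time series in one cell
```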

Loading the Data in R

In R, the following lines can be used:

con = file("6-members/0000000000000000000/results/O3.bin", "rb")
concentration = readBin(con, double(), size = 4, n = 8736 * 46 * 67, endian = "little")
close(con)
mean(concentration) # returns 77.46304