gapfill : Fills gaps of flux data from Eddy covariance measurements according to
Reichstein et al. (Global Change Biology, 2005) or estimate flux uncertainties after Lasslop et al. (Biogeosciences, 2008).

This module was written by Matthias Cuntz while at Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany, and continued while at Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nancy, France.

Copyright (c) 2012-2020 Matthias Cuntz - mc (at) macu (dot) de Released under the MIT License; see LICENSE file for details.

  • Written Mar 2012 by Matthias Cuntz - mc (at) macu (dot) de
  • Ported to Python 3, Feb 2013, Matthias Cuntz
  • Input data can be ND-array, Apr 2014, Matthias Cuntz
  • Bug in longestmarginalgap: was only working at time series edges, rename it to longgap, Apr 2014, Matthias Cuntz
  • Keyword fullday, Apr 2014, Matthias Cuntz
  • Input can be pandas Dataframe or numpy array(s), Apr 2020, Matthias Cuntz
  • Using numpy docstring format, May 2020, Matthias Cuntz

The following functions are provided

gapfill(dfin[, flag, date, timeformat, …]) Fills gaps in flux data from Eddy covariance measurements with Marginal Distribution Sampling (MDS) according to Reichstein et al.
gapfill(dfin, flag=None, date=None, timeformat='%Y-%m-%d %H:%M:%S', colhead=None, sw_dev=50.0, ta_dev=2.5, vpd_dev=5.0, longgap=60, fullday=False, undef=-9999, ddof=1, err=False, verbose=0)[source]

Fills gaps in flux data from Eddy covariance measurements with Marginal Distribution Sampling (MDS) according to Reichstein et al. (Global Change Biology, 2005).

This means, if there is a gap in the data, look for similar meteorological conditions (defined as maximum possible deviations) in a certain time window and fill with the average of these ‘similar’ values.

The routine can also do the same search for similar meteorological conditions for every data point and calculate its standard deviation as a measure of uncertainty after Lasslop et al. (Biogeosciences, 2008).

  • dfin (pandas.Dataframe or numpy.array) –

    time series of fluxes to fill as well as meteorological variables incoming short-wave radiation, air temperature, air vapour pressure deficit.

    dfin can be a pandas.Dataframe with the columns ‘SW_IN’ (or starting with ‘SW_IN’) for incoming short-wave radiation [W m-2] ‘TA’ (or starting with ‘TA_’) for air temperature [deg C] ‘VPD’ (or starting with ‘VPD’) for air vapour deficit [hPa] and columns with ecosystem fluxes with possible missing values (gaps). The index is taken as date variable.

    dfin can also me a numpy array with the same columns. In this case colhead, date, and possibly dateformat must be given.

  • flag (pandas.Dataframe or numpy.array, optional) –

    flag Dataframe or array has the same shape as dfin. Non-zero values in flag will be treated as missing values in dfin.

    flag must follow the same rules as dfin if pandas.Dataframe.

    If flag is numpy array, df.columns.values will be used as column heads and the index of dfin will be copied to flag.

  • date (array_like of string, optional) –

    1D-array_like of calendar dates in format given in timeformat.

    date must be given if dfin is numpy array.

  • timeformat (str, optional) – Format of dates in date, if given (default: ‘%Y-%m-%d %H:%M:%S’). See strftime documentation of Python’s datetime module: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
  • colhed (array_like of str, optional) – column names if dfin is numpy array. See dfin for mandatory column names.
  • sw_dev (float, optional) – threshold for maximum deviation of global radiation (default: 50)
  • ta_dev (float, optional) – threshold for maximum deviation of air Temperature (default: 2.5)
  • vpd_dev (float, optional) – threshold for maximum deviation of vpd (default: 5.)
  • longgap (int, optional) – avoid extraploation into a gap longer than longgap days (default: 60)
  • fullday (bool, optional) –
    True: move beginning of large gap to start of next day and move end of
    large gap to end of last day (default: False)
  • undef (float, optional) –

    values having undef value are treated as missing values in dfin (default: -9999)

    np.nan is not allowed (working).

  • ddof (int, optional) – Delta Degrees of Freedom. The divisor used in calculation of standard deviation for error estimates (err=True) is N-ddof, where N represents the number of elements (default: 1).
  • err (bool, optional) – True: fill every data point, i.e. used for error generation (default: False)
  • shape (bool or tuple, optional) –
    True: output have the same shape as input data if dfin is numpy array;
    if a tuple is given, then this tuple is used to reshape.

    False: outputs are 1D arrays if dfin is numpy array (default: False).

  • verbose (int, optional) – Verbosity level 0-3 (default: 0). 0 is no output; 3 is very verbose.

If err==False: filled_data, quality_class

If err==True: err_data

pandas.Dataframe(s) will be returned if dfin was Dataframe.

numpy array(s) will be returned if dfin was numpy array.

Return type:

pandas.Dataframe(s) or numpy array(s)


If err==True, there is no error estimate if there are no meteorological conditions in the vicinity of the data point.

Routine Does not work with undef=np.nan.

Reichstein et al. (2005)
On the separation of net ecosystem exchange into assimilation and ecosystem respiration: review and improved algorithm. Global Change Biology 11, 1424-1439


>>> import numpy as np
>>> from fread import fread
>>> from date2dec import date2dec
>>> from dec2date import dec2date
>>> ifile = 'test_gapfill.csv' # Tharandt 1998 = Online tool example file
>>> undef = -9999.
>>> # data
>>> dat   = fread(ifile, skip=2, transpose=True)
>>> ndat  = dat.shape[1]
>>> head  = fread(ifile, skip=2, header=True)
>>> head1 = head[0]
>>> # colhead
>>> idx   = []
>>> for i in head1:
...     if i in ['NEE', 'LE', 'H', 'Rg', 'Tair', 'VPD']: idx.append(head1.index(i))
>>> colhead = ['FC', 'LE', 'H', 'SW_IN', 'TA', 'VPD']
>>> # data
>>> dfin = dat[idx,:]
>>> # flag
>>> flag = np.where(dfin == undef, 2, 0)
>>> flag[0,:] = dat[head1.index('qcNEE'),:].astype(np.int)
>>> flag[1,:] = dat[head1.index('qcLE'),:].astype(np.int)
>>> flag[2,:] = dat[head1.index('qcH'),:].astype(np.int)
>>> flag[np.where(flag==1)] = 0
>>> # date
>>> day_id  = head1.index('Day')
>>> hour_id = head1.index('Hour')
>>> ntime   = dat.shape[1]
>>> year  = np.ones(ntime, dtype=np.int) * 1998
>>> hh    = dat[hour_id,:].astype(np.int)
>>> mn    = np.rint((dat[hour_id,:]-hh)*60.).astype(np.int)
>>> y0    = date2dec(yr=year[0], mo=1, dy=1, hr=hh, mi=mn)
>>> jdate = y0 + dat[day_id,:]
>>> adate = dec2date(jdate, eng=True)
>>> # fill
>>> dat_f, flag_f = gapfill(dfin, flag=flag, date=adate, colhead=colhead, undef=undef, verbose=0)
>>> print('{:d} {:d} {:d} {:d} {:d} {:d}'.format(*flag_f[0,11006:11012]))
1 1 1 2 2 2
>>> print('{:.2f} {:.2f} {:.2f} {:.2f} {:.2f} {:.2f}'.format(*dat_f[0,11006:11012]))
-18.68 -15.63 -19.61 -15.54 -12.40 -15.33
>>> # 1D err
>>> dat_std = gapfill(dfin, flag=flag, date=adate, colhead=colhead, undef=undef, verbose=0, err=True)
>>> print('{:.3f} {:.3f} {:.3f} {:.3f} {:.3f} {:.3f}'.format(*dat_std[0,11006:11012]))
5.372 13.118 6.477 -9999.000 -9999.000 -9999.000
>>> dat_err     = np.ones(ndat, dtype=np.int)*(-1)
>>> kk          = np.where((dat_std[0,:] != undef) & (dat_f[0,:] != 0.))[0]
>>> dat_err[kk] = np.abs(dat_std[0,kk]/dat_f[0,kk]*100.).astype(np.int)
>>> print('{:d} {:d} {:d} {:d} {:d} {:d}'.format(*dat_err[11006:11012]))
28 83 33 -1 -1 -1