hesseflux.madspikes

madspikes : Spike detection using a moving median absolute difference filter.

This module was originally written by Tino Rau and Matthias Cuntz, and maintained by Arndt Piayda, while at the Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany, and continued by Matthias Cuntz while at the Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nancy, France.

Copyright (c) 2008-2020 Matthias Cuntz - mc (at) macu (dot) de. Released under the MIT License; see LICENSE file for details.

  • Written 2008 by Tino Rau and Matthias Cuntz - mc (at) macu (dot) de
  • Maintained by Arndt Piayda since Aug 2014.
  • Input can be pandas DataFrame or numpy array(s), Apr 2020, Matthias Cuntz
  • Removed iteration, Apr 2020, Matthias Cuntz
  • Using numpy docstring format, May 2020, Matthias Cuntz

The following functions are provided

madspikes(dfin[, flag, isday, colhead, …]) Spike detection using a moving median absolute difference filter.
madspikes(dfin, flag=None, isday=None, colhead=None, undef=-9999, nscan=720, nfill=48, z=7, deriv=2, swthr=10.0, plot=False)

Spike detection using a moving median absolute difference filter. Used with eddy covariance data in Papale et al. (Biogeosciences, 2006).
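
As a rough illustration of the kind of test such a filter applies, the following plain-Python sketch flags points whose second difference lies more than z scaled median absolute deviations from the median, following the formulation in Papale et al. (2006). The helper name mad_spike_mask is hypothetical and not part of hesseflux. The sketch covers only the second-derivative case on a gap-free list, and the library's internals may differ.

```python
from statistics import median


def mad_spike_mask(values, z=7.0):
    """Flag points whose second difference is a MAD outlier (sketch).

    Hypothetical helper, not part of hesseflux. Covers only the
    second-derivative case (deriv=2) on a gap-free list of floats.
    """
    n = len(values)
    mask = [False] * n
    if n < 3:
        return mask  # no interior points, nothing to test

    # Second differences d_i = (x_i - x_{i-1}) - (x_{i+1} - x_i)
    # for the interior points i = 1 .. n-2.
    d = [(values[i] - values[i - 1]) - (values[i + 1] - values[i])
         for i in range(1, n - 1)]

    med = median(d)
    mad = median([abs(x - med) for x in d])

    # 0.6745 scales the MAD to one standard deviation of a normal
    # distribution, so z counts robust "standard deviations".
    threshold = z * mad / 0.6745

    for k, x in enumerate(d):
        if abs(x - med) > threshold:
            mask[k + 1] = True  # d[k] belongs to values[k + 1]
    return mask


# A repeating 0, 1, 2 pattern with one artificial spike at index 15.
series = [float(i % 3) for i in range(30)]
series[15] = 20.0
spikes = [i for i, hit in enumerate(mad_spike_mask(series)) if hit]
```

Because a spike also distorts the second differences of its two neighbours, a very large spike can cause adjacent points to be flagged as well.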

Parameters:
  • dfin (pandas.DataFrame or numpy.array) –

    time series of data where spike detection with MAD should be applied.

    dfin can be a pandas.DataFrame.

    dfin can also be a numpy array. In this case colhead must be given. MAD will be applied along axis=0, i.e. on each column of axis=1.

  • flag (pandas.DataFrame or numpy.array, optional) –

    flag DataFrame or array has the same shape as dfin. Non-zero values in flag will be treated as missing values in dfin.

    If flag is a numpy array, df.columns.values will be used as column heads.

  • isday (array_like of bool, optional) –

    True when it is day, False when night. Must have the same length as dfin.shape[0].

    If isday is not given, dfin must have a column with head ‘SW_IN’ or starting with ‘SW_IN’. isday will then be dfin[‘SW_IN’] > swthr.

  • colhead (array_like of str, optional) – column names if dfin is a numpy array.
  • undef (float, optional) –

    values having undef value are treated as missing values in dfin (default: -9999)

    np.nan is not allowed (working).

  • nscan (int, optional) – size of moving window to calculate mad in time steps (default: 15*48)
  • nfill (int, optional) –

    step size of moving window to calculate mad in time steps (default: 1*48)

    MAD is calculated within a window of nscan time steps. The resulting mask is applied only to the nfill time steps in the middle of the nscan window. The nscan window is then moved forward by nfill time steps.

  • z (float, optional) – Input is allowed to deviate by at most z standard deviations from the median (default: 7)
  • deriv (int, optional) –

    0: Act on raw input.

    1: Use first derivatives.

    2: Use 2nd derivatives (default).

  • swthr (float, optional) – Threshold to determine daytime from incoming shortwave radiation if isday not given (default: 10).
  • plot (bool, optional) – True: data and spikes are plotted into madspikes.pdf (default: False).
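
The interplay of nscan and nfill described above can be pictured with a small index generator. This is a minimal sketch under stated assumptions: scan_windows is a hypothetical helper, not part of hesseflux, and the way the scan window is clipped at the start and end of the series is a guess rather than the library's documented behaviour.

```python
def scan_windows(n, nscan=720, nfill=48):
    """Yield (scan_start, scan_stop, fill_start, fill_stop) index tuples.

    Hypothetical helper, not part of hesseflux. Stop indices are
    exclusive. Clipping the scan window at the series ends is an
    assumption of this sketch.
    """
    half = (nscan - nfill) // 2  # scan steps on each side of the fill window
    for fill_start in range(0, n, nfill):
        fill_stop = min(fill_start + nfill, n)
        scan_start = max(fill_start - half, 0)
        scan_stop = min(fill_stop + half, n)
        yield scan_start, scan_stop, fill_start, fill_stop


# Small example: 10 time steps, MAD over 6 steps, mask applied to 2 steps.
windows = list(scan_windows(10, nscan=6, nfill=2))
```

Each time step falls into exactly one fill window, so every point receives its flag from the scan window that is centred on it as closely as the series ends allow.
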
Returns:

  • pandas.DataFrame or numpy array – flags: 0 everywhere, except detected spikes, which are set to 2.

History:

  • Written, Matthias Cuntz & Tino Rau, 2008
  • Maintained, Arndt Piayda, Aug 2014
  • Modified, Matthias Cuntz, Apr 2020 - input can be pandas DataFrame or numpy array(s)
  • Modified, Matthias Cuntz, Apr 2020 - removed iteration
  • Modified, Matthias Cuntz, May 2020 - numpy docstring format
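
The input conventions documented for isday, flag and undef can be summarised in a few lines of plain Python. This is only a sketch of the rules as stated in the parameter descriptions: derive_isday and missing_mask are hypothetical helpers, not part of hesseflux, and the columns are represented as a plain dict of lists instead of a pandas.DataFrame.

```python
def derive_isday(columns, swthr=10.0):
    """Return a day/night list from the first column starting with 'SW_IN'.

    Hypothetical helper. `columns` maps column heads to lists of values.
    """
    for head, values in columns.items():
        if head.startswith('SW_IN'):
            # Day is where incoming shortwave radiation exceeds swthr.
            return [v > swthr for v in values]
    raise ValueError("isday not given and no column starts with 'SW_IN'")


def missing_mask(values, flags=None, undef=-9999):
    """Return True where a value counts as missing (hypothetical helper)."""
    if flags is None:
        flags = [0] * len(values)
    # Missing: value equals undef, or the corresponding flag is non-zero.
    return [(v == undef) or (f != 0) for v, f in zip(values, flags)]


data = {'NEE': [1.2, -9999, 0.8, 2.5],
        'SW_IN_1_1_1': [0.0, 5.0, 10.0, 250.0]}
isday = derive_isday(data)
missing = missing_mask(data['NEE'], flags=[0, 0, 1, 0])
```

Note that the comparison is strict, so a radiation value exactly equal to swthr counts as night in this sketch.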