hesseflux.mad

mad : Median absolute deviation test, either on raw values or on 1st or 2nd derivatives.

This module was written by Matthias Cuntz while at Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany, and continued while at Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nancy, France.

Copyright (c) 2011-2020 Matthias Cuntz - mc (at) macu (dot) de. Released under the MIT License; see LICENSE file for details.

  • Written Nov 2011 by Matthias Cuntz - mc (at) macu (dot) de
  • ND-array, act on axis=0, May 2012, Matthias Cuntz
  • Removed bug in broadcasting, Jun 2012, Matthias Cuntz
  • Better usage of numpy possibilities, e.g. using np.diff, Jun 2012, Matthias Cuntz
  • Ported to Python 3, Feb 2013, Matthias Cuntz
  • Use bottleneck for medians, otherwise loop over axis=1, Jul 2013, Matthias Cuntz and Juliane Mai
  • Re-allow masked arrays and arrays with NaNs, Jul 2013, Matthias Cuntz
  • Removed bug in NaN treatment, Oct 2013, Matthias Cuntz
  • Keyword nonzero, Oct 2013, Matthias Cuntz
  • Using numpy docstring format, May 2020, Matthias Cuntz

The following functions are provided:

mad(datin[, z, deriv, nozero]) – Median absolute deviation test, either on raw values or on 1st or 2nd derivatives.
mad(datin, z=7, deriv=0, nozero=False)

Median absolute deviation test, either on raw values, or on 1st or 2nd derivatives.

Returns a mask that is False everywhere except where the input is < (median - MAD*z/0.6745) or > (median + MAD*z/0.6745), with MAD the median absolute deviation from the median.
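To make the rule above concrete, here is a minimal NumPy sketch for a 1-D array of raw values, assuming no masks, NaNs, zeros to exclude or derivatives. The function name `mad_mask_1d` is hypothetical and not part of hesseflux. The factor 0.6745 turns the MAD into an estimate of the standard deviation for normally distributed data.

```python
import numpy as np


def mad_mask_1d(x, z=7.0):
    """Hypothetical sketch: flag values more than z robust SDs from the median."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # median absolute deviation from the median
    mad = np.median(np.abs(x - med))
    # MAD/0.6745 estimates the standard deviation for normal data
    half_width = mad * z / 0.6745
    return (x < med - half_width) | (x > med + half_width)


y = np.array([-0.25, 0.68, 0.94, 1.15, 2.26, 2.35, 2.37, 2.40, 2.47,
              2.54, 2.62, 2.64, 2.90, 2.92, 2.92, 2.93, 3.21, 3.26,
              3.30, 3.59, 3.68, 4.30, 4.64, 5.34, 5.42, 8.01])
print(np.nonzero(mad_mask_1d(y, z=3))[0])  # [ 0 24 25]
```

On the example data of the Examples section this flags the same positions as mad(y, z=3).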

Parameters:
  • datin (array or masked array) – mad acts on axis=0.
  • z (float, optional) – Input is allowed to deviate at most z standard deviations from the median (default: 7)
  • deriv (int, optional) –

    0: Act on raw input (default).

    1: Use first derivatives.

    2: Use 2nd derivatives.

  • nozero (bool, optional) – True: exclude zeros (0.) in datin from the test (default: False).
Returns:

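False everywhere except where the input deviates more than z standard deviations from the median. With deriv=1 or 2, the mask refers to the derivatives and is therefore shorter than datin by deriv elements along axis 0.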

Return type:

array of bool
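The effect of deriv and nozero on the data that enter the test can be sketched as below. This is an assumption about the preprocessing, not the hesseflux code: `prepare_input` is a hypothetical helper, zeros and NaNs are turned into masked values, and derivatives are forward differences along axis 0, as described in the Notes.

```python
import numpy as np


def prepare_input(datin, deriv=0, nozero=False):
    """Hypothetical helper: build the series the MAD test is applied to."""
    data = np.asarray(np.ma.getdata(datin), dtype=float)
    # start from any existing mask and treat NaNs as missing values
    mask = np.ma.getmaskarray(datin) | np.isnan(data)
    if nozero:
        # exact zeros are treated as missing, so they enter neither median nor MAD
        mask = mask | (data == 0.0)
    x = np.ma.array(data, mask=mask)
    for _ in range(deriv):
        # forward difference along axis 0; a masked operand masks the result
        x = x[1:] - x[:-1]
    return x


y = np.array([1.0, 2.0, 4.0, 0.0, 11.0])
# with nozero=True the zero masks both differences that involve it
print(prepare_input(y, deriv=1, nozero=True))
```

Each derivative step removes one element along axis 0, which matches the shorter output of the deriv=2 example.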

Notes

If the input is a multi-dimensional array, outliers are detected along the zeroth axis, i.e. separately for each position along the remaining axes.
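A test along axis 0 can be written with NumPy broadcasting, as in the following minimal sketch. It assumes a plain ndarray without masks or NaNs; `mad_mask_axis0` is a hypothetical name, and it uses np.median rather than bottleneck, which the changelog mentions for the actual implementation.

```python
import numpy as np


def mad_mask_axis0(x, z=7.0):
    """Hypothetical sketch: MAD test applied independently along axis 0."""
    x = np.asarray(x, dtype=float)
    med = np.median(x, axis=0)                # shape x.shape[1:]
    dev = np.abs(x - med)                     # broadcasts over axis 0
    mad = np.median(dev, axis=0)
    half_width = mad * z / 0.6745
    return dev > half_width


# two columns; the second is the first scaled by 10, so both flag the same rows
x = np.array([[1.0, 10.0], [1.1, 11.0], [0.9, 9.0], [1.0, 10.0], [9.0, 90.0]])
print(mad_mask_axis0(x, z=4))  # only the last row is flagged in both columns
```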

The 1st derivative is calculated as the forward difference d = datin[1:n]-datin[0:n-1], because the mean of the left and right differences (central difference) would give 0 for spikes.
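The reason for the forward difference can be seen on a toy series with a single spike (illustrative data, not from hesseflux):

```python
import numpy as np

# a flat series with a single spike at index 3
x = np.array([1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0])

# forward difference, as in the note: d[i] = x[i+1] - x[i]
forward = x[1:] - x[:-1]
# central difference: mean of left and right differences, (x[i+1] - x[i-1]) / 2
central = (x[2:] - x[:-2]) / 2.0

print(forward)  # [ 0.  0.  4. -4.  0.  0.]
print(central)  # [ 0.  2.  0. -2.  0.]
```

The forward difference shows the spike as a large jump, while the central difference at the spike itself (central[2], centred on x[3]) is exactly 0.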

If all values are masked, i.e. all(d.mask == True), then d.mask is returned, which is all True.
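Handling of masked values, NaNs and the all-masked case can be sketched for 1-D input as follows. `mad_mask_masked` is a hypothetical name, and the sketch simplifies one detail: it returns plain booleans with False at masked and NaN positions, whereas the Examples show a masked entry (--) where the input was masked.

```python
import numpy as np


def mad_mask_masked(datin, z=7.0):
    """Hypothetical sketch: 1-D MAD test that skips masked values and NaNs."""
    d = np.ma.masked_invalid(np.ma.asarray(datin, dtype=float))
    if np.ma.getmaskarray(d).all():
        # nothing valid to test: return the mask itself, which is all True
        return np.ma.getmaskarray(d).copy()
    med = np.ma.median(d)
    dev = np.abs(d - med)
    mad = np.ma.median(dev)
    half_width = mad * z / 0.6745
    # masked and NaN positions are reported as False
    return (dev > half_width).filled(False)


vals = np.ma.array([1.0, 1.1, 0.9, 1.0, 9.0, np.nan],
                   mask=[False, False, False, True, False, False])
print(mad_mask_masked(vals, z=4))  # flags only 9.0
```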

Examples

>>> import numpy as np
>>> y = np.array([-0.25,0.68,0.94,1.15,2.26,2.35,2.37,2.40,2.47,2.54,2.62,
...               2.64,2.90,2.92,2.92,2.93,3.21,3.26,3.30,3.59,3.68,4.30,
...               4.64,5.34,5.42,8.01],dtype=float)
>>> # Normal MAD
>>> print(mad(y))
[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False]
>>> print(mad(y,z=4))
[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False  True]
>>> print(mad(y,z=3))
[ True False False False False False False False False False False False
 False False False False False False False False False False False False
  True  True]
>>> # MAD on 2nd derivatives
>>> print(mad(y,z=4,deriv=2))
[False False False False False False False False False False False False
 False False False False False False False False False False False  True]
>>> # direct usage
>>> my = np.ma.array(y, mask=mad(y,z=4))
>>> print(my)
[-0.25 0.68 0.94 1.15 2.26 2.35 2.37 2.4 2.47 2.54 2.62 2.64 2.9 2.92 2.92
 2.93 3.21 3.26 3.3 3.59 3.68 4.3 4.64 5.34 5.42 --]
>>> # MAD on several dimensions
>>> yy = np.transpose(np.array([y,y]))
>>> print(np.transpose(mad(yy,z=4)))
[[False False False False False False False False False False False False
  False False False False False False False False False False False False
  False  True]
 [False False False False False False False False False False False False
  False False False False False False False False False False False False
  False  True]]
>>> yyy = np.transpose(np.array([y,y,y]))
>>> print(np.transpose(mad(yyy,z=3)))
[[ True False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]
 [ True False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]
 [ True False False False False False False False False False False False
  False False False False False False False False False False False False
   True  True]]
>>> # Masked arrays
>>> my = np.ma.array(y, mask=np.zeros(y.shape))
>>> my.mask[-1] = True
>>> print(mad(my,z=4))
[True False False False False False False False False False False False
 False False False False False False False False False False False False
 False --]
>>> print(mad(my,z=3))
[True False False False False False False False False False False False
 False False False False False False False False False False False True
 True --]
>>> # Arrays with NaNs
>>> ny = y.copy()
>>> ny[-1] = np.nan
>>> print(mad(ny,z=4))
[ True False False False False False False False False False False False
 False False False False False False False False False False False False
 False False]
>>> print(mad(ny,z=3))
[ True False False False False False False False False False False False
 False False False False False False False False False False False  True
  True False]
>>> # Exclude zeros
>>> zy = y.copy()
>>> zy[1] = 0.
>>> print(mad(zy,z=3))
[ True  True False False False False False False False False False False
 False False False False False False False False False False False False
  True  True]
>>> print(mad(zy,z=3,nozero=True))
[ True False False False False False False False False False False False
 False False False False False False False False False False False False
  True  True]