nanfunctions.py 56 KB

  1. """
  2. Functions that ignore NaN.
  3. Functions
  4. ---------
  5. - `nanmin` -- minimum non-NaN value
  6. - `nanmax` -- maximum non-NaN value
  7. - `nanargmin` -- index of minimum non-NaN value
  8. - `nanargmax` -- index of maximum non-NaN value
  9. - `nansum` -- sum of non-NaN values
  10. - `nanprod` -- product of non-NaN values
  11. - `nancumsum` -- cumulative sum of non-NaN values
  12. - `nancumprod` -- cumulative product of non-NaN values
  13. - `nanmean` -- mean of non-NaN values
  14. - `nanvar` -- variance of non-NaN values
  15. - `nanstd` -- standard deviation of non-NaN values
  16. - `nanmedian` -- median of non-NaN values
  17. - `nanquantile` -- qth quantile of non-NaN values
  18. - `nanpercentile` -- qth percentile of non-NaN values
  19. """
  20. from __future__ import division, absolute_import, print_function
  21. import functools
  22. import warnings
  23. import numpy as np
  24. from numpy.lib import function_base
  25. from numpy.core import overrides
  26. array_function_dispatch = functools.partial(
  27. overrides.array_function_dispatch, module='numpy')
  28. __all__ = [
  29. 'nansum', 'nanmax', 'nanmin', 'nanargmax', 'nanargmin', 'nanmean',
  30. 'nanmedian', 'nanpercentile', 'nanvar', 'nanstd', 'nanprod',
  31. 'nancumsum', 'nancumprod', 'nanquantile'
  32. ]
  33. def _replace_nan(a, val):
  34. """
  35. If `a` is of inexact type, make a copy of `a`, replace NaNs with
  36. the `val` value, and return the copy together with a boolean mask
  37. marking the locations where NaNs were present. If `a` is not of
  38. inexact type, do nothing and return `a` together with a mask of None.
  39. Note that scalars will end up as array scalars, which is important
  40. for using the result as the value of the out argument in some
  41. operations.
  42. Parameters
  43. ----------
  44. a : array-like
  45. Input array.
  46. val : float
  47. NaN values are set to val before doing the operation.
  48. Returns
  49. -------
  50. y : ndarray
  51. If `a` is of inexact type, return a copy of `a` with the NaNs
  52. replaced by the fill value, otherwise return `a`.
  53. mask: {bool, None}
  54. If `a` is of inexact type, return a boolean mask marking locations of
  55. NaNs, otherwise return None.
  56. """
  57. a = np.array(a, subok=True, copy=True)
  58. if a.dtype == np.object_:
  59. # object arrays do not support `isnan` (gh-9009), so make a guess
  60. mask = a != a
  61. elif issubclass(a.dtype.type, np.inexact):
  62. mask = np.isnan(a)
  63. else:
  64. mask = None
  65. if mask is not None:
  66. np.copyto(a, val, where=mask)
  67. return a, mask
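The helper above can be exercised with only public NumPy calls. The following is a minimal sketch, not the library's own code path: `replace_nan_sketch` is a hypothetical name, and it assumes the same three cases described in the docstring (object arrays, inexact arrays, everything else).

```python
import numpy as np

def replace_nan_sketch(a, val):
    """Hypothetical stand-in: copy `a`, fill NaNs with `val`, return (copy, mask)."""
    a = np.array(a, subok=True, copy=True)  # always work on a private copy
    if a.dtype == np.object_:
        mask = a != a  # NaN is the only value that compares unequal to itself
    elif issubclass(a.dtype.type, np.inexact):
        mask = np.isnan(a)
    else:
        return a, None  # integer and boolean inputs cannot hold NaN
    np.copyto(a, val, where=mask)
    return a, mask

filled, mask = replace_nan_sketch([1.0, np.nan, 3.0], 0)
ints, no_mask = replace_nan_sketch([1, 2, 3], 0)
print(filled, mask, no_mask)
```

Returning the mask alongside the filled copy is what lets callers such as `nanmin` detect all-NaN slices after the reduction has run.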
  68. def _copyto(a, val, mask):
  69. """
  70. Replace values in `a` with `val` where `mask` is True. This differs from
  71. copyto in that it will deal with the case where `a` is a numpy scalar.
  72. Parameters
  73. ----------
  74. a : ndarray or numpy scalar
  75. Array or numpy scalar some of whose values are to be replaced
  76. by val.
  77. val : numpy scalar
  78. Value used as a replacement.
  79. mask : ndarray, scalar
  80. Boolean array. Where True the corresponding element of `a` is
  81. replaced by `val`. Broadcasts.
  82. Returns
  83. -------
  84. res : ndarray, scalar
  85. Array with elements replaced or scalar `val`.
  86. """
  87. if isinstance(a, np.ndarray):
  88. np.copyto(a, val, where=mask, casting='unsafe')
  89. else:
  90. a = a.dtype.type(val)
  91. return a
  92. def _remove_nan_1d(arr1d, overwrite_input=False):
  93. """
  94. Equivalent to arr1d[~np.isnan(arr1d)], but the result may be in a different order.
  95. Presumably faster, as it incurs fewer copies.
  96. Parameters
  97. ----------
  98. arr1d : ndarray
  99. Array to remove nans from
  100. overwrite_input : bool
  101. True if `arr1d` can be modified in place
  102. Returns
  103. -------
  104. res : ndarray
  105. Array with nan elements removed
  106. overwrite_input : bool
  107. True if `res` can be modified in place, given the constraint on the
  108. input
  109. """
  110. c = np.isnan(arr1d)
  111. s = np.nonzero(c)[0]
  112. if s.size == arr1d.size:
  113. warnings.warn("All-NaN slice encountered", RuntimeWarning, stacklevel=4)
  114. return arr1d[:0], True
  115. elif s.size == 0:
  116. return arr1d, overwrite_input
  117. else:
  118. if not overwrite_input:
  119. arr1d = arr1d.copy()
  120. # select non-nans at end of array
  121. enonan = arr1d[-s.size:][~c[-s.size:]]
  122. # fill nans in beginning of array with non-nans of end
  123. arr1d[s[:enonan.size]] = enonan
  124. return arr1d[:-s.size], True
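The compaction trick above (move non-NaN values from the tail into the earliest NaN slots, then trim the tail) can be checked on a small input. This is a sketch under the assumption that a private copy is always made; `remove_nan_1d_sketch` is a hypothetical name and it omits the warning and the `overwrite_input` bookkeeping.

```python
import numpy as np

def remove_nan_1d_sketch(arr1d):
    """Drop NaNs by filling NaN slots from the tail; element order is not preserved."""
    arr1d = np.array(arr1d, dtype=float)  # private copy, so in-place writes are safe
    is_nan = np.isnan(arr1d)
    nan_idx = np.nonzero(is_nan)[0]
    n = nan_idx.size
    if n == 0:
        return arr1d          # nothing to remove
    if n == arr1d.size:
        return arr1d[:0]      # all-NaN input yields an empty result
    # non-NaN values that sit in the last n slots
    tail_keep = arr1d[-n:][~is_nan[-n:]]
    # write them over the earliest NaN slots, then cut off the last n slots
    arr1d[nan_idx[:tail_keep.size]] = tail_keep
    return arr1d[:-n]

compact = remove_nan_1d_sketch([np.nan, 1.0, 2.0, np.nan, 5.0])
print(compact)  # [5. 1. 2.]
```

The result holds the same values as boolean-mask filtering would, only reordered, which is harmless for order-independent consumers such as a median.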
  125. def _divide_by_count(a, b, out=None):
  126. """
  127. Compute a/b ignoring invalid results. If `a` is an array the division
  128. is done in place. If `a` is a scalar, then its type is preserved in the
  129. output. If out is None, then `a` is used instead so that the
  130. division is in place. Note that this is only called with `a` an inexact
  131. type.
  132. Parameters
  133. ----------
  134. a : {ndarray, numpy scalar}
  135. Numerator. Expected to be of inexact type but not checked.
  136. b : {ndarray, numpy scalar}
  137. Denominator.
  138. out : ndarray, optional
  139. Alternate output array in which to place the result. The default
  140. is ``None``; if provided, it must have the same shape as the
  141. expected output, but the type will be cast if necessary.
  142. Returns
  143. -------
  144. ret : {ndarray, numpy scalar}
  145. The return value is a/b. If `a` was an ndarray the division is done
  146. in place. If `a` is a numpy scalar, the division preserves its type.
  147. """
  148. with np.errstate(invalid='ignore', divide='ignore'):
  149. if isinstance(a, np.ndarray):
  150. if out is None:
  151. return np.divide(a, b, out=a, casting='unsafe')
  152. else:
  153. return np.divide(a, b, out=out, casting='unsafe')
  154. else:
  155. if out is None:
  156. return a.dtype.type(a / b)
  157. else:
  158. # This is questionable, but currently a numpy scalar can
  159. # be output to a zero dimensional array.
  160. return np.divide(a, b, out=out, casting='unsafe')
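The in-place division with suppressed floating-point warnings can be illustrated as follows. This is a small sketch with made-up data (`totals` and `counts` are illustrative names), showing how a zero count yields NaN without emitting a warning.

```python
import numpy as np

totals = np.array([6.0, 0.0, 4.0])  # per-slice sums with NaNs already zeroed
counts = np.array([3, 0, 2])        # per-slice number of non-NaN elements

# 0.0 / 0 would normally emit an "invalid value" RuntimeWarning
with np.errstate(invalid='ignore', divide='ignore'):
    means = np.divide(totals, counts, out=totals, casting='unsafe')

print(means)  # [ 2. nan  2.]
```

Because `out=totals` is passed, `means` is the same array object as `totals`; no extra buffer is allocated.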
  161. def _nanmin_dispatcher(a, axis=None, out=None, keepdims=None):
  162. return (a, out)
  163. @array_function_dispatch(_nanmin_dispatcher)
  164. def nanmin(a, axis=None, out=None, keepdims=np._NoValue):
  165. """
  166. Return minimum of an array or minimum along an axis, ignoring any NaNs.
  167. When all-NaN slices are encountered a ``RuntimeWarning`` is raised and
  168. NaN is returned for that slice.
  169. Parameters
  170. ----------
  171. a : array_like
  172. Array containing numbers whose minimum is desired. If `a` is not an
  173. array, a conversion is attempted.
  174. axis : {int, tuple of int, None}, optional
  175. Axis or axes along which the minimum is computed. The default is to compute
  176. the minimum of the flattened array.
  177. out : ndarray, optional
  178. Alternate output array in which to place the result. The default
  179. is ``None``; if provided, it must have the same shape as the
  180. expected output, but the type will be cast if necessary. See
  181. `doc.ufuncs` for details.
  182. .. versionadded:: 1.8.0
  183. keepdims : bool, optional
  184. If this is set to True, the axes which are reduced are left
  185. in the result as dimensions with size one. With this option,
  186. the result will broadcast correctly against the original `a`.
  187. If the value is anything but the default, then
  188. `keepdims` will be passed through to the `min` method
  189. of sub-classes of `ndarray`. If the sub-class' method
  190. does not implement `keepdims`, any exceptions will be raised.
  191. .. versionadded:: 1.8.0
  192. Returns
  193. -------
  194. nanmin : ndarray
  195. An array with the same shape as `a`, with the specified axis
  196. removed. If `a` is a 0-d array, or if axis is None, an ndarray
  197. scalar is returned. The same dtype as `a` is returned.
  198. See Also
  199. --------
  200. nanmax :
  201. The maximum value of an array along a given axis, ignoring any NaNs.
  202. amin :
  203. The minimum value of an array along a given axis, propagating any NaNs.
  204. fmin :
  205. Element-wise minimum of two arrays, ignoring any NaNs.
  206. minimum :
  207. Element-wise minimum of two arrays, propagating any NaNs.
  208. isnan :
  209. Shows which elements are Not a Number (NaN).
  210. isfinite:
  211. Shows which elements are neither NaN nor infinity.
  212. amax, fmax, maximum
  213. Notes
  214. -----
  215. NumPy uses the IEEE Standard for Binary Floating-Point Arithmetic
  216. (IEEE 754). This means that Not a Number is not equivalent to infinity.
  217. Positive infinity is treated as a very large number and negative
  218. infinity is treated as a very small (i.e. negative) number.
  219. If the input has an integer type the function is equivalent to np.min.
  220. Examples
  221. --------
  222. >>> a = np.array([[1, 2], [3, np.nan]])
  223. >>> np.nanmin(a)
  224. 1.0
  225. >>> np.nanmin(a, axis=0)
  226. array([ 1., 2.])
  227. >>> np.nanmin(a, axis=1)
  228. array([ 1., 3.])
  229. When positive infinity and negative infinity are present:
  230. >>> np.nanmin([1, 2, np.nan, np.inf])
  231. 1.0
  232. >>> np.nanmin([1, 2, np.nan, np.NINF])
  233. -inf
  234. """
  235. kwargs = {}
  236. if keepdims is not np._NoValue:
  237. kwargs['keepdims'] = keepdims
  238. if type(a) is np.ndarray and a.dtype != np.object_:
  239. # Fast, but not safe for subclasses of ndarray, or object arrays,
  240. # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
  241. res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
  242. if np.isnan(res).any():
  243. warnings.warn("All-NaN slice encountered", RuntimeWarning, stacklevel=2)
  244. else:
  245. # Slow, but safe for subclasses of ndarray
  246. a, mask = _replace_nan(a, +np.inf)
  247. res = np.amin(a, axis=axis, out=out, **kwargs)
  248. if mask is None:
  249. return res
  250. # Check for all-NaN axis
  251. mask = np.all(mask, axis=axis, **kwargs)
  252. if np.any(mask):
  253. res = _copyto(res, np.nan, mask)
  254. warnings.warn("All-NaN axis encountered", RuntimeWarning, stacklevel=2)
  255. return res
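The two branches above (the `fmin.reduce` fast path and the replace-then-reduce slow path) should agree on plain float arrays. The sketch below rebuilds both with public calls only; it uses `np.min`, of which `np.amin` is an alias, and the variable names are illustrative.

```python
import warnings
import numpy as np

a = np.array([[1.0, np.nan], [np.nan, np.nan]])

# Fast path: fmin ignores a NaN operand unless both operands are NaN
with warnings.catch_warnings():
    warnings.simplefilter('ignore', RuntimeWarning)
    fast = np.fmin.reduce(a, axis=1)

# Slow path: fill NaNs with +inf, reduce, then restore NaN for all-NaN rows
nan_mask = np.isnan(a)
slow = np.min(np.where(nan_mask, np.inf, a), axis=1)
slow[np.all(nan_mask, axis=1)] = np.nan

print(fast, slow)
```

Filling with `+inf` works because it can never win a minimum against a finite value; the all-NaN check afterwards distinguishes a genuine `inf` result from a row that had no data.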
  256. def _nanmax_dispatcher(a, axis=None, out=None, keepdims=None):
  257. return (a, out)
  258. @array_function_dispatch(_nanmax_dispatcher)
  259. def nanmax(a, axis=None, out=None, keepdims=np._NoValue):
  260. """
  261. Return the maximum of an array or maximum along an axis, ignoring any
  262. NaNs. When all-NaN slices are encountered a ``RuntimeWarning`` is
  263. raised and NaN is returned for that slice.
  264. Parameters
  265. ----------
  266. a : array_like
  267. Array containing numbers whose maximum is desired. If `a` is not an
  268. array, a conversion is attempted.
  269. axis : {int, tuple of int, None}, optional
  270. Axis or axes along which the maximum is computed. The default is to compute
  271. the maximum of the flattened array.
  272. out : ndarray, optional
  273. Alternate output array in which to place the result. The default
  274. is ``None``; if provided, it must have the same shape as the
  275. expected output, but the type will be cast if necessary. See
  276. `doc.ufuncs` for details.
  277. .. versionadded:: 1.8.0
  278. keepdims : bool, optional
  279. If this is set to True, the axes which are reduced are left
  280. in the result as dimensions with size one. With this option,
  281. the result will broadcast correctly against the original `a`.
  282. If the value is anything but the default, then
  283. `keepdims` will be passed through to the `max` method
  284. of sub-classes of `ndarray`. If the sub-class' method
  285. does not implement `keepdims`, any exceptions will be raised.
  286. .. versionadded:: 1.8.0
  287. Returns
  288. -------
  289. nanmax : ndarray
  290. An array with the same shape as `a`, with the specified axis removed.
  291. If `a` is a 0-d array, or if axis is None, an ndarray scalar is
  292. returned. The same dtype as `a` is returned.
  293. See Also
  294. --------
  295. nanmin :
  296. The minimum value of an array along a given axis, ignoring any NaNs.
  297. amax :
  298. The maximum value of an array along a given axis, propagating any NaNs.
  299. fmax :
  300. Element-wise maximum of two arrays, ignoring any NaNs.
  301. maximum :
  302. Element-wise maximum of two arrays, propagating any NaNs.
  303. isnan :
  304. Shows which elements are Not a Number (NaN).
  305. isfinite:
  306. Shows which elements are neither NaN nor infinity.
  307. amin, fmin, minimum
  308. Notes
  309. -----
  310. NumPy uses the IEEE Standard for Binary Floating-Point Arithmetic
  311. (IEEE 754). This means that Not a Number is not equivalent to infinity.
  312. Positive infinity is treated as a very large number and negative
  313. infinity is treated as a very small (i.e. negative) number.
  314. If the input has an integer type the function is equivalent to np.max.
  315. Examples
  316. --------
  317. >>> a = np.array([[1, 2], [3, np.nan]])
  318. >>> np.nanmax(a)
  319. 3.0
  320. >>> np.nanmax(a, axis=0)
  321. array([ 3., 2.])
  322. >>> np.nanmax(a, axis=1)
  323. array([ 2., 3.])
  324. When positive infinity and negative infinity are present:
  325. >>> np.nanmax([1, 2, np.nan, np.NINF])
  326. 2.0
  327. >>> np.nanmax([1, 2, np.nan, np.inf])
  328. inf
  329. """
  330. kwargs = {}
  331. if keepdims is not np._NoValue:
  332. kwargs['keepdims'] = keepdims
  333. if type(a) is np.ndarray and a.dtype != np.object_:
  334. # Fast, but not safe for subclasses of ndarray, or object arrays,
  335. # which do not implement isnan (gh-9009), or fmax correctly (gh-8975)
  336. res = np.fmax.reduce(a, axis=axis, out=out, **kwargs)
  337. if np.isnan(res).any():
  338. warnings.warn("All-NaN slice encountered", RuntimeWarning, stacklevel=2)
  339. else:
  340. # Slow, but safe for subclasses of ndarray
  341. a, mask = _replace_nan(a, -np.inf)
  342. res = np.amax(a, axis=axis, out=out, **kwargs)
  343. if mask is None:
  344. return res
  345. # Check for all-NaN axis
  346. mask = np.all(mask, axis=axis, **kwargs)
  347. if np.any(mask):
  348. res = _copyto(res, np.nan, mask)
  349. warnings.warn("All-NaN axis encountered", RuntimeWarning, stacklevel=2)
  350. return res
  351. def _nanargmin_dispatcher(a, axis=None):
  352. return (a,)
  353. @array_function_dispatch(_nanargmin_dispatcher)
  354. def nanargmin(a, axis=None):
  355. """
  356. Return the indices of the minimum values in the specified axis ignoring
  357. NaNs. For all-NaN slices ``ValueError`` is raised. Warning: the results
  358. cannot be trusted if a slice contains only NaNs and Infs.
  359. Parameters
  360. ----------
  361. a : array_like
  362. Input data.
  363. axis : int, optional
  364. Axis along which to operate. By default flattened input is used.
  365. Returns
  366. -------
  367. index_array : ndarray
  368. An array of indices or a single index value.
  369. See Also
  370. --------
  371. argmin, nanargmax
  372. Examples
  373. --------
  374. >>> a = np.array([[np.nan, 4], [2, 3]])
  375. >>> np.argmin(a)
  376. 0
  377. >>> np.nanargmin(a)
  378. 2
  379. >>> np.nanargmin(a, axis=0)
  380. array([1, 1])
  381. >>> np.nanargmin(a, axis=1)
  382. array([1, 0])
  383. """
  384. a, mask = _replace_nan(a, np.inf)
  385. res = np.argmin(a, axis=axis)
  386. if mask is not None:
  387. mask = np.all(mask, axis=axis)
  388. if np.any(mask):
  389. raise ValueError("All-NaN slice encountered")
  390. return res
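The docstring's two caveats (a `ValueError` for all-NaN slices, and unreliable indices when a slice holds only NaNs and infinities) can be seen on small inputs. This is a sketch with illustrative data; the second part reproduces the fill-with-`inf` step by hand rather than asserting anything about the library's return value.

```python
import numpy as np

a = np.array([[np.nan, 4.0], [np.nan, np.nan]])

first_row = np.nanargmin(a[0])  # index of the only non-NaN entry

try:
    np.nanargmin(a, axis=1)     # second row is all-NaN
    raised = False
except ValueError:
    raised = True

# Why NaN/inf-only slices are untrustworthy: after NaN is replaced by +inf,
# argmin cannot tell the placeholder from a real infinity.
x = np.array([np.nan, np.inf])
ambiguous = np.argmin(np.where(np.isnan(x), np.inf, x))

print(first_row, raised, ambiguous)  # 1 True 0
```

In the last case the reported index `0` points at the NaN, not at the actual `inf` value at index `1`.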
  391. def _nanargmax_dispatcher(a, axis=None):
  392. return (a,)
  393. @array_function_dispatch(_nanargmax_dispatcher)
  394. def nanargmax(a, axis=None):
  395. """
  396. Return the indices of the maximum values in the specified axis ignoring
  397. NaNs. For all-NaN slices ``ValueError`` is raised. Warning: the
  398. results cannot be trusted if a slice contains only NaNs and -Infs.
  399. Parameters
  400. ----------
  401. a : array_like
  402. Input data.
  403. axis : int, optional
  404. Axis along which to operate. By default flattened input is used.
  405. Returns
  406. -------
  407. index_array : ndarray
  408. An array of indices or a single index value.
  409. See Also
  410. --------
  411. argmax, nanargmin
  412. Examples
  413. --------
  414. >>> a = np.array([[np.nan, 4], [2, 3]])
  415. >>> np.argmax(a)
  416. 0
  417. >>> np.nanargmax(a)
  418. 1
  419. >>> np.nanargmax(a, axis=0)
  420. array([1, 0])
  421. >>> np.nanargmax(a, axis=1)
  422. array([1, 1])
  423. """
  424. a, mask = _replace_nan(a, -np.inf)
  425. res = np.argmax(a, axis=axis)
  426. if mask is not None:
  427. mask = np.all(mask, axis=axis)
  428. if np.any(mask):
  429. raise ValueError("All-NaN slice encountered")
  430. return res
  431. def _nansum_dispatcher(a, axis=None, dtype=None, out=None, keepdims=None):
  432. return (a, out)
  433. @array_function_dispatch(_nansum_dispatcher)
  434. def nansum(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
  435. """
  436. Return the sum of array elements over a given axis treating Not a
  437. Numbers (NaNs) as zero.
  438. In NumPy versions <= 1.9.0 NaN is returned for slices that are all-NaN or
  439. empty. In later versions zero is returned.
  440. Parameters
  441. ----------
  442. a : array_like
  443. Array containing numbers whose sum is desired. If `a` is not an
  444. array, a conversion is attempted.
  445. axis : {int, tuple of int, None}, optional
  446. Axis or axes along which the sum is computed. The default is to compute the
  447. sum of the flattened array.
  448. dtype : data-type, optional
  449. The type of the returned array and of the accumulator in which the
  450. elements are summed. By default, the dtype of `a` is used. An
  451. exception is when `a` has an integer type with less precision than
  452. the platform (u)intp. In that case, the default will be either
  453. (u)int32 or (u)int64 depending on whether the platform is 32 or 64
  454. bits. For inexact inputs, dtype must be inexact.
  455. .. versionadded:: 1.8.0
  456. out : ndarray, optional
  457. Alternate output array in which to place the result. The default
  458. is ``None``. If provided, it must have the same shape as the
  459. expected output, but the type will be cast if necessary. See
  460. `doc.ufuncs` for details. The casting of NaN to integer can yield
  461. unexpected results.
  462. .. versionadded:: 1.8.0
  463. keepdims : bool, optional
  464. If this is set to True, the axes which are reduced are left
  465. in the result as dimensions with size one. With this option,
  466. the result will broadcast correctly against the original `a`.
  467. If the value is anything but the default, then
  468. `keepdims` will be passed through to the `mean` or `sum` methods
  469. of sub-classes of `ndarray`. If the sub-class' method
  470. does not implement `keepdims`, any exceptions will be raised.
  471. .. versionadded:: 1.8.0
  472. Returns
  473. -------
  474. nansum : ndarray.
  475. A new array holding the result is returned unless `out` is
  476. specified, in which case it is returned. The result has the same
  477. shape as `a` with the reduced axes removed; if `axis` is None, a
  478. scalar is returned.
  479. See Also
  480. --------
  481. numpy.sum : Sum across array propagating NaNs.
  482. isnan : Show which elements are NaN.
  483. isfinite: Show which elements are not NaN or +/-inf.
  484. Notes
  485. -----
  486. If both positive and negative infinity are present, the sum will be Not
  487. A Number (NaN).
  488. Examples
  489. --------
  490. >>> np.nansum(1)
  491. 1
  492. >>> np.nansum([1])
  493. 1
  494. >>> np.nansum([1, np.nan])
  495. 1.0
  496. >>> a = np.array([[1, 1], [1, np.nan]])
  497. >>> np.nansum(a)
  498. 3.0
  499. >>> np.nansum(a, axis=0)
  500. array([ 2., 1.])
  501. >>> np.nansum([1, np.nan, np.inf])
  502. inf
  503. >>> np.nansum([1, np.nan, np.NINF])
  504. -inf
  505. >>> np.nansum([1, np.nan, np.inf, -np.inf]) # both +/- infinity present
  506. nan
  507. """
  508. a, mask = _replace_nan(a, 0)
  509. return np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
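The two-line implementation above amounts to "zero out the NaNs, then sum". A brief sketch with illustrative data, comparing a hand-written version against `np.nansum` (assuming a NumPy release later than 1.9, where all-NaN slices sum to zero):

```python
import numpy as np

a = np.array([[1.0, np.nan], [np.nan, np.nan]])

# Manual equivalent: substitute 0 for NaN, then reduce along each row
manual = np.sum(np.where(np.isnan(a), 0.0, a), axis=1)
builtin = np.nansum(a, axis=1)

print(manual, builtin)  # [1. 0.] [1. 0.]
```

The second row contains no data at all, so its sum is the additive identity, 0, rather than NaN.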
  510. def _nanprod_dispatcher(a, axis=None, dtype=None, out=None, keepdims=None):
  511. return (a, out)
  512. @array_function_dispatch(_nanprod_dispatcher)
  513. def nanprod(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
  514. """
  515. Return the product of array elements over a given axis treating Not a
  516. Numbers (NaNs) as ones.
  517. One is returned for slices that are all-NaN or empty.
  518. .. versionadded:: 1.10.0
  519. Parameters
  520. ----------
  521. a : array_like
  522. Array containing numbers whose product is desired. If `a` is not an
  523. array, a conversion is attempted.
  524. axis : {int, tuple of int, None}, optional
  525. Axis or axes along which the product is computed. The default is to compute
  526. the product of the flattened array.
  527. dtype : data-type, optional
  528. The type of the returned array and of the accumulator in which the
  529. elements are multiplied. By default, the dtype of `a` is used. An
  530. exception is when `a` has an integer type with less precision than
  531. the platform (u)intp. In that case, the default will be either
  532. (u)int32 or (u)int64 depending on whether the platform is 32 or 64
  533. bits. For inexact inputs, dtype must be inexact.
  534. out : ndarray, optional
  535. Alternate output array in which to place the result. The default
  536. is ``None``. If provided, it must have the same shape as the
  537. expected output, but the type will be cast if necessary. See
  538. `doc.ufuncs` for details. The casting of NaN to integer can yield
  539. unexpected results.
  540. keepdims : bool, optional
  541. If True, the axes which are reduced are left in the result as
  542. dimensions with size one. With this option, the result will
  543. broadcast correctly against the original `a`.
  544. Returns
  545. -------
  546. nanprod : ndarray
  547. A new array holding the result is returned unless `out` is
  548. specified, in which case it is returned.
  549. See Also
  550. --------
  551. numpy.prod : Product across array propagating NaNs.
  552. isnan : Show which elements are NaN.
  553. Examples
  554. --------
  555. >>> np.nanprod(1)
  556. 1
  557. >>> np.nanprod([1])
  558. 1
  559. >>> np.nanprod([1, np.nan])
  560. 1.0
  561. >>> a = np.array([[1, 2], [3, np.nan]])
  562. >>> np.nanprod(a)
  563. 6.0
  564. >>> np.nanprod(a, axis=0)
  565. array([ 3., 2.])
  566. """
  567. a, mask = _replace_nan(a, 1)
  568. return np.prod(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
  569. def _nancumsum_dispatcher(a, axis=None, dtype=None, out=None):
  570. return (a, out)
  571. @array_function_dispatch(_nancumsum_dispatcher)
  572. def nancumsum(a, axis=None, dtype=None, out=None):
  573. """
  574. Return the cumulative sum of array elements over a given axis treating Not a
  575. Numbers (NaNs) as zero. The cumulative sum does not change when NaNs are
  576. encountered and leading NaNs are replaced by zeros.
  577. Zeros are returned for slices that are all-NaN or empty.
  578. .. versionadded:: 1.12.0
  579. Parameters
  580. ----------
  581. a : array_like
  582. Input array.
  583. axis : int, optional
  584. Axis along which the cumulative sum is computed. The default
  585. (None) is to compute the cumsum over the flattened array.
  586. dtype : dtype, optional
  587. Type of the returned array and of the accumulator in which the
  588. elements are summed. If `dtype` is not specified, it defaults
  589. to the dtype of `a`, unless `a` has an integer dtype with a
  590. precision less than that of the default platform integer. In
  591. that case, the default platform integer is used.
  592. out : ndarray, optional
  593. Alternative output array in which to place the result. It must
  594. have the same shape and buffer length as the expected output
  595. but the type will be cast if necessary. See `doc.ufuncs`
  596. (Section "Output arguments") for more details.
  597. Returns
  598. -------
  599. nancumsum : ndarray.
  600. A new array holding the result is returned unless `out` is
  601. specified, in which case it is returned. The result has the same
  602. size as `a`, and the same shape as `a` if `axis` is not None
  603. or `a` is a 1-d array.
  604. See Also
  605. --------
  606. numpy.cumsum : Cumulative sum across array propagating NaNs.
  607. isnan : Show which elements are NaN.
  608. Examples
  609. --------
  610. >>> np.nancumsum(1)
  611. array([1])
  612. >>> np.nancumsum([1])
  613. array([1])
  614. >>> np.nancumsum([1, np.nan])
  615. array([ 1., 1.])
  616. >>> a = np.array([[1, 2], [3, np.nan]])
  617. >>> np.nancumsum(a)
  618. array([ 1., 3., 6., 6.])
  619. >>> np.nancumsum(a, axis=0)
  620. array([[ 1., 2.],
  621. [ 4., 2.]])
  622. >>> np.nancumsum(a, axis=1)
  623. array([[ 1., 3.],
  624. [ 3., 3.]])
  625. """
  626. a, mask = _replace_nan(a, 0)
  627. return np.cumsum(a, axis=axis, dtype=dtype, out=out)
  628. def _nancumprod_dispatcher(a, axis=None, dtype=None, out=None):
  629. return (a, out)
  630. @array_function_dispatch(_nancumprod_dispatcher)
  631. def nancumprod(a, axis=None, dtype=None, out=None):
  632. """
  633. Return the cumulative product of array elements over a given axis treating Not a
  634. Numbers (NaNs) as one. The cumulative product does not change when NaNs are
  635. encountered and leading NaNs are replaced by ones.
  636. Ones are returned for slices that are all-NaN or empty.
  637. .. versionadded:: 1.12.0
  638. Parameters
  639. ----------
  640. a : array_like
  641. Input array.
  642. axis : int, optional
  643. Axis along which the cumulative product is computed. By default
  644. the input is flattened.
  645. dtype : dtype, optional
  646. Type of the returned array, as well as of the accumulator in which
  647. the elements are multiplied. If *dtype* is not specified, it
  648. defaults to the dtype of `a`, unless `a` has an integer dtype with
  649. a precision less than that of the default platform integer. In
  650. that case, the default platform integer is used instead.
  651. out : ndarray, optional
  652. Alternative output array in which to place the result. It must
  653. have the same shape and buffer length as the expected output
  654. but the type of the resulting values will be cast if necessary.
  655. Returns
  656. -------
  657. nancumprod : ndarray
  658. A new array holding the result is returned unless `out` is
  659. specified, in which case it is returned.
  660. See Also
  661. --------
  662. numpy.cumprod : Cumulative product across array propagating NaNs.
  663. isnan : Show which elements are NaN.
  664. Examples
  665. --------
  666. >>> np.nancumprod(1)
  667. array([1])
  668. >>> np.nancumprod([1])
  669. array([1])
  670. >>> np.nancumprod([1, np.nan])
  671. array([ 1., 1.])
  672. >>> a = np.array([[1, 2], [3, np.nan]])
  673. >>> np.nancumprod(a)
  674. array([ 1., 2., 6., 6.])
  675. >>> np.nancumprod(a, axis=0)
  676. array([[ 1., 2.],
  677. [ 3., 2.]])
  678. >>> np.nancumprod(a, axis=1)
  679. array([[ 1., 2.],
  680. [ 3., 3.]])
  681. """
  682. a, mask = _replace_nan(a, 1)
  683. return np.cumprod(a, axis=axis, dtype=dtype, out=out)
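# Illustrative sketch, not part of the module: because NaNs are replaced by
# one before np.cumprod runs, a leading NaN yields 1 and a later NaN leaves
# the running product unchanged. The input values are made up for the example.

```python
import numpy as np

values = np.array([np.nan, 2.0, np.nan, 3.0])

# NaNs behave like the multiplicative identity.
running = np.nancumprod(values)
print(running)

# An all-NaN input therefore produces all ones.
all_nan = np.nancumprod(np.array([np.nan, np.nan]))
print(all_nan)
```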
  684. def _nanmean_dispatcher(a, axis=None, dtype=None, out=None, keepdims=None):
  685. return (a, out)
  686. @array_function_dispatch(_nanmean_dispatcher)
  687. def nanmean(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
  688. """
  689. Compute the arithmetic mean along the specified axis, ignoring NaNs.
  690. Returns the average of the array elements. The average is taken over
  691. the flattened array by default, otherwise over the specified axis.
  692. `float64` intermediate and return values are used for integer inputs.
  693. For all-NaN slices, NaN is returned and a `RuntimeWarning` is raised.
  694. .. versionadded:: 1.8.0
  695. Parameters
  696. ----------
  697. a : array_like
  698. Array containing numbers whose mean is desired. If `a` is not an
  699. array, a conversion is attempted.
  700. axis : {int, tuple of int, None}, optional
  701. Axis or axes along which the means are computed. The default is to compute
  702. the mean of the flattened array.
  703. dtype : data-type, optional
  704. Type to use in computing the mean. For integer inputs, the default
  705. is `float64`; for inexact inputs, it is the same as the input
  706. dtype.
  707. out : ndarray, optional
  708. Alternate output array in which to place the result. The default
  709. is ``None``; if provided, it must have the same shape as the
  710. expected output, but the type will be cast if necessary. See
  711. `doc.ufuncs` for details.
  712. keepdims : bool, optional
  713. If this is set to True, the axes which are reduced are left
  714. in the result as dimensions with size one. With this option,
  715. the result will broadcast correctly against the original `a`.
  716. If the value is anything but the default, then
  717. `keepdims` will be passed through to the `mean` or `sum` methods
  718. of sub-classes of `ndarray`. If a sub-class method does not
  719. implement `keepdims`, any resulting exception will be raised.
  720. Returns
  721. -------
  722. m : ndarray, see dtype parameter above
  723. If `out=None`, returns a new array containing the mean values,
  724. otherwise a reference to the output array is returned. NaN is
  725. returned for slices that contain only NaNs.
  726. See Also
  727. --------
  728. average : Weighted average
  729. mean : Arithmetic mean taken while not ignoring NaNs
  730. var, nanvar
  731. Notes
  732. -----
  733. The arithmetic mean is the sum of the non-NaN elements along the axis
  734. divided by the number of non-NaN elements.
  735. Note that for floating-point input, the mean is computed using the same
  736. precision the input has. Depending on the input data, this can cause
  737. the results to be inaccurate, especially for `float32`. Specifying a
  738. higher-precision accumulator using the `dtype` keyword can alleviate
  739. this issue.
  740. Examples
  741. --------
  742. >>> a = np.array([[1, np.nan], [3, 4]])
  743. >>> np.nanmean(a)
  744. 2.6666666666666665
  745. >>> np.nanmean(a, axis=0)
  746. array([ 2., 4.])
  747. >>> np.nanmean(a, axis=1)
  748. array([ 1., 3.5])
  749. """
  750. arr, mask = _replace_nan(a, 0)
  751. if mask is None:
  752. return np.mean(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
  753. if dtype is not None:
  754. dtype = np.dtype(dtype)
  755. if dtype is not None and not issubclass(dtype.type, np.inexact):
  756. raise TypeError("If a is inexact, then dtype must be inexact")
  757. if out is not None and not issubclass(out.dtype.type, np.inexact):
  758. raise TypeError("If a is inexact, then out must be inexact")
  759. cnt = np.sum(~mask, axis=axis, dtype=np.intp, keepdims=keepdims)
  760. tot = np.sum(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
  761. avg = _divide_by_count(tot, cnt, out=out)
  762. isbad = (cnt == 0)
  763. if isbad.any():
  764. warnings.warn("Mean of empty slice", RuntimeWarning, stacklevel=2)
  765. # NaN is the only possible bad value, so no further
  766. # action is needed to handle bad results.
  767. return avg
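# Illustrative sketch, not part of the module: the count-and-sum approach used
# above, reproduced with public NumPy calls, together with the RuntimeWarning
# issued for an all-NaN slice. The sample array and the names `cnt`, `tot`,
# `means`, and `caught` are assumptions made for this example.

```python
import warnings
import numpy as np

a = np.array([[1.0, np.nan], [np.nan, np.nan]])

mask = np.isnan(a)
cnt = np.sum(~mask, axis=1)                    # non-NaN count per row
tot = np.sum(np.where(mask, 0.0, a), axis=1)   # per-row sum with NaNs zeroed

# Record warnings so the all-NaN second row can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    means = np.nanmean(a, axis=1)

print(cnt, tot, means)
print([w.category.__name__ for w in caught])
```

# The first row has one valid value, so its mean is tot / cnt; the second row
# has a count of zero, which is the case that triggers the warning and NaN.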
  768. def _nanmedian1d(arr1d, overwrite_input=False):
  769. """
  770. Private function for rank 1 arrays. Compute the median ignoring NaNs.
  771. See nanmedian for parameter usage
  772. """
  773. arr1d, overwrite_input = _remove_nan_1d(arr1d,
  774. overwrite_input=overwrite_input)
  775. if arr1d.size == 0:
  776. return np.nan
  777. return np.median(arr1d, overwrite_input=overwrite_input)
  778. def _nanmedian(a, axis=None, out=None, overwrite_input=False):
  779. """
  780. Private function that doesn't support extended axis or keepdims.
  781. These methods are extended to this function using _ureduce
  782. See nanmedian for parameter usage
  783. """
  784. if axis is None or a.ndim == 1:
  785. part = a.ravel()
  786. if out is None:
  787. return _nanmedian1d(part, overwrite_input)
  788. else:
  789. out[...] = _nanmedian1d(part, overwrite_input)
  790. return out
  791. else:
  792. # for small medians use sort + indexing which is still faster than
  793. # apply_along_axis
  794. # benchmarked with shuffled (50, 50, x) containing a few NaN
  795. if a.shape[axis] < 600:
  796. return _nanmedian_small(a, axis, out, overwrite_input)
  797. result = np.apply_along_axis(_nanmedian1d, axis, a, overwrite_input)
  798. if out is not None:
  799. out[...] = result
  800. return result
  801. def _nanmedian_small(a, axis=None, out=None, overwrite_input=False):
  802. """
  803. sort + indexing median, faster for small medians along multiple
  804. dimensions due to the high overhead of apply_along_axis
  805. see nanmedian for parameter usage
  806. """
  807. a = np.ma.masked_array(a, np.isnan(a))
  808. m = np.ma.median(a, axis=axis, overwrite_input=overwrite_input)
  809. for i in range(np.count_nonzero(m.mask.ravel())):
  810. warnings.warn("All-NaN slice encountered", RuntimeWarning, stacklevel=3)
  811. if out is not None:
  812. out[...] = m.filled(np.nan)
  813. return out
  814. return m.filled(np.nan)
  815. def _nanmedian_dispatcher(
  816. a, axis=None, out=None, overwrite_input=None, keepdims=None):
  817. return (a, out)
  818. @array_function_dispatch(_nanmedian_dispatcher)
  819. def nanmedian(a, axis=None, out=None, overwrite_input=False, keepdims=np._NoValue):
  820. """
  821. Compute the median along the specified axis, while ignoring NaNs.
  822. Returns the median of the array elements.
  823. .. versionadded:: 1.9.0
  824. Parameters
  825. ----------
  826. a : array_like
  827. Input array or object that can be converted to an array.
  828. axis : {int, sequence of int, None}, optional
  829. Axis or axes along which the medians are computed. The default
  830. is to compute the median along a flattened version of the array.
  831. A sequence of axes is supported since version 1.9.0.
  832. out : ndarray, optional
  833. Alternative output array in which to place the result. It must
  834. have the same shape and buffer length as the expected output,
  835. but the type (of the output) will be cast if necessary.
  836. overwrite_input : bool, optional
  837. If True, then allow use of memory of input array `a` for
  838. calculations. The input array will be modified by the call to
  839. `median`. This will save memory when you do not need to preserve
  840. the contents of the input array. Treat the input as undefined,
  841. but it will probably be fully or partially sorted. Default is
  842. False. If `overwrite_input` is ``True`` and `a` is not already an
  843. `ndarray`, an error will be raised.
  844. keepdims : bool, optional
  845. If this is set to True, the axes which are reduced are left
  846. in the result as dimensions with size one. With this option,
  847. the result will broadcast correctly against the original `a`.
  848. If this is anything but the default value it will be passed
  849. through (in the special case of an empty array) to the
  850. `mean` function of the underlying array. If the array is
  851. a sub-class and `mean` does not have the kwarg `keepdims` this
  852. will raise a RuntimeError.
  853. Returns
  854. -------
  855. median : ndarray
  856. A new array holding the result. If the input contains integers
  857. or floats smaller than ``float64``, then the output data-type is
  858. ``np.float64``. Otherwise, the data-type of the output is the
  859. same as that of the input. If `out` is specified, that array is
  860. returned instead.
  861. See Also
  862. --------
  863. mean, median, percentile
  864. Notes
  865. -----
  866. Given a vector ``V`` of length ``N``, the median of ``V`` is the
  867. middle value of a sorted copy of ``V``, ``V_sorted`` - i.e.,
  868. ``V_sorted[(N-1)/2]``, when ``N`` is odd and the average of the two
  869. middle values of ``V_sorted`` when ``N`` is even.
  870. Examples
  871. --------
  872. >>> a = np.array([[10.0, 7, 4], [3, 2, 1]])
  873. >>> a[0, 1] = np.nan
  874. >>> a
  875. array([[ 10., nan, 4.],
  876. [ 3., 2., 1.]])
  877. >>> np.median(a)
  878. nan
  879. >>> np.nanmedian(a)
  880. 3.0
  881. >>> np.nanmedian(a, axis=0)
  882. array([ 6.5, 2., 2.5])
  883. >>> np.median(a, axis=1)
  884. array([ nan, 2.])
  885. >>> b = a.copy()
  886. >>> np.nanmedian(b, axis=1, overwrite_input=True)
  887. array([ 7., 2.])
  888. >>> assert not np.all(a==b)
  889. >>> b = a.copy()
  890. >>> np.nanmedian(b, axis=None, overwrite_input=True)
  891. 3.0
  892. >>> assert not np.all(a==b)
  893. """
  894. a = np.asanyarray(a)
  895. # apply_along_axis in _nanmedian doesn't handle empty arrays well,
  896. # so deal with them upfront
  897. if a.size == 0:
  898. return np.nanmean(a, axis, out=out, keepdims=keepdims)
  899. r, k = function_base._ureduce(a, func=_nanmedian, axis=axis, out=out,
  900. overwrite_input=overwrite_input)
  901. if keepdims and keepdims is not np._NoValue:
  902. return r.reshape(k)
  903. else:
  904. return r
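# Illustrative sketch, not part of the module: for a small array, nanmedian
# should match an ordinary median taken after dropping the NaNs by hand. The
# names `flat_median` and `row_medians` are assumptions made for this example.

```python
import numpy as np

a = np.array([[10.0, np.nan, 4.0], [3.0, 2.0, 1.0]])

# Flattened case: drop NaNs with a boolean mask, then take the median.
flat_median = np.median(a[~np.isnan(a)])

# Per-row case: each row may keep a different number of valid entries.
row_medians = np.array([np.median(row[~np.isnan(row)]) for row in a])

print(flat_median, np.nanmedian(a))
print(row_medians, np.nanmedian(a, axis=1))
```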
  905. def _nanpercentile_dispatcher(a, q, axis=None, out=None, overwrite_input=None,
  906. interpolation=None, keepdims=None):
  907. return (a, q, out)
  908. @array_function_dispatch(_nanpercentile_dispatcher)
  909. def nanpercentile(a, q, axis=None, out=None, overwrite_input=False,
  910. interpolation='linear', keepdims=np._NoValue):
  911. """
  912. Compute the qth percentile of the data along the specified axis,
  913. while ignoring nan values.
  914. Returns the qth percentile(s) of the array elements.
  915. .. versionadded:: 1.9.0
  916. Parameters
  917. ----------
  918. a : array_like
  919. Input array or object that can be converted to an array, containing
  920. nan values to be ignored.
  921. q : array_like of float
  922. Percentile or sequence of percentiles to compute, which must be between
  923. 0 and 100 inclusive.
  924. axis : {int, tuple of int, None}, optional
  925. Axis or axes along which the percentiles are computed. The
  926. default is to compute the percentile(s) along a flattened
  927. version of the array.
  928. out : ndarray, optional
  929. Alternative output array in which to place the result. It must
  930. have the same shape and buffer length as the expected output,
  931. but the type (of the output) will be cast if necessary.
  932. overwrite_input : bool, optional
  933. If True, then allow the input array `a` to be modified by intermediate
  934. calculations, to save memory. In this case, the contents of the input
  935. `a` after this function completes is undefined.
  936. interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
  937. This optional parameter specifies the interpolation method to
  938. use when the desired percentile lies between two data points
  939. ``i < j``:
  940. * 'linear': ``i + (j - i) * fraction``, where ``fraction``
  941. is the fractional part of the index surrounded by ``i``
  942. and ``j``.
  943. * 'lower': ``i``.
  944. * 'higher': ``j``.
  945. * 'nearest': ``i`` or ``j``, whichever is nearest.
  946. * 'midpoint': ``(i + j) / 2``.
  947. keepdims : bool, optional
  948. If this is set to True, the axes which are reduced are left in
  949. the result as dimensions with size one. With this option, the
  950. result will broadcast correctly against the original array `a`.
  951. If this is anything but the default value it will be passed
  952. through (in the special case of an empty array) to the
  953. `mean` function of the underlying array. If the array is
  954. a sub-class and `mean` does not have the kwarg `keepdims` this
  955. will raise a RuntimeError.
  956. Returns
  957. -------
  958. percentile : scalar or ndarray
  959. If `q` is a single percentile and `axis=None`, then the result
  960. is a scalar. If multiple percentiles are given, first axis of
  961. the result corresponds to the percentiles. The other axes are
  962. the axes that remain after the reduction of `a`. If the input
  963. contains integers or floats smaller than ``float64``, the output
  964. data-type is ``float64``. Otherwise, the output data-type is the
  965. same as that of the input. If `out` is specified, that array is
  966. returned instead.
  967. See Also
  968. --------
  969. nanmean
  970. nanmedian : equivalent to ``nanpercentile(..., 50)``
  971. percentile, median, mean
  972. nanquantile : equivalent to nanpercentile, but with q in the range [0, 1].
  973. Notes
  974. -----
  975. Given a vector ``V`` of length ``N``, the ``q``-th percentile of
  976. ``V`` is the value ``q/100`` of the way from the minimum to the
  977. maximum in a sorted copy of ``V``. The values and distances of
  978. the two nearest neighbors as well as the `interpolation` parameter
  979. will determine the percentile if the normalized ranking does not
  980. match the location of ``q`` exactly. This function is the same as
  981. the median if ``q=50``, the same as the minimum if ``q=0`` and the
  982. same as the maximum if ``q=100``.
  983. Examples
  984. --------
  985. >>> a = np.array([[10., 7., 4.], [3., 2., 1.]])
  986. >>> a[0][1] = np.nan
  987. >>> a
  988. array([[ 10., nan, 4.],
  989. [ 3., 2., 1.]])
  990. >>> np.percentile(a, 50)
  991. nan
  992. >>> np.nanpercentile(a, 50)
  993. 3.0
  994. >>> np.nanpercentile(a, 50, axis=0)
  995. array([ 6.5, 2., 2.5])
  996. >>> np.nanpercentile(a, 50, axis=1, keepdims=True)
  997. array([[ 7.],
  998. [ 2.]])
  999. >>> m = np.nanpercentile(a, 50, axis=0)
  1000. >>> out = np.zeros_like(m)
  1001. >>> np.nanpercentile(a, 50, axis=0, out=out)
  1002. array([ 6.5, 2., 2.5])
  1003. >>> m
  1004. array([ 6.5, 2. , 2.5])
  1005. >>> b = a.copy()
  1006. >>> np.nanpercentile(b, 50, axis=1, overwrite_input=True)
  1007. array([ 7., 2.])
  1008. >>> assert not np.all(a==b)
  1009. """
  1010. a = np.asanyarray(a)
  1011. q = np.true_divide(q, 100.0) # handles the asarray for us too
  1012. if not function_base._quantile_is_valid(q):
  1013. raise ValueError("Percentiles must be in the range [0, 100]")
  1014. return _nanquantile_unchecked(
  1015. a, q, axis, out, overwrite_input, interpolation, keepdims)
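# Illustrative sketch, not part of the module: the 'linear' rule from the
# docstring, ``i + (j - i) * fraction``, worked by hand for q = 40 and compared
# with the default behaviour of np.nanpercentile. The names `v`, `index`,
# `lower`, `upper`, `fraction`, and `manual` are assumptions for this example.

```python
import math
import numpy as np

a = np.array([[10.0, np.nan, 4.0], [3.0, 2.0, 1.0]])
q = 40.0

# Sorted copy of the non-NaN values: 1, 2, 3, 4, 10.
v = np.sort(a[~np.isnan(a)])

# Fractional position of the q-th percentile within the sorted values.
index = q / 100.0 * (v.size - 1)
lower = math.floor(index)
upper = min(lower + 1, v.size - 1)
fraction = index - lower

# Linear interpolation between the two neighbouring data points.
manual = v[lower] + (v[upper] - v[lower]) * fraction

result = np.nanpercentile(a, q)
print(manual, result)
```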
  1016. def _nanquantile_dispatcher(a, q, axis=None, out=None, overwrite_input=None,
  1017. interpolation=None, keepdims=None):
  1018. return (a, q, out)
  1019. @array_function_dispatch(_nanquantile_dispatcher)
  1020. def nanquantile(a, q, axis=None, out=None, overwrite_input=False,
  1021. interpolation='linear', keepdims=np._NoValue):
  1022. """
  1023. Compute the qth quantile of the data along the specified axis,
  1024. while ignoring nan values.
  1025. Returns the qth quantile(s) of the array elements.
  1026. .. versionadded:: 1.15.0
  1027. Parameters
  1028. ----------
  1029. a : array_like
  1030. Input array or object that can be converted to an array, containing
  1031. nan values to be ignored.
  1032. q : array_like of float
  1033. Quantile or sequence of quantiles to compute, which must be between
  1034. 0 and 1 inclusive.
  1035. axis : {int, tuple of int, None}, optional
  1036. Axis or axes along which the quantiles are computed. The
  1037. default is to compute the quantile(s) along a flattened
  1038. version of the array.
  1039. out : ndarray, optional
  1040. Alternative output array in which to place the result. It must
  1041. have the same shape and buffer length as the expected output,
  1042. but the type (of the output) will be cast if necessary.
  1043. overwrite_input : bool, optional
  1044. If True, then allow the input array `a` to be modified by intermediate
  1045. calculations, to save memory. In this case, the contents of the input
  1046. `a` after this function completes is undefined.
  1047. interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
  1048. This optional parameter specifies the interpolation method to
  1049. use when the desired quantile lies between two data points
  1050. ``i < j``:
  1051. * linear: ``i + (j - i) * fraction``, where ``fraction``
  1052. is the fractional part of the index surrounded by ``i``
  1053. and ``j``.
  1054. * lower: ``i``.
  1055. * higher: ``j``.
  1056. * nearest: ``i`` or ``j``, whichever is nearest.
  1057. * midpoint: ``(i + j) / 2``.
  1058. keepdims : bool, optional
  1059. If this is set to True, the axes which are reduced are left in
  1060. the result as dimensions with size one. With this option, the
  1061. result will broadcast correctly against the original array `a`.
  1062. If this is anything but the default value it will be passed
  1063. through (in the special case of an empty array) to the
  1064. `mean` function of the underlying array. If the array is
  1065. a sub-class and `mean` does not have the kwarg `keepdims` this
  1066. will raise a RuntimeError.
  1067. Returns
  1068. -------
  1069. quantile : scalar or ndarray
  1070. If `q` is a single quantile and `axis=None`, then the result
  1071. is a scalar. If multiple quantiles are given, first axis of
  1072. the result corresponds to the quantiles. The other axes are
  1073. the axes that remain after the reduction of `a`. If the input
  1074. contains integers or floats smaller than ``float64``, the output
  1075. data-type is ``float64``. Otherwise, the output data-type is the
  1076. same as that of the input. If `out` is specified, that array is
  1077. returned instead.
  1078. See Also
  1079. --------
  1080. quantile
  1081. nanmean
  1082. nanmedian : equivalent to ``nanquantile(..., 0.5)``
  1083. nanpercentile : same as nanquantile, but with q in the range [0, 100].
  1084. Examples
  1085. --------
  1086. >>> a = np.array([[10., 7., 4.], [3., 2., 1.]])
  1087. >>> a[0][1] = np.nan
  1088. >>> a
  1089. array([[ 10., nan, 4.],
  1090. [ 3., 2., 1.]])
  1091. >>> np.quantile(a, 0.5)
  1092. nan
  1093. >>> np.nanquantile(a, 0.5)
  1094. 3.0
  1095. >>> np.nanquantile(a, 0.5, axis=0)
  1096. array([ 6.5, 2., 2.5])
  1097. >>> np.nanquantile(a, 0.5, axis=1, keepdims=True)
  1098. array([[ 7.],
  1099. [ 2.]])
  1100. >>> m = np.nanquantile(a, 0.5, axis=0)
  1101. >>> out = np.zeros_like(m)
  1102. >>> np.nanquantile(a, 0.5, axis=0, out=out)
  1103. array([ 6.5, 2., 2.5])
  1104. >>> m
  1105. array([ 6.5, 2. , 2.5])
  1106. >>> b = a.copy()
  1107. >>> np.nanquantile(b, 0.5, axis=1, overwrite_input=True)
  1108. array([ 7., 2.])
  1109. >>> assert not np.all(a==b)
  1110. """
  1111. a = np.asanyarray(a)
  1112. q = np.asanyarray(q)
  1113. if not function_base._quantile_is_valid(q):
  1114. raise ValueError("Quantiles must be in the range [0, 1]")
  1115. return _nanquantile_unchecked(
  1116. a, q, axis, out, overwrite_input, interpolation, keepdims)
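# Illustrative sketch, not part of the module: nanquantile and nanpercentile
# differ only in the scale of q, and a q outside [0, 1] is rejected with a
# ValueError. The names `by_quantile`, `by_percentile`, and `rejected` are
# assumptions made for this example.

```python
import numpy as np

a = np.array([[10.0, np.nan, 4.0], [3.0, 2.0, 1.0]])

# Same computation expressed on the two scales for q.
by_quantile = np.nanquantile(a, [0.25, 0.5], axis=1)
by_percentile = np.nanpercentile(a, [25, 50], axis=1)

# Two quantiles over two rows: the first axis indexes the quantiles.
print(by_quantile.shape)

# A q outside [0, 1] should raise ValueError.
try:
    np.nanquantile(a, 1.5)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```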
  1117. def _nanquantile_unchecked(a, q, axis=None, out=None, overwrite_input=False,
  1118. interpolation='linear', keepdims=np._NoValue):
  1119. """Assumes that q is in [0, 1], and is an ndarray"""
  1120. # apply_along_axis in _nanpercentile doesn't handle empty arrays well,
  1121. # so deal with them upfront
  1122. if a.size == 0:
  1123. return np.nanmean(a, axis, out=out, keepdims=keepdims)
  1124. r, k = function_base._ureduce(
  1125. a, func=_nanquantile_ureduce_func, q=q, axis=axis, out=out,
  1126. overwrite_input=overwrite_input, interpolation=interpolation
  1127. )
  1128. if keepdims and keepdims is not np._NoValue:
  1129. return r.reshape(q.shape + k)
  1130. else:
  1131. return r
  1132. def _nanquantile_ureduce_func(a, q, axis=None, out=None, overwrite_input=False,
  1133. interpolation='linear'):
  1134. """
  1135. Private function that doesn't support extended axis or keepdims.
  1136. These methods are extended to this function using _ureduce
  1137. See nanpercentile for parameter usage
  1138. """
  1139. if axis is None or a.ndim == 1:
  1140. part = a.ravel()
  1141. result = _nanquantile_1d(part, q, overwrite_input, interpolation)
  1142. else:
  1143. result = np.apply_along_axis(_nanquantile_1d, axis, a, q,
  1144. overwrite_input, interpolation)
  1145. # apply_along_axis fills in collapsed axis with results.
  1146. # Move that axis to the beginning to match percentile's
  1147. # convention.
  1148. if q.ndim != 0:
  1149. result = np.moveaxis(result, axis, 0)
  1150. if out is not None:
  1151. out[...] = result
  1152. return result
  1153. def _nanquantile_1d(arr1d, q, overwrite_input=False, interpolation='linear'):
  1154. """
  1155. Private function for rank 1 arrays. Compute quantile ignoring NaNs.
  1156. See nanpercentile for parameter usage
  1157. """
  1158. arr1d, overwrite_input = _remove_nan_1d(arr1d,
  1159. overwrite_input=overwrite_input)
  1160. if arr1d.size == 0:
  1161. return np.full(q.shape, np.nan)[()] # convert to scalar
  1162. return function_base._quantile_unchecked(
  1163. arr1d, q, overwrite_input=overwrite_input, interpolation=interpolation)
  1164. def _nanvar_dispatcher(
  1165. a, axis=None, dtype=None, out=None, ddof=None, keepdims=None):
  1166. return (a, out)
  1167. @array_function_dispatch(_nanvar_dispatcher)
  1168. def nanvar(a, axis=None, dtype=None, out=None, ddof=0, keepdims=np._NoValue):
  1169. """
  1170. Compute the variance along the specified axis, while ignoring NaNs.
  1171. Returns the variance of the array elements, a measure of the spread of
  1172. a distribution. The variance is computed for the flattened array by
  1173. default, otherwise over the specified axis.
  1174. For all-NaN slices or slices with zero degrees of freedom, NaN is
  1175. returned and a `RuntimeWarning` is raised.
  1176. .. versionadded:: 1.8.0
  1177. Parameters
  1178. ----------
  1179. a : array_like
  1180. Array containing numbers whose variance is desired. If `a` is not an
  1181. array, a conversion is attempted.
  1182. axis : {int, tuple of int, None}, optional
  1183. Axis or axes along which the variance is computed. The default is to compute
  1184. the variance of the flattened array.
  1185. dtype : data-type, optional
  1186. Type to use in computing the variance. For arrays of integer type
  1187. the default is `float64`; for arrays of float types it is the same as
  1188. the array type.
  1189. out : ndarray, optional
  1190. Alternate output array in which to place the result. It must have
  1191. the same shape as the expected output, but the type is cast if
  1192. necessary.
  1193. ddof : int, optional
  1194. "Delta Degrees of Freedom": the divisor used in the calculation is
  1195. ``N - ddof``, where ``N`` represents the number of non-NaN
  1196. elements. By default `ddof` is zero.
  1197. keepdims : bool, optional
  1198. If this is set to True, the axes which are reduced are left
  1199. in the result as dimensions with size one. With this option,
  1200. the result will broadcast correctly against the original `a`.
  1201. Returns
  1202. -------
  1203. variance : ndarray, see dtype parameter above
  1204. If `out` is None, return a new array containing the variance,
  1205. otherwise return a reference to the output array. If ddof is >= the
  1206. number of non-NaN elements in a slice or the slice contains only
  1207. NaNs, then the result for that slice is NaN.
  1208. See Also
  1209. --------
  1210. std : Standard deviation
  1211. mean : Average
  1212. var : Variance while not ignoring NaNs
  1213. nanstd, nanmean
  1214. numpy.doc.ufuncs : Section "Output arguments"
  1215. Notes
  1216. -----
  1217. The variance is the average of the squared deviations from the mean,
  1218. i.e., ``var = mean(abs(x - x.mean())**2)``.
  1219. The mean is normally calculated as ``x.sum() / N``, where ``N = len(x)``.
  1220. If, however, `ddof` is specified, the divisor ``N - ddof`` is used
  1221. instead. In standard statistical practice, ``ddof=1`` provides an
  1222. unbiased estimator of the variance of a hypothetical infinite
  1223. population. ``ddof=0`` provides a maximum likelihood estimate of the
  1224. variance for normally distributed variables.
  1225. Note that for complex numbers, the absolute value is taken before
  1226. squaring, so that the result is always real and nonnegative.
  1227. For floating-point input, the variance is computed using the same
  1228. precision the input has. Depending on the input data, this can cause
  1229. the results to be inaccurate, especially for `float32`. Specifying
  1230. a higher-accuracy accumulator using the ``dtype`` keyword can
  1231. alleviate this issue.
  1232. For this function to work on sub-classes of ndarray, they must define
  1233. `sum` with the kwarg `keepdims`.
  1234. Examples
  1235. --------
  1236. >>> a = np.array([[1, np.nan], [3, 4]])
  1237. >>> np.nanvar(a)
  1238. 1.5555555555555554
  1239. >>> np.nanvar(a, axis=0)
  1240. array([ 1., 0.])
  1241. >>> np.nanvar(a, axis=1)
  1242. array([ 0., 0.25])
  1243. """
  1244. arr, mask = _replace_nan(a, 0)
  1245. if mask is None:
  1246. return np.var(arr, axis=axis, dtype=dtype, out=out, ddof=ddof,
  1247. keepdims=keepdims)
  1248. if dtype is not None:
  1249. dtype = np.dtype(dtype)
  1250. if dtype is not None and not issubclass(dtype.type, np.inexact):
  1251. raise TypeError("If a is inexact, then dtype must be inexact")
  1252. if out is not None and not issubclass(out.dtype.type, np.inexact):
  1253. raise TypeError("If a is inexact, then out must be inexact")
  1254. # Compute mean
  1255. if type(arr) is np.matrix:
  1256. _keepdims = np._NoValue
  1257. else:
  1258. _keepdims = True
  1259. # we need to special case matrix for reverse compatibility
  1260. # in order for this to work, these sums need to be called with
  1261. # keepdims=True, however matrix now raises an error in this case, but
  1262. # the reason that it drops the keepdims kwarg is to force keepdims=True
  1263. # so this used to work by serendipity.
  1264. cnt = np.sum(~mask, axis=axis, dtype=np.intp, keepdims=_keepdims)
  1265. avg = np.sum(arr, axis=axis, dtype=dtype, keepdims=_keepdims)
  1266. avg = _divide_by_count(avg, cnt)
  1267. # Compute squared deviation from mean.
  1268. np.subtract(arr, avg, out=arr, casting='unsafe')
  1269. arr = _copyto(arr, 0, mask)
  1270. if issubclass(arr.dtype.type, np.complexfloating):
  1271. sqr = np.multiply(arr, arr.conj(), out=arr).real
  1272. else:
  1273. sqr = np.multiply(arr, arr, out=arr)
  1274. # Compute variance.
  1275. var = np.sum(sqr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
  1276. if var.ndim < cnt.ndim:
  1277. # Subclasses of ndarray may ignore keepdims, so check here.
  1278. cnt = cnt.squeeze(axis)
  1279. dof = cnt - ddof
  1280. var = _divide_by_count(var, dof)
  1281. isbad = (dof <= 0)
  1282. if np.any(isbad):
  1283. warnings.warn("Degrees of freedom <= 0 for slice.", RuntimeWarning, stacklevel=2)
  1284. # NaN, inf, or negative numbers are all possible bad
  1285. # values, so explicitly replace them with NaN.
  1286. var = _copyto(var, np.nan, isbad)
  1287. return var
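# Illustrative sketch, not part of the module: with ``ddof=1`` the divisor is
# the per-slice count of non-NaN values minus one, so a slice with a single
# valid value has zero degrees of freedom and comes back as NaN with a
# RuntimeWarning. The names `sample_var` and `caught` are assumptions.

```python
import warnings
import numpy as np

a = np.array([[1.0, np.nan], [3.0, 4.0]])

# Record warnings so the zero-degrees-of-freedom row can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sample_var = np.nanvar(a, axis=1, ddof=1)

# Row 0 has one non-NaN value, so N - ddof == 0 and the result is NaN.
# Row 1 has two values: ((3 - 3.5)**2 + (4 - 3.5)**2) / (2 - 1) == 0.5.
print(sample_var)
print([w.category.__name__ for w in caught])
```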
  1288. def _nanstd_dispatcher(
  1289. a, axis=None, dtype=None, out=None, ddof=None, keepdims=None):
  1290. return (a, out)
  1291. @array_function_dispatch(_nanstd_dispatcher)
  1292. def nanstd(a, axis=None, dtype=None, out=None, ddof=0, keepdims=np._NoValue):
  1293. """
  1294. Compute the standard deviation along the specified axis, while
  1295. ignoring NaNs.
  1296. Returns the standard deviation, a measure of the spread of a
  1297. distribution, of the non-NaN array elements. The standard deviation is
  1298. computed for the flattened array by default, otherwise over the
  1299. specified axis.
  1300. For all-NaN slices or slices with zero degrees of freedom, NaN is
  1301. returned and a `RuntimeWarning` is raised.
  1302. .. versionadded:: 1.8.0
  1303. Parameters
  1304. ----------
  1305. a : array_like
  1306. Calculate the standard deviation of the non-NaN values.
  1307. axis : {int, tuple of int, None}, optional
  1308. Axis or axes along which the standard deviation is computed. The default is
  1309. to compute the standard deviation of the flattened array.
  1310. dtype : dtype, optional
  1311. Type to use in computing the standard deviation. For arrays of
  1312. integer type the default is float64, for arrays of float types it
  1313. is the same as the array type.
  1314. out : ndarray, optional
  1315. Alternative output array in which to place the result. It must have
  1316. the same shape as the expected output but the type (of the
  1317. calculated values) will be cast if necessary.
  1318. ddof : int, optional
  1319. Means Delta Degrees of Freedom. The divisor used in calculations
  1320. is ``N - ddof``, where ``N`` represents the number of non-NaN
  1321. elements. By default `ddof` is zero.
  1322. keepdims : bool, optional
  1323. If this is set to True, the axes which are reduced are left
  1324. in the result as dimensions with size one. With this option,
  1325. the result will broadcast correctly against the original `a`.
  1326. If this value is anything but the default it is passed through
  1327. as-is to the relevant functions of the sub-classes. If these
  1328. functions do not have a `keepdims` kwarg, a RuntimeError will
  1329. be raised.
  1330. Returns
  1331. -------
  1332. standard_deviation : ndarray, see dtype parameter above.
  1333. If `out` is None, return a new array containing the standard
  1334. deviation, otherwise return a reference to the output array. If
  1335. ddof is >= the number of non-NaN elements in a slice or the slice
  1336. contains only NaNs, then the result for that slice is NaN.
  1337. See Also
  1338. --------
  1339. var, mean, std
  1340. nanvar, nanmean
  1341. numpy.doc.ufuncs : Section "Output arguments"
  1342. Notes
  1343. -----
  1344. The standard deviation is the square root of the average of the squared
  1345. deviations from the mean: ``std = sqrt(mean(abs(x - x.mean())**2))``.
  1346. The average squared deviation is normally calculated as
  1347. ``x.sum() / N``, where ``N = len(x)``. If, however, `ddof` is
  1348. specified, the divisor ``N - ddof`` is used instead. In standard
  1349. statistical practice, ``ddof=1`` provides an unbiased estimator of the
  1350. variance of the infinite population. ``ddof=0`` provides a maximum
  1351. likelihood estimate of the variance for normally distributed variables.
  1352. The standard deviation computed in this function is the square root of
  1353. the estimated variance, so even with ``ddof=1``, it will not be an
  1354. unbiased estimate of the standard deviation per se.
  1355. Note that, for complex numbers, `std` takes the absolute value before
  1356. squaring, so that the result is always real and nonnegative.
  1357. For floating-point input, the *std* is computed using the same
  1358. precision the input has. Depending on the input data, this can cause
  1359. the results to be inaccurate, especially for float32. Specifying
  1360. a higher-accuracy accumulator using the `dtype` keyword can
  1361. alleviate this issue.
  1362. Examples
  1363. --------
  1364. >>> a = np.array([[1, np.nan], [3, 4]])
  1365. >>> np.nanstd(a)
  1366. 1.247219128924647
  1367. >>> np.nanstd(a, axis=0)
  1368. array([ 1., 0.])
  1369. >>> np.nanstd(a, axis=1)
  1370. array([ 0., 0.5])
  1371. """
  1372. var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
  1373. keepdims=keepdims)
  1374. if isinstance(var, np.ndarray):
  1375. std = np.sqrt(var, out=var)
  1376. else:
  1377. std = var.dtype.type(np.sqrt(var))
  1378. return std
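# Illustrative sketch, not part of the module: since nanstd takes the square
# root of nanvar, the two should agree slice by slice. The names `std_direct`
# and `std_from_var` are assumptions made for this example.

```python
import numpy as np

a = np.array([[1.0, np.nan], [3.0, 4.0]])

# Standard deviation computed directly and via the variance.
std_direct = np.nanstd(a, axis=1)
std_from_var = np.sqrt(np.nanvar(a, axis=1))

print(std_direct)
print(std_from_var)
```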