_rvs_sampling.py 6.9 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169
  1. from __future__ import division, print_function, absolute_import
  2. import numpy as np
  3. import warnings
  4. from scipy._lib._util import check_random_state
  5. def rvs_ratio_uniforms(pdf, umax, vmin, vmax, size=1, c=0, random_state=None):
  6. """
  7. Generate random samples from a probability density function using the
  8. ratio-of-uniforms method.
  9. Parameters
  10. ----------
  11. pdf : callable
  12. A function with signature `pdf(x)` that is the probability
  13. density function of the distribution.
  14. umax : float
  15. The upper bound of the bounding rectangle in the u-direction.
  16. vmin : float
  17. The lower bound of the bounding rectangle in the v-direction.
  18. vmax : float
  19. The upper bound of the bounding rectangle in the v-direction.
  20. size : int or tuple of ints, optional
  21. Defining number of random variates (default is 1).
  22. c : float, optional.
  23. Shift parameter of ratio-of-uniforms method, see Notes. Default is 0.
  24. random_state : int or np.random.RandomState instance, optional
  25. If already a RandomState instance, use it.
  26. If seed is an int, return a new RandomState instance seeded with seed.
  27. If None, use np.random.RandomState. Default is None.
  28. Returns
  29. -------
  30. rvs : ndarray
  31. The random variates distributed according to the probability
  32. distribution defined by the pdf.
  33. Notes
  34. -----
  35. Given a univariate probability density function `pdf` and a constant `c`,
  36. define the set ``A = {(u, v) : 0 < u <= sqrt(pdf(v/u + c))}``.
  37. If `(U, V)` is a random vector uniformly distributed over `A`,
  38. then `V/U + c` follows a distribution according to `pdf`.
  39. The above result (see [1]_, [2]_) can be used to sample random variables
  40. using only the pdf, i.e. no inversion of the cdf is required. Typical
  41. choices of `c` are zero or the mode of `pdf`. The set `A` is a subset of
  42. the rectangle ``R = [0, umax] x [vmin, vmax]`` where
  43. - ``umax = sup sqrt(pdf(x))``
  44. - ``vmin = inf (x - c) sqrt(pdf(x))``
  45. - ``vmax = sup (x - c) sqrt(pdf(x))``
  46. In particular, these values are finite if `pdf` is bounded and
  47. ``x**2 * pdf(x)`` is bounded (i.e. subquadratic tails).
  48. One can generate `(U, V)` uniformly on `R` and return
  49. `V/U + c` if `(U, V)` are also in `A` which can be directly
  50. verified.
  51. Intuitively, the method works well if `A` fills up most of the
  52. enclosing rectangle such that the probability is high that `(U, V)`
  53. lies in `A` whenever it lies in `R` as the number of required
  54. iterations becomes too large otherwise. To be more precise, note that
  55. the expected number of iterations to draw `(U, V)` uniformly
  56. distributed on `R` such that `(U, V)` is also in `A` is given by
  57. the ratio ``area(R) / area(A) = 2 * umax * (vmax - vmin)``, using the fact
  58. that the area of `A` is equal to 1/2 (Theorem 7.1 in [1]_). A warning
  59. is displayed if this ratio is larger than 20. Moreover, if the sampling
  60. fails to generate a single random variate after 50000 iterations (i.e.
  61. not a single draw is in `A`), an exception is raised.
  62. If the bounding rectangle is not correctly specified (i.e. if it does not
  63. contain `A`), the algorithm samples from a distribution different from
  64. the one given by `pdf`. It is therefore recommended to perform a
  65. test such as `stats.kstest` as a check.
  66. References
  67. ----------
  68. .. [1] L. Devroye, "Non-Uniform Random Variate Generation",
  69. Springer-Verlag, 1986.
  70. .. [2] W. Hoermann and J. Leydold, "Generating generalized inverse Gaussian
  71. random variates", Statistics and Computing, 24(4), p. 547--557, 2014.
  72. .. [3] A.J. Kinderman and J.F. Monahan, "Computer Generation of Random
  73. Variables Using the Ratio of Uniform Deviates",
  74. ACM Transactions on Mathematical Software, 3(3), p. 257--260, 1977.
  75. Examples
  76. --------
  77. >>> from scipy import stats
  78. Simulate normally distributed random variables. It is easy to compute the
  79. bounding rectangle explicitly in that case.
  80. >>> f = stats.norm.pdf
  81. >>> v_bound = np.sqrt(f(np.sqrt(2))) * np.sqrt(2)
  82. >>> umax, vmin, vmax = np.sqrt(f(0)), -v_bound, v_bound
  83. >>> np.random.seed(12345)
  84. >>> rvs = stats.rvs_ratio_uniforms(f, umax, vmin, vmax, size=2500)
  85. The K-S test confirms that the random variates are indeed normally
  86. distributed (normality is not rejected at 5% significance level):
  87. >>> stats.kstest(rvs, 'norm')[1]
  88. 0.3420173467307603
  89. The exponential distribution provides another example where the bounding
  90. rectangle can be determined explicitly.
  91. >>> np.random.seed(12345)
  92. >>> rvs = stats.rvs_ratio_uniforms(lambda x: np.exp(-x), umax=1,
  93. ... vmin=0, vmax=2*np.exp(-1), size=1000)
  94. >>> stats.kstest(rvs, 'expon')[1]
  95. 0.928454552559516
  96. Sometimes it can be helpful to use a non-zero shift parameter `c`, see e.g.
  97. [2]_ above in the case of the generalized inverse Gaussian distribution.
  98. """
  99. if vmin >= vmax:
  100. raise ValueError("vmin must be smaller than vmax.")
  101. if umax <= 0:
  102. raise ValueError("umax must be positive.")
  103. exp_iter = 2 * (vmax - vmin) * umax # rejection constant (see [1])
  104. if exp_iter > 20:
  105. msg = ("The expected number of iterations to generate a single random "
  106. "number from the desired distribution is larger than {}, "
  107. "potentially causing bad performance.".format(int(exp_iter)))
  108. warnings.warn(msg, RuntimeWarning)
  109. size1d = tuple(np.atleast_1d(size))
  110. N = np.prod(size1d) # number of rvs needed, reshape upon return
  111. # start sampling using ratio of uniforms method
  112. rng = check_random_state(random_state)
  113. x = np.zeros(N)
  114. simulated, i = 0, 1
  115. # loop until N rvs have been generated: expected runtime is finite
  116. # to avoid infinite loop, raise exception if not a single rv has been
  117. # generated after 50000 tries. even if exp_iter = 1000, probability of
  118. # this event is (1-1/1000)**50000 which is of order 10e-22
  119. while True:
  120. k = N - simulated
  121. # simulate uniform rvs on [0, umax] and [vmin, vmax]
  122. u1 = umax * rng.random_sample(size=k)
  123. v1 = vmin + (vmax - vmin) * rng.random_sample(size=k)
  124. # apply rejection method
  125. rvs = v1 / u1 + c
  126. accept = (u1**2 <= pdf(rvs))
  127. num_accept = np.sum(accept)
  128. if num_accept > 0:
  129. take = min(num_accept, N - simulated)
  130. x[simulated:(simulated + take)] = rvs[accept][0:take]
  131. simulated += take
  132. if simulated >= N:
  133. return np.reshape(x, size1d)
  134. if (simulated == 0) and (i*N >= 50000):
  135. msg = ("Not a single random variate could be generated in {} "
  136. "attempts. The ratio of uniforms method does not appear "
  137. "to work for the provided parameters. Please check the "
  138. "pdf and the bounds.".format(i*N))
  139. raise RuntimeError(msg)
  140. i += 1