Create New Samplers and Distributions¶
Whereas this package already provides a large collection of common distributions out of box, there are still occasions where you want to create new distributions (e.g your application requires a special kind of distributions, or you want to contribute to this package).
Generally, you don’t have to implement every API method listed in the documentation. This package provides a series of generic functions that turn a small number of internal methods into userend API methods. What you need to do is to implement this small set of internal methods for your distributions.
Note: the methods need to be implemented are different for distributions of different variate forms.
Create a Sampler¶
Unlike a full fledged distributions, a sampler, in general, only provides limited functionalities, mainly to support sampling.
Univariate Sampler¶
To implement a univariate sampler, one can define a sub type (say Spl
) of Sampleable{Univariate,S}
(where S
can be Disrete
or Continuous
), and provide a rand
method, as
function rand(s::Spl)
# ... generate a single sample from s
end
The package already implements a vectorized version of rand!
and rand
that repeatedly calls the he scalar version to generate multiple samples.
Multivariate Sampler¶
To implement a multivariate sampler, one can define a sub type of Sampleable{Multivariate,S}
, and provide both length
and _rand!
methods, as
Base.length(s::Spl) = ... # return the length of each sample
function _rand!{T<:Real}(s::Spl, x::AbstractVector{T})
# ... generate a single vector sample to x
end
This function can assume that the dimension of x
is correct, and doesn’t need to perform dimension checking.
The package implements both rand
and rand!
as follows (which you don’t need to implement in general):
function _rand!(s::Sampleable{Multivariate}, A::DenseMatrix)
for i = 1:size(A,2)
_rand!(s, view(A,:,i))
end
return A
end
function rand!(s::Sampleable{Multivariate}, A::AbstractVector)
length(A) == length(s) 
throw(DimensionMismatch("Output size inconsistent with sample length."))
_rand!(s, A)
end
function rand!(s::Sampleable{Multivariate}, A::DenseMatrix)
size(A,1) == length(s) 
throw(DimensionMismatch("Output size inconsistent with sample length."))
_rand!(s, A)
end
rand{S<:ValueSupport}(s::Sampleable{Multivariate,S}) =
_rand!(s, Vector{eltype(S)}(length(s)))
rand{S<:ValueSupport}(s::Sampleable{Multivariate,S}, n::Int) =
_rand!(s, Matrix{eltype(S)}(length(s), n))
If there is a more efficient method to generate multiple vector samples in batch, one should provide the following method
function _rand!{T<:Real}(s::Spl, A::DenseMatrix{T})
... generate multiple vector samples in batch
end
Remember that each column of A is a sample.
Matrixvariate Sampler¶
To implement a multivariate sampler, one can define a sub type of Sampleable{Multivariate,S}
, and provide both size
and _rand!
method, as
Base.size(s::Spl) = ... # the size of each matrix sample
function _rand!{T<:Real}(s::Spl, x::DenseMatrix{T})
# ... generate a single matrix sample to x
end
Note that you can assume x
has correct dimensions in _rand!
and don’t have to perform dimension checking, the generic rand
and rand!
will do dimension checking and array allocation for you.
Create a Univariate Distribution¶
A univariate distribution type should be defined as a subtype of DiscreteUnivarateDistribution
or ContinuousUnivariateDistribution
.
Following methods need to be implemented for each univariate distribution type (say D
):

rand
(d::D)¶ Generate a scalar sample from
d
.

sampler
(d::D)¶ It is often the case that a sampler relies on some quantities that may be precomputed in advance (that are not parameters themselves).
If such a more efficient sampler exists, one should provide this
sampler
method, which would be used for batch sampling.The general fallback is
sampler(d::Distribution) = d
.

pdf
(d::D, x::Real)¶ Evaluate the probability density (mass) at
x
.Note: The package implements the following generic methods to evaluate pdf values in batch.
pdf!(dst::AbstractArray, d::D, x::AbstractArray)
pdf(d::D, x::AbstractArray)
If there exists more efficient routine to evaluate pdf in batch (faster than repeatedly calling the scalar version of
pdf
), then one can also provide a specialized method ofpdf!
. The vectorized version ofpdf
simply delegats topdf!
.

logpdf
(d::D, x::Real)¶ Evaluate the logarithm of probability density (mass) at
x
.Whereas there is a fallback implemented
logpdf(d, x) = log(pdf(d, x))
. Relying on this fallback is not recommended in general, as it is prone to overflow or underflow.Again, the package provides vectorized version of
logpdf!
andlogpdf
. One may overridelogpdf!
to provide more efficient vectorized evaluation.Furthermore, the generic
loglikelihood
function delegates to_loglikelihood
, which repeatedly callslogpdf
. If there is a better way to compute loglikelihood, one should override_loglikelihood
.

cdf
(d::D, x::Real)¶ Evaluate the cumulative probability at
x
.The package provides generic functions to compute
ccdf
,logcdf
, andlogccdf
in both scalar and vectorized forms. One may override these generic fallbacks if the specialized versions provide better numeric stability or higher efficiency.

quantile
(d::D, q::Real)¶ Evaluate the inverse cumulative distribution function at
q
.The package provides generic functions to compute
cquantile
,invlogcdf
, andinvlogccdf
in both scalar and vectorized forms. One may override these generic fallbacks if the specialized versions provide better numeric stability or higher efficiency.Also a generic
median
is provided, asmedian(d) = quantile(d, 0.5)
. However, one should implement a specialized version ofmedian
if it can be computed faster thanquantile
.

minimum
(d::D)¶ Return the minimum of the support of
d
.

maximum
(d::D)¶ Return the maximum of the support of
d
.

insupport
(d::D, x::Real)¶ Return whether
x
is within the support ofd
.Note a generic fallback as
insupport(d, x) = minimum(d) <= x <= maximum(d)
is provided. However, it is often the case thatinsupport
can be done more efficiently, and a specializedinsupport
is thus desirable. You should also override this function if the support is composed of multiple disjoint intervals.Vectorized versions of
insupport!
andinsupport
are provided as generic fallbacks.
It is also recommended that one also implements the following statistics functions:
mean
: compute the expectation.var
: compute the variance. (A genericstd
is provided asstd(d) = sqrt(var(d))
).modes
: get all modes (if this makes sense).mode
: returns the first mode.skewness
: compute the skewness.kurtosis
: compute the excessive kurtosis.entropy
: compute the entropy.mgf
: compute the moment generating functions.cf
: compute the characteristic function.
You may refer to the source file src/univariates.jl
to see details about how generic fallback functions for univariates are implemented.
Create a Multivariate Distribution¶
A multivariate distribution type should be defined as a subtype of DiscreteMultivarateDistribution
or ContinuousMultivariateDistribution
.
Following methods need to be implemented for each univariate distribution type (say D
):

length
(d::D)¶ Return the length of each sample (i.e the dimension of the sample space).

_rand!{T<:Real}(d::D, x::AbstractVector{T})
Generate a vector sample to
x
.This function does not need to perform dimension checking.

sampler
(d::D) Return a sampler for efficient batch/repeated sampling.

_logpdf{T<:Real}(d::D, x::AbstractVector{T})
Evaluate logarithm of pdf value for a given vector
x
. This function need not perform dimension checking.Generally, one does not need to implement
pdf
(or_pdf
). The package provides fallback methods as follows:_pdf(d::MultivariateDistribution, X::AbstractVector) = exp(_logpdf(d, X)) function logpdf(d::MultivariateDistribution, X::AbstractVector) length(X) == length(d)  throw(DimensionMismatch("Inconsistent array dimensions.")) _logpdf(d, X) end function pdf(d::MultivariateDistribution, X::AbstractVector) length(X) == length(d)  throw(DimensionMismatch("Inconsistent array dimensions.")) _pdf(d, X) end
If there are better ways that can directly evaluate pdf values, one should override
_pdf
(NOTpdf
).The package also provides generic implementation of batch evaluation:
function _logpdf!(r::AbstractArray, d::MultivariateDistribution, X::DenseMatrix) for i in 1 : size(X,2) @inbounds r[i] = logpdf(d, view(X,:,i)) end return r end function _pdf!(r::AbstractArray, d::MultivariateDistribution, X::DenseMatrix) for i in 1 : size(X,2) @inbounds r[i] = pdf(d, view(X,:,i)) end return r end function logpdf!(r::AbstractArray, d::MultivariateDistribution, X::DenseMatrix) size(X) == (length(d), length(r))  throw(DimensionMismatch("Inconsistent array dimensions.")) _logpdf!(r, d, X) end function pdf!(r::AbstractArray, d::MultivariateDistribution, X::DenseMatrix) size(X) == (length(d), length(r))  throw(DimensionMismatch("Inconsistent array dimensions.")) _pdf!(r, d, X) end function logpdf(d::MultivariateDistribution, X::DenseMatrix) size(X, 1) == length(d)  throw(DimensionMismatch("Inconsistent array dimensions.")) _logpdf!(Vector{Float64}(size(X,2)), d, X) end function pdf(d::MultivariateDistribution, X::DenseMatrix) size(X, 1) == length(d)  throw(DimensionMismatch("Inconsistent array dimensions.")) _pdf!(Vector{Float64}(size(X,2)), d, X) end
Note that if there exists faster methods for batch evaluation, one should override
_logpdf!
and_pdf!
.Furthermore, the generic
loglikelihood
function delegates to_loglikelihood
, which repeatedly calls_logpdf
. If there is a better way to compute loglikelihood, one should override_loglikelihood
.
It is also recommended that one also implements the following statistics functions:
mean
: compute the mean vector.var
: compute the vector of elementwise variance.entropy
: compute the entropy.cov
: compute the covariance matrix. (cor
is provided based oncov
).
Create a Matrixvariate Distribution¶
A multivariate distribution type should be defined as a subtype of DiscreteMatrixDistribution
or ContinuousMatrixDistribution
.
Following methods need to be implemented for each univariate distribution type (say D
):

size
(d::D)¶ Return the size of each sample.

rand{T<:Real}(d::D)
Generate a matrix sample.

sampler
(d::D) Return a sampler for efficient batch/repeated sampling.

_logpdf{T<:Real}(d::D, x::AbstractMatrix{T})
Evaluate logarithm of pdf value for a given sample
x
. This function need not perform dimension checking.