Class DiscreteDistribution


  • public class DiscreteDistribution
    extends java.lang.Object
    A utility class for calculating properties of discrete distributions. Generally, these distributions are represented as arrays of double values, which are assumed to be normalized such that the entries in a single array sum to 1.
    • Method Summary

      Modifier and Type Method Description
      protected static void checkLengths(double[] dist, double[] reference)
      Throws an IllegalArgumentException if the two arrays are not of the same length.
      static double cosine(double[] dist, double[] reference)
      Returns the cosine similarity (the cosine of the angle) between the two specified distributions, which must have the same number of elements.
      static double entropy(double[] dist)
      Returns the entropy of the specified distribution.
      static double KullbackLeibler(double[] dist, double[] reference)
      Returns the Kullback-Leibler divergence between the two specified distributions, which must have the same number of elements.
      static double[] mean(double[][] distributions)
      Returns the mean of the specified array of distributions, represented as normalized arrays of double values.
      static double[] mean(java.util.Collection<double[]> distributions)
      Returns the mean of the specified Collection of distributions, which are assumed to be normalized arrays of double values.
      static void normalize(double[] counts, double alpha)
      Normalizes, with additive (Laplace) smoothing, the specified double array, so that the values sum to 1 (i.e., can be treated as probabilities).
      static double squaredError(double[] dist, double[] reference)
      Returns the squared difference between the two specified distributions, which must have the same number of elements.
      static double symmetricKL(double[] dist, double[] reference)
      Returns KullbackLeibler(dist, reference) + KullbackLeibler(reference, dist).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • DiscreteDistribution

        public DiscreteDistribution()
    • Method Detail

      • KullbackLeibler

        public static double KullbackLeibler(double[] dist,
                                             double[] reference)
        Returns the Kullback-Leibler divergence between the two specified distributions, which must have the same number of elements. This is defined as the sum over all i of dist[i] * Math.log(dist[i] / reference[i]). Note that this value is not symmetric; see symmetricKL for a symmetric variant.
        See Also:
        symmetricKL(double[], double[])
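The formula above can be sketched as a single loop. This is only an illustration, not the library's source: the class name KlSketch is hypothetical, and skipping terms where dist[i] is 0 is an assumed convention, since the documentation does not say how zero entries are handled.

```java
/** Hypothetical stand-alone sketch; not the DiscreteDistribution source. */
final class KlSketch {
    /** Sum over i of dist[i] * log(dist[i] / reference[i]), using the natural log. */
    static double kullbackLeibler(double[] dist, double[] reference) {
        if (dist.length != reference.length) {
            throw new IllegalArgumentException("arrays differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < dist.length; i++) {
            // Assumption: a term with dist[i] == 0 contributes 0 to the sum.
            if (dist[i] > 0.0) {
                sum += dist[i] * Math.log(dist[i] / reference[i]);
            }
        }
        return sum;
    }
}
```

Calling KlSketch.kullbackLeibler(p, q) and KlSketch.kullbackLeibler(q, p) with p = {0.5, 0.5} and q = {0.25, 0.75} gives two different values, which illustrates the asymmetry noted above.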
      • symmetricKL

        public static double symmetricKL(double[] dist,
                                         double[] reference)
        Returns KullbackLeibler(dist, reference) + KullbackLeibler(reference, dist).
        See Also:
        KullbackLeibler(double[], double[])
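A minimal sketch of the symmetrized sum, assuming the one-directional divergence is computed as the sum of a[i] * Math.log(a[i] / b[i]). The class SymmetricKlSketch and its helper kl are hypothetical names used only for illustration; the zero-entry handling is likewise an assumption.

```java
/** Hypothetical sketch of a symmetrized KL divergence; not the library source. */
final class SymmetricKlSketch {
    /** One-directional divergence: sum over i of a[i] * log(a[i] / b[i]). */
    static double kl(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > 0.0) { // assumption: zero-probability terms contribute 0
                sum += a[i] * Math.log(a[i] / b[i]);
            }
        }
        return sum;
    }

    /** kl(dist, reference) + kl(reference, dist). */
    static double symmetricKl(double[] dist, double[] reference) {
        return kl(dist, reference) + kl(reference, dist);
    }
}
```

Because the two directions are simply added, swapping the arguments of symmetricKl leaves the result unchanged.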
      • squaredError

        public static double squaredError(double[] dist,
                                          double[] reference)
        Returns the squared difference between the two specified distributions, which must have the same number of elements. This is defined as the sum over all i of the square of (dist[i] - reference[i]).
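The definition above amounts to a sum of squared element-wise differences. The following is a sketch under that reading only; SquaredErrorSketch is a hypothetical class name, and the length check mirrors the documented requirement rather than the library's actual code.

```java
/** Hypothetical sketch of the documented squared-error formula. */
final class SquaredErrorSketch {
    /** Sum over i of (dist[i] - reference[i])^2. */
    static double squaredError(double[] dist, double[] reference) {
        if (dist.length != reference.length) {
            throw new IllegalArgumentException("arrays differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < dist.length; i++) {
            double diff = dist[i] - reference[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

For dist = {0.5, 0.5} and reference = {0.25, 0.75}, each difference has magnitude 0.25, so the result is 0.0625 + 0.0625 = 0.125.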
      • cosine

        public static double cosine(double[] dist,
                                    double[] reference)
        Returns the cosine similarity between the two specified distributions, which must have the same number of elements. The value is the cosine of the angle between the distributions, treated as vectors in dist.length-dimensional space, so identical distributions yield 1 rather than 0. Given the following definitions
        • v = the sum over all i of dist[i] * dist[i]
        • w = the sum over all i of reference[i] * reference[i]
        • vw = the sum over all i of dist[i] * reference[i]
        the value returned is defined as vw / (Math.sqrt(v) * Math.sqrt(w)).
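The three sums and the final ratio can be sketched in one pass over the arrays. This is an illustrative sketch, not the library's implementation; CosineSketch is a hypothetical class name, and the variables v, w, and vw follow the definitions given above.

```java
/** Hypothetical sketch of the documented cosine formula. */
final class CosineSketch {
    /** Returns vw / (sqrt(v) * sqrt(w)). */
    static double cosine(double[] dist, double[] reference) {
        if (dist.length != reference.length) {
            throw new IllegalArgumentException("arrays differ in length");
        }
        double v = 0.0;   // sum of dist[i] * dist[i]
        double w = 0.0;   // sum of reference[i] * reference[i]
        double vw = 0.0;  // sum of dist[i] * reference[i]
        for (int i = 0; i < dist.length; i++) {
            v += dist[i] * dist[i];
            w += reference[i] * reference[i];
            vw += dist[i] * reference[i];
        }
        return vw / (Math.sqrt(v) * Math.sqrt(w));
    }
}
```

With this formula, two distributions that place all their mass on different elements, such as {1, 0} and {0, 1}, give 0, while a distribution compared with itself gives a value of 1 up to rounding.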
      • entropy

        public static double entropy(double[] dist)
        Returns the entropy of the specified distribution. High entropy indicates that the distribution is close to uniform; low entropy indicates that it is close to a Dirac delta (i.e., when the probability mass is concentrated at a single point, this method returns 0). Entropy is defined as the sum over all i of -(dist[i] * Math.log(dist[i])).
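As a rough sketch of the entropy formula, the loop below accumulates -(p * Math.log(p)) over the entries. EntropySketch is a hypothetical class name, and skipping zero entries is an assumption made here so that a point mass yields 0; the documentation does not describe how the library treats them. Because Math.log is the natural logarithm, the result is in nats.

```java
/** Hypothetical sketch of the documented entropy formula. */
final class EntropySketch {
    /** Sum over i of -(dist[i] * log(dist[i])), in nats. */
    static double entropy(double[] dist) {
        double sum = 0.0;
        for (double p : dist) {
            // Assumption: entries equal to 0 contribute nothing,
            // which avoids evaluating 0 * log(0) (NaN in Java).
            if (p > 0.0) {
                sum -= p * Math.log(p);
            }
        }
        return sum;
    }
}
```

Under these assumptions a uniform distribution over four outcomes gives Math.log(4.0), and a distribution such as {0, 1, 0} gives 0.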
      • checkLengths

        protected static void checkLengths(double[] dist,
                                           double[] reference)
        Throws an IllegalArgumentException if the two arrays are not of the same length.
      • normalize

        public static void normalize(double[] counts,
                                     double alpha)
        Normalizes, with additive (Laplace) smoothing, the specified double array, so that the values sum to 1 (i.e., can be treated as probabilities). The effect of the smoothing is to ensure that all entries are nonzero; effectively, a value of alpha is added to each entry in the original array prior to normalization. Because the method returns void, the array is modified in place.
        Parameters:
        counts - the array of values to normalize
        alpha - the smoothing value added to each entry before normalization
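One way to read the description above is: add alpha to every entry, then divide by the new total. The sketch below follows that reading; NormalizeSketch is a hypothetical class name, the in-place update is inferred from the void return type, and a total of 0 is not handled.

```java
/** Hypothetical sketch of smoothing followed by normalization. */
final class NormalizeSketch {
    /** Replaces each counts[i] with (counts[i] + alpha) / sum of (counts[j] + alpha). */
    static void normalize(double[] counts, double alpha) {
        double total = 0.0;
        for (double c : counts) {
            total += c + alpha;
        }
        // Assumption: total is nonzero; otherwise the entries become NaN.
        for (int i = 0; i < counts.length; i++) {
            counts[i] = (counts[i] + alpha) / total;
        }
    }
}
```

For counts = {3, 1, 0} and alpha = 1, the smoothed values are {4, 2, 1} with total 7, so the array becomes {4/7, 2/7, 1/7} and the zero entry ends up nonzero.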
      • mean

        public static double[] mean(java.util.Collection<double[]> distributions)
        Returns the mean of the specified Collection of distributions, which are assumed to be normalized arrays of double values.
        See Also:
        mean(double[][])
      • mean

        public static double[] mean(double[][] distributions)
        Returns the mean of the specified array of distributions, represented as normalized arrays of double values. Throws an ArrayIndexOutOfBoundsException if the distribution arrays are not all of the same length.
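A plausible reading of "mean" here is the element-wise average of the input arrays, sketched below. MeanSketch is a hypothetical class name, and the explicit length check with IllegalArgumentException is a design choice of this sketch, not the documented behavior of the library method.

```java
/** Hypothetical sketch of an element-wise mean of distributions. */
final class MeanSketch {
    /** Returns result[j] = average over all k of distributions[k][j]. */
    static double[] mean(double[][] distributions) {
        if (distributions.length == 0) {
            throw new IllegalArgumentException("no distributions given");
        }
        int n = distributions[0].length;
        double[] result = new double[n];
        for (double[] d : distributions) {
            // Sketch-only check; fails early instead of indexing out of bounds.
            if (d.length != n) {
                throw new IllegalArgumentException("arrays differ in length");
            }
            for (int j = 0; j < n; j++) {
                result[j] += d[j];
            }
        }
        for (int j = 0; j < n; j++) {
            result[j] /= distributions.length;
        }
        return result;
    }
}
```

If every input array sums to 1, the element-wise average also sums to 1, so the result can itself be treated as a distribution. For example, averaging {0.5, 0.5} and {0.25, 0.75} gives {0.375, 0.625}.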