API

Types

Functions

Documentation

MCBench.AbstractKernelType
get_mmd(s1::DensitySampleVector, s2::DensitySampleVector; g=0, N=0)
compute_bandwidth(x, y)

Functions to calculate the Maximum Mean Discrepancy (MMD) between two samples. This code is adapted from the IPMeasures.jl package. Please refer to the repository for the original implementation.

source
MCBench.AnySamplerType
abstract type AnySampler

An abstract type that serves as a base for all sampling algorithms.

source
MCBench.CsvBasedSamplerType
struct CsvBasedSampler <: AbstractFileBasedSampler
IO funtionalities to read data from a set of CSV files.

# Fields
- `fbs::FileBasedSampler`: File-based sampler
- `header::Vector{String}`: Header of the CSV file
- `mask::Vector{Int}`: Mask to extract the desired columns
- `info::String`: Information about the sampler

# Constructors
- `CsvBasedSampler(; fields...)`
- `CsvBasedSampler(file_paths::Vector{String})`: Creates a `CsvBasedSampler` with the given vector of file paths.
- `CsvBasedSampler(path::String)`: Creates a `CsvBasedSampler` with the given file path. This will load all files in the directory if the path is a directory.
source
MCBench.DsvSamplerType
struct DsvSampler{D<:DensitySampleVector} <: AbstractFileBasedSampler
IO funtionalities to read data from a set of DensitySampleVector files.

# Fields
- `dsvs::Vector{D}`: Vector of file paths
- `current_dsv_index::Int`: Index of the current file in the vector
- `current_position::Int`: Position within the current file
- `weighted::Bool`: Whether the samples are weighted
- `total_samples::Int`: Total number of samples
- `neff::Vector{Float64}`: Number of effective samples
- `info::String`: Information about the sampler

# Constructors
- `DsvSampler(; fields...)`
- `DsvSampler(dsvs::Vector{D})`: Creates a `DsvSampler` with the given vector of file paths.
source
MCBench.DsvTestcaseType
struct DsvTestcase <: AbstractTestcase

A struct representing a test case based on DensitySampleVectors. This is meant to be used as IIDs when the samples are precalculated and stored in a file. In that case samples should be read from the file converted to a DensitySampleVector and used as the test case.

Fields

  • sampler::DS: The sampler that generates the samples.
  • dim::N: The dimension of the test case.
  • info::A: Additional information about the test case.

Constructors

  • DsvTestcase(s::DS, n::Int, info::A): Creates a test case with the given sampler, dimension and additional information.
  • DsvTestcase(s::DS, info::A): Creates a test case with the given sampler and additional information. The dimension is set to the dimension of the samples.
source
MCBench.FileBasedSamplerType
struct FileBasesSampler <: SamplingAlgorithm

IO funtionalities to read data from a set of files.

Fields

  • files::Vector{String}: Vector of file paths.
  • current_file_index::Int: Index of the current file in the vector.
  • current_position::Int: Position within the current file.
  • current_file_handle::IOStream: Handle to the currently open file.
  • info::String: Information about the sampler. Used for plotting.

Constructors

  • FileBasedSampler(; fields...)
  • FileBasedSampler(file_paths::Vector{String}): Creates a FileBasedSampler with the given vector of file paths.
  • FileBasedSampler(path::String): Creates a FileBasedSampler with the given file path. This will load all files in the directory if the path is a directory.
source
MCBench.IIDSamplerType
struct IIDSampler <: IIDSamplingAlgorithm

A struct representing an IID (Independent and Identically Distributed) Sampler to created instances of IID sampling algorithms.

Fields

  • n_steps::Int: The number of steps for the sampler. Identical to the number of samples for IID.
  • info::String: Information or description of the sampler.

Constructors

  • IIDSampler(): Creates an IIDSampler with default values of 10^5 steps and "IID" as the info string.
source
MCBench.IIDSamplingAlgorithmType
abstract type IIDSamplingAlgorithm <: SamplingAlgorithm

An abstract type for independent and identically distributed (IID) sampling algorithms that inherit from SamplingAlgorithm. This type doesn't have any fields or methods, but it is used to have testfunctions reference their IID Base.rand method when sampling.

source
MCBench.SamplingAlgorithmType
abstract type SamplingAlgorithm <: AnySampler

An abstract type for general sampling algorithms that inherit from AnySampler.

source
MCBench.TestMetricType
abstract type TestMetric

An abstract type for any test metrics. Each metric is its own struct and must implement the calc_metric function. Metrics are designed to be both structs that hold the metric value and determine which version of calc_metric to dispatch.

source
MCBench.TestcasesType
struct Testcases <: AbstractTestcase

A struct representing a test case to be used in the framework of MCBench. Testcases must consisit of a distribution or Target that is sampleable and a set of bounds.

Fields

  • f::D: The distribution or Target of the test case.
  • bounds::B: The bounds of the test case.
  • dim::N: The dimension of the test case.
  • info::A: Additional information about the test case.

Constructors

  • Testcases(; fields...): Creates a test case with the given fields.
  • Testcases(f::D, bounds::B, dim::N, info::A): Creates a test case with the given distribution or target, bounds, dimension and additional information.
  • Testcases(f::D, dim::N, info::A): Creates a test case with the given distribution or target, dimension and additional information. Bounds are set to [-10..10] for each dimension.
source
MCBench.TwoSampleMetricType
abstract type TwoSampleMetric

An abstract type for any two-sample test metrics. Such as Wasserstein, MMD, etc.

source
MCBench.chi_squaredType
struct chi_squared{V<:Real,A<:Any} <: TwoSampleMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `chi_squared(; fields...)`
- `chi_squared(val::Real)` : Creates a `chi_squared` metric with the given metric value.

A struct for the chi_squared of the samples as a metric. 
The chi-squared is calculated for each dimension separately using an unbinned chi-squared.
source
MCBench.global_modeType
struct global_mode{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `global_mode(; fields...)`
- `global_mode(val::Real)` : Creates a `global_mode` metric with the given metric value.

A struct for the global mode of the samples as a metric. This will return the mode as a vector of the mode for each dimension.
source
MCBench.marginal_kurtosisType
struct marginal_kurtosis{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_kurtosis(; fields...)`
- `marginal_kurtosis(val::Real)` : Creates a `marginal_kurtosis` metric with the given metric value.

A struct for the marginal_kurtosis of the samples as a metric. Marginal kurtosis are calculated for each dimension of the samples and returned as a vector of `marginal_kurtosis` values.
source
MCBench.marginal_meanType

struct marginal_mean{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_mean(; fields...)`
- `marginal_mean(val::Real)` : Creates a `marginal_mean` metric with the given metric value.

A struct for the mean value for each dimension of the samples as a metric.
source
MCBench.marginal_modeType
struct marginal_mode{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_mode(; fields...)`
- `marginal_mode(val::Real)` : Creates a `marginal_mode` metric with the given metric value.

A struct for the marginal_mode of the samples as a metric. Marginal modes are calculated for each dimension of the samples and returned as a vector of `marginal_mode` values.
source
MCBench.marginal_skewnessType
struct marginal_skewness{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_skewness(; fields...)`
- `marginal_skewness(val::Real)` : Creates a `marginal_skewness` metric with the given metric value.

A struct for the marginal_skewness of the samples as a metric. Marginal skewness are calculated for each dimension of the samples and returned as a vector of `marginal_skewness` values.
source
MCBench.marginal_varianceType
struct marginal_variance{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_variance(; fields...)`
- `marginal_variance(val::Real)` : Creates a `marginal_variance` metric with the given metric value.

A struct for the variance for each dimension of the samples as a metric.
source
MCBench.maximum_mean_discrepancyType
struct maximum_mean_discrepancy{V<:Real,A<:Any} <: TwoSampleMetric

# Fields
- `val::V`: The value of the metric.
- `N::I`: The number of points used to calculate the maximum mean discrepancy.
- `info::A`: Information about the metric.
- `proc::P`: The processor information for the metric.

# Constructors
- `maximum_mean_discrepancy(; fields...)`
- `maximum_mean_discrepancy(val::Real)` : Creates a `maximum_mean_discrepancy` metric with the given metric value.
- `maximum_mean_discrepancy(val::Real, N::Int)` : Creates a `maximum_mean_discrepancy` metric with the given metric value and number of points.

A struct for the maximum_mean_discrepancy of the samples as a metric. 
Per default the number of points used to calculate the maximum mean discrepancy is 10^4.
source
MCBench.sliced_wasserstein_distanceType
struct sliced_wasserstein_distance{V<:Real,A<:Any} <: TwoSampleMetric

# Fields
- `val::V`: The value of the metric.
- `N::I`: The number of points used to calculate the sliced Wasserstein distance.
- `info::A`: Information about the metric.
- `proc::P`: The processor information for the metric.

# Constructors
- `sliced_wasserstein_distance(; fields...)`
- `sliced_wasserstein_distance(val::Real)` : Creates a `sliced_wasserstein_distance` metric with the given metric value.

A struct for the sliced_wasserstein_distance of the samples as a metric. 
The sliced Wasserstein distance is calculated by projecting the samples onto a random direction and calculating the Wasserstein distance in that direction.
source
MCBench.wasserstein_1dType
struct wasserstein_1d{V<:Real,A<:Any} <: TwoSampleMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `wasserstein_1d(; fields...)`
- `wasserstein_1d(val::Real)` : Creates a `wasserstein_1d` metric with the given metric value.

A struct for the wasserstein_1d of the samples as a metric. The Wasserstein distance is calculated for each dimension separately and returned as a vector of `wasserstein_1d` values.
source
MCBench.build_teststat_reshuffleMethod
build_teststat_reshuffle(t<:AbstractTestcase, n::Int, m::Vector{TestMetric}, fnm::Vector{IOStream}; 
    s=IIDSampler(), n_steps=10^5, n_samples=0, unweight=true, par=false, use_sampler=true) -> Nothing

This function is prone to be changed mostly used for internal usage, please use the build_teststatistic function instead! Generates reshuffled test statistics for a given test case and metric vector, writing results to specified files.

Arguments

  • t<:AbstractTestcase: The test case to be evaluated, is a subtype of AbstractTestcase.
  • n::Int: Number of reshuffling iterations to perform.
  • m::Vector{TM}: A vector of metrics to evaluate, where TM is a subtype of TestMetric.
  • fnm::Vector{IOStream}: A vector of open file streams for writing test statistics.

Keyword Arguments

  • s: The sampling algorithm to use. Defaults to IIDSampler().
  • n_steps::Int: Number of steps for the sampling algorithm. Defaults to 10^5.
  • n_samples::Int: Number of samples to generate for each iteration. Defaults to 0 (use all available samples).
  • unweight::Bool: Whether to resample based on effective sample size (ESS) when weights are unequal. Defaults to true.
  • par::Bool: Enable parallel processing if true. Defaults to false.
  • use_sampler::Bool: Use the sampler for data generation if true. Defaults to true.

Returns

  • Nothing: Results are written to the provided file streams.

Notes

  1. Handles both cases of generating fixed or variable sample sizes (n_samples > 0 or n_samples <= 0).
  2. Ensures proper resampling when weights are unequal, maintaining consistent ESS.
  3. Supports parallel and sequential processing based on the par argument.
  4. Automatically closes all file streams upon completion.

Example

testcase = MyTestcase()
metrics = [Metric1(), Metric2()]
file_streams = [open("metric1.txt", "w"), open("metric2.txt", "w")]
build_teststat_reshuffle(testcase, 100, metrics, file_streams; n_steps=10^4, unweight=true, par=false)

Error Handling

  • Ensures file streams are closed in case of unexpected errors during execution.
source
MCBench.build_teststatisticMethod
build_teststatistic(t<:AbstractTestcase, m::Vector{TestMetric}; 
    s=IIDSampler(), n::Int=10^2, n_steps::Int=10^5, 
    n_samples::Int=10^5, par::Bool=true, 
    clean::Bool=false, unweight::Bool=true, 
    use_sampler::Bool=true, iid=false)

Build and store test statistics for a given test case and multiple Metrics. This function manages file creation and invokes reshuffling and sampling to calculate the desired statistics. The resampling is performed n times according to the effective sample size of the provided sampler. Sampling is repeted until n samples of effective samples are generated.

Arguments

  • t<:AbstractTestcase: The test case, is a subtype of AbstractTestcase.
  • m::Vector{TM}: A vector of metrics, where TM is a subtype of TestMetric.

Keyword Arguments

  • s: A sampling algorithm. Defaults to IIDSampler().
  • n::Int: Number of iterations aka the number of times the tests and calculations of the metrics is performed. Defaults to 10^2.
  • n_steps::Int: Number of steps for the sampling algorithm. This setting is only necessary for samplers that require a fixed number of steps. This is not relevant if the number of steps is defined in the sampler object. Defaults to 10^5.
  • n_samples::Int: Number of samples to generate. This setting referes to the number of IID samples. Defaults to 10^5.
  • par::Bool: Whether to enable parallel processing. Defaults to true.
  • clean::Bool: If true, clears the output files before calculation. Defaults to false.
  • unweight::Bool: Whether to unweight samples during reshuffling. If false resampling to the effective sample size will be skipped. Defaults to true.
  • use_sampler::Bool: Whether to utilize the sampler in the calculations. This is only necessary for internal sampler types which have custom settings (like fixed number of steps). It will use the actual sampling object instead of creating a new instance of the sampler object with default settings and the number of steps in this function. Defaults to true.
  • iid::Bool: If true, forces IID behavior regardless of the sampler type. Can be used for IID to IID comparisons. Defaults to false.

Returns

  • Nothing: Results are stored in specified files.

Notes

  1. Make sure to have created teststatistics and teststatistics_sampler filepaths.
  2. File paths are generated based on the info property of the test case, metric, and sampler.
  3. If clean is true, all files are overwritten. Otherwise, results are appended.
  4. The function ensures that files are properly closed, even in case of errors.
  5. Reshuffling and sampling are handled by the build_teststat_reshuffle function.

Example

metrics = [Metric1(), Metric2()]
testcase = MyTestcase()
build_teststatistic(testcase, metrics; n=100, par=true, clean=true)
source
MCBench.calc_metricMethod
calc_metric(t::AbstractTestcase, 
s::DensitySampleVector, 
m::TestMetric)

Calculate the metric `m` for the test case `t` and the samples `s`.
This function is used to calculate the metric value for a given test case and samples for any metric type.
`TwoSampleMetric` metrics can also be used, however the `calc_metric` then samples a second set of IID samples to compare against.
source
MCBench.get_sliced_wasserstein_distanceMethod
get_sliced_wasserstein_distance(dist1, dist2; d=50, L=1000, p=1, N=100_000)

Compute the sliced Wasserstein distance between two probability distributions.

Arguments

  • dist1: The first probability distribution.
  • dist2: The second probability distribution.
  • d: The dimension of the samples. Default is 50.
  • L: The number of random projections. Default is 1000.
  • p: The norm order. Default is 1.
  • N: The number of samples. Default is 100_000.

Returns

The sliced Wasserstein distance between dist1 and dist2.

source
MCBench.parse_teststatisticMethod
parse_teststatistic(filename::String)

Parses test statistics from a specified file.

Arguments

  • filename::String: The name of the file containing test statistics.

Returns

  • Array{Float64}: A reshaped array containing the parsed test statistics.

Notes

The function reads lines from the file, parses them as JSON, and reshapes the resulting data.

Example

statistics = parse_teststatistic("./teststatistics/example.txt")
println(statistics)
source
MCBench.parselineMethod
parseline(f::IOStream)

Parses a single line of JSON data from an open file stream.

Arguments

  • f::IOStream: The open file stream from which the line is read.

Returns

  • Vector{Float64}: A vector parsed from the JSON line.
  • []: An empty array if the line is empty.

Example

file = open("./teststatistics/example.txt", "r")
line_data = parseline(file)
close(file)
println(line_data)
source
MCBench.plot_teststatisticMethod
plot_metrics(t::AbstractTestcase, normvals::Vector{NamedTuple}; plotalpha=0.2, infos=[], s=[]) 
plot_teststatistic(t::AbstractTestcase, m::TM) where {TM <: TestMetric}
plot_teststatistic(t::AbstractTestcase, m::TM, s::AnySampler; nbins=32, same_bins=true,sampler_bins=false)

Functions to plot test statistics and overview plots for a given test case and metrics. These functions are subject to change and changed for recepies and are not guaranteed to be stable

Arguments

  • t::AbstractTestcase: The open file stream from which the line is read.
  • m::TM: The metric for which the test statistic is plotted.
  • s::AnySampler: The sampling algorithm for which the test statistic is plotted.
  • nbins::Int: The number of bins for the histogram.
  • same_bins::Bool: Whether to use the same bins for the test statistic and the sampling algorithm.
  • sampler_bins::Bool: Whether to use the bins of the sampling algorithm for the test statistic or from the IID samples. This should be set to true as the distribution of the test statistics for the sampler i svery likely have a wider range than the IID samples.

Example

t = MyTestcase()
m = [MyMetric_1(), MyMetric_2()]
s = MySampler()

plot_teststatistic(t, m, s; nbins=32, same_bins=true, sampler_bins=true)
plot_metrics(t, normvals; plotalpha=0.2, infos=[], s=[])
source
MCBench.read_teststatisticMethod
read_teststatistic(t::AbstractTestcase, m::TM)
read_teststatistic(t::AbstractTestcase, m::TM, s::AnySampler)

Functions to read test statistics from a file. Read test statistics for a given test case, metric, and optionally a sampling algorithm from a file.

Arguments

  • t::AbstractTestcase: The test case for which the statistics are being read.
  • m::TM: The metric, where TM is a subtype of TestMetric.
  • s::AnySampler (optional): The sampling algorithm used to generate the statistics. If not givin the IID statistics are read.

Returns

  • Array{Float64}: A reshaped array containing the parsed test statistics.

Notes

  • Constructs the file name using t.info and m.info, and optionally s.info if a sampler is provided.
  • Reads and parses the contents of the file to extract the statistics.

Example

testcase = MyTestcase()
metric = MyMetric()
statistics = read_teststatistic(testcase, metric)  # Without sampler
println(statistics)

sampler = MySampler()
statistics_with_sampler = read_teststatistic(testcase, metric, sampler)  # With sampler
println(statistics_with_sampler)
source
MCBench.repMethod
wasserstein1d(a::Vector, b::Vector; p=1, wa=nothing, wb=nothing)
get_sliced_wasserstein_distance(sample1::DensitySampleVector, sample2::DensitySampleVector; L=1000, p=1, N=0, parallel=true)

Functions to calculate the Wasserstein distance between two 1D distributions and the sliced Wasserstein distance between two probability distributions. This code is adapted from the Transport package in R translated to Julia.

source
MCBench.run_teststatisticMethod
run_teststatistic(
t::AbstractTestcase, 
samples1::DensitySampleVector,
samples2::DensitySampleVector,
m::TwoSampleMetric,
s::SamplingAlgorithm;)

run_teststatistic(
t::AbstractTestcase, 
samples::DensitySampleVector,
m::TwoSampleMetric, s::AnySampler;)

run_teststatistic(
t::AbstractTestcase, 
samples::DensitySampleVector,
m::TwoSampleMetric, s::Int;)

run_teststatistic(
t::AbstractTestcase, m::TwoSampleMetric,
s::AnySampler; n_steps::Int=10^5)

run_teststatistic(
t::AbstractTestcase, m::TwoSampleMetric;
n_steps::Int=10^5)

Functions to run a two-sample test statistic on a given testcase and metric. When the samples are provided, the function calculates the metric value using the samples. When the sampling algorithm is provided, the function samples using the algorithm and the IID samples and calculates the metric value. When no samples and sampling algorithm are provided, the function samples using the testcase and calculates the metric value.

Arguments

  • t::AbstractTestcase: The test case, is a subtype of AbstractTestcase.
  • samples1::DensitySampleVector: The first density sample vector.
  • samples2::DensitySampleVector: The second density sample vector.
  • m::TwoSampleMetric: The metric, is a subtype of TwoSampleMetric.
  • s::SamplingAlgorithm: The sampling algorithm used for calculations.
  • n_steps::Int: The number of steps to be generated.

Returns

  • TwoSampleMetric: The calculated metric value.
source
MCBench.sampleMethod
sample(t::DsvTestcase; n_steps=10^5)::DensitySampleVector
sample(t::DsvTestcase, s::SamplingAlgorithm; n_steps=10^5)::DensitySampleVector

The sample methods using DsvTestcase using the same logic as for Testcases but using the precalculated samples no matter the sampler used.

source
MCBench.sampleMethod
sample(t::Testcases; n_steps=10^5)::DensitySampleVector
sample(t::Testcases, n::Int)::DensitySampleVector
sample(t::Testcases, s::IIDSamplingAlgorithm; n_steps=10^5)::DensitySampleVector

IID sampling from the distribution of the test case t with n_steps or n samples. When integrating a custom sampling algorithm, the sample method should be overloaded for the new sampling algorithm type for s. Returns a density sample vector with the samples and the log densities.

source
MCBench.sampleMethod
sample(t<:AbsatractTestcase, s<:AbstractFileBasedSampler; n_steps=10^5)::DensitySampleVector

Sampling from the distribution of the test case t using a file-based sampler s with n_steps samples. The test case is not used in the sampling process, but it is used to calculate the log densities of the samples. Returns a density sample vector with the samples and the log densities.

source