API

Types

MCBench.AbstractFileBasedSampler
MCBench.AbstractKernel
MCBench.AnySampler
MCBench.CsvBasedSampler
MCBench.DsvSampler
MCBench.DsvTestcase
MCBench.FileBasedSampler
MCBench.IIDSampler
MCBench.IIDSamplingAlgorithm
MCBench.SamplingAlgorithm
MCBench.Target
MCBench.TestMetric
MCBench.Testcases
MCBench.TwoSampleMetric
MCBench.chi_squared
MCBench.global_mode
MCBench.marginal_kurtosis
MCBench.marginal_mean
MCBench.marginal_mode
MCBench.marginal_skewness
MCBench.marginal_variance
MCBench.maximum_mean_discrepancy
MCBench.sliced_wasserstein_distance
MCBench.wasserstein_1d

Functions

MCBench.build_teststat_reshuffle
MCBench.build_teststatistic
MCBench.calc_metric
MCBench.get_sliced_wasserstein_distance
MCBench.parse_teststatistic
MCBench.parseline
MCBench.plot_teststatistic
MCBench.read_teststatistic
MCBench.rep
MCBench.run_teststatistic
MCBench.sample
MCBench.sample
MCBench.sample

Documentation

MCBench.AbstractFileBasedSampler — Type

abstract type AbstractFileBasedSampler <: AnySampler

An abstract type for file-based sampling algorithms that inherit from AnySampler.

source

MCBench.AbstractKernel — Type

get_mmd(s1::DensitySampleVector, s2::DensitySampleVector; g=0, N=0)
compute_bandwidth(x, y)

Functions to calculate the Maximum Mean Discrepancy (MMD) between two samples. This code is adapted from the IPMeasures.jl package. Please refer to the repository for the original implementation.

source

MCBench.AnySampler — Type

abstract type AnySampler

An abstract type that serves as a base for all sampling algorithms.

source

MCBench.CsvBasedSampler — Type

struct CsvBasedSampler <: AbstractFileBasedSampler
IO funtionalities to read data from a set of CSV files.

# Fields
- `fbs::FileBasedSampler`: File-based sampler
- `header::Vector{String}`: Header of the CSV file
- `mask::Vector{Int}`: Mask to extract the desired columns
- `info::String`: Information about the sampler

# Constructors
- `CsvBasedSampler(; fields...)`
- `CsvBasedSampler(file_paths::Vector{String})`: Creates a `CsvBasedSampler` with the given vector of file paths.
- `CsvBasedSampler(path::String)`: Creates a `CsvBasedSampler` with the given file path. This will load all files in the directory if the path is a directory.

source

MCBench.DsvSampler — Type

struct DsvSampler{D<:DensitySampleVector} <: AbstractFileBasedSampler
IO funtionalities to read data from a set of DensitySampleVector files.

# Fields
- `dsvs::Vector{D}`: Vector of file paths
- `current_dsv_index::Int`: Index of the current file in the vector
- `current_position::Int`: Position within the current file
- `weighted::Bool`: Whether the samples are weighted
- `total_samples::Int`: Total number of samples
- `neff::Vector{Float64}`: Number of effective samples
- `info::String`: Information about the sampler

# Constructors
- `DsvSampler(; fields...)`
- `DsvSampler(dsvs::Vector{D})`: Creates a `DsvSampler` with the given vector of file paths.

source

MCBench.DsvTestcase — Type

struct DsvTestcase <: AbstractTestcase

A struct representing a test case based on DensitySampleVectors. This is meant to be used as IIDs when the samples are precalculated and stored in a file. In that case samples should be read from the file converted to a DensitySampleVector and used as the test case.

Fields

sampler::DS: The sampler that generates the samples.
dim::N: The dimension of the test case.
info::A: Additional information about the test case.

Constructors

DsvTestcase(s::DS, n::Int, info::A): Creates a test case with the given sampler, dimension and additional information.
DsvTestcase(s::DS, info::A): Creates a test case with the given sampler and additional information. The dimension is set to the dimension of the samples.

source

MCBench.FileBasedSampler — Type

struct FileBasesSampler <: SamplingAlgorithm

IO funtionalities to read data from a set of files.

Fields

files::Vector{String}: Vector of file paths.
current_file_index::Int: Index of the current file in the vector.
current_position::Int: Position within the current file.
current_file_handle::IOStream: Handle to the currently open file.
info::String: Information about the sampler. Used for plotting.

Constructors

FileBasedSampler(; fields...)
FileBasedSampler(file_paths::Vector{String}): Creates a FileBasedSampler with the given vector of file paths.
FileBasedSampler(path::String): Creates a FileBasedSampler with the given file path. This will load all files in the directory if the path is a directory.

source

MCBench.IIDSampler — Type

struct IIDSampler <: IIDSamplingAlgorithm

A struct representing an IID (Independent and Identically Distributed) Sampler to created instances of IID sampling algorithms.

Fields

n_steps::Int: The number of steps for the sampler. Identical to the number of samples for IID.
info::String: Information or description of the sampler.

Constructors

IIDSampler(): Creates an IIDSampler with default values of 10^5 steps and "IID" as the info string.

source

MCBench.IIDSamplingAlgorithm — Type

abstract type IIDSamplingAlgorithm <: SamplingAlgorithm

An abstract type for independent and identically distributed (IID) sampling algorithms that inherit from SamplingAlgorithm. This type doesn't have any fields or methods, but it is used to have testfunctions reference their IID Base.rand method when sampling.

source

MCBench.SamplingAlgorithm — Type

abstract type SamplingAlgorithm <: AnySampler

An abstract type for general sampling algorithms that inherit from AnySampler.

source

MCBench.Target — Type

abstract type AbstractTestcase

An abstract type for any test cases.

source

MCBench.TestMetric — Type

abstract type TestMetric

An abstract type for any test metrics. Each metric is its own struct and must implement the calc_metric function. Metrics are designed to be both structs that hold the metric value and determine which version of calc_metric to dispatch.

source

MCBench.Testcases — Type

struct Testcases <: AbstractTestcase

A struct representing a test case to be used in the framework of MCBench. Testcases must consisit of a distribution or Target that is sampleable and a set of bounds.

Fields

f::D: The distribution or Target of the test case.
bounds::B: The bounds of the test case.
dim::N: The dimension of the test case.
info::A: Additional information about the test case.

Constructors

Testcases(; fields...): Creates a test case with the given fields.
Testcases(f::D, bounds::B, dim::N, info::A): Creates a test case with the given distribution or target, bounds, dimension and additional information.
Testcases(f::D, dim::N, info::A): Creates a test case with the given distribution or target, dimension and additional information. Bounds are set to [-10..10] for each dimension.

source

MCBench.TwoSampleMetric — Type

abstract type TwoSampleMetric

An abstract type for any two-sample test metrics. Such as Wasserstein, MMD, etc.

source

MCBench.chi_squared — Type

struct chi_squared{V<:Real,A<:Any} <: TwoSampleMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `chi_squared(; fields...)`
- `chi_squared(val::Real)` : Creates a `chi_squared` metric with the given metric value.

A struct for the chi_squared of the samples as a metric. 
The chi-squared is calculated for each dimension separately using an unbinned chi-squared.

source

MCBench.global_mode — Type

struct global_mode{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `global_mode(; fields...)`
- `global_mode(val::Real)` : Creates a `global_mode` metric with the given metric value.

A struct for the global mode of the samples as a metric. This will return the mode as a vector of the mode for each dimension.

source

MCBench.marginal_kurtosis — Type

struct marginal_kurtosis{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_kurtosis(; fields...)`
- `marginal_kurtosis(val::Real)` : Creates a `marginal_kurtosis` metric with the given metric value.

A struct for the marginal_kurtosis of the samples as a metric. Marginal kurtosis are calculated for each dimension of the samples and returned as a vector of `marginal_kurtosis` values.

source

MCBench.marginal_mean — Type

struct marginal_mean{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_mean(; fields...)`
- `marginal_mean(val::Real)` : Creates a `marginal_mean` metric with the given metric value.

A struct for the mean value for each dimension of the samples as a metric.

source

MCBench.marginal_mode — Type

struct marginal_mode{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_mode(; fields...)`
- `marginal_mode(val::Real)` : Creates a `marginal_mode` metric with the given metric value.

A struct for the marginal_mode of the samples as a metric. Marginal modes are calculated for each dimension of the samples and returned as a vector of `marginal_mode` values.

source

MCBench.marginal_skewness — Type

struct marginal_skewness{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_skewness(; fields...)`
- `marginal_skewness(val::Real)` : Creates a `marginal_skewness` metric with the given metric value.

A struct for the marginal_skewness of the samples as a metric. Marginal skewness are calculated for each dimension of the samples and returned as a vector of `marginal_skewness` values.

source

MCBench.marginal_variance — Type

struct marginal_variance{V<:Real,A<:Any} <: TestMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `marginal_variance(; fields...)`
- `marginal_variance(val::Real)` : Creates a `marginal_variance` metric with the given metric value.

A struct for the variance for each dimension of the samples as a metric.

source

MCBench.maximum_mean_discrepancy — Type

struct maximum_mean_discrepancy{V<:Real,A<:Any} <: TwoSampleMetric

# Fields
- `val::V`: The value of the metric.
- `N::I`: The number of points used to calculate the maximum mean discrepancy.
- `info::A`: Information about the metric.
- `proc::P`: The processor information for the metric.

# Constructors
- `maximum_mean_discrepancy(; fields...)`
- `maximum_mean_discrepancy(val::Real)` : Creates a `maximum_mean_discrepancy` metric with the given metric value.
- `maximum_mean_discrepancy(val::Real, N::Int)` : Creates a `maximum_mean_discrepancy` metric with the given metric value and number of points.

A struct for the maximum_mean_discrepancy of the samples as a metric. 
Per default the number of points used to calculate the maximum mean discrepancy is 10^4.

source

MCBench.sliced_wasserstein_distance — Type

struct sliced_wasserstein_distance{V<:Real,A<:Any} <: TwoSampleMetric

# Fields
- `val::V`: The value of the metric.
- `N::I`: The number of points used to calculate the sliced Wasserstein distance.
- `info::A`: Information about the metric.
- `proc::P`: The processor information for the metric.

# Constructors
- `sliced_wasserstein_distance(; fields...)`
- `sliced_wasserstein_distance(val::Real)` : Creates a `sliced_wasserstein_distance` metric with the given metric value.

A struct for the sliced_wasserstein_distance of the samples as a metric. 
The sliced Wasserstein distance is calculated by projecting the samples onto a random direction and calculating the Wasserstein distance in that direction.

source

MCBench.wasserstein_1d — Type

struct wasserstein_1d{V<:Real,A<:Any} <: TwoSampleMetric

# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.

# Constructors
- `wasserstein_1d(; fields...)`
- `wasserstein_1d(val::Real)` : Creates a `wasserstein_1d` metric with the given metric value.

A struct for the wasserstein_1d of the samples as a metric. The Wasserstein distance is calculated for each dimension separately and returned as a vector of `wasserstein_1d` values.

source

MCBench.build_teststat_reshuffle — Method

build_teststat_reshuffle(t<:AbstractTestcase, n::Int, m::Vector{TestMetric}, fnm::Vector{IOStream}; 
    s=IIDSampler(), n_steps=10^5, n_samples=0, unweight=true, par=false, use_sampler=true) -> Nothing

This function is prone to be changed mostly used for internal usage, please use the build_teststatistic function instead! Generates reshuffled test statistics for a given test case and metric vector, writing results to specified files.

Arguments

t<:AbstractTestcase: The test case to be evaluated, is a subtype of AbstractTestcase.
n::Int: Number of reshuffling iterations to perform.
m::Vector{TM}: A vector of metrics to evaluate, where TM is a subtype of TestMetric.
fnm::Vector{IOStream}: A vector of open file streams for writing test statistics.

Keyword Arguments

s: The sampling algorithm to use. Defaults to IIDSampler().
n_steps::Int: Number of steps for the sampling algorithm. Defaults to 10^5.
n_samples::Int: Number of samples to generate for each iteration. Defaults to 0 (use all available samples).
unweight::Bool: Whether to resample based on effective sample size (ESS) when weights are unequal. Defaults to true.
par::Bool: Enable parallel processing if true. Defaults to false.
use_sampler::Bool: Use the sampler for data generation if true. Defaults to true.

Returns

Nothing: Results are written to the provided file streams.

Notes

Handles both cases of generating fixed or variable sample sizes (n_samples > 0 or n_samples <= 0).
Ensures proper resampling when weights are unequal, maintaining consistent ESS.
Supports parallel and sequential processing based on the par argument.
Automatically closes all file streams upon completion.

Example

testcase = MyTestcase()
metrics = [Metric1(), Metric2()]
file_streams = [open("metric1.txt", "w"), open("metric2.txt", "w")]
build_teststat_reshuffle(testcase, 100, metrics, file_streams; n_steps=10^4, unweight=true, par=false)

Error Handling

Ensures file streams are closed in case of unexpected errors during execution.

source

MCBench.build_teststatistic — Method

build_teststatistic(t<:AbstractTestcase, m::Vector{TestMetric}; 
    s=IIDSampler(), n::Int=10^2, n_steps::Int=10^5, 
    n_samples::Int=10^5, par::Bool=true, 
    clean::Bool=false, unweight::Bool=true, 
    use_sampler::Bool=true, iid=false)

Build and store test statistics for a given test case and multiple Metrics. This function manages file creation and invokes reshuffling and sampling to calculate the desired statistics. The resampling is performed n times according to the effective sample size of the provided sampler. Sampling is repeted until n samples of effective samples are generated.

Arguments

t<:AbstractTestcase: The test case, is a subtype of AbstractTestcase.
m::Vector{TM}: A vector of metrics, where TM is a subtype of TestMetric.

Keyword Arguments

s: A sampling algorithm. Defaults to IIDSampler().
n::Int: Number of iterations aka the number of times the tests and calculations of the metrics is performed. Defaults to 10^2.
n_steps::Int: Number of steps for the sampling algorithm. This setting is only necessary for samplers that require a fixed number of steps. This is not relevant if the number of steps is defined in the sampler object. Defaults to 10^5.
n_samples::Int: Number of samples to generate. This setting referes to the number of IID samples. Defaults to 10^5.
par::Bool: Whether to enable parallel processing. Defaults to true.
clean::Bool: If true, clears the output files before calculation. Defaults to false.
unweight::Bool: Whether to unweight samples during reshuffling. If false resampling to the effective sample size will be skipped. Defaults to true.
use_sampler::Bool: Whether to utilize the sampler in the calculations. This is only necessary for internal sampler types which have custom settings (like fixed number of steps). It will use the actual sampling object instead of creating a new instance of the sampler object with default settings and the number of steps in this function. Defaults to true.
iid::Bool: If true, forces IID behavior regardless of the sampler type. Can be used for IID to IID comparisons. Defaults to false.

Returns

Nothing: Results are stored in specified files.

Notes

Make sure to have created teststatistics and teststatistics_sampler filepaths.
File paths are generated based on the info property of the test case, metric, and sampler.
If clean is true, all files are overwritten. Otherwise, results are appended.
The function ensures that files are properly closed, even in case of errors.
Reshuffling and sampling are handled by the build_teststat_reshuffle function.

Example

metrics = [Metric1(), Metric2()]
testcase = MyTestcase()
build_teststatistic(testcase, metrics; n=100, par=true, clean=true)

source

MCBench.calc_metric — Method

calc_metric(t::AbstractTestcase, 
s::DensitySampleVector, 
m::TestMetric)

Calculate the metric `m` for the test case `t` and the samples `s`.
This function is used to calculate the metric value for a given test case and samples for any metric type.
`TwoSampleMetric` metrics can also be used, however the `calc_metric` then samples a second set of IID samples to compare against.

source

MCBench.get_sliced_wasserstein_distance — Method

get_sliced_wasserstein_distance(dist1, dist2; d=50, L=1000, p=1, N=100_000)

Compute the sliced Wasserstein distance between two probability distributions.

Arguments

dist1: The first probability distribution.
dist2: The second probability distribution.
d: The dimension of the samples. Default is 50.
L: The number of random projections. Default is 1000.
p: The norm order. Default is 1.
N: The number of samples. Default is 100_000.

Returns

The sliced Wasserstein distance between dist1 and dist2.

source

MCBench.parse_teststatistic — Method

parse_teststatistic(filename::String)

Parses test statistics from a specified file.

Arguments

filename::String: The name of the file containing test statistics.

Returns

Array{Float64}: A reshaped array containing the parsed test statistics.

Notes

The function reads lines from the file, parses them as JSON, and reshapes the resulting data.

Example

statistics = parse_teststatistic("./teststatistics/example.txt")
println(statistics)

source

MCBench.parseline — Method

parseline(f::IOStream)

Parses a single line of JSON data from an open file stream.

Arguments

f::IOStream: The open file stream from which the line is read.

Returns

Vector{Float64}: A vector parsed from the JSON line.
[]: An empty array if the line is empty.

Example

file = open("./teststatistics/example.txt", "r")
line_data = parseline(file)
close(file)
println(line_data)

source

MCBench.plot_teststatistic — Method

plot_metrics(t::AbstractTestcase, normvals::Vector{NamedTuple}; plotalpha=0.2, infos=[], s=[]) 
plot_teststatistic(t::AbstractTestcase, m::TM) where {TM <: TestMetric}
plot_teststatistic(t::AbstractTestcase, m::TM, s::AnySampler; nbins=32, same_bins=true,sampler_bins=false)

Functions to plot test statistics and overview plots for a given test case and metrics. These functions are subject to change and changed for recepies and are not guaranteed to be stable

Arguments

t::AbstractTestcase: The open file stream from which the line is read.
m::TM: The metric for which the test statistic is plotted.
s::AnySampler: The sampling algorithm for which the test statistic is plotted.
nbins::Int: The number of bins for the histogram.
same_bins::Bool: Whether to use the same bins for the test statistic and the sampling algorithm.
sampler_bins::Bool: Whether to use the bins of the sampling algorithm for the test statistic or from the IID samples. This should be set to true as the distribution of the test statistics for the sampler i svery likely have a wider range than the IID samples.

Example

t = MyTestcase()
m = [MyMetric_1(), MyMetric_2()]
s = MySampler()

plot_teststatistic(t, m, s; nbins=32, same_bins=true, sampler_bins=true)
plot_metrics(t, normvals; plotalpha=0.2, infos=[], s=[])

source

MCBench.read_teststatistic — Method

read_teststatistic(t::AbstractTestcase, m::TM)
read_teststatistic(t::AbstractTestcase, m::TM, s::AnySampler)

Functions to read test statistics from a file. Read test statistics for a given test case, metric, and optionally a sampling algorithm from a file.

Arguments

t::AbstractTestcase: The test case for which the statistics are being read.
m::TM: The metric, where TM is a subtype of TestMetric.
s::AnySampler (optional): The sampling algorithm used to generate the statistics. If not givin the IID statistics are read.

Returns

Array{Float64}: A reshaped array containing the parsed test statistics.

Notes

Constructs the file name using t.info and m.info, and optionally s.info if a sampler is provided.
Reads and parses the contents of the file to extract the statistics.

Example

testcase = MyTestcase()
metric = MyMetric()
statistics = read_teststatistic(testcase, metric)  # Without sampler
println(statistics)

sampler = MySampler()
statistics_with_sampler = read_teststatistic(testcase, metric, sampler)  # With sampler
println(statistics_with_sampler)

source

MCBench.rep — Method

wasserstein1d(a::Vector, b::Vector; p=1, wa=nothing, wb=nothing)
get_sliced_wasserstein_distance(sample1::DensitySampleVector, sample2::DensitySampleVector; L=1000, p=1, N=0, parallel=true)

Functions to calculate the Wasserstein distance between two 1D distributions and the sliced Wasserstein distance between two probability distributions. This code is adapted from the Transport package in R translated to Julia.

source

MCBench.run_teststatistic — Method

run_teststatistic(
t::AbstractTestcase, 
samples1::DensitySampleVector,
samples2::DensitySampleVector,
m::TwoSampleMetric,
s::SamplingAlgorithm;)

run_teststatistic(
t::AbstractTestcase, 
samples::DensitySampleVector,
m::TwoSampleMetric, s::AnySampler;)

run_teststatistic(
t::AbstractTestcase, 
samples::DensitySampleVector,
m::TwoSampleMetric, s::Int;)

run_teststatistic(
t::AbstractTestcase, m::TwoSampleMetric,
s::AnySampler; n_steps::Int=10^5)

run_teststatistic(
t::AbstractTestcase, m::TwoSampleMetric;
n_steps::Int=10^5)

Functions to run a two-sample test statistic on a given testcase and metric. When the samples are provided, the function calculates the metric value using the samples. When the sampling algorithm is provided, the function samples using the algorithm and the IID samples and calculates the metric value. When no samples and sampling algorithm are provided, the function samples using the testcase and calculates the metric value.

Arguments

t::AbstractTestcase: The test case, is a subtype of AbstractTestcase.
samples1::DensitySampleVector: The first density sample vector.
samples2::DensitySampleVector: The second density sample vector.
m::TwoSampleMetric: The metric, is a subtype of TwoSampleMetric.
s::SamplingAlgorithm: The sampling algorithm used for calculations.
n_steps::Int: The number of steps to be generated.

Returns

TwoSampleMetric: The calculated metric value.

source

MCBench.sample — Method

sample(t::DsvTestcase; n_steps=10^5)::DensitySampleVector
sample(t::DsvTestcase, s::SamplingAlgorithm; n_steps=10^5)::DensitySampleVector

The sample methods using DsvTestcase using the same logic as for Testcases but using the precalculated samples no matter the sampler used.

source

MCBench.sample — Method

sample(t::Testcases; n_steps=10^5)::DensitySampleVector
sample(t::Testcases, n::Int)::DensitySampleVector
sample(t::Testcases, s::IIDSamplingAlgorithm; n_steps=10^5)::DensitySampleVector

IID sampling from the distribution of the test case t with n_steps or n samples. When integrating a custom sampling algorithm, the sample method should be overloaded for the new sampling algorithm type for s. Returns a density sample vector with the samples and the log densities.

source

MCBench.sample — Method

sample(t<:AbsatractTestcase, s<:AbstractFileBasedSampler; n_steps=10^5)::DensitySampleVector

Sampling from the distribution of the test case t using a file-based sampler s with n_steps samples. The test case is not used in the sampling process, but it is used to calculate the log densities of the samples. Returns a density sample vector with the samples and the log densities.

source