API
Types
MCBench.AbstractFileBasedSampler
MCBench.AbstractKernel
MCBench.AnySampler
MCBench.CsvBasedSampler
MCBench.DsvSampler
MCBench.DsvTestcase
MCBench.FileBasedSampler
MCBench.IIDSampler
MCBench.IIDSamplingAlgorithm
MCBench.SamplingAlgorithm
MCBench.Target
MCBench.TestMetric
MCBench.Testcases
MCBench.TwoSampleMetric
MCBench.chi_squared
MCBench.global_mode
MCBench.marginal_kurtosis
MCBench.marginal_mean
MCBench.marginal_mode
MCBench.marginal_skewness
MCBench.marginal_variance
MCBench.maximum_mean_discrepancy
MCBench.sliced_wasserstein_distance
MCBench.wasserstein_1d
Functions
MCBench.build_teststat_reshuffle
MCBench.build_teststatistic
MCBench.calc_metric
MCBench.get_sliced_wasserstein_distance
MCBench.parse_teststatistic
MCBench.parseline
MCBench.plot_teststatistic
MCBench.read_teststatistic
MCBench.rep
MCBench.run_teststatistic
MCBench.sample
MCBench.sample
MCBench.sample
Documentation
MCBench.AbstractFileBasedSampler
— Typeabstract type AbstractFileBasedSampler <: AnySampler
An abstract type for file-based sampling algorithms that inherit from AnySampler
.
MCBench.AbstractKernel
— Typeget_mmd(s1::DensitySampleVector, s2::DensitySampleVector; g=0, N=0)
compute_bandwidth(x, y)
Functions to calculate the Maximum Mean Discrepancy (MMD) between two samples. This code is adapted from the IPMeasures.jl package. Please refer to the repository for the original implementation.
MCBench.AnySampler
— Typeabstract type AnySampler
An abstract type that serves as a base for all sampling algorithms.
MCBench.CsvBasedSampler
— Typestruct CsvBasedSampler <: AbstractFileBasedSampler
IO funtionalities to read data from a set of CSV files.
# Fields
- `fbs::FileBasedSampler`: File-based sampler
- `header::Vector{String}`: Header of the CSV file
- `mask::Vector{Int}`: Mask to extract the desired columns
- `info::String`: Information about the sampler
# Constructors
- `CsvBasedSampler(; fields...)`
- `CsvBasedSampler(file_paths::Vector{String})`: Creates a `CsvBasedSampler` with the given vector of file paths.
- `CsvBasedSampler(path::String)`: Creates a `CsvBasedSampler` with the given file path. This will load all files in the directory if the path is a directory.
MCBench.DsvSampler
— Typestruct DsvSampler{D<:DensitySampleVector} <: AbstractFileBasedSampler
IO funtionalities to read data from a set of DensitySampleVector files.
# Fields
- `dsvs::Vector{D}`: Vector of file paths
- `current_dsv_index::Int`: Index of the current file in the vector
- `current_position::Int`: Position within the current file
- `weighted::Bool`: Whether the samples are weighted
- `total_samples::Int`: Total number of samples
- `neff::Vector{Float64}`: Number of effective samples
- `info::String`: Information about the sampler
# Constructors
- `DsvSampler(; fields...)`
- `DsvSampler(dsvs::Vector{D})`: Creates a `DsvSampler` with the given vector of file paths.
MCBench.DsvTestcase
— Typestruct DsvTestcase <: AbstractTestcase
A struct representing a test case based on DensitySampleVectors. This is meant to be used as IIDs when the samples are precalculated and stored in a file. In that case samples should be read from the file converted to a DensitySampleVector and used as the test case.
Fields
sampler::DS
: The sampler that generates the samples.dim::N
: The dimension of the test case.info::A
: Additional information about the test case.
Constructors
DsvTestcase(s::DS, n::Int, info::A)
: Creates a test case with the given sampler, dimension and additional information.DsvTestcase(s::DS, info::A)
: Creates a test case with the given sampler and additional information. The dimension is set to the dimension of the samples.
MCBench.FileBasedSampler
— Typestruct FileBasesSampler <: SamplingAlgorithm
IO funtionalities to read data from a set of files.
Fields
files::Vector{String}
: Vector of file paths.current_file_index::Int
: Index of the current file in the vector.current_position::Int
: Position within the current file.current_file_handle::IOStream
: Handle to the currently open file.info::String
: Information about the sampler. Used for plotting.
Constructors
FileBasedSampler(; fields...)
FileBasedSampler(file_paths::Vector{String})
: Creates aFileBasedSampler
with the given vector of file paths.FileBasedSampler(path::String)
: Creates aFileBasedSampler
with the given file path. This will load all files in the directory if the path is a directory.
MCBench.IIDSampler
— Typestruct IIDSampler <: IIDSamplingAlgorithm
A struct representing an IID (Independent and Identically Distributed) Sampler to created instances of IID sampling algorithms.
Fields
n_steps::Int
: The number of steps for the sampler. Identical to the number of samples for IID.info::String
: Information or description of the sampler.
Constructors
IIDSampler()
: Creates anIIDSampler
with default values of10^5
steps and "IID" as the info string.
MCBench.IIDSamplingAlgorithm
— Typeabstract type IIDSamplingAlgorithm <: SamplingAlgorithm
An abstract type for independent and identically distributed (IID) sampling algorithms that inherit from SamplingAlgorithm
. This type doesn't have any fields or methods, but it is used to have testfunctions reference their IID Base.rand
method when sampling.
MCBench.SamplingAlgorithm
— Typeabstract type SamplingAlgorithm <: AnySampler
An abstract type for general sampling algorithms that inherit from AnySampler
.
MCBench.Target
— Typeabstract type AbstractTestcase
An abstract type for any test cases.
MCBench.TestMetric
— Typeabstract type TestMetric
An abstract type for any test metrics. Each metric is its own struct and must implement the calc_metric
function. Metrics are designed to be both structs that hold the metric value and determine which version of calc_metric
to dispatch.
MCBench.Testcases
— Typestruct Testcases <: AbstractTestcase
A struct representing a test case to be used in the framework of MCBench. Testcases must consisit of a distribution or Target that is sampleable and a set of bounds.
Fields
f::D
: The distribution or Target of the test case.bounds::B
: The bounds of the test case.dim::N
: The dimension of the test case.info::A
: Additional information about the test case.
Constructors
Testcases(; fields...)
: Creates a test case with the given fields.Testcases(f::D, bounds::B, dim::N, info::A)
: Creates a test case with the given distribution or target, bounds, dimension and additional information.Testcases(f::D, dim::N, info::A)
: Creates a test case with the given distribution or target, dimension and additional information. Bounds are set to[-10..10]
for each dimension.
MCBench.TwoSampleMetric
— Typeabstract type TwoSampleMetric
An abstract type for any two-sample test metrics. Such as Wasserstein, MMD, etc.
MCBench.chi_squared
— Typestruct chi_squared{V<:Real,A<:Any} <: TwoSampleMetric
# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.
# Constructors
- `chi_squared(; fields...)`
- `chi_squared(val::Real)` : Creates a `chi_squared` metric with the given metric value.
A struct for the chi_squared of the samples as a metric.
The chi-squared is calculated for each dimension separately using an unbinned chi-squared.
MCBench.global_mode
— Typestruct global_mode{V<:Real,A<:Any} <: TestMetric
# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.
# Constructors
- `global_mode(; fields...)`
- `global_mode(val::Real)` : Creates a `global_mode` metric with the given metric value.
A struct for the global mode of the samples as a metric. This will return the mode as a vector of the mode for each dimension.
MCBench.marginal_kurtosis
— Typestruct marginal_kurtosis{V<:Real,A<:Any} <: TestMetric
# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.
# Constructors
- `marginal_kurtosis(; fields...)`
- `marginal_kurtosis(val::Real)` : Creates a `marginal_kurtosis` metric with the given metric value.
A struct for the marginal_kurtosis of the samples as a metric. Marginal kurtosis are calculated for each dimension of the samples and returned as a vector of `marginal_kurtosis` values.
MCBench.marginal_mean
— Typestruct marginal_mean{V<:Real,A<:Any} <: TestMetric
# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.
# Constructors
- `marginal_mean(; fields...)`
- `marginal_mean(val::Real)` : Creates a `marginal_mean` metric with the given metric value.
A struct for the mean value for each dimension of the samples as a metric.
MCBench.marginal_mode
— Typestruct marginal_mode{V<:Real,A<:Any} <: TestMetric
# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.
# Constructors
- `marginal_mode(; fields...)`
- `marginal_mode(val::Real)` : Creates a `marginal_mode` metric with the given metric value.
A struct for the marginal_mode of the samples as a metric. Marginal modes are calculated for each dimension of the samples and returned as a vector of `marginal_mode` values.
MCBench.marginal_skewness
— Typestruct marginal_skewness{V<:Real,A<:Any} <: TestMetric
# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.
# Constructors
- `marginal_skewness(; fields...)`
- `marginal_skewness(val::Real)` : Creates a `marginal_skewness` metric with the given metric value.
A struct for the marginal_skewness of the samples as a metric. Marginal skewness are calculated for each dimension of the samples and returned as a vector of `marginal_skewness` values.
MCBench.marginal_variance
— Typestruct marginal_variance{V<:Real,A<:Any} <: TestMetric
# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.
# Constructors
- `marginal_variance(; fields...)`
- `marginal_variance(val::Real)` : Creates a `marginal_variance` metric with the given metric value.
A struct for the variance for each dimension of the samples as a metric.
MCBench.maximum_mean_discrepancy
— Typestruct maximum_mean_discrepancy{V<:Real,A<:Any} <: TwoSampleMetric
# Fields
- `val::V`: The value of the metric.
- `N::I`: The number of points used to calculate the maximum mean discrepancy.
- `info::A`: Information about the metric.
- `proc::P`: The processor information for the metric.
# Constructors
- `maximum_mean_discrepancy(; fields...)`
- `maximum_mean_discrepancy(val::Real)` : Creates a `maximum_mean_discrepancy` metric with the given metric value.
- `maximum_mean_discrepancy(val::Real, N::Int)` : Creates a `maximum_mean_discrepancy` metric with the given metric value and number of points.
A struct for the maximum_mean_discrepancy of the samples as a metric.
Per default the number of points used to calculate the maximum mean discrepancy is 10^4.
MCBench.sliced_wasserstein_distance
— Typestruct sliced_wasserstein_distance{V<:Real,A<:Any} <: TwoSampleMetric
# Fields
- `val::V`: The value of the metric.
- `N::I`: The number of points used to calculate the sliced Wasserstein distance.
- `info::A`: Information about the metric.
- `proc::P`: The processor information for the metric.
# Constructors
- `sliced_wasserstein_distance(; fields...)`
- `sliced_wasserstein_distance(val::Real)` : Creates a `sliced_wasserstein_distance` metric with the given metric value.
A struct for the sliced_wasserstein_distance of the samples as a metric.
The sliced Wasserstein distance is calculated by projecting the samples onto a random direction and calculating the Wasserstein distance in that direction.
MCBench.wasserstein_1d
— Typestruct wasserstein_1d{V<:Real,A<:Any} <: TwoSampleMetric
# Fields
- `val::V`: The value of the metric.
- `info::A`: Information about the metric.
# Constructors
- `wasserstein_1d(; fields...)`
- `wasserstein_1d(val::Real)` : Creates a `wasserstein_1d` metric with the given metric value.
A struct for the wasserstein_1d of the samples as a metric. The Wasserstein distance is calculated for each dimension separately and returned as a vector of `wasserstein_1d` values.
MCBench.build_teststat_reshuffle
— Methodbuild_teststat_reshuffle(t<:AbstractTestcase, n::Int, m::Vector{TestMetric}, fnm::Vector{IOStream};
s=IIDSampler(), n_steps=10^5, n_samples=0, unweight=true, par=false, use_sampler=true) -> Nothing
This function is prone to be changed mostly used for internal usage, please use the build_teststatistic
function instead! Generates reshuffled test statistics for a given test case and metric vector, writing results to specified files.
Arguments
t<:AbstractTestcase
: The test case to be evaluated, is a subtype ofAbstractTestcase
.n::Int
: Number of reshuffling iterations to perform.m::Vector{TM}
: A vector of metrics to evaluate, whereTM
is a subtype ofTestMetric
.fnm::Vector{IOStream}
: A vector of open file streams for writing test statistics.
Keyword Arguments
s
: The sampling algorithm to use. Defaults toIIDSampler()
.n_steps::Int
: Number of steps for the sampling algorithm. Defaults to10^5
.n_samples::Int
: Number of samples to generate for each iteration. Defaults to0
(use all available samples).unweight::Bool
: Whether to resample based on effective sample size (ESS) when weights are unequal. Defaults totrue
.par::Bool
: Enable parallel processing iftrue
. Defaults tofalse
.use_sampler::Bool
: Use the sampler for data generation iftrue
. Defaults totrue
.
Returns
Nothing
: Results are written to the provided file streams.
Notes
- Handles both cases of generating fixed or variable sample sizes (
n_samples > 0
orn_samples <= 0
). - Ensures proper resampling when weights are unequal, maintaining consistent ESS.
- Supports parallel and sequential processing based on the
par
argument. - Automatically closes all file streams upon completion.
Example
testcase = MyTestcase()
metrics = [Metric1(), Metric2()]
file_streams = [open("metric1.txt", "w"), open("metric2.txt", "w")]
build_teststat_reshuffle(testcase, 100, metrics, file_streams; n_steps=10^4, unweight=true, par=false)
Error Handling
- Ensures file streams are closed in case of unexpected errors during execution.
MCBench.build_teststatistic
— Methodbuild_teststatistic(t<:AbstractTestcase, m::Vector{TestMetric};
s=IIDSampler(), n::Int=10^2, n_steps::Int=10^5,
n_samples::Int=10^5, par::Bool=true,
clean::Bool=false, unweight::Bool=true,
use_sampler::Bool=true, iid=false)
Build and store test statistics for a given test case and multiple Metrics. This function manages file creation and invokes reshuffling and sampling to calculate the desired statistics. The resampling is performed n
times according to the effective sample size of the provided sampler. Sampling is repeted until n samples
of effective samples are generated.
Arguments
t<:AbstractTestcase
: The test case, is a subtype ofAbstractTestcase
.m::Vector{TM}
: A vector of metrics, whereTM
is a subtype ofTestMetric
.
Keyword Arguments
s
: A sampling algorithm. Defaults toIIDSampler()
.n::Int
: Number of iterations aka the number of times the tests and calculations of the metrics is performed. Defaults to10^2
.n_steps::Int
: Number of steps for the sampling algorithm. This setting is only necessary for samplers that require a fixed number of steps. This is not relevant if the number of steps is defined in the sampler object. Defaults to10^5
.n_samples::Int
: Number of samples to generate. This setting referes to the number of IID samples. Defaults to10^5
.par::Bool
: Whether to enable parallel processing. Defaults totrue
.clean::Bool
: Iftrue
, clears the output files before calculation. Defaults tofalse
.unweight::Bool
: Whether to unweight samples during reshuffling. Iffalse
resampling to the effective sample size will be skipped. Defaults totrue
.use_sampler::Bool
: Whether to utilize the sampler in the calculations. This is only necessary for internal sampler types which have custom settings (like fixed number of steps). It will use the actual sampling object instead of creating a new instance of the sampler object with default settings and the number of steps in this function. Defaults totrue
.iid::Bool
: Iftrue
, forces IID behavior regardless of the sampler type. Can be used for IID to IID comparisons. Defaults tofalse
.
Returns
Nothing
: Results are stored in specified files.
Notes
- Make sure to have created
teststatistics
andteststatistics_sampler
filepaths. - File paths are generated based on the
info
property of the test case, metric, and sampler. - If
clean
istrue
, all files are overwritten. Otherwise, results are appended. - The function ensures that files are properly closed, even in case of errors.
- Reshuffling and sampling are handled by the
build_teststat_reshuffle
function.
Example
metrics = [Metric1(), Metric2()]
testcase = MyTestcase()
build_teststatistic(testcase, metrics; n=100, par=true, clean=true)
MCBench.calc_metric
— Methodcalc_metric(t::AbstractTestcase,
s::DensitySampleVector,
m::TestMetric)
Calculate the metric `m` for the test case `t` and the samples `s`.
This function is used to calculate the metric value for a given test case and samples for any metric type.
`TwoSampleMetric` metrics can also be used, however the `calc_metric` then samples a second set of IID samples to compare against.
MCBench.get_sliced_wasserstein_distance
— Methodget_sliced_wasserstein_distance(dist1, dist2; d=50, L=1000, p=1, N=100_000)
Compute the sliced Wasserstein distance between two probability distributions.
Arguments
dist1
: The first probability distribution.dist2
: The second probability distribution.d
: The dimension of the samples. Default is 50.L
: The number of random projections. Default is 1000.p
: The norm order. Default is 1.N
: The number of samples. Default is 100_000.
Returns
The sliced Wasserstein distance between dist1
and dist2
.
MCBench.parse_teststatistic
— Methodparse_teststatistic(filename::String)
Parses test statistics from a specified file.
Arguments
filename::String
: The name of the file containing test statistics.
Returns
Array{Float64}
: A reshaped array containing the parsed test statistics.
Notes
The function reads lines from the file, parses them as JSON, and reshapes the resulting data.
Example
statistics = parse_teststatistic("./teststatistics/example.txt")
println(statistics)
MCBench.parseline
— Methodparseline(f::IOStream)
Parses a single line of JSON data from an open file stream.
Arguments
f::IOStream
: The open file stream from which the line is read.
Returns
Vector{Float64}
: A vector parsed from the JSON line.[]
: An empty array if the line is empty.
Example
file = open("./teststatistics/example.txt", "r")
line_data = parseline(file)
close(file)
println(line_data)
MCBench.plot_teststatistic
— Methodplot_metrics(t::AbstractTestcase, normvals::Vector{NamedTuple}; plotalpha=0.2, infos=[], s=[])
plot_teststatistic(t::AbstractTestcase, m::TM) where {TM <: TestMetric}
plot_teststatistic(t::AbstractTestcase, m::TM, s::AnySampler; nbins=32, same_bins=true,sampler_bins=false)
Functions to plot test statistics and overview plots for a given test case and metrics. These functions are subject to change and changed for recepies and are not guaranteed to be stable
Arguments
t::AbstractTestcase
: The open file stream from which the line is read.m::TM
: The metric for which the test statistic is plotted.s::AnySampler
: The sampling algorithm for which the test statistic is plotted.nbins::Int
: The number of bins for the histogram.same_bins::Bool
: Whether to use the same bins for the test statistic and the sampling algorithm.sampler_bins::Bool
: Whether to use the bins of the sampling algorithm for the test statistic or from the IID samples. This should be set totrue
as the distribution of the test statistics for the sampler i svery likely have a wider range than the IID samples.
Example
t = MyTestcase()
m = [MyMetric_1(), MyMetric_2()]
s = MySampler()
plot_teststatistic(t, m, s; nbins=32, same_bins=true, sampler_bins=true)
plot_metrics(t, normvals; plotalpha=0.2, infos=[], s=[])
MCBench.read_teststatistic
— Methodread_teststatistic(t::AbstractTestcase, m::TM)
read_teststatistic(t::AbstractTestcase, m::TM, s::AnySampler)
Functions to read test statistics from a file. Read test statistics for a given test case, metric, and optionally a sampling algorithm from a file.
Arguments
t::AbstractTestcase
: The test case for which the statistics are being read.m::TM
: The metric, whereTM
is a subtype ofTestMetric
.s::AnySampler
(optional): The sampling algorithm used to generate the statistics. If not givin the IID statistics are read.
Returns
Array{Float64}
: A reshaped array containing the parsed test statistics.
Notes
- Constructs the file name using
t.info
andm.info
, and optionallys.info
if a sampler is provided. - Reads and parses the contents of the file to extract the statistics.
Example
testcase = MyTestcase()
metric = MyMetric()
statistics = read_teststatistic(testcase, metric) # Without sampler
println(statistics)
sampler = MySampler()
statistics_with_sampler = read_teststatistic(testcase, metric, sampler) # With sampler
println(statistics_with_sampler)
MCBench.rep
— Methodwasserstein1d(a::Vector, b::Vector; p=1, wa=nothing, wb=nothing)
get_sliced_wasserstein_distance(sample1::DensitySampleVector, sample2::DensitySampleVector; L=1000, p=1, N=0, parallel=true)
Functions to calculate the Wasserstein distance between two 1D distributions and the sliced Wasserstein distance between two probability distributions. This code is adapted from the Transport package in R translated to Julia.
MCBench.run_teststatistic
— Methodrun_teststatistic(
t::AbstractTestcase,
samples1::DensitySampleVector,
samples2::DensitySampleVector,
m::TwoSampleMetric,
s::SamplingAlgorithm;)
run_teststatistic(
t::AbstractTestcase,
samples::DensitySampleVector,
m::TwoSampleMetric, s::AnySampler;)
run_teststatistic(
t::AbstractTestcase,
samples::DensitySampleVector,
m::TwoSampleMetric, s::Int;)
run_teststatistic(
t::AbstractTestcase, m::TwoSampleMetric,
s::AnySampler; n_steps::Int=10^5)
run_teststatistic(
t::AbstractTestcase, m::TwoSampleMetric;
n_steps::Int=10^5)
Functions to run a two-sample test statistic on a given testcase and metric. When the samples are provided, the function calculates the metric value using the samples. When the sampling algorithm is provided, the function samples using the algorithm and the IID samples and calculates the metric value. When no samples and sampling algorithm are provided, the function samples using the testcase and calculates the metric value.
Arguments
t::AbstractTestcase
: The test case, is a subtype ofAbstractTestcase
.samples1::DensitySampleVector
: The first density sample vector.samples2::DensitySampleVector
: The second density sample vector.m::TwoSampleMetric
: The metric, is a subtype ofTwoSampleMetric
.s::SamplingAlgorithm
: The sampling algorithm used for calculations.n_steps::Int
: The number of steps to be generated.
Returns
TwoSampleMetric
: The calculated metric value.
MCBench.sample
— Methodsample(t::DsvTestcase; n_steps=10^5)::DensitySampleVector
sample(t::DsvTestcase, s::SamplingAlgorithm; n_steps=10^5)::DensitySampleVector
The sample
methods using DsvTestcase
using the same logic as for Testcases
but using the precalculated samples no matter the sampler used.
MCBench.sample
— Methodsample(t::Testcases; n_steps=10^5)::DensitySampleVector
sample(t::Testcases, n::Int)::DensitySampleVector
sample(t::Testcases, s::IIDSamplingAlgorithm; n_steps=10^5)::DensitySampleVector
IID sampling from the distribution of the test case t
with n_steps
or n
samples. When integrating a custom sampling algorithm, the sample
method should be overloaded for the new sampling algorithm type for s
. Returns a density sample vector with the samples and the log densities.
MCBench.sample
— Methodsample(t<:AbsatractTestcase, s<:AbstractFileBasedSampler; n_steps=10^5)::DensitySampleVector
Sampling from the distribution of the test case t
using a file-based sampler s
with n_steps
samples. The test case is not used in the sampling process, but it is used to calculate the log densities of the samples. Returns a density sample vector with the samples and the log densities.