Fake datasets
Three curves dataset
GeometricClusterAnalysis.noisy_three_curves (Function)
noisy_three_curves(npoints, α, sigma, d)
noisy_three_curves(rng, npoints, α, sigma, d)
noisy_three_curves(rng, nsignal, nnoise, sigma, d)
nsignal: number of signal points
nnoise: number of additional outliers
Signal points are $x = y+z$ with
- $y$ uniform on the 3 curves
- $z$ normal with mean 0 and covariance matrix $\sigma I_d$ (with $I_d$ the identity matrix of $\mathbb{R}^d$)
d is the dimension of the data and sigma is the standard deviation of the additive Gaussian noise. When $d > 2$, $y_i = 0$ for $i > 2$, with the notation $y = (y_i)_{i=1..d}$.
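The signal model can be illustrated with a minimal sketch that uses only the `Random` standard library. It is not the package's implementation: `sample_signal` is a hypothetical helper, a unit circle stands in for the three curves, and `sigma` is assumed to scale each coordinate of the Gaussian noise as a standard deviation.

```julia
using Random

# Hypothetical helper, NOT part of GeometricClusterAnalysis.
# A unit circle is used as a stand-in for the three curves.
function sample_signal(rng::AbstractRNG, nsignal::Integer, sigma::Real, d::Integer)
    d >= 2 || throw(ArgumentError("d must be at least 2"))
    x = zeros(d, nsignal)                  # one column per point
    for j in 1:nsignal
        θ = 2π * rand(rng)                 # uniform position on the curve
        x[1, j] = cos(θ)                   # first coordinate of y
        x[2, j] = sin(θ)                   # second coordinate of y; rows 3..d of y stay zero
    end
    x .+= sigma .* randn(rng, d, nsignal)  # z: isotropic Gaussian noise on all d coordinates
    return x
end

rng = MersenneTwister(1234)
points = sample_signal(rng, 500, 0.02, 3)
size(points)  # (3, 500)
```

Note that the noise is added to every coordinate, so for $d > 2$ the extra coordinates of a signal point are small but not exactly zero.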
using Random
using Plots
using GeometricClusterAnalysis
nsignal = 500 # number of signal points
nnoise = 200 # number of outliers
dim = 2 # dimension of the data
sigma = 0.02 # standard deviation for the additive noise
rng = MersenneTwister(1234)
dataset = noisy_three_curves(rng, nsignal, nnoise, sigma, dim)
plot(dataset, palette = :rainbow)
Infinity symbol dataset
GeometricClusterAnalysis.infinity_symbol (Function)
infinity_symbol(rng, nsignal, nnoise, σ, dimension, noise_min, noise_max)
infinity_symbol(nsignal, nnoise, σ, dimension, noise_min, noise_max)
infinity_symbol(npoints, α, σ, dimension, noise_min, noise_max)
infinity_symbol(rng, npoints, α, σ, dimension, noise_min, noise_max)
nsignal = 500 # number of signal points
nnoise = 50 # number of outliers
σ = 0.05
dimension = 3 # dimension of the data
noise_min = -5
noise_max = 5
dataset = infinity_symbol(rng, nsignal, nnoise, σ, dimension, noise_min, noise_max)
plot(dataset)
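The docstring does not say how the `nnoise` outliers are drawn. The parameter names `noise_min` and `noise_max` suggest a uniform draw on the box $[\text{noise\_min}, \text{noise\_max}]^{\text{dimension}}$; the sketch below illustrates that reading only. `sample_outliers` is a hypothetical helper, not a function of the package, and the uniform distribution is an assumption.

```julia
using Random

# Hypothetical helper, NOT part of GeometricClusterAnalysis.
# Assumption: outliers are uniform on the box [noise_min, noise_max]^dimension.
function sample_outliers(rng::AbstractRNG, nnoise::Integer, dimension::Integer,
                         noise_min::Real, noise_max::Real)
    noise_min <= noise_max || throw(ArgumentError("noise_min must not exceed noise_max"))
    # rand gives values in [0, 1); rescale them to [noise_min, noise_max)
    return noise_min .+ (noise_max - noise_min) .* rand(rng, dimension, nnoise)
end

rng = MersenneTwister(1234)
outliers = sample_outliers(rng, 50, 3, -5, 5)
size(outliers)  # (3, 50)
```

Under this assumption, a wider box relative to the extent of the curve spreads the same number of outliers more thinly around the signal.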
Fourteen segments dataset
GeometricClusterAnalysis.noisy_fourteen_segments (Function)
noisy_fourteen_segments(rng, nsignal, nnoise, σ, d)
nsignal: number of signal points
nnoise: number of additional outliers, sampled as noise
Signal points are $X = Y+Z$ with
- $Y$ uniform on the 14 segments
- $Z$ normal with mean 0 and covariance matrix $\sigma I_d$ (with $I_d$ the identity matrix of $\mathbb{R}^d$)
d is the dimension of the data and σ is the standard deviation of the additive Gaussian noise. When $d > 2$, $Y_i = 0$ for $i > 2$, with the notation $Y = (Y_i)_{i=1..d}$.
noisy_fourteen_segments(rng, npoints, α, σ, d)
npoints: total number of points
α: fraction of outliers
noisy_fourteen_segments(npoints, α, σ, d)
noisy_fourteen_segments(nsignal, nnoise, σ, d)
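The two parameterizations describe the same split of the sample: `(npoints, α)` gives a total and an outlier fraction, `(nsignal, nnoise)` gives the two counts directly. A minimal sketch of the conversion follows. `split_counts` is a hypothetical helper, and rounding the outlier count down is an assumption; the package may round differently.

```julia
# Hypothetical helper, NOT part of GeometricClusterAnalysis.
# Assumption: the outlier count is the share α of npoints, rounded down.
function split_counts(npoints::Integer, α::Real)
    0 <= α <= 1 || throw(ArgumentError("α must lie in [0, 1]"))
    n_outliers = floor(Int, α * npoints)
    n_signal = npoints - n_outliers
    return n_signal, n_outliers
end

split_counts(800, 0.25)  # (600, 200)
```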
using LinearAlgebra
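nsignal = 490 # number of signal points
nnoise = 200 # number of outliers
d = 2 # dimension of the data
sigma = 0.02 .* Matrix(I, d, d) # noise scale, passed here as an isotropic d×d matrix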
dataset = noisy_fourteen_segments(nsignal, nnoise, sigma, d)
plot(dataset, aspect_ratio=1, palette = :lightrainbow)
Two spirals dataset
GeometricClusterAnalysis.noisy_nested_spirals (Function)
noisy_nested_spirals(npoints, α, σ, dimension)
noisy_nested_spirals(nsignal, nnoise, σ, dimension)
noisy_nested_spirals(rng, npoints, α, σ, dimension)
noisy_nested_spirals(rng, nsignal, nnoise, σ, dimension)
nsignal: number of signal points
nnoise: number of additional outliers
Signal points are $x = y+z$ with
- $y$ uniform on the two nested spirals
- $z$ normal with mean 0 and covariance matrix $\sigma I_d$ (with $I_d$ the identity matrix of $\mathbb{R}^d$)
dimension (written $d$ in the formulas) is the dimension of the data and σ is the standard deviation of the additive Gaussian noise. When $d > 2$, $y_i = 0$ for $i > 2$, with the notation $y = (y_i)_{i=1..d}$.
nsignal = 2000 # number of signal points
nnoise = 400 # number of outliers
dim = 2 # dimension of the data
σ = 0.5 # standard deviation for the additive noise
rng = MersenneTwister(1234)
dataset = noisy_nested_spirals(rng, nsignal, nnoise, σ, dim)
plot(dataset)
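The examples on this page pass an explicit `rng` so that the generated dataset can be reproduced. The sketch below shows the underlying mechanism with the `Random` standard library alone: two generators built from the same seed produce the same draws. It does not call the package. That the methods without an `rng` argument fall back on Julia's default generator is an assumption.

```julia
using Random

# Same seed, same stream: both generators produce identical draws.
a = randn(MersenneTwister(1234), 2, 5)
b = randn(MersenneTwister(1234), 2, 5)
a == b  # true

# A different seed gives a different sample.
c = randn(MersenneTwister(4321), 2, 5)
a == c  # false
```

Creating a fresh `MersenneTwister(1234)` before each call, as the examples do, therefore makes each dataset independent of whatever was generated earlier in the session.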