Fake datasets

Three curves dataset

GeometricClusterAnalysis.noisy_three_curves (Function)
noisy_three_curves(npoints, α, sigma, d)
noisy_three_curves(rng, npoints, α, sigma, d)
noisy_three_curves(rng, nsignal, nnoise, sigma, d)
  • nsignal : number of signal points
  • nnoise : number of additional outliers

Signal points are $x = y+z$ with

  • $y$ uniform on the 3 curves
  • $z$ normal with mean 0 and covariance matrix $\sigma^2 I_d$ (with $I_d$ the identity matrix of $R^d$)

d is the dimension of the data and sigma is the standard deviation of the additive Gaussian noise. When $d > 2$, $y_i = 0$ for $i > 2$, with the notation $y = (y_i)_{i=1..d}$.
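
The model $x = y + z$ can be illustrated with a minimal sketch. It does not reproduce the package's three curves: the parabola below is a hypothetical stand-in, sampled uniformly in its parameter rather than in arc length, and `sample_signal_sketch` is an illustrative name, not a function of GeometricClusterAnalysis. The noise is assumed to be generated as `sigma .* randn(...)`, which is what "standard deviation sigma" suggests.

```julia
using Random

# Hypothetical stand-in for "y uniform on the curves": points on a parabola.
# The actual curves used by noisy_three_curves are not reproduced here.
function sample_signal_sketch(rng, nsignal, sigma, d)
    d >= 2 || throw(ArgumentError("d must be at least 2"))
    t = rand(rng, nsignal)                # curve parameter in [0, 1)
    y = zeros(d, nsignal)                 # rows 3..d stay at 0, i.e. y_i = 0 for i > 2
    y[1, :] .= t
    y[2, :] .= t .^ 2
    z = sigma .* randn(rng, d, nsignal)   # assumed noise model: std sigma per coordinate
    return y .+ z                         # x = y + z, one point per column
end

rng = MersenneTwister(1234)
x = sample_signal_sketch(rng, 500, 0.02, 3)
size(x)
```

With $d = 3$, the third coordinate of each point contains only noise, since the curve lies in the first two coordinates.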

using Random
using Plots
using GeometricClusterAnalysis

nsignal = 500 # number of signal points
nnoise = 200 # number of outliers
dim = 2 # dimension of the data
sigma = 0.02 # standard deviation for the additive noise

rng = MersenneTwister(1234)

dataset = noisy_three_curves(rng, nsignal, nnoise, sigma, dim)

plot(dataset, palette = :rainbow)

Infinity symbol dataset

GeometricClusterAnalysis.infinity_symbol (Function)
infinity_symbol(
    rng,
    nsignal,
    nnoise,
    σ,
    dimension,
    noise_min,
    noise_max
)
infinity_symbol(
    nsignal,
    nnoise,
    σ,
    dimension,
    noise_min,
    noise_max
)
infinity_symbol(
    npoints,
    α,
    σ,
    dimension,
    noise_min,
    noise_max
)
infinity_symbol(
    rng,
    npoints,
    α,
    σ,
    dimension,
    noise_min,
    noise_max
)
nsignal = 500
nnoise = 50
σ = 0.05
dimension = 3
noise_min = -5
noise_max = 5

# rng is the MersenneTwister created in the three curves example
dataset = infinity_symbol(rng, nsignal, nnoise, σ, dimension, noise_min, noise_max)

plot(dataset)

Fourteen segments dataset

GeometricClusterAnalysis.noisy_fourteen_segments (Function)
noisy_fourteen_segments(rng, nsignal, nnoise, σ, d)
  • nsignal : number of signal points
  • nnoise : number of additional outliers

The outliers are sampled as noise. Signal points are $X = Y+Z$ with

  • $Y$ uniform on the 14 segments
  • $Z$ normal with mean 0 and covariance matrix $σ^2 I_d$ (with $I_d$ the identity matrix of $R^d$)

d is the dimension of the data and σ is the standard deviation of the additive Gaussian noise. When $d > 2$, $Y_i = 0$ for $i > 2$, with the notation $Y = (Y_i)_{i=1..d}$.

noisy_fourteen_segments(rng, npoints, α, σ, d)
  • npoints : total number of points
  • α : fraction of outliers
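
The two parameterizations, (npoints, α) and (nsignal, nnoise), describe the same split of the sample. A minimal sketch of the conversion follows; `split_counts_sketch` is a hypothetical helper, not part of GeometricClusterAnalysis, and both the rounding rule and the reading of α as a fraction of the total number of points are assumptions that may differ from what the package does.

```julia
# Hypothetical helper, not part of GeometricClusterAnalysis.
# Assumes α is the fraction of outliers among all npoints points.
function split_counts_sketch(npoints::Integer, α::Real)
    0 <= α <= 1 || throw(ArgumentError("α must lie in [0, 1]"))
    nnoise = round(Int, α * npoints)   # assumed rounding rule
    nsignal = npoints - nnoise         # remaining points are signal
    return nsignal, nnoise
end

nsignal, nnoise = split_counts_sketch(1000, 0.2)
```

Under these assumptions, 1000 points with α = 0.2 split into 800 signal points and 200 outliers.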
noisy_fourteen_segments(npoints, α, σ, d)
noisy_fourteen_segments(nsignal, nnoise, σ, d)
using LinearAlgebra
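nsignal = 490 # number of signal points
nnoise = 200 # number of outliers
d = 2 # dimension of the data
sigma = 0.02 .* Matrix(I, d, d) # 0.02 times the d × d identity matrix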
dataset = noisy_fourteen_segments(nsignal, nnoise, sigma, d)
plot(dataset, aspect_ratio=1, palette = :lightrainbow)

Two spirals dataset

GeometricClusterAnalysis.noisy_nested_spirals (Function)
noisy_nested_spirals(npoints, α, σ, dimension)
noisy_nested_spirals(nsignal, nnoise, σ, dimension)
noisy_nested_spirals(rng, npoints, α, σ, dimension)
noisy_nested_spirals(rng, nsignal, nnoise, σ, dimension)
  • nsignal : number of signal points
  • nnoise : number of additional outliers

Signal points are $x = y+z$ with

  • $y$ uniform on the two nested spirals
  • $z$ normal with mean 0 and covariance matrix $\sigma^2 I_d$ (with $I_d$ the identity matrix of $R^d$)

d is the dimension of the data and sigma is the standard deviation of the additive Gaussian noise. When $d > 2$, $y_i = 0$ for $i > 2$, with the notation $y = (y_i)_{i=1..d}$.
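
If sigma is the standard deviation of each noise coordinate, then the covariance matrix of $z$ is $\sigma^2 I_d$. The sketch below checks this numerically. It assumes the noise is generated as `σ .* randn(...)`, which the page does not confirm, and it uses only Julia's standard libraries.

```julia
using Random, Statistics, LinearAlgebra

rng = MersenneTwister(1234)
σ = 0.5        # standard deviation of each noise coordinate
d = 2          # dimension of the data
n = 100_000    # number of noise vectors

z = σ .* randn(rng, d, n)          # assumed noise model, one vector per column
C = cov(z; dims = 2)               # d × d sample covariance matrix
err = maximum(abs.(C - σ^2 * I))   # largest entrywise deviation from σ² I_d
```

For a sample of this size, `err` should be small compared with σ² = 0.25.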

nsignal = 2000 # number of signal points
nnoise = 400   # number of outliers
dim = 2        # dimension of the data
σ = 0.5        # standard deviation for the additive noise
rng = MersenneTwister(1234)
dataset = noisy_nested_spirals(rng, nsignal, nnoise, σ, dim)
plot(dataset)