StaticArrays.jl

Consider allocating on the stack for small fixed-size vector/matrix operations
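A minimal sketch of what "small fixed-size" means in StaticArrays.jl, assuming the package is installed. The rotation matrix and angle are arbitrary illustration values. The point is that the array size lives in the type, and that the values are plain immutable bits data:

```julia
using StaticArrays

# Rotation about the z-axis by an arbitrary illustrative angle
θ = π / 6
R = @SMatrix [cos(θ) -sin(θ) 0.0; sin(θ) cos(θ) 0.0; 0.0 0.0 1.0]
x = SVector(1.0, 0.0, 0.0)

# Matrix-vector product of static arrays; the result is again a static array
y = R * x

println(typeof(y))               # size 3 and eltype Float64 are part of the type
println(isbitstype(typeof(R)))   # true: no heap object is needed to hold R
```

Because the compiler knows every size at compile time, `R * x` can be fully unrolled. It also needs no temporary heap arrays.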

using DifferentialEquations, BenchmarkTools, Plots

function lorenz(u,p,t)
 dx = 10.0*(u[2]-u[1])
 dy = u[1]*(28.0-u[3]) - u[2]
 dz = u[1]*u[2] - (8/3)*u[3]
 [dx,dy,dz]
end
lorenz (generic function with 1 method)
u0 = [1.0;0.0;0.0]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz,u0,tspan)
sol = solve(prob,Tsit5())
plot(sol,idxs=(1,2,3))
@benchmark solve(prob,Tsit5())
BenchmarkTools.Trial: 1316 samples with 1 evaluation.
 Range (min … max):  3.215 ms … 8.295 ms   GC (min … max):  0.00% … 37.00%
 Time  (median):     3.333 ms                GC (median):     0.00%
 Time  (mean ± σ):   3.797 ms ± 996.074 μs   GC (mean ± σ):  11.78% ± 16.32%
  ▄█▆▂                                          ▁▁▁▂▂▁▁▁      
  ████▆▄▅▅▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▆████████████▆▆ █
  3.21 ms      Histogram: log(frequency) by time      6.25 ms <
 Memory estimate: 7.82 MiB, allocs estimate: 101102.
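A large share of those allocations plausibly comes from `lorenz` itself, because every call builds a brand-new 3-element `Vector` on the heap. The following sketch checks a single call in isolation. It assumes BenchmarkTools' `@ballocated` macro, which reports the bytes allocated by the fastest run, and it repeats the definition so that it runs on its own:

```julia
using BenchmarkTools

# Out-of-place right-hand side, as defined above
function lorenz(u,p,t)
    dx = 10.0*(u[2]-u[1])
    dy = u[1]*(28.0-u[3]) - u[2]
    dz = u[1]*u[2] - (8/3)*u[3]
    [dx,dy,dz]    # a new heap-allocated Vector on every call
end

u = [1.0, 0.0, 0.0]
# `$u` interpolates the value so that global-variable access is not measured
bytes = @ballocated lorenz($u, nothing, 0.0)
println("bytes allocated per call: ", bytes)
```

The exact byte count depends on the Julia version. What matters is that it is nonzero on every call, and the solver makes many such calls.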

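Writing the derivative into a preallocated array du, instead of returning a new one, lets the solver reuse the same buffer on every call:

function lorenz!(du,u,p,t)
 du[1] = 10.0*(u[2]-u[1])
 du[2] = u[1]*(28.0-u[3]) - u[2]
 du[3] = u[1]*u[2] - (8/3)*u[3]
 nothing # in-place functions conventionally return nothing
end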
lorenz! (generic function with 1 method)
u0 = [1.0;0.0;0.0]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz!,u0,tspan)
@benchmark solve(prob,Tsit5())
BenchmarkTools.Trial: 7164 samples with 1 evaluation.
 Range (min … max):  607.872 μs … 8.882 ms   GC (min … max): 0.00% … 36.12%
 Time  (median):     625.756 μs                GC (median):    0.00%
 Time  (mean ± σ):   695.312 μs ± 406.628 μs   GC (mean ± σ):  8.21% ± 11.61%
  ▁                                                          ▁
  █▆▅▄▆▇▇▇▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▇▇█ █
  608 μs        Histogram: log(frequency) by time       3.22 ms <
 Memory estimate: 996.23 KiB, allocs estimate: 11416.
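The in-place form avoids allocating a new array on each call, because `lorenz!` only writes into a `du` buffer that is allocated once up front. The sketch below confirms this, again assuming BenchmarkTools' `@ballocated` and repeating the definition. The remaining allocations in the benchmark above therefore come from the solver, not from the right-hand side:

```julia
using BenchmarkTools

function lorenz!(du,u,p,t)
    du[1] = 10.0*(u[2]-u[1])
    du[2] = u[1]*(28.0-u[3]) - u[2]
    du[3] = u[1]*u[2] - (8/3)*u[3]
    nothing
end

u  = [1.0, 0.0, 0.0]
du = similar(u)     # buffer allocated once, then overwritten on every call
bytes = @ballocated lorenz!($du, $u, nothing, 0.0)
println("bytes allocated per call: ", bytes)
println("du = ", du)
```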

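A StaticArray is statically sized: its dimensions are part of its type and are therefore known at compile time, so element access and arithmetic can be fully unrolled and specialized. Immutable static arrays such as SVector are also plain bits values, so the compiler knows the exact block of memory in advance and can keep them on the stack (or in registers) instead of allocating them on the heap. Creating a new one is therefore nearly free and leaves no garbage for the GC to collect. This is why the out-of-place lorenz_static below can return a fresh @SVector on every call and still outperform the in-place version.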
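To see the stack-allocation claim concretely, the sketch below compares the same small expression on `SVector`s and on ordinary `Vector`s. It assumes StaticArrays.jl and BenchmarkTools' `@ballocated`. The byte count for the `Vector` case varies with the Julia version, so only the zero versus nonzero distinction is meaningful:

```julia
using StaticArrays, BenchmarkTools

a, b   = SVector(1.0, 2.0, 3.0), SVector(4.0, 5.0, 6.0)
va, vb = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]

# Both expressions create an intermediate (2.0 * b) and then a result (the sum)
static_bytes  = @ballocated $a + 2.0 * $b     # intermediates are bits values: no heap use
dynamic_bytes = @ballocated $va + 2.0 * $vb   # each intermediate is a new heap Vector

println("SVector: ", static_bytes, " bytes; Vector: ", dynamic_bytes, " bytes")
```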

using StaticArrays

function lorenz_static(u,p,t)
 dx = 10.0*(u[2]-u[1])
 dy = u[1]*(28.0-u[3]) - u[2]
 dz = u[1]*u[2] - (8/3)*u[3]
 @SVector [dx,dy,dz]
end
lorenz_static (generic function with 1 method)
u0 = @SVector [1.0,0.0,0.0]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz_static,u0,tspan)
@benchmark solve(prob,Tsit5())
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  249.812 μs … 1.460 ms   GC (min … max): 0.00% … 67.77%
 Time  (median):     253.339 μs               GC (median):    0.00%
 Time  (mean ± σ):   263.418 μs ± 74.241 μs   GC (mean ± σ):  2.42% ±  6.62%
  ▆▅▂▁▃▁                                                     ▁
  ███████▆▆▆▇▇▅▅▆▅▄▅▅▆▅▁▄▃▃▄▃▁▃▁▃▃▁▃▃▁▃▁▁▁▃▁▁▄▃▁▄▆▇▇█▇▆▅▅▆▅▃ █
  250 μs        Histogram: log(frequency) by time       398 μs <
 Memory estimate: 387.30 KiB, allocs estimate: 1293.
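As a sanity check, the three right-hand sides should compute the same derivative. The static version, like `lorenz!`, should also avoid heap allocation per call even though it returns a new vector. The sketch below repeats the definitions from above and assumes BenchmarkTools' `@ballocated`. It checks only a single state, so treat it as a quick consistency test, not a proof of equivalence:

```julia
using StaticArrays, BenchmarkTools

function lorenz(u,p,t)
    dx = 10.0*(u[2]-u[1])
    dy = u[1]*(28.0-u[3]) - u[2]
    dz = u[1]*u[2] - (8/3)*u[3]
    [dx,dy,dz]
end

function lorenz!(du,u,p,t)
    du[1] = 10.0*(u[2]-u[1])
    du[2] = u[1]*(28.0-u[3]) - u[2]
    du[3] = u[1]*u[2] - (8/3)*u[3]
    nothing
end

function lorenz_static(u,p,t)
    dx = 10.0*(u[2]-u[1])
    dy = u[1]*(28.0-u[3]) - u[2]
    dz = u[1]*u[2] - (8/3)*u[3]
    @SVector [dx,dy,dz]
end

u  = [1.0, 0.0, 0.0]
su = SVector(1.0, 0.0, 0.0)
du = similar(u)
lorenz!(du, u, nothing, 0.0)

# Identical arithmetic in each version, so the results should match exactly
same = lorenz(u, nothing, 0.0) == du == lorenz_static(su, nothing, 0.0)
static_bytes = @ballocated lorenz_static($su, nothing, 0.0)
println("all three agree: ", same, "; static bytes per call: ", static_bytes)
```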