Vectorized operations

f(x) = 3x.^2 + 4x + 7x.^3;

fdot(x) = @. 3x^2 + 4x + 7x^3; # = 3 .* x.^2 .+ 4 .* x .+ 7 .* x.^3

Both f and fdot compute the same thing.

x = rand(10^6);
f(x) # warmup
@time f(x);
  0.016093 seconds (12 allocations: 45.777 MiB, 39.86% gc time)
fdot(x) # warmup
@time fdot(x);
  0.000559 seconds (2 allocations: 7.629 MiB)
f.(x) # warmup
@time f.(x);
  0.001560 seconds (4 allocations: 7.629 MiB)

fdot(x) is faster and allocates less memory, because each * and + operation in f(x) allocates a new temporary array and executes in a separate loop.

Consider using views for slices

let

   N = 50_000_000
   a = 1.2
   x = rand(Float64, N)
   y = rand(Float64, N)
   
   nn = 100
   n_start = 1 + nn
   n_end = N - nn
   
   # timing
   @time @. y[n_start:n_end] += a * x[n_start:n_end];

   # timing
   @time @. @views y[n_start:n_end] += a * x[n_start:n_end];

   nothing

end
  0.159749 seconds (4 allocations: 762.936 MiB, 2.57% gc time)
  0.044672 seconds

Copy irregularly-accessed data into a contiguous array before operating on it

using Random

x = randn(1_000_000);

inds = shuffle(1:1_000_000)[1:800000];

A = randn(50, 1_000_000);

xtmp = zeros(800_000);
Atmp = zeros(50, 800_000);

@time sum(view(A, :, inds) * view(x, inds))
@time sum(view(A, :, inds) * view(x, inds))
  0.280936 seconds (214.16 k allocations: 13.800 MiB, 42.26% compilation time)
  0.162002 seconds (5 allocations: 624 bytes)
6642.838110703082

Irregular access patterns and non-contiguous views can drastically slow down computations on arrays because of non-sequential memory access. Copying the views into plain arrays speeds up the multiplication even with the cost of the copying operation.


@time begin
    copyto!(xtmp, view(x, inds))
    copyto!(Atmp, view(A, :, inds))
    sum(Atmp * xtmp)
end
  0.302451 seconds (209.90 k allocations: 14.376 MiB, 42.12% compilation time)
6642.838110702842
@time begin
    copyto!(xtmp, view(x, inds))
    copyto!(Atmp, view(A, :, inds))
    sum(Atmp * xtmp)
end
  0.174616 seconds (5 allocations: 624 bytes)
6642.838110702842