Improving performance

General tips

Preallocation

clear();

n = 10000000;

tic();
x = [];  % starts empty, so every assignment in the loop grows the array
for k = 1:n
  x(k) = sin(k);
end
toc(); % => Elapsed time is 5.311631 seconds.

clear x k;
tic();
x = zeros(1, n);  % allocate the full array once, then fill it in place
for k = 1:n
  x(k) = sin(k);
end
toc(); % => Elapsed time is 1.971644 seconds.

Vectorization

clear x k;  % keep n, which the vectorized call below still needs
tic();
x = sin(1:n);
toc(); % => Elapsed time is 1.172200 seconds.
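
As a sanity check, the three variants above (growing array, preallocated loop, vectorized call) should produce the same values. The following is a minimal sketch that uses a much smaller n than the timing runs; the variable names xGrow, xPre, and xVec are made up for this illustration.

```matlab
n = 1000;  % small n for illustration only; the timings above use n = 10000000

% Variant 1: the array grows on every iteration
xGrow = [];
for k = 1:n
  xGrow(k) = sin(k);
end

% Variant 2: preallocated, then filled in place
xPre = zeros(1, n);
for k = 1:n
  xPre(k) = sin(k);
end

% Variant 3: vectorized, no explicit loop
xVec = sin(1:n);

% The two loops perform identical scalar computations. The vectorized call
% is compared with a tolerance in case its rounding differs in the last bits.
sameLoops = isequal(xGrow, xPre);
maxDiff = max(abs(xVec - xPre));
```

Only the running time differs between the variants, not the result.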

List of functions often used for vectorization

all
True if all elements of a vector are true (nonzero)
any
True if any element of a vector is true (nonzero)
sum, cumsum
Sum and cumulative sum of elements in a vector
prod, cumprod
Product and cumulative product of elements in a vector
find
Find indices where the elements are nonzero
reshape
Reshape an array
repmat
Replicate an array
ind2sub, sub2ind
Convert between linear indices and subscripts
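
To illustrate how these functions stand in for explicit loops, here is a small sketch on made-up data. The vector v, the matrices A and B, and all result names are assumptions chosen only for this example.

```matlab
v = [3 0 -1 4 0 2];

hasZero  = any(v == 0);      % true if at least one element is zero
allSmall = all(v < 10);      % true if every element is below 10
idx      = find(v > 0);      % indices of the positive elements
total    = sum(v);           % sum of all elements
running  = cumsum(v);        % running sum, same length as v
p        = prod([1 2 3 4]);  % product of the elements
cp       = cumprod([1 2 3 4]);  % running product

A = reshape(1:6, 2, 3);      % 2-by-3 matrix, filled column by column
B = repmat([1 2], 2, 2);     % tile [1 2] into a 2-by-4 matrix

lin = sub2ind(size(A), 2, 3);       % linear index of A(2, 3)
[row, col] = ind2sub(size(A), 4);   % subscripts of the 4th element of A
```

Each line replaces what would otherwise be a loop with an accumulator or a counter.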

Example: Run-length decoding

First (ugly) implementation, which grows y inside the loop:

function y = runlengthdecode1(x, r)

y = [];
for k = 1:numel(x)
  y = [y x(k) + zeros(1, r(k))];
end
end

>> runlengthdecode1([42 0 3 7], [4 2 1 3])
ans =
    42    42    42    42     0     0     3     7     7     7

Preallocation:

function y = runlengthdecode2(x, r)

y = zeros(1, sum(r));

ind = 1;  % position in y where the next run starts
for k = 1:numel(x)
  y(ind:(ind + r(k) - 1)) = x(k);
  ind = ind + r(k);
end
end

Vectorization:

function y = runlengthdecode3(x, r)

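cR = cumsum(r);                    % position of the last element of each run
inc = zeros(1, cR(end));           % one entry per output element
inc([1 cR(1:(end - 1)) + 1]) = 1;  % mark the first element of each run (assumes all r >= 1)
ind = cumsum(inc);                 % for each output element, its index into x
y = x(ind);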
end
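
To see why runlengthdecode3 works, it helps to trace the intermediate variables for the example input used with runlengthdecode1. This sketch repeats the function body step by step. It assumes every run length is at least 1, since a zero in r would make two run starts coincide.

```matlab
x = [42 0 3 7];
r = [4 2 1 3];

cR = cumsum(r);                    % [4 6 7 10]
inc = zeros(1, cR(end));           % ten zeros, one per output element
inc([1 cR(1:(end - 1)) + 1]) = 1;  % ones at positions 1, 5, 7, 8
ind = cumsum(inc);                 % [1 1 1 1 2 2 3 4 4 4]
y = x(ind);                        % [42 42 42 42 0 0 3 7 7 7]
```

The cumulative sum turns the run-start markers into a step function that counts which run each output position belongs to, so a single indexing operation produces the decoded vector.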

Timing test:

function runlengthdecodetest(nx, maxR, nevals)

if nargin < 3
  nevals = 10000;
end

x = randperm(nx);
r = randi(maxR, size(x));

tic();
for k = 1:nevals
  y = runlengthdecode1(x, r);
end
toc();
clear k y;

tic();
for k = 1:nevals
  y = runlengthdecode2(x, r);
end
toc();
clear k y;

tic();
for k = 1:nevals
  y = runlengthdecode3(x, r);
end
toc();

end

>> runlengthdecodetest(200, 100)
Elapsed time is 6.191004 seconds.
Elapsed time is 3.630533 seconds.
Elapsed time is 1.209952 seconds.
>> runlengthdecodetest(1000, 20)
Elapsed time is 27.902129 seconds.
Elapsed time is 15.252254 seconds.
Elapsed time is 1.496761 seconds.
>> runlengthdecodetest(2000, 10)
Elapsed time is 52.159993 seconds.
Elapsed time is 29.201567 seconds.
Elapsed time is 1.585380 seconds.

In all three runs the expected output length, nx * (maxR + 1) / 2, is roughly 10,000 to 11,000 elements. The two loop-based versions slow down roughly in proportion to the number of runs nx, while the vectorized version stays nearly constant.

Profiling