Tuesday, August 02, 2005

Making Programs Faster

This afternoon I'm in Mark-Jason Dominus' tutorial Making Programs Faster. He started off by saying that the main point of this class is performance tuning and profiling tools, and that he's going to talk about how to build custom profiling tools from Perl's darkest recesses, but apparently,
Performance tuning is really, really, hard...
and the main theme in performance tuning is that nobody guesses right, so guessing doesn't work, which is a bit of surprise because there is a lot of programming that you can get through thorough flying by the seat of your pants. You have to be scientific and methodical.

MJD then went on to talk about the Schwartzian Transform as a classic example of optimising things,
@sorted_names = 
map { $_ ->[0] }
sort { $b->[1] <=> $a->[1] }
map { [ $_, -M $_ }
readdir D;
as opposed to,
sort { -M $b <=> -M $a } readdir D;
where the former runs about 50% faster than the latter...

The goal of the Schwartzian Transform is to reduce the amount of system time, although at the expense of spending some extra user time.
Optimisation, like any engineering, is a trade off...
However, you have to be careful about what you optimise, because you might even up making things slower. Always remember to ask what you spend and what you get in return.

Wall clock time is the most understandable way to measure performance, but measuring wallclock time directly is really hard due to pre-emptive multitasking. So although the amount of CPU time used won't vary too much between runs, the amount of wall clock time can vary enormously. Because of this we normally concentrate on measuring CPU time because we can, rather than because we want to...

For CPU bound programs if you can reduce the amount of computation carried out then you can significantly increase the user experience. But what about programs which are IO bound? You have to address the network latency time, there isn't any point in reducing the amount of computation is doomed to failure. What to do instead? The only real solution is to parallelise the IO. The final case of memory bound programs, where the run time is dominated by swapping time, are probably the hardest, here the only way to optimise things is to reduce the amount of memory your code uses, which isn't always that easy.

So how do you measure time? Well of course most Unix systems come with a time command, and Perl has its own built in time( ) function. But since Perl 5.7.2 Perl has come with the Time::HiRes module,
use LWP::Simple 'get';
use Time::HiRes 'time';

my $url = shift;
my $start = time( );
my $doc = get( $url );
my $elapsed = time( ) - $start;
print "$elapsed second(s) elapsed.\n";

1.49982798099518 second(s) elapsed.
Time::HiRes also provides high-resolution versions of sleep and alarm.

There is also the built-in times( ) function which measures CPU time,
($u, $s, $cu, $cs) = times( );
which can form the basis of a simple benchmarking program although the benchmarking apparatus itself can bias your results and you need to introduce null tests. So you might be better advised to use Perl's own benchmarking module called Benchmark which runs its own null tests. However MJD doesn't use Benchmark.pm anymore because,
The numbers are all over the place...
Heisenberg said that it's impossible to measure something without altering the measurement, and that's certainly true of benchmarking. Every benchmark introduces some bias into the thing it measures, you can either try and minimise this or try and quantify it and correct for it, which is the Benchmark.pm approach. This doesn't always work well...

Profilers are really interesting, they allow you approach optimisation scientifically rather than by guesswork. They will tell you which parts of the program contribute the most to the run time. If the program is CPU-bound, you should use a profiler and find out which bits are slowest and then concentrate on those. Although of course you should always try and figure out whether it would just be cheaper to throw hardware at the problem.

Perl comes with its own profiler called Devel::DProf and a post-processor called dprofpp so that you can use to poke around inside your code, including XS,
although apparently we're going to learn how to write our own profilers later on in the tutorial. But right now, it's coffee time...

Update: Turns out Brad was sitting in the back and blogging the tutorial as well, and I hadn't realised till we broke for coffee...

Update: ...and we're back after coffee, chocolate and popcorn.

We start off with the 90/10 rule, which says that 10% of your code accounts of 90% of the run time, and if anything MJD says that this may be too conservative. So you should optimise the program concentrating on the the crucial 10%, but it's vitally important to test your optimised program to make sure you still get the same answer as before. Oddly enough this is a step which many people miss out, although of course optimisation may fix an outstanding bug, which can become confusing.

The other thing to bear in mind is that you have to know when to stop, you'll reach a point where it's just not worth the effort to optimise any further. People waste a tremendous amount of time on performance improvements that aren't actually worthwhile, a lot of time, buying new hardware is a lot more cost effective and more likely to be successful.

At this point MJD went off and talked us through the optimisation of perldoc, which was a useful real world example of how to go about this stuff...

Moving onwards, another handy profiling tool is Devel::SmallProf which measures the contribution perl line rather than per subroutine like Devel::DProf, although the output format is somewhat opaque, and MJD claims it's actually easier to write your own profiler using the hooks intended for the Perl debugger. If your module is called Devel::* and it has a DB::DB( ) function then you get access to various magic variables...

At this point MJD talked us through how you can do this, and it doesn't actually look too bad, and he's right the output from his simple profiler is a lot easier to understand. Who'd have thought it?

He closed the tutorial by talking about optimisation blunders, using pseudo-hashes as his main example. Ouch!

Update: Brad had this say about the tutorial post-coffee break...