Thursday, May 2, 2013

Performance-testing Moodle

Background

The Open University is moving from Moodle 2.3.x to Moodle 2.4.3 in June. As is usual with a major upgrade, we (that is Rod and Derek) did some load testing to see if it still runs fast enough on our servers.

The first results were spectacularly bad! Moodle 2.4 was ten times slower. We were expecting Moodle 2.4 to be faster than 2.3. The first step was easy.

Performance advice: if you are running Moodle 2.4 with load-balanced web servers, don't use the default caching option that stores the data in moodledata on a shared network drive. Use memcache instead.

Take 2 was a lot better. Moodle 2.4 was now only about 1.5 times slower. Still not good enough, but in the right ball park. This blog post is about what we did next, which was to use the tools Moodle provides to work out what was slow and fix it.

Moodle's profiling tool

When your software is too slow, you need measurements to tell you which are the slow bits. Tools that do that are called profilers. One of the better profiling tools for PHP is called XHProf. The good news is that it has already been integrated into Moodle, and there is documenation about getting it working. Basically, you just need to install a PHP extension and turn on some options under Admin -> Development -> Profiling.

Since we already had the necessary PHP extension on our severs, that was really easy. The option I chose was to profile a page when &PROFILEME was added to the end of the URL, but there are several ways to control it.

Profiling output

Once you have profiled a page, the results appear under Admin -> Development -> Profiling runs.

This just lists the runs you have done. You need to click through to see the details of one run. That looks like a big table of all the function that were called as part of rendering the page, how many times each one was called, and how much time each function was responsible for.

Inclusive time is the amount of time taken by that function, and all the other functions it called. Exclusive time is the time taken by that function itself. Some people, like sam, seem to like that tabular view. I am a more visual person, so I tend to click on the [View full callgraph] link. That produces a crazily big image, showing graphically which functions call which other functions, and how much time is spent in each one. Here is the image for the run we are looking at:

You can click for the full-sized image. The yellow and red highlighting is applied automatically to try to highlight places where a lot of time is being spent. Sometimes it is helpful. Sometimes not. The red box in the bottom right is where we do database queries. No suprise there. We know calling the database is one of the slowest things you can do in Moodle code. The other red box is fetching data from memcache, which also involves connecting to another server.

What you have to look for is somewhere on the diagram that makes you go "What! We are spending how much time doing that?! That's surely not necessary." In this case, my eye was drawn to the far right of the yellow chain. When viewing this small course, we are fetching the course format object 134 times, and doing that is accounting for about 9% of the page-load time. There is no way we need to do that.

Fixing the problem

Once you have identified what appears to be a gross inefficiency, then you have to fix it. Mostly that follows the normal Moodle bug-fixing mechanics, but it is worth saying a bit about the different approaches you could take to changing the code:

  1. You might work out that what is being done is unnecessary. Then you can just remove it. For example MDL-39452 or MDL-39449. This is the best case. We have both improved performance and simplified the code.
  2. The next option is to take an overview of the code, and re-organise it to be more sensible. For example, in the course format case, we should probably just get the course format object once, and then use it. However, that would be a big risky change, which I did not want to do at this time (just before the Moodle 2.5 release). This approach does, however, also have the potential to simplify the code while improving performance.
  3. The next option is some other sort of refactoring. For example get_plugin_list was getting called a lot, and it in turn was calling the generic function clean_param to validate something. clean_param is quite fast, but not when you call it a thousand times. Therefore, it was worth extracting a simpler is_valid_plugin_name function. Doing that (MDL-39445) reduced the page load time by about 2%, but did make the code slighly more complex. Still, that is a worth-while trade off.
  4. The last option is to add caching. If you are doing the same thing repeatedly, and it is slow, and you can't avoid doing it repeatedly, then remember the answer the first time you compute it, and reuse it later. This should be the option of last resort because caches definitely increase the code complexity, and if you forget to clear them when necessary you introduce bugs. However, as in the course formats example we are looking at they can make a big difference. This fix reduced page-load times by 8%.

So far, we have found nine speed-ups we can make to Moodle 2.4 in the core Moodle code, and about the same in OU plugins. That is probably a 10-20% speed-up on most pages. Some of those are new problems introduced in Moodle 2.4. Others have been there since Moodle 2.0. We could really benefit from more people looking at Moodle profiling output often, and that is why I wrote this article.