Thursday, November 28, 2013

Bug fixing as knowledge creation

There are lots of ways you can think about bug-fixing: it is just a job that developers do; it is problem solving; and so on. Here I want to take one particular viewpoint: that it generates new knowledge about a software system.

One way to think about software is that it is the embodiment of a set of requirements, of how something should work. For example, Moodle can be thought of as a lot of knowledge about what software is required to teach online, and how that software should be designed. Finding and fixing bugs increases that pool of knowledge by identifying errors or omissions and then correcting them.

The bug fixing process

We can break down the process of discovering and fixing a bug into the following steps. The aim is to break the process down as finely as possible. As you read this list, please think about what new knowledge is generated during each step.

  1. Something's wrong: We start from a state of blissful ignorance. We think our software works exactly as it should, and then some blighter comes along and tells us "Did you know that sometimes ... happens?" Not what you want to hear, but just knowing that there is a problem is vital. In fact, the key moment is not when we are told about the problem, but when the user encountered it. Good users report the problems they encounter with an appropriate amount of detail.
  2. Steps to reproduce: Knowing the problem exists is vital, but not a great place to start investigating. What you need to know is something like "Using Internet Explorer 9, if you are logged in as a student, are on this page, click that link, and then on the next page press that button, then you get this error", and that all the details there are relevant. This is called the steps to reproduce. For some bugs they are trivial. For bugs that initially appear to be random, identifying the critical factors can be a major undertaking.
  3. Which code is broken: Once the developer can reliably trigger the bug, it is possible to investigate. The first thing to work out is which bit of code is failing; that is, which lines in which file.
  4. What is going wrong: As well as locating the problem code, you also have to understand why it is misbehaving. Is it making some assumption that is not true? Is it misusing another bit of code? Is it mishandling certain unusual input values? ...
  5. How should it be fixed: Once the problem is understood, then you can plan the general approach to solving it. This may be obvious given the problem, but in some cases there is a choice of different ways you could fix it, and the best approach must be selected.
  6. Fix the code: Once you know how you will fix the bug, you need to write the specific code that embodies that fix. This is probably the bit that most people think of when you say bug-fixing, but it is just a tiny part.
  7. No unintended consequences: This could well be the hardest step. You have made a change which fixed the specific symptoms that were reported, but have you changed anything else? Sometimes a bug fix in one place will break other things, which must be avoided. This is a place where peer review, getting another developer to look at your proposed changes, is most likely to spot something you missed.
  8. How to test this change: Given the changes you made, what should be done to verify that the issue is fixed, and that nothing else has broken? You can start with the steps to reproduce. If you work through those, there should no longer be an error. Given the previous point, however, other parts of the system may also need to be tested, and those need to be identified.
  9. Verifying the fix works: Given the fixed software, and the information about what needs to be tested, you then need to perform those tests and verify that everything works. In a project like Moodle, some of this knowledge can also be captured as an automated test (see the sketch after this list).
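
To show what capturing that knowledge might look like, here is a minimal, hypothetical sketch of a regression test. The class name and scenario are invented for illustration, and it assumes Moodle's PHPUnit test environment rather than being a standalone script:

    <?php
    // Hypothetical regression test that records the "steps to reproduce" as code,
    // so the verification in Step 9 can be repeated automatically in future.
    // Assumes Moodle's PHPUnit setup, where advanced_testcase is available.
    defined('MOODLE_INTERNAL') || die();

    class example_bugfix_testcase extends advanced_testcase {
        public function test_previously_failing_scenario() {
            $this->resetAfterTest();

            // Recreate the conditions from the steps to reproduce (Step 2).
            $course = $this->getDataGenerator()->create_course();

            // Assert that the behaviour which used to be wrong is now correct (Step 9).
            $this->assertNotEmpty($course->id);
        }
    }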

Some examples

In many cases you hardly notice some of the steps. For example, if the software always fails in a certain place with an informative error message, then that might jump you straight to Step 4. To give a recent example: MDL-42863 was reported to me with this error message:

Error reading from database

Debug info: ERROR: relation "mdl_questions" does not exist

LINE 1: ...ECT count(1) FROM mdl_qtype_combined t1 LEFT JOIN mdl_questi...

SELECT count(1) FROM mdl_qtype_combined t1 LEFT JOIN mdl_questions t2 ON t1.questionid = t2.id WHERE t1.questionid <> $1 AND t2.id IS NULL

[array (0 => '0',]

Error code: dmlreadexception

Stack trace:

  • line 423 of /lib/dml/moodle_database.php: dml_read_exception thrown
  • line 248 of /lib/dml/pgsql_native_moodle_database.php: call to moodle_database->query_end()
  • line 764 of /lib/dml/pgsql_native_moodle_database.php: call to pgsql_native_moodle_database->query_end()
  • line 1397 of /lib/dml/moodle_database.php: call to pgsql_native_moodle_database->get_records_sql()
  • line 1470 of /lib/dml/moodle_database.php: call to moodle_database->get_record_sql()
  • line 1641 of /lib/dml/moodle_database.php: call to moodle_database->get_field_sql()
  • line 105 of /admin/tool/xmldb/actions/check_foreign_keys/check_foreign_keys.class.php: call to moodle_database->count_records_sql()
  • line 159 of /admin/tool/xmldb/actions/XMLDBCheckAction.class.php: call to check_foreign_keys->check_table()
  • line 69 of /admin/tool/xmldb/index.php: call to XMLDBCheckAction->invoke()

The key bit is the part of the message that says where the error is. Well, there are really two errors here. One is that the Combined question type add-on refers to mdl_questions when it should be mdl_question. The other is that the XMLDB check should not die with a fatal error when presented with bad input like this. The point is, this was all immediately obvious to me from the error message.

Another recent example at the other extreme is MDL-42880. There was no error message in this case, but presumably someone noticed that some of their quiz settings had changed unexpectedly (Step 1). Then John Hoopes, who reported the bug, had to do some careful investigation to work out what was going on (Step 2). I am glad he did, because it was a pretty subtle thing, so in this case Step 2 was probably a lot of work. From there, it was obvious which bit of the code was broken (Step 3).

Note that Step 3 is not always obvious even when you have an error message. Sometimes things only blow up later as a consequence of something that went wrong before. To use an extreme example: suppose someone fills your kettle with petrol instead of water, and then you turn it on to make some tea and it blows up. The error is not with turning the kettle on to make tea, but with filling it with petrol. If all you have is shrapnel, finding out how the petrol ended up in the kettle might be quite hard. (I have no idea why I dreamt up that particular analogy!)

MDL-42880 also shows the difference between the conceptual Steps 4 and 5, and the code-related Steps 3 and 6. I thought the problem was with a certain variable becoming un-set at a certain time, so I coded a fix to ensure the value was never lost. That led to complex code that required a paragraph-long comment to try to explain it. Then I had a chat with Sam Marshall, who suggested that in fact the problem was that another bit of code was relying on the value of that variable, when actually the value was irrelevant. That led to a simpler (hence better) fix: stop depending on the irrelevant value.

What does this mean for software?

There are a few obvious consequences that I want to mention here, although they are well-known good practice. I am sure there are other more subtle ones.

First, you want the error messages output by your software to be as clear and informative as possible. They should lead you to where the problem actually occurred, rather than having symptoms only manifesting later. We don't want exploding kettles. There are some good examples of this in Moodle.

Second, because Step 7, ensuring that you have not broken anything else, is hard, it really pays to structure your software well. If your software is made up of separate modules that are each responsible for doing one thing, and which communicate in defined ways, then it is easier to know what the effect of changing a bit of one component is. If your software is a big tangle, who knows the effect of pulling on one string?

Third, it really pays to engage with your users and get them to report bugs. Of course, you would like to find and fix all the bugs before you release the software, but that is impossible. For example, we are working towards a new release of the OU's Moodle platform at the start of December. We have had two professional testers testing it for a month, and a few select users doing various bits of ad-hoc testing. That adds up to less than 100 person days. On the day the software is released, probably 50,000 different users will log in. 50,000 user days, even by non-expert testers, are quite likely to find something that no-one else noticed.

What does this mean for users?

The more important consequences are for users, particularly of open-source software.

  • Reporting bugs (Step 1) is a valuable contribution. You are adding to the collective knowledge of the project.

There are, however, some caveats that follow from the fact that in many projects, the number of developers available to fix bugs is smaller than the number of users reporting bugs.

  • If you report a bug that was already reported, then someone will have to find the duplicate and link the two. Rather than being a useful contribution, this just wastes resources, so try hard to find any existing bug report, and add your information there, before creating a new one.
  • You can contribute more by reporting good steps to reproduce (Step 2). It does not require a developer to work those out, and if you can do it, then there is more chance that someone else will do the remaining work to fix the bug. On the other hand, there is something of a knack to working out and testing which factors are, or are not, significant in triggering a bug. The chances are that an experienced developer or tester can work out the steps to reproduce quicker than you could. If, however, all the experienced developers are busy, then waiting for them to have time to investigate is probably slower than investigating yourself. If you are interested, you can develop your own diagnosis skills.
  • If you have an error message then copy and paste it exactly. It may be all the information you need to give to get straight to Step 3 or 4. In Moodle you can get a really detailed error message by setting 'debugging' to 'DEVELOPER' level and then triggering the bug again (see the sketch after this list). (One of the craziest mis-features in Windows is that most error pop-ups do not let you copy-and-paste the message. Paraphrased error messages can be worse than useless.)
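
As an aside, the same 'DEVELOPER' level of debugging can be forced in config.php rather than set through the admin interface. The following is only a sketch, based on the notes in Moodle's config-dist.php; use it on a test site rather than in production:

    // Add to config.php, before the require of lib/setup.php at the end of the file.
    @error_reporting(E_ALL | E_STRICT); // Report every PHP problem.
    @ini_set('display_errors', '1');    // Show PHP errors on the page.
    $CFG->debug = (E_ALL | E_STRICT);   // The same as the DEVELOPER debugging level.
    $CFG->debugdisplay = 1;             // Display debugging messages rather than just logging them.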

Finally, it is worth pointing out that Step 9 is another thing that can be done by the user, not a developer. For developers, it is really motivating when the person who reported the bug bothers to try it out and confirm that it works. This can be vital when the problem only occurs in an environment that the developer cannot easily replicate (for example an Oracle-specific bug in Moodle).

Conclusion

Thinking about bug finding and fixing as knowledge creation puts a more positive spin on the whole process than is normally the case. This shows that lots of people, not just developers and testers, have something useful to contribute. This is something that open source projects are particularly good at harnessing.

It also shows why it makes sense for an organisation like the Open University to participate in an open source community like Moodle: Bugs may be discovered before they harm our users. Other people may help diagnose the problem, and there is a large community of developers with whom we can discuss different possible solutions. Other people will help test our fixes, and can help us verify that they do not have unintended consequences.

Wednesday, July 3, 2013

Assessment in Higher Education conference 2013

Last week I attended the Assessment in Higher Education conference in Birmingham. This was the least technology-focused and most education-focused conference that I have been to. It was interesting to learn about the bigger picture of assessment in universities. One reason for going was that Sally Jordan wanted my help running a 'masterclass' about producing good computer-marked assessment on the first morning. I may write more about that in a future post. Also I presented a poster about all the different online assessment systems the OU uses. Again, a possible future topic. For now I will summarise the other parts of the conference, the presentations I listened to.

One thing I was surprised to discover is how much the National Student Survey (NSS) is influencing what universities do. Clearly it is seen as something that prospective students pay attention to, and attracting students is important. However, as Margaret Price from Oxford Brookes University, the first keynote speaker, said, the kind of assessment that students like (and so rate highly in the NSS) is not necessarily the most effective educationally. That is, while student satisfaction is something worth considering, students don't have all the knowledge needed to evaluate the teaching they receive. Also, she suggested that the NSS ratings have made universities more risk-averse about trying innovative forms of assessment and teaching.

The opening keynote was about "Assessment literacy", making the case that students need to be taught a bit about how assessment works, so they can engage with it most effectively. That is, we want the students to be familiar with the mechanics of what they are being asked to do in assessment, so those mechanics don't get in the way of the learning; but more than that, we want the students to learn the most from all the tasks we set them, and assessment tasks are the ones students pay the most attention to, so we should help the students understand why they are being asked to do them. I dispute one thing that Margaret Price said. She said that at the moment, if assessment literacy is developed at all, that only happens serendipitously. However, in my time as a student, there were plenty of times when it was covered (although not by that name) in talks about study skills and exam technique.

Another interesting realisation during the conference was that, at least in that company (assessment experts), the "Assessment for learning" agenda is taken as a given. It is used as the reason that some things are done, but there is no debate that it is the right thing to do.

Something that is a hot topic at the moment is more authentic assessment. I think it is partly driven by technology improvements making it possible to capture a wider range of media, and to submit eportfolios. It is also driven by a desire for better pedagogy, and assessments that by their design make plagiarism harder. If you are being asked to apply what you have learned to something in your life (for example in a practice-based subject like nursing) it is much harder to copy from someone else.

I ended up going to all three of the talks given by OU folks. Is it really necessary to go to Birmingham to find out what is going on in the OU? Well, it was a good opportunity to do so. The first of these was about an on-going project to review the OU's assessment strategy across the board. So far a set of principles has been agreed (for example, affirming the assessment-for-learning approach, although that is nothing new at the OU) and these are about to be disseminated more widely. There was an interesting slide (which provoked some good discussion) pointing out that you need to balance top-down policy and strategy with bottom-up implementation that allows each faculty to use assessment that is effective for its particular discipline. There was another session, by people from Ulster and Liverpool Hope universities, that also talked about the top-down/bottom-up balance/conflict in policy changes.

In this OU talk, someone made a comment along the lines of, "why is the OU re-thinking its assessment strategy? You are so far ahead of us already and we are still trying to catch up." I am familiar with hearing comments like that at education technology conferences. It was interesting to learn that we are also held in similarly high regard for policy. The same questioner also used the great phrase "the OU effectively has a sleeper-cell in every other university, in the associate lecturer you employ". That makes what the OU does sound far more excitingly aggressive than it really is.

In the second OU talk, Janet Haresnape described a collaborative online activity in a third level environmental science course. These are hard to get right. I say that having suffered one as a student some years ago. This one seems to have been more successful, at least in part because it was carefully structured. Also, it started with some very easy tasks (put your name next to a picture and count some things in it), and the students could see the relationship between the slightly artificial task and what would happen in real fieldwork. Janet has been surveying and interviewing students to discover their attitudes towards this activity. The most interesting finding is that weaker students comment more, and more favourably, on the collaboration than the better students. They have more to learn from their peers.

The third OU talk was Sally Jordan talking about the ongoing change in the science faculty from summative to formative continuous assessment. It is early days, but they are starting to get some data to analyse. Nothing I can easily summarise here.

The closing keynote was about oral assessment. In some practice-based subjects like law and veterinary medicine it is an authentic activity. Also, a viva is a dialogue, which allows the extent of the student's knowledge to be probed more deeply than a written exam. With an exam script, you can only mark what is there. If something the student has written is not clear, then there is no way to probe that further. That reminded me of what we do in the Moodle quiz. For example in the STACK question type, if the student has made a syntax error in the equation they typed, we ask them to fix it before we try to grade it. Similarly, in Pattern-match questions, we spell check the student's answer and let them fix any errors before we try to grade it. Also, with all our interactive questions, if the student's first answer is wrong, we give them some feedback then let them try again. If they can correct their mistake themselves, then they get some partial credit. Of course computer-marked testing is typically used to assess basic knowledge and concepts, whereas an oral exam is a good way to test higher-order knowledge and understanding, but the parallel of enabling two-way dialogue between student and assessor appealed to me.

This post is getting ridiculously long, but I have to mention two other talks. Calum Delaney from Cardiff Metropolitan University reported on some very interesting work trying to understand what academics think about as they mark essays. Some essays are easy to grade, and an experienced marker will rapidly decide on the grade. Others, particularly those that are partly right and partly wrong, take a lot longer, as the marker weighs up the conflicting evidence. Overall, though, the whole marking process struck me, a relative outsider, as scarily subjective.

John Kleeman, chair of QuestionMark, UK, summarised some psychology research that shows that the best way to learn something so that you can remember it again is to test yourself on it, rather than just reading it. That is, if you want to be able to remember something, then practise remembering it. It sounds obvious when you put it that way, but the point is that there is strong evidence to back up that statement. So, clearly you should all now go and create Moodle (or QuestionMark) quizzes for your students. Also, in writing this long rambling blog post I have been practising recalling all the interesting things I learned at the conference, so I should remember them better in future. If you read this far, thank you, and I hope you got something out of it too.

Monday, July 1, 2013

Open University question types ready for Moodle 2.5

This is just a brief note to say that Colin Chambers has now updated all the OU question types to work with Moodle 2.5. Note that we are not yet running this code ourselves on our live servers, since we are on Moodle 2.4 until the autumn, but Phil Butcher has tested them all and he is very thorough.

You can download all these question types (and others) from the Moodle add-ons database.

Thanks to Dan Poltawski's Github repository plugin, that is easier than it used to be. Still, updating 10 plugins is pretty dull, so I feel like I have contributed a bit. I also reviewed most of the changes and fixed the unit tests.

I hope you enjoy our add-ons. I am wondering whether we should add the drag-and-drop questions types to the standard Moodle release. What do you think? If that seems like a good idea to you, I suggest posting something enthusiastic in the Moodle quiz forum. It will be easier to justify adding these question types to standard Moodle if lots of non-OU Moodlers ask for it.

Friday, June 21, 2013

Book review: Computer Aided Assessment of Mathematics by Chris Sangwin

The book cover

Chris is the brains behind the STACK online assessment system for maths, and he has been thinking about how best to use computers in maths teaching for well over ten years. This book is the distillation of what he has learned about the subject.

While the book focusses specifically on online maths assessment, it takes a very broad view of that topic. Chris starts by asking what we are really trying to achieve when teaching and assessing maths, before considering how computers can help with that. There are broadly two areas of mathematics: solving problems and proving theorems. Computer assessment tools can cope with the former, where the student performs a calculation that the computer can check. Getting computers to teach the student to prove theorems is an outstanding research problem, which is touched on briefly at the end of the book.

So the bulk of the book is about how computers can help students master the parts of maths that are about performing calculations. As Chris says, learning and practising these routine techniques is the un-sexy part of maths education. It does not get talked about very much, but it is important for students to master these skills. Doing this requires several problems to be addressed. We want randomly generated questions, so we have to ask what it means for two maths questions to be basically the same, and equally difficult. We have to solve the problem of how students can type maths into the computer, since traditional mathematics notation is two dimensional, but it is easier to type a single line of characters. Chris precedes this with a fascinating digression into where modern maths notation came from, something I had not previously considered. It is more recent than you probably think.

Example of how STACK handles maths input

If we are going to get the computer to automatically assess mathematics, we have to work out what it is we are looking for in students' work. We also need to think about the outcomes we want, namely feedback for the student to help them learn; numerical grades to get a measure of how much the student has learned; and diagnostic output for the teacher, identifying which types of mistakes the students made, which may inform subsequent teaching decisions. Having discussed all the issues, Chris then brings them together by describing STACK. This is an opportune moment for me to add the disclaimer that I worked with Chris for much of 2012 to re-write STACK as a Moodle question type. That was one of the most enjoyable projects I have ever worked on, so I am probably biased. If you are interested, you can try out a demo of STACK here.

Chris rounds off the book with a review of other computer-assisted assessment systems for maths that have notable features.

In summary, this is a fascinating book for anyone who is interested in this topic. Computers will never replace teachers. They can only automate some of the more routine things that teachers do. (They can also be more available than teachers, making feedback on students' work available even when the teacher is not around.) To automate anything via a computer you really have to understand that thing. Hence this book about computer-assisted assessment gives a range of great insights into maths education. Highly recommended. Buy it here!

Thursday, May 2, 2013

Performance-testing Moodle

Background

The Open University is moving from Moodle 2.3.x to Moodle 2.4.3 in June. As is usual with a major upgrade, we (that is Rod and Derek) did some load testing to see if it still runs fast enough on our servers.

The first results were spectacularly bad! Moodle 2.4 was ten times slower. We were expecting Moodle 2.4 to be faster than 2.3. The first step was easy.

Performance advice: if you are running Moodle 2.4 with load-balanced web servers, don't use the default caching option that stores the data in moodledata on a shared network drive. Use memcache instead.

Take 2 was a lot better. Moodle 2.4 was now only about 1.5 times slower. Still not good enough, but in the right ball park. This blog post is about what we did next, which was to use the tools Moodle provides to work out what was slow and fix it.

Moodle's profiling tool

When your software is too slow, you need measurements to tell you which are the slow bits. Tools that do that are called profilers. One of the better profiling tools for PHP is called XHProf. The good news is that it has already been integrated into Moodle, and there is documentation about getting it working. Basically, you just need to install a PHP extension and turn on some options under Admin -> Development -> Profiling.

Since we already had the necessary PHP extension on our servers, that was really easy. The option I chose was to profile a page when &PROFILEME was added to the end of the URL, but there are several ways to control it (see the config.php sketch below).
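
For completeness, the profiling switches can also be forced in config.php rather than set through the admin page. The names below are the ones I believe the admin page stores (profilingenabled and friends), so treat this as a sketch and check them against your Moodle version:

    // In config.php; equivalent to the options under Admin -> Development -> Profiling.
    $CFG->profilingenabled = true;  // Turn on the XHProf integration.
    $CFG->profilingallowme = true;  // Profile a page when &PROFILEME is added to its URL.
    $CFG->profilingautofrec = 0;    // Do not automatically profile every Nth request.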

Profiling output

Once you have profiled a page, the results appear under Admin -> Development -> Profiling runs.

This just lists the runs you have done. You need to click through to see the details of one run. That looks like a big table of all the functions that were called as part of rendering the page, how many times each one was called, and how much time each function was responsible for.

Inclusive time is the amount of time taken by that function, and all the other functions it called. Exclusive time is the time taken by that function itself. Some people, like sam, seem to like that tabular view. I am a more visual person, so I tend to click on the [View full callgraph] link. That produces a crazily big image, showing graphically which functions call which other functions, and how much time is spent in each one. Here is the image for the run we are looking at:

You can click for the full-sized image. The yellow and red highlighting is applied automatically to try to highlight places where a lot of time is being spent. Sometimes it is helpful. Sometimes not. The red box in the bottom right is where we do database queries. No surprise there. We know calling the database is one of the slowest things you can do in Moodle code. The other red box is fetching data from memcache, which also involves connecting to another server.

What you have to look for is somewhere on the diagram that makes you go "What! We are spending how much time doing that?! That's surely not necessary." In this case, my eye was drawn to the far right of the yellow chain. When viewing this small course, we are fetching the course format object 134 times, and doing that is accounting for about 9% of the page-load time. There is no way we need to do that.

Fixing the problem

Once you have identified what appears to be a gross inefficiency, then you have to fix it. Mostly that follows the normal Moodle bug-fixing mechanics, but it is worth saying a bit about the different approaches you could take to changing the code:

  1. You might work out that what is being done is unnecessary. Then you can just remove it. For example MDL-39452 or MDL-39449. This is the best case. We have both improved performance and simplified the code.
  2. The next option is to take an overview of the code, and re-organise it to be more sensible. For example, in the course format case, we should probably just get the course format object once, and then use it. However, that would be a big risky change, which I did not want to do at this time (just before the Moodle 2.5 release). This approach does, however, also have the potential to simplify the code while improving performance.
  3. The next option is some other sort of refactoring. For example, get_plugin_list was getting called a lot, and it in turn was calling the generic function clean_param to validate something. clean_param is quite fast, but not when you call it a thousand times. Therefore, it was worth extracting a simpler is_valid_plugin_name function. Doing that (MDL-39445) reduced the page load time by about 2%, but did make the code slightly more complex. Still, that is a worthwhile trade-off.
  4. The last option is to add caching. If you are doing the same thing repeatedly, and it is slow, and you can't avoid doing it repeatedly, then remember the answer the first time you compute it, and reuse it later (there is a sketch of this after the list). This should be the option of last resort, because caches definitely increase the code complexity, and if you forget to clear them when necessary you introduce bugs. However, as in the course formats example we are looking at, they can make a big difference. This fix reduced page-load times by 8%.
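
To make that last option concrete, here is a rough sketch of the "remember the answer" approach, in the spirit of the course format fix. This is not the actual patch: the wrapper class is invented for illustration, although course_get_format() is a real Moodle function.

    // Hypothetical illustration of the caching approach (not the real MDL change).
    class cached_course_format {
        /** @var array Format objects already fetched, indexed by course id. */
        protected static $formats = array();

        public static function get($courseid) {
            if (!isset(self::$formats[$courseid])) {
                // Imagine this is the slow call that was being made 134 times per page.
                self::$formats[$courseid] = course_get_format($courseid);
            }
            return self::$formats[$courseid];
        }

        public static function reset() {
            // Caches must be cleared when the underlying data changes, otherwise
            // the speed-up comes at the cost of correctness bugs.
            self::$formats = array();
        }
    }

The reset() method is the important caveat: a cache that cannot be cleared is a bug waiting to happen.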

So far, we have found nine speed-ups we can make to Moodle 2.4 in the core Moodle code, and about the same in OU plugins. That is probably a 10-20% speed-up on most pages. Some of those are new problems introduced in Moodle 2.4. Others have been there since Moodle 2.0. We could really benefit from more people looking at Moodle profiling output often, and that is why I wrote this article.

Monday, April 8, 2013

Do different media affect the effectiveness of teaching and learning?

Here is some thirty-year-old research that still seems relevant today:

Richard E. Clark, 1983, "Reconsidering Research on Learning from Media", Review of Educational Research, Vol. 53, No. 4 (Winter, 1983), pp. 445-459.

This paper reviews the seemingly endless research asking whether teaching using Media X is inherently more effective than the same instruction in Media Y. Given the age of the paper, you will not be surprised to learn that the research cited covers media like Radio for education (a hot research topic in the 1950s), Television (1960s) and early computer-assisted assessment (1970s). Clark's earliest citation, however, is "since Thorndike (1912) recommended pictures as a labor saving device in instruction." Images as novel educational technology! Well, they were once. The point is that basically the same research was done for each new medium to come along, and it was all equally inconclusive.

Here are some choice quotes that nicely summarise the article:

Based on this consistent evidence, it seems reasonable to advise strongly against future media comparison research. Five decades of research suggest that there are no learning benefits to be gained from employing different media in instruction, regardless of their obviously attractive features or advertised superiority.

Where learning benefits are at issue, therefore, it is the method, aptitude, and task variables of  instruction that should be investigated.

The best current evidence is that media are mere vehicles that deliver instruction but do not influence student achievement any more than the truck that delivers our groceries causes changes in our nutrition

Clark does not miss the fact that the effectiveness of learning is not the only problem in education:

Of course there are instructional problems other than learning that may be influenced by media (e.g., costs, distribution, the adequacy of different vehicles to carry different symbol systems, equity of access to instruction).

Since this paper is a thorough review of a lot of the available literature, it contains a number of other gems. For example:

Ksobiech (1976) told 60 undergraduates that televised and textual lessons were to be (a) evaluated, (b) entertainment, or (c) the subject of a test. The test group performed best on a subsequent test with the evaluation group scoring next best and the entertainment group demonstrating the poorest performance.

Hess and Tenezakis (1973) ... Among a number of interesting findings was an unanticipated attribution of more fairness to the computer than to the teacher.

I wonder how much later research fell into the trap outlined in this paper. I am not familiar enough with the literature, but presumably there were lots of papers about the world-wide web, VLEs, social media, mobiles and tablets for education. I wonder how novel they really were?

Today, computers and the internet have made media cheaper to produce and more readily accessible than ever before. This removes many constraints on the instructional techniques available, but what this old paper is reminding us is that when it comes to teaching, it is not the media that matters, but the instructional design.