Showing posts with label Quizzes. Show all posts

Thursday, March 2, 2017

The online assessment Turing test

In the back-channel for yesterday's Transforming Assessment webinar (which I would recommend) Geoff Crisp asked me:

"Tim - what about the Turing test - what if a student could not tell the difference between a computer giving them feedback and the teacher?"

I think this is a really nice question. Food for some quite wide-ranging thoughts about what online assessment should be.

On the whole, I stand by the snap answer I gave at the time. Computers and human markers (at least currently) have different strengths. The computer (having been set up by a human teacher) can be there at any time the student wants, able to give immediate feedback on a range of more or less basic practice activities. A human teacher is only available at certain times, but is able to give feedback in a more holistic way. They may know the student, and have some concept of how their subject is best learned, and on that basis give the student some really meaningful advice about how best to improve.

I know there is adaptive learning hype about computers being able to know the students and therefore offer contextual advice, but I will believe that when (if) I see it.

If you are thinking about designing a course today, you are much better off understanding the strengths and weaknesses of both computer-marked and conventional assessment, and using each for where they work best. There is currently nothing to be gained by trying to hide where you are using computer marking.

I think a reasonable analogy is with searching for information. You might do a Google search, which will give you the kind of results that a computer can give. Alternatively, you could ask a friend who knows more about the subject, and they will give you a different sort of advice about what to read. Neither is necessarily better. In some cases one of the two approaches might be clearly more appropriate. In other cases either would do. If you really want to understand something in depth, you probably want to use both approaches, and it is an advantage that each will give different results that help in different ways.

If we are trying to create self-regulating learners, then it can be a merit that a computer only gives basic templated feedback, which could be as little as just right/wrong. The learner needs to do more work themself to get from the feedback to an action to take to improve. This is not always a benefit, but it could be.

So, while the idea of an assessment Turing test is usefully thought-provoking, I don't think it is educationally useful, at least not for the foreseeable future. Having said that, the nicest thing anyone said about an online assessment system I helped build is still

"It's like having a tutor at your elbow."

The key word there is "like", which is not the same as "indistinguishable from".

Thursday, June 25, 2015

The Assessment in Higher Education conference 2015

I am writing this on a sunny evening, sitting in a pub overlooking Old Turn Junction, part of the Birmingham Canal Navigations, with a well-earned beer after two fascinating and exhausting days at the Assessment in Higher Education conference.

It was a lovely conference. The organising committee had set out to try to make it friendly and welcoming and they succeeded. There was a huge range of interesting talks and since I could not clone myself I was not able to go to them all. I am not going to describe individual talks in detail, but rather draw out what seemed to me to be the common themes.

A. It is all just assessment

The first keynote speaker (Maddalena Taras) said this directly, and there were a couple of other things along the same lines: the split between formative and summative assessment is a false dichotomy. If an assessment does not actually evaluate the students (give them a grade, hence summative) then it misses the main function of an assessment. This is not the same as saying that every assessment must be high stakes. Conversely, in the words of a quote Sally reminded me of:

“As I have noted, summative assessment is itself ‘formative’. It cannot help but be formative. That is not an issue. At issue is whether that formative potential of summative assessment is lethal or emancipatory. Does formative assessment exert its power to discipline and control, a power so possibly lethal that the student may be wounded for life? … Or, to the contrary, does summative assessment allow itself to be conquered by the student, who takes up a positive, even belligerent stance towards it, determined to extract every human possibility that it affords?” (Boud & Falchikov (2007) Rethinking Assessment in Higher Education: Learning for the Longer Term)

The first keynote was a critique of Assessment for Learning (AfL). Not that assessment should not help students learn. Of course it should. Rather, the speaker questioned some of the specific recommendations from the AfL literature in a thought-provoking way.

The 'couple of other things' were a talk from Jill Barber of the School of Pharmacy at Birmingham, about giving students quite detailed feedback after their end of year exams; and Sally Jordan’s talk (which I did not go to since I have heard it internally at the OU) about the OU Science faculty's semantic wranglings about whether all their assessment gets called “summative” or “formative”, and hence how the marks for the separate assignments are added up, without changing what the assessed tasks actually are.

B. Do students actually attend to feedback?

The second main theme came out many times. On the one hand, students say they like feedback and demand more of it. On the other hand, there is quite a lot of evidence that many students don’t spend much time reading it, or that when they do, it does not necessarily help them to improve. So, there were various approaches suggested for getting students to engage more with feedback, for example by

  • giving feedback via a screen-cast video, talking them through their essay while highlighting with the mouse (David Wright & Damian Kell, Manchester Metropolitan University). Would students spend 19 minutes reading and digesting written feedback on an essay? Well, they got a 19-minute (on average) video, one of the few cases where some students thought it was too much feedback!
  • making feedback a dialogue. That is, encouraging students to write questions on the cover sheet when they hand the work in, for their tutor to answer as part of the feedback. That was what Rebecca Westrup from the University of East Anglia was doing.
  • Stefanie Sinclair from the OU religious studies department talked about work she had done with John Butcher & Anactoria Clarke assessing reflection in an access module (a module designed to help students with limited prior education to develop the skills they need to study at Level 1). Again, this was to encourage students to engage in a dialogue with their tutor about their learning.
  • using peer and self assessment, so that students spend more time engaging with the assessment criteria by applying them to their own and others’ work. Also, Maddalena Taras suggested that initially you give the students their work back without the marks or feedback (but only after a couple of weeks, once it has been marked), so that they reread it with fresh eyes before they get the feedback first, then the marks.
  • There was another peer assessment talk, by Blazenka Divjak of the University of Zagreb, using the Moodle Workshop tool. The results were along the same lines as other similar talks I have seen (for example at the OU, where we are also experimenting with the same tool). Peer assessment activities do help students understand the assessment criteria. They also help students appreciate more what teachers do. Students’ grading of their peers, particularly in aggregate, is reliable, and comparable to the teacher’s grade.
  • A case of automated marking (in this case of programming exercises) where students clearly did engage with the feedback because they were allowed to submit repeatedly until they got it right. In computer programming this is authentic. It is what I do when doing Moodle development. (Stephen Nutbrown, Su Beesley, Colin Higgins, University of Nottingham and Nottingham Trent University.)
  • It was also something Sally touched on in her part of our talk. With the OU's computer-marked questions with multiple tries, students say the feedback helps them learn and that they like it. However, if you look at the data or usability lab observations, you see that in some cases some students are clearly paying no attention to the feedback they get.
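The point that peer grades are reliable "particularly in aggregate" rests on combining several peers' assessments of the same piece of work. Here is a minimal sketch, assuming a simple weighted mean with one weight per assessment. Moodle's Workshop combines assessments in a broadly similar way, but this is not its code; the function name `aggregate_peer_grades` is hypothetical and the grades are invented.

```python
from typing import Optional, Sequence


def aggregate_peer_grades(grades: Sequence[float],
                          weights: Optional[Sequence[float]] = None) -> float:
    """Combine the grades several peers gave one submission into a single grade.

    Uses a weighted mean; with no weights given, every assessment counts equally.
    """
    if not grades:
        raise ValueError("need at least one peer grade")
    if weights is None:
        weights = [1.0] * len(grades)
    if len(weights) != len(grades):
        raise ValueError("need exactly one weight per grade")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must add up to more than zero")
    return sum(g * w for g, w in zip(grades, weights)) / total_weight


# Made-up example: four peers grade the same piece of work out of 100.
print(aggregate_peer_grades([62, 70, 58, 66]))  # 64.0
```

Averaging is what smooths out an individual reviewer who is unusually harsh or generous; giving some assessments a larger weight lets a teacher trust particular reviewers more.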

C. The extent to which transparency in assessment is desirable

This was the main theme of the closing keynote by Jo-Anne Baird from the Oxford University Centre for Educational Assessment. The proposition is that if assessment is not transparent enough, it is unfair because students don’t really understand what is expected of them. A lot of university assessment is probably towards this end of the spectrum.

Conversely, if assessment is too transparent it encourages pathological teaching to the test. This is probably where most school assessment is right now, and it is exacerbated by the excessive ways in which school exams are made high stakes for the student, the teacher and the school. Too much transparency (and risk aversion) in setting assessment can lead to exams that are too predictable, hence students can get a good mark by studying just those things that are likely to be on the exam. This damages validity, and more importantly damages education.

Between these extremes there is a desirable balance where students are given enough information about what is required of them to enable them to develop as knowledgeable and independent learners, without causing pathological behaviour. That, at least, is the hope.

While this was the focus of the last keynote, it resonated with several of the talks I listed in the previous section.

D. The NSS & other acronyms

The National Student Survey (NSS) is clearly a driver for change initiatives at a lot of other universities (as it was two years ago). It is, or at least is perceived to be, a big deal. Therefore it can be used as a catalyst or lever to get people to review and change their assessment practices, since feedback and assessment are things that students often give low ratings for. This struck me as odd, since I am not aware of this happening at the OU. I assume that is because the OU has so far scored highly in the NSS.

The other acronym floating around a lot was TESTA. This seems to be a framework for reviewing the assessment practice of a whole department or degree programme. In one case, however (a talk by Jessica Evans & Simon Bromley of the OU faculty of Social Science) their review was done before TESTA was invented, though along similar lines.

Finally

A big thank-you to Sue Bloxham and the rest of the organising team for putting together a great conference. Roll on 2017.

Friday, May 1, 2015

eSTEeM conference 2015

eSTEeM is an organising group within the Open University which brings together people doing research into teaching and learning in the STEM disciplines: Science, Technology, Engineering and Maths. Naturally enough for the OU, a lot of that work revolves around educational technology. Once a year they hold a conference for people to share what they have been doing. I went along because I like to see what people have been doing with our VLE, and hence how we could make it work better for students and staff in the future.

It started promisingly enough, in a way. As I walked in to get my cup of coffee after registration, I was immediately grabbed by Elaine Moore from Chemistry, who had two Moodle quiz issues. She wanted the Combined question type to use the HTML editor for multiple-choice choices (good idea, we should put that on the backlog), and she had a problem with a Pattern-match question which we could not get to the bottom of over coffee.

But, on to the conference itself. I cannot possibly cover all the keynotes and parallel sessions so I will pick the highlights for me.

Assessment matters to students

The first was a graph from Linda Price’s keynote. Like most universities, at the end of every module we give students a satisfaction survey. The graph showed the students' ratings in response to three of the questions:

  • Overall, I am satisfied with the quality of this module.
  • I had a clear understanding of what was required to complete the assessed activities.
  • The assessment activities supported my learning.

There was an extremely strong correlation between those. This is nothing very new. We know that assessment is important in determining the ‘hidden curriculum’, and hence we like to think that ‘authentic assessment’ is important. However, it is interesting to see how much this matters to students. Previously, I would not even have been sure that they could tell the difference.

The purpose of education

Into the parallel sessions. There was an interesting talk from the module team for TU100 My digital life, the first course in the computing and technology degrees. Some of the things they do in that module’s teaching are based around the importance of language, even in science. Learning a subject can be thought of as learning to construct the world of that subject through language, or as they put it, humanities-style thinking in technology education. Unsurprisingly, many students don’t like that: “I came to learn computing, not writing.” However, there is a strong correlation between students’ language use and their performance in assessments. By the end of the module some students do come to appreciate what the module is trying to do.

This talk linked back to another part of Linda Price’s keynote. An important (if now rather clichéd) question for formal education is “What is education for when everything is now available on the web?” (or one might put that more crudely as “Why should students pay thousands of pounds for one of our degrees?”). The answer that came to me during this talk was “To make them do things they don’t enjoy, because we know it will do them good.” OK, so that is a joke, but I would like to think there is a nugget of truth in there.

Peer assessment

On to more specifically Moodle-related things. A number of modules have been trying out Moodle’s Workshop activity. That is a tool for peer review or peer assessment. The talk was from the SD815 Contemporary issues in brain and behaviour module team. Their activity involved students recording a presentation (PowerPoint + audio) that critically evaluated a research article. Then they had to upload them to the Moodle Workshop, and review each other’s presentations as managed by the tool. Finally, they had to take their slide-cast, the feedback they had received, and a reflective note on the process and what they had learned from it, and hand it all in to be graded by their tutor.

Now for OU students (at least) collaborative activities, particularly those tied to assessments, are typically another thing we make them do that they don’t enjoy. This activity added the complexities of PowerPoint and/or Open Office and recording audio. However, it seems to have worked remarkably well. Students appreciated all the things that are normally said about peer review: getting to see other approaches to the same task; practising the skills of evaluating others’ work and giving constructive feedback. In this case the task was one that the students (healthcare workers studying at postgraduate level) could see was relevant to their vocation, which brings us back to visibly authentic assessment, and the student satisfaction graph from the opening keynote.

For me the strongest message from this talk, however, is what was not said. There was very little said about the Moodle workshop tool, beyond a few screen-grabs to show what it looked like. It seems that this is a tool that does what you need it to do without getting in the way, which is normally what you want from educational technology.

Skipping briefly over

There are many more interesting things I could write about in detail, but to keep this post to a reasonable length I will just skim over the posters with lunch. For example,

And, some of the other talks:

  • a session on learning analytics, in this case with a neural net, to try to identify early on those students (on TU100 again) who get through all the continuous assessment tasks with a passing grade, only to fail the end of module assessment, so that they could be targeted for extra support.
  • a whole morning on the second day, where we saw nine different approaches to remote experiments from around the world, for example the Open University's remote-controlled telescope PIRATE. It left me with the impression that this sort of thing is much more feasible and worthwhile than I had previously thought.

Our session on online Quizzes

The only other session I will talk about in detail is the one I helped run. It was a ‘structured discussion’ about the OU’s use of iCMAs (which is what we call Moodle quizzes). I found this surprisingly nerve-wracking. I have given plenty of talks before, and you prepare them. You know what you are going to say, and you are fairly sure it is interesting. Therefore you are pretty sure what is going to happen. For this session, we just had three questions, and it was really up to the attendees how well it worked.

We did allow ourselves two five-minute presentations. We started with Frances Chetwynd showing some of the different ways quizzes are used in modules’ teaching and assessment strategies. This set up a 10-minute discussion of our first question: “How can iCMAs best be used as part of an assessment strategy?”. For this, delegates were seated around four tables, with four or five participants and a facilitator at each table. The tables were covered with flip-chart paper for people to write on.

We were using a World Café format, so after 10 minutes I rang my bell, and all the delegates moved to a new table while the facilitators stayed put. Then, in new groups, they discussed the second question: "How can we engage students using iCMAs?" The facilitators were meant to give a brief summary of what the previous group at their table had said, as a bridge, before moving on to the new question with the new group.

After 10 minutes on the second question, we had the other five-minute talk from Sally Jordan, showing some examples of what we have previously learned through scholarship into how iCMAs work in practice. (If you are interested in that, come to my talk at either MoodleMoot IE UK 2015 or iMoot 2015.) This led nicely, after one more round of musical chairs, to the third question: "Where next for iCMAs? Where next for iCMA scholarship?". Finally we wrapped up with a brief plenary to capture the key answers to that last question from each table.

By the end, I really had no idea how well it had gone, although each time I rang my bell, I felt I was interrupting really good conversations. Subsequently, I have written up the notes from each table, and heard from some of the attendees that they had found it useful and interesting, so that is a relief. We had a great team of facilitators (Frances, Jon, Ingrid, Anna) which helped. I would certainly consider using the same format again. With a traditional presentation, you are always left with the worry that perhaps you got more out of preparing and delivering the presentation than any of the audience did out of listening. In this case, I am sure the audience got much more out of it than me, which is no bad thing.

Wednesday, July 3, 2013

Assessment in Higher Education conference 2013

Last week I attended the Assessment in Higher Education conference in Birmingham. This was the least technology and most education conference that I have been to. It was interesting to learn about the bigger picture of assessment in universities. One reason for going was that Sally Jordan wanted my help running a 'masterclass' about producing good computer-marked assessment on the first morning. I may write more about that in a future post. Also I presented a poster about all the different online assessment systems the OU uses. Again a possible future topic. For now I will summarise the other parts of the conference, the presentations I listened to.

One thing I was surprised to discover is how much the National Student Survey (NSS) is influencing what universities do. Clearly it is seen as something that prospective students pay attention to, and attracting students is important. However, as Margaret Price from Oxford Brookes University, the first keynote speaker, said, the kind of assessment that students like (and so rate highly in the NSS) is not necessarily the most effective educationally. That is, while student satisfaction is something worth considering, students don't have all the knowledge needed to evaluate the teaching they receive. Also, she suggested that the NSS ratings have made universities more risk-averse in trying innovative forms of assessment and teaching.

The opening keynote was about "Assessment literacy", making the case that students need to be taught a bit about how assessment works, so they can engage with it most effectively. That is, we want the students to be familiar with the mechanics of what they are being asked to do in assessment, so those mechanics don't get in the way of the learning; but more than that, we want the students to learn the most from all the tasks we set them, and assessment tasks are the ones students pay the most attention to, so we should help the students understand why they are being asked to do them. I dispute one thing that Margaret Price said. She said that at the moment, if assessment literacy is developed at all, that only happens serendipitously. However, in my time as a student, there were plenty of times when it was covered (although not by that name) in talks about study skills and exam technique.

Another interesting realisation during the conference was that, at least in that company (assessment experts), the "Assessment for learning" agenda is taken as a given. It is used as the reason that some things are done, but there is no debate that it is the right thing to do.

Something that is a hot topic at the moment is more authentic assessment. I think it is partly driven by technology improvements making it possible to capture a wider range of media, and to submit eportfolios. It is also driven by a desire for better pedagogy, and assessments that by their design make plagiarism harder. If you are being asked to apply what you have learned to something in your life (for example in a practice-based subject like nursing) it is much harder to copy from someone else.

I ended up going to all three of the talks given by OU folks. Is it really necessary to go to Birmingham to find out what is going on in the OU? Well, it was a good opportunity to do so. The first of these was about an on-going project to review the OU's assessment strategy across the board. So far a set of principles have been agreed (for example affirming the assessment for learning approach, although that is nothing new at the OU) and they are about to be disseminated more widely. There was an interesting slide (which provoked some good discussion) pointing out that you need to balance top-down policy and strategy with bottom-up implementation that allows each faculty to use assessment that is effective for their particular discipline. There was another session by people from Ulster and Liverpool Hope universities that also talked about the top-down/bottom-up balance/conflict in policy changes.

In this OU talk, someone made a comment along the lines of, "why is the OU re-thinking its assessment strategy? You are so far ahead of us already and we are still trying to catch up." I am used to hearing comments like that at education technology conferences. It was interesting to learn that we are also held in similarly high regard for policy. The same questioner also used the great phrase "the OU effectively has a sleeper-cell in every other university, in the associate lecturers you employ". That makes what the OU does sound far more excitingly aggressive than it really is.

In the second OU talk, Janet Haresnape described a collaborative online activity in a third level environmental science course. These are hard to get right. I say that having suffered one as a student some years ago. This one seems to have been more successful, at least in part because it was carefully structured. Also, it started with some very easy tasks (put your name next to a picture and count some things in it), and the students could see the relationship between the slightly artificial task and what would happen in real fieldwork. Janet has been surveying and interviewing students to discover their attitudes towards this activity. The most interesting finding is that weaker students comment more, and more favourably, on the collaboration than the better students. They have more to learn from their peers.

The third OU talk was Sally Jordan talking about the ongoing change in the science faculty from summative to formative continuous assessment. It is early days, but they are starting to get some data to analyse. Nothing I can easily summarise here.

The closing keynote was about oral assessment. In some practice-based subjects like law and veterinary medicine it is an authentic activity. Also, a viva is a dialogue, which allows the extent of the student's knowledge to be probed more deeply than a written exam. With an exam script, you can only mark what is there. If something the student has written is not clear, then there is no way to probe that further. That reminded me of what we do in the Moodle quiz. For example in the STACK question type, if the student has made a syntax error in the equation they typed, we ask them to fix it before we try to grade it. Similarly, in Pattern-match questions, we spell check the student's answer and let them fix any errors before we try to grade it. Also, with all our interactive questions, if the student's first answer is wrong, we give them some feedback then let them try again. If they can correct their mistake themselves, then they get some partial credit. Of course computer-marked testing is typically used to assess basic knowledge and concepts, whereas an oral exam is a good way to test higher-order knowledge and understanding, but the parallel of enabling two-way dialogue between student and assessor appealed to me.
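The dialogue described above (check the input is well-formed before grading, then give feedback and another try, with partial credit for a later correct answer) can be sketched as a small scoring loop. This is a minimal sketch, assuming a fixed penalty per incorrect try (one third here) and a hypothetical bracket-balancing check standing in for real input validation such as STACK's. Moodle's interactive behaviour has a similar "penalty for each incorrect try" setting, but this is not its code, and `brackets_balanced` and `score_with_tries` are illustrative names.

```python
from typing import Callable, Iterable


def brackets_balanced(text: str) -> bool:
    """Hypothetical validation step: reject input with unbalanced round brackets."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def score_with_tries(responses: Iterable[str], is_correct: Callable[[str], bool],
                     max_tries: int = 3, penalty: float = 1 / 3) -> float:
    """Return the fraction of the available mark earned, from 0.0 to 1.0."""
    tries_used = 0
    for response in responses:
        if not brackets_balanced(response):
            # Invalid input: the student is asked to fix it, and no try is used up.
            continue
        if is_correct(response):
            # Partial credit: lose a fixed penalty for each earlier wrong try.
            return max(0.0, 1.0 - penalty * tries_used)
        tries_used += 1  # wrong answer: feedback is shown, then the next try
        if tries_used >= max_tries:
            break
    return 0.0
```

For example, with the correct answer `2*(x+1)`, the sequence `2*(x+1`, `2*x+1`, `2*(x+1)` scores about 0.67: the first response is bounced back as invalid without costing a try, the second is wrong, and the third is right.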

This post is getting ridiculously long, but I have to mention two other talks. Calum Delaney from Cardiff Metropolitan University reported on some very interesting work trying to understand what academics think about as they mark essays. Some essays are easy to grade, and an experienced marker will rapidly decide on the grade. Others, particularly those that are partly right and partly wrong, take a lot longer weighing up the conflicting evidence. Overall though, the whole marking process struck me, a relative outsider, as scarily subjective.

John Kleeman, chair of QuestionMark, UK, summarised some psychology research that shows that the best way to learn something so that you can remember it again is to test yourself on it, rather than just reading it. That is, if you want to be able to remember something, then practice remembering it. It sounds obvious when you put it that way, but the point is that there is strong evidence to back up that statement. So, clearly you should all now go and create Moodle (or QuestionMark) quizzes for your students. Also, in writing this long rambling blog post I have been practising recalling all the interesting things I learned at the conference, so I should remember them better in future. If you read this far, thank you, and I hope you got something out of it too.

Monday, July 1, 2013

Open University question types ready for Moodle 2.5

This is just a brief note to say that Colin Chambers has now updated all the OU question types to work with Moodle 2.5. Note that we are not yet running this code ourselves on our live servers, since we are on Moodle 2.4 until the autumn, but Phil Butcher has tested them all and he is very thorough.

You can download all these question types (and others) from the Moodle add-ons database.

Thanks to Dan Poltawski's GitHub repository plugin, that is easier than it used to be. Still, updating 10 plugins is pretty dull, so I feel like I have contributed a bit. I also reviewed most of the changes and fixed the unit tests.

I hope you enjoy our add-ons. I am wondering whether we should add the drag-and-drop question types to the standard Moodle release. What do you think? If that seems like a good idea to you, I suggest posting something enthusiastic in the Moodle quiz forum. It will be easier to justify adding these question types to standard Moodle if lots of non-OU Moodlers ask for it.

Friday, June 21, 2013

Book review: Computer Aided Assessment of Mathematics by Chris Sangwin

Chris is the brains behind the STACK online assessment system for maths, and he has been thinking about how best to use computers in maths teaching for well over ten years. This book is the distillation of what he has learned about the subject.

While the book focusses specifically on online maths assessment, it takes a very broad view of that topic. Chris starts by asking what we are really trying to achieve when teaching and assessing maths, before considering how computers can help with that. There are broadly two areas of mathematics: solving problems and proving theorems. Computer assessment tools can cope with the former, where the student performs a calculation that the computer can check. Getting computers to teach the student to prove theorems is an outstanding research problem, which is touched on briefly at the end of the book.

So the bulk of the book is about how computers can help students master the parts of maths that are about performing calculations. As Chris says, learning and practising these routine techniques is the un-sexy part of maths education. It does not get talked about very much, but it is important for students to master these skills. Doing this requires several problems to be addressed. We want randomly generated questions, so we have to ask what it means for two maths questions to be basically the same, and equally difficult. We have to solve the problem of how students can type maths into the computer, since traditional mathematics notation is two dimensional, but it is easier to type a single line of characters. Chris precedes this with a fascinating digression into where modern maths notation came from, something I had not previously considered. It is more recent than you probably think.

Example of how STACK handles maths input
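The question of when two randomly generated questions are "basically the same, and equally difficult" usually comes down to constraining the random parameters. Here is a minimal sketch using a hypothetical question, "expand (x + a)(x + b)", with illustrative names `ExpandVariant` and `make_variant`; it is not how STACK itself generates variants. Seeding the generator (for example per student) makes each variant reproducible, and the constraints rule out variants that would be easier or of a different kind.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpandVariant:
    """One random variant of the question 'expand (x + a)(x + b)'."""
    a: int
    b: int

    def prompt(self) -> str:
        return f"Expand (x + {self.a})(x + {self.b})"

    def expected_coefficients(self) -> tuple[int, int, int]:
        # (x + a)(x + b) = x^2 + (a + b)x + ab
        return (1, self.a + self.b, self.a * self.b)


def make_variant(seed: int) -> ExpandVariant:
    rng = random.Random(seed)
    # Constraints meant to keep every variant "basically the same" question:
    # positive single-digit constants (no zero, which would make it easier),
    # and a != b (a == b would give a perfect square, a different pattern).
    a, b = rng.sample(range(1, 10), 2)
    return ExpandVariant(a, b)
```

Calling `make_variant` with the same seed always gives the same variant, which matters if a student revisits a question or a teacher needs to see exactly what that student was asked.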

If we are going to get the computer to automatically assess mathematics, we have to work out what it is we are looking for in students' work. We also need to think about the outcomes we want, namely feedback for the student to help them learn; numerical grades to get a measure of how much the student has learned; and diagnostic output for the teacher, identifying which types of mistakes the students made, which may inform subsequent teaching decisions. Having discussed all the issues, Chris then brings them together by describing STACK. This is an opportune moment for me to add the disclaimer that I worked with Chris for much of 2012 to re-write STACK as a Moodle question type. That was one of the most enjoyable projects I have ever worked on, so I am probably biassed. If you are interested, you can try out a demo of STACK here.
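To make those three outcomes (feedback, a numerical grade, and a diagnostic note for the teacher) concrete, here is a minimal sketch for one hypothetical question, "expand (x + a)(x + b)". It assumes answers are typed on one line, using ^ for powers and an explicit * for multiplication. It tests equivalence numerically at a few random points, which is a swapped-in simplification: STACK itself relies on the Maxima computer algebra system and its potential response trees. The one "common mistake" checked (dropping the middle term) is made up for illustration, and all the names are hypothetical.

```python
import ast
import operator
import random
from dataclasses import dataclass

# Only these operators are allowed in a student's one-line answer.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Pow: operator.pow}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expr: str, x: float) -> float:
    """Evaluate a one-line expression in x using only numbers, x, + - * and ^."""
    tree = ast.parse(expr.replace("^", "**"), mode="eval")

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "x":
            return x
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > 10:
                raise ValueError("exponent too large")
            return _BIN_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported input")

    return walk(tree)


def equivalent(student: str, reference: str, trials: int = 5, seed: int = 0) -> bool:
    """Numerical equivalence test: compare both expressions at random x values."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-10, 10)
        s, r = evaluate(student, x), evaluate(reference, x)
        if abs(s - r) > 1e-9 * max(1.0, abs(r)):
            return False
    return True


@dataclass
class Outcome:
    mark: float    # numerical grade, 0.0 to 1.0
    feedback: str  # shown to the student
    note: str      # diagnostic tag for the teacher


def mark_expansion(answer: str, a: int, b: int) -> Outcome:
    """Mark an answer to 'expand (x + a)(x + b)'."""
    correct = f"x^2 + {a + b}*x + {a * b}"
    missing_middle_term = f"x^2 + {a * b}"  # a made-up common mistake
    try:
        if equivalent(answer, correct):
            return Outcome(1.0, "Correct.", "correct")
        if equivalent(answer, missing_middle_term):
            return Outcome(0.0, f"Check the middle term: you also need {b}*x + {a}*x.",
                           "missing-middle-term")
    except (SyntaxError, ValueError, TypeError, ZeroDivisionError, OverflowError):
        # A real system would ask the student to fix the input rather than grade it.
        return Outcome(0.0, "Please check how you typed your answer.", "invalid")
    return Outcome(0.0, "Not quite. Multiply each term in the first bracket "
                        "by each term in the second.", "incorrect")
```

With a = 2 and b = 3, `(x+2)*(x+3)` is marked correct, `x^2 + 6` gets the middle-term hint and the diagnostic note `missing-middle-term`, and `x^2+5x+6` is flagged as invalid because this sketch insists on an explicit `*`. Restricting input to a small set of syntax nodes, rather than calling Python's `eval`, keeps arbitrary code out of the marking step.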

Chris rounds off the book with a review of other computer-assisted assessment systems for maths that have notable features.

In summary, this is a fascinating book for anyone who is interested in this topic. Computers will never replace teachers. They can only automate some of the more routine things that teachers do. (They can also be more available than teachers, giving students feedback on their work even when the teacher is not around.) To automate anything via a computer you really have to understand that thing. Hence this book about computer-assisted assessment gives a range of great insights into maths education. Highly recommended. Buy it here!

Wednesday, June 20, 2012

Interesting workshop about self-assessment tools

About 10 days ago, I took part in a very interesting workshop about the use of assessment tools to promote learning:

Self-assessment: strategies and software to stimulate learning

The day was organised by Sally Jordan from the OU, and Tony Gardner-Medwin from UCL, and supported by the HEA, so thanks to all of them for making it happen.

People talked about different assessment tools (not all Moodle), how they were getting students to use them, and in some cases what evidence there was for whether that was effective.

Parts of the event were recorded, and you can now access the recordings at http://stadium.open.ac.uk/stadia/preview.php?whichevent=1955&s=1. There is a total of 3.5 hours of video there, so you may not want to watch it all. My presentation is in Part 3, which also includes the final discussion, all in 30 minutes, and provides a reasonable summary of the day.

Despite having spent the whole day at the event, and discussed various aspects of self-assessment, I don't think we reached a single definition of what self-assessment is. Actually, I think it is clear that it is not one thing. Rather, it is a useful way of looking at many different things, from the point of view of what would most help students learn.

One of the tools discussed during the day was PeerWise. If you have not come across it yet, then you should take a look, because it looks like a very interesting tool. There is a good introduction on YouTube.


Tuesday, September 27, 2011

What I want to build next

Earlier this summer I finally finished the new Moodle question engine, which was released as part of Moodle 2.1. As you might expect with such a large change, a number of minor bugs were not spotted until after the release, but I (and others) have fixed quite a lot of them, and we will continue to fix more. I want to say "thank you" to everyone who has taken the time to report the problems they encountered. Pleasingly, some people, including Henning Bostelmann, Tony Levi, Pierre Pichet, Jamie Pratt, Joseph Rézeau and Jean-Michel Vedrine have not only been sending in bug reports, but also submitting bug fixes. I would like to thank them in particular. I don't know whether this means that the new Moodle development processes are working well and encouraging more contributors, or that I released the new question engine full of trivial bugs.

At the moment, apart from fixing bugs, we are about two months away from the end of the OU's one-year project to move from Moodle 1.9 to 2.x and implement a lot of new features at the same time. In the eAssessment area, we had about 30 work-packages to do, of which finishing the question engine was by far the biggest, and we have about 6 left to go. Most of the remaining tasks are at least started, but finishing them is what I, and the developers on my team, will be doing in the near future.

I have, however, been thinking ahead a bit, and I have an idea for what I would like to build, should I be given the opportunity. Honesty compels me to say these are not my ideas. I stole them from other people, and there are proper acknowledgements at the end of this post. I wanted to post about this because: 1. in my experience, if you post about your half-baked ideas, people will be able to suggest ways to make them better; and 2. I am hoping that at least one course-team at the OU will see this and say "we would love to use this in our teaching" because that might persuade the powers that be to let me build this.

Rationale

The Moodle quiz is a highly structured, teacher-controlled tool for building activities where students attempt questions. What I want to create is a more open activity where students can take charge of their learning using a bank of questions to practise some skill where the computer can mark their efforts and give feedback. For the sake of argument, I have been calling this the "Question practice" activity module.

The entry page

When a student goes into a Question practice activity, they see a front screen that lists all the categories in the question bank for this activity.

Next to each category, there are statistics for how the student has performed on that category so far. For example, it might say "recently you scored 19/21 (90%); all time you scored 66/77 (86%)." The categories are nested, and there is a subtotal for each category.
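The front-page statistics are easy to sketch. The Python below is only an illustration under stated assumptions: categories are written as hypothetical "Parent/Child" paths, each attempt at a question is recorded with its mark and finish time, and "recently" simply means "finished after some cut-off time". Rolling every attempt up into all of its parent categories gives the nested subtotals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Attempt:
    category: str    # hypothetical "Parent/Child" path, e.g. "Arithmetic/Addition"
    mark: float
    max_mark: float
    finished: int    # Unix timestamp


def category_totals(attempts, since=0):
    """Sum (mark, max_mark) per category, rolling each attempt up into its parents."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for attempt in attempts:
        if attempt.finished < since:
            continue
        parts = attempt.category.split("/")
        for depth in range(1, len(parts) + 1):
            subtotal = totals["/".join(parts[:depth])]
            subtotal[0] += attempt.mark
            subtotal[1] += attempt.max_mark
    return {name: (got, out_of) for name, (got, out_of) in totals.items()}


def describe(got, out_of):
    """Format a score like '19/21 (90%)'."""
    percent = round(100 * got / out_of) if out_of else 0
    return f"{got:g}/{out_of:g} ({percent}%)"


def summary(attempts, category, recent_since):
    """The front-page text for one category."""
    recent = category_totals(attempts, recent_since).get(category, (0.0, 0.0))
    all_time = category_totals(attempts).get(category, (0.0, 0.0))
    return (f"recently you scored {describe(*recent)}; "
            f"all time you scored {describe(*all_time)}.")
```

With 19/21 recorded in one subcategory after the cut-off and 47/56 in another before it, summary() for the parent category produces exactly the sentence quoted above.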

At the bottom of the page is an Attempt some questions… button. This takes the student to the …

Start a session form

… where they set up what practice they would like to do. Students can select which categories they want to attempt questions from. They may also be able to choose how many questions they want. For example "Give me 10 questions", "As many as possible in 20 minutes", or "Keep going until I say stop". The teacher will probably be able to constrain the range of options available here.
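One way to serve all three kinds of session from a single piece of code is a generator of question ids: "Give me 10 questions" takes the first ten, while the timed and open-ended options just keep asking for the next one. This is a Python sketch assuming a hypothetical layout for the question bank; it is not existing Moodle code.

```python
import random
from itertools import islice


def question_stream(bank, categories, rng):
    """Yield question ids from the chosen categories for as long as asked.

    bank maps a category path (hypothetical "Parent/Child" notation) to a list
    of question ids; choosing a category also includes its subcategories.
    Every question is used once before any is repeated. Yields nothing if
    the chosen categories contain no questions.
    """
    pool = sorted({
        qid
        for path, qids in bank.items()
        for cat in categories
        if path == cat or path.startswith(cat + "/")
        for qid in qids
    })
    while pool:
        batch = pool[:]
        rng.shuffle(batch)
        yield from batch


# "Give me 10 questions": take the first ten from the stream.
bank = {"Addition": [1, 2, 3], "Addition/Hard": [4, 5], "Multiplication": [6]}
ten = list(islice(question_stream(bank, ["Addition"], random.Random(0)), 10))
```

For "Keep going until I say stop", the attempt page would instead call next() on the same generator each time the student wants another question.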

Once they are satisfied, they click the "Start session" button. This takes them to the …

Attempt page

… which shows the student the first question, chosen according to the criteria they set. There will probably be a display of running statistics "In this session you have attempted 0 questions so far". The question will contain the controls necessary for attempting the question. There will also probably be a "Please stop, I'm bored" button, so the student can leave at any time.

When they get back to the front page, the statistics will have been updated.

If the student crashes out of a session, then when they go back in, the front page will have a "Continue current session" button.

Overall activity log

One batch of attempting questions will be called a 'practice session'. The system will keep track of all the sessions that the student has done, and what they achieved during each session.

The front page will have a link to a page that lists all of the student's sessions, showing what they achieved in each. This provides more detail than is visible on the front page.

Possible extensions

That is the key idea. Here are some further things that could be added to the basic concept.

Milestones

The system could recognise targets, goals, or achievements (I'm not sure of the best name). That would be something like "Attempt more than 10 questions from the Hard category, and score more than 90%". If the student achieves that target at any time, the system would notice, and the achievement would be recorded on the front page and in the session log in an ego-boosting way (e.g. a medal icon).
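A target like that is straightforward to check after each session. Here is a minimal Python sketch, with a hypothetical Milestone record. It assumes the target is judged on everything the student has ever done in that category, not within a single session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Milestone:
    """Hypothetical target: more than min_questions attempted in one category,
    with an overall score above min_percent."""
    name: str
    category: str
    min_questions: int
    min_percent: float


def achieved(milestone, results):
    """results: (category, mark, max_mark) for every question the student has
    attempted, across all their practice sessions (an assumption)."""
    relevant = [(mark, out_of) for cat, mark, out_of in results
                if cat == milestone.category]
    if len(relevant) <= milestone.min_questions:
        return False
    got = sum(mark for mark, _ in relevant)
    total = sum(out_of for _, out_of in relevant)
    return total > 0 and 100 * got / total > milestone.min_percent


def newly_achieved(milestones, results, already):
    """Names of milestones to celebrate now (e.g. with a medal icon)."""
    return [m.name for m in milestones
            if m.name not in already and achieved(m, results)]
```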

The whole point of this activity is to be as student-driven as possible, so should students be able to define their own targets or goals? Should students be able to set goals for each other?

Locks / Conditional access

The activity could also have locks, so that the student cannot access the questions in the Multiplication category until after they have scored more than 80% in the Hard addition category. Of course, unlocking a new category could be an achievement. We appear to be flirting with the gamification buzz-word here, so I will stop.
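A lock could simply be a rule attached to a category. The Python sketch below assumes each locked category has a single prerequisite category and a percentage that must be exceeded; the percentages would come from the same statistics shown on the front page. All names are hypothetical.

```python
def unlocked(categories, locks, percent_by_category):
    """Return the categories a student may currently practise.

    locks is a hypothetical map from a locked category to
    (prerequisite category, percent that must be exceeded).
    Categories without a lock are always available.
    """
    available = []
    for category in categories:
        rule = locks.get(category)
        if rule is None:
            available.append(category)
            continue
        prerequisite, required = rule
        if percent_by_category.get(prerequisite, 0.0) > required:
            available.append(category)
    return available
```

Checking for a category that has just become available after a session is then also an easy way to award the "unlocked a new category" achievement.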

Performance comparison

Should there be any way for students to compare their performance, or achievements, with their peers? We are definitely getting to features that should be left until version 2.0. Let's get a basic system working first, but make sure it is extensible.

How hard would this be to build

I think this would not require too much work because a lot of the necessary building blocks already exist in Moodle. The question bank already handles questions organised into categories, and we would just use that. Similarly, the attempt page and practice sessions are very easy to manage with the new question engine.

The real work is in two places. First, building the start session form, and then writing the code that randomly selects questions based on the options chosen. Second, deciding what statistics to compute, and then writing the code to compute them.

Of course, before we can start writing any code, there are still a lot of details of the design to decide. Also, one must not forget things like backup and restore, creating the database tables, and all the usual Moodle plumbing.

Overall, I think it would take a few months' work to get a really useful activity built.

Credit where credit is due

I said earlier that I got most of these ideas from other people. To start with, things like this have been mooted in the Moodle quiz forum over the years. The discussions there usually start from Computerised Adaptive Testing, whereas this idea is about student-driven use of questions. I think the latter is more interesting. (As a mathematician, I think CAT is an interesting concept. I just don't think it would make a useful Moodle activity.)

The real inspiration for this came at a meeting in London at the start of 2011. That meeting was at UCL with Tony Gardner-Medwin, who has already built a system something like this, but stand-alone, not in Moodle; and David Emmett from the University of Queensland, Brisbane (who was giving a seminar). David had been hoping to get a grant to build something like this proposal (in Moodle) but that did not pan out. We did, however, have a very interesting discussion, and that is where I got the key idea that this sort of question practice was most interesting if you could give the student control of their own learning as much as possible.

We have also discussed ideas like this on-and-off for a long time at the OU. There have, however, been a lot of other things we needed to deal with first. We had to do a lot of work getting the quiz system working to our satisfaction (a strand of work that eventually led to the new question engine). We had to sort out the reporting of grades, including working with Moodle HQ on the new gradebook in Moodle 1.9, and integrating Moodle with our student information system. We had to make new question types that our users wanted. Only now can we start to think seriously about the last piece of the jigsaw: more activities that use all the question infrastructure we have built. I hope this post is a useful starting point for discussing what one of those activities might be.

Wednesday, March 9, 2011

Moodle bug tracker

Today, between fixing bugs and reviewing code, I spent a bit of time tinkering with my dashboard in the Moodle bug tracker. I was trying to make it as clear as possible which issues need my attention. I am quite pleased with the result:

tracker screen grab

The issue statistics widget does not just show you the pretty graphs; it also makes it easy to get at those issues. For example, if I click on 1.9.12 in the My targetted issues box, then I am taken to a list of those 11 issues. I have used that particular widget for a while; the new parts are the boxes just below it.

I added My: Ongoing pull requests to make it easy to find the things I have submitted for inclusion in next week's weekly build (hopefully). Thanks to Eloy, that filter is now available to everyone in the jira-developers group.

The next two boxes let me quickly get to issues with patches attached. There is an emerging convention of adding the label patch to such issues, where the attached code needs to be reviewed. This makes finding such issues very much easier. The whole point of the new development processes is to encourage more people to contribute patches, and then ensure those patches get looked at, rather than just sitting there for years. (Here is an example I found yesterday of what used to happen: MDL-13983). Therefore, as quiz maintainer, I need to be able to see easily if anyone has submitted any relevant patches. I also want easy access to bugs with patches that I created or commented on.

Having brought it up, can I say that I am quite happy with how the new processes are working so far. My impression is that since they were introduced, I have received more usable bug fixes for the quiz than in the past. I am not sure how much causality one can claim there, however, since as well as the new processes, we also had the Moodle 2.0 release. Moodle 2.0 has plenty of minor bugs that are ripe for fixing. So, it may just be that we are seeing lots of bug fixes because there are lots of bugs.

At the other end, it has made it a bit easier to get my code reviewed. Well, finished code where I have created a pull request certainly gets reviewed. It is still sometimes a problem to get comments on work-in-progress, because everyone is so busy.

Wednesday, October 20, 2010

The new question engine - how it works

In my last blog post I promised more details of the new question engine "in a week or so". Unfortunately, things like work (mainly fixing minor bugs in the aforementioned question engine); rehearsing for a rather good concert that will take place this Friday; and buying curtains for my new flat, have been rather getting in the way. Now is the time to remedy that.

Last time I explained roughly what a question engine was, and that I had made big changes to the one in Moodle. I now want to say more about what the question engine has to do, and how the new one does it.

The key processes

There are three key code-paths:

To display a page of the quiz:

  1. Load the outline data about the student's attempt.
  2. Hence work out which questions are on this page.
  3. Load the details of this student's attempt at those questions within this quiz attempt.
  4. Get the display settings to use (should the marks, feedback, and so on be visible to this user).
  5. Taking note of the state of each question (from step 3), update the display options. For example, if the student has not answered the question yet, we don't want to display the feedback now, even if we will later.
  6. Using the details of the current state of each question, and the relevant display options, output the HTML for the question.

The bits in italic are the bits done by the quiz. The rest is done by the question engine.
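Step 5 is the subtle one, so here is a Python sketch of it. Moodle itself is written in PHP, and the state names and option fields below are simplified, hypothetical stand-ins. The idea is that a question's state can only ever remove things from what the quiz settings allow.

```python
from dataclasses import dataclass, replace
from enum import Enum


class QuestionState(Enum):
    """Hypothetical, simplified question states."""
    NOT_ANSWERED = "not answered"
    ANSWER_SAVED = "answer saved"
    GRADED = "graded"


@dataclass(frozen=True)
class DisplayOptions:
    show_marks: bool
    show_feedback: bool


def options_for_question(quiz_options, state):
    """Step 5: narrow the quiz-level display options for one question.

    The quiz settings say what this user may see in principle; the question's
    state can only take things away. Before anything has been graded there is
    no mark or feedback to show yet, even if the settings will allow it later.
    """
    if state is not QuestionState.GRADED:
        return replace(quiz_options, show_marks=False, show_feedback=False)
    return quiz_options
```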

To start a new attempt at the quiz:

  1. From the list of questions in the quiz, work out the layout for this attempt. (This is only really interesting if you are shuffling the order of the questions, or selecting questions randomly.)
  2. Create an initial state for each question in the quiz attempt. (This is when things like the order of multiple choice options are randomised.)
  3. Write the initial states to the database.

To process a student's responses to a page of the quiz:

  1. From the submitted data, work out which attempt and questions are affected.
  2. Load the details of the current state of those question attempts.
  3. Sort all the submitted data into the bits belonging to each question. (The bits of data have names like 'q188:1_answer'. The prefix before the '_' identifies this as data belonging to the first question in attempt 188, and the bit after the '_' identifies this as the answer to that question.)
  4. For each question, process its data to work out whether the state has changed and, if so, what the new state is. This is really the most important procedure, and I will talk more about it in the next section.
  5. Write any updated states to the database.
  6. Update the overall score for the attempt, if appropriate, and store it in the database.
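The naming convention in step 3 makes the sorting easy. Here is a Python sketch that groups submitted fields by attempt and question. The regular expression is an assumption based only on the 'q188:1_answer' example above, not taken from Moodle's code.

```python
import re
from collections import defaultdict

# Hypothetical pattern for names like 'q188:1_answer': attempt 188, question 1,
# field 'answer'. Everything after the first '_' is the field name.
_FIELD = re.compile(r"^q(\d+):(\d+)_(.+)$")


def group_submitted_data(post_data):
    """Step 3: sort submitted fields into per-question dictionaries.

    Returns {(attempt_id, slot): {field_name: value}}. Anything that does not
    look like question data (e.g. page navigation buttons) is ignored.
    """
    grouped = defaultdict(dict)
    for name, value in post_data.items():
        match = _FIELD.match(name)
        if match is None:
            continue
        attempt_id, slot, field_name = match.groups()
        grouped[(int(attempt_id), int(slot))][field_name] = value
    return dict(grouped)
```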

These outlines are, of course, oversimplifications. If you really want to know what happens, you will have to read the code.

There are other processes which I will not cover in detail. These include finishing a quiz attempt, re-grading an attempt, fetching data in bulk for the quiz reports, and deleting attempts.

The most important procedure

This is the step where we take the data submitted by the student for one particular question and use it to update the state of that question.

Moodle has long had the concept of different question types. It can handle short-answer questions, multiple-choice questions, matching questions, and so on. Naturally, what happens when updating the state of the question depends on the question type. That is true for both the old and new code.

Now, however, there is a new concept in addition to question types: the concept of 'question behaviours'.

In previous versions of Moodle, there was a rather cryptic setting for the quiz: Adaptive mode, Yes/No. That affected what happened as the student attempted the quiz. When adaptive mode was off, the student would go through the quiz entering their response to each question. Those responses were saved. At the end, they would submit the quiz and everything would be marked, all at once. Then the student could review their attempt (either immediately, or later, depending on the quiz settings) to see their marks and the feedback. When adaptive mode was on, the student could submit each question individually during the attempt. If they were right first time, they got full marks. If they were wrong, they got some feedback and could try again for reduced marks.
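Roughly, adaptive-mode marking can be thought of as "the best try, less a penalty for each earlier try". The Python below is an approximation of that idea with a hypothetical penalty parameter, not Moodle's actual code.

```python
def adaptive_mark(fractions, max_mark, penalty):
    """Mark for one question in the style of the old adaptive mode (a sketch).

    fractions: how right each successive try was, from 0.0 to 1.0.
    penalty: hypothetical fraction of max_mark lost for each earlier try.
    The result is the best try after penalties, never below zero.
    """
    best = 0.0
    for tries_before, fraction in enumerate(fractions):
        best = max(best, (fraction - tries_before * penalty) * max_mark)
    return best
```

For example, with a penalty of 0.1, a 3-mark question answered wrongly and then correctly scores about 2.7.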

The problem with the previous version of Moodle was the way this was implemented. There was a single process_responses function that was full of code like "if adaptive mode, do this, else do that". It was a real tangle. It was very common to change the code to fix a bug in adaptive mode (for example), only to find that you had broken non-adaptive mode. Another problem was the essay question type, which has to be graded manually by the teacher. It did not really follow either adaptive or non-adaptive mode, but was still processed by the same code. That led to bugs.

One very important realisation in the design of the new question engine was that this concept of a 'question behaviour' could be isolated. There is now a behaviour called 'Deferred feedback' that works like the old non-adaptive mode; there is an 'Adaptive' behaviour; and there is a 'Manually graded' behaviour specially for the essay question type. Since these are now separate, you can alter one without risking breaking the others. Of course, the separate behaviours still share common functions like 'save responses' or 'grade responses'. We now also have a clean way to add new behaviours. I made a 'certainty-based marking' behaviour, and a behaviour called 'Interactive', which is a bit like the old Adaptive mode but modified to work exactly how the Open University wants.

It takes two to tango, and three to process a question

In order to do anything, there now has to be a three-way dance between the core of the question engine, the behaviour and the question type. Does this just replace the old tangle with a new tangle (of feet)? Fortunately there is a consistent logic. The request arrives at the question engine. The question engine inspects it, and passes it on to the appropriate behaviour. The behaviour inspects it in more detail, to work out exactly what needs to be done. For example, is the student just saving a response, or are they submitting something for grading? The behaviour then asks the question type to do that specific thing. All this leads to a new state that is passed back to the question engine.

So, the flow of control is question engine -> behaviour -> question type except, critically, in one place. When we start a new attempt, we have to choose which behaviour to use for each question. At this point the question engine directly asks the question type to decide. Normally, the question type will just say "use whatever behaviour the quiz settings ask for", but certain question types, like the essay, can instead say "I don't care about the quiz settings, I demand the manual grading behaviour."

If you like software design patterns, you can think of this as a double application of the strategy pattern. The question engine uses a behaviour strategy, which uses a question type strategy (with the subtlety that the choice of behaviour strategy is made by the question type).
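Here is a toy Python sketch of that double strategy, with heavily simplified, hypothetical classes (the real code is PHP and does much more). The engine hands each request to the question's behaviour, which decides what the action means and asks the question type to do the question-specific part. The one exception is start(), where the engine asks the question type which behaviour to use, which is how the essay gets to insist on manual grading.

```python
from abc import ABC, abstractmethod

# Toy, hypothetical classes illustrating the design; not Moodle's PHP API.


class Behaviour(ABC):
    """Strategy 1: what an action means (save? grade? wait for a teacher?)."""

    @abstractmethod
    def process(self, qtype, state, action, response):
        """Return the new state dict for this question."""


class DeferredFeedback(Behaviour):
    def process(self, qtype, state, action, response):
        if action == "save":
            return {**state, "response": response}
        if action == "finish":
            return {**state, "fraction": qtype.grade(state.get("response", ""))}
        return state


class ManualGraded(Behaviour):
    def process(self, qtype, state, action, response):
        if action == "save":
            return {**state, "response": response}
        if action == "finish":
            return {**state, "needs_grading": True}  # a teacher must mark it
        return state


class QuestionType(ABC):
    """Strategy 2: the question-specific work, such as grading a response."""

    def choose_behaviour(self, preferred):
        return preferred  # most types accept what the quiz settings ask for

    @abstractmethod
    def grade(self, response):
        """Return a fraction between 0 and 1."""


class ShortAnswer(QuestionType):
    def __init__(self, right_answer):
        self.right_answer = right_answer

    def grade(self, response):
        return 1.0 if response.strip() == self.right_answer else 0.0


class Essay(QuestionType):
    def choose_behaviour(self, preferred):
        return ManualGraded()  # ignores the quiz settings

    def grade(self, response):
        raise NotImplementedError("essays are graded by a teacher")


class QuestionEngine:
    def __init__(self, preferred_behaviour):
        self.preferred = preferred_behaviour
        self.slots = {}  # slot number -> (question type, behaviour, state)

    def start(self, slot, qtype):
        # The one place the engine asks the question type directly.
        behaviour = qtype.choose_behaviour(self.preferred)
        self.slots[slot] = (qtype, behaviour, {})

    def process(self, slot, action, response=""):
        # Normal flow: engine -> behaviour -> question type.
        qtype, behaviour, state = self.slots[slot]
        new_state = behaviour.process(qtype, state, action, response)
        self.slots[slot] = (qtype, behaviour, new_state)
        return new_state
```

Adding a new behaviour means writing one new Behaviour subclass; none of the question types or the engine need to change.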

Summary

So that is roughly how it works: a clear separation of responsibility between three components, each focussing on doing one aspect of the processing accurately, which makes the system highly extensible, robust and maintainable. Of course, everyone says that about the software they design, but based on my experience over the last year, first of building all the parts of the system, and then of fixing the bugs that were found during testing, I say it with some confidence.

Tuesday, October 5, 2010

Introducing the new Moodle question engine

The rest of the (Moodle) world is eagerly anticipating Moodle 2.0, but I would like to tell you about what I have been doing for most of the last year, but which you won't be able to have until Moodle 2.1 - unless, that is, you are a student or teacher with the Open University, in which case you will be using it from this December.

What I have done is to rewrite a large chunk of the Moodle quiz system. What chunk is that? Well, first you can split a quiz system into two main parts. There is the quiz part, which says, "This quiz comprises these questions, and will be open to students between these dates". It tracks the student as they attempt the quiz, and stores their total score. Then there is the part that deals with the details of the individual questions within each quiz.

The question part can again be split in two. There is the question bank which lets the teacher create and store questions. For example "This is a multiple choice question where the student must select one of these three options, and it is an 'Elementary maths' question." Then there is the code that controls what happens when a student attempts a question "The student sees three radio buttons and a Submit button, and when they click the button we compute a score as follows and show this feedback." That second bit is what I call the question engine, and that is what I have rewritten.

However, you cannot just change the question engine in isolation. There are knock-on effects. For example, the quiz module still maintains overall control of things, even though it delegates a lot of the details to the question engine. So there are places where the quiz says things like "Dear question engine, please display this question now", or "Dear question engine, the student submitted this data, please process it", or "Dear question engine, the teacher wants to see all students' responses to all questions in this quiz, give me the data to display." All those places have to change when the question engine changes.

There were also small changes required to the question bank, mainly because the new question engine has some new features that need some extra options stored with each question. So, the question bank needs to store the new options; let teachers edit them; back them up and restore them; import them and export them; and so on.

Altogether, my year's work added about 52,000 lines of new code and removed about 25,000 lines of old code (or, if you prefer, added 27,000 lines and altered 25,000 lines). At least that is the size of the change that I committed to the OU's CVS server last Friday, just in time to make the feature-freeze for the December update of our VLE. For comparison, the whole of Moodle 2.0 is about 1,600,000 lines of code, although that includes several large third-party libraries.

I am sure that there will be some minor bugs still to be found and fixed, but this new code has already had extensive testing from my colleagues Phil Butcher and Paul Johnson, so I am confident that the remaining bug-fixes will be minor.

There is much more I want to write about the new question engine, but I think this introductory post is already long enough. Therefore, I will split the remainder of what I want to say into separate posts which I hope to publish over the next week or so.

Thursday, March 25, 2010

When do students submit their online tests?

I am currently studying an Open University course (M888 Databases in Enterprise systems). There was an assignment due today, and like many students, I submitted only an hour before the deadline.

That got me thinking, are all students really like that? Well, I don't have access to our assessment submission system, but I do work on our Moodle-based VLE, so I can give you the data from there.

This graph shows how many hours before the deadline students submit their Moodle quizzes (iCMAs in OU-speak):



That is not exactly what I was expecting. Certainly, there is a bit of a peak in the last few hours, but there is another peak almost exactly 24 hours before that, with lesser peaks two and three days before.

Note that all our deadlines are at noon (it used to be midnight, but that changed a few months ago). The graph above is consistent with our general pattern of usage. The following graph shows what time of day students submitted their quiz attempts. It is the same shape as our general load graph for most OU online systems.



I don't know what, if anything, this means, but I thought it was interesting enough to share.

By the way, if you want to compute these graphs for your own Moodle, here are the database queries I used:

-- Number of quiz submissions by hour before deadline.
-- PostgreSQL, default mdl_ table prefix. Times are Unix timestamps (seconds),
-- so integer division by 3600 gives whole hours.
SELECT 
    (quiz.timeclose - qa.timefinish) / 3600 AS hoursbefore,
    COUNT(1)

FROM mdl_quiz_attempts qa
JOIN mdl_quiz quiz ON quiz.id = qa.quiz

WHERE
    qa.preview = 0 AND          -- ignore teacher previews
    quiz.timeclose <> 0 AND     -- only quizzes with a close date
    qa.timefinish <> 0          -- only attempts that were submitted

GROUP BY
    (quiz.timeclose - qa.timefinish) / 3600

HAVING (quiz.timeclose - qa.timefinish) / 3600 < 24 * 7  -- the final week

ORDER BY
    hoursbefore;

-- Number of quiz submissions by hour of day (in the database session's time zone)
SELECT 
    DATE_PART('hour', TIMESTAMP WITH TIME ZONE 'epoch' + timefinish * INTERVAL '1 second') AS hour,
    COUNT(1)

FROM mdl_quiz_attempts qa

WHERE
    qa.preview = 0 AND
    qa.timefinish <> 0

GROUP BY
    DATE_PART('hour', TIMESTAMP WITH TIME ZONE 'epoch' + timefinish * INTERVAL '1 second')

ORDER BY
    hour;
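The second query uses PostgreSQL-specific syntax. On other databases, one option is to fetch the raw Unix timestamps and do the bucketing in Python, as in this sketch with made-up function names. There are two small differences from the SQL. First, late submissions are dropped, whereas the SQL keeps them (as zero or negative hours). Second, the hour of day is computed in whatever time zone you pass in (UTC by default), not in the database session's time zone.

```python
from collections import Counter
from datetime import datetime, timezone


def hours_before_deadline(rows, max_hours=24 * 7):
    """Count submissions by whole hours before the deadline.

    rows: (timeclose, timefinish) pairs of Unix timestamps, already filtered to
    finished, non-preview attempts on quizzes that have a close date.
    """
    counts = Counter()
    for timeclose, timefinish in rows:
        seconds_before = timeclose - timefinish
        if 0 <= seconds_before < max_hours * 3600:  # drop late submissions
            counts[seconds_before // 3600] += 1
    return dict(sorted(counts.items()))


def submissions_by_hour_of_day(finish_times, tz=timezone.utc):
    """Count submissions by hour of day (0-23) in the time zone tz."""
    counts = Counter(datetime.fromtimestamp(t, tz).hour for t in finish_times)
    return dict(sorted(counts.items()))
```

For UK times you could pass ZoneInfo("Europe/London") from the standard zoneinfo module (Python 3.9 and later).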