Wednesday, August 15, 2012

Automating git

This is a long-overdue follow-up to my previous post about using git to fix Moodle bugs. Thanks to Andrew Nichols of LUNS for nudging me into writing this.

Git has an efficient command-line interface, but even so, there are some sequences of commands that you find yourself typing repeatedly. Git provides a mechanism called aliases which can be used to reduce this repetitive typing. This post explains how I use it in my Moodle development.

Basic usage

Let us start with the simplest possible example. I get bored typing git cherry-pick in full all the time. The solution is to edit the file .gitconfig in my home directory, and add

[alias]
        cp     = cherry-pick

Then git cp … is equivalent to git cherry-pick …. That saves 9 characters every time I have to copy a bug fix to a stable branch.

Simple aliases like this can also be used to supply options. Another one I have set up is

        ff     = merge --ff-only

I use that when I need to update one of my local branches to match a remote branch. Suppose I think I am on the master branch, and I want to update that to the latest moodle/master. Normally one would just type git merge moodle/master and it would look like this:

timslaptop:moodle_head tim$ git merge moodle/master
Updating ddd84e9..b658200
Fast-forward

Suppose, however, that I had made a mistake, and I was actually on some other branch. Then git would try to do a merge between moodle/master and that branch, which is not what I want. The --ff-only option tells git not to do that. Instead it will stop with an error if it can't do a fast forward. So, to prevent mistakes, I normally use that option, and I do it frequently enough that I found it worthwhile to create the alias.

Getting more ambitious

Sometimes the repeated operation you want to automate is a sequence of git commands. For example, when a new weekly build of Moodle comes out, I need to type a sequence of commands like this:

git checkout master
git fetch moodle
git merge --ff-only moodle/master
git push origin master

That updates my local copy of the master branch with the latest from moodle.org and then copies that to my area on github. To automate this sort of thing, you have to start using the power of Unix shell scripting. (If you are on Windows, don't worry, because you typically get the bash shell when you install git.)

Fortunately, you don't need to know much scripting, and you can probably just copy these examples blindly. The first thing to know is that you can put two commands on one line if you separate them using a semi-colon (just like in PHP). The previous sequence of commands could be typed on one line as

git checkout master; git fetch moodle; git merge --ff-only moodle/master; git push origin master

(Note that these lines of code are getting quite long, and will probably line-wrap. Each one should, however, be typed as a single line of code.)

Doing it this way turns out to be a bad idea. What happens if one of the commands gives an error? Well, the system will just move on to the next command, even though the error from the previous command probably left things in an unknown state. Dangerous! Fortunately there is a better way. If you use && instead of ; then any error will cause everything to stop immediately. If you are familiar with PHP, then just imagine that every command is a function that returns true or false to say whether it succeeded or not. That is not so far from the truth. So, the right way to join the commands together looks like this:

git checkout master && git fetch moodle && git merge --ff-only moodle/master && git push origin master
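The difference between ; and && is easy to see without touching a repository. This is only a sketch: false stands in for a git command that fails, echo pushed stands in for the git push at the end of the chain, and the two function names are made up for the illustration.

```shell
# 'false' stands in for a failing git command, 'echo pushed' for the final push.

# Joined with ';' the last command still runs after the failure.
semicolon_chain() { sh -c 'false; echo pushed'; }

# Joined with '&&' the chain stops at the first failure, so nothing is printed.
and_chain() { sh -c 'false && echo pushed'; }

echo "with ; the output is: [$(semicolon_chain)]"
echo "with && the output is: [$(and_chain)]"
```

The first line shows pushed between the brackets and the second shows nothing, which is exactly the behaviour you want when the failing step is a fetch or a merge.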

Now we know what we want to automate, we need to teach this to git. It is a bit tricky because we don't just want to convert one single git command into another single git command. Instead we want to convert one git command into a sequence of shell commands. Fortunately this is supported. You just need to know the right syntax:

        updatemaster = !sh -c 'git checkout master && git fetch moodle && git merge --ff-only moodle/master && git push origin master' -

Now I just have to type git updatemaster to run that sequence of commands.

Parameterising your aliases

That is all very well for master, but what about all the stable branches? Do I have to create lots of separate aliases like update23, update22, update21, …? Of course not. Git was created by and for computer programmers. Shell scripts can take parameters, and the solution is an alias that looks like

        update = !sh -c 'git checkout MOODLE_$1_STABLE && git fetch moodle && git merge --ff-only moodle/MOODLE_$1_STABLE && git push origin MOODLE_$1_STABLE' -

With that alias, git update 23 will update my MOODLE_23_STABLE branch, git update 22 will update my MOODLE_22_STABLE, and so on.
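If you are wondering what the lone - at the end of the alias is for, here is a small sketch you can try in a terminal. The helper name stable_branch is invented for the demonstration, and echo replaces the real git commands. When sh -c is given extra arguments, the first one becomes $0 and only the following ones become $1, $2 and so on, so the - is just a placeholder that soaks up $0.

```shell
# Hypothetical helper that mimics how the alias receives its argument.
stable_branch() {
    sh -c 'echo MOODLE_$1_STABLE' - "$1"
}

stable_branch 23    # prints MOODLE_23_STABLE
stable_branch 22    # prints MOODLE_22_STABLE

# Without the '-' placeholder, 23 fills $0 and $1 is left empty.
sh -c 'echo MOODLE_$1_STABLE' 23    # prints MOODLE__STABLE
```

Git adds whatever you type after the alias name to the end of the command, which is why the placeholder has to come last.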

You can use any number of parameters. If you remember my previous blog post, typically I will create the bug fix on a branch with a name like MDL-12345 that starts from master, and then I will want to copy that to a branch called MDL-12345_23 branching off MOODLE_23_STABLE. With the following alias, I just have to type git cpfix MDL-12345 23 in my Moodle 2.3 stable check-out:

        cpfix = !sh -c 'git fetch -p origin && git checkout -b $1_$2 MOODLE_$2_STABLE && git cherry-pick origin/master..origin/$1 && git push origin $1_$2' -
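Before trusting an alias like this, it can help to preview what it will expand to. The sketch below is a dry run, not part of my .gitconfig: the function name preview_cpfix is made up, and every git command is prefixed with echo so that it is printed instead of executed.

```shell
# Dry run of the cpfix command line: 'echo' prints each step instead of running it.
preview_cpfix() {
    sh -c 'echo git fetch -p origin &&
           echo git checkout -b $1_$2 MOODLE_$2_STABLE &&
           echo git cherry-pick origin/master..origin/$1 &&
           echo git push origin $1_$2' - "$1" "$2"
}

preview_cpfix MDL-12345 23
```

That prints the four git commands with MDL-12345_23 and MOODLE_23_STABLE filled in, so you can check the branch names before running the real thing.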

One final example that belongs in this section:

        killbranch = !sh -c 'git branch -d $1 && git push origin :$1' -

That deletes a branch both in the local repository and also from my area on github. That is useful once one of my bug fixes has been integrated. I then no longer need the MDL-12345 branch and can eliminate it with git killbranch MDL-12345.

To boldly go …

Of course, all this automation comes with some risk. If you are going to screw up, automation lets you screw up more things quicker. I feel obliged to emphasise that at this point. If you are going to shoot yourself in the foot, a machine gun gives the most spectacular results, and we are about to build one, at least metaphorically.

We just saw the killbranch command that can be used to clean up branches that have been integrated. What happens if I submitted lots of branches for integration last week? Then I have to delete lots of branches. Can that be automated? Using git I can at least get a list of those branches:

timslaptop:moodle_head tim$ git checkout master
Already on 'master'
timslaptop:moodle_head tim$ git branch --merged
  MDL-12345
  MDL-23456
* master

Those are the branches that are included in master, and so are presumably ones that have already been integrated. It is a bit irritating that the master branch itself is included in the list, but I can get rid of it using the standard command grep:

timslaptop:moodle_head tim$ git branch --merged | grep -v master
  MDL-12345
  MDL-23456

I have a list of branches to delete, but how can I actually delete them? I need to execute a command for each of those branch names. Once again, we find that shell scripting was developed by hackers, for hackers. The command xargs does exactly that. With the -I option, xargs executes a given command once for each line of input it receives. Feeding in the list of branches, and getting it to execute the killbranch command looks like this:

git branch --merged | grep -v master | xargs -I "{}" git killbranch "{}"
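Since this pipeline deletes things, it is worth seeing it work on harmless data first. In this sketch the branch list is hard-coded (copied from the git branch --merged output above) so that it runs without a repository, the function names are invented, and echo in front of git killbranch turns the whole thing into a dry run.

```shell
# Sample 'git branch --merged' output, hard-coded so no repository is needed.
merged_branches() {
    printf '%s\n' '  MDL-12345' '  MDL-23456' '* master'
}

# 'echo' makes this a dry run: one command is printed per remaining branch.
preview_kill() {
    merged_branches | grep -v master | xargs -I "{}" echo git killbranch "{}"
}

preview_kill
```

You should see one git killbranch line for each of the two MDL branches and none for master. Remove the echo only once the printed commands look right.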

Now to make that into an alias

        killmerged = !sh -c 'git checkout $1 && git branch --merged | grep -v $1 | xargs -I "{}" git killbranch "{}"' -

With that in place, git killmerged master will delete all my branches that have been integrated into master. Note that you can use one alias (killbranch) inside another (killmerged). That makes it easier to build more complex aliases.
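One thing to be aware of: grep -v filters on any line that contains the text, not on an exact branch name. The sketch below uses a made-up branch called MDL-99999-master-docs to show the effect, and a stricter whole-line pattern as one possible alternative. The function names and the sample list are assumptions for the demonstration, not something from my .gitconfig.

```shell
# Hard-coded sample of 'git branch' output, including a hypothetical branch
# whose name happens to contain the word master.
branches() {
    printf '%s\n' '  MDL-12345' '  MDL-99999-master-docs' '* master'
}

# The filter used in the aliases: drops every line containing "master".
loose_filter() { branches | grep -v master; }

# A stricter variant: -x only drops lines that are exactly "* master" or "  master".
strict_filter() { branches | grep -v -x '[* ] master'; }

echo "loose:";  loose_filter
echo "strict:"; strict_filter
```

With the loose filter the hypothetical branch is silently skipped, which errs on the safe side for killmerged but means updatefixes would not rebase it.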

Once I have deleted all the things that were integrated, I am left with the branches I have in progress that have not been integrated yet. Those all need to be rebased, and that can be automated too:

        updatefix = !sh -c 'git checkout $1 && git rebase $2 && git checkout $2 && git push origin -f $1' -
        updatefixes = !sh -c 'git checkout $1 && git branch | grep -v $1 | xargs -I "{}" git updatefix "{}" $1' -

With those in place, I just type git updatefixes master, and that will rebase all my branches, both locally and on github. Use at your own risk!

That's all folks

To summarise, here is the whole of the alias section of my .gitconfig file:

[alias]
        cp     = cherry-pick
        ff     = merge --ff-only
        cpfix  = !sh -c 'git fetch -p origin && git checkout -b $1_$2 MOODLE_$2_STABLE && git cherry-pick origin/master..origin/$1 && git push origin $1_$2' -
        update = !sh -c 'git checkout MOODLE_$1_STABLE && git fetch moodle && git merge --ff-only moodle/MOODLE_$1_STABLE && git push origin MOODLE_$1_STABLE' -
        killbranch = !sh -c 'git branch -d $1 && git push origin :$1' -
        killmerged = !sh -c 'git checkout $1 && git branch --merged | grep -v $1 | xargs -I "{}" git killbranch "{}"' -
        updatefix = !sh -c 'git checkout $1 && git rebase $2 && git checkout $2 && git push origin -f $1' -
        updatefixes = !sh -c 'git checkout $1 && git branch | grep -v $1 | xargs -I "{}" git updatefix "{}" $1' -

There is limited documentation for this on the git config man page. There is more on the git wiki.

Thursday, August 2, 2012

Standards

Standardisation efforts are odd things. Most successful standards seem to have come out of one or a few brilliant individuals, and the standardisation committees only took over after the thing in question became widely adopted. Think of C, C++, Java, HTML, HTTP, JavaScript, SQL… Of course, it is only with hindsight that we know those were successful things, that the people who created them were brilliant, and that it was worth investing effort in a standardisation committee to get different implementations to be interoperable. There are many fewer examples of successful standards that started with a committee. I am sure there are some, but I am failing to think of any right now.

Even when there are standards, that does not magically solve all your problems. Ask any developer about the problems of getting their web site to work on all browsers, particularly Internet Explorer, despite the existence of the HTML, CSS and JS standards; or look at the work Moodle has to do to work with the four databases we support, even though SQL is supposed to be a standardised language.

In theory a standard makes sense. If you have n different systems you want to move data between, then

  • If you go directly from system to system, you would have to write ½n*(n-1) different importers.
  • Given a common standard, you only need to write n different importers.
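To put numbers on that, here is a small sketch using the two formulas as written above. The function names and the choice of n = 5 are just for illustration. Note that ½n*(n-1) counts each pair of systems once, so if you need a separate importer for each direction the direct figure doubles.

```shell
# Importer counts for n systems, following the two formulas in the list above.
direct_importers() { echo $(( $1 * ($1 - 1) / 2 )); }
standard_importers() { echo "$1"; }

n=5   # hypothetical number of systems
echo "direct: $(direct_importers "$n")"
echo "via a standard: $(standard_importers "$n")"
```

For five systems that is 10 direct importers against 5, and the gap grows quickly as n increases.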

In practice, different systems have slightly different features, so you cannot perfectly copy data from one system to another. An importer from X to Y is not a perfect thing; it has to fudge some details. Now compare the two ways of handling import:

  • An importer for System Y that directly imports the files output by System X can know all about the details of System X, so it can do the best possible job of dealing with the incompatible features.
  • Using Standard S, System X has to save its data in format S dealing with any incompatibilities between what X supports and what S supports. Then System Y has to take file S and import it, dealing with any incompatibilities between what S supports and what Y supports, and it has to do that without the benefit of knowing that the data originally came from System X.

Therefore, going for direct import is likely to give better results, although at the cost of more work.

The particular case I am thinking about is, of course, moving questions between different eAssessment systems. The only standard that exists is IMS QTI, which has always struck me as the worst sort of product of a committee. It is not widely adopted and it is horribly complicated. Also, if we wanted to make Moodle support QTI, we would have to completely rewrite Moodle to work the way QTI specifies. That is sort-of fair enough. If you want to display HTML like a web browser, you basically have to start from scratch and write your code from the ground up to work the way the HTML, CSS and JavaScript standards say. These standards are not designed to make content interoperate between different existing systems. You need only look at the horrible mess you get when you do Save as… -> HTML in MS Word, or even just copy-and-paste from Word to Moodle, to be convinced of that.

So, QTI is trying to solve the wrong problem. It is trying to be a full-featured standard that you can only support by basing your whole software around what the standard says. We don’t want to rewrite the whole Moodle question engine just to support some standard that hardly anyone else uses yet. We just want to be able to import 99%+ of the questions that teachers can get access to, whether from other systems or from publishers. The kind of standard we want is more like CSV files. CSV is a nice simple standard to transfer data between spreadsheets and other applications.

In the past, it has always been easier to write separate importers for each other system Moodle wants to import from, rather than trying to deal with one very complex generic standard like QTI. See the screen-grab of Moodle's import UI to the right. To write a new importer, you just need some example files in the format you want to support, containing a few questions of each type. Then it is easy to write the code to parse that sort of file, and convert the data to the format Moodle expects.

Having said that, the current situation is not perfect. The problem is that most of these other file formats are output by commercial software. Therefore, many developers cannot easily get sample files in those formats to use for developing and testing code. As a result, some of the importers are buggy. We have to rely on people in the community who care enough, and who have access to the software, to create example files for us. There was a good example of that recently: Someone called Rick Jerz from Le Claire, Iowa produced a good example Examview file, and long-time quiz contributor Jean-Michel Vedrine from St Etienne, France used that to fix the bugs in the Examview importer.

On the standardisation front, there is a glimmer of hope. IMS Common Cartridge is a standard for moving learning content from one VLE to another. It uses a tiny, and hence manageable, subset of QTI that tries to solve the “transfer 99%+ of the questions teachers use” problem. It should be possible to get Moodle to read and write that QTI subset. We just need someone with the time and inclination to do the work. It is even possible that the OU's OpenLearn project will be that someone, but QTI import/export is just one of many things on their to-do list.