Friday, April 17, 2009

PHP global variables are not necessarily evil

Global variables in software are generally a bad idea. They serve to magically teleport information from one place in a program to another in a way that is very hard to follow. Thus they cause subtle bugs, make maintenance difficult, and kill kittens.

In a simple program, it is normally easy to avoid global variables. When you call a function or method to get it to do something for you, you can probably just pass all the necessary inputs to the function as parameters, and get all the results back in the return statement. The flow of data is easy to follow.

However, in a large, complex system, that may become untenable. For example, many, but not all, functions in Moodle need a database connection to read or update information. Does that mean every function in Moodle ends up taking a $db parameter, whether it needs one or not, just so it can pass it on to any functions it in turn calls? Well, we don't do that, because that way lies madness. Instead we have a 'global' $DB object, and that is not all. We also have $CFG, $COURSE, and so on. Imagine if you needed many of those inside your function. Calling it with all those parameters would be a pain.
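
To make the contrast concrete, here is a minimal sketch of the two styles. `FakeDb` and the `show_course_*`/`render_course_*` functions are hypothetical stand-ins, not Moodle's real API. The first pair threads a `$db` parameter through every call; the second reaches for a request-wide `$DB` with the `global` keyword only in the function that actually uses it.

```php
<?php
// Hypothetical stand-in for a database connection (not Moodle's real API).
class FakeDb
{
    public function get_records(string $table): array
    {
        return ["rows from $table"];
    }
}

// Style 1: $db is passed down explicitly, even through functions
// that only forward it to the next call.
function show_course_param(FakeDb $db, int $id): string
{
    return render_course_param($db, $id);
}

function render_course_param(FakeDb $db, int $id): string
{
    return implode(',', $db->get_records('course')) . " #$id";
}

// Style 2: the connection lives in the request-wide 'global' scope,
// and only the function that needs it asks for it.
$DB = new FakeDb();

function show_course_global(int $id): string
{
    return render_course_global($id);
}

function render_course_global(int $id): string
{
    global $DB;
    return implode(',', $DB->get_records('course')) . " #$id";
}
```

With one dependency the difference is small; with $DB, $CFG, $COURSE and more, style 1 means every intermediate signature grows.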

Now, why did I just put the word 'global' in quotes? Well, Moodle is a web application, so the process that is running is a web server, probably with multiple threads (or processes) handling requests simultaneously. A true global variable would be accessible from all threads, and would last forever. Something you declare in PHP as a global is only accessible for the duration of one server request, and not from any other thread. In Java, that would be called a ThreadLocal, not a global.

So, a PHP global is not truly a global. However, if abused, it can still be a way to teleport data from one place to another during the processing of a single request. The point I want to make is that there are non-evil ways to use it. In Patterns of Enterprise Application Architecture, Martin Fowler describes the Registry pattern (sorry, that online summary is inadequate). A Registry is an object in your program which all the other parts can get hold of, and from which they can get the service objects, like the database connection, that they depend on. In web applications, you often want dependencies with thread-local scope. My argument is that the PHP global keyword is a language feature that implements a thread-local registry for you with no effort. Thus it is a good thing if used properly.
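
In code, the "registry" is PHP's global symbol table for the current request. A minimal sketch, with a hypothetical `$CFG->wwwroot` setting: the `global` keyword binds a local name to the same entry that the `$GLOBALS` array exposes, so both lookups below see the same object.

```php
<?php
// Request-wide configuration object; the property name is illustrative.
$CFG = new stdClass();
$CFG->wwwroot = 'http://example.com/moodle';

// Looking the dependency up with the global keyword...
function site_url(string $path): string
{
    global $CFG; // binds local $CFG to the request-wide entry
    return $CFG->wwwroot . $path;
}

// ...is equivalent to reading it from the $GLOBALS array.
function site_url_via_globals(string $path): string
{
    return $GLOBALS['CFG']->wwwroot . $path;
}
```

Under the usual PHP server setups, nothing here outlives the request: the next request starts with an empty global scope and builds `$CFG` again.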

What is using it properly? Well, it all depends on the kind of things you store there. Are you making common dependencies easy to find, or are you magically teleporting information? I think that Moodle mostly does use this feature properly. Our global variables like $DB, $CFG and $COURSE are things that can legitimately be stored in and accessed from a Registry. However, we have some horrors, like $CFG->pagepath and $CFG->stylesheets, which I am currently trying to exorcise from Moodle 2.0.
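
One way to see the difference is to compare a service lookup with data smuggled through a global. In this hypothetical sketch, a `pagepath` value is set by one function and silently read by another; it illustrates the pattern only and does not reproduce how the real Moodle setting was used. The alternative passes the path explicitly, so the data flow is visible at the call site.

```php
<?php
$CFG = new stdClass();

// Teleporting: one function stashes per-page data in a global...
function setup_page_teleport(string $path): void
{
    global $CFG;
    $CFG->pagepath = $path;
}

// ...and another picks it up, with nothing in its signature saying so.
function breadcrumb_teleport(): string
{
    global $CFG;
    return 'Home > ' . ($CFG->pagepath ?? '?');
}

// Explicit: the same information travels as a parameter.
function breadcrumb_explicit(string $pagepath): string
{
    return 'Home > ' . $pagepath;
}
```

`breadcrumb_teleport()` gives a different answer depending on which functions happened to run earlier in the request, which is the hidden coupling that separates teleporting from a registry of services.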

One advantage of storing your dependencies in a Registry, as opposed to, say, making them singletons, is that it makes them substitutable. This becomes important when you try to write unit tests. When unit testing, you often need to switch in a test double in place of one of the dependencies. With a Registry you can swap one in during test setup, and switch it back out during teardown. I have been doing that a lot recently in Moodle, and it works. (You could not do that when accessing the database in Moodle 1.9, for example. There, the database connection was hidden inside functions like get_records, and could not be substituted. Now we have $DB->get_records, with $DB in the 'global' registry where it can be substituted.) Some other global-like mechanisms, like static methods or singletons, are not substitutable, and so make testing much harder, if not impossible.
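
A minimal sketch of that swap, assuming a hypothetical `RecordSource` interface and plain `set_up_stub()`/`tear_down_stub()` helpers in place of a real test framework; none of these names come from Moodle.

```php
<?php
// Minimal interface for the dependency; names are hypothetical.
interface RecordSource
{
    public function get_records(string $table): array;
}

// Stand-in for the production connection.
class LiveRecordSource implements RecordSource
{
    public function get_records(string $table): array
    {
        throw new RuntimeException('no real database in this sketch');
    }
}

// Test double that returns canned rows.
class StubRecordSource implements RecordSource
{
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function get_records(string $table): array
    {
        return $this->rows[$table] ?? [];
    }
}

// Code under test: it finds its dependency in the 'global' registry.
function count_courses(): int
{
    global $DB;
    return count($DB->get_records('course'));
}

$DB = new LiveRecordSource();

// Set up: remember the current object and swap in the double.
function set_up_stub(array $rows): RecordSource
{
    global $DB;
    $saved = $DB;
    $DB = new StubRecordSource($rows);
    return $saved;
}

// Tear down: put the original object back.
function tear_down_stub(RecordSource $saved): void
{
    global $DB;
    $DB = $saved;
}
```

In PHPUnit the same two steps would live in the test case's `setUp()` and `tearDown()` methods. A singleton reached through a static accessor offers no such seam unless the class is written to provide one.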

Of course, some people would argue that you still should not use the global keyword, and that instead you should implement your own Registry class (an example). I disagree. Appropriate use of language features tends to create more idiomatic code.
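
For comparison, a hand-written Registry might look like the following minimal sketch. The `Registry` class and its `set()`/`get()` methods are hypothetical, not taken from Moodle or any framework. It offers the same swap-in-a-test-double seam as a global, at the cost of an extra class and a different spelling at every call site.

```php
<?php
// A hand-rolled Registry: the alternative to the global keyword.
// Class and method names are hypothetical.
final class Registry
{
    /** @var array<string, object> */
    private static array $services = [];

    public static function set(string $name, object $service): void
    {
        self::$services[$name] = $service;
    }

    public static function get(string $name): object
    {
        if (!isset(self::$services[$name])) {
            throw new OutOfBoundsException("No service registered as '$name'");
        }
        return self::$services[$name];
    }
}

// Call sites compare like this:
//   global $DB;                  // language feature
//   $DB = Registry::get('DB');   // hand-written registry
```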

12 comments:

  1. A side effect of using the Moodle global $DB rather than a potential singleton is that it is also extremely easy to grep, something I find incredibly useful in day-to-day work with Moodle.

  2. I am researching the cons of globals and I came across your post. Nice one. Indeed, I feel there is nothing wrong with using global in PHP, and it's a lot faster than using a Registry class when I benchmark my scripts.

  3. Using globals removes your ability to control write access to the global data. Any piece of new code could (maliciously or unintentionally) bork your database handle. If it were a private member of a "Registry" class this would no longer be the case, and the "Registry" class (the authority for your DB handle) would be the only one that could bork your DB handle. The latter makes sense from a design standpoint; the former does not.

    Thus you can use globals for the sake of marginally simpler and more efficient code, but at the expense of good design and making your code easier to break. Or you can spend an extra ~5 minutes writing a class to own and marshal out DB handles, making your code encapsulated, breakable only from within the DB manager itself, well designed, and only marginally more complicated and slower. The speed difference will likely be negligible in most applications, as the network is usually the bottleneck unless you're reeeeeeeeally taxing your CPU. The additional code complexity would be in the class itself; your client code would just be replacing "global $dbh;" with "$dbh = Config::getDB();", which is just replacing one line with another.

    In my professional opinion, there's very very very rarely a case for globals when a global-scope class can do a far better job with only a few minutes of work to duplicate PHP's "global $var" feature.

  4. This comment has been removed by the author.

  5. Mark, while I don't disagree with your opinion, you still have not convinced me that it is the only defensible opinion.

    What I think you are overlooking is that programming is a collaborative activity. When you are working on a project, generally you are trying to write bug free code. You are not trying to abuse every potential weakness in other parts of the code. So what is important is how easy it is to write bug-free code, not how hard it is to write buggy code.

    I mean, given the opinion you expressed, why are you working with a weakly typed language like PHP at all? Wouldn't you be happier working in Java? That would help you avoid all sorts of other types of bug, at the cost of ~5 minutes spent declaring the type of every variable before you first use it. For example, in Java you could not make the mistake $a = 'four'; $b = sqrt($a);

    Meanwhile, this week's discovery was that in PHP strpos('123', 2) is not the same as strpos('123', '2'), even though '2' == 2. (Before PHP 8.0, a non-string needle was converted to an integer and used as the ordinal value of a character, so the first call searches for chr(2).) In a statically and strongly typed language like Java, it is at least clear that strpos(String, int) is a different method from strpos(String, String). However, I don't think it would be a good idea in any language to overload an API in the way PHP strpos does. At least in this case I was writing unit tests and found the problem immediately.

  6. I admit I come from a C++ background, but you should also know that I code professionally in PHP on a day-to-day basis. I am not familiar with Moodle, but I agree that global PHP variables are not necessarily evil, or, more accurately, are a necessary evil. PHP already forces you to use many global variables and even goes so far as to create "Super Globals", which are basically part of the language. Then there are magic constants, which are a similar idea (such as __FILE__), and, further still, the dreaded mutation of $_POST and $_GET variables by magic quotes in older versions of PHP.

    Then there is the weak type system, which you've just outlined, along with the ability to generate dynamic PHP code with eval and to call functions by a name stored in a variable as a string. There is the complete lack of structure and the ability to merge HTML with PHP seamlessly, all for the sake of simplicity. This is not even to mention that there is no "program entry point": everything in the global scope just gets executed, and variables used in this scope are treated as global by default.

    PHP has the potential to be terrible (and encourages this behavior in beginner programmers). There are so many hidden dependencies built directly into the language that when you try to make things more orthogonal, you end up fighting the language.

    But PHP is much different from most languages as you say. Each execution of a single page is its own program. So use of a few globals in a small program is fine, you don't have to worry about other instances messing with the user defined globals. Every page is an island.

    That said, the common functionality between pages (primarily common libraries) should be held to a higher standard and should avoid globals wherever possible. Fine, your library makes use of a $DB global; what happens when someone wants to use your library alongside another one that uses the same name? This is really a problem with the lack of namespacing more than anything. Obviously functions and classes have this possibility as well, but a clash between global variables is SILENT and therefore much more evil, whereas clashing function names cause an error.

    I'm stuck developing for PHP 4.3 and 5.0 at the same time, so I must keep my objects very primitive, present an interface, and strongly scold anyone who accesses things I mark as private, because I can't do much about it beyond commenting that the variable should not be accessed directly. There are many real problems with the language, but if you are vigilant (and god knows the language won't force you to be) you can make beautiful code.

    Let me recommend that you strongly consider writing your own settings object and storing all your config information there, then using dependency injection to squash some of those globals.

    I'm still working on minimizing the use of globals in my CMS implementation and the design is benefiting from this greatly as it matures.

  7. Thank you for your thoughtful comments.

    You have my sympathy that you are still stuck with PHP 4.

    If you have not yet come to appreciate some of the advantages of a weakly-typed, highly dynamic programming language, I recommend that you look at Ruby and/or JavaScript some time. They are languages that look like they were actually designed, rather than just congealing, and are rather elegant and powerful. (Just to be clear, JavaScript the core language is elegant. The way it gets used in web browsers is less so.) I am not saying that dynamic languages are better than statically typed ones, just that they have different advantages, and it took me (coming from a Java background) some time to appreciate what those were.

    I agree with you that you can write good or bad code in any language, and that some languages make it easier or harder. PHP certainly makes it easy to write bad code, but it is perfectly possible to write good PHP code.

    Also, since Moodle is nearly 10 years old, there are bigger problems than our use of globals to implement the registry pattern (if that is a problem at all), and that is where my attention is focused at the moment.

  8. Oh, do not let my rambling rant lead you to believe that I do not make judicious use of the advantages offered by a weakly typed, highly dynamic language. I may come from a C++ background, but I am the sole developer of a mid-sized CMS (about 25,000 lines of code, maintained for almost two years and used on about 30 websites, though lines of code don't mean much beyond giving a rough idea of the project's scale). It is used by the other developers at my work to implement client websites. The code base is not perfect; there are many things which can be improved (and I do so incrementally while maintaining backwards compatibility... Refactoring.)

    I mention this because I can cite a few places where I definitely appreciate the dynamic properties of PHP. I've created a templating language which allows nested conditional statements and loops. Typically there is a lot of skepticism about using a templating system, but in a CMS where you must detect content sections and allow for highly dynamic layouts with minimal programming involved, it becomes a very useful abstraction. You can rely on parsing the HTML for hints about images and files which must be uploaded by the user, and other things such as the composition of the panels. In any case, developing a templating system in PHP has convinced me it would be a much more involved process in a more strongly typed language. The amount of metadata programming involved also really benefits from a dynamic language, where you can generate code if needed.

    This is not to say you couldn't do this in C++ (clearly you can), but parsing and assembling a meta-language is a prime example of a highly dynamic task where PHP's dynamic features pay off.

    I also use JavaScript often in my interfaces, and have developed a few simple widget like tools with it. I primarily use MooTools and have made use of more than just the tweens (set up classes and such) and I totally appreciate the use of functions and objects as first class citizens.

  9. All of that said, my criticism of dynamic languages is not about what they are capable of, but rather about the ease with which they can be rendered completely unmaintainable as they grow in scope. When you've got a large application it becomes very important to nail down weird behavior, and a lot of little things can be missed if you aren't always vigilant. Even simple things like mistyping a variable name can cause confusion (though you can turn on E_NOTICE in PHP and catch these).

    I am always learning, however, and am very appreciative of the power, but equally wary of it. The main problem really comes down to giving that much power to someone who typically has little to no programming background beyond the language. You see (or I do anyway) a lot of novice PHP programmers out there, and they produce reams and reams of unmaintainable crap primarily because they are not encouraged to be vigilant by the language. It's very loose, which can be very powerful, but with great power comes great responsibility, and most who start out programming in PHP don't really seem to understand that.

    I'm making sweeping generalizations, and for this I apologize.

    With all of that said, some of the coolest code I've written is in PHP. I don't think it's a graceful language; it is really pretty piecemeal. So is the implementation of the JavaScript DOM (though MooTools cleans this up a bit for us; as you say, the language itself is elegant, though I don't really like its lack of references, and casting is idiomatic at best). And while we're at it, C++ is incredibly verbose and has more undefined behavior than anything else I'm aware of. All of these languages have different purposes and different strengths and weaknesses. I suppose after this whole diatribe I just wanted to point out that I largely agree with you, though I still strive to minimize globals. You seem to have a solid head on your shoulders and you know what you're about.

    And do not be self-conscious about the inelegance of your old projects. As they say, the difference between legacy code and any other proposed solution is that one is currently working and has been for some time, and the other isn't. But I wish you luck with your refactoring. I was lucky to be able to start a medium-sized project nearly from scratch (many pieces borrowed from previous toolsets I had been developing separately) and work on it almost exclusively to hammer out the inelegance for a year, and I'm still finding lots of stuff that could be better.

    e-mail: maxmike@gmail.com
    Currently working for http://www.squareflo.com

  10. Finally, this rambling is more to just get the thoughts out to the general public who might stumble by. As I said, you have a good head on your shoulders and I'm certain that I'm just preaching to the choir, hopefully my posts have been more entertaining than annoying (or boring!)

    Best of luck with Moodle! It seems to be very useful for a lot of teachers. This has been split into 3 separate posts because of the length restrictions on your blog.

  11. Sorry about the length restriction. That's not something that I set up. It must be a Blogger 'feature'. Yes, the debate has been more entertaining than annoying.

    One thing dynamic language advocates say is that no matter what sort of language you are using, you should be doing unit testing; and if you are doing unit testing, that will probably catch all the sorts of things that a statically typed language's compiler will catch anyway, and more. Not sure I have enough experience yet to know if that is true, but I am definitely a convert to unit tests.

  12. Thanks Tim and Michael for your lively debate. Very informative for us newbie PHPers, I think!