Garbage Collection


Stuart Yeates neatly sums up Garbage Collection like this:

In a garbage collected language, it's the responsibility of each module to maintain It's data structures and arrange for them to only contain those objects which that module needs. when an object is no longer needed locally, It's local references are extingished (deleted, copied over, popped of the stack, etc).

In a non-garbage collected language, in addition to determining doing this, each module must be aware of whether any other (including those which haven't been written yet) also accesses the object, and if not free it.

Garbage collection thus reduces "the base problem of not having a handle on the useful life of your objects" from a global one to a local one.


Max Hailperin, Associate Professor of Computer Science, at Gustavus Adolphus College, says:

In the garbage collected system, you need to reason locally: each module needs to say that "if I don't need this widget anymore, then I shouldn't hang onto a reference to it anymore." If each module does that, then when no module needs the widget anymore, none will have a reference, and so it will become garbage collectable.

In the manual-deallocation system, you need to reason globally: each module needs to say that "if I don't need this widget anymore, and I am the last module that did need it, then I should deallocate it." In other words, you need to reason about not only your own module's lack of future need for the widget, but also about all the other modules, in order to determine whether you are the last.

In both cases, it is necessary to do some sort of reasoning. The dream many people have that you can avoid reasoning about these issues at all is just a dream. With GC, you get away with it more often, which serves to lull people into complacency, and then one day they get burnt big time, typically by refering to lots of big objects from "inactive" portions of a data structure (like elements of an array beyond the number that some counter tells you are "in use"). If you can overcome the complacency, it is relatively easy to get the reasoning right, however, because of its local nature. With manual deallocation you get burnt all the time, so you never get complacent, but despite the constant reminder that you need to reason, you still find it very difficult to do so correctly, because of the globalness.

The best programmers of manual-deallocation systems typically go to great lengths to avoid having to do fully-general global reasoning about the question "am I the last". For example, they may copy structures that there is no logical reason to copy, so long as there is no necessity of sharing. For structures that need to be shared, they may designate an explicit "owner" who all the other users "check in" and "check out" with, effectively manually implementing reference counting. Etc. For simple situations it isn't too bad, but for hairy stuff this effort can easily overwhelm the effort actually devoted to the program's intended purpose.

So, no garbage collection doesn't solve the memory leak problem by itself, in the sense of without thought, but it isn't just a matter of making leaks less common (and premature deallocations/reallocations impossible). Instead, it is a matter of making the reasoning much easier that it takes to make the leaks go away entirely, as opposed to just becoming less common.


My advise for writing code in a language with a GC, or with a non-GC language (like C/C++) to which you've added a GC, is to avoid using global data as much as possible. This is easy to do in a purely functional language, like Haskell, but in Lisp or C++ it might take a little more discipline.

I don't think that this makes it impossible to do, and the advantages of using a GC should outweight the disadvantages. In my experience, this tends to be the case. Perhaps this is because I take care with what data I make global, and remember to use something like "ptr = nil" when appropriate, or it could just be that most of the code I've written uses data where the lifetime is pretty clear. In all of my code, there are objects which are shortlived, but which are (or would be) tedious to use without a GC. These objects can be anything from strings and simple records, thru to more complex data structures like graphs, indexes, etc.

We don't all make the same demands of our tools, and so we may sometimes be choosing problems to solve that look sufficiently "easy" to do. I find that tools like Garbage Collection can help make a few problems solvable that would otherwise to "too hard".

As always with these things, Your Mileage May Vary. I just hope that you find something here that can help you, whetever that may be.


"GC is not a generic solution for memory leaks, but a (correct) GC is a generic solution for 'dangling pointers'. Just as there is no general solution for 'loops' (due to undecidability), there is no general solution for 'leaks'."
Henry Baker