Better Code Through Destruction

by Igor Gariev
June 07, 2007

Larry Wall said that Perl makes easy things easy and hard things possible. Perl is good both for writing a two-line script that saves the world at the last minute (well, at least it saves you and your project) and for robust projects. However, good Perl programming techniques can be quite different between small and complex applications. Consider, for example, Perl's garbage collector. It frees a programmer from memory management issues most of the time...until the programmer creates circular references.

Perl's garbage collector counts references. When an entity's count reaches zero (which means that nothing refers to it any longer), Perl reclaims the entity. The approach is simple and effective. However, circular references (when object A has a reference to object B, and object B has a reference to object A) present a problem. Even if nothing else in the program has a reference to either A or B, their reference counts can never reach zero. Objects A and B do not get destroyed. If the code creates them again and again (perhaps in a loop), you get a memory leak. The amount of memory allocated by the program grows for no good reason and never decreases. This effect may be acceptable for simple run-and-exit scripts, but not for programs that run around the clock, such as those in a mod_perl or FastCGI environment or standalone servers.
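The effect is easy to observe in a few lines. The following is a minimal sketch, not code from any real module: the Node class and its $destroyed counter are hypothetical and exist only to make destructor calls visible.

```perl
use strict;
use warnings;

package Node;    # hypothetical class, for illustration only

our $destroyed = 0;    # incremented every time a Node is destroyed

sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { $destroyed++ }

package main;

# A lone object: its reference count drops to zero when the block ends.
{
    my $single = Node->new('single');
}
my $after_single = $Node::destroyed;    # 1

# Two objects that refer to each other: neither count can reach zero.
{
    my $first  = Node->new('A');
    my $second = Node->new('B');
    $first->{peer}  = $second;
    $second->{peer} = $first;
}
my $after_cycle = $Node::destroyed;     # still 1: A and B have leaked

print "destroyed after lone object: $after_single\n";
print "destroyed after cycle:       $after_cycle\n";
```

Perl does call DESTROY on the two leaked objects eventually, but only during global destruction when the program exits, which is no help to a long-running server.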

Circular references are sometimes too useful to avoid. A common example is a tree-like data structure. To navigate in both directions--from root to leaves and vice versa--a parent node has a list of children and each child node has a reference to its parent. These mutual links are circular references. Many CPAN modules implement their data models this way, including HTML::Tree, XML::DOM, and Text::PDF::File. All these modules provide a method to release the memory. The client application must call the method when it no longer needs an object. However, the requirement of an explicit call is not very appealing and can result in unsafe code:

    ##
    ## Code with a memory leak
    ##
    use HTML::TreeBuilder;

    foreach my $filename (@ARGV) {
        my $tree = HTML::TreeBuilder->new;
        $tree->parse_file($filename);

        next unless $tree->look_down('_tag', 'img');
        ##
        ## Do the actual work (say, extract images) here
        ## ...
        ## and release the memory
        ##
        $tree->delete;
    }

The problem in this code is the next statement: the trees of HTML documents with no <img> tags are never released. In fact, any next, last, return (inside a subroutine), or die (inside an eval {} block) that runs before $tree->delete is unsafe and leads to a memory leak. Of course, it is possible to move the release code into a continue block (which covers next but not last, because last skips the continue block), or to write code to delete the tree before every last, return, or die, but the code easily becomes messy.
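To see why the continue-block approach gets messy, here is a rough sketch of it. The Tree class below is a hypothetical stand-in for a module such as HTML::TreeBuilder (it only builds one parent/child cycle and counts unreleased trees in $live), so the example runs without any CPAN modules.

```perl
use strict;
use warnings;

package Tree;    # hypothetical stand-in for a tree class with cycles

our $live = 0;   # number of trees created but not yet released

sub new {
    my ($class, $has_img) = @_;
    my $self = bless { has_img => $has_img, children => [] }, $class;
    push @{ $self->{children} }, { parent => $self };    # parent <-> child cycle
    $live++;
    return $self;
}

sub has_img { return $_[0]{has_img} }

# Break the cycles by hand so that reference counting can reclaim the tree.
sub delete {
    my $self = shift;
    $_->{parent} = undef for @{ $self->{children} };
    $self->{children} = [];
    $live--;
}

package main;

my $tree;    # declared outside the loop: the continue block cannot see
             # lexicals declared inside the loop body
foreach my $has_img (0, 1, 0) {
    $tree = Tree->new($has_img);
    next unless $tree->has_img;
    # ... do the actual work here ...
}
continue {
    $tree->delete;    # runs after a normal iteration and after next
}

print "live trees: $Tree::live\n";
```

Even in this small case the variable has to move out of the loop, and the continue block still does nothing for last, return, or die.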

There is a better solution--the paradigm of "resource acquisition is initialization (and destruction is resource relinquishment)." (Ironically, the second half of its name is often omitted, even though it's probably the most important part.) The idea is simple. Create a special guard object (of another class) whose sole responsibility is to release the resource. When the guard object gets destroyed, its destructor deletes the tree. The code may look like:

    ##
    ## A special sentry object is employed
    ##
    use HTML::TreeBuilder;

    foreach my $filename (@ARGV) {
        my $tree = HTML::TreeBuilder->new;
        $tree->parse_file($filename);

        my $sentry = Sentry->new($tree);

        next unless $tree->look_down('_tag', 'img');
        ##
        ## next, last or return are safe here.
        ## Tree will be deleted automatically.
        ##
    }

    package Sentry;

    sub new {
        my $class = shift;
        my $tree  = shift;
        return bless {tree => $tree}, $class;
    }

    sub DESTROY {
        my $self = shift;
        $self->{tree}->delete;
    }

Note that there is no longer any need to call $tree->delete explicitly at the end of the loop. The magic is simple. When program flow leaves the scope, Perl can reclaim $sentry because it participates in no circular references. The DESTROY method of the Sentry package, in turn, calls the delete method of the $tree object. One solution covers every exit: the memory is released however you leave the block.
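The same guard also covers return and die. The sketch below checks both, again using a hypothetical Tree stand-in (one parent/child cycle plus a $live counter) in place of a real tree module; Sentry is the class shown above, written more compactly.

```perl
use strict;
use warnings;

package Tree;    # hypothetical stand-in with one parent/child cycle

our $live = 0;   # number of trees created but not yet released

sub new {
    my $class = shift;
    my $self  = bless { children => [] }, $class;
    push @{ $self->{children} }, { parent => $self };
    $live++;
    return $self;
}

sub delete {
    my $self = shift;
    $_->{parent} = undef for @{ $self->{children} };
    $self->{children} = [];
    $live--;
}

package Sentry;  # the guard class from the article

sub new     { my ($class, $tree) = @_; return bless { tree => $tree }, $class }
sub DESTROY { $_[0]{tree}->delete }

package main;

# Leaving a subroutine with return destroys the sentry.
sub early_return {
    my $tree   = Tree->new;
    my $sentry = Sentry->new($tree);
    return 'returned early';
}
my $result            = early_return();
my $live_after_return = $Tree::live;

# Leaving an eval block with die destroys the sentry as well.
my $ok = eval {
    my $tree   = Tree->new;
    my $sentry = Sentry->new($tree);
    die "parse failed\n";
};
my $live_after_die = $Tree::live;

print "live trees after return: $live_after_return\n";
print "live trees after die:    $live_after_die\n";
```

In both cases the counter is back at zero as soon as control leaves the scope that owns $sentry.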

Finally, there is no need to write your own Sentry class. Use Object::Destroyer, originally written by Adam Kennedy. As its name suggests, it is an object whose job is to destroy other objects:

    ##
    ## An off-the-CPAN solution with Object::Destroyer
    ##
    use HTML::TreeBuilder;
    use Object::Destroyer 2.0;

    foreach my $filename (@ARGV) {
        my $tree   = HTML::TreeBuilder->new;
        my $sentry = Object::Destroyer->new($tree, 'delete');
        $tree->parse_file($filename);

        next unless $tree->look_down('_tag', 'img');
        ##
        ## You can safely return, die, next or last here.
        ##
    }

Because the name of the release method varies between modules, you pass it as the constructor's second argument.
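To illustrate the idea of a configurable release method, here is a minimal sketch of a guard that takes the method name as its second argument. It is not Object::Destroyer's actual implementation; MethodGuard, Document, and the dispose method are all hypothetical names chosen for this example.

```perl
use strict;
use warnings;

package MethodGuard;    # hypothetical guard, not Object::Destroyer itself

use Carp ();

sub new {
    my ($class, $object, $method) = @_;
    # Fail early, at construction time, if the object lacks the method.
    Carp::croak("object has no '$method' method")
        unless $object->can($method);
    return bless { object => $object, method => $method }, $class;
}

sub DESTROY {
    my $self   = shift;
    my $method = $self->{method};
    $self->{object}->$method();    # call the release method by name
}

package Document;       # hypothetical class that cleans up via 'dispose'

our @log;               # records cleanup calls, for illustration

sub new     { return bless {}, shift }
sub dispose { push @log, 'dispose called' }

package main;

{
    my $doc   = Document->new;
    my $guard = MethodGuard->new($doc, 'dispose');
    # ... work with $doc; any exit from this block triggers dispose ...
}

print "@Document::log\n";
```

Checking the method name in the constructor is a design choice of this sketch: a misspelled name then fails where the guard is created rather than later, inside a destructor.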
