meadhbh lives here: strange programmer habits : avoiding goto's by using do..while's

Wednesday, July 13, 2011

strange programmer habits : avoiding goto's by using do..while's

In the early days of computer software, programmers used languages like assembly, FORTRAN, COBOL and Lisp to produce reasonably small programs that computed trajectories, maintained inventory databases, ran accounting systems and whatnot. Computing systems weren't big enough to allow programmers to build massive software systems like modern operating systems, web browsers or computer games.

This is probably why it took a couple of decades for people to understand how bad "spaghetti code" was; it's easy to dismiss spaghetti as a problem when your complete software system is one or two pages long. But when a printout of your system requires you to chop down a medium-sized forest, concepts like "structured programming" and "design patterns" really start to become important.

One of the popular issues around the programmer's water cooler in the 1980's was whether or not people who use goto's should have their fingers chopped off. Edsger Dijkstra penned the canonical software engineering jeremiad on the subject, "Go To Statement Considered Harmful" (Communications of the ACM, 1968). You can probably guess his opinion from the title.

So in the 80's and 90's, software engineers were taught that goto's led to un-maintainable software, headaches and all manner of social ills ranging from global warming to bad movies to disco. "Use a goto," they would say, "and it's like asking the local DJ to play Disco Duck on the radio." Structured programming, good software engineering technique and eschewing goto's would lead to a new era of increasingly good Cure albums, Alien sequels that didn't suck and fewer evenings in the office at midnight debugging the crap code you wrote last year.

But, like the One Ring, the expressive power of the goto is difficult to resist. Many software wizards, on their way to a life of perdition (i.e., writing video games), countered that the goto could be used on occasion, if done correctly. Consider the following routine; it tries to open a file and read a few bytes. If there are errors along the way, it uses a goto to branch to a clean-up routine before exiting:

#include <errno.h>
#include <stdio.h>

int doSomething( char *filename ) {
    int err = 0;
    FILE *file = (FILE *) NULL;
    char buffer[ 80 ];
    size_t bytesRead = 0;

    if( NULL == filename ) {
        err = -1;
        goto exuent_omnis;
    }
    if( (FILE *) NULL == ( file = fopen( filename, "r" ) ) ) {
        err = errno;
        goto exuent_omnis;
    }
    /* read up to 80 single bytes, so bytesRead is a byte count */
    bytesRead = fread( buffer, 1, sizeof( buffer ), file );
    if( ferror( file ) ) {
        err = -2;
        goto exuent_omnis;
    }
    /* more code here */

exuent_omnis:
    /* single clean-up point: close the file only if it was opened */
    if( (FILE *) NULL != file ) {
        fclose( file );
    }
    return( err );
}

"What could be wrong with this?" the pro-goto lobby would ask. IMHO, this example is pretty readable, and the goto DOES actually increase readability, especially when you consider that nested if's are frequently offered as the alternative:

#include <errno.h>
#include <stdio.h>

int doSomething( char *filename ) {
    int err = 0;
    FILE *file = (FILE *) NULL;
    char buffer[ 80 ];
    size_t bytesRead = 0;

    if( NULL != filename ) {
        if( (FILE *) NULL != ( file = fopen( filename, "r" ) ) ) {
            bytesRead = fread( buffer, 1, sizeof( buffer ), file );
            if( ! ferror( file ) ) {
                /* more code here */
            } else {
                err = -2;
            }
            fclose( file );
        } else {
            err = errno;
        }
    } else {
        err = -1;
    }
    return( err );
}

People who propose extensive use of nested if's should have their thumbs broken. This example isn't that bad, but when nested if's start spanning pages, they get difficult to read. Another alternative to using goto's in this example is a do...while() loop whose repeat condition is explicitly set to zero (or false, for C++ users):

#include <errno.h>
#include <stdio.h>

int doSomething( char *filename ) {
    int err = 0;
    FILE *file = (FILE *) NULL;
    char buffer[ 80 ];
    size_t bytesRead = 0;

    do {
        if( NULL == filename ) {
            err = -1;
            break;
        }
        if( (FILE *) NULL == ( file = fopen( filename, "r" ) ) ) {
            err = errno;
            break;
        }
        bytesRead = fread( buffer, 1, sizeof( buffer ), file );
        if( ferror( file ) ) {
            err = -2;
            break;
        }
        /* more code here */
    } while( 0 );

    /* every break lands here: close the file only if it was opened */
    if( (FILE *) NULL != file ) {
        fclose( file );
    }
    return( err );
}

Developers who like this kind of code will tell you it captures the succinct directness of a goto without actually having a goto. Because break exits the loop, there's only one place control can go: the statement after while( 0 ). And we avoid nested if's. I've encountered at least one developer who believes this technique is harmful: it uses the do...while language feature for something it was not intended for and, as such, could confuse younger programmers.
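When a function acquires more than one resource, goto-based clean-up in the wild often grows into a "ladder" of labels, each one undoing a single step, in reverse order of acquisition. The sketch below is illustrative only: it assumes a hypothetical Contact record holding two heap-allocated strings, and contactInit and contactFree are made-up names, not a real API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for this sketch. */
typedef struct {
    char *name;
    char *email;
} Contact;

/* Copies name and email into c. Returns 0 on success, -1 on failure;
 * on failure nothing is leaked and c is left untouched. */
int contactInit( Contact *c, const char *name, const char *email ) {
    char *n, *e;

    n = malloc( strlen( name ) + 1 );
    if( NULL == n ) {
        goto out;               /* nothing to undo yet */
    }
    e = malloc( strlen( email ) + 1 );
    if( NULL == e ) {
        goto out_free_name;     /* undo the first allocation only */
    }
    strcpy( n, name );
    strcpy( e, email );
    c->name = n;
    c->email = e;
    return 0;

    /* Clean-up ladder: each label falls through to the ones below it,
     * releasing resources in the reverse order they were acquired. */
out_free_name:
    free( n );
out:
    return -1;
}

/* Releases what contactInit() allocated. */
void contactFree( Contact *c ) {
    free( c->name );
    free( c->email );
}
```

Adding a third resource means adding one more label at the top of the ladder; the do...while(0) version would instead need a NULL check per resource in its single clean-up block.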

Whether you make Dijkstra cry by using a goto, produce deep levels of indentation or use a do...while(0) that confuses inexpert programmers is entirely up to you. But hopefully this article has made you aware of the different techniques you'll encounter in the wild.

-Cheers!

6 comments:

OhMeadhbh, July 13, 2011 at 10:10 AM
it's not clear from this article, but in the 70's and 80's a lot of peeps were taught you were supposed to have one return from a function. it's a little clearer why you want something like this when you're using functional languages like LISP, Scheme, LOGO and sometimes JavaScript.

the main benefit of having a single exit point, IMHO, is you can break on the return statement and examine the state of the local variables before you exit. this is sometimes useful.

a lot of people from the "old days" would tell you that code is more readable when you have a single return statement. i think that's true when you have individual functions that are several pages long, but the problem there is not that you have multiple returns, the problem is you have these huge functions.
Scott, July 13, 2011 at 12:05 PM
'break' (other than when used at the end of a case within a switch, where there is no alternative), is just a goto in disguise.

There are other advantages to not using multiple returns... when you're allocating dynamic memory that should be freed before you leave, it's much easier to check that this is happening if there's only one way out, and it's easy to leak such memory by adding a return.
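A minimal sketch of this point, written with the do...while(0) idiom from the post. countWords is a hypothetical function: it copies its input because strtok() modifies the string it tokenizes, and the single exit after the loop is the only place that copy is freed. A return statement added inside the loop would skip the free() and leak the copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: counts the space-separated words in text.
 * Returns the word count, or -1 on error. */
int countWords( const char *text ) {
    int result = -1;
    char *copy = NULL;

    do {
        if( NULL == text ) {
            break;                      /* nothing allocated yet */
        }
        /* strtok() writes into its argument, so work on a private copy */
        copy = malloc( strlen( text ) + 1 );
        if( NULL == copy ) {
            break;                      /* allocation failed */
        }
        strcpy( copy, text );

        result = 0;
        for( char *word = strtok( copy, " " ); NULL != word;
             word = strtok( NULL, " " ) ) {
            result++;
        }
    } while( 0 );

    /* The single way out: free( NULL ) is a no-op, so this is safe
     * whichever break was taken. */
    free( copy );
    return result;
}
```

For example, countWords( "the quick brown fox" ) returns 4, and countWords( NULL ) returns -1 without allocating anything.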
Keith W. Boone, July 13, 2011 at 2:19 PM
I don't like the example at all. I don't see much benefit from having a single return statement. As to deallocation of dynamic memory, try/finally is a much cleaner way to handle that.
OhMeadhbh, July 15, 2011 at 7:53 AM
keith... not all languages (like C) support try / finally. php, for instance, has a try / catch mechanism, but no finally.

scott... when i made a similar comment to a PhD from one of those to-remain-nameless east coast technical institutes, the response was that the do-while-break construct could not redirect flow backwards and the 'jump' is local within the do-while construct. i think the assumption was that it was "better" than using a goto for that reason.
Tom DePlonty, July 16, 2011 at 11:06 AM
Ah, the dogmatic prohibition on goto. If avoiding goto leads you to write code that is more obscure than it would have been otherwise, what have you accomplished? Especially in C, I've found (rare) occasions where a tactful goto was the most readable way to go.

About having one return point from functions/methods - I find if they're very short, it is often better to have more than one. In longer ones, it's very desirable to have exactly one, at the end - but if they're long enough to reach that threshold, then they're probably too long in the first place...
Mr Z, April 21, 2013 at 12:45 PM
K&R were rather straightforward about why they included goto in C: Most of the time you should avoid it, but sometimes, you just need a goto.

The error handling example is one of the canonical examples. The other example I can think of is to implement a 'multi-level break', when you need to break out of more than one level of loop nest.
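A minimal sketch of such a multi-level break, assuming a hypothetical findInGrid function that searches a grid with a fixed row width (COLS is likewise an assumption of this sketch). A plain break inside the inner loop would only leave that loop, so a goto jumps past both loops in one step.

```c
#include <stddef.h>

#define COLS 4   /* hypothetical fixed row width for this sketch */

/* Searches grid for target. Returns 1 and stores the position in
 * *foundRow and *foundCol if found, otherwise returns 0. */
int findInGrid( int grid[][COLS], size_t rows, int target,
                size_t *foundRow, size_t *foundCol ) {
    int found = 0;

    for( size_t r = 0; r < rows; r++ ) {
        for( size_t c = 0; c < COLS; c++ ) {
            if( grid[r][c] == target ) {
                *foundRow = r;
                *foundCol = c;
                found = 1;
                goto searchDone;   /* a plain break would only exit the inner loop */
            }
        }
    }

searchDone:
    /* any common post-search work would go here */
    return found;
}
```

The alternative without goto is a flag checked in every enclosing loop condition, which adds one test per nesting level.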

I take issue with the notion that 'break' is just a goto in disguise. As ugly as it might be, the do-while above is a nesting construct. Any ugly thing you might build out of a looping construct and break must nest in a well-controlled way. The control flow graph still looks fairly manageable. It's largely a directed acyclic graph, with the few cycles injected by the looping constructs. The name for the property escapes me.

The thing about goto is that you can build arbitrarily tangled control flow graphs. You can make some serious spaghetti with goto-ridden code that is simply impossible with nested looping constructs, no matter how many if-else, continue and break statements you have. That's the sort of code you should seek to avoid.

As for the single-return: I never really understood that fetish. In particular, if you have some sanity checking at the start of your function that returns errors, it seems far more natural to just return the error than to branch to a return at the end, or wrap the whole function in an if-statement.

That is:

int foo( /* ...yadda... */ )
{
    int result;

    if (!sanity_check())
        return -1;

    /* ...stuff... */
    return result;
}

seems way more natural than:

int foo( /* ...yadda... */ )
{
    int result;

    if (sanity_check())
    {
        /* ...stuff... */
    } else
    {
        result = -1;
    }

    return result;
}