Wednesday, July 20, 2011

strange programmer habits : number and string mutation

Some of my favorite programming languages allow you to be "fast and loose" with data types. JavaScript and PHP, for instance, will convert variables between number and string types as needed. Some consider this behavior to be "sub-optimal," while others don't. But understanding how your programming language converts literals or variables between types is important, no matter what language you're using.

Consider the following C program:
#include <stdio.h>
int main( int argc, char *argv[] ) {
  char *start = "1234";
  int a = 2;
  printf( "%s\n", (start + a ) );
}
When you compile and run this program, it prints out the string "34" and then exits. Now look at this JavaScript function:
function foo() {
  var start = "1234";
  var a = 2;
  console.log( start + a );
}
It should print out the string "12342". Understanding why C does one thing and JavaScript does another is important. C aficionados can probably quickly point out that the printf() function takes a pointer to an array of characters as its input. Adding the integer 2 to the pointer causes it to point two characters further into the array. When interpreted as a string, ( start + a ) is simply a two-character string with the value "34".

JavaScript, on the other hand, converts the number 2 into the string '2' and appends it to the string.

Doing the same thing in PHP yields yet another result. Executing the following PHP fragment prints the string "1236":
$start = "1234";
$a = 2;
echo ( $start + $a );
PHP peeks inside the variable $start, sees that it looks like a number and then converts it to an integer and performs the addition.

JavaScript provides functions to convert numbers to strings and vice versa. The "String( val )" function attempts to convert the argument 'val' to a string, while the "Number( val )" function attempts to convert 'val' to a number. People went to the trouble of specifying these functions and documenting them, so you might as well use them.
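For example, here's a quick sketch of the two conversion functions in action (the literal values are just for illustration):
var n = Number( "34" );  // the string "34" becomes the number 34
var s = String( 12 );    // the number 12 becomes the string "12"
console.log( typeof n, n );  // prints: number 34
console.log( typeof s, s );  // prints: string 12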

Some people, their minds perhaps addled by exposure to early versions of PHP, have been seen to do things like this in JavaScript:
var a = 12;
console.log( "" + a );
or
var b = '34';
console.log( 1 * b );
Adding an empty string to a number in JavaScript will (should) cause the interpreter to convert the value of a into a string. Multiplying the string b by one should do the opposite (convert the string into a number).

Some people believe this type of conversion is faster, while others think it's just plain ugly. It is certainly the case that "standard" functions exist to do the same thing, and they might convey the programmer's intent more clearly.
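If you prefer the explicit route, here's a minimal sketch of how the two examples above might look using the standard conversion functions instead:
var a = 12;
console.log( String( a ) );  // prints the string "12", same result as "" + a
var b = '34';
console.log( Number( b ) );  // prints the number 34, same result as 1 * b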

It's up to you, of course, which technique you use to coerce a value to a particular type, but if you inherit code with superfluous additions and multiplications, this might be what's going on.

Wednesday, July 13, 2011

strange programmer habits : avoiding goto's by using do..while's

In the early days of computer software, programmers used languages like assembly, FORTRAN, COBOL and Lisp to produce reasonably small programs to compute trajectories, maintain inventory databases or accounting systems, and whatnot. Computing systems weren't big enough to allow programmers to build massive software systems like modern operating systems, web browsers or computer games.

This is probably why it took a couple of decades for people to understand how bad "spaghetti code" was; it's easy to dismiss spaghetti as a problem when your complete software system is one or two pages long. But when a printout of your system requires you to chop down a medium-sized forest, concepts like "structured programming" and "design patterns" really start to become important.

One of the popular issues around the programmer's water-cooler in the 1980's was whether or not people who use goto's should have their fingers chopped off. Edsger Dijkstra penned the canonical software engineering jeremiad about this subject, entitled "Go To Statement Considered Harmful" [PDF]. You can probably guess his opinion from the paper's title.

So in the 80's and 90's, software engineers were taught that goto's led to un-maintainable software, headaches and all manner of social ills ranging from global warming to bad movies to disco. "Use a goto," they would say, "and it's like asking the local DJ to play Disco Duck on the radio." Structured programming, good software engineering technique and eschewing goto's would lead to a new era of increasingly good Cure albums, Alien sequels that didn't suck and fewer evenings in the office at midnight debugging the crap code you wrote last year.

But, like the one true ring, the expressive power of the goto is difficult to resist. Many software wizards, on their way to a life of perdition (i.e., writing video games), countered that the goto could be used on occasion, if done correctly. Consider the following routine; it tries to open a file and read a few bytes. If there are errors along the way, it uses a goto to branch to a clean-up routine before exiting:
int doSomething( char *filename ) {
  int err = 0;
  FILE *file = (FILE *) NULL;
  char buffer[ 80 ];
  size_t bytesRead = 0;

  if( NULL == filename ) {
    err = -1;
    goto exuent_omnis;
  }
  if( (FILE *) NULL == ( file = fopen( filename, "r" ) ) ) {
    err = errno;
    goto exuent_omnis;
  }
  bytesRead = fread( buffer, 80, 1, file );
  if( ferror( file ) ) {
    err = -2;
    goto exuent_omnis;
  }

  /* more code here */

exuent_omnis:
  if( (FILE *) NULL != file ) {
    fclose( file );
  }
  return( err );
}
"What could be wrong with this?" the pro-goto lobby would ask. IMHO, this example is pretty readable, and the goto DOES actually increase readability. Especially if you consider that nested if's are frequently offered as the alternative:
int doSomething( char *filename ) {
  int err = 0;
  FILE *file = (FILE *) NULL;
  char buffer[ 80 ];
  size_t bytesRead = 0;

  if( NULL != filename ) {
    if( (FILE *) NULL != ( file = fopen( filename, "r" ) ) ) {
      bytesRead = fread( buffer, 80, 1, file );
      if( ! ferror( file ) ) {
        /* more code here */
      } else {
        err = -2;
      }
      fclose( file );
    } else {
      err = errno;
    }
  } else {
    err = -1;
  }
  return( err );
}
People who propose extensive use of nested if's should have their thumbs broken. This example isn't that bad, but when nested if's start spanning pages, they can get a bit difficult to read. The alternative to using goto's in this example would be to use a do...while() loop whose repeat condition has been explicitly set to zero (or false for C++ users):
int doSomething( char *filename ) {
  int err = 0;
  FILE *file = (FILE *) NULL;
  char buffer[ 80 ];
  size_t bytesRead = 0;

  do {
    if( NULL == filename ) {
      err = -1;
      break;
    }
    if( (FILE *) NULL == ( file = fopen( filename, "r" ) ) ) {
      err = errno;
      break;
    }
    bytesRead = fread( buffer, 80, 1, file );
    if( ferror( file ) ) {
      err = -2;
      break;
    }
    /* more code here */
  } while( 0 );

  if( (FILE *) NULL != file ) {
    fclose( file );
  }
  return( err );
}
Developers who like this kind of code will tell you it captures the succinct directness of a goto without actually having a goto. Because we break out of the loop, there's only one place control can go: the statement after the while( 0 ). And we avoid nested if's. I've encountered at least one developer who believes this technique is harmful; it uses the do...while language feature for something it was not intended for, and as such, it could be confusing to younger programmers.

Whether you make Dijkstra cry by using a goto, produce deep levels of indentation or use a do...while( 0 ) that's confusing to inexpert programmers is entirely up to you. But hopefully this article has made you aware of the different techniques you'll encounter in the wild.

-Cheers!

Wednesday, July 6, 2011

strange programmer habits : commas at the beginning of lines

So, consider this C program:
#include <stdio.h>

char *verbs[] =
  { "quit"
  , "score"
  , "inventory"
  , "go"
  , "get"
  , NULL
  };

int main() {
  int i;
  for( i = 0; verbs[ i ] != NULL; i++ ) {
    printf( "verb %02d: %s\n", i, verbs[ i ] );
  }
}
or its JavaScript equivalent:
var verbs = [ "quit" , "score" , "inventory" , "go" , "get" ]; for( var i = 0, il = verbs.length; i < il; i++ ) { console.log( "verb " + i + ": " + verbs[ i ] ) }
Both these programs declare an array of strings and then print them out. But contrary to popular convention, the commas separating individual elements of the verbs array come not at the end of the line, but at the beginning.

The compiler (or interpreter) couldn't care less about this stylistic convention, of course. All it cares about is whether there are commas between array elements. The comma-first style is there to make it easier for you to add, delete or move single lines in the array. By putting the comma at the beginning of the line, you can move elements around without having to manually add (or remove) a trailing comma whenever an element becomes (or stops being) the last one in the array.
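For instance, appending a hypothetical "look" verb to the JavaScript array above touches only the new line; none of the existing lines need a comma added or removed:
var verbs =
  [ "quit"
  , "score"
  , "inventory"
  , "go"
  , "get"
  , "look"   // hypothetical new entry: only this line changes
  ];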

This strange habit doesn't affect the output your compiler produces, and its main benefit is to save a couple of milliseconds when cutting and pasting entries in an array. But a small number of programmers (myself included) have gotten used to seeing arrays that look like this, so don't be surprised if you see this style from time to time.