Sunday, June 26, 2011

Marwan's Programming Guiding Rules

Ok, now that I have exposed my view on maxims, I can write my own ! (but remember, don't trust the maxims !)

I'll start with pure coding guidelines (and keep the rest for later.)

copy/paste are evil !

(with its corollary: copying code implies copying bugs)

This one is classic (at least for my students): whenever you want to copy/paste some code, you should reorganize it. Most of the time, a function (or something similar) will be useful later, and even if there'll be only one copy, you will probably find a better way to write it.

Just take some examples: here are two functions on linked lists, a simple add at the front and a sorted insertion, with copy/paste:

#include <stdlib.h>

typedef struct s_list *t_list;
struct s_list
{
  t_list next;
  int    data;
};

t_list add(int x, t_list l)
{
  t_list t;
  t = malloc(sizeof (struct s_list));
  t->data = x;
  t->next = l;
  return t;
}

void insert(int x, t_list *l)
{
  if (*l == NULL || x < (*l)->data)
    {
      t_list t = malloc(sizeof (struct s_list));
      t->data = x;
      t->next = *l;
      *l = t;
    }
  else
    insert(x, &((*l)->next));
}

Now, the version of insert without copy/paste:

void insert(int x, t_list *l)
{
  if (*l == NULL || x < (*l)->data)
    *l = add(x,*l);
  else
    insert(x, &((*l)->next));
}
First, the latter version is shorter. But that's not all: in the second version, the size of the struct is referenced in only one place. If you later want to use a specific allocator (such as a recycling pool), you have one change to make, not two or more.

Another toy example, with only "one copy" (no need to write a function): we implement a FIFO queue using a circular list (this is only the push operation):
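To make the allocator remark concrete, here is a minimal sketch of what that single change could look like. The pool itself and the names cell_alloc and cell_release are assumptions made for this illustration, as is the choice to abort when no memory is available:

```c
#include <stdlib.h>

typedef struct s_list *t_list;
struct s_list
{
  t_list next;
  int    data;
};

/* Hypothetical recycling pool: released cells wait here to be reused. */
static t_list pool = NULL;

/* Take a cell from the pool if there is one, otherwise ask malloc. */
static t_list cell_alloc(void)
{
  t_list t = pool;
  if (t)
    pool = t->next;
  else
    t = malloc(sizeof (struct s_list));
  return t;
}

/* Give a cell back to the pool instead of calling free. */
void cell_release(t_list t)
{
  t->next = pool;
  pool = t;
}

/* Same add as before: only the allocation line has changed. */
t_list add(int x, t_list l)
{
  t_list t = cell_alloc();
  if (!t)
    abort(); /* assumption of this sketch: give up when out of memory */
  t->data = x;
  t->next = l;
  return t;
}
```

Since insert goes through add, it gets the new allocator for free, without being touched.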

// with copy/paste
void push(int x, t_list *q)
{
  t_list t;
  if (*q)
    {
      t = malloc(sizeof (struct s_list));
      t->data = x;
      t->next = (*q)->next;
      (*q)->next = t;
      *q = t;
    }
  else
    {
      t = malloc(sizeof (struct s_list));
      t->data = x;
      t->next = t;
      *q = t;
    }
}

//without
void push(int x, t_list *q)
{
  t_list t;
  t = malloc(sizeof (struct s_list));
  t->data = x;
  if (*q)
    {
      t->next = (*q)->next;
      (*q)->next = t;
    }
  else
    t->next = t;
  *q = t;
}

Again, the code is smaller and simpler to verify. Those two examples are very basic, and most programmers will probably use the second form intuitively, but in more complex situations, taking the avoidance of copy/paste as a guideline may save you from a long, nightmarish bug hunt.

Any programming feature is good as long as it fits your need.
There's no reason not to use a feature of your programming language as long as you understand it and it does the job. If it's possible, it means that somehow it was meant by the language designers for a good reason (or else they should have found a way to forbid it.)

When teaching programming, I often hear students arguing about bad practices, rules they have read on the net that they don't really understand. Most of the time, these rules only make sense in a precise context or express a taste rather than a reasoned motivation.

The main goal of a programmer is to make their code work, not to produce shiny pieces of code or perfect models of how to code. This leads us to my next rule:

The end justifies the means.
To understand this one, we must define the end and the means. The end is your goal: what your program should do. But it's also all the requirements linked to it: should it be fast ? should it be robust and reliable ? what will be its life cycle (one shot or high availability permanent run) ? will it be extended ? ...

The means deal with how you will produce it, and they are strongly connected to the end: don't spend too much time on code for a one-shot script, just make it work; on the other hand, remove all quick and dirty implementations from a production release !

Building a program is, most of the time, part of the professional work you're paid for. So this is not a contest of best programming practices, and there's no jury prize for a nice attempt: you have to make it work the way your future users expect it to work. If they ask for a simple script that will save them hours of boring manipulation, they don't want to wait months until you finally find it presentable, they want it now !

The important point is to define the correct goals, and make the right efforts to achieve these goals.

This also means that any trick is fine as long as it makes your code work. Again, students sometimes seem disappointed by some hacks they find in a piece of code. They see it as cheating, as if finding a way to get around a difficulty were not elegant and should not be tolerated !

Coding is no game: you can have fun doing it, but there's no place for a ridiculous honor code. You will find that satisfying users' expectations will probably force you to produce code as clean as it can be, and that you won't need arbitrary rules to guide you.

Take for example the use of goto, and the particular case of a goto into the body of a loop: goto is generally considered evil, and a goto into a loop is probably the most heretical thing you can do. But there are some cases where this can be useful. The best-known example is Duff's Device, a smart form of loop unwinding; take a look at it, it's worth the read. I will use a simpler example, not focused on optimization, that I find more striking.

The idea is simple: you have a list or an array and want to print each element separated by a ';'. By separated we really mean that the separator is between elements and not after each one, so the last element is not followed by a ';'. There are several ways to do that; let's take a first example:

void printArray(int tab[], size_t count)
{
  size_t i;
  for (i=0; i < count; ++i)
    {
      if (i > 0) // not the first time
        printf(";");
      printf("%d",tab[i]);
    }
}
Ok, this is not the best version you can write, but it focuses on the fact that the first case is handled separately. We could also have done it like this:

void printArray(int tab[], size_t count)
{
  if (count)
    {
      size_t i;
      printf("%d",tab[0]);
      for (i=1; i < count; ++i)
        printf(";%d",tab[i]);
    }
}

In this version, the treatment of the first case is less obvious and we need to check the count. Now, take a look at this one:

void printArray(int tab[], size_t count)
{
  if (count)
    {
      size_t  i=0;
      goto start;
      for (; i < count; ++i)
        {
          printf(";");
        start:
          printf("%i",tab[i]);
        }
    }
}

We use a goto to skip the first printf in the loop. The result is better than the first version, and it makes the special treatment of the first case apparent. Of course, this is a toy example, and the goto version is not dramatically better in any way, but it illustrates the fact that dirty code can be useful and readable.
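For readers who have never seen the Duff's Device mentioned earlier, here is a minimal sketch of the idea: a switch jumps into the middle of an unrolled do-while so that the leftover elements are handled by the first pass. This is a variant that copies from array to array (the original wrote to a single output register); the name duff_copy and the guard for an empty copy are assumptions of this sketch.

```c
#include <stddef.h>

/* Copy count ints from `from` to `to`, with the loop unrolled 8 times. */
void duff_copy(int *to, const int *from, size_t count)
{
  size_t n;
  if (count == 0)
    return;               /* guard added here: the device needs count > 0 */
  n = (count + 7) / 8;    /* number of passes through the do-while */
  switch (count % 8)      /* jump into the loop body to copy the remainder */
    {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
               } while (--n > 0);
    }
}
```

The case labels play the same role as the start label in printArray: they are entry points inside the body of a loop.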

Keep it simple, stupid !
One of the classics ! Prefer the simplest path. When you can't figure out yourself how your code works, it means that it's too complex and probably won't work. Here are some other maxims and quotes on the subject:

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it ?"  (Brian Kernighan, "The Elements of Programming Style", 2nd edition, chapter 2)
Smarter code means smarter bugs ! (me ;)
"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies." (C.A.R. Hoare)
So, all those rules finally deal with the same issue: making your programs work ! That's the only thing that really matters in programming.

Sunday, June 19, 2011

Reading ...

In the previous entry, I quoted Rob Pike's Notes on Programming in C.

This led me to re-read it. It is a damn good paper, well worth reading: not too long and very interesting. I found that Rob Pike and I have much the same ideas on most of the matters discussed in his paper.

Maybe, I'm not so dumb ;)

So, if you haven't read it before, read it: Rob Pike: Notes on Programming in C

About comments

Okay, comments are one of the most obscure subjects in programming. Yes, obscure, I really mean it.

Why ? Probably because there is no good way to use comments in your code. As I remember, in my student days, teachers explained to us that comments are mandatory, that good source code is made up of more comment lines than code lines and so on, but none explained what to put in those damn comments.

Of course, there's the running joke on the "increments i" comment, but that's all.

As I'm now on the other side (yeah, I'm teaching programming), it was obvious that I should not make the same mistake. But how on earth should I explain to students how to comment their code, if I still don't know how myself ?

So, for some time I did as my teachers did, which means I avoided the subject !

But then I ran into a text from Rob Pike (about programming in general) where he makes a strange remark: "I tend to err on the side of eliminating comments, for several reasons." (Rob Pike: Notes on Programming in C) This sentence struck me at first, but then I realized that it was the right way to manage comments.

Most of the time comments are useless !

Why ? Just because the code is probably the best way to explain what you are trying to do !

So, should we always avoid comments ? Of course not, but we should use them wisely ! The first question is "why" (not when or how !):


  • You need comments to explain how to use your code
  • You need comments to describe where to find some piece of code
  • Sometimes, you need comments to explain a trick or an unusual hack
  • You may need comments to track down modifications, bug fixes and ugly workarounds
Other comments are just a waste of time.

So, the first and most important comments are what we should call "interface comments", that is comments in header files or before definitions. Those comments will explain what the function does, how to call it, and what constraints should be verified (pre-conditions, post-conditions and invariants.)
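As an illustration, here is a sketch of what such interface comments could look like for the add and insert functions of the linked list post. The wording of the comments, the assert on the pre-condition and the choice to abort on allocation failure are assumptions of this sketch, not a standard to follow:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct s_list *t_list;
struct s_list
{
  t_list next;
  int    data;
};

/*
 * add: put x in front of the list l and return the new head.
 * Assumption of this sketch: aborts if no memory is available.
 */
t_list add(int x, t_list l);

/*
 * insert: insert x into the sorted list *l.
 * Pre-conditions:  l != NULL and *l is sorted in increasing order
 *                  (an empty list, *l == NULL, is fine).
 * Post-condition:  *l is still sorted and holds one more cell, containing x.
 * Invariant:       existing cells are neither moved nor freed.
 */
void insert(int x, t_list *l);

t_list add(int x, t_list l)
{
  t_list t = malloc(sizeof (struct s_list));
  if (!t)
    abort();
  t->data = x;
  t->next = l;
  return t;
}

void insert(int x, t_list *l)
{
  assert(l != NULL);
  if (*l == NULL || x < (*l)->data)
    *l = add(x, *l);
  else
    insert(x, &((*l)->next));
}
```

Note that the bodies carry no comments at all: everything a caller needs is in front of the declarations.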

You can also put comments at the top of files to explain globally how things work. Take for example what you find in a lot of drivers' source files: a global explanation of how the device works and what its specific points are.

Then you can sometimes explain some tricks. But be careful, those comments are often less informative than they seem to be ! Explaining, in natural language, what a complex piece of code is doing is harder than writing the code itself. Most of the time, the reader will be able to understand what you're doing by reading the code; if not, he has nothing to do here ...

A good example of bad comments can be found in the source files of the GNU C library. Take a look at the definition of the strlen function. The function is quite simple; the glibc version is a little bit trickier than usual (it reads word by word rather than byte by byte) but not so complex. The trickiest part is the detection of the final NUL char inside a whole word. Ok, this can require some explanation, but the comments fail at this job twice: first, the text is less understandable than the code; second, the comments obfuscate the reading of the code by breaking it into parts separated by big pieces of text ! And the current version is more readable than the first one I saw.
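To give an idea of the kind of trick involved, here is a minimal sketch of a classic way to detect a zero byte inside a 32-bit word. It is not the glibc code itself, and the name has_zero_byte is made up for this illustration:

```c
#include <stdint.h>

/* Nonzero if at least one of the four bytes of w is zero. */
int has_zero_byte(uint32_t w)
{
  /* Subtracting 1 from each byte sets its high bit when the byte was zero
     (or already had its high bit set); "& ~w" throws away the second case. */
  return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}
```

A word-by-word strlen runs such a test on each word, and only looks at individual bytes once the test says that the end of the string is somewhere in the current word.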

So, most of the time don't even try to make a comment on your tricks, or put it outside of the code !

Commenting changes and corrections is the most difficult part. These comments are very useful, but they tend to grow bigger than they should. You have several ways to do it, but I found none very satisfactory. First, you can rely only on your version control system's log; in fact, you are probably already doing it. The only issue is that it is very hard to find out the history of a specific piece of code (not a whole file, but a single function or data structure.) You can also add it to the code itself, but again, if you put it at the top of the file, you lose the connection with the concerned code, and if you put it right next to the code, it will break the natural flow of your source code.

It would have been nice to have a clever integration of source files, version control and revision logs, but I haven't found such a tool.

So, as a conclusion, I will use a striking mantra (remember, you can't trust maxims !)
Comments are nothing but noise !

Sunday, June 12, 2011

"Don't trust maxims ... "

A lot of books, articles and teaching materials are full of "maxims" or "aphorisms". The intention is to strike the reader and convince him with a strong idea. Based on my own experience, I'll start this article with my own maxim:
"Don't trust Maxims !"
Ok, I ended up with the traditional "don't trust me, I'm a liar". But this contradiction serves my vision: Maxims are most of the time meaningless and of no use unless provided with the right explanations. In fact, my own maxim should be read:
"Don't trust Maxims, understand them !"
So, these are general thoughts; what is the relation with programming ?

After several years of teaching programming and computer science, I have often heard students quoting aphorisms religiously and, most of the time, completely irrelevantly ! Why ? Because they haven't read the original sources, nor have they tried to understand the true meaning behind them.

The most interesting example is the famous:
"Gotos are evil !"
This is one of the famous mantras of the structured programming approach. Are "gotos" really evil ? Probably not, there are a lot of legitimate uses of gotos (basic exception mechanism, loop escaping ... ) So, why should gotos be considered evil ?

Let's go back to the early ages of computing, when programming languages were no more than macro-assembly. At that time, the goto instruction was the only way to structure programs. Have you ever read code with no functions, no procedures, no while loops ? Such code tends to be as cryptic as a poem written by a drunk schizophrenic !

You don't even need to go back to the 60's. The first programming language that I learned was BASIC (on a ZX81 computer); the last time I read a bunch of code I wrote in those days, it made me feel sick ! It was full of indirections, stupid line numbering and intricate gotos; I was unable to understand it !

So, yes, used that way gotos are evil, but this does not mean that you should not use them. This is only a matter of code semantics; take a look at the following examples:

void user_accept()
{
  again:
  printf("confirm with : ");
  if (getchar() != 'y')
      goto again;
}

(Note: this code works badly due to the newline read with the input char ... )

This code makes sense: this is not really a loop, all you want is to read 'y'. So the goto is not evil in that case. Of course, you could have used a while loop to obtain the same result, but the meaning is the same. On the other hand, the following code is a bad use of goto:

int f(int **p)
{
  if (!(*p))
    goto getsome;
 doit:
  **p = 42;
  goto end;
 getsome:
  *p = malloc(sizeof (int));
  goto doit;
 end:
  return **p;
}

The previous code is just stupid (it is inherently stupid anyway) and its use of goto is just misleading. Of course, we shouldn't have written it this way, this is bad programming style, but it reflects the idea of the evil goto: using it in place of functions or procedures (here, we can even do without, but this is how you should read it: jumping to a specific piece of code in order to solve some issue before returning to normal behavior.)
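For comparison, here is a sketch of the same f where the "go get some memory, then come back" jump is replaced by a function call. The helper name getsome (borrowed from the label) and the choice to abort on allocation failure are assumptions of this sketch; the original code does not check malloc at all.

```c
#include <stdlib.h>

/* Hypothetical helper playing the role of the getsome label. */
static int *getsome(void)
{
  int *q = malloc(sizeof (int));
  if (!q)
    abort(); /* assumption: give up when out of memory */
  return q;
}

int f(int **p)
{
  if (!(*p))
    *p = getsome();
  **p = 42;
  return **p;
}
```

The control flow now reads top to bottom, and the detour is a plain call that returns where it came from.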

So, returning to our subject, we have illustrated a case where a maxim is right but needs further refinement. This leads us to my last maxim of today:
"Never restrain yourself for a bad reason, any programming feature can be useful."

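The two legitimate uses of goto named in this post (loop escaping and a basic exception mechanism) can be sketched as follows. Both functions, find and alloc_two, are hypothetical examples written for this illustration:

```c
#include <stddef.h>
#include <stdlib.h>

/* Loop escaping: look for v in a rows x cols matrix stored row by row.
   Returns 1 and fills *row and *col if found, 0 otherwise. */
int find(const int *m, size_t rows, size_t cols, int v,
         size_t *row, size_t *col)
{
  size_t i, j;
  for (i = 0; i < rows; ++i)
    for (j = 0; j < cols; ++j)
      if (m[i * cols + j] == v)
        goto found;       /* leaves both loops at once */
  return 0;
 found:
  *row = i;
  *col = j;
  return 1;
}

/* Basic exception mechanism: allocate two buffers of n ints.
   Returns 0 on success; on failure, frees what was obtained and returns -1. */
int alloc_two(size_t n, int **a, int **b)
{
  *a = NULL;
  *b = NULL;
  *a = malloc(n * sizeof (int));
  if (!*a)
    goto fail;
  *b = malloc(n * sizeof (int));
  if (!*b)
    goto fail;
  return 0;
 fail:
  free(*b);               /* free(NULL) is allowed and does nothing */
  free(*a);
  *a = NULL;
  *b = NULL;
  return -1;
}
```

In both cases the goto only jumps forward, to a single well-named place, which is a long way from the spaghetti of the BASIC days.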
Thursday, June 9, 2011

New blog !

After many years of programming and teaching programming, I've accumulated a lot of reflections that I want to share.

This won't be a technical blog, but a place for me to share my vision of programming. I'll try to stay funny, and I hope that my poor English won't depreciate my thoughts.
