Saturday, March 23, 2013

Beginners' corner: factorization !

This a sample extracted from an exercise given to beginners students in C: a simple graph implementation. We're looking at function that add an edge between two vertices.

Context

The types are following traditional graph implementations, and tends to respect formalism used in algorithmic classes (so don't bother me about typedefed pointers !)

typedef struct s_adj   *adj;

struct s_adj {
  size_t        src, dst;
  adj           next;
};

typedef struct s_vertex *vertex;

struct s_vertex {
  size_t        ins, outs;
  adj           succ, pred;
};

struct s_graph {
  size_t        order;
  vertex        vertices;
};

Our representation stores directed edges in both ends: in the source's successors and in the destination's predecessors. Our function will take the graph and the identifiers of source and destination (index in the vertices tables).

Naive code


Straightforward implementation look like that:

void add_link(graph g, int src, int dst) {
  adj           cell_succ;
  cell_succ = malloc(sizeof (struct s_adj));
  cell_succ->src = src;
  cell_succ->dst = dst;
  cell_succ->next = g->vertices[src].succ;
  g->vertices[src].succ = cell_succ;
  g->vertices[src].ins += 1;
  adj           cell_pred;
  cell_pred = malloc(sizeof (struct s_adj));
  cell_pred->src = src;
  cell_pred->dst = dst;
  cell_pred->next = g->vertices[dst].pred;
  g->vertices[dst].pred = cell_pred;
  g->vertices[dst].outs += 1;
}

We build two cells for the lists of successors and predecessors of both ends. This code works, but that's all, it is far too long and looks dirty. And, even if it's a short piece of code, it is unmaintainable: if we change simple details (like the fact that the edge is described in the same direction in both ends) we have at least two modifications per change.

First, add variables for our vertices. Our type defines vertices as pointer to structure, and that's correct for me, I'll use pointer rather than array cells, so the new version looks like:

void add_link(graph g, int src, int dst) {
  vertex        s = g.vertices + src;
  vertices      d = g.vertices + dst;
  adj           cell_succ;
  cell_succ = malloc(sizeof (struct s_adj));
  cell_succ->src = src;
  cell_succ->dst = dst;
  cell_succ->next = s->succ;
  s->succ = cell_succ;
  s->ins += 1;
  adj           cell_pred;
  cell_pred = malloc(sizeof (struct s_adj));
  cell_pred->src = src;
  cell_pred->dst = dst;
  cell_pred->next = s->pred;
  s->pred = cell_pred;
  s->outs += 1;
}

It's a small change, but it gains a little more on readability. Oh, just a note: I don't like long lines, and long variable's (or function's) names, even they seems more explanatory, a variable with more than 5 characters makes your code unreadable. Reserve long name for major functions, class or types name, not for local variables nor parameter's names. This is a common error: thinking that saying more make the code more readable, but that's wrong, if you can't figure out what this variable is used for, it means that the code your reading is already fucked-up, you don't need an explicit name, nor a comment.

And about the vertex declaration: I prefer the use of pointer's arithmetic here for readability. It may seems strange to not use array notation, but, what do you think is better: &(g->vertices[src]) or g->vertices + src ? For me the choice is obvious …

Now: factorization !

The biggest issue in this function is the two almost identical blocks of code. A good idea should be to deffer cell initialization to an external function. This function is doing half of the link job, which give us a good name for that new function:

void add_half_link(int src, int dst, adj cell) {
  cell->src  = src;
  cell->dst  = dst;
}

But can we attach this cell to the cell directly to their corresponding list ? Yes, just pass a pointer to the list (I really mean a pointer to the list, thus a pointer to the pointer to cell in the list, the list is already a pointer, not a structure.

void add_half_link(int src, int dst, adj cell, adj *l) {
  cell->src  = src;
  cell->dst  = dst;
  cell->next = *l;
  *l = cell;
}

Our new function to the job for half of the link and the way we passed the list of edeges, let us use it in both cases (successors and predecessors.)

Putting everything together …

Now we just have to integrate our sub-function in the original one:

static inline
void add_half_link(int src, int dst, adj cell, adj *l) {
  cell->src  = src;
  cell->dst  = dst;
  cell->next = *l;
  *l = cell;
}

void add_link(graph g, int src, int dst) {
  vertex        s = g->vertices + src;
  vertex        d = g->vertices + dst;
  add_half_link(src, dst, malloc(sizeof (struct s_adj)), &(s->succ));
  add_half_link(src, dst, malloc(sizeof (struct s_adj)), &(d->pred));
  d->ins  += 1;
  s->outs += 1;
}

Note that I have declared our sub-function as static inline, it is obviously static since we don't want it to be exported, and asking for inline seems logic. We could have find a way to increment degrees (ins and outs) in the sub-function also …

Finally we got a code, that is (not so) smaller, but, and that's the most important, is more readable and maintainable.

Of course, you should have written the second version directly …

This is a simple example showing the idea that you can increase your code quality by using factorization.

Oh, and for the record, the functions only need a specification comment for the main one. Avoid as much as possible inner comments, they don't increase readability nor understanding of code, and most of the time they break the reading flow. That kind of functions are self-explanatory, and so must be most of your code, if you feel the need to explain it, think of reorganizing it or some other kind of refactor, only very specific tricks need comments, if you too many, you're probably doing things the wrong way.

There's another points about code comments: they tend to desynchronize. Your code will change, you'll probably find bugs … are you sure that all your comments are in sync with these changes ? No comment, no problem …

That's it for today ! Code well !