Perspectives on LedgerSMB: Patterns and Anti-Patterns in Code Comments

Code commenting is something which takes a lot of introspection and practice to get right. As a heads-up when I am looking at hiring someone, I will be asking about commenting style and if you can't talk intelligently about the topic, you won't get hired. I wouldn't want you to just agree with me. I would expect you to demonstrate you have thought about the issue enough to have your own opinions and reasons for thinking the way you do.

We started LedgerSMB as a fork of SQL-Ledger and one of the immediate challenges we had was that the only comments in SQL-Ledger were brief descriptions of what files did, copyright notices, and strings for translation or other meta-information actively used by the software. Debugging uncommented code is not fun, and it took us a long time to get an acceptable number of comments and/or POD in place. Over time, working in a collaborative environment, I have come to have very strong opinions on code commenting.

In this post I will pull examples from PostgreSQL's source files (using 9.1.2 because I have the source handy for that) as well as from LedgerSMB.

Comments are extremely important as they represent an important collaborative tool, but like all powerful tools, it can be misused.

Commenting Pitfalls and Anti-patterns

In general, when debugging an application we have to read the code, not the comments. Sparse, helpful comments are very helpful and will be read. Copious comments that appear to describe the inner workings of the code get ignored by necessity. Being ignored, they are not updated with the code, and consequently become out of date. Three or four people work on a code section and eventually the comments and the code don't match. This is a real problem and it is one reason why over-commenting needs to be avoided.

A second problem with this approach is that people put faith in comments, rather than code, to explain what the program is doing to other developers. This can lead to unclear code. In general if you have to describe how it works, and you can't find a better way to write it, at least do everyone a favor by flagging the section of code as FIXME so that the comment and the code will be read together and the explanation deleted as clarity emerges.

A third problem is the anti-pattern of magic comments which I have seen crop up in more than one place. Comments are designed to be ignored by the program as a way to annotate code. When you start turning them into code, individuals may not be able to maintain the files properly and may delete a magic comment by accident. Similarly using database-level COMMENT ON statements to add info for programs throws off documentation tools and the like. Don't do it. Keep code and comments separate. If you really must use magic comments, at least do us a favor and note clearly that they are important and must not be deleted.

Finally if there are too many comments in code, the comments will not get read and so helpful comments will get missed.

Alternatives to Comments (Use Them!)

Comments are not the only way to make code readable, and often they are not even the best. It's important also to pay attention to things like whitespace, use blank lines to separate blocks that logically belong together, and otherwise organize the code well. Additionally enforcing design patterns which include things like declaring all variable for a function or code block at the earliest possible opportunity are helpful.

Similarly care in variable names goes a long way to rendering code readable. In an SQL query for example, table aliases should be abbreviations of the table name, not arbitrary letters.

Finally to the extent you can structure your code you can make it easier to read. Design patterns and coding patterns help organize code and make it easier to read. To the extent that the code is easy to read, comments are less necessary. Again, comments shouldn't ever be necessary to explain how code works, but sometimes they are. In general it's good to keep these minimal in frequency and flagged with a note that one recognizes that this section of the code is unclear.

The above three boil down to the mentality of writing code with two distinct intended audiences: the computer and other coders.

The Documentation Comment Pattern

Functions should be documented. This helps programmers understand what a function is supposed to do but it has a number of more important functions as well. By documenting a function call, you establish an API. The code is expected to conform to the API, not the other way around. These code comments are important but the code implements what is described, not the other way around. It's hard to think of these as comments because they do not annotate the code per se, but rather describe the interfaces for interacting with it.

In most languages, we place the documentation at the front of the file. In PostgreSQL with stored procedures, however, we have to put the comments after if we want to use COMMENT ON and therefore allow documentation tools to work properly. We use POD for Perl, and COMMENT ON for SQL, and so these are distinguished from the other types of comments listed below in accordance with the literate programming pattern.

These comments may be extensive or not but should be seen as specifications rather than comments.

The Section Heading Comment Pattern

In a function or other code block of significant length, debugging requires being able to find a relevant section of code quickly. Comments can help organize code and make it easy to find a given section.

For example, in the PostgreSQL code, it is common to see a comment stating:

/* local functions */

This is very helpful. If you know the header is there and you want a list of local functions that are declared for function re-use purposes, you find that definition quickly. if you need to add a function name, it is easy to add it where it will be noticed not only by the compiler but also by fellow programmers. The whole goal of these comments is just to help enforce organization of code and help the next programmer find the appropriate section of code quickly.

In general, my experience is that, like section headers in a blog post, these should be short, concise, and descriptive.

The Footnote Comment Pattern

A footnote comment is one which annotates the code, by providing descriptions of why a given approach was taken, references to other works or algorithms used, or the like. In general, my experience is that these comments should be short but provide information that cannot be deduced from the code. For example one might state why a given approach was taken or allude to an external document or standard for more information.

I prefer to inline my footnote comments because then it is clear exactly what the comment refers to, but other people may disagree. For example the following comment from analyzejoins.c is a good footnote comment and not inlined (it explains the use of a goto statement):

/*

 * Restart the scan.  This is necessary to ensure we find all

* removable joins independently of ordering of the join_info_list
* (note that removal of attr_needed bits may make a join appear
* removable that did not before). Also, since we just deleted the
* current list cell, we'd have to have some kluge to continue the
* list scan anyway.
*/

The comment clearly and concisely describes why a given statement is there. It explains what it does to but only as necessary to get into why it is necessary.

The Warning Comment Pattern

Sometimes comments can and should be used to warn programmers about a certain segment of code. For example in LedgerSMB we have the following comment near the top of the admin.sql which defines functions for user management. Because user management functions are not parameterized we have to build them through string interpolation and run them with elevated privileges, which leads to dangers of in-stored-proc SQL injection as a database superuser.

-- README: This module is unlike most others in that it requires most functions
-- to run as superuser. For this reason it is CRITICAL that the following
-- practices are adhered to:
-- 1: When using EXECUTE, all user-supplied information MUST be passed through
-- quote_literal.
-- 2: This file MUST be frequently audited to ensure the above rule is followed
--
-- -CT

Similarly inside the admin__save_user function we include the comment:

-- WARNING TO PROGRAMMERS: This function runs as the definer and runs
-- utility statements via EXECUTE.
-- PLEASE BE VERY CAREFUL ABOUT SQL-INJECTION INSIDE THIS FUNCTION.

Of course this is only required because the utility statements in question (like CREATE ROLE) are not parameterized and so parameterized queries are not an option.

The function of these comments is to call attention to pitfalls specific to certain blocks of code. In this case we are warning about security implications. In others we are warning about potential other problems.

TODO, XXX, and FIXME comments are all very helpful and part of this pattern. Typically I find it useful to include sections that need to be in caps in caps where a warning is critical, but otherwise leave it lower case.

Signed Comments

One thing we have found helpful in a collaborative environment is to sign comments. This is not something we have seen people do elsewhere but we have found we get a lot of benefit from it. In a typical woven code approach (below), the idea is that you get an autonomous discourse which is the code, documentation,and comments all together, but as things change, the discourse becomes less autonomous, as it should. People may see a TODO or a FIXME and have different ideas as to what should be done. Therefore signing a comment fills a number of rolls.

First it provides a clear record as to who one should talk to about a possible code change, and helps give people credit for their work. However beyond this it sets up individuals as people who are publicly thinking about problems in the code and this provides some additional conversation points. Finally a signed comment comes across as less authoritative than an unsigned one and so it allows for dialog in the comments between individuals. So consider this:

# we need to add checks here for .... --CT

# As we move this to Moose, the problem may go away. -- YT

The above dialog of course is fictional but there are examples in the LedgerSMB code where you can see this sort of dialog develop. It becomes helpful to a programmer because they can see what different programmers are thinking about an issue and what sorts of solutions may be available. Also each programmer can speak for him/herself and doesn't have to amend the master, autonomous discourse in order to do so.

The Woven Code Pattern

Borrowed in part from Donald Knuth's concept of literate programming, this pattern involves writing the documentation and code as a single work intended to be read by other people. Typically when I employ this I write my Perl POD first, then annotate the POD with code, and then add non-POD comments as needed to draw other programmers' attention to some things. In this pattern four things are the case:

Documentation is Authoritative and Done First
Code Implements and Explains Documentation
Footnote/Warning Comments Annotate Code, but only as needed.
The above three components are woven together to create a text for the next programmer. All components of that text evolve together.

Conclusions

Comments are essential to long-term maintainability and collaboration in software development. In general, I have found that they are more helpful when used to annotate, rather than describe, the source code. The source code should be able to speak for itself but is necessarily an incomplete glimpse into the mind of the programmers involved. Comments bridge this gap, but poorly used comments get in the way more than they help.

Any others?

What patterns do you like to push for when commenting in an environment where others will read your code and comments?

3 comments:

Chris TraversAugust 3, 2012 at 11:18 PM
Just as a note, not every comment should be signed. If you are just adding a footnote to the code, perhaps as a link to a specification, there is no value to signing it. If however you are expressing an opinion or criticism of the code, having signed comments is very helpful because it gives folks a person to go to and talk about differences of opinion.
UnknownAugust 6, 2012 at 9:39 AM
Thanks for such a wonderful post.
Andrew StevensonAugust 7, 2012 at 3:58 AM
There's several more worth checking out if you search for "mobile patterns" "android patterns" etc.

PHP Development

Thursday, August 2, 2012

Patterns and Anti-Patterns in Code Comments

3 comments: