Why Long Queries Aren't a Problem Per Se
In a typical programming language, there isn't an enforced structure to any subroutines. What this means is as a subroutine becomes longer, maintaining internal patterns and an understanding of state becomes more difficult. SQL queries themselves, however, are defined by their structure. For example, a standard select statement (excluding common table expressions) includes the following parts in order:
- Returned column list
- Starting relation
- Join operations
- Filters
- Aggregation criteria
- Aggregation filters
- Postprocessing (Ordering, limits, and off-set)
Because of the structure, it is pretty easy to dive into even a very long query and figure out exactly where the problems are quickly. Maintainability is not dependent on length or overall complexity.
Moreover if you have common table expressions, it is easy to jump back to the beginning, where these are defined, and reference these as needed.
A second difference is that SQL statements work with state which is typically assumed to be unchanging for purposes of the operation. With rare exceptions, the order of execution doesn't matter (order of operations however does). Consequently you don't have to read line-by-line to track state.
In essence debugging an SQL statement is very much like searching a b-tree, while debugging Perl, Python, or C is very much like traversing a singly linked list.
What Can Be a Problem
Once I spent several days helping a customer troubleshoot a long, complex query. The problem turned out to be a bad filter parameter, but we couldn't tell that right away. The reason was that the structure of the query had decayed a bit and this made maintenance difficult.
The query was not only long, but it was also difficult to understand, because it didn't conform to the above structure. The problem in that case was the use of many inline views to create a cross-tab-type report.
If you can't understand a long query, or if you don't know immediately where to look, it is hard to troubleshoot these.
Best Practices
The following recommendations are about keeping long queries well-structured. Certain features of the SQL language are generally to be avoided more as queries become longer, and other features should be used carefully.
- Avoid inline views. Ok, sometimes I use inline views too on longer queries, but usually these are well-tested units in their own right, and re-used elsewhere. These should probably be moved into defined views, or CTE's if applicable.
- Avoid union and union all in long queries. These complicate query maintenance in a number of ways. It's better to move these to small testable units, like a defined view based on a shorter query or a common table expression.
- In long stored procedures, keep your queries front and center, and move as much logic as is reasonable into them.
- Avoid implicit joins, which work by doing cross-joins and then placing the filter condition in the result. Join logic should be separate from filter logic.
In my experience the most important thing in any programming language is to comment your code / query. The longer the sub routine or query, the more important it is to include detailed comments both from a structural and explanatory point of view.
ReplyDeleteCommenting is tricky to do right. The reason is that you don't want the comments to be the basis of debugging. I will probably have a post at some point on commenting style.
DeleteBy structural comments do you mean something like section headers (i.e. comments to help you find portions of logic quickly)? Or something else?
By explanatory point of view, do you mean explaining design decisions? Or how things operate?
Agreed on the importance of commenting, however.
Delete