Sunday, August 18, 2013

In defence of hand coded SQL

One common comment I get when I point out I hand-write all my SQL queries rather than relying on an ORM or the like is that this is drudge work,  obsoleted by modern tools, and when I mention these are usually wrapped in stored procedures, the reactions go from disdainful to horrified.  This piece is the other side, why I do this and why I find it works.  I am not saying these approaches are free of costs, but software engineering is about tradeoffs.   These tradeoffs are real.  My approach is not a magic bullet, but it forms a vital piece of how I build software on the database. 

The first thing to note is that I use a lot of SELECT * FROM table queries when querying tables that match output structure.  We all know we run into tables that cannot be reasonably further normalized and where the application structure can feed directly into the application.  In a stored procedure wrapper, SELECT * reduces maintenance points of such tables if new fields need to be added (in which case the query still matches the specified return type with no modifications).  This has costs in that it discourages refactoring of tables down the road but this just needs to be checked.  One can still have central management by using views if needed.  Central management of type definitions is generally a good thing.  Views can take the place of an ORM....

The second point is that CRUD queries of this sort don't really take significant time to write, even on a well-normalized database and having these encapsulated behind a reasonably well-designed procedural interface is not a bad thing provided that some of the classical difficulties of stored procedures are addressed.

I find that my overall development time is not slowed down by hand-writing SQL.  This remains true even as the software matures.   The time-savings of automatic query tools is traded for the fact that one doesn't get to spend time thinking about how best to utilize queries in the application.  The fact is that as application developers, we tend to do a lot in application code that could be better done as part of a query.  Sitting down and thinking about how the queries fit into the application is one of the single most productive exercises one can do.

The reason is that a lot of data can be processed and filtered in the queries themselves.  This allows one to request that the database send back data in the way the application can make best use of it.  This can eliminate a lot of application-level code and lead to a shrinking codebase.  This in turn allows application-level code to make better use of data returned from queries, which leads to better productivity all around.

17 comments:

  1. AMEN!

    SQL is a declarative (4G) language and like you say, requires a lot of thinking *up front*.
    In my experience most developers are either lazy and don't want to think that much and/or want to dive right into coding. Other developers can't or won't think that way and prefer to fall-back to their 3G skills.
    All of this -- combined with the fact that many people still don't understand the power of an RDBMS -- means that us SQL developers are rather looked down upon, even regarded as old-fashioned in the face of the latest fads.

    The truth is that an app with well-designed views and SQL can be built with a much higher number of "function points per line of code", higher maintainability and fewer defects.

    ReplyDelete
    Replies
    1. I have usually found that if I code this way, a large portion of actual lines of code (> 50%) ends up as SQL. It's really nice actually.

      Delete
  2. I seem to remember having this conversation with a boss of mine. We concluded that there was probably a balance to be struck: allow a tool to auto generate the initial queries, but never rerun the tool, hand code everything thereafter. This gets all of your CRUD in place, and then allows you to optimize those queries as you come by later.

    Doing it this way means you only ever use the tool once, meaning that cost savings are only realized at the very onset of the project.

    "This allows one to request that the database send back data in the way the application can make best use of it."

    That *is* the point. I totally agree. I've seen too many data processing routines that spend hours processing only to find that if someone had considered writing some SQL they would see they could have filtered at the database level, and skipped transferring massive quantities of data.

    Sometimes I feel like the industry has moved backwards with people loading entire files from disk and doing the filtering in their app.

    ReplyDelete
    Replies
    1. Most of what I do is in-db, so ideally you'd manage these things through stored procedures and there isn't much room for an automated tool. The procedures are designed to be usable across different languages, and the software contracts are set to be relatively loosely tied. I would be willing to consider something like what you propose in the case of calling SQL from the application code though.

      "That *is* the point. I totally agree. I've seen too many data processing routines that spend hours processing only to find that if someone had considered writing some SQL they would see they could have filtered at the database level, and skipped transferring massive quantities of data."

      Isn't it annoying when you replace a nested for loop with a HAVING clause, esp, when this eliminates millions of records from being processed?

      Delete
  3. Hi
    developpez.com needs your permission to the French translation of your articles on "Building a solid database". (phrase obtenue par google traduction!!!)
    Thank you

    ReplyDelete
    Replies
    1. I have just sent an email verifying my permission.

      Delete
  4. This comment has been removed by the author.

    ReplyDelete
  5. I think I just want something that allows me to easily map the results returned into objects. I'm otherwise perfectly content with hand writing the query itself.

    I will admit however that I do not like looking at Stringy SQL amongst 3GL code, and would like to have something that moves SQL into it's own place.

    When working on the database I want to think in the DB's terms (usually relational, SQL ) when thinking in my 3GL I usually want to think in terms of Objects. I don't need the mapper to make the queries, just turn the results into objects, and hide the queries from where I'm thinking about object logic.

    ReplyDelete
    Replies
    1. "I will admit however that I do not like looking at Stringy SQL amongst 3GL code, and would like to have something that moves SQL into it's own place."

      Agreed, btw. I use stored procedures much as "named queries" and this reduces the stringy SQL to a couple of files which generate the calls. I much prefer, to the extent possible, to follow a "one language per file" rule. We currently use the approach I outlined in the first past on this blog but this has some issues regarding function overloading in the db.

      "When working on the database I want to think in the DB's terms (usually relational, SQL ) when thinking in my 3GL I usually want to think in terms of Objects. I don't need the mapper to make the queries, just turn the results into objects, and hide the queries from where I'm thinking about object logic."

      Agreed here too. So the questions boil down to what the options are to do this. We do things via stored procedures with modules that provide a dynamic call interface. There are no doubt other ways to do it.

      Delete
  6. Despite my being a devout Atheist, let me add my "Amen" to the above.

    Generated code is one of those things that works 90% of the time (probably), but it's really the other 10% that interesting.

    For automatic features, like change logs, what we really need is an CREATE TABLE trigger to add those.

    And, most genned SQL is butt ugly.

    ReplyDelete
  7. This comment has been removed by a blog administrator.

    ReplyDelete
  8. There is a lot of truth in the above, but there are also some merits to ORMs that cannot be denied. As with many things, the discussion boils down to

    1. "It depends"
    2. "I might personally prefer this over that"

    SQL is a very underestimated technology. When I give talks about jOOQ, I'm constantly surprised about how many people are unaware of clauses like:

    - HAVING
    - CONNECT BY (Oracle)
    - GROUPING SETS
    - Window functions

    ... just to name a few. The problem probably isn't the fact that people are lazy or ignorant, or even sickened by SQL. The problem probably is the fact that a lot of middleware software vendors have generated high expectations towards ORM over the last 10 years, which ORM simply never can fulfil. And, there has been no one to help other "3G" developers with alternative, adequate tooling. Consequently, JPA 2.1 still continues specifying tweaks to patch lazy / eager fetching, which don't exactly simplify / beautify things.

    I'd like to hear your opinion about www.hibernate-alternative.com, btw.

    ReplyDelete
    Replies
    1. Jooq looks quite interesting. It isn't my style largely because I prefer to encase my databases behind a discoverable API (and hand code SQL there), but I can very much see the use of it.

      Interestingly I am in the process of writing stored proc service locators in Java (a language I don't know very well yet) and Jooq looks very helpful in the few areas where we have to put queries in the Java code.

      Delete
    2. I see, yes, I have appreciated a well-designed stored procedure API in the past. This can make for a very clean interface between applications and database.

      Unfortunately, calling stored procedures from Java (JDBC or JPA 2.1) is really a pain. With jOOQ and jOOQ's source code generator, things should be a lot easier.

      Delete
    3. This comment has been removed by the author.

      Delete
    4. Just as a note, I plan to use jOOQ in my service locator. Thanks for the reference.

      Delete