It is no surprise that Tony continues to take issue with the idea of putting business logic in the database and in fact expands that to include things like foreign keys and check constraints. However many of the points are actually interesting and worth discussing. My own sense is that we can relegate Tony's viewpoint largely to cases where software as a service. As a bonus, I will respond also to his points in his article Stored Procedures are Evil
Marston's New Points in His Reply
That has not been my experience. I have spent all of my career in writing applications which have sole access to their databases. Having multiple applications trying to perform the same function on the same database is something which I have never encountered.
Marston is right that this is usually not on the requirement sheet anywhere, though sometimes it is. Rather additional applications are usually added after the fact, for example to move data from one application to another, to generate reports, or even to do other things.
One of my customers uses LedgerSMB in a fairly automated way, and has all financial information imported through a staging schema from production applications, as well as a second application which creates and manages vendor profiles and payment information (and which eventually imports the data into LedgerSMB). They are an exception though. Most customers who end up with multi-app databases don't set out to build them. Instead they have additional requirements added after the fact where new applications are clearly the right approaches. By removing the logic which prevents invalid data from being stored in the database, Marston's approach slams shut the door on such after-market development.
I disagree completely. It is implementing business logic in the database which is more time consuming, difficult and expensive. And please don't muddy the waters with NoSQL databases as they are a recent phenomena which have yet to reach full maturity and live up to their hype.
The discussion about NoSQL holds true for other schemaless designs. If all you are using the RDBMS for is a very light-weight data store with some backup capabilities, why not use a schemaless database?
NoSQL engines in fact bring few things to the table compared to prerelational engines which are relevant to this discussion. So in this regard, they are not significantly different from any other key/value store, for example Berkley Database, which I think we can all agree has reached full maturity.
The tradeoff is quite simple really. If all you are doing is app development with no reporting, it's far faster development-wise to just throw your data into key/value stores (whether pre-relational or post-relational) than it is to deal with the overhead of relational design. On the other hand, you lose the ability to do data constraints, ensure all stored data is meaningful, and any reasonable ad-hoc reporting. On the whole, the losses for a business application usually outweigh the gains for reasons based on the very features Tony Marston doesn't appear to want to use (foreign keys, check constraints, views, reporting.....) and therefore the relational model wins out more times than not.
Now, my guess is that since Marston repeatedly says he doesn't do reporting but has an ERP, he is specifically talking about ad hoc reporting. Any ERP worth its salt relies heavily on reporting generally. So here we are beginning to see the narrow use case where Marston (and admittedly some big ERP's too) fall into, though I think a lot of the reason for this is licensing revenue.
Most large ERP's that I am aware of do in fact require that all access to the db goes through a middleware layer. I think there are two fundamental reasons for this. The first is that most of these depend on large, expensive, proprietary databases, and since most businesses are more interested in consolidating servers, it is hard to insist that customers, for example, buy SQL Server instead of Oracle, or DB2 instead of SQL Server. This means that there is a requirement to use the lowest common denominator in the RDBMS set.
The second reason is that the ERP can only enforce client access license limits on the middle tier--- it cannot do this effectively at the database level. Therefore in order to enforce client access license payments, this is usually enforced here, and no writes can be allowed beyond the ERP application itself. The goal of course is to build an ERP framework that is at the center of the business application environment and therefore the ERP vendor is most heavily in control of the license fees that they will collect.
That neither of these apply to LedgerSMB is why we can afford to move away from this model and aggressively re-use what frameworks are provided by the database and other components.
You misunderstand me. When I said "dumb" data store I meant the basic properties of any DBMS (whether relational or not) which is the ability to specify tables, columns within tables with data types and sizes, primary keys, unique keys and indexes. More advanced features, such as foreign key constraints, column constraints, triggers, stored procedures and user-defined functions were later additions. It is the use of these "advanced" features which I regard as optional and therefore I have the right to exercise that option and not use them.
Um...... Where to begin......
First, tables and columns are what define the relational model. Non-relational databases use other ways of representing data. Primary keys are also specific to a relational model, as I would think would be unique keys. That means of the features listed for all DBMS's ("whether relational or not") only the most general (indexes) are applicable outside the relational world, and that term is in fact so vague that it isn't clear to me that it is in fact even meaningful outside of a more specific context.
Berkeley Database, for example, is a basic key/value store. You can store whatever you like as a value, and it is referenced by it's key. You could store JSON, XML, YAML... whatever. No tables, columns, unique constraints, etc. You can just serialize your objects and put them there.
The idea that foreign key constraints are an advanced feature that should not be used would be laughable were it not so dangerous. I have seen first hand what the refusal to declare foreign keys can do in the context of an ERP application. I suppose it is great if you want to make money off of untangling people's data. It is very bad though if you want people to trust the application to give the right results from an accounting perspective. I remember a panicked call from a customer one April a few years before we forked LedgerSMB, "It's getting close to tax time and our books don't balance. We need this fixed right now. Can you help?" How long had the books been out of balance? About 8 months. Ouch, that's gotta hurt......
One of the big developments in LedgerSMB 1.2 has been the addition of foreign key constraints, and these were further improved upon in 1.3. Why? Because we take data integrity seriously......
I am most definitely *NOT* advocating a return to the COBOL/VSAM/IMS paradigm. I am simply questioning the argument that because you *CAN* implement business rules in the database then you *MUST*. I don't have to, and I choose not to.
The question of course is one of tradeoffs. How important is your data? How sure are you that folks will not want to create other import/export routines hitting the database? How confident are you in your code?
The more you can leverage declarative constraints at all levels of your application, the more sure you can be, mathematically, that your data is meaningful. It really is that simple.
Marston's Points (And Responses to them) in His Comments On Business Logic in the Database
In the blog entry mentioned above, "Business Logic in the Database" I think effectively narrow the use case where Tony is talking about to those where either revenue from middleware licenses is an issue or where something equivalent is going on:
Luck has nothing to do with it. Nobody is allowed direct access to my databases without my permission, and I never grant that permission. Access is either through my application or through my web services.Now, thinking for a moment here about what "my databases" must mean here, this can only mean software as a service. Here the customer does not control their own data, which is a compelling reason to switch to LedgerSMB ;-). In particular this sort of control ends for all intents and purposes the moment the application is deployed at a customer's site. Therefore this application either requires license agreements, or it requires control over the database servers.
Either way, it has no place in open source.
To this, a commenter named "Thomas" replied:
While I realize you think your systems are confined to only being accessed by your applications, in the 25 years I’ve worked with databases, I’ve never encountered of a situation where such a restriction could be enforced indefinitely.That matches my experience too. In fact I would say that I have never found an application deployed in production in a standard RDBMS at a customer's site where the customer found this restriction to be valuable. If it can be enforced indefinitely it is only through legal agreements requiring it, and those are burdens....
The fundamental problem, as I have repeatedly noted in this discussion is that kinds of logic that must be included in the database goes way up as soon as you want to support even the possibility of that second application. Yes, that means at a minimum, check constraints, referential integrity constraints, domains, and the like. Depending on your application, you might certainly have to go to stored procedures just to get an assurance of data integrity. Certainly that is the case when trying to enforce a rule like "all GL transactions must be balanced."
A Diversion into History: Goths, Marston, and Anti-Intellectualism
Robert Young commented:
Considering that you’re championing the failed COBOL/VSAM/IMS paradigm that Dr. Codd put a stake in, why aren’t you the one making such comments? The Goths put Europe into the Dark Ages by enforcing the pre-intelligence paradigm. Those, such as yourself, seek to return to the thrilling days of yesteryear, where each application was its own silo, and data sat in dumb files accessible only through bespoke code. Kind of the Dark Ages. Have a look at WebSocket, and see where the rest of us are going. No more disconnected client edit screens; just call the validation from the database from any client application. Complete flexibility, on both the server and client side. Separation of responsibility. And all that nice rhetoric that OO folks espouse, but seldom perform. You’re not “reinventing the wheel”, just rediscovering the square one that your grandfather used. That’s not progress.I recognize it is unusual to see a history buff of my sort in application development but.....
Robert Young is wrong here on most counts, but not quite all. Most particularly, he is wrong about the Goths. The two most important books to read on the Goths are probably Peter Heather's book in the Peoples of Europe series and Herwig Wolfram's "The Roman Empire and Its Germanic Peoples." Both these books provide a view of the Goths which is well outside the standard dark ages narrative that Peter Young describes. In fact both authors tend to stress the continuity of Roman customs, land titles, and administration when the Goths ruled Italy and Spain. Sure there was some (necessary) simplification of the laws and so forth, but the decay of Italy continued, as Wolfram notes, because the concentration of wealth was so high that it prevented effective taxation and thus Ostrogothic Italy eventually fell to the Byzantines. Wolfram also credits the fall of the Western Empire with the same forces he claims doomed the Goths.
In Arms and Armor of the Medieval Knight, Miles and Paddock suggest that the Migration Ages were times when the Germanic peoples spread important metallurgical technologies across Europe, including the all-important technique of pattern-welding different forms of steel into weapons of far greater quality than the Romans managed. (The next major advancement in metallurgy would have to wait until the conversion of Scandinavia allowed Christians to import technologies for manufacture of homogeneous steel swords and other tools from Northern Europe.)
In fact, it's pretty clear due to the fact that the Goths arose from within the Roman empire and that at least one Gothic rebellion sought to obtain a Roman generalship for their leader, that the Goths were far less anti-Roman than the Romans were anti-Goth..... A lot of this had to do with the Goths early conversion to a form of Christianity known as Arianism, after its founder, Arius of Alexandria. This lead to several centuries of religious feuding between Nicene and Arian Christianity which is quite clear from every author of anywhere near that time.
And so we have a partial rebuttal of the master narrative that the great, progressive, scientific Roman civilization was destroyed by the backwards, barbaric, superstitious Goths.
But narratives matter including in this discussion. Just as the history is wrong, I think Young's central point is wrong too. I have come to the conclusion that conservatism in design choices is a good thing, and that a post-modern approach means valuing even approaches that seem at the moment to be outmoded. The real question is not which method is best, but rather in which use cases one method or another wins out.
In other words, to me there is no progress of methodologies, innovation is a dirty word, and we should be sceptical of new, overly hyped technologies, looking at them through a lens of what has been learned in the past. From this viewpoint, I have to say that I think that accusing people of getting in the way of progress is actually giving them a compliment, for in the words of Henry Spenser, "Those who do not learn from UNIX are destined to reinvent it badly."
Where Mr. Young is right is in the separation of responsibility point. This point cannot be overemphasized. The idea that a database takes on responsibility for storing meaningful data is something which no database which is both non-relational and non-dedicated can manage. The only use cases where one can avoid this are things like LDAP (which is a horror for reasons I won't go into here). And therefore relational databases usually win out, again, for reasons pertaining to the features Marston doesn't use.
Marston's Points in "Stored Procedures Are Evil"
In this article, Marston oversimplifies the positions of where/when to use stored procedures to two very extreme positions, entirely erasing any middle:
Use stored procedures and triggers only when it is an absolutely necessity.
Use stored procedures and triggers at every possible opportunity simply because you can.
He argues that his knowledge is better because:
You only know what you have been taught, whereas I know what I have learned.Well, I used to agree with Marston. I have since come to learn that he is wrong. Perhaps it is this change of opinion which qualifies me to look at his arguments and address them, acknowledging where he has a going.
Before I start, though, I will say that people who use stored procedures at every possible opportunity (and I have met a few) do cause problems and that such is not a viable position. The obvious middle ground is:
Use stored procedures wherever they make sense given the functions of the databaseBut that may be too nuanced for Marston..... Presumably we should look at his arguments against stored procedures and it will make more sense.
First though let's look at what he says about arguments in favor of stored procedures:
This is a common argument that many people echo without realising that it became defunct when role-based security was made available. A good DBA defines user-roles in the database, and users are added to those roles and rights are defined per role, not per user. This way, it is easy to control which users can insert / update and which users can for example select or delete or have access to views in an easy way.
There are certain cases where role-based security doesn't quite get you there. Now these may not impact all applications but they do affect a significant set of them. The big one is something like "users of role x must be given the permission to mark transactions as approved, but cannot mark approved transactions as unapproved." Because the security setting depends on the value of the input, you cannot enforce the security using database role-based security. In these cases, the only secure way to grant access is through a stored procedure.
He also says:
With a view it is possible to control which data is accessed on a column basis or row basis. This means that if you want user U to select only 2 or so columns from a table, you can give that user access to a view, not the underlying table. The same goes for rows in one or more tables. Create a view which shows those rows, filtering out others. Give access rights to the view, not the table, obviously using user-roles. This way you can limit access to sensitive data without having to compromise your programming model because you have to move to stored procedures.The problem here is that with update permissions, views suffer from all the portability problems of stored procedures. So this doesn't necessarily provide a significant win.
It is also said that stored procedures are more secure because they prevent SQL injection attacks. This argument is false for the simple reason that it is possible to have a stored procedure which concatenates strings together and therefore open itself up to sql injection attacks (generally seen in systems which use procedures and have to offer some sort of general search routine), while the use of parameterized queries removes this vulnerability as no value can end up as being part of the actually query string.First, it's important to note that stored procedures get called through a SQL query like anything else, and therefore the underlying SQL query is still vulnerable to SQL injection attacks and these need to be addressed. So parameterized queries still have to be used. Similarly, Marston has a point about dynamic SQL in stored procedures (a problem which is actually more severe in some cases than he gives credit for).
This being said, it is simpler to understand where the potential issues are in stored procedures, and it is simpler to audit them for problems than it is to deal with either dynamic SQL or generated SQL based on the sorts of mappers that Marston suggests.
As for performance, Marston states:
The execution of SQL statements in stored procedures may have been faster than with dynamic SQL in the early days of database systems, but that advantage has all but disappeared in the current versions. In some cases a stored procedure may even be slower than dynamic SQL, so this argument is as dead as a Dodo.
My general experience is that the more dynamic the SQL is, the harder it is to tune because the more abstraction layers one has to go through in order to find the problems. ORM's are usually horrible in this regard because they typically do a very large number of operations when a smaller number of operations could succeed and this is usually not fixable. Similarly dynamic SQL, where the whole query is created at run-time imposes additional overhead when trying to locate and correct performance issues.
I have seen horrible performance from stored procedures, but I have seen how much easier they are to tune (when written in a maintainable way).
One thing to keep in mind though is that stored procedures are not necessarily more or less efficient. They do take skill and care to write so that they perform well under demanding cases. There are some additional gotchas to be aware of especially when folks who are more app authors than SQL authors start trying to write them. However on the whole, I think this problem is manageable and the performance gains worth it.
He goes on to say:
Performance should not be the first question. My belief is that most of the time you should focus on writing maintainable code. Then use a profiler to identify hot spots and then replace only those hot spots with faster but less clear code. The main reason to do this is because in most systems only a very small proportion of the code is actually performance critical, and it's much easier to improve the performance of well factored maintainable code.
I completely agree with this btw. Clear code is less costly to maintain than is unclear code. And I have read horribly coded stored procedures just as I have read beautifully coded ones. Focus on clear, maintainable code. Keep things simple.
BTW this brings me to one clear advantage of stored procedures: you can keep all your SQL code in different files from your higher tier programming logic. This can be used to really help with maintainability. Additionally I recommend the following for people writing stored procedures: "Use stored procedures, as much as possible, as named queries. Keep them, as much as you can, to single SQL statements with named parameters."
His arguments against stored procedures become more interesting:
Instead of having a structure which separates concerns in a tried and trusted way - GUI, business logic and storage - you now have logic intermingling with storage, and logic on multiple tiers within the architecture. This causes potential headaches down the road if that logic has to change.
This mangling goes both ways. A clear stored procedure architecture, using stored procedures where they make sense, can have the effect of untangling your database queries from your application code, and thus making a multi-tiered architecture clearer and cleaner. Of course, this isn't always the case, so common sense needs to be applied. Again, use stored procedures where they make sense and simplify things, but don't use them just because you can, and don't avoid them just because you can.
He then goes on to talk about the maintenance issues of stored procedures:
The reason for this is that stored procedures form an API by themselves. Changing an API is not that good, it will break a lot of code in some situations. Adding new functionality or new procedures is the "best" way to extend an existing API. A set of stored procedures is no different. This means that when a table changes, or behaviour of a stored procedure changes and it requires a new parameter, a new stored procedure has to be added. This might sound like a minor problem but it isn't, especially when your system is already large and has run for some time. Every system developed runs the risk of becoming a legacy system that has to be maintained for several years. This takes a lot of time, because the communication between the developer(s) who maintain/write the stored procedures and the developer(s) who write the DAL/BL code has to be intense: a new stored procedure will be saved fine, however it will not be called correctly until the DAL code is altered. When you have Dynamic SQL in your BL at your hands, it's not a problem. You change the code there, create a different filter, whatever you like and whatever fits the functionality to implement.
I don't understand this one at all. If the stored procedures are an API and extending them breaks things, why aren't the tables an API and extending them breaks things? Of course, Marston talks about his own approach here which does dynamic discovery. But in that case..... LedgerSMB does dynamic discovery on stored procedures. So it isn't clear to me that this is something that is specific to one side or the other.
On to the next one:
Business logic in stored procedures is more work to test than the corresponding logic in the application. Referential integrity will often force you to setup a lot of other data just to be able to insert the data you need for a test (unless you're working in a legacy database without any foreign key constraints). Stored procedures are inherently procedural in nature, and hence harder to create isolated tests and prone to code duplication. Another consideration, and this matters a great deal in a sizable application, is that any automated test that hits the database is slower than a test that runs inside of the application. Slow tests lead to longer feedback cycles.I haven't found that to be true at all. In fact coming up with good unit tests for database-driven applications usually means some sort of sample data set anyway. A very clear win additionally is the ability to run tests on production systems in transactions that are guaranteed to roll back. Testing this on a live system via application logic is a lot more dangerous, and more likely to insert test data into the database by mistake.
If all the business logic is held in the database instead of the application then the database becomes the bottleneck. Once the load starts increasing the performance starts dropping. With business logic in the application it is easy to scale up simply by adding another processor or two, but that option is not readily available if all that logic is held in the database.This is not necessarily the case. If you put everything in stored procedures that the database would already be doing, there is no reason to think the db becomes more of a bottleneck than it would have been before.
This is a big issue if you want an application where the customer can insert their own business logic, or where different logic is required by different customers. Achieving this with application code is a piece of cake, but with database logic it is a can of worms.He speaks authoritatively here of something he professes at the beginning of the article not to really know well. The fact here is that adding custom logic is *different* in a stored-procedure-centered application than it is in a business application. There are ways of doing this right and managing the problem, just as there are ways of doing it on the application side incorrectly and making a mess.
The next point made me laugh:
A big problem with database triggers is that the application does not know that they exist, therefore does not know whether they have run or not. This became a serious issue in one application (not written by me) which I was maintaining. A new DBA who was not aware of the existence of all these triggers did something which deactivated every trigger on the main database. The triggers were still there, they had not been deleted, but they had been turned off so did not fire and do what they were supposed to do. This mistake took several hours to spot and several days to fix.What competent DBA disables triggers without figuring out what they do first? However, for all of that he does have a point. Triggers can be used incorrectly to ensure that an application operation has a desired side effect. This is not a very maintainable way to go. It is better generally to use triggers to hook into a host application as a means of either enforcing RI, or allowing a third party application to force side effects that it needs. The main application should not depend on the use of triggers to run effectively. Only other applications should depend on triggers they provide for information as to what is going on in the host application, or as additional safety measures aimed at protecting data integrity.
As for his points about version control, all I can say is "test cases, test cases, test cases, all in transactions that roll back." You might not know which version your stored procedures are, but you WILL know that they are behaving the way you want them to.
Finally to the point of vendor lock-in.
Here Marston has a clear point. If you build a stored-procedure-centric application it will be difficult (though not necessarily impossible depending on various factors) to migrate to another database. Of course, migrating applications between databases is not necessarily a walk in the park either (Oracle empty string handling comes to mind).
In general, if you are writing on an Oracle db, it is feasible to migrate stored procedures perhaps to Fyracle and PostgreSQL, but less feasible to migrate to MS SQL. If you are writing Java stored procedures in Oracle, then you can migrate these to PostgreSQL without too much difficulty. If you are writing PL/Perl stored procedures in PostgreSQL, you will probably never be able to migrate them anywhere. On the other hand if you are writing an application to ship to customers and it must run on Oracle, DB2, MS SQL, and PostgreSQL, then stored procedures are out. In general, this comes back to "know your technology and choose it appropriately."
In conclusion, there are certainly cases where you don't want to use stored procedures, for example, because you don't want to be tied to a single RDBMS. However this is no reason to have a blanket avoidance strategy any more than it is a good idea to put everything possible inside the db.
Marston concludes by saying that stored procedures are optional and therefore his view is not clearly wrong, but calling something optional and calling something evil are very different. I think Marston is clearly wrong with the "evil" categorization, but generally right with the optional one. For his specific use case, stored procedures may not work very well, and hence might be avoided. However, generalizing that out is dangerous.