LedgerSMB, as an ERP and accounting system, has over time begun to move towards an intelligent database design based on PostgreSQL. We believe that this best matches our long-term goals for the project, especially because interoperability is a very important requirement for any ERP system. Read more to see our general case in this area and see what other projects may be able to learn from our experience.
A Short Developer's Definition of Intelligent Databases
An intelligent database is a database encapsulated behind a defined API.
A Longer Description
While most programmers primarily use the database primarily as a place to simply store data, this reduces a database to basically an ACID-compliant data store with some reporting capability. This approach saves the programmer from worrying about advanced database functionality, but it also prevents the database system from being used to do a number of useful tasks that databases can really be leveraged to do.
This post will explore why LedgerSMB is going this direction, and the sorts of functionality (particularly regarding PostgreSQL) that is a part of this plan, what sorts of tradeoffs we see in it, and where we expect to benefit. This is not an article that discusses the details of implementation (see previous posts for that), but rather at a high-level how and why.
The fundamental idea of encapsulating a database inside a defined API is that usually the application should hit the API and not the underlying relations where the data is stored if at all possible., The API acts as an interface between actual storage/retrieval and the application.
The major aspects of an API can include views and/or stored procedures. We use stored procedures for most things but may include views for some sorts of objects as time goes on.
What do Intelligent Databases Cost?
Many current development environments seek to save the programmer from spending heavy engineering time on a database, particularly in the early stages of a project where requirements can change. The database is relegated to a simple data store with ACID compliance. Frameworks like Rails will even generate a database schema from programming data structures.
If one goes with an intelligent design for the database, one cannot use agile programming methodologies to design the database. Instead the database requires some real engineering in its own right. This takes some time and effort in its own right and can slow down development compared to ORM-based code generators such as we see in Rails.
Such engineering requires taking a clear look at what data is expected to be collected, what internal structure it has, designing the database storage first and foremost around that analysis,and then building an API based on how it will be used. For simple schema development this doesn't take a long time but the complexity does grow with the data collected.
In the LedgerSMB project this is not the end of the world since we are re-engineering the database incrementally and so each major branch includes at least a few areas where old db structures have been pulled out and new, better engineered ones added. These areas then have an API and the application generally hits the API not the relations. Old code areas may hit the API or the relations.
Other projects may find that this adds significant development time beyond what is reasonable, but these must still be weighed against the benefits below.
What benefits do Intelligent Databases Provide?
Intelligent databases offer a number of specific benefits that LedgerSMB tries to capitalize on. The fact that these designs better match database theory is not enough. tangible benefits were required for us to reach this decision.
Some of these benefits speed up time troubleshooting and debugging code, partly offsetting the additional investment mentioned above. Others provide compelling features to third party developers who may want to integrate our software with theirs.
The single most important benefit is that of clarity. First, with very few exceptions, LedgerSMB tries to keep to one language per file. This means that database queries are in .sql files, and Perl is in .pl and .pm files. The modules which do query mapping are the only Perl modules that contain both Perl and SQL. In this way if I want to know what a query does, I can go to the relevant SQL file and read it. Using features like named inputs in PL/PGSQL, even 100 line queries can be remarkably readable. The API is also well defined, allowing us to troubleshoot issues with the query writer.
Clarity also means that performance problems can more easily be reviewed and addressed without getting into messy questions of ORMs and how they generate queries. The queries that produce a specified result set from an API call can be defined and the actual function written out in SQL.
Along with clarity, we get the ability to do as much as possible with declarative constraints. This provides, we believe, greater robustness in the application and therefore better software in the long run.
The second important benefit is in the area of integration. Because the API is defined, there are specified integration points that other applications can use to integrate without worrying as much about ensuring that corner cases are properly covered. This makes integration more robust. Moreover important areas like LISTEN and NOTIFY allow other applications to integrate in ways approximating message queues, and the like.
In fact, the integration benefit was the driving force behind the decision to move to an intelligent database design. We wanted to open up the application to add-ons written in other languages (Python at the time, but others could supported without too much effort).
Addressing Separation of Concerns
We have all heard horror stories of what happens when you put everything in the database. We have also heard of developers afraid to put anything in the database because of fear of going down the roads of these other horror stories. The secret to success here is in fact to separate concerns and to separate them well. Many lines are ultimately arbitrary in detail but bright in outlines and so they require a team dedicated to making decisions carefully rather than drawing bold lines and mindlessly following them.
Of course that's a long way of saying you can't get away with sloppy engineering and have to actually put some thought into things.
There are other options for application development. However, they all presume that the application has intimate knowledge of the internals of the database. I am somewhat puzzled why programmers who are so careful to avoid modules understanding the internals of eachother are so careless about encapsulation in the database.
The main other options are use of an ORM, which typically involves designing the database around the ORM and the data model for the application, and thus the RDBMS becomes basically an extension of the object model. This limits interoperability, and reduces the role that the database has relative to interoperability with other applications. This being said there isn't necessarily a problem with using VIEWs and ORMs together to provide this sort of abstraction.
A second possibility is to use a dedicated store, like many NoSQL engines offer where the database really is a store for objects, and where the "schema" (in scare quotes because NoSQL usually doesn't really have a schema concept) is just a direct extension of the object model itself.
The tradeoff with NoSQL is similar to using an ORM and building your database around that ORM, but more extreme. Typically you see even more rapid development early on, combined possibly with additional scalability in some cases, but with the tradeoff of interoperability is more severe, and ad hoc reporting is generally not practical outside the relational model.
Because of our desire to focus on interoperability and a reliance on good, flexible reporting, these options were not options for our project. They might be for others though.
LedgerSMB has benefited greatly benefited from the decision to move towards an intelligent database layout. Reasons include our focus on interoperability, and our willingness to do engineering up front. Other projects may find their mileage varying.