Monday, August 13, 2012

ACID as a basic building block of eventually consistent, distributed transactions

Anything worth doing is worth doing well.  It therefore follows that anything worth tracking for a business is worth tracking well.  While this involves tradeoffs which are necessarily business decisions, consistency and accuracy of data are always important considerations.

In a previous post, I looked at the possible origin of double-entry accounting in stock and foil split tally sticks.  Now let's look at how financial systems of today might provide a different way of looking at distributed transactions in loosely coupled systems.  This approach recognizes that in human systems all knowledge is fundamentally local, and applies that insight to distributed computing environments, broadly defined.

Eventual Consistency as Financial Firewall

In the non-computer world there are very few examples of tightly coupled distributed transactions.  Dancing perhaps comes to mind, as does a string quartet playing a piece of music.  Most important work, however, is done using loosely coupled systems.  Loosely coupled systems done right are more robust and avoid some of the problems associated with the CAP theorem.  If the first violinist suffers a mishap partway through a quartet and is unable to continue, you will probably have to stop playing.  If the petty cash manager suddenly falls ill after you have withdrawn money from the petty cash drawer to go buy urgently needed office supplies, you can continue on your way.  You might have to wait until someone else takes over before you can hand back your receipt and change, however.

Such systems, however, have two things in common.  First, they are always locally consistent, and this is extremely important: the petty cash drawer and all cash vouchers are together in a consistent state.  Second, counterparties are also locally consistent, and transactions can be tied clearly back to those counterparties.  The party and counterparty together provide a basis for long-term conflict resolution in the form of an audit.  Eventual consistency is a property of the global system, not of any local component.

Moreover, there are times when global eventual consistency is actually desirable, but this business need is premised on absolute local consistency.  For example, if I am processing credit cards, the processing systems will be locally consistent and my accounting systems will be locally consistent, but these will not always be in sync.  My accounting department will have control over the data coming into the books.  Similarly, if my inventory is stored and shipped by a third party, they may send me a report every day, but it is not going to hit the books until it is reviewed by a person.  Both of these approaches use global eventual consistency as a firewall against bad financial data entering the books.  In other words, this is a control point where humans can review and correct problems.  Beyond the performance issues, this is a major reason why you will rarely see two-phase commit used to synchronize transactions between financial accounting systems.  Instead, data will be moved in ways which are not globally consistent and which only become so after human review and approval.  The computer, in essence, is treated like a person who can make mistakes, or worse.

If this is the case internally it is even more the case between businesses.  If I run a bank, I am not giving your bank direct access to my financial database, and I am not going to touch your database.  The need for eventual consistency in transferring money between our banks will have to take the form of messages exchanged, human review, and more.

Why SQL?  Why ACID?  Why not a NoSQL ERP?

I have often said that NoSQL is an extraordinarily poor choice for ERP software.  In addition to the difficulties in doing ad hoc reporting, you have a need for absolute local consistency, which is not a goal of NoSQL software.  Without local consistency, you can't audit your books, and you cannot determine where something went wrong.  ACID is not a global property of the business IT infrastructure, and it shouldn't be (if you try to make it one, you run into a dreaded brick wall called "the CAP theorem").  It is a property of local data stores, and all your internal controls are based on the assumption that data is locally consistent and knowledge is local.

Example 1: An Eventually Consistent Cash Register

The first example is an eventually consistent retail environment.  In this environment we aren't taking materials out of the store to ship, or if we do, we go pull them off the shelf before entering the transaction.  The inventory on the shelf is thus authoritative here.  The books are also reviewed every day and transactions reviewed and posted.

Since nobody can pull a product off the shelf that doesn't exist, we don't have to worry about real-time stock tracking.  If we did, though, there would be ways to handle this; see example 2 below.

In this example the cash register would locally run an ACID-compliant database engine and store the day's data locally.  At the end of the day it would export its transactions to the accounting system, where the batch would be reviewed by a person before posting to the books.  Both the cash register and the accounting system would retain records of the transfer and synchronization, making an audit possible.  Because of the need for disconnected operation, I am considering building such a cash register for LedgerSMB 1.5.
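
A minimal sketch of that pattern might look like the following, using Python's sqlite3 as a stand-in for whatever ACID-compliant engine the register runs.  The table layout and function names are purely illustrative, not LedgerSMB's actual schema:

```python
# Disconnected cash register sketch: each sale commits to a local ACID
# store; at end of day, unexported sales are pulled as one batch for a
# human to review before posting.  Illustrative schema, not LedgerSMB's.
import sqlite3

def open_register(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS sale (
        id INTEGER PRIMARY KEY,
        sku TEXT NOT NULL,
        qty INTEGER NOT NULL,
        amount_cents INTEGER NOT NULL,
        exported INTEGER NOT NULL DEFAULT 0)""")
    return conn

def ring_sale(conn, sku, qty, amount_cents):
    # Each sale is its own local transaction; the register stays
    # consistent on its own even while disconnected.
    with conn:
        conn.execute(
            "INSERT INTO sale (sku, qty, amount_cents) VALUES (?, ?, ?)",
            (sku, qty, amount_cents))

def export_batch(conn):
    # End of day: collect unexported sales as one batch and mark them
    # sent.  The batch posts to the books only after a person approves.
    with conn:
        rows = conn.execute(
            "SELECT id, sku, qty, amount_cents FROM sale WHERE exported = 0"
        ).fetchall()
        conn.execute("UPDATE sale SET exported = 1 WHERE exported = 0")
    return rows
```

Because both sides keep their records (the register keeps the exported rows, the accounting system keeps the received batch), the transfer itself remains auditable.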

Example 2:  Real-time inventory tracking for said cash register

So business needs change, and now the disconnected cash register needs to report inventory changes to the main ERP system on as close to a real-time basis as possible.  This is made easier if the cash register is running PostgreSQL.

So we add a trigger to the table that stores inventory movements, which queues them for processing.  A trigger on the queue table issues a NOTIFY to another program, which attempts to contact the ERP system.  It sends information on the inventory movement plus the invoice number, digitally signed with a key for the cash register.  The ERP stores this information and uses it for reporting and reconciliation at the end of the day, but treats it as a signed voucher checking out inventory.  At the end of the day these are reconciled and errors flagged.  If the message exchange fails, it is retried later.

Now, here you have three locally consistent systems: the POS, the ERP, and the messaging module.  Eventual consistency is a global property arising from their interaction, and it is preferable, business-wise, to absolute consistency because it gives an opportunity for human oversight.

These systems draw their inspiration from the paper accounting world.  They are partition-tolerant, available, and guaranteed, absent hardware destruction, to be eventually consistent.  Not only do they get around the problems flagged in the CAP theorem, but they also provide opportunities for humans to be in control of the business.

Unlike BASE systems, these systems, built on ACID, provide a basic framework where business process controls are consistently enforced.

Considerations

For such a system to work, a few very specific requirements must be met (ACID allows us to meet them, but we still have to design them into the system):
  • Messages must be durable on the receiving side.
  • Messages must be reproducible on the sending side.
  • Each side must be absolutely consistent at all times.
  • Sender and receiver do not require knowledge of the operations handled by the other side, just of the syntax of the messages.  The sender must have knowledge that the message was received, but need not have knowledge that it was durably stored (such knowledge can be helpful, but it is not required).
  • Humans, not machines, must supervise the overall operation of the system.
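
The durability and reproducibility requirements above can be sketched from the receiving side: store each message durably before acknowledging it, and detect replays by message id, so the sender may safely re-send its whole queue after any failure.  This is a minimal illustration with an assumed message layout:

```python
# Receiving-side sketch: messages are committed durably before being
# acknowledged, and duplicates (replays from the sender's queue) are
# detected by primary key, making receipt idempotent.
import sqlite3

def receiver(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS inbox (
        msg_id TEXT PRIMARY KEY,
        body TEXT NOT NULL,
        reviewed INTEGER NOT NULL DEFAULT 0)""")
    return conn

def receive(conn, msg_id, body):
    """Store the message durably; True if new, False if a replay."""
    try:
        with conn:  # commit before acknowledging to the sender
            conn.execute("INSERT INTO inbox (msg_id, body) VALUES (?, ?)",
                         (msg_id, body))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate: already durably stored, so ack again
```

Notice that neither side needs to know what the other does with the data; they agree only on the message syntax, which is exactly the fourth requirement above.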


Summary and thoughts on the CAP theorem

In the off-line world, the CAP theorem not only enforces real limits on coordinated activities, but it also provides solutions.  Individuals are entrusted with autonomy, but controls are generally put in place to ensure both local consistency and the ability to guarantee eventual consistency down the road.  Rather than building on the BASE approach, these controls build on the fundamental requirements of ACID compliance.  Paper forms are self-contained and are atomic for practical reasons.  They maintain consistent state.  The actors' snapshots of information are inherently limited regarding what they reveal about the state on the other side, providing something akin to isolation.  And finally, the whole point of a paper trail requires some degree of durability.  Thus the global system resembles BASE, but all components are in fact ACID-compliant.

This provides a different way to think of distributed computing: functional partitions are sometimes desirable and can often be made useful.  This allows one to move from seeing the CAP theorem as a hard limit to seeing it as a useful reminder of what is inherently true: loosely coupled systems create more robust global environments than tightly coupled ones, and people, rather than computers, are often necessary for conflict resolution and business process enforcement.  The basic promise is that global consistency will eventually be maintained, and that the system will continue to offer basic services despite partitions that form.  This is the sort of thing the BASE/NoSQL proponents are keen on offering, but their solutions make it difficult to enforce business requirements because individual applications may themselves be only eventually consistent.  Here each application is absolutely consistent, but the environment as a whole is eventually consistent.  This matches off-line realities more closely.

Human systems, perhaps because they are not scalable in the CAP sense (having limited communications bandwidth), have instead developed very sophisticated systems of eventual consistency.  These require absolute consistency at the node (person) level and then coordination and reconciliation between systems.  In my accounting work, the approach I have usually taken is to put the human in control but do whatever I can to simplify and streamline the human's workflow.  For example, when trying to reconcile a checking account it may be useful to get data from the bank.  This data is then matched to what's in the database using a best-guess approach (in some cases we can match on check number, but for wires and transfers we cannot, so we guess based on date and amount), and the human is left to resolve the inevitable differences as well as review and approve.
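
That best-guess matching step might look something like this sketch, with an assumed record layout (dicts with `check`, `date`, and `amount` keys); the point is that the computer does the grunt work and hands the human only the leftovers:

```python
# Reconciliation sketch: pair bank statement lines with book entries by
# check number when both sides have one, falling back to (date, amount).
# Whatever cannot be matched goes to a human for review and approval.
def match_statement(bank_lines, book_entries):
    unmatched_books = list(book_entries)
    matches, for_review = [], []
    for line in bank_lines:
        best = None
        for entry in unmatched_books:
            if line.get("check") and line["check"] == entry.get("check"):
                best = entry
                break  # a check-number match is definitive
            if (best is None and line["date"] == entry["date"]
                    and line["amount"] == entry["amount"]):
                best = entry  # tentative; keep looking for a check match
        if best is not None:
            matches.append((line, best))
            unmatched_books.remove(best)
        else:
            for_review.append(line)
    # The human resolves for_review and any leftover book entries, then
    # reviews and approves the reconciliation as a whole.
    return matches, for_review, unmatched_books
```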

The nice thing about this model (and it differs from BASE considerably) is that you can expect absolute consistency at all times on every node, and availability of the global environment does not depend on every individual component functioning.  In the cash register example, the cash register can go down without the ERP going down, and vice versa.  This means you can achieve basic availability and eventual consistency without sacrificing ACID at the component level.  The key, however, is to be able to go back to the components and generate a transaction history if needed, so that if something failed to come through you can re-run it.

Because this is based on the ACID rather than the BASE model, I would offer as a cute name: Locally Available and Consistent Transaction and Integrity Control ACID.  This can of course be referred to by the cute shorthand of LACTIC ACID, or better, "the LACTIC ACID model of eventual consistency."  It is a local knowledge model rather than the more typical global knowledge model, and it assumes that components, like people, have knowledge only of the things they need to know, that they are capable of functioning independently in disconnected mode, and that they are capable of generating consistent, accurate pictures of the business later, on demand.

This approach further relegates traditional techniques like two-phase commit to the role of tools, which can be very helpful in some environments (particularly replication for high availability) but are not needed to ensure consistency of the environment when the systems are loosely coupled.  They may still be helpful for handling some sorts of errors, but they are one tool among many.

Finally, although financial systems are not transportation vehicles, the automation paradox applies to them as well.  If humans rely on too much automation, they are inclined to trust it, and when problems arise they are ill-prepared to deal with them.  In this approach, humans are integral parts of the operation of the distributed computing environment and are therefore never out of the loop.  This may sound less than desirable, but in my experience every case of embezzlement I have heard of could have been prevented by more human eyes on the books and less, rather than more, immediate consistency.  There is nothing more consistent than having a single employee with full access to the money.  Separation of duties implies less consistency between human actors, but this is why it is useful.

This approach is intended to be evolutionary rather than revolutionary.  Rather than trying to create something new, it is an attempt to take existing mature processes and apply them in new ways.

3 comments:

  1. This article makes a great point about local ACID consistency with eventual consistency between locations. This model is very attractive if it can be achieved. Also, you are completely on the money as far as matching distributed systems to real life business processes. A lot of programmers miss that.

    That said, the real problem in most systems is creating workable models for eventual consistency. One of the reasons that NoSQL systems don't have transactions or referential integrity is that it's much easier to implement eventual consistency. For example you can patch up individual rows without worrying about violating integrity constraints. There is active research (example: CALM theorem) to try to move beyond the primitive NoSQL approaches, but it is still very speculative.

    Also, I think your point about human control has to be phrased carefully. Humans are quite poor at reconciliation and consistency checking compared to computers. In large systems these features must be built-in, though you may audit them for proper functioning from time to time.

    Replies
    1. First, I agree that you don't want humans doing most of the work regarding checking consistency and reconciliation. The best way to think of it is that the computer does the grunt work and the human supervises, reviews at least in summary (perhaps in detail if required) and approves.

      Now, if you have loose coupling, you can do some neat things with eventual consistency. For example, system A can import data into system B (which has RI constraints), and system B processes that data and enters it into system C (which also has RI constraints). The system A -> system B interface may be entirely unsupervised, and the system B -> system C interface may be completely supervised. Supervision here referring to humans being in the loop and performing batch approval.

      Within two updates you have consistency between system A and system C. However unlike the NoSQL model, systems A, B, and C are functionally different, so we are looking at syncing systems for different functions together without losing integrity of our data, and without keeping humans out of the loop at least to the point of having some frequent review of the data.

      I think the reason why NoSQL misses local consistency is that it is a scale out solution. You aren't talking about eventual consistency between functionally different systems. You are talking about eventual consistency between hot-swappable systems, and that is where I start to cringe. "You processed my credit card but I never received what I ordered" becomes a complaint that is very difficult to track.

  2. This is a very thoughtful article about the importance of ACID in business. The model defined is very attractive if made into practice. Auditing by humans can be faulty at times, as humans are prone to errors. So, this model can be made effective.
    sap solution manager
