Perspectives on LedgerSMB: Thoughts on VoltDB and performance vs flexibility

Saturday, May 12, 2012

Thoughts on VoltDB and performance vs flexibility

VoltDB is the latest project of Mike Stonebreaker (original founder of Postgres) aimed at very high performance OLTP. It is fairly clearly a niche product however, and not really usable in the ERP space for reasons I will discuss below.

Stonebreaker's own presentations (one example, at 11:25) suggest he is aware of this issue given his triangle framework.

In the vertices of the triangle he places NoSQL, high performance OLTP (like VoltDB) and column-oriented databases, while in the second, as lower performance generalists, what he calls the legacy databases or elephants are in the middle. The argument is that you can get a significant performance boost by specializing your systems beyond what you could when you have every system doing everything. This is a fair argument to a point, but the implementation details show that it is true only in some areas.

Stonebreaker's analysis of processing time (see above presentation, at 14:36) in traditional databases places virtually all processing time in four areas, namely buffer management, locking, thread locks/semaphores, and recovery tasks, and he suggests that in order to get high degrees of performance one must eliminate these tasks. This requires, however, rethinking how we address concurrency (the locking issues) and durability (recovery and disk-based storage). The VoltDB approach is to get rid of concurrency entirely and see durability as a network, rather than a system, property. Concurrency elimination accepted because the db is fast enough that queries can be run one at a time, and intraquery parallelism can be used instead of interquery parallelism. However, this imposes significant limitations on the database because it means that every transaction is limited to a single query. You can't do round-tripping in your transactions because this would impose locking requirements on the database.

This approach works very well for certain kinds of processing, such as consuming rapid data feeds and then feeding that information into a data warehouse at specified intervals., However one thing that is generally missing from the discussion is that the more complex the application, the more general the database solution needs to be. One solution might be to separate data entry from reporting and use multiple tools, but who wants to pull their trial balance from a different system than they enter invoices on? Since ERP systems usually tightly integrate decision support and OLTP, there isn't really room here to use specialized databases like VoltDB.

As the database market has expanded, it has created niches for products like VoltDB. These niches may indeed grow with time. However, I think it is incorrect to treat generalists as legacy, outdated approaches.

In essence VoltDB typifies what I call a specialism paradox, namely that to perform well in a specialist niche one must give up generalist solutions. It is true that sometimes things improve and supplant older systems but at least as often they only do when the systems are not vastly more complex than the ones they are replacing. For example, Stonebreaker's comparison of high performance OLTP to the shift from cable-operated digging excavation equipment to hydrolics misses the fact that hydrolic systems are not that much more complex than cables and such. There is some additional complexity, but it isn't a huge gap. A better comparison might be comparing semi trucks to race cars and locomotives. These are all highly engineered systems, but the basic design considerations are very different, and specialization comes at the cost of flexibility. You will never see race cars hauling twenty-foot containers of furniture cross-country, and you will never see travelling along our nation's highways to reach out-of-the-way towns. The semi however will never pull the loads that the locomotive can, and will never win the Indie 500...... The semi truck, however, can go fast enough, and pull the loads it needs to pull.....

6 comments:

AnonymousMay 12, 2012 at 9:06 PM
Hi Chris. Thanks for taking the time to write a bit about VoltDB.

As a VoltDB engineer, the first thing I want to make sure is clear is that VoltDB allows for multiple queries per transaction, as well as complex decision logic within a transaction. The limitation is that this bundle of logic and SQL must be executed with a single round-trip from the client. This is indeed a limitation, but not quite as severe as it's worded in your post.

Presumptively speaking for Dr. Stonebraker, I think he largely agrees with you. The point of that triangle diagram is to show that generalist, open-source RDBMSs like Postgres, are so good these days that it’s difficult to offer value as a commercial generalist. Meanwhile, specialized systems like VoltDB are so good at the corners of the triangle, then really all that’s left for legacy commercial RDBMSs are the middle of the edges.

So speaking personally again, if you can solve a problem with Postgres on a single-node, by all means do so. Postgres isn’t about to be replaced by VoltDB. VoltDB is, for the most part*, only interesting at scale, though what’s interesting to me is watching scale problems trickle down to more and more companies.

Though whether your scale pain is tremendous or mild, we hear you that flexibility is important. First we made the system fast and safe; much of the work we do today is to make this system easier to use and applicable to more use cases.

Specific feedback is always welcome, especially in the form, “If you did/supported X, it would be easier for me build an application”. Multiple-round trips per transaction is probably not going to happen anytime soon, but a lot of other ideas are on the table.

Thanks again.

==
*Some of our users aren’t in absolute scale pain, but have chosen VoltDB for other reasons. HA is a good example; VoltDB is probably the easiest system in the world to run in redundant HA mode. Getting HA working with traditional RDBMSs has gotten better, but it’s still not trivial.
ReplyDelete
Replies
Chris TraversMay 13, 2012 at 1:16 AM
One interesting thought occurred to me. It seems that there is often 1-2 orders of magnitude performance difference generally in using specialized tools vs using generalized tools. Trying to do row locking in application because locks must persist across stateless http requests is something that seems to cause an *instant* performance hit of approximately 20x (in one case I tested it, a query went from 2 sec to 45 sec). In essence that's going from a specialized tool (db locking mechanisms) to a general one (storing/checking data in the db relating to locking). With further efforts, maybe we could get it up to only on order of magnitude slower, but such is life. The real solution is to move to a thick client where this is an issue and use SELECT... FOR UPDATE......

This comes down to using the right tool for the job and suggesting that if the database market grows, there will be room for more tools.
ReplyDelete
Replies
UnknownMay 13, 2012 at 5:58 PM
Great post, really enjoyed reading!

I work for ScaleBase. I invite you to look at our solution to the well-known problem of RDBMS performance, latency and scalability.
I couldn't agree more with Stonebreaker's min 14:36, most of DB's processing power goes on "DB engine stuff" and the more data and more concurrent sessions - the more severe the problem gets.

We're going with a simple-to-perceive approach of just splitting data and also sessions across multiple DBs. The heaviness of the good old RDBMS is still there, however it's isolated and contained to x sessions and y GB of data. We never hear people complain about their databases when there are 100 users and 10GB of data... Data appears only once in the array, the database still locks, manages, do all the good things but there are 10 databases doing it in parallel in much smaller portions.

The DB is still the same DB, MySQL... And in the "generic" tests we seem to pass quite well as well as the scale and performance of course.

I would love to hear your thoughts about it.
ReplyDelete
Replies

Add comment