Sunday, March 20, 2016

When PostgreSQL Doesn't Scale Well Enough

The largest database I have ever worked on will eventually, it looks like, be moved off PostgreSQL.  The reason is that PostgreSQL doesn't scale well enough.  I am writing here however because the limitations are so extreme that it ought to give plenty of ammunition for those who think databases don't scale.

The current database size is 10TB and doubling every year.  The main portions of the application have no natural partition criteria.  The largest table currently is 5TB and the fastest growing portion of the application.

10TB is quite manageable.  20TB will still be manageable.  By 40TB we will need a bigger server.  But in 5 years we will be at 320 TB and so the future does not look very good for staying with PostgreSQL.

I looked at Postgres-XL and that would be useful if we had good partitioning criteria but that is not the case here.

But how many cases are there like this?  Not too many.

EDIT:  It seems I was misunderstood.  This is not complaining that PostgreSQL doesn't scale well  It is about a case that is outside of all reasonable limits.

Part of the reason for writing this is that I hear people complain that the RDBMS model breaks down at 1TB which is hogwash.  We are facing problems as we look towards 100TB.  Additionally I think that PostgreSQL would handle 100TB fine in many other cases, but not in ours.  PostgreSQL at 10, 20 or 50TB is quite usable even in cases where big tables have no adequate partitioning limit (needed to avoid running out of page counters), and at 100TB in most other cases I would expect it to be a great database system.  But the sorts of problems we will hit by 100TB will be compounded by the exponential growth of the data (figure within 8 years we expect to be at 1.3PB).  So the only solution really is to move to a big data platform.


  1. Could you give a bit more insight in why your main parts is not applicable to relevant partitioning but still seems to grow exponentially?

    1. the problem is that the largest table stores protein alignment report information from an external program. Join conditions between this table and others readily span most other criteria. So there isn't a major ability to partition this table very well, at least not in a way that would support something like Postgres-XL.

      Again, the problem isn't that PostgreSQL doesn't scale well. I figure we have a few years before we hit the limit but I don't see the limits (which we'd hit before the database reaches the 100TB range). It is that in this case we are headed towards limits at an ever-accelerating rate.

    2. Could you elaborate why protein alignment cannot be partitioned? - the fact that you must join with other criteria does not exclude partitioning - in XL you have more sharding than partitioning anyways - distribution key is hashed and depending on 'other criteria' joins would be pushed down (fast) or not (slow) but it always boils down to selectivity.
      We are aiming at nearly a petabyte for astronomical survey with XL (hopefully), but our sharding scheme works well thanks to ease of 2D projection using special spatial indexing techniques that are part of Pks the same time.

  2. You make a generalized statement that "PostgreSQL does not scale well", and "The main portions of the application have no natural partition criteria", yet you do not provide any specifics or details. To be fair, I believe you should state what the actual application is and also details about the large table(s).

    1. Sorry, you misunderstood me.

      PostgreSQL scales *very well* just not well enough for this rare circumstance.

  3. I'm just curious what you are switching to that scales better for your needs?

    My thought is also that given that PostgreSQL improves each year so much, if your outgrowth outlook is in 3 years. Seems a little premature for you to switch. By that time, PostgreSQL might scale enough for this. PostgreSQL will have some degree of parallelization in 9.6 for example.

    1. > I'm just curious what you are switching to that scales better for your needs?

      Probably Hadoop or similar.

      It is also almost certain that the client will continue to use PostgreSQL for smaller databases (smaller being in the hundreds of GB).

      > By that time, PostgreSQL might scale enough for this.

      A year ago, that was the hope. However, in 3 years, the problem is running out of addressable pages in a table. Fixing that would break pg_upgrade I think, at the least.

  4. Having large data sets that don't partition naturally is not uncommon. One of my former clients has such a data set, although it's not growing to anything like this extent.

  5. If you have a data model that does not allow partitioning I believe you will get into issues regardless of which database you would select.

    You mention that you have a lot of complex joins toward this and other relations. Having this kind of huge Single un-partitioned table in the center of your application must severely hurt your performance. How large are your indexes on this table, they must also be huge?

    I would (as Regina asked before) be very interesting to understand which DB solution your are looking at that effectively can handle that kind of design and target data volumes?

    In our application (using a data model adapted for partitioning) we currently handle petabyte scale of real-time telemetry data using partitioning and sharding on top of PostgreSQL and it works like a charm.

    Remember that PostgreSQL currently have a maximum table size of 32TB so you probably have to find an alternative sooner than you think if you can't partition your data.

    1. That table is a strange one. We don't have more than one index on it, but we have arrays of tuples there which are joined against other tables in a variety of ways. It is that way because that provides the best performance for the most latency-sensitive workflows.

      The much more interesting index is the 700GB GIN index on tsvectors on another table.....

      > Remember that PostgreSQL currently have a maximum table size of 32TB so you probably have to find an alternative sooner than you think if you can't partition your data.

      Yeah, I think the hardest limit is that one.

    2. If it's just a single table you can probably make all others replicated in XL and derive some statistics to get you a driving single column or synthetic composite column for distribution in XL to have equiwidth sharding. We also store a lot of data in arrays but do not reference them for joins. Out of curiosity, how long are your vectors?

    3. The first vector to be filled in... I don't remember how long the longest one is (and don't feel like crawling through 5TB of data to find out). Quick guess is up to 100, but maybe more.

      The second vector is a pared down version of the first, limited to 20 (filtered results).

      Replicating this table in Postgres-XL would be a bad idea. It would mean getting none of the benefits that Postgres-XL offers regarding write scaling or table size scaling.

    4. I meant deriving a distribution column from your vector element (or hashing number of them, all of them, if you do not know/trust the skewness of what could be a good sharding parameter from your data) and using it as a distribution column for distributed table.
      I am having problem to understand why you cannot find a single parameter that would be good for distribution or simply synthesize it. Even a database sequence would do in the worst case.

      Replication should be considered for anything smallish in comparison, say up to 10G-100G-1TB - all depends on your queries, data model.

  6. Hi again Chris.

    Are you using this GIN index to perform full text searches?.

    I recently stumble across a really cool postgres index extension that provides the very efficient Elasticsearch directly in PostgreSQL by using ES as index (one got to love PostgreSQL:s pluggable architecture). It is very possible that this is something you could benefit greatly from.

    1. We use Elasticsearch for log mining, we do not do FTS for the scientific part but the extension does look interesting. Thanks for sharing.