Perspectives on LedgerSMB: When PostgreSQL Doesn't Scale Well Enough

Sunday, March 20, 2016


The largest database I have ever worked on will, it looks like, eventually be moved off PostgreSQL. The reason is that PostgreSQL doesn't scale well enough. I am writing about it, however, because the limits involved are so extreme that this case ought to give plenty of ammunition against those who think databases don't scale.

The current database size is 10TB and doubling every year. The main portions of the application have no natural partitioning criteria. The largest table is currently 5TB and is the fastest-growing part of the application.

10TB is quite manageable. 20TB will still be manageable. By 40TB we will need a bigger server. But in 5 years we will be at 320TB, so the future of staying on PostgreSQL does not look very good.
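The growth figures above are plain compound doubling. A minimal Python sketch of that arithmetic, assuming a perfectly constant doubling rate from the 10TB starting point (a simplification; real growth will not be this regular). The helper names are hypothetical, for illustration only.

```python
# Sketch only: compound-growth projection using the post's figures
# (10TB today, doubling every year). Real growth will not be this regular.

START_TB = 10       # current database size stated in the post
GROWTH_FACTOR = 2   # "doubling every year"


def projected_size_tb(years, start_tb=START_TB, factor=GROWTH_FACTOR):
    """Projected size in TB after `years` years of compound growth."""
    return start_tb * factor ** years


def years_until(threshold_tb, start_tb=START_TB, factor=GROWTH_FACTOR):
    """First whole year in which the projected size reaches threshold_tb."""
    years = 0
    while projected_size_tb(years, start_tb, factor) < threshold_tb:
        years += 1
    return years


if __name__ == "__main__":
    for year in range(9):
        print(f"year {year}: {projected_size_tb(year):,} TB")
    print("reaches 100TB in year", years_until(100))
```

With these inputs, year 5 comes out at 320TB and year 7 at 1,280TB (roughly 1.3PB), and 100TB is first reached in year 4.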

I looked at Postgres-XL, which would be useful if we had good partitioning criteria, but that is not the case here.
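Systems like Postgres-XL can spread a table's rows across nodes by a distribution key. The toy Python sketch below is not Postgres-XL's actual mechanism; the node count, key values, and function names are hypothetical. It only illustrates why good partitioning criteria matter: a query that filters on the key touches one node, while a query that cannot use the key must visit every node.

```python
import zlib
from typing import List, Optional

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the node that stores a row by hashing its distribution key.

    zlib.crc32 is used because Python's built-in hash() of a str is
    randomized per process, which would break stable routing."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def nodes_to_query(key: Optional[str], num_nodes: int = NUM_NODES) -> List[int]:
    """Nodes a query must visit: one if it filters on the key, otherwise all."""
    if key is not None:
        return [node_for(key, num_nodes)]
    return list(range(num_nodes))


if __name__ == "__main__":
    for k in ["customer-1", "customer-2", "customer-3"]:
        print(k, "-> node", node_for(k))
    print(nodes_to_query("customer-2"))  # exactly one node
    print(nodes_to_query(None))          # every node: [0, 1, 2, 3]
```

When most queries and joins have no such key to filter on, nearly every query falls into the second case, and adding nodes buys far less.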

But how many cases are there like this? Not too many.

EDIT: It seems I was misunderstood. This is not a complaint that PostgreSQL doesn't scale well. It is about a case that is outside all reasonable limits.

Part of the reason for writing this is that I hear people complain that the RDBMS model breaks down at 1TB, which is hogwash. We are facing problems as we look towards 100TB. Additionally, I think that PostgreSQL would handle 100TB fine in many other cases, just not in ours. PostgreSQL at 10, 20, or 50TB is quite usable even when big tables have no adequate partitioning criteria (partitioning is needed to avoid running out of page counters), and at 100TB I would expect it to be a great database system in most other cases. But the problems we will hit by 100TB will be compounded by the exponential growth of the data (we figure we will be at 1.3PB within 8 years). So the only real solution is to move to a big data platform.

16 comments:

Anders Svensson, March 20, 2016 at 4:55 AM
Could you give a bit more insight into why the main parts of your application are not suited to partitioning, yet still seem to grow exponentially?
Unknown, March 20, 2016 at 8:17 AM
You make a generalized statement that "PostgreSQL does not scale well" and that "The main portions of the application have no natural partition criteria", yet you do not provide any specifics or details. To be fair, I believe you should state what the actual application is and give details about the large table(s).
Anonymous, March 20, 2016 at 10:42 AM
I'm just curious: what are you switching to that scales better for your needs?

My thought is also that, given how much PostgreSQL improves each year, if you only expect to outgrow it in 3 years, it seems a little premature to switch. By then, PostgreSQL might scale well enough for this. PostgreSQL 9.6, for example, will have some degree of parallelization.
Andrew Dunstan, March 20, 2016 at 11:55 AM
Having large data sets that don't partition naturally is not uncommon. One of my former clients has such a data set, although it's not growing to anything like this extent.
Anders Svensson, March 20, 2016 at 11:59 AM
If you have a data model that does not allow partitioning I believe you will get into issues regardless of which database you would select.

You mention that you have a lot of complex joins against this and other relations. Having this kind of huge, single, unpartitioned table at the center of your application must severely hurt your performance. How large are your indexes on this table? They must also be huge.

It would be very interesting (as Regina asked before) to understand which database solution you are looking at that can effectively handle that kind of design and those target data volumes.

In our application (using a data model adapted for partitioning), we currently handle petabyte-scale real-time telemetry data using partitioning and sharding on top of PostgreSQL, and it works like a charm.

PS:
Remember that PostgreSQL currently has a maximum table size of 32TB, so you will probably have to find an alternative sooner than you think if you can't partition your data.
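The 32TB figure comes from PostgreSQL addressing a table's pages with 32-bit block numbers at the default 8kB block size (BLCKSZ, which is fixed when PostgreSQL is built). A small Python sketch of that arithmetic, treating the ceiling as approximately 2^32 pages, plus a check of how soon the post's 5TB table would pass it if it keeps doubling yearly (the growth rate is the post's; the helper names are hypothetical):

```python
# Sketch: where the ~32TB per-table limit comes from, assuming 32-bit
# block numbers and the default 8kB block size (BLCKSZ, fixed at build time).

BLOCK_NUMBER_BITS = 32
DEFAULT_BLOCK_SIZE = 8192  # bytes
TIB = 2 ** 40


def max_table_bytes(block_size=DEFAULT_BLOCK_SIZE):
    """Approximate per-table ceiling: 2**32 pages of block_size bytes each."""
    return (2 ** BLOCK_NUMBER_BITS) * block_size


def years_until_exceeds(start, limit, factor=2):
    """Whole years of growth by `factor` until `start` exceeds `limit`."""
    years = 0
    size = start
    while size <= limit:
        size *= factor
        years += 1
    return years


if __name__ == "__main__":
    limit_tib = max_table_bytes() / TIB
    print("limit at 8kB blocks:", limit_tib, "TiB")
    print("limit at 32kB blocks:", max_table_bytes(32768) / TIB, "TiB")
    # The post's 5TB table, treated as TiB for simplicity, doubling yearly.
    print("years until the table passes it:", years_until_exceeds(5, limit_tib))
```

Treating TB as TiB is a simplification, but the answer is the same in either unit: the table is 20TB after two years (under the limit) and 40TB after three (over it).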
Anders Svensson, March 21, 2016 at 2:16 AM
Hi again Chris.

Are you using this GIN index to perform full-text searches?

I recently stumbled across a really cool Postgres index extension that brings the very efficient Elasticsearch directly into PostgreSQL by using ES as an index (you've got to love PostgreSQL's pluggable architecture). It is very possible that you could benefit greatly from it.

https://github.com/zombodb/zombodb

