So we decided to partition the table into around 50 pieces to let autovacuum achieve a bit better parallelism in managing the data. This helped to some extent, but partitioning is a rare solution for rare problems and comes with unexpected costs. Interestingly, most of our problems have been ORM-related. Here are some we ran into and their solutions (spoiler: in the end we effectively stopped using an ORM on these tables). By the end of the exercise, throughput on these tables had increased around 10-fold and db load had been cut by about 90%.
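For context, this was the inheritance-based style of partitioning with CHECK constraints. A minimal sketch of the shape of it (table and column names here are purely illustrative, not our actual schema):

```sql
-- Illustrative sketch only: a parent table plus one of ~50 child
-- partitions using inheritance and CHECK constraints.
CREATE TABLE queue_item (
    id      bigserial PRIMARY KEY,
    bucket  int  NOT NULL,    -- partition key
    payload text NOT NULL
);

-- The CHECK constraint is what constraint exclusion uses at plan time.
CREATE TABLE queue_item_00 (
    CHECK (bucket = 0)
) INHERITS (queue_item);

-- Indexes are not inherited, so each child gets its own.
CREATE INDEX queue_item_00_id_idx ON queue_item_00 (id);
```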
Annoyance 1: Redirection and ORM transparency
The first problem we had was getting DBIx::Class to work with the partitioned table. The solution was to add another view in between that redirected inserts, updates, and deletes to the right partitions. This also allowed us to keep going through the ORM for inserts (we still do) without the cross-locking issues below becoming a problem.
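A rough sketch of the idea, using an INSTEAD OF trigger on the view (rewrite rules are another way to do it; all names here are illustrative rather than our actual schema):

```sql
-- Illustrative sketch: the ORM points at a view, and an INSTEAD OF
-- trigger routes each write to the correct partition.
CREATE VIEW queue_item_v AS SELECT * FROM queue_item;

CREATE FUNCTION queue_item_v_insert() RETURNS trigger AS $$
BEGIN
    -- Pick the child table from the partition key and insert there.
    EXECUTE format('INSERT INTO %I SELECT ($1).*',
                   'queue_item_' || lpad(NEW.bucket::text, 2, '0'))
      USING NEW;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER queue_item_v_insert
    INSTEAD OF INSERT ON queue_item_v
    FOR EACH ROW EXECUTE PROCEDURE queue_item_v_insert();
-- (UPDATE and DELETE get similar triggers.)
```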
Annoyance 2: Cross-locking and exclusion constraints
A second major problem is that autovacuum can only free up space when it gets an exclusive lock, and if any queries go through the parent table, constraint exclusion comes into play. The problem is that constraint exclusion takes out a relatively non-invasive lock on every partition at planning time, which means that, as long as you go through the parent table, you cannot even plan a select against one partition while another partition is locked.
The obvious solution is not to go through the parent table, but the ORM doesn't support that, so we had to drop down to raw SQL. It also took us about six months to find and fix.
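Roughly, the difference looks like this (hypothetical names again):

```sql
-- Going through the parent table: at plan time, constraint exclusion
-- touches (and lightly locks) every partition, so a conflicting lock
-- on any one partition can hold up planning of this query.
SELECT * FROM queue_item WHERE bucket = 7 AND id = 12345;

-- Hitting the child partition directly leaves the siblings alone.
SELECT * FROM queue_item_07 WHERE id = 12345;
```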
Annoyance 3: Constraint exclusion doesn't always do what you expect it to!
One day we had a very slow-running, straightforward query that should have resolved quickly via an index scan on one of the partitions. However, because the constraint criteria were being brought in via a subquery, they were not available at plan time, so the planner fell back to a sequential scan through another large partition. Ouch. We found the query and fixed it.
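An illustrative sketch of the shape of the problem and the fix (the jobs table and column names are hypothetical):

```sql
-- The partition key arrives via a subquery, so it is unknown at plan
-- time and constraint exclusion cannot prune the other partitions:
SELECT *
  FROM queue_item
 WHERE bucket = (SELECT bucket FROM jobs WHERE job_id = 42);

-- Resolving the value first (in the application or a prior query)
-- lets the planner exclude the irrelevant partitions:
SELECT * FROM queue_item WHERE bucket = 7;
```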
Annoyance 4: Solving some performance problems puts more stress on the next bottleneck
The result of the initial success was increased db concurrency, which was great until it became clear that our selection of rows to process and delete was leaving lots of indexes with huge numbers of dead tuples at their heads. This meant that selecting rows actually became slower than before, so we had to go back and engineer a new selection algorithm to avoid the problem.
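To illustrate the shape of the problem (a hypothetical sketch with made-up column names, not our actual selection algorithm):

```sql
-- Repeatedly grabbing the "next" batch from the low end of an index
-- keeps walking over index entries whose heap tuples are already dead:
SELECT id FROM queue_item_07
 WHERE processed_at IS NULL
 ORDER BY id
 LIMIT 100;

-- One common mitigation (not necessarily what we did) is to carry a
-- lower bound forward so each pass starts past the dead range:
SELECT id FROM queue_item_07
 WHERE processed_at IS NULL
   AND id > $1        -- highest id handled by the previous batch
 ORDER BY id
 LIMIT 100;
```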
Unrelated Annoyance: Long running transactions causing autovacuum headaches
An interesting, unrelated issue was that at the time we had transactions that would sometimes remain open for a week. While the partitions directly affected were small, the problem is that autovacuum cannot clear tuples invalidated after the oldest open transaction started, so the higher-throughput partitions were adversely affected. After significant effort we got the worst offenders corrected, and now the longest-running transactions take just over a day. This is usually sufficient depending on the load of the system (though the duration sometimes spikes to 18 hours).
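If you want to check whether you have the same problem, a query like this against pg_stat_activity shows the oldest open transactions (column names assume a reasonably recent PostgreSQL; older versions name some of them differently):

```sql
-- List the oldest open transactions; the top entries are the ones
-- holding back vacuum's ability to clean up dead tuples.
SELECT pid,
       now() - xact_start AS xact_age,
       state,
       query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 10;
```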
Was the partitioning worth it? Definitely! However, it was a bit of a long road to get there.