I thought it would be helpful to outline the problems this talk will cover.
We won't be talking about the ordinary issues that come with scaling up hardware, or about backup and recovery, or about upgrades. Those could be talks of their own. Instead, we will cover four deep, specific challenges we faced, touch on some of the controversies in database theory that often come up in these areas, and describe the solutions we arrived at.
Two of these challenges concern a subsystem in the database that handled large amounts of data in high-throughput tables (lots of inserts and lots of deletes). The other two concern the sheer volume of data.
- Performance problems in work queue tables caused by large numbers of deletions from the head of indexes, with different workers deleting from different indexes. This is an atypical case where table partitioning could be used to solve a number of underlying problems with autovacuum performance and query planning.
- Race conditions in stored procedures between MVCC snapshots and advisory locks in the work queue tables. We will talk about how this race condition happens and how we solved it without using row locks, by rechecking results in a new snapshot, which we decided was the cheapest solution to this problem.
- Slow access and poor query plans when reading data from large tables. We will talk about what First Normal Form really means, why we opted to break its requirements in this case, what problems this caused, and how we solved them.
- Finally, we will look at how new requirements on semi-structured data were easily implemented using procedural languages, and how we made these perform well.
Please join me in Malmo or Moscow for this talk.