With this client's permission I have decided to take a lot of the work I have done optimizing their job queue system and turn it into a PostgreSQL extension. The job queue currently runs tens of millions of jobs per day (meaning twice that number of write queries, and a fair number of read queries too) and is one of the most heavily optimized parts of the system, so this will be based on a large number of lessons learned on what is a surprisingly hard problem.
It is worth contrasting this with pg_message_queue, of which I am also the author. pg_message_queue is intended as a lightweight, easy-to-use message queue extension that one can plug into other programs to solve common problems where notification and message transfer are the main concerns. This project will be an industrial-scale job queuing system aimed at massive concurrency. As a result, simplicity and ease of use take second place to raw power and performance under load. In other words, here I am not afraid to assume that the DBA and programming teams know what they are doing and have the expertise to read the manual and implement appropriately.
The first version (1.x) will support all supported versions of PostgreSQL and make the following guarantees:
- Massively parallel, non-blocking performance (we currently run with 600+ connections to PostgreSQL from worker processes)
- Partitioning, coalescing, and cancelling of jobs similar in some ways to TheSchwartz
- Exponential pushback based on number of times a job has failed
- Jobs may be issued again after deletion, but this can always be detected and the bad jobs pruned
- Optional job table partitioning
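The exponential pushback item above can be illustrated with a small sketch. This is not the extension's actual formula: the base delay, the cap, and the function names `retry_delay` and `next_run_after` are all assumptions made for illustration, and Python is used only to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tuning values; the real extension may use different ones.
BASE_DELAY_SECONDS = 30
MAX_DELAY_SECONDS = 6 * 3600


def retry_delay(failures: int) -> timedelta:
    """Return how long to push a job back after its Nth failure (N >= 1).

    The delay doubles with every failure and is capped so that a job
    that keeps failing is still retried at a bounded interval.
    """
    if failures < 1:
        raise ValueError("failures must be at least 1")
    seconds = min(BASE_DELAY_SECONDS * 2 ** (failures - 1), MAX_DELAY_SECONDS)
    return timedelta(seconds=seconds)


def next_run_after(failed_at: datetime, failures: int) -> datetime:
    """Earliest time at which a failed job should become eligible again."""
    return failed_at + retry_delay(failures)


failed_at = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
for n in (1, 2, 3, 20):
    print(n, retry_delay(n), next_run_after(failed_at, n))
```

In a database-backed queue the same calculation would typically be stored as a "run after" timestamp on the job row, so that workers can skip jobs that are still backing off.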
The first client written will rely on hand-coded SQL along with DBIx::Class's schema objects. This client will guarantee that:
- Work done by a work module always succeeds or fails within a transaction
- A job notifier class will be provided
- Pruning of completed jobs will be provided via the Perl module and a second query
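The transactional guarantee in the list above can be sketched as follows. This is only an illustration of the idea, not the planned Perl client: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere, and the `job` and `result` tables, their columns, and the `run_job` and `double` functions are all hypothetical.

```python
import json
import sqlite3


def run_job(conn, job_id, work):
    """Run work(conn, job_id, args) and mark the job done in one transaction.

    If the work raises, everything it wrote is rolled back and only the
    failure counter is bumped, in a separate transaction.
    """
    try:
        with conn:  # commits on success, rolls back on exception
            row = conn.execute(
                "SELECT args FROM job WHERE id = ?", (job_id,)
            ).fetchone()
            work(conn, job_id, json.loads(row[0]))
            conn.execute("UPDATE job SET done = 1 WHERE id = ?", (job_id,))
        return True
    except Exception:
        with conn:
            conn.execute(
                "UPDATE job SET failures = failures + 1 WHERE id = ?", (job_id,)
            )
        return False


def double(conn, job_id, args):
    """Example work module: writes a result, then rejects negative input."""
    conn.execute(
        "INSERT INTO result (job_id, value) VALUES (?, ?)",
        (job_id, args["n"] * 2),
    )
    if args["n"] < 0:
        raise ValueError("negative input")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, args TEXT,"
    " done INTEGER DEFAULT 0, failures INTEGER DEFAULT 0)"
)
conn.execute("CREATE TABLE result (job_id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO job (id, args) VALUES (?, ?)",
    [(1, json.dumps({"n": 2})), (2, json.dumps({"n": -1}))],
)
conn.commit()

ok = run_job(conn, 1, double)      # work succeeds, result row is kept
failed = run_job(conn, 2, double)  # work raises, its result row is rolled back
```

The point of the design is that a worker never leaves half-finished side effects behind: either the job's writes and its completion marker are committed together, or none of them are.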
The history of this is that it came from a major client's use of TheSchwartz, which they outgrew for scalability reasons. While the basic approach is thus compatible, the following changes are made:
- Job arguments are in JSON format rather than in Storable format in bytea columns
- Highly optimized performance on PostgreSQL
- Coalesce is replaced by a single integer cancellation column
- Jobs may be requested in batches of various sizes
2.x will support PostgreSQL 9.5+ and dispense with the need for both advisory locks and rechecking. I would also like to support some sort of graph management (i.e. a graph link that goes from one job type to another and specifies "for each x create a job for y" semantics). That is still all in design.