Thursday, August 30, 2012

O/R Modelling 3.2: Set/Subset Modelling with Inheritance

Despite the fact that set/subset modelling is usually not a win for inheritance, sometimes in the cases of more complex logic, it can be.  David Fetter has come up with an example where he tries to solve the problem of single parent/multiple child inheritance on a purely relational model.

In general when you get into constraints this complex, the question is no longer how to avoid complexity but how to manage it.  There are several ways in which refactoring the tables may be helpful, and table inheritance allows us to do just that (no we don't have to lose foreign key management capabilties to do so).  This allows us to look at set/subset management differently.

Of course how we go about managing this complexity may still be a matter of taste and of the exact circumstances involved (the same solution may not be optimal everywhere).  One person may prefer lots of NULLS and complex constraints.  One person may prefer few NULLs, no inheritance, and lots of triggers.  Someone else may prefer a richer toolkit with the pitfalls that provides.  Managing complexity is more an art than a science.  In general though, we don't blame the paintbrush when the artist doesn't know how to use it.

In my experience, using NULLs to indicate type furthermore creates a situation where it is quite possible to end up with misbehavior that takes some time to troubleshoot and so I don't typically like this approach but again, I recognize that the complexity issues remain regardless of the solution.

In general the approach here takes the following steps:

  1. Decompose the table in a relational model
  2. Look for common interfaces
  3. Inherit those interfaces
Fetter's Table Structure:

I am skipping over the check constraints and unique constraints which are relatively complex here.  You should read the full post yourself to see what complexity we are managing.  However for refactoring purposes, I can't really avoid quoting his table field structure:

CREATE TABLE payment (
    payment_id BIGSERIAL PRIMARY KEY,
    payer_id INTEGER NOT NULL REFERENCES payer,
    payment_date DATE NOT NULL,
    amount NUMERIC NOT NULL CHECK(amount > 0),
    currency_id INTEGER NOT NULL REFERENCES currency,
    /* Check */
    check_number TEXT,
    check_date DATE,
    /* Credit Card */\
    cc_id INTEGER REFERENCES locked_down_tight_cc_table,
    /* You wouldn't store an actual CC number here, would you?!? */
    expiry_date DATE,
        CHECK(expiry_date = DATE_TRUNC('month', expiry_date)),
    /* Manila Envelopes are pretty anonymous 
    is_manila BOOLEAN NOT NULL DEFAULT FALSE,
    -- check constraints then follow which are quite complex
    -- and partial unique constraints are added after the fact
); 
/* comments by David Fetter */
-- Comments added by Chris Travers 

Step 1:  Relational Refactoring

We can then factor this out into the following tables:
  • payment_class new, added to categorize payments.
    • Includes an extended storage column to indicate that storage is required for a type.
  • payment (main details go here), adding payment_class_id field
  • payment_check stores check-specific fields
    • payment_class_id set to 1
  • payment_cards stores card-specific fields
    • payment_class_id set to 2

We would then have to create a custom constraint trigger for managing relational integrity between payment and payment_check, payment_cards.  Note that standard foreign keys work one direction so this isn't too bad.

Step 2:  Common Interfaces:

Every payment extension table requires a payment_class_id and a payment_id field which together act as a foreign key in the parent table.  This provides an interface which can go both ways and simplify the interfaces.  This should then be our base table.

Step 3:  Completed redesign

Note that some foreign keys are omitted in order to allow the table to be created in this demo environment.

CREATE TABLE payment_ref (
    payment_id bigint not null,
    payment_class_id int
);

CREATE TABLE payment_class (
    id int not null unique, -- manually assigned, few
    label text primary key,
    extended_storage bool not null    
);

INSERT INTO payment_class (id, label, extended_storage)
VALUES (1, 'check', true),
       (2, 'cards', true),
       (3, 'cash', false);

CREATE TABLE payment (
    payment_id bigserial PRIMARY KEY,
    payment_class_id int NOT NULL references payment_class(id),
    -- above two columns will be primary key
    payer_id int not null, 
    -- foreign key on payer_id omitted due to demo here
    amount numeric not null check (amount > 0),
    currency_id int not null,
    -- foreign key on currency_id omitted due to demo here
    UNIQUE (payment_id, payment_class_id) -- secondary key
);

CREATE TABLE payment_check (
    check_date date not null,
    check_number text not null,
    CHECK (payment_class_id = 1),
    FOREIGN KEY (payment_class_id, payment_id) 
        REFERENCES payment (payment_class_id, payment_id)
        DEFERRABLE INITIALLY IMMEDIATE
) INHERITS (payment_ref);

CREATE TABLE payment_cards (
    cc_info int not null, 
    -- foreign key omitted so table can be created in demo environment
    CHECK (payment_class_id = 2),
    FOREIGN KEY (payment_class_id, payment_id) 
        REFERENCES payment (payment_class_id, payment_id)
        DEFERRABLE INITIALLY IMMEDIATE
) INHERITS (payment_ref);

All that is left is to create the constraint triggers and we are done.  Note inheritance here buys us a consistent interface for joins, plus a consistent interface for the constraint triggers.

In essence the way we are using table inheritance here is as a way to partition additional information to be joined to a table, where foreign keys against the total partition set are relatively uninteresting.  Note however, one could come up with ways here to define an additional unique index (on payment_class(id, extended_storage) which would allow effective foreign keys to be added to the inheritance tree.  The above approach, once constraint triggers are added (and tested) would be easier to understand and maintain in my experience than one using NULLs to indicate type, and would have more modular and maintainable check constraints.

Now the constraint triggers will still have to be created and tested and this adds significant complexity.  This may be a win where we need to maintain extensibility here, for example if we want to be open to adding gift certificates as forms of payment, or international money orders, etc.  On the other hand if we do not have to preserve this, then the inheritance-based approach adds needless complexity regarding foreign keys.

If the relational model is good enough (namely because of the lack of a need for extensibility in payment types) one can simply add an id field to each sub-table, a deferrable foreign key constraint (initially deferred) from payment to the other tables, and check constraints on whether these are NULL based on payment type.  In that case, inheritance just gets us enforcement for identical semantics for foreign keys, and a virtual catalog of payment references.

On the other hand if the purely relational model (fkeys against physical storage tables directly) is not good enough, we can build a system which we use constraint triggers to enforce foreign keys against an object catalog (payment_ref) in this case.

The Moral of the Story

Set/Subset modelling with table partitioning does have some uses, but maintainability requires that the tables be relationally factored as much as possible first, and inherit no more than necessary.  However, sparingly used, inheritance can facilitate (rather than replace!) good relational design.  This works best when one looks at inheritance as a way of doing table partitioning with additional fields per partition.

With 9.2, the addition of NO INHERIT check constraints will allow the top level partition to be used, but before that the top-level needs to be clear of entries.  In general, however, I am not convinced that having the top level partition be used in this case is actually a good idea.  As soon as you do this, a vanilla SELECT and a foreign key will have suddenly very different behavior.  Semantic clarity counsels for using NO INHERIT constraints to forbid all inserts into parent nodes.

Is the complexity tradeoff worth it though?  You be the judge.

1 comment:

  1. The link to David Fetter's blog entry has changed: http://people.planetpostgresql.org/dfetter/index.php?/archives/30-Multiple-Inheritance.html

    ReplyDelete