Monday, August 27, 2012

PostgreSQL O/R Modelling Part 3: Table Inheritance and O/R Modelling in PostgreSQL

Note:  PostgreSQL 9.2 will allow constraints which are not inherited.  This will significantly  impact the use of inheritance, and allow for real set/subset modelling with records present in both parent and child tables.  This is a major step forward in terms of table inheritance.

PostgreSQL allows tables, but not views or complex types, to inherit table structures.  This allows for a number of additional options when doing certain types of modelling.  In this case, we will built on our inventory_item type by adding attached notes.  However we may have notes attached to other tables too, and we want to make sure that each note can be attached to only one table.

This is exactly how we use table inheritance in LedgerSMB.

Before we begin, however it is worth looking at exactly what is inherited by a child table and what is not.

The following are inherited:
  • Basic column definitions (including whether a column is nullable)
  • Column default values
  • CHECK constraints (cannot be overridden in child tables)
  • Descendants can be implicitly casted to ancestors, and
  • Table methods are inherited 
The following are not inherited:
  • Indexes
  • Unique constraints
  • Primary Keys
  • Foreign keys
  • Rules and Triggers
The Foreign Key Problems and How to Solve Them

There are two basic foreign key problems when using table inheritance on PostgreSQL. The first is that foreign keys themselves are not inherited, and the second is that a foreign key may only target a specific relation, and so rows in child tables are not valid foreign key targets.  These problems together have the same solution, which is that where these are not distinct tables for foreign key purposes, the tables should be factored in such a way as to break this information out, and failing that, a trigger-maintained materialized view will go a long ways towards addressing all of the key management issues.

For example, for files, we might place the id and class information in a separate table and then reference these in on child relations.  This gives a valid foreign key target for the set of all inherited tuples.  Foreign keys can also be moved to this second table, allowing for them to be centrally managed as well, though it also means that JOIN operations must include the second table.  A second option might be to omit class and use the child table's tableoid, perhaps renaming it in the materialized view as table_id.

The Primary Key Problem and How to Solve It

Similarly because indexes are not inherited, we cannot ensure that a specific combination of tuples is unique through a hierarchy.

The most obvious solution is to treat the combination of the natural primary key and the tableoid as a primary key for modelling purposes.   However this becomes only really useful when the primary key is a composite field with mutually exclusive constraints on child tables.  In other words, partitioning is necessary to sane set/subset modelling using table inheritance.

Where Not to Use Table Inheritance


Most of the table inheritance documentation suggests that the primary use of table inheritance is set/subset modelling, where an subset of a set has its own extended properties.  In practical use, however, a purely relational solution is usually cleaner for this specific sort of case and results in fewer key management issues (and therefore less overall complexity).

For set/subset modelling, a much cleaner solution is to use composite foreign keys.

For example, revising the example from the manual for table inheritance, a cleaner way to address the issue where you have cities and capitals is to do something like:

CREATE TABLE cities (
    name text primary key,
    altitude float, 
    population int,
    is_capital bool not null,
    unique(is_capital, name)
);

The unique constraint does nothing to the actual table.  If "name" is guaranteed to be unique, then ("is_capital", "name") is guaranteed to be unique.  What it does, however, is designate that as a secondary key.

You can then create a table of:

CREATE TABLE capitals (
    name text primary key,
    is_capital bool not null,
    state text not null,
    foreign key (name, is_capital) 
        references cities (name, is_capital)
        DEFERRABLE INITIALLY DEFERRED
);


 This approach, where subset membership is a part of a secondary key, allows for greater control regarding set/subset modelling.  You can then create a trigger on cities which checks to see if the city exists in capitals (enforcing an insert flow of "insert into capitals, then insert into cities, then commit").

No primary or foreign key problems occur with this solution to set/subset models.   With multiple subsets you will likely wish to generalize and use a "city_class" field of type int referencing a city_class table so you can control subsets more broadly.  However in general, table inheritance is a poor match for this problem because of foreign key constraint issues.  Primary key management issues are less of an issue however (and becoming significantly easier to solve).


The one use case for set/subset modelling using inheritance is table partitioning.  It has all the problems above, and enforcing foreign keys between two partitioned tables can become very complex, very fast unless materialized views are used to proxy the enforcement.


Table Structure:  Purely Relational Approach

CREATE SCHEMA rel_examples;
CREATE TABLE rel_examples.note (
   note text not null,
   subject text not null,
   created_by int not null,
   note_class int not null,
   created_at timestamp not null,
   id int unique,
   primary key (id, note_class)
);

CREATE TABLE rel_examples.note_to_inventory_item (
   note_id int not null,
   note_class int not null check (note_class = 1),
   item_id int references inventory_item (id),
   foreign key (note_id, note_class) 
      references rel_examples.note (id, note_class)
);

CREATE TABLE rel_examples.note_to_......;

These would then be queried using something like:

SELECT n.* FROM rel_examples.note n
  JOIN rel_examples.note_to_inventory_item n2i 
       ON n2i.note_id = n.id AND n2i.note_class = n.note_class
  JOIN inventory_item i ON i.id = n2i.item_id
 WHERE i.sku = 'TEST123';

This doesn't preclude other Object-Relational approaches being used here.  I could of course add a method tsvector to make full text searches easier, or the like and apply that to note, and it would work just as well.  However it doesn't really apply to the data model very well, and some performance can be gained, since we are only looking at pulling notes by attachment, by breaking up the table into more workable chunks which could be queried independently.

Additionally the global note table is a maintenance  nightmare.  Enforcing business rules in that case is very difficult.

Table Structure:  Object-Relational Approach

An Object-Relational approach starts off looking somewhat different:

CREATE TABLE note (
    id serial not null unique,
    note text not null,
    subject text not null,
    note_class int not null,
    created_by int not null,
    ref_key int not null,
    created_at timestamp not null
);

We'd then probably add a trigger to prevent this table from receiving rows.  Note that we cannot use a check constraint for this purpose because it will be inherited by every child table.  Typically we'd use a trigger instead of a rule because that way we can easily raise an exception.

Then we'd

CREATE TABLE inventory_note (
    CHECK (note_class = 1),
    PRIMARY KEY (id, note_class),
    FOREIGN KEY (ref_key) references inventory_item(id)
) INHERITS (note);

Suppose we also create a tsvector method as such:

CREATE OR REPLACE FUNCTION tsvector(note) RETURNS tsvector
LANGUAGE SQL AS $$
SELECT tsvector($1.subject || ' ' || $1.note);
$$;

Now in my example database, I have an inventory_item with an id of 2, so I might:

INSERT INTO inventory_note
     (subject, note, note_class, created_by, created_at, ref_key)
VALUES
     ('Testing Notes', 'This is a test of the note system', 1, 1, now(), 2);

We know this worked:

or_examples=# select * from note;
 id |               note                |    subject    | note_class | created_b
y | ref_key |         created_at        
----+-----------------------------------+---------------+------------+----------
--+---------+----------------------------
  1 | This is a test of the note system | Testing Notes |          1 |         
1 |       2 | 2012-08-22 01:17:27.807437
(1 row)

Note above that I selected from the parent table and child tables were pulled in automatically.  If I want to exclude these I can use SELECT * FROM ONLY note and none of these will be pulled in.

We can also note that the method is inherited:

or_examples=# select n.tsvector from inventory_note n;
                              tsvector                              
---------------------------------------------------------------------
 'Notes' 'Testing' 'This' 'a' 'is' 'note' 'of' 'system' 'test' 'the'
(1 row)

The tables could then be queried with one of two equivalent approaches:

SELECT n.* FROM note n
  JOIN inventory_item i ON n.ref_key = i.id AND n.note_class = 1
 WHERE i.sku  = 'TEST123';

or alternatively:

SELECT n.* FROM inventory_note n
  JOIN inventory_item i ON n.ref_key = i.id
 WHERE i.sku = 'TEST123';

Both queries return the same results.

Advantages of the Object-Relational Approach


The object-relational model allows us to be sure that every note is attached to something, which we cannot do gracefully in the relational model, and it allows us to ensure that two notes attached to different items are in fact different.  It also allows for simpler SQL.

On the performance side, it is worth noting that in both the query cases above, PostgreSQL will only query child tables where the constraints could be met by the parameters of the query, so if we have a dozen child tables, each of which may be very large, we only query the table we are interested in.  In other words, inherited tables are essentially naturally partitioned (and in fact people use inheritance to do table partitioning in PostgreSQL).

Multiple Inheritance

Multiple inheritance is supported, but if used care must be taken that identically named fields are actually used in both classes in compatible ways.  If they are not, then problems occur.  Multiple inheritance can however be used safely both for interface development (when emulating inheritance in composite types, see below), and where only one of the inheriting relations contains actual columns (the other might contain constraints).

Thus we could still safely do something like:

CREATE note (
    note text,
    subject text,
    ....
);

CREATE table joins_inventory_item (
   inventory_id int,
);

CREATE NOTE inventory_note (
    FOREIGN KEY (inventory_id) REFERENCES inventory_item(id),
    ... -- partial primary key constraints, and primary key def
) inherits (note, joins_inventory_item);

This could then be used to enforce consistency in interface design, ensuring more readable queries and the like, but it doesn't stop there.  We can use joins_inventory_item to be the vehicle for a way to follow a reference.  So:

CREATE OR REPLACE FUNCTION inventory_item
(joins_inventory_item)
RETURNS inventory_item
LANGUAGE SQL AS
$$ SELECT * FROM inventory_item WHERE id = $1.inventory_id $$; 

In this case, then you can use inventory_item as a virtual pointer to the inventory item being joined as such:

CREATE TABLE inventory_barcode (
    barcode text,
    FOREIGN KEY (inventory_id) REFERENCES inventory_item(id)
) INHERITS (joins_inventory_item);

The uses for multiple inheritance however don't end there.  Multiple inheritance gives you the ability to define your database in re-usable chunks with logic following that chunk.    Alternative presentation of sets of columns then becomes manageable in a way it is not with complex types.

A More Complex Example:  Landmarks, Countries, and Notes
Another example may be where we are storing landmarks, which are attached to countries.  Our basic table schema may be:

CREATE TABLE country (
    id serial not null unique,
    name text primary key,
    short_name varchar(2) not null unique
);

CREATE TABLE country_ref (
    country_id int
);

Our traversal function:

CREATE FUNCTION country(country_ref) RETURNS country
LANGUAGE SQL STABLE AS
$BODY$ SELECT * FROM country WHERE id = $1.country_id $BODY$;

Next to notes:

CREATE TABLE note_fields (
    note_subject text,
    note_content text
);

Like country_ref, we probably will never query note_fields themselves.  There is very little of value in querying only contents like this.  However we can add a tsvector derived value:

CREATE FUNCTION note_tsvector(note_fields)
RETURNS tsvector IMMUTABLE
LANGUAGE SQL AS $BODY$

SELECT to_tsvector('english', coalesce($1.note_subject, '') || ' ' ||
coalesce($1.note_content, ''));

$BODY$;

Now, the function name is prefixed with note_ in order to avoid conflicts with other similar functions.  This would allow multiple searchable fields to be combined in a table and then mixed by inheriting classes.

CREATE TABLE landmark (
    id serial not null unique,
    name text primary key,
    nearest_city text not null,
    foreign key (country_id) references country(id),
    CHECK (country_id IS NOT NULL),
    CHECK (note_content IS NOT NULL)
) INHERITS (note_fields, country_ref);

The actual table data will not display too well here, but anyway here are a few slices of it:

or_examples=# select id, name, nearest_city, country_id from landmark;
 id |        name        | nearest_city  | country_id
----+--------------------+---------------+------------
  1 | Eiffel Tower       | Paris         |          1
  2 | Borobudur          | Jogjakarta    |          2
  3 | CN Tower           | Toronto       |          3
  4 | Golden Gate Bridge | San Francisco |          4
(4 rows)

and the rest of the table:

or_examples=# select id, name, note_content from landmark;
 id |        name        |              note_content             
----+--------------------+----------------------------------------
  1 | Eiffel Tower       | Designed by a great bridge builder
  2 | Borobudur          | Largest Buddhist monument in the world
  3 | CN Tower           | Major Toronto Landmark
  4 | Golden Gate Bridge | Iconic suspension bridge
(4 rows)

The note_subject field is null on all records.

SELECT name, (l.country).name as country_name FROM landmark l;

or_examples=# SELECT name, (l.country).name as country_name FROM landmark l;
        name        | country_name
--------------------+---------------
 Eiffel Tower       | France
 Borobudur          | Indonesia
 CN Tower           | Canada
 Golden Gate Bridge | United States
(4 rows)

Demonstrating the note_tsvector interface.  Note that this could be improved upon by creating different tsvectors for different languages.

or_examples=# select name, note_content from landmark l where
plainto_tsquery('english', 'bridge') @@ l.note_tsvector;
        name        |            note_content
--------------------+------------------------------------
 Eiffel Tower       | Designed by a great bridge builder
 Golden Gate Bridge | Iconic suspension bridge
(2 rows)

So the above example shows how the interfaces offered by two different parent tables can be invoked by a child table.  This sort of approach avoids a lot of the problems that come with storing composite types in columns (see upcoming posts) because the complex types are stored inline inside the table.


Base Tables as Catalogs

The base tables in these designs are not generally useful to query for data queries. However they can simplify certain types of other operations both manually and data-integrity wise.   The uses for data integrity will the subject of a future post.  However here we will focus on manual tasks where this sort of inheritance can help.


Suppose we use the above note structure in a program which manages customer contacts, and one individual discovers inappropriate language in some of the notes entered by another worker.  In this case it may be helpful to try to determine the scope of the problem.  Suppose the note includes language like "I told the customer to stfu."  We might want to:


SELECT *, tableoid::regclass::text as table_name
  FROM note_fields nf 
 WHERE nf.note_tsvector @@ to_tsquery([pattern]);


In this case we may be trying to determine whether to fire the offending employee, or we may have fired the employee and be trying to figure out what sort of damage control is necessary.


A test query with the current data set above shows what will be shown:


or_examples=# SELECT *, tableoid::regclass::text as table_name
  FROM note_fields nf
 WHERE nf.note_tsvector @@ to_tsquery('bridge');
 note_subject |            note_content            | table_name
--------------+------------------------------------+------------
              | Designed by a great bridge builder | landmark
              | Iconic suspension bridge           | landmark
(2 rows)


This sort of interface has obvious uses when trying to do set/subset modelling where the interface is inherited.  Careful design is required to make this perform adequately however.  In the case above, we will be having to do somewhat deep inspection of all tables but we don't necessarily require immediate answers, however.


In general, this sort of approach makes it somewhat difficult to store records in both parent and child tables without reducing semantic clarity of those tables.  In this regard, the parent acts like a catalog of all children.  For this reason I remain sceptical whether even with NO INHERIT check constraints it will be a good idea to insert records in both parent and child tables.  NO INHERIT check constraints, however finally provide a useful tool for enforcing this constraint however.


Contrasts with DB2, Informix, and Oracle

DB2 and Oracle have largely adapted Informix's approach to table inheritance, which supports single table inheritance only, and therefore requires that complex types be stored in columns.  With multiple inheritance, if we are careful about column naming, we can actually in-line multiple complex types in the relation.  This provides at once both a more relational interface and one which admits of better object design.

Emulating Inhertance in Composite Types

Composite types do not allow inheritance in PostgreSQL, and neither do views.  In general, I would consider mixing inheritance and views to be dangerous and so would urge folks not to get around this limitation (which may be possible using RULEs and inherited tables).

One basic way to address this is to create a type schema and inherit tables from a table called something like is_abstract as follows:

CREATE TABLE is_abstract (check (false ));

No table that inherits is_abstract will ever be allowed to store rows, but check constraints are only checked on data storage because otherwise many problems arise so the fact that we have a check constraint that, by definition, always fails allows us to deny use of the table for storing data but not use of a relation as a class which could be instantiated from data stored elsewhere.

Then we can do something like

CREATE TABLE types.my_table (
    id int,
    content text
) INHERITS (is_abstract);

Once this is done, types.mytable will never be able to store rows.  You can then inherit from it, and use it to build a logical data model.  If you want to change this later and actually store data, this can be done using alter table statements, first breaking the inheritance, and second droping the is_abstract_check constraint.  Once these two are done, the inheritance tree can be used to store information.

Next:  Inheriting interfaces in set/subset modelling
Next Week:  Complex Types

2 comments: