Sunday, September 2, 2012

O/R Modelling in PostgreSQL part 4: Composite Columns and Cross-Table Refs

So far, we have looked at tables as classes and various ways of looking at inheritance within that structure.  However, tables can be nested in PostgreSQL and this has a number of uses, particularly in views, and data structures passed to and from functions.  As PostgreSQL support for things like JSON becomes more important, these approaches will become more important as well.  Composite types in columns however opens up a significant number of gotchas and in general I tend to suggest avoiding this for physical storage.

Composite types, also known as structured data types, allow you to further define interfaces for user defined functions, and add additional sub-interfaces to those.  They can also be used to define complex data in tables although often multiple inheritance provides for a cleaner solution when it comes to actual storage.   However, complex types can be useful in a number of cases where existing data types are inadequate, as an alternative to building the type in C.

In some cases, the logic can be helpful here too.  In general when looking at base storage, my preference is to look at inheritance first, and then failing that to look at in-type storage.  Inheritance can give you all the benefits of in-column storage with fewer gotchas, and greater reusability.  However, there are cases where this breaks down, notably in views.  Also as we noted before tables are composite types (but not every composite type is a table).

We will look at the differences by considering an inheritable table storing address information and see how it behaves differently when used as a compsite type for a column.

Warning:  This is a comparatively new area for PostgreSQL and there are many areas of inconsistent behavior that have not been worked out.  If you are using composite types for column definitions, be prepared to run into unexpected sharp edges and the possibility of behavior changing in the future.  Additionally multiple inheritance (presuming unique column names) provides the same benefits with fewer inconsistencies.  This is really at the edges of the system.

Table Schema:

CREATE TABLE country (
     id serial not null unique,
     name text primary key,
     short_name text not null unique,
     state_nullable bool not null default false,
     requires_mailcode bool not null default true,
     unique(id, state_nullable, requires_mailcode),
);

Interface tables:

CREATE TABLE country_ref (
    country_id int,
);

CREATE TABLE country_ref_ext (
    country_state_nullable bool,
    country_requires_mailcode bool,
    CHECK (country_id NOT NULL)
) INHERITS (country_ref);

And of course, our traverse function:

CREATE OR REPLACE FUNCTION country(country_ref)
RETURNS country 
LANGUAGE SQL AS $$

SELECT * FROM country WHERE id = $1.country_id;

$$;

Ok, so now we have this part set up.

CREATE TABLE address (
    id serial not null unique,
    street_one text not null,
    street_two text,
    street_three text,
    floor text,
    care_of text,
    city text not null,
    state_province text,
    mail_code text,
    foreign key (country_state_nullable, 
                 country_requires_mailcode, 
                 country_id) 
       references country
               (state_nullable, 
                requires_mailcode, 
               id)
) INHERITS (country_ref_ext); 

Of course, now I can take an address and pull the country info off of it by dereferencing the primary key, but the nullable state and mailcode properties are also guaranteed to be equal to what they are in the country table.

Now in this case, we have inherited a specific country reference.  The country is required and so we know there will be a country.

Now in this case, any check constraints we have inherited would be applicable and we could subclass interface classes however we'd like.

Now, suppose we implement the country ref as a database column instead.

So our table looks like this instead:

CREATE TABLE address_noinherit (
    id serial not null unique,
    street_one text not null,
    street_two text,
    street_three text,
    floor text,
    care_of text,
    city text not null,
    state_province text,
    mail_code text,
    country_ptr country_ref_ext
);

Composite Type Behavior and Gotchas

In the first case we can simply do:

SELECT (a.country).name from address a where id = 1;

In the second gase, we would similarly:

SELECT ((a.country_ref).country).name
  FROM address_noinherit a WHERE id = 1;

You can then use this to implement a pointer system in your tables for recurring links.

Except that it doesn't really work that way as well as you would might like.

The NOT NULL check is not checked on a table which is not of the type where it is defined.  Consequently we have to declare that separately:

    CHECK ((country_ref).country_id IS NOT NULL)

Worse, adding a column to the type table can lead to problems.   You cannot add NOT NULL or DEFAULT clauses when adding the type if the table is being used as a type for the column of another table.  To work around this you must add the NOT NULL and DEFAULT constraints after first adding the column.

Except, assuming that runs you into trouble too.   There are exceptions to the above rule, particularly where domains are involved in types.  For example:

A domain for positive, not null integers:
or_examples=# create domain pos_not_null_int as int not null check (value > 0);
CREATE DOMAIN






A table which uses these:
or_examples=# create table rel_examples.domaintest (id pos_not_null_int);
CREATE TABLE



Constraints on the domain are enforced (as a test):
or_examples=# insert into rel_examples.domaintest values (-1);
ERROR:  value for domain pos_not_null_int violates check constraint "pos_not_null_int_check"
or_examples=# insert into rel_examples.domaintest values (null);
ERROR:  domain pos_not_null_int does not allow null values

A table which uses this as a column type:
or_examples=# create table comp_domain_test ( test rel_examples.domaintest);
CREATE TABLE


Despite what I said before, the constraints are enforced here (which they would not be if the initial table had its check constraints declared at the table level):
or_examples=# insert into comp_domain_test values (row(null));
ERROR:  domain pos_not_null_int does not allow null values
or_examples=# insert into comp_domain_test values (row(-1));
ERROR:  value for domain pos_not_null_int violates check constraint "pos_not_null_int_check"

When using complex types in columns the dba is advised not to make too many assumptions.  Given the inconsistent behavior here, I expect that it will likely change in future versions, but in which way one cannot say.

Additionally sub-tuple elements cannot be a part of foreign keys, so you are left with something with all the problems that Oracle's REF implementation does, namely that it gives you a weak foreign key implementation which does not actually enforce the underlying storage.  However it can still be automatically traversed.  In a future post we will look at solutions to foreign key enforcement.

In general, inheritance is cleaner than composite columns and there are far fewer gotchas.  Nevertheless there are some cases where they are an elegant solution to a problem, particularly when looking at nested data storage to enforce constraints which cannot otherwise be expressed.

Composite types in VIEWs

However, where storage is not at issue, composite types, whether based on tables or not, can be used here without the previous issues.  Stored procedures can be used to generate composite types (i.e. without inheritance) or directly from the inheritance tree.

For compsite types which are returned from stored procedures, these can be presented directly in the view, or not.  Methods of previous tables can be called as well (so inventory_item.markup in my previous example might go from a function call to an output column).  Similarly methods associated with types can be called here as well.  However, it may be useful to expose the types directly as well as a way of allowing for additional functionality to be requested.

In the initial address case which uses inheritance, we might:

CREATE VIEW address_view AS 
SELECT * FROM address;

But then we cannot:

SELECT (av.country).* from address_view av where ....;

One way around this is:

CREATE VIEW address_view AS
SELECT id, street_one, city, state_province, mail_code,
       a::country_ref AS country_ref
  FROM address a;

Now we can:

SELECT ((country_ref).country).name FROM address_view
 WHERE city = 'Paris'
GROUP BY country_ref;

Table aliases can be similarly captured for function output that creates complex types and behavior exposed in views this way.

In object-relational design it is critical to consider how a view or interface will be used.  One real problem with including too many heavy O/R features in a table is that it can complicate the use of the data by people running queries.   Nested data structures primarily are helpful when they create interfaces to O/R logic that is needed and would not otherwise be available, or where the application expects the data in this format.  In other words, human and computer interfaces may use varying levels of O/R logic.  This, of course, will be covered by a future item in this series.

Next:  Nested Data Structures and Storage (Uses and Gotchas)
Next Week:  Interlude:  Contrasting to MySQL and Messaging in PostgreSQL

No comments:

Post a Comment