Sunday, September 16, 2012

O/R Modelling in part 6: Polymorphism (and Gotchas) in PostgreSQL

Polymorphism is common in virtually all database management systems.  When we add an integer to a float, we expect this to just work.  Similarly various types of text fields can usually be concatenated together, etc.  In particularly extensible database systems, like PostgreSQL, polymorphism tends to lead to additional problems.  Indeed, implicit casts were cut way back a few years ago specifically because of problems they might cause.  This is an area I expect we will do quite a bit more with in LedgerSMB in the future.

The PostgreSQL Type System

PostgreSQL types can be divided into three specific types:  Primitives (those supplied by PostgreSQL out of the box or by extensions), domains, and object types.  The object types include composite types and table types.  Although typically these are lumped together, we will cover table types and composite types separately because the capabilities are far more developed when tables are created.

Base Primitive Types

Base types are intended to be the basic building blocks of relations.  They include such types as int4 and text.  These cannot have  methods attached but they can be arguments in functions provided that the procedural language understands the type.  Most of these will be passed in using the textual representation if all else fails.

Polymorphic Primitive Types

Polymorphic primitives are primitives which can take the form of other base types.  The primary ones are anyelement (which is any simple base type) and anyarray which is an array filled with any simple base type.  These are mostly used in functions so that the function can apply to array operations generally.  I have not encountered them elsewhere.

Pseudotype Primitives

Pseudotypes include triggers and a few other things.  They are mostly used as keywords, not as actual types.

SQL-Created Domain Types

Domains are the first significant type of SQL-created or user-defined type.  An SQL domain is a base type, valid for a subset of the base values.  Domains can be set to be enforced not null, and this applies only on storage, but unlike base types, this is checked as a property of a composite type as well.

SQL-Created Object Composite Types

PostgreSQL allows you to create composite types which can be used as input and output types for functions, elements in object composite types, columns in a table, and so forth.  Composite types have no inheritance, and cannot inherit interfaces from eachother.  Composite types cannot have any recursive elements and this is checked deeply.  For example you cannot:

CREATE TYPE a AS (
   a int
);

CREATE TYPE b  AS (
   a a
);

 ALTER TYPE a ADD attribute b b;

An attempt to do so will result in an error:

 ERROR:  composite type a cannot be made a member of itself

SQL-Created Object Table Types 

Table types are similar to composite types except that inheritance is supported.  This means in practice that the behavior of tables is quite different than the behavior of composite types when it comes to methods, implicit casting, and the like.  Note that implicit casts are not inherited.

Type Conversion:  CREATE CAST

Values and objects can be converted between types using casts.  Casts can be explicit only, implicit on assignment, and implicit generally.  In some cases casts can be dangerous.

The overall syntax of CREATE CAST is discussed in the PostgreSQL manual.   In general almost all user created casts are likely to use a SQL function to create one to another.  In general explicit casts are best because this forces clarity in code and predictability at run-time.

For example see our previous country table:

or_examples=# select * from country;
 id |     name      | short_name
----+---------------+------------
  1 | France        | FR
  2 | Indonesia     | ID
  3 | Canada        | CA
  4 | United States | US
(4 rows)

Now suppose we create a constructor function:

CREATE FUNCTION country(int) RETURNS country
LANGUAGE SQL AS $$
SELECT * FROM country WHERE id = $1 $$;

We can show how this works:

or_examples=# select country(1);
    country   
---------------
 (1,France,FR)
(1 row)

or_examples=# select * from country(1);
 id |  name  | short_name
----+--------+------------
  1 | France | FR
(1 row)

Now we can:

CREATE CAST (int as country)
WITH FUNCTION country(int);

 And now we can cast an int to country:

or_examples=# select (1::country).name;
  name 
--------
 France
(1 row)

The main use here is that it allows you to pass an int anywhere into a function expecting a country and we can construct this at run-time.  Care of course must be used the same way as with dereferencing foreign keys if performance is to be maintained.  In theory you could also cast country to int, but in practice casting complex types to base types tends to cause problems than it solves.

Inheritance-Based Implicit Casts

Inheritance-based implicit casts are an exception to the dangers of implicit casting.  These are not base type to base type casts, and they essentially act as slices of a table.  If your table structures use inheritance in a sane way they will work well.

Defined Implicit Casts

We can generally define implicit casts between tuples.  This is best done sparingly because any implicit cast multiplies execution pathways.  Defined implicit casts are not inherited.

To create an implicit cast, we could do as follows:

DROP CAST (int as country);

CREATE CAST (int as country)
WITH FUNCTION country(int)
AS IMPLICIT;

The problem of course is that as soon as we do this we have the potential for ambiguity.

Suppose I have a hypothetical state table, and I have functions called capital which can take arguments of country or state and returns the name of the capital city.  If I make a query of "select capital(1);" then am I asking for a country or a city capital?  I have always avoided this situation because it can never end well (so much so that I don't actually know how PostgreSQL resolves the issue).

Keep implicit casts for cases where you really need them and avoid them in other cases.

Problems with Implicit Casts to Primitives

The problems with implicit casts are greatly magnified when casting to (and hence between) primitives.  We will discuss just one such problem here:  numbers to timestamps.

Currently PostgeSQL supports casting dates as timestamps which is probably correct.  Suppose we want to take a number and turn it into a timestamp.  Maybe we are converting UNIX epochs to timestamps regularly and want to do so without a cast.

create cast (double precision as timestamp with time zone) 
with function to_timestamp(double precision) 
as implicit;

And so we try this:

CREATE FUNCTION to_timestamp(int)
returns timestamptz
language sql immutable as
$$ select $1::double precision::timestamptz; $$;

And add an explicit cast:
create cast (int as timestamp with time zone) 
with function to_timestamp(int) 
as implicit;

Works and well enough.  However, now we have a bug factory.

or_examples=# select extract(year from '2011-01-01'::date);
 date_part
-----------
      2011
(1 row)

or_examples=# select extract(year from 2011-01-01);
 date_part
-----------
      1969
(1 row)

Why?  Because the second is an integer math expression and evaluates to the integer of 2009, or 2009 seconds after midnight, January 1, 1969.  This is probably not what we had in mind.  Similarly:

or_examples=# select age('2011-01-01'::date);
      age     
---------------
 1 year 8 mons
(1 row)

or_examples=# select age(2011-01-01);
           age           
--------------------------
 42 years 8 mons 07:26:31
(1 row)

It is important to note here that the computer is doing exactly what we told it to do.  The problem however is that we now have bugs which are non-obvious to someone who is not quite fluent on the system to start with.

Another historical case that used to bite people is implicit casts of tuples to text.  This lead to very counterintuitive behavior in some cases, such as the infamous tuple.name or tuple.text issues.   Basically tuple.name would return name(tuple::text) which would be a textual representation of the tuple truncated to 63 characters.  This was fixed by making the cast of tuple to text explicit in PostgreSQL 8.3.

In general, polymorphism is a very important feature of PostgreSQL but it is something which has many gotchas.  In general adding explicit casts won't get you into too much trouble but implicit casts are often asking for issues.

No comments:

Post a Comment