Saturday, August 2, 2014

Math and SQL, part 1: Introduction: Relations and Functions

There are two goals I have in this series.  The first is to heavily ground database thinking in mathematics thinking.  The second is to connect this more explicitly to functional programming which attempts to do the same more generally.  The basic foundation of functional programming and relational theory is the same.  While this may seem very rudamentary for Planet Postgresql, I am hoping that the information here will be useful in theoretical courses, etc.

This series will also discuss uses of the same ideas in Perl.  Many of the same ideas are extremely important for developers in many languages.  Consequently these will all be tagged both PostgreSQL and Perl and most will include perl code as well as PostgreSQL code.

I:  SQL as Relational Math Implementation


In basic terms, SQL provides an interface for relational database systems.  The most powerful features of it are essentially relational algebra-based.  In order to better handle concurrent connections, there is a lot missing from relational math, and in order to handle real-world data, there are features in relational algebra not present in relational math.

However for purposes of this series, I will focus on PostgreSQL's SQL interpretation and I will use non-standard syntax features when that makes the math clearer.  I like PostgreSQL in this regard because it means that the math is fairly clear.

Code samples explaining ways to accomplish various operations will be provided in Perl.  Note that these are mostly fairly naive versions.  If one were really creating an RDBMS in Perl, the code would probably be very different.

II:  Relational Math, Relations and Functions


At the core of both databases and functional programming are the concepts of relations and functions.    At their foundation, these are fairly old concepts.  Since a function is a subset of relation, let's start by defining and understanding what a relation is:

A relation is a set of correlated facts.

For example, let's look at two relations, one showing square roots and one showing squares:


  • For square roots:
    • 1: 1
    • 1: -1
    • 4: 2
    • 4: -2
    • 9: 3
    • 9: -3
  • For squares
    • -3: 9
    • -2: 4
    • -1: 1
    • 0: 0
    • 1: 1
    • 2: 4
    • 3: 9
Both of these are relations. They provide a list of facts which correlate.  A square root of 1 is 1.  Another square root is -1.  -1 squared is 1, and so forth.

In programming terms, one may thing of a relation as being the possible output of a subroutine.  In relational databases, however, a relation is a set of correlating facts.  Typically this is implemented as a "table."

On to functions:

A function is a relation where the fact of the relation are fully determined by the domain of the function.  Therefore if you know the fact from the domain, there is only one related fact.

This definition is key to understanding both functional programming and relational databases.  It underlies almost all aspects of the both disciplines.

The following are mathematical functions:

  1. f(x) = x + 1
  2. f(x) = 2 * x
  3. f(x) = x2
These are functions because for any given x there is exactly one f(x).

So things like random() are not functions but rather very large relations.  Similarly now() is not a function because it has no input.  Programming in a way that emphasizes functions generally leads to more easily testable code, fewer bugs, and fewer concurrency problems than programming in a way which emphasizes state.

Now, in database terms, this all becomes important because of the fact that a function is a subset of relation.  Therefore we can speak of a functional dependency within a relation.   Put simply, a candidate key is a subset of a relation which, as a domain, turns the relation into a function.  All facts in a relation are a function of each candidate key.

So let's look at one example for how this works in Perl.  We will use a hash table, which is basically a key/value store.  A hash table is an unordered relation in Perl, such that the value is a function of the key.  In this example, we will mark this up to be a little more transparent mathematically, though in production.

package KeyValueStore;

my %store;

sub set {
    my ($key, $value) = @_;
    $store{$key} = $value;
}

sub remove {
    my $key = shift;
    delete $store{$key};
}

sub get {
    my $key = shift;
    $store{$key};
}

package main;
use KeyValueStore;
KeyValueStore::set('foo', 'bar');
KeyValueStore::set('1', '2');

print KeyValueStore::get('foo');
1;

Now, in the above code, set and remove define the relation, and get returns a function of the key, based on the relation at any given time.  Functional programming would not see this as a function because of state and concurrency issues, but from a basic math perspective, given a relation (%store in this case), this acts as a function.  There are of course ways to make the above work within the limits of functional programming.  These require turning set and remove into mathematical functions and changing the way we store the information.  In a later piece, when I talk about MVCC, I will talk about how to do this.

What is important here is that for a given relation, a fact is a function of a key.  This is is then usually said in a slightly different way, namely that a fact is functionally dependent on a key.  Functional dependency is the normal label because people frequently use the term "function" in databases in a non-mathematical sense.

Next week:  Functions and First Normal Form

1 comment:

  1. If you want to demonstrate relational stuff in Perl, give my Set::Relation CPAN module a try.

    ReplyDelete