Perspectives on LedgerSMB: Object-Relational Algebra 3: The Series Join function

Friday, January 4, 2013

I am back from a break spent working hard on getting LedgerSMB 1.4 closer to release (beta 2 has been released and we are now moving towards beta 3). Anyway, this post finishes up the series on object-relational algebra.

The second important addition I would make to relational algebra to give it object-relational capabilities is what I call a "series join" operation. A series join only makes sense in object-relational approaches because the output can be minimal and yet certain functional dependencies on that output can be readily defined.

I use the capital sigma Σ to refer to a series join, acknowledging that the lower case sigma refers to the select operation and so there is a partial conflict.

A series join takes a theta condition much like any other join, but this theta condition operates on the input relation to the series join. The input set is joined to itself in the first iteration and then to the output set of the previous iteration in subsequent iterations. This is repeated until the output set no longer changes between iterations. In a finite data set, the mappings will also be finite. An optional subscript provides a starting set of values in the input relation and an optional superscript provides a maximum iteration depth.
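The iteration described above can be sketched in Python. This is only an illustration under my own assumptions, not a formal definition: the function name `series_join`, its parameters, and the sample `employees` data are all hypothetical, and rows are modelled as plain dicts.

```python
def series_join(rows, theta, key, start=None, max_depth=None):
    """Sketch of a series join: return the set of (o1, o2) key pairs.

    rows      -- the input relation, as a list of dicts
    theta     -- theta(a, b) is True when row b is linked from row a
    key       -- function returning a row's key
    start     -- optional set of starting keys (the subscript)
    max_depth -- optional maximum iteration depth (the superscript)
    """
    by_key = {key(r): r for r in rows}
    # First iteration: join the input relation to itself on theta.
    frontier = {
        (key(a), key(b))
        for a in rows
        if start is None or key(a) in start
        for b in rows
        if theta(a, b)
    }
    result = set()
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        result |= frontier
        depth += 1
        # Later iterations: join the previous output back to the input,
        # keeping o1 and replacing o2 with whatever o2 links to.
        # Pairs already seen are dropped, so the loop stops once the
        # output set no longer changes.
        frontier = {
            (o1, key(b))
            for (o1, o2) in frontier
            for b in rows
            if theta(by_key[o2], b)
        } - result
    return result


# Hypothetical sample data: each row records who the employee reports to.
employees = [
    {"id": 1, "manager": None},
    {"id": 2, "manager": 1},
    {"id": 3, "manager": 1},
    {"id": 4, "manager": 2},
    {"id": 5, "manager": 4},
    {"id": 6, "manager": 5},
]


def reports_to(a, b):
    """Theta condition: b is a direct report of a."""
    return b["manager"] == a["id"]


def emp_key(row):
    return row["id"]
```

Calling `series_join(employees, reports_to, emp_key, start={1}, max_depth=3)` would stop three levels below employee 1, while leaving out both optional arguments yields every pair that can be reached at any depth.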

The output is a set of tuples of (o1, o2) where o1 is the initial object's key or reference and o2 is the linked-to object's key or reference. From here functional dependencies arise. Two obvious ones are path, which traces how we get from one record to another, and depth, which counts how many iterations it takes to reach the destination record.
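To make path and depth concrete, here is a minimal Python sketch. The names `trace_paths`, `path`, and `depth` and the `links` dictionary are my own illustrative assumptions. Where more than one route exists between two records, this sketch arbitrarily keeps the first (shortest) one found, which is a simplification.

```python
from collections import deque


def trace_paths(links, start):
    """Map each (start, o2) pair to the tuple of keys leading from start to o2.

    links -- dict mapping a key to the keys it is directly linked to
    start -- the key of the initial object (o1)
    """
    paths = {}
    queue = deque([(start,)])
    while queue:
        current = queue.popleft()
        for nxt in links.get(current[-1], ()):
            # Record only the first route found to each destination;
            # this also stops the walk when a path loops back on itself.
            if (start, nxt) not in paths:
                paths[(start, nxt)] = current + (nxt,)
                queue.append(current + (nxt,))
    return paths


def path(paths, pair):
    """How we get from o1 to o2."""
    return paths[pair]


def depth(paths, pair):
    """How many iterations it takes to reach o2 from o1."""
    return len(paths[pair]) - 1


# Hypothetical links: 1 -> 2 and 3, 2 -> 4, 4 -> 5.
links = {1: [2, 3], 2: [4], 4: [5]}
paths = trace_paths(links, 1)
```

With this data, `depth(paths, (1, 5))` is 3 and `path(paths, (1, 5))` is `(1, 2, 4, 5)`.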

A series join provides a useful way to express transitive operations. This allows, for example, binary transitive closure to be expressed and tested because one can generate all possible paths from one node on a graph up until the point where they loop back on themselves.
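As a rough illustration of the transitive closure case, the sketch below works directly on a binary relation held as a set of pairs. The function names are hypothetical, and the join-then-project loop is just one straightforward way to compute a closure, not a claim about how a series join must be implemented.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as (a, b) pairs."""
    pairs = set(pairs)
    closure = set(pairs)
    while True:
        # Join the closure to the original pairs on the shared middle
        # value, then project away that middle value.
        new = {(a, d) for (a, b) in closure for (c, d) in pairs if b == c}
        if new <= closure:
            # Nothing new was produced, so the output has stopped changing.
            return closure
        closure |= new


def is_transitively_closed(pairs):
    """Test whether a relation already contains its own transitive closure."""
    return transitive_closure(pairs) == set(pairs)
```

Because pairs that are already present add nothing new, the loop also terminates on cyclic graphs: a cycle between two nodes simply yields each node paired with itself and with the other.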

Series join operations in the SQL world are roughly approximated by the CONNECT BY and WITH RECURSIVE constructs, both of which are typically used to construct a finite series of self-joins. However, there are key differences too. In my model we are less worried about the tuple membership than about what we can clearly derive from a series join.
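For comparison, here is a small sketch of the WITH RECURSIVE approximation, run through Python's built-in sqlite3 module so it is self-contained. The employee table, its columns, and the sample rows are assumptions made up for illustration. Note that here depth and path have to be carried along as extra columns in every tuple rather than derived from the result afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager INTEGER)"
)
# Hypothetical sample data: manager holds the id of the employee's manager.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (1, "CEO", None),
        (2, "VP Sales", 1),
        (3, "VP Ops", 1),
        (4, "Sales Manager", 2),
        (5, "Sales Rep", 4),
        (6, "Sales Intern", 5),
    ],
)

QUERY = """
WITH RECURSIVE reports(root, id, depth, path) AS (
    -- Starting set: plays the role of the subscript.
    SELECT id, id, 0, CAST(id AS TEXT) FROM employee WHERE id = :start
    UNION ALL
    -- Each step joins the previous output back to employee.
    SELECT r.root, e.id, r.depth + 1, r.path || '>' || e.id
    FROM reports AS r
    JOIN employee AS e ON e.manager = r.id
    -- Depth limit: plays the role of the superscript.
    WHERE r.depth < :max_depth
)
SELECT root, id, depth, path FROM reports WHERE depth > 0 ORDER BY depth, id
"""

rows = conn.execute(QUERY, {"start": 1, "max_depth": 3}).fetchall()
for root, emp_id, depth, path in rows:
    print(root, emp_id, depth, path)
```

This sample data is a strict tree. With UNION ALL and no depth limit, a cycle in the manager column would make the query recurse forever, so real data would need a guard.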

Please pardon my layout since I am not quite sure how to make mathematical equations display perfectly in blogger.

Suppose we have a relation employee.

We might have reports = employee Σ₁³ with a theta condition of id θ manager and this would provide a list of the direct reports to this employee, their direct reports, and their direct reports. We can also express certain functional dependencies, such as depth(reports), which is between 0 and 3, and path(reports) which will show the chain of command between the report and the employee. If reports = employee Σ₁ and employee 1 is the CEO, then we get the entire organizational chart of the business.

Series joins allow us to do graph traversal, hierarchical searches, and more in an OR database, and approximations of this have been implemented in the relational model. They are mathematically clear and clean, avoid magic operations, and solve a great number of problems.

4 comments:

Darren Duncan, January 7, 2013 at 12:30 AM
What you talk about makes me think of a generic "transitive closure" operation that Date and Darwen talk about in their literature, which takes a binary relation representing nodes of a graph as input and results in another binary relation representing arcs of that graph. This operator is defined in terms of join plus rename plus union plus projection. See also my Perl module Set::Relation which implements this, calling it "tclose". Is your proposed operator the same or different than the "tclose" implemented at http://cpansearch.perl.org/src/DUNCAND/Set-Relation-0.12.7/lib/Set/Relation/V2.pm ?