Tuesday, October 11, 2011

Introduction to SODA

LedgerSMB 1.3 provides what is a prototype for what I hope to develop in the future as a Service Oriented Database Architecture, or SODA.    This is an early version, venturing largely into uncharted territory doing things that I am unaware of them being done elsewhere.  This is one way we are pushing the envelope, and one way we will likely continue to do so.

As a note, LedgerSMB is closely tied to PostgreSQL, and as time goes on is likely to become essentially a PostgreSQL database and some sample applications written in Perl, with reference implementations in Python and PHP for further development.

What is SODA

SODA is an approach to database design which supports the idea that the database should be sufficiently semantically rich to provide code generators everything (in the model side of an MVC framework) they need to interoperate with the database.   Objects and classes are, to the extent feasible, defined in the database and exported to applications, rather than defined in applications and mapped to relations using ORMs.  In a perfect database implementing SODA, it would be possible to run a code generator in a language of your choice, and have it define classes and methods based on database types and stored procedures, which could then be used by your application quickly.

The key to SODA is to use what tools the database provides to make interfaces discoverable.  In PostgreSQL these involve such things as return types and argument names.  Examples will follow.

Basic Principles of SODA
  1. Interfaces must be discoverable, meaning stored procedures, data types, etc. can all be discovered at run time and automatically used.
  2. The application should be loosely tied to the database, and other applications should be able to to use the database easily, in any other language.
  3. All stored procedures should always return useful data.  A stored procedure which inserts, for example, might return the primary key of the row inserted or even the whole row (allowing, for example, for defaults to be filled in).  Returning VOID or NULL, or even TRUE, is generally frowned on and to be avoided.
  4. Security is enforced primarily on the database level, with applications presumed to know what operations their users are permitted to do.  If the application fails to enforce security, the database will, and the user may get an error.
  5. Exceptions in functions should be short and machine-readable.  "No password" is fine but "You forgot to enter a password.  Please try again" is not.  The application should be responsible for translating this into a useful message to the user.  This is in part because translation into other languages in the db is problematic using stock procedural languages, and also  because applications may wish to handle exceptions themselves.

Role of RDBMS in SODA-based Applications

SODA envisions the RDBMS not as the lower tier(s) in a multi-tier application infrastructure but as the centerpiece of both the application and its development environment.  SODA doesn't necessarily envision everything being in the database, far from it.  Nor does SODA necessarily envision that the database takes the place of middleware, but rather the database takes on many traditional middleware roles.

Instead, SODA sees the database in the center of the environment, and assumes that many clients will in fact be connecting to the database for a variety of uses, and that the RDBMS becomes a point of collaboration between applications (queue tables and LISTEN/NOTIFY come in handy here), and hence the point where security must be enforced.  Security can be enforced on the procedural or relational level.

Instead of "everything in the database" then, we have the idea that everything which is required to read and write data in a useful manner should be in the database, along with whatever is required to coordinate between applications.  The applications can then re-use business logic in a consistent way.  Logic which is not required for these two tasks does not belong in the database, and even some coordination frameworks may better reside elsewhere.  At the same time, SODA is very stored-procedure centric.

Examples
The following is a stored procedure which provides an ability to search for fixed asset classes (groups of fixed assets depreciated together with the same method):

CREATE OR REPLACE FUNCTION asset_class__search
(in_asset_account_id int, in_dep_account_id int,
in_method int, in_label text)
RETURNS SETOF asset_class_result AS
$$
DECLARE out_var asset_class_result;
BEGIN
        FOR out_var IN
                SELECT ac.id, ac.asset_account_id, aa.accno, aa.description,
                        ad.accno, ad.description, m.method, ac.method,
                        ac.label
                FROM asset_class ac
                JOIN account aa ON (aa.id = ac.asset_account_id)
                JOIN account ad ON (ad.id = ac.dep_account_id)
                JOIN asset_dep_method m ON (ac.method = m.id)
                WHERE
                        (in_asset_account_id is null
                                or in_asset_account_id = ac.asset_account_id)
                        AND (in_dep_account_id is null OR
                                in_dep_account_id = ac.dep_account_id)
                        AND (in_method is null OR in_method = ac.method)
                        AND (in_label IS NULL OR ac.label LIKE
                                '%' || in_label || '%')
               ORDER BY label
        LOOP
                RETURN NEXT out_var;
        END LOOP;
END;
$$ language plpgsql;

COMMENT ON FUNCTION asset_class__search
(in_asset_account_id int, in_dep_account_id int,
in_method int, in_label text) IS
$$ Returns a list of matching asset classes.  The account id's are exact matches
as is the method, but the label is a partial match.  NULL's match all.$$;

Now, in the above example the key is in the header to the function:

CREATE OR REPLACE FUNCTION asset_class__search
(in_asset_account_id int, in_dep_account_id int,
in_method int, in_label text)

From the name of the function, we know that this is for asset classes.  But thats not all we know.  We know exactly what information the stored procedure expects.  The variable names, prefixed with in_ to avoid collision with column names, tell the query generator what information to pull from the HTTP request object to build the search.  In this case, it will pull in the asset_account_id, dep_account_id, method, and label properties and map those into the stored procedure query.

Benefits of SODA

Thus far LedgerSMB has realized a number of benefits by applying these principles to our database design.  The first and most obvious is reduced development time.  With SODA, we define our database methods, then write Perl wrappers around the database procedures.  Then we can quickly write HTML templates and workflow scripts.  The heavy engineering is where it belongs:  In the mapper functions and the database.  The rest an be surprisingly agile.

A second important benefit is the fact that we have been able to retrofit real security into a codebase which did not admit to it when we began our fork.  By pushing security back into the database, we have been able to leave untrustworthy code untrusted.

Finally, we have found that it is very often far easier to tune hand-written SQL in stored procedures than it is to tune queries written through, for example, ORMs, allowing us to be able to address scalability problems when they arise.

4 comments:

  1. interesting, thanks for sharing this info.

    I've seen some projects implementing (a subset of) same principles, but never written dow in such way.

    Do you use some code generators (like, exporting DB API access methods to application code)?

    ReplyDelete
  2. Filiprem:

    What we currently do is use a mapper function. An object in Perl calls $search_object->exec_method({funcname => 'asset_class__search'});

    Then the mapper grabs the argument names from the pg_catalog.pg_proc and maps in the arguments. An optional args => [arg1, arg2] can be used for enumerated arguments.

    This works for the most part but it isn't perfect yet and we aren't to the point where anything more than a query generator can work. We are working on this though. In a few days, I will be posting a critical review of where we are now, and where we are going in future versions. I would be interested in your feedback there.

    But here also, if you have been working on a subset of the same principles, I'd be interested in hearing more about your experience. Care to share?

    ReplyDelete
  3. SODA is a suboptimal abbreviation for "Service Oriented Database Architecture" since in db technology there's already S.O.D.A. from db4o which means "Simple Object Database Access".

    I would propose to name yours something else, like SODBA?

    ReplyDelete
  4. I've been using this approach for some years and must agree the benfits are overwhelming ...
    * the one you have mentioned: it is language independent .. writting interface in any language is easy as ABC, just call SELECT * FROM procedure() ...
    * second: you can easily trace which procedure is slow, and tune it ... any client, in any language will see it immediately, with no single line of code changed
    * third (most important): Such approach is open for changes (which occurs ALWAYS, and mostly before you finish your project :)) You can completely redesign table relations, leaving the interface (that is, input/output of the api procedures) untouched ... again: any client, in any language will see it immediately, with no single line of code changed ..

    There is a simple SQL to discover DB api written in stored procedures:
    WITH tmp AS (
    SELECT n.nspname AS namespace,
    p.proname AS fname
    FROM pg_proc p,
    pg_namespace n
    WHERE p.pronamespace = n.oid AND
    n.nspname = 'api'
    )
    SELECT t.fname::text AS "name",
    pg_get_function_arguments(p.oid) AS args,
    pg_get_function_result(p.oid) AS "returns",
    substring(p.prosrc from position('/*' in p.prosrc)
    for position('*/' in p.prosrc)) AS description
    FROM pg_proc p,
    pg_namespace s,
    tmp t
    WHERE p.pronamespace = s.oid AND
    s.nspname = t.namespace AND
    p.proname = t.fname
    ORDER BY 1,2;

    The sumptions for this SQL are:
    * your API procedures are stored in "api" namespace (schema)
    * each procedure has top level comment describing the function (within /* ... */ )

    ReplyDelete