Wednesday, February 8, 2012

Further Response to Robert Young

Robert Young has responded to my previous post and clarified his position.  However, he is still incorrect in his assessment.

  The issue is lost cycles as one moves down to a single thread. It's more about the number of clients that can be supported "simultaneously". At one time I was also enamoured of thread count to execute queries in parallel; not so much now. For reporting generally, and BI/DW specifically, sure. But for OLTP, where my interest lies, not so much. There, client count is what matters. Clients run relatively discrete transactions on few rows; parallelism isn't much help for that sort of client. Yes, to some extent I've changed my mind.
I think this makes the mistake of assuming that single threaded multiple process models cannot utilize hyperthreading architectures to increase computational efficiency.

The first thing to note is that how a CPU's cycles are utilized is primarily the job of the scheduler, which is part of any multitasking kernel, and different operating systems use very different process and thread models.  These cannot be merely reduced to their hardware equivalents.  If they could, we would see process vs. thread performance be largely equivalent on Windows and Linux, but in fact Windows is very thread-friendly and process-unfriendly compared to Linux, though my understanding is that applications on Windows running under SUA experience better multiprocess performance than those under the Win32 subsystem.

On Linux and UNIX there is very little difference between a pthread and a process except for memory isolation.  Individual processes can be scheduled on logical cores supporting hyperthreading, for example.  The overhead for supporting multiple processes is low.
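
To put a rough number on that, here is a minimal sketch (assuming a Linux machine with Python; the absolute numbers will vary by kernel and hardware, but process creation typically lands in the same ballpark as thread creation):

    import os
    import threading
    import time

    N = 500  # number of short-lived workers to spawn in each test

    def trivial():
        pass

    # Spawn and reap N processes via fork().
    start = time.time()
    for _ in range(N):
        pid = os.fork()
        if pid == 0:           # child: exit immediately
            os._exit(0)
        os.waitpid(pid, 0)     # parent: reap the child
    fork_time = time.time() - start

    # Spawn and join N threads for comparison.
    start = time.time()
    for _ in range(N):
        t = threading.Thread(target=trivial)
        t.start()
        t.join()
    thread_time = time.time() - start

    print("%d processes: %.3fs, %d threads: %.3fs" % (N, fork_time, N, thread_time))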

On the other hand, on Windows, threads are cheap and processes are expensive.  This is one reason why I typically recommend that high-concurrency PostgreSQL instances not be run on Windows.  My understanding is that SUA addresses this issue, but I can't find solid comparisons at the moment and I don't know of a PostgreSQL port to that platform.

Think of the issue this way: if it were true that scaling down threads to one per processor/core yielded a scale up in performance equivalently (only one of two threads yields twice the performance on that last thread), then the question gets more difficult. But threads and cores don't scale that way. Such reverse scaling could only happen if the core's frequency increased as the inverse of threads active. That just doesn't happen, you get a bin or two; that nasty frequency brick wall is the reason vendors have turned to multi-core and threads in the first place.
I am not sure I buy that.  Linux, for example, is quite happy to schedule different processes on different logical cores, meaning that two processes can share a hyperthreaded core in the same way two threads can.  Again, that behavior is specific to the combination of CPU, OS, and scheduler.
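
As a minimal sketch of that behavior (assuming a Linux box; whether logical CPUs 0 and 1 are actually hyperthread siblings depends on the machine, so check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list first), two ordinary processes can be pinned to the two logical cores of one physical core just as two threads could be:

    import os

    SIBLINGS = (0, 1)  # assumed HT sibling logical CPUs; verify for your machine

    def busy_work():
        # Burn some CPU so the scheduler has something to place on the core.
        total = 0
        for i in range(10 ** 7):
            total += i
        return total

    children = []
    for cpu in SIBLINGS:
        pid = os.fork()
        if pid == 0:
            os.sched_setaffinity(0, {cpu})  # pin this process to one logical core
            busy_work()
            os._exit(0)
        children.append(pid)

    for pid in children:
        os.waitpid(pid, 0)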

Unless you've been building database applications from before 1995 (or thereabouts), which is to say only webbie thingees, the efficiency and user friendliness of *nix/RDBMS/RS-232/terminal is a foreign concept; just as it still is for mainframe COBOL coders writing to 3270 terminals, same semantics (I'll grant that javascript is more capable than 3270 edit language, and Wikipedia does it again, with this write up). In the web age, AJAX was the first widely known (and possibly absolute first) attempt to get back to that future, then there is Comet. Now, we have WebSocket, discussed in earlier posts.

I don't see what this has to do with anything.  I personally find the user friendliness of an RDBMS and *nix to be very important.  I also primarily build accounting apps, and I will tell you that doing so over the web is often not very performance-friendly, for reasons that I am more than happy to discuss in another post (it's one reason why, in the long run, I want to move away from requiring that everything go through a web interface).  But basically, transactional handling is somewhat crippled when tied to a stateless network protocol like HTTP, meaning all transactions must be atomic within a single HTTP request, and where we would normally not do things that way, we have to invent ways to get around that limitation.
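
To make that limitation concrete, here is a minimal sketch (hypothetical handler, table names, and DSN; psycopg2 assumed): whatever multi-step workflow the user is in the middle of, the database transaction has to begin and commit within a single request handler, because neither the connection nor its transaction state survives to the next request.

    import psycopg2

    def post_payment(request):
        # Hypothetical request handler: all transactional work must complete here.
        conn = psycopg2.connect("dbname=ledger")  # assumed DSN
        try:
            with conn:  # commits on success, rolls back on exception
                with conn.cursor() as cur:
                    cur.execute(
                        "UPDATE invoice SET paid = true WHERE id = %s",
                        (request["invoice_id"],),
                    )
                    cur.execute(
                        "INSERT INTO payment (invoice_id, amount) VALUES (%s, %s)",
                        (request["invoice_id"], request["amount"]),
                    )
        finally:
            conn.close()
        # By the time the HTTP response goes out, the transaction is over; any
        # longer workflow has to be re-modelled as explicit state in the database.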

The Achilles heel of those *nix/RDBMS/RS-232/VT-X00 applications was the limit on the number of clients that could be accommodated since the client patch remained while the user kept a session. The emerging hardware/infrastructure that I'm discussing may/should remove that restriction. This also means that developers must heed the mantra, "the user doesn't determine transaction scope". With HTTP (un)connectedness, that isn't much of an issue.

This is the wrong way to look at it.  HTTP unconnectedness actually in many cases makes the problem worse rather than better if you have to preserve db state across HTTP requests (which LedgerSMB, by the way, does in some cases).  It's one thing to have 15 connections updating payments in a db with 10 million transactions and relying on advisory locks.  It's a very different thing to have 15 users hitting a db with 10 million transactions and having to store in the db which transactions they are temporarily holding for payment.  Now suddenly you go from 3-second queries to 45-second queries, and I would expect that we could support at least 10x the number of concurrent users in a large db if it weren't for this unconnectedness.  Now you are dealing with a large amount of I/O contention and row locking (because we have to make sure we get the right information so invoices aren't double-paid), and while it works, it isn't pretty, nor does it perform particularly well.
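
A rough sketch of the difference (hypothetical table name and lock key; pg_try_advisory_lock is a stock PostgreSQL function, and psycopg2 is assumed): with a persistent connection the hold can live in the session itself, while over stateless HTTP it has to be written into the database and reconciled later.

    import psycopg2

    invoice_id, username = 1234, "user_57"    # example values
    conn = psycopg2.connect("dbname=ledger")  # assumed DSN
    cur = conn.cursor()

    # Thick client / persistent connection: the hold is a session-scoped
    # advisory lock, and it simply vanishes if the client disconnects.
    cur.execute("SELECT pg_try_advisory_lock(42, %s)", (invoice_id,))

    # Stateless HTTP: the same hold has to be persisted across requests,
    # which means row locking, extra I/O, and cleanup of stale holds
    # (hypothetical payment_hold table).
    cur.execute(
        "INSERT INTO payment_hold (invoice_id, held_by, held_at) "
        "VALUES (%s, %s, now())",
        (invoice_id, username),
    )
    conn.commit()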

This being said, the DB is usually even less of an issue in these cases than the web server.....

Moreover, web apps, particularly those that run public sites, can experience traffic spikes that easily exceed the number of connections a db can be expected to handle gracefully.  So in both cases, simple public-facing apps like blogs and complex web-based business apps alike, RDBMS-backed web apps are worse performance-wise than thick clients.

But the overall point remains the same:  scheduling of processes and threads is the OS's job, not the CPU's.  Wasted cycles in scheduling are the OS's problem, not the CPU's.  If an OS can't schedule processes as efficiently as it schedules threads, that's the OS's constraint, and it is only the app's problem if you are writing with a specific OS in mind and that OS doesn't behave the way you would like it to.  Windows, in particular, is heavily oriented towards threads.  It isn't clear that most *nix platforms see the same benefit, as witnessed by Oracle's architecture, which assumes one process per connection.

Also, a note on DB2's connection multiplexing: it isn't clear that this actually requires multiple threads.  The fact is that all you can receive over a connection is a stream of data, which has to be parsed and handled.  It's not at all clear that this needs threads, when it is clear that, regardless of how you do it, it needs a queue.
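
As a sketch of why (plain Python selectors here, nothing DB2-specific, and the protocol handling is obviously stubbed out): a single process can multiplex many client connections with one event loop and a work queue, and no threads ever enter the picture.

    import collections
    import selectors
    import socket

    sel = selectors.DefaultSelector()
    work_queue = collections.deque()    # parsed requests waiting to be executed

    listener = socket.socket()
    listener.bind(("127.0.0.1", 5433))  # hypothetical port
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)

    while True:
        for key, _mask in sel.select(timeout=0.1):
            if key.data is None:                    # new client connecting
                conn, _addr = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data=b"client")
            else:                                   # bytes from an existing client
                chunk = key.fileobj.recv(4096)
                if not chunk:
                    sel.unregister(key.fileobj)
                    key.fileobj.close()
                    continue
                # All you ever receive is a stream of bytes; parse and queue it.
                work_queue.append((key.fileobj, chunk))

        while work_queue:
            client, request = work_queue.popleft()
            client.sendall(b"ok\n")  # stand-in for actually executing the request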

BTW, PostgreSQL parallel performance is getting a LOT of attention in the move from 9.1 to 9.2.

3 comments:

  1. -- I think this makes the mistake of assuming that single threaded multiple process models cannot utilize hyperthreading architectures to increase computational efficiency.

    My point, as I said, is that making use of threads within process (client session), intraquery parallelism etc., isn't the big win, outside of reporting/BI/DW. In a CRUD world, demand for computational efficiency in the client isn't much to speak of.


    -- But basically, transactional handling is somewhat crippled when tied to a stateless network protocol like HTTP

    Somewhat? We agree. The narrative's point is to make the case that the interTubes are evolving to virtual RS-232, and that RDBMSs which maximize clients, rather than maximize individual client power, will be the winners. Servers, database or otherwise, which finesse clients onto hardware threads (O/S mediation or not) will win due to their greater efficiency.


    -- This being said, the DB is usually even less of an issue in these cases than the web server.....

    And Opa, apparently, means to fix that. We'll see.

    It's all about leveraging the hardware resources to get the most bang for the buck. If, at some time, *nixes generally are able to map application processes to hardware threads, then the discussion is moot. So far as I know, that time is not now.


    -- HTTP unconnectedness actually in many cases makes the problem worse

    Fur sure, but... many coders rely on the constraint to ignore transaction semantics.

    Replies
    1. -- My point, as I said, is that making use of threads within process (client session), intraquery parallelism etc., isn't the big win, outside of reporting/BI/DW. In a CRUD world, demand for computational efficiency in the client isn't much to speak of.

      How many databases are solely CRUD-oriented? At least the ones I work on have transactional processing closely tied to reporting. There is no other way you can reconcile a bank account, properly depreciate assets, or the like. Even if all you want to do is create an invoice and see what a customer's remaining credit limit is, that's going to require some reporting.....

      --It's all about leveraging the hardware resources to get the most bang for the buck. If, at some time, *nixes generally are able to map application processes to hardware threads, then the discussion is moot. So far as I know, that time is not now.

      I am not sure. For starters, different OSes have very different process and thread models, as do different development environments. A pthread is different from a Perl thread, for example, and Perl threads actually have some level of resource isolation.

      I can't speak authoritatively about Windows in these areas, but on Linux, processes are very lightweight. A process, for example, can fork without having to copy its memory into a new address space. Copy-on-write memory management is extremely important here: it allows better memory utilization and also ensures that spinning up processes is virtually indistinguishable from spinning up threads.
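
      As a minimal sketch of what copy-on-write buys you (Linux and Python assumed; exact timings will vary), forking a process that holds a couple hundred megabytes is still nearly instantaneous, because no pages are copied until one side writes to them:

          import os
          import time

          big = bytearray(200 * 1024 * 1024)  # parent holds ~200 MB

          start = time.time()
          pid = os.fork()
          if pid == 0:
              # Child: it can see `big` immediately, but nothing was copied to
              # create it; pages are duplicated only when one side writes.
              os._exit(0)
          os.waitpid(pid, 0)
          print("fork with a large heap took %.4f seconds" % (time.time() - start))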

      The major performance advantage of threads on Linux is that inter-thread communication is faster than IPC. It's not clear to me how much of an issue that is, though, nor how it weighs against the resource contention management and brittleness issues that result.

      For example, suppose you take a MySQL InnoDB table with multiple indexes and start populating it with multi-row inserts. Let's start with 100 rows per insert (which seems reasonable for a data migration job). Every so often, you will get an insert deadlocking against itself. The only explanation I can find is that this is a timing issue involving the use of threads.
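
      A sketch of the kind of statement I mean (pymysql assumed, hypothetical table and columns; retrying on error 1213 is the workaround MySQL itself suggests for deadlocks):

          import pymysql

          conn = pymysql.connect(host="localhost", user="migrate",
                                 password="secret", database="ledger")
          rows = [(i, "desc %d" % i) for i in range(100)]   # one 100-row batch
          sql = ("INSERT INTO entries (id, description) VALUES " +
                 ", ".join(["(%s, %s)"] * len(rows)))
          params = [v for row in rows for v in row]

          with conn.cursor() as cur:
              for attempt in range(3):
                  try:
                      cur.execute(sql, params)
                      conn.commit()
                      break
                  except pymysql.MySQLError as e:
                      if e.args[0] != 1213:    # 1213 = ER_LOCK_DEADLOCK
                          raise
                      conn.rollback()          # deadlock: retry the batch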


    2. -- Fur sure, but... many coders rely on the constraint to ignore transaction semantics.


      No kidding. The problem is for those of us who write app code and try to do it right!
