Thursday, August 18, 2016

PostgreSQL vs Hadoop

One of the clients I work with is moving a large database from PostgreSQL to Hadoop.  The reasons are sound: volume and velocity are major issues for them, PostgreSQL is not going away in their data center, and in their industry there is a lot more Hadoop usage and tooling than there is PostgreSQL tooling for life science analytics (Hadoop is likely to replace both PostgreSQL and, hopefully, a massive amount of data on NFS).  However, this has provided an opportunity to think about big data problems and solutions and their implications.  At the same time, I have seen as many people moving from Hadoop to PostgreSQL as the other way around.  And no, LedgerSMB is unlikely ever to use Hadoop as a backend.  It is definitely not the right solution to any of our problems.

Big data problems tend to fall into three categories: managing an ever-increasing volume of data, managing an increasing velocity of data, and dealing with a greater variety of data structure.  It's worth noting that these are categories of problems, not specific problems themselves, and the problems within the categories are sufficiently varied that no one solution fits everyone.  Moreover, these solutions are hardly without their own significant costs.  All too often I have seen programs like Hadoop pushed as a general solution without attention to these costs, and the result is usually something that is overly complex, hard to maintain, possibly slow, and doesn't work very well.

So the first point worth noting is that big data solutions are specialist solutions, while relational database solutions for OLTP and analytics are generalist solutions.  Usually those who are smart start with the generalist solutions and move to the specialist solutions only later, unless they know from the outset that the specialist solutions address a specific problem they know they have.  No, Hadoop does not make a great general ETL platform.

One of the key things to note is that Hadoop is built to solve all three problems simultaneously.  This means that you effectively buy into a lot of other costs if you are trying to solve only one of the V problems with it.

The single largest cost comes from the solutions to the variety of data issues.  PostgreSQL and other relational data solutions provide very good guarantees on the data because they enforce a lack of variety.  You force a schema on write and if that is violated, you throw an error.  Hadoop enforces a schema on read, and so you can store data and then try to read it, and get a lot of null answers back because the data didn't fit your expectations.  Ouch.  But that's very helpful when trying to make sense of a lot of non-structured data.
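To make the schema-on-write versus schema-on-read difference concrete, here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for a relational database such as PostgreSQL, and plain JSON lines as a stand-in for files stored in Hadoop. The table name `sample`, the field names, and the helper `read_with_schema` are all hypothetical and chosen only for illustration.

```python
import json
import sqlite3

# Schema on write: the database rejects a row that violates the declared schema.
# sqlite3 is only a stand-in here for a relational database like PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, reading REAL NOT NULL)")
conn.execute("INSERT INTO sample (id, reading) VALUES (1, 0.42)")

try:
    conn.execute("INSERT INTO sample (id, reading) VALUES (2, NULL)")
    write_rejected = False
except sqlite3.IntegrityError:
    # The NOT NULL constraint fails at write time, so bad data never lands.
    write_rejected = True

# Schema on read: anything can be stored; the schema is applied only when reading.
raw_lines = [
    '{"id": 1, "reading": 0.42}',
    '{"id": 2, "value": 0.57}',  # field has a different name than expected
    '{"id": 3}',                 # field is missing entirely
]


def read_with_schema(line, fields=("id", "reading")):
    """Project a stored record onto the expected fields; missing ones become None."""
    record = json.loads(line)
    return {name: record.get(name) for name in fields}


rows = [read_with_schema(line) for line in raw_lines]
null_readings = sum(1 for row in rows if row["reading"] is None)

print("write rejected:", write_rejected)
print("null readings on read:", null_readings)
```

The write path fails loudly on the first bad row, while the read path accepts everything and quietly hands back nulls for records that did not fit the expected shape.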

Now, if you are faced with volume and velocity problems, the solutions to check out first include Postgres-XL and similar sharding/clustering solutions, but these really require good data partitioning criteria.  If your data set is highly interrelated, they may not be a good fit because cross-node joins are expensive.  You also wouldn't use them for smallish datasets, certainly not those under a terabyte, since the complexity cost of these solutions is not lightly undertaken either.
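The point about partitioning criteria can be sketched in a few lines. This is an illustration only: the four-node cluster, the `crc32`-based placement function `node_for`, and the `orders` rows are all assumptions, and this is not how Postgres-XL itself chooses a node. The idea is that a join on the shard key stays local to one node, while a join on any other key may have to reach across nodes.

```python
import zlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str) -> int:
    """Map a key to a node with a stable hash (illustrative placement only)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


# Hypothetical rows. Assume orders and customers are both sharded on customer_id,
# while products are sharded on product_id.
orders = [
    {"order_id": "o1", "customer_id": "c1", "product_id": "p9"},
    {"order_id": "o2", "customer_id": "c2", "product_id": "p3"},
    {"order_id": "o3", "customer_id": "c1", "product_id": "p7"},
    {"order_id": "o4", "customer_id": "c5", "product_id": "p9"},
]


def cross_node_lookups(rows, shard_key, join_key):
    """Count rows whose join partner lives on a different node than the row."""
    return sum(
        1 for row in rows if node_for(row[shard_key]) != node_for(row[join_key])
    )


# Joining orders to customers on the shard key: every lookup is node-local.
local_join = cross_node_lookups(orders, "customer_id", "customer_id")

# Joining orders to products on a different key: some lookups may cross nodes.
remote_join = cross_node_lookups(orders, "customer_id", "product_id")

print("cross-node lookups, join on shard key:", local_join)
print("cross-node lookups, join on other key:", remote_join)
```

The more interrelated the data, the fewer joins can be lined up with a single shard key, which is where the cross-node cost comes from.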

Premature optimization is the root of all evil, and big data solutions have their place.  However, don't use them just because they are cool, new, or resume-building.  They are specialist tools, and overuse creates more problems than underuse.


  1. PostgreSQL 10 roadmap from 2ndQuadrant on columnar indexes may make PostgreSQL suitable for big data:

    1. Certainly for some kinds of big data. And anything that helps certain areas do better is a major win. Having index-oriented tables would also help with certain volume-related issues (and there has been talk about that for some time).

    2. But one point worth noting is that "big data" is a bit of a buzzword and a poorly defined topic. Columnar stores help certain kinds of aggregation. Index-oriented tables help one tune certain kinds of access patterns. Those are important tools in dealing with data volume problems.

      But one of the things that has been really difficult in the deployment I work with that is moving to Hadoop is that TOAST performance overhead is rather difficult to measure, and in order to get adequate performance, certain things have had to move to non-1NF designs. Even when you are dealing with structured data, there are access pattern corners where one has to be rather imaginative to keep performance up.

      So it is a mistake to think that all volume problems are the same, or all velocity problems are the same. Once you get into terabytes of data, attention to detail and awareness of your specific issues become important.

      Today PostgreSQL can deal well with certain kinds of big data problems. And those are expanding. But we should be careful not to be like the folks who think Hadoop is the answer to everything ;-)

  2. The CitusDB extension is an interesting alternative to Postgres-XL, and it's also not a fork. You're correct that if your data doesn't shard well you're still in trouble. Though doesn't the same apply to Hadoop? Or does it just use a hash of the entire row/document to partition?
