Sunday, March 20, 2016

When PostgreSQL Doesn't Scale Well Enough

The largest database I have ever worked on will eventually, it looks like, be moved off PostgreSQL.  The reason is that PostgreSQL doesn't scale well enough.  I am writing here however because the limitations are so extreme that it ought to give plenty of ammunition for those who think databases don't scale.

The current database size is 10TB and doubling every year.  The main portions of the application have no natural partition criteria.  The largest table currently is 5TB and the fastest growing portion of the application.

10TB is quite manageable.  20TB will still be manageable.  By 40TB we will need a bigger server.  But in 5 years we will be at 320 TB and so the future does not look very good for staying with PostgreSQL.

I looked at Postgres-XL and that would be useful if we had good partitioning criteria but that is not the case here.

But how many cases are there like this?  Not too many.

EDIT:  It seems I was misunderstood.  This is not complaining that PostgreSQL doesn't scale well  It is about a case that is outside of all reasonable limits.

Part of the reason for writing this is that I hear people complain that the RDBMS model breaks down at 1TB which is hogwash.  We are facing problems as we look towards 100TB.  Additionally I think that PostgreSQL would handle 100TB fine in many other cases, but not in ours.  PostgreSQL at 10, 20 or 50TB is quite usable even in cases where big tables have no adequate partitioning limit (needed to avoid running out of page counters), and at 100TB in most other cases I would expect it to be a great database system.  But the sorts of problems we will hit by 100TB will be compounded by the exponential growth of the data (figure within 8 years we expect to be at 1.3PB).  So the only solution really is to move to a big data platform.

Sunday, February 21, 2016

A couple annoyances (and solutions) regarding partitioned tables

In one of my projects we had an issue where a large table that was under huge transactional load was having trouble with autovacuum not keeping up.  The problem was that the table sometimes held over half a billion records, added and deleted millions of records a day, and that since most of these occurred at the heads of various indexes, autovacuum was just not fast enough.

So we decided to partition the table into around 50 pieces in order to allow autovacuum to achieve a bit better parallelism in managing the data.  This helped to some extent.  But partitioning is a rare solution for rare problems and comes with unexpected costs.   Interestingly most of our problems have been ORM-related.  Here are some we ran into and their solutions (spoiler:  at the end of the day, effectively, we stopped using an ORM on these tables).  At the end of the day, throughput on these tables was increased around 10-fold, and db load cut by about 90%.

Annoyance 1:  Redirection and ORM transparency

The first problem we had was getting DBIx::Class to work with the partitioned table.  The solution was to add another view in between which did the redirection of inserts, updates, and deletes.  This also allowed us to go through the ORM for inserts (we still do) without the cross-locking issues below being a problem.

Annoyance 2:  Cross-locking and exclusion constraints

A second major problem is that autovacuum can only free up space when it gets an exclusive lock and if any queries are going through the parent table, then you get constraint exclusion coming into play.  The problem here is that constraint exclusion takes out a relatively non-invasive lock on every table at planning time which means you cannot even plan to select a row from one partition if another partition is locked, if you are going through the parent table.

The obvious solution here is not to go through the parent table, but the ORM doesn't support that so we had to drop to SQL.  It also took us about 6 months to find and fix.

Annoyance 3:  Constraint exclusion doesn't always do what you expect it to!

One day we had a very slow running straight-forward query that should have been able to resolve quickly on an index scan on one of the partitions. However, because the constraint criteria was being brought in via a subquery, it was not available at plan time, so it was falling back on a sequential scan through another large partition.  Ouch......  Found the query and fixed it.

Annoyance 4:  Solving some performance problems puts more stress on the next bottleneck

The result of the initial success was increased db concurrency, which was great until it became clear our selection of rows to process and delete was leading to lots of indexes having huge numbers of dead tuples at their heads.  This meant that selecting rows actually became slower than before.  So we had to go back and engineer a new selection algorithm to avoid this problem....

Unrelated Annoyance:  Long running transactions causing autovacuum headaches

An interesting unrelated issue we had was the fact that at the time, we had transactions that would sometimes remain open for a week.  While the partitions directly affected were small, the problem is that autovacuum cannot clear tuples that are invalidated since the oldest transaction started, so higher processing throughput partitions were adversely affected.  After significant effort, we got the worst offenders corrected and now the longest running transactions take just over a day.  This is usually sufficient depending on the load of the system (but sometimes the duration spikes to 18 hours).

Was the partitioning worth it?  Definitely!  However it was a bit of a long road to get there

Wednesday, February 10, 2016

Why Commons Should Not Have Ideological Litmus Tests

This will likely be my last post on this topic.  I would like to revive this blog on technical rather than ideological issues but there seems like a real effort to force ideology in some cases.  I don't address this in terms of specific rights, but in terms of community function and I have a few more things to say on this topic before I return to purely technical questions.

I am also going to say at the outset that LedgerSMB adopted the Ubuntu Code of Conduct very early (thanks to the suggestion of Joshua Drake) and this was a very good choice for our community.  The code of conduct provides a reminder for contributors, users, participants, and leadership alike to be civil and responsible in our dealings around the commons we create.  Our experience is we have had a very good and civil community with contributions from every walk of life and a wide range of political and cultural viewpoints.  I see  this as an unqualified success.

Lately I have seen an increasing effort to codify a sort of political orthodoxy around open source participation.  The rationale is usually about trying to make people feel safe in a community, but these are usually culture war issues so invariably the goal is to exclude those with specific political viewpoints (most of the world) from full participation, or at least silence them in public life.  I see this as extremely dangerous.

On the Economic Nature of Open Source

Open source software is economically very different from the sorts of software developed by large software houses.  The dynamics are different in terms of the sort of investment taken on, and the returns are different.  This is particularly true for community projects like PostgreSQL and LedgerSMB, but it is true to a lesser extent even for corporate projects like MySQL.  The economic implications thus are very different.

With proprietary software, the software houses build the software and absorb the costs for doing so, and then later find ways to monetize that effort.  In open source, that is one strategy among many but software is built as a community and in some sense collectively owned (see more on the question of ownerership below).

So with proprietary software, you may have limited ownership over the software, and this will be particularly limited when it comes to the use in economic production (software licenses, particularly for server software, are often written to demand additional fees for many connections etc).

Like the fields and pastures before enclosure, open source software is an economic commons we can all use in economic production.  We can all take the common software and apply it to our communities, creating value in those areas we value.  And we don't all have to share the same values to do it.  But it often feeds our families and more.

But acting as a community has certain requirements.  We have to treat eachother with humanity generally.  That doesn't mean we have to agree on everything but it does mean that some degree of civility must be maintained and cultivated by those who have taken on that power in open source projects.

On the Nature of Economic Production, Ownership and Power (Functionally Defined)

I am going to start by defining some terms here because I am using these terms in functional rather than formal ways.

Economic Production:  Like all organisms we survive by transforming our environment and making it more conducive to our ability to live and thrive.  In the interpersonal setting, we would call this economic production.  Note that understood in this way, this is a very broad definition and includes everything from cooking dinner for one's family to helping people work together.  Some of this may be difficult to value but it can (what is the difference between eating out and eating at home?  How much does a manager contribute to success through coordination?).

Ownership:  Defining ownership in functional rather than formal terms is interesting.  It basically means the right to use and direct usage of something.  Seen in this way, ownership is rarely all or nothing.  Economic ownership is the right to utilize a resource in economic production.  The extent to which one is restricted in economic production using a piece of software the less one owns it, so CAL requirements in commercial software and anti-TIVOization clauses in the GPL v3 are both restrictions on functional ownership.

Economic Power:  Economic power is the power to direct or restrict economic production.  Since economic production is required for human life, economic power is power over life itself.  In an economic order dominated by corporations, corporations control every aspect of our lives.  In places where the state has taken over from the corporations, the state takes over this as well.  But such power is rarely complete because not all economic production can be centrally controlled.

I am going to come back to these below because my hesitation on kicking people out of the community due to ideological disagreements (no matter how wrong one side may seem to be) have to do with this fear of abuse of economic power.

On Meritocracy (and what should replace it)

Meritocracy is an idea popularized by Eric Raymond, that power in a community should be given to technical merit.  In short, one should judge the code, not the person.  The idea has obvious appeal and is on the surface hyper-inclusive.  We don't have to care about anything regarding each other other than quality of code.  There is room for everyone.

More recently there has been push-back in some corners against the idea of meritocracy.  This push-back comes from a number of places, but what they have in common is questioning how inclusive it really is.

The most popular concern is that meritocracy suggests that we should tolerate people who actively make the community less welcoming, particularly for underrepresented groups. and therefore meritocracy becomes a cover for excluding the same groups who are otherwise excluded in other social dimensions, that the means of exclusion differs but who is excluded might not.

There is something to be said for the above concern, but advocates have often suggested that any nexus between community and hostile ideas is sufficient to raise a problem and therefore when an Italian Catholic expresses a view of gender based on his religion on Twitter, people not even involved in the project seek his removal from it on the grounds that the ideas are toxic.  For reasons that will become clear, that is vast overreach, and a legitimate complaint is thus made toxic by the actions of those who promote it.  And similarly toxic are the efforts by some to use social category to insist that their code should be included just to show a welcoming atmosphere.

A larger problem with meritocracy though is the way it sets up open source communities to be unbalanced, ruled by technical merit and thus not able to attract the other sorts of contributions needed to make most software successful.  In a community where technical merit is the measure by which we are judged, non-technical contributions are systematically devalued and undervalued.  How many open source communities produce software which is poorly documented and without a lot of attention to user interface?  If you devalue the efforts at documentation and UI design, how will you produce software which really meets people's needs?  If you don't value the business analysts and technical writers, how will you create economic opportunities for them in your community?  If you don't value them, how will you leverage their presence to deliver value to your own customers?  You can't if your values are skewed.

The successor to meritocracy should be economic communitarianism, i.e. the recognition that what is good for the community is economically good for all its members.  Rather than technical merit, the measure of a contribution and a contributor ought to be the value that a contribution brings the community.    Some of those will be highly technical but some will not.  Sometimes a very ordinary contribution that anyone could offer will turn the tide because only one person was brave enough to do it, or had the vision to see it as necessary.  Just because those are not technical does not mean that they are not valuable or should not be deeply respected.  I would argue that in many ways the most successful open source communities are the ones which have effectively interpreted meritocracy loosely as economic communitarianism.

On Feeling Safe in the Community

Let's face it  People need to feel safe and secure in the community regarding their physical safety and economic interests.  Is there any disagreement on this point?  If there is, please comment below.  But the community cannot be responsible for how someone feels, only in making sure that people are objectively physically and economically secure within it.  If someone feels unsafe in attending conferences, community members can help address security concerns and if someone severely misbehaves in community space, then that has to be dealt with for the good of everyone.

I don't think the proponents of ideological safety measures have really thought things through entirely.  The world is a big place and it doesn't afford people ideological safety unless they don't go out and work with people they disagree with.  As soon as you go across an international border, disagreements will spring up everywhere and if you aren't comfortable with this then interacting on global projects is probably not for you.

Worse, when it comes to conduct outside of community circles, those in power in the community cannot really act constructively most of the time.  We don't have intimate knowledge and even if we do, our viewpoints have to be larger than the current conflict.

On "Cultural Relativism:" A welcoming community for all?

One of the points I have heard over and over in discussions regarding community codes of conduct is that welcoming people regardless of viewpoint (particularly on issues like abortion, sexuality, etc) is cultural relativism and thus not acceptable.  I guess the question is not acceptable to whom?  And do we really want an ideological orthodoxy on every culture war topic to be a part of an open source project?  Most people I have met do not want this.

But the overall question I have for people who push culture war codes of conduct is "when you say a welcoming community for all, do you really mean it?  Or do you just mean for everyone you agree with?  What if the majority changes their minds?"

In the end, as I will show below, trying to enforce an ideological orthodoxy in this way does not bring marginal groups into the community but necessary forces a choice of which marginal groups to further exclude.  I don't think that is a good choice and I will go on record and say it is a choice I will steadfastly refuse to make.

A Hypothetical

Ideology is shaped by culture, and ideology of sexuality is shaped by family structures, so consequently where family structures are different, views on sexuality will be also.

So suppose someone on a community email list includes a pro-same-sex marriage email signature, something like:

"Marriage is an institution for the benefit of the spouses, not [to] bind parents to their children" -- Ted Olson, arguing for a right to same-sex marraige before the United States Supreme Court.

So a socially conservative software developer from southern India complaints to the core committee saying that this is an attack on his culture, saying that traditional Indian marriages are not real marriages.  Now, I assume most people would agree that it would be best for the core committee not to insist that the email signature be changed for someone to continue to participate.  So with such a decision, suppose the complainant changes his signature instead to read:

"If mutual consent makes a sexual act moral, whether within marriage or without, and, by parity of reasoning, even between members of the same sex, the whole basis of sexual morality is gone and nothing but misery and defect awaits the youth of the country… " -- Mohandas Gandhi

Now the first person decries the signature as homophobic and demands the Indian fellow be thrown off the email list.  And the community, if it has decided to follow the effort at ideological safety has to resolve the issue.  Which group to exclude?  The sexual minority?  Or the group marginalized through a history of being on the business end of colonialism?  And if one chooses the latter, then what does that say about the state of the world?  Should Indians, Malaysians, Catholics, etc. band together to fork a competing project?  Is that worth it as a cost?  Doesn't that hurt everyone?

On Excluding People from the Commons

In my experience, excluding people from the commons carries with it massive cost, and this is a good thing because it keeps economic power from being abused.  I have watched the impact first hand.  LedgerSMB would not even exist if this weren't an issue with SQL-Ledger.  That we are now the only real living fork of SQL-Ledger and far more active than the project we forked from is a testament to the cost.

Of course in that case the issue was economic competition and a developer who did not want to leverage community development to build his own business.  I was periodically excluded from SQL-Ledger mailing lists etc for building community documentation (he sold documentation).  Finally the fork happened beccause he wouldn't take security reports seriously.  And this is one of the reasons why I would like to push for an inclusive community.

But I also experienced economic ramifications from being excluded.  It was harder to find customers (again, the reason for exclusion was economic competition so that was the point).  In essence, I am deeply aware of the implications of kicking people out.

I have seen on email lists and tracker tickets the comparison of the goal of excluding people with problematic ideologies with McCarthyism.  The goal of McCarthyism was indeed similar, to make sure that if you had the wrong ideas you would be unable to continue a professional career.  I have had relatives who suffered because they defended the legal rights of the Communist Party during that time.  I am aware of cases where the government tried to take away their professional career (unsuccessfully).

Management of community is political and the cost of excluding someone is also political.  We already exist in some ways on the margins of the software industry.  Exclude too many people and you create your own nemesis.  That's what happened to SQL-Ledger and why LedgerSMB is successful today.

Notes on former FreeBSDGirl

One blog entry that comes from the other side of this issue is Randi Harper's piece on why she no longer will go to FreeBSD conferences and participate on IRC channels.   I am not familiar with the facts surrounding her complaints and frankly I don't have time to be so what the nature of her harassment complaint is, I will not be the judge.

There is however another side to the issue that is outside what she evidently has experience with, and that is the role of software maintainers in addressing the sorts of complaints she made.  Consequently I want to address that side and then discuss her main points at the bottom.

One thing to remember is that when people make accusations of bullying, harassment, etc. the people in charge are also the people with the least actual knowledge of what is going on.  Expecting justice from those in power in cases like this will lead, far more often than not, to feelings of betrayal.  This is not because of bad intentions but because of lack of knowledge.  This was one thing I learned navigating schoolyard bullies when I was growing up and we as project maintainers are in an even lower knowledge role than school administrators are.  Bullies are furthermore usually experts at navigating the system and take advantage of those who are not as politically adept, so the more enforcement you throw at the problem, the worse it gets.

So there is an idea that those in charge will stop people from treating eachother badly.  That has to stop because it isn't really possible (as reasonable as it sounds).  What we can do is keep the peace in community settings and that is about it.  One needs bottom up solutions, not top down ones.

So if someone came to me as a maintainer of a project alleging harassment on Twitter and demanding that an active committer be removed, that demand would probably go nowhere.  If political statements were mentioned, the response would be "do we want a political orthodoxy?"  Yet LedgerSMB has avoided these problems largely because, I think, we are a community of small businesses and therefore are used to working through disagreements and maybe because we are used to seeing these sorts of things as political.

Her main points though are worth reading and pondering.  In some areas she is perfectly right and in some areas dangerously wrong.

Randi is most right in noting that personal friction cannot be handled like a technical problem.  It is a political problem and needs to be handled as such.  I don't think official processes are the primary protection here, and planning doesn't get you very far, but things do need to be handled delicately.

Secondly, there is a difference between telling someone to stay quiet and telling someone not to be shouting publicly.   I think it is worth noting that if mediation is going to work then one cannot have people trying to undermine that in public, but people do need friends and family for support and so it is important to avoid the impression that one is insisting on total confidentiality.

Randi is also correct that how one deals with conflict is a key gauge of how healthy an open source community is.  Insisting that people be banished because of politically offensive viewpoints however does not strike me as healthy or constructive.  Insisting that people behave themselves in community spaces does.  In very rare cases it may be necessary to mediate cases that involve behavior outside that, but insisting on strict enforcement of some sort of a codified policy will not bring peace or prosperity.

More controversially I will point out that there is a point that Randi makes implicitly that is worth making explicit here, namely that there is a general tone-deafness to women's actual experiences in open source.  I think this is very valid.  I can remember a former colleague in LedgerSMB making a number of complaints about how women were treated in open source.  Her complaints included both unwanted sexual attention ("desperate geeks") and more actionably the fact that she was repeatedly asked how to attract more women to open source (she responded once on an IRC channel with "do you know how annoying that is?").  She ultimately moved on to other projects following a change in employment that moved LedgerSMB outside the scope of duties,  but one obvious lesson that those of us in open source can take from this is just to listen to complaints.  Many of these are not ones that policies can solve (you really want a policy aimed at telling people not to ask what needs to be done to attract more women to open source?) but if we listen, we can learn something.

One serious danger in the current push for more expansive codes of conduct is that it puts those who have the least knowledge in the greatest responsibility.  My view is that expansive codes of conduct, vesting greater power with maintainers over areas of political advocacy outside community fora will lead to greater, not less conflict.  So I am not keen in her proposed remedies.

How Codes of Conducts Should be Used

The final point I want to bring up here is how codes of conduct should be used.  These are not things which should be seen as pseudo-legal or process-oriented documents.  If you go this way, people will abuse the system.  It is better in my experience to vest responsibility with the maintainers in keeping the peace, not dispensing out justice, and to have codes of conduct aimed at the former, not the latter.  Justice is a thorny issue, one philosophers around the world have been arguing about for millennia with no clear resolution.

A major problem is the simple fact that perception and reality don't always coincide.  I was reminded of this controversy while reading an article in The Local about the New Years Eve sexual assaults, about work by a feminist scholar in Sweden to point out that actually men are more at risk from bodily violence than women are, and that men suffer disproportionately from crime but are the least likely to modify behavior to avoid being victimized.  The article is worth reading in light of the current issues.

So I think if one expects justice from a code of conduct, one expects too much.  If one expects fairness from a code of conduct, one expects too much.  If one expects peace and prosperity for all, then that may be attainable but that is not compatible with the idea that one has a right not to be confronted by people with dangerous ideologies.

Codes of conducts, used right, provide software maintainers with a valuable tool for keeping the peace.  Used wrong, they lead open source projects into ruin.  In the end, we have to be careful to be ideologically and culturally inclusive and that means that people cannot guarantee that they are safe from ideas they find threatening.

Tuesday, January 26, 2016

On Contributor Codes of Conduct and Social Justice

The PostgreSQL, Ruby, and PHP communities have all been considering codes of conduct for contributors. The LedgerSMB community already uses the Ubuntu Code of Conduct.  Because this addresses many projects, I am syndicating this further than where there are current issues.  This is not a technical post and it covers a wide range of very divisive issues for a very diverse audience.  I can only hope that the nuance I am trying to communicate comes across.

Brief History

A proximal cause seems to be an event referred to as "Opalgate" where an Italian individual who claimed to be a part of the Opal project made some unrelated tweets in an exchange about the politics of education and the question of how gender should be presented, and some people got offended and demanded his resignation (at least that is my reading of the Twitter exchange but I have been outside the US long enough to lose the context in which it would likely be read by an American in the US).  The details are linked below, but the core questions involve how much major contributors to projects need to keep from saying anything at all about divisive issues seems to be a recurring topic.  Moreover it is a legitimate one.

Like some of my blog posts, this goes into touchy territory.  I am discussing things which require a great deal of nuance.  Chances are, regardless of where you sit on some of these issues, you will be offended by things I say, but there are worse things than to be offended (one of them is never to be challenged by different viewpoints).

I write here as someone who has lived in a number of very different cultures and who can see perspectives on many of these issues which are not present in American political discourse  For this reason, I think it is important for me to share the concerns I see because otherwise open source software maintainers often don't have a perspective outside of Western countries, or even outside the US.

Of course as open source software maintainers we want everyone to feel safe and valued as members of the community.  But cultural tensions and ways of life do crop up and taking a position on these as a community will always do more harm than good.

Background Reading regarding Opalgate and the question of so-called "social justice warriors" in open source

It may seem strange to put a list of links for background reading near the start of an article, but I want to make sure that such material is available up front.  People can read about Opalgate here and the ongoing debate between various parties about it.  It's important background reading but somewhat peripheral to the overall problems involved.  It may or may not be the best example of the difficulties in running cross-cultural projects but it does highlight the difficulties that come in addressing diverse community bases, those which may have deep philosophical disagreements about things which people take very personally.

In the interest of full disclosure, I too worry that there is too much eagerness to liberate children from concepts of gender and too little thought about how this can and will be abused, and what the life costs for the children actually will be.  I believe that we must be human and humane to all but I am concerned that the US is going down a path that strikes me as anything but that in the long run.  That doesn't mean that the concerns of the trans community in the US should be ignored, but that doesn't mean they should be paramount either.  As communities we need to come together to solve problems not fight culture wars.

Twitter is not a medium which is conducive to thoughtful exchange so I also have to cut some slack.  Probably not the wisest medium to discuss controversial topics.  But people around the world have deep differences in views on major controversies.  My wife, for example, is far more opposed to abortion than I am, and having come to a deeper understanding of her culture, I don't disagree that in her cultural context, it is more harmful.  But that brings me to another problem, that many issues are contextual and we cannot see how others really are impacted by such changes, particularly when forced from the outside.

But my view doesn't matter everywhere.  It matters in my family, my discussions with people I know, and so forth.  But most of the world is not my responsibility nor should it be.  These are not entirely easy issues and there should be room for disagreement.

Is Open Source Political?

Caroline Ada Ehmke's basic argument is that open source is inherently political, that it seeks a positive change in the world, and therefore it should ally itself with others sharing the same drive to make the world a better place.  I think this viewpoint is misguided but because it is only half-wrong.

Aristotle noted that all human relationships are necessarily political.  The three he chose as primary in Politics is illustrative:  master and slave (we could update to boss and worker); husband and wife; and king and subject.  To Aristotle, the human being alone is incomplete.  We are our relationships and our politics follows from them.  While there has been an effort to separate the personal and the political in modern times, Feminist historians have kept this tradition alive and well.  A notion of the political grounded in humans as social animals is fundamentally more conducive to justice than cold, mechanical, highly engineered social machinery.  Moreover Aristotle notes that all communities are built on some concept of the good, that humans only want things that seem good to them and therefore we can assume that all groups seek a better world, but we don't always know which ones deliver, and that is the problem. 

Open source begins not with an ideology but with a conviction.  Not everybody shares the same conviction.  Not everyone participates in open source for the same reason.    But everyone has a reason, some conviction that what they are doing is good.  There is enough commonality for us all to work together, but that commonality is not as strong as one may think.

In a previous post on this blog I argued for a very different understanding of software freedom than Richard Stallman supposes, for example.  While he holds a liberal enumerated liberties view, I hold a traditionalist work-ownership view.  Naturally that leads to different things we look for in an ideal license.

And the diversity in viewpoint does not stop there.  Some come to open source because they believe that open source is a better way of writing software.  Some because they believe that open source software delivers benefits in use.  But regardless of our disagreement we share the understanding that open source software brings community and individual benefits.

In two ways then is open source software political:
  1. Communities require governance and this is inherently political, and
  2. To the extent there is a goal to transform the software industry to one of open source that is political.
The first as we will see is a major problem.  Open source communities are diverse in a way few Americans can fully comprehend (we like to think everyone is like us and there is one right way, the American Way whether that is in industry -- the right -- or formulations of rights -- the left).  Thus most discussions end up being Western-normative (and in particular American-normative) and disregard perspectives from places like India, Malaysia, Indonesia, and so forth.

However it is worth coming back to the point that what brings us together is an economic vision.  Yes, that is intrinsically and highly political, but it also has consequences for other causes and therefore it is worth being skeptical of alliances with groups in other directions.  What would an open source-based economy look like? What would the businesses look like? Will they be the corporations of today or the perpetual family businesses and trades of yesteryear?  And if the latter, what is the implication for the family?  Many of these questions (just like questions of same-sex marriage) depend in large part on the current social institutions in a culture -- the implications of an industrial, corporate, weak family society adopting something like same-sex marriage are very different than in an agrarian or family-business, strong family society.  My view is that these will likely have different answers in different places.

Thus when an open source community takes a position on, for example, gay rights in the name of providing a welcoming community, they make the community openly hostile to a very large portion of the world and I think that is not what we want.  Moreover such a decision is usually a product of white, Western privilege and effectively marginalize those in so-called developing countries who want to see their countries economically develop in a very different direction than the US has.  Worse, this is not an unintended side effect but the whole point.

A brief detour into the argument over white privilege

A discussion of so-called white privilege is needed I think for three groups reading this:
  • Non-Americans who will have trouble understanding the idea as it applies to American society (like all social ideas, it does not apply to all societies or even where it does apply, it may not in the same way).
  • White Americans who seem to have trouble understanding what people of color in the US mean when they use the term.
  • Activists who want to use the idea as a political weapon to enforce a sort of orthodoxy

I mentioned Western-Normative above.  It is worth pointing out that this forms a part of a larger set of structures that define what is normal or central in a culture, and what is abnormal, marginalized (or perhaps liminal).  It is further worth noting that the perception as to these models is more acute to those who are not treated as the paragons of normality.  In the US, the paragon of normality is the white, straight male.  But unspoken here is that it is the white, straight, urban, wealthy American male (or maybe European, they are white too).  Everyone else (women, people of color, Africans, Asians, etc) should strive to be like these paragons of success (I, myself, having lived most of my life now either outside the US or in the rural parts, am most certainly not included in this paragon of normality model, but nevertheless it took years of marriage to someone from a different culture and race to begin to be able to partially see a different perspective).

Now, it doesn't follow that white straight males live up to this image (which is one reason why white privilege theory has proven controversial among the arguably privileged) even where there is wealth, one is brought up in nice neighborhood in the city, etc.  But that isn't really the point.  The point is that society holds these things to be *normal* and everything else to be only normal to the extent it is like this model.  It would be better and more accurate to call this a model of normality rather than privilege and to state at the outset that we cannot really walk a mile in the shoes of people from across many social borders (culture included).

White privilege is real, as is male privilege (in some areas, particularly employment), urban privilege, American privilege, Western privilege, even female privilege (in some areas, particularly family law).

Issues exist in a sticky web of culture, and no culture is perfect

These issues of privilege aren't necessarily wrong in context:  it seems unlikely that the workplace can be made less male-normative without men sharing equally in the duties and rights of childrearing, but enforcing that cuts against the goal by some feminists of liberating women from men (and also exists in tension with things like same-sex marriage and gender-nonessentialism).  Insisting that men get the same amount of parental leave as women cuts one direction, but insisting that single women get free IVF cuts the other (both of these are either the case in Sweden or efforts are being made to make them the case).  In other words, addressing male privilege requires a transformation of the economic and family order together, in such a way that having children becomes an economic investment rather than an economic burden.  But then that has implications for the idea of gay rights as we understand the concept in the West because if having and raising children becomes normative then one is providing a sort of parental privilege, and gender equality becomes based on heteronormativity.

But the ultimate white privilege is to deny it is a factor when one uses one's own perception that other cultures are homophobic or transphobic to justify one's own racist paternalism.  No need to understand why.  We are white.  We know what is right.  We just need to educate them so they can join the ranks of the elite culturally white enlightened liberals as well.  Most of the world, however, disagrees, and as maintainers of open source projects we have to somehow keep the peace. (Note I use the term liberal as it is used in the history of ideas --  in the West it is no less prevalent on the mainstream right than on the mainstream left, though the application may be different.)

Since many of these issues necessarily exist in tension with eachother, there is no such thing as a perfect culture.  It isn't even clear the West does better than Southeast Asia on the whole (in fact I would say the SE Asia does better than the West on the whole).  But all culture is an effort at these tradeoffs, and it is not the job of open source communities to push Western changes on the rest of the world.

What is Social Justice?  Two Theories and a Problem

If open source is inherently political then social justice must in some way matter to open source.  Naturally we must understand what social justice is and how it applies.  Certainly a sense of being treated fairly by the community is essential for contributors from all walks of life.  The cult of meritocracy is an effort at social justice within the community.  As some argue it is not entirely without problems (see below) but as a technical community it is a start.

Western concepts of justice today tend to stress individuality, responsibility, and autonomy.  The idea is that justice is something that exists between individuals, and maybe between individuals and the state.  And while contemporary Western social justice theorists on the left try to relate the parts to the whole of society, it isn't clear that there is room for any parts other than the isolated individual and the state in their theories.  If one starts with the view that humans are born free but everywhere in chains (Rousseau), then the job of the state is to liberate people from eachother, and that leaves no room for any other parts.

The individualist view of justice, when seen as primary, breaks down in a number of important ways.  The most important is that it provides no real way of understanding parts and how they can be related to the whole.  Thus, the state becomes both stronger and isolates people more from eachother, and predictability becomes more important than human judgement.  Separatism cannot be tolerated, and assimilationism becomes the rallying cry when it comes to how the central model of normality should deal with those outside.  In other words the only way that this approach can deal with those on the margins is to destroy their culture and assimilate the individuals remaining.  Resistance must be made futile (Opalgate can be seen as such an effort).  For this reason, this view of justice is incompatible with real cultural pluralism.   This is not a question of the political spectrum in the US or Europe.  It is a fundamental cultural assumption in the much of the West.  Interestingly, the insistence that the personal is political means that intellectual feminism already exists in tension with this cold, mechanical view of justice.

Another view of justice can be found in Thomas Aquinas's view that in addition to justice between individuals, there is a need to recognize that just as individuals are parts in relation to the whole, so are other organs within society.  In other words, justice is a function of power, and justice is in part about just  design and proper distribution of power and responsibility.  In this regard, Aquinas built on the thought experiments of Plato's Republic and the Politics of Aristotle.  In this regard, key questions of social justice include the structure of an open source community and the relationship between the parts of the community (how users and developers interact and share power and responsibility), the relationship between open source projects and so forth.

In the end though, there was a reason why Socrates eventually rejected every formulation of justice he pondered.  Justice itself is complex and to formulate it removes a critical component of it, namely human judgement when weighing harms which are not directly comparable.  I think it is therefore quite necessary for people to remain humble about the topic and to realize that nobody sees all the pieces, and that we as humans learn more from disagreement than from agreement.  Therefore every one of us is ignorant to some extent on the nature of justice and so disagreements are healthy.

Open Source Projects and So-Called Social Justice Warriors

Coraline Ehmke, in her post on ticket on Opalgate asked:

Is this what the other maintainers want to be reflected in the project? Will any transgender developers feel comfortable contributing?"

This is  good question but another question needs to be asked as well.  Given that a lot of people live in societies with very different family and social structures, should people feel comfortable using software if the maintainers of the project have come out as openly hostile to the traditional family structures in a culture?  Does not a community that is welcoming of all need to avoid the impulse to delegitimize social institutions in other cultures, ones where one necessarily lacks an understanding into how it plays into questions of economic support and power?  If open source is already political do we want to ally ourselves with groups that could alienate important portions of our user base by insisting that they change their way of life?

It is important that we maintain a community that is welcoming to all, but that means we have to work with people we disagree with.  A mere difference of opinion should never be sufficient to trigger a problem with the code of conduct and expressing an opinion outside community resources should never be sufficient to consider the community unduly unwelcoming.  A key component of the community is whether people can work together with people when they disagree, and forcing agreement or even silencing opposition is the opposite of social justice when it comes to a large-reaching global project.

Should open source communities eject social justice warriors as ESR suggests?  Not if they are willing to work comfortably with people despite disagreements on hot button issues.  Should we welcome them?  If they are willing to work with people comfortably despite disagreements on hot button issues.  Should we require civility?  Yes.  Should we as communities take stances on hot button issues internationally?  Absolutely not.  What about as individuals?  Don't we have a civic duty to engage in our own communities as we feel best?  And if both those are true, must we not be tolerant of a wide range of differences in opinion, even those we find deeply and horribly wrong?

Wednesday, September 17, 2014

PGObject Cookbook Part 2.1: Serialization and Deserialization of Numeric Fields


This article demonstrates the simplest cases regarding autoserialization and deserialization to the database of objects in PGObject.   It also demonstrates a minimal subset of the problems that three valued logic introduces and the most general solutions to those problems.  The next article in this series will address more specific solutions and more complex scenarios.

The Problems

Often times we want to have database fields automatically turned into object types which are useful to an application.  The example here turns SQL numeric fields into Perl Math::Bigfloat objects. However the transformation isn't perfect and if not carefully done can be lossy.  Most applications types don't support database nulls properly and therefore a NULL making a round trip may end up with an unexpected value if we aren't careful.  Therefore we have to create our type in a way which can make round trips in a proper, lossless way.

NULLs introduce another subtle problem with such mappings, in that object methods are usually not prepared to handle them properly.  One solution here is to try to follow the basic functional programming approach and copy on write.  This prevents a lot of problems.  Most Math::BigFloat operations do not mutate the objects so we are relatively safe there, but we still have to be careful.

The simplest way to address this is to build into one's approach a basic sensitivity into three value logic.  However, this poses a number of problems, in that one can accidentally assign a value which can have other values which can impact things elsewhere.

A key principle on all our types is that they should handle a null round trip properly for the data type, i.e. a null from the db should be turned into a null on database insert.  We generally allow programmers to check the types for nulls, but don't explicitly handle them with three value logic in the application (that's the programmer's job).

The Example Module and Repository

This article follows the code of PGObject::Type::BigFloat..  The code is licensed under the two-clause BSD license as is the rest of the PGObject framework.  You can read the code to see the boilerplate.  I won't be including it in here.  I will though note that this extends the Math::BigFloat library which provides arbitrary precision arithmetic for PostgreSQL and is a good match for LedgerSMB's numeric types.

NULL handling

To solve the problem of null inputs we extend the hashref slightly with a key _pgobject_undef and allow this to be set or checked by applications with a function "is_undef."  This is fairly trivial:

sub is_undef {
    my ($self, $set) = @_;
    $self->{_pgobject_undef} = $set if defined $set;
    return $self->{_pgobject_undef};

How PGObject Serializes

When a stored procedure is called, the mapper class calls PGObject::call_procedure with an enumerated set of arguments.  A query is generated to call the procedure, and each argument is checked for a "to_db" method.  That method, if it exists, is called and the output used instead of the argument provided.  This allows an object to specify how it is serialized.

The to_db method may return either a literal value or a hashref with two keys, type and value.  If the latter, the value is used as the value literal and the type is the cast type (i.e. it generates ?::type for the placeholder and binds the value to it).  This hash approach is automatically used when bytea arguments are found.

The code used by PGObject::Type::BigFloat is simple:

sub to_db {
    my $self = shift @_;
    return undef if $self->is_undef;
    return $self->bstr;

Any type of course can specify a to_db method for serialization purposes.

How and When PGObject Deserializes

Unlike serialization, deserialization from the database can't happen automatically without the developer specifying which database types correspond to which application classes, because multiple types could serialize into the same application classes.  We might even want different portions of an application (for example in a database migration tool) to handle these differently.

For this reason, PGObject has what is called a "type registry" which specifies which types are deserialized and as what.  The type registry is optionally segmented into several "registries" but most uses will in fact simply use the default registry and assume the whole application wants to use the same mappings.  If a registry is not specified the default subregistry is used and that is consistent throughout the framework.

Registering a type is fairly straight forward but mostly amounts to boilerplate code in both the type handler and using scripts.  For this type handler:

sub register{
    my $self = shift @_;
    croak "Can't pass reference to register \n".
          "Hint: use the class instead of the object" if ref $self;
    my %args = @_;
    my $registry = $args{registry};
    $registry ||= 'default';
    my $types = $args{types};
    $types = ['float4', 'float8', 'numeric'] unless defined $types and @$types;
    for my $type (@$types){
        my $ret =
            PGObject->register_type(registry => $registry, pg_type => $type,
                                  perl_class => $self);
        return $ret unless $ret;
    return 1;

Then we can just call this in another script as:


Or we can specify a subset of types or different types, or the like.

The deserialization logic is handled by a method called 'from_db' which takes in the database literal and returns the blessed object.  In this case:

sub from_db {
    my ($self, $value) = @_;
    my $obj = "$self"->new($value);
    $obj->is_undef(1) if ! defined $value;
    return $obj;

This supports subclassing, which is in fact the major use case.

Use Cases

This module is used as the database interface for numeric types in the LedgerSMB 1.5 codebase.  We subclass this module and add support for localized input and output (with different decimal and thousands separators).  This gives us a data type which can present itself to the user as one format and to the database as another.  The module could be further subclassed to make nulls contageous (which in this module they are not) and the like.


PGObject::Type::BigFloat does not currently handle making the null handling contageous and this module as such probably never will, as this is part of our philosophy of handing control to the programmer.  Those who do want contageous nulls can override additional methods from Math::BigFloat to provide such in subclasses.

A single null can go from the db into the application and return to the db and be serialized as a null, but a running total of nulls will be saved in the db as a 0.  To this point, that behavior is probably correct.  More specific handling of nulls in the application, however, is passed to the developer which can check the is_undef method.

Next In Series:  Advanced Serialization and Deserialization:  Dates, Times, and JSON

Monday, September 15, 2014

PGObject Cookbook Part 1: Introduction


I have decided to put together a PGObject Cookbook, showing the power of this framework.  If anyone is interested in porting the db-looking sides to other languages, please let me know.  I would be glad to provide whatever help my time and skills allow.

The PGObject framework is a framework for integrated intelligent PostgreSQL databases into Perl applications.  It addresses some of the same problems as ORMs but does so in a very different way.  Some modules are almost ORM-like and more such modules are likely to be added in the future.  However unlike an ORM, PGObject mostly serves as an interface to stored procedures and whatever code generation routines will be added, these are not intended to be quickly changed.  Moreover it only supports PostgreSQL because we make extended use of PostgreSQL-only features.

For those who are clearly not interested in Perl, this series may still be interesting as it not only covers how to use the framework but also various problems that happen when we integrate databases with applications.  And there are people who should not use this framework because it is not the right tool for the job.  For example, if you are writing an application that must support many different database systems, you probably will get more out of an ORM than you will this framework.  But you still may get some interesting stuff from this series so feel free to enjoy it.

Along the way this will explore a lot of common problems that happen when writing database-centric applications and how these can be solved using the PGObject framework.  Other solutions of course exist and hopefully we can talk about these in the comments.

Much of the content here (outside of the prefaces) will go into a documentation module on CPAN.  However I expect it to also be of far more general interest since the problems are common problems across frameworks.


PGObject is written under the theory that the database will be built as a server of information and only loosely tied to the application.  Therefore stored procedures should be able to add additional parameters without expecting that the application knows what to put there, so if the parameter can accept a null and provide the same answer as before, the application can be assured that the database is still usable.

The framework also includes a fairly large number of other capabilities.  As we work through we will go through the main areas of functionality one at a time, building on the simplest capabilities and moving onto the more advanced.  In general these capabilities can be grouped into basic, intermediate, and advanced:

Basic Functionality

  1. registered types, autoserialization, and autodeserialization.
  2. The simple stored procedure mapper
  3. Aggregates and ordering
  4. Declarative mapped methods

Intermediate Functionality

  1. The Bulk Loader
  2. The Composite Type stored procedure mapper
  3. The database admin functions

Advanced Functionality

  1. Memoization of Catalog Lookups
  2. Writing your own stored procedure mapper

This series will cover all the above functionality and likely more.  As we get through the series, I hope that it will start to make sense and we will start to get a lot more discussion (and hopefully use) surrounding the framework.

Design Principles

The PGObject framework came out of a few years of experience building and maintaining LedgerSMB 1.3.  In general we took what we liked and what seemed to work well and rewrote those things that didn't.  Our overall approach has been based on the following principles:
  • SQL-centric: Declarative, hand-coded SQL is usually more productive than application programming languages.  The system should leverage hand-coded SQL.
  • Leveraging Stored Procedures and Query Generators: The system should avoid having people generate SQL queries themselves as strings and executing them.  It's better to store them persistently in the db  or generate well-understood queries in general ways where necessary.
  • Flexible and Robust: It should be possible to extend a stored procedure's functionality (and arguments) without breaking existing applications.
  • DB-centric but Loosely Coupled:  The framework assumes that databases are the center of the environment, and that it is a self-contained service in its own right.  Applications need not be broken because the db structure changed, and the DB should be able to tell the application what inputs it expects.
  • Don't Make Unnecessary Decisions for the Developer:  Applications may use a framework in many atypical ways and we should support them.  This means that very often instead of assuming a single database connection, we instead provide hooks in the framework so the developer can decide how to approach this.  Consequently you can expect your application to have to slightly extend the framework to configure it.
This framework is likely to be very different from anything else you have used.  While it shares some similarities with iBatis in the Java world, it is unique in the sense that the SQL is stored in the database, not in config files.  And while it was originally inspired by a number of technologies (including both REST and SOAP/WSDL), it is very much unlike any other framework I have come across.

Next in Series:  Registered Types:  Autoserialization and Deserialization between Numeric and Math::BigFloat.

Sunday, September 14, 2014

LedgerSMB 1.4.0 Released

15 September 2014, London. The LedgerSMB project - all-volunteer developers and contributors - today announced LedgerSMB 1.4.0.

Based on an open source code base first released in 1999, the LedgerSMB project was formed in 2006 and saw it's 1.0 release in the same year. It has now seen continuous development for over eight years and that shows no signs of slowing down.

"LedgerSMB 1.4 brings major improvements that many businesses need," said Chris Travers, who helped found the project. "Businesses which do manufacturing or retail, or need features like funds accounting will certainly get much more out of this new release."

Better Productivity

LedgerSMB 1.4 features a redesigned contact management framework that allows businesses to better keep track of customers, vendors, employers, sales leads, and more. Contacts can be stored and categorized, and leads can be converted into sales accounts.

Additionally, a new import module has been included that allows businesses to upload csv text files to import financial transactions and much more. No longer is data entry something that needs to be done entirely by hand or involves customizing the software.

Many smaller enhancements are here as well, For example, shipping labels can now be printed for invoices and orders, user management workflows have been improved,

Better Reporting

The reporting interfaces have been rewritten in LedgerSMB 1.4.0 in order to provide greater flexibility in both reporting and in sharing reports. Almost all reports now include a variety of formatting options including PDF and CSV formats. Reports can also be easily shared within an organization using stable hyperlinks to reports. Additionally the inclusion of a reporting engine means that it is now relatively simple to write third-party reports which offer all these features. Such reports can easily integrate with LedgerSMB or be accessed via a third party web page.

Additionally, the new reporting units system provides a great deal more flexibility in tracking money and resources as they travel through the system. Not only can one track by project or department, but funds accounting and other specialized reporting needs are possible to meet.

Better Integration

Integration of third-party line of business applications is also something which continues to improve. While all integration is possible, owing to the open nature of the code and db structure, it has become easier as more logic is moved to where it can be easily discovered by applications.

There are two major improvement areas in 1.4. First additional critical information, particularly regarding manufacturing and cost of goods sold tracking, has been moved into the database where it can be easily shared by other applications. This also allows for better testability and support. Secondly LedgerSMB now offers a framework for web services, which are currently available for contact management purposes, allowing integrators to more easily connect programs together.

Commercial Options

LedgerSMB isn't just an open source project. A number of commercial companies offer support, hosting, and customization services for this ERP. A list of some of the most prominant commercial companies involved can be found at