Sunday, February 21, 2016

A couple annoyances (and solutions) regarding partitioned tables

In one of my projects, autovacuum could not keep up with a large table that was under huge transactional load.  The table sometimes held over half a billion records and added and deleted millions of records a day.  Since most of these changes occurred at the heads of various indexes, autovacuum was just not fast enough.

So we decided to partition the table into around 50 pieces in order to allow autovacuum to achieve a bit better parallelism in managing the data.  This helped to some extent.  But partitioning is a rare solution for rare problems and comes with unexpected costs.  Interestingly, most of our problems have been ORM-related.  Here are some we ran into and their solutions (spoiler: we effectively stopped using an ORM on these tables).  In the end, throughput on these tables increased around 10-fold, and db load was cut by about 90%.

Annoyance 1:  Redirection and ORM transparency


The first problem we had was getting DBIx::Class to work with the partitioned table.  The solution was to add another view in between which did the redirection of inserts, updates, and deletes.  This also allowed us to go through the ORM for inserts (we still do) without the cross-locking issues below being a problem.

Annoyance 2:  Cross-locking and exclusion constraints


A second major problem is that autovacuum can only free up space when it gets an exclusive lock, and if any queries are going through the parent table, constraint exclusion comes into play.  The problem here is that constraint exclusion takes out a relatively non-invasive lock on every table at planning time.  This means that, if you are going through the parent table, you cannot even plan a select of a row from one partition while another partition is locked.

The obvious solution here is not to go through the parent table, but the ORM doesn't support that so we had to drop to SQL.  It also took us about 6 months to find and fix.
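As a rough illustration of bypassing the parent table, the sketch below picks a child table in application code and builds a parameterized query against it directly.  The table name `jobs`, the `_pNN` suffix convention, the modulo routing rule, and the `account_id` column are all hypothetical; this post does not describe our actual partitioning scheme, and the `%s` placeholder simply assumes a psycopg-style driver.

```python
N_PARTITIONS = 50  # the post says "around 50" partitions; the exact count is assumed


def child_table(account_id: int, base: str = "jobs", n: int = N_PARTITIONS) -> str:
    """Return the (hypothetical) child table holding rows for account_id."""
    if account_id < 0:
        raise ValueError("account_id must be non-negative")
    # The name is built only from an integer we computed ourselves,
    # never from user-supplied text, so the f-string is safe here.
    return f"{base}_p{account_id % n:02d}"


def select_by_account(account_id: int) -> tuple:
    """Build (sql, params) that targets one child table, never the parent."""
    table = child_table(account_id)
    sql = f"SELECT * FROM {table} WHERE account_id = %s"
    return sql, (account_id,)
```

Because the query names one child table, the planner only has to lock that table, so a lock held on a sibling partition no longer blocks planning.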

Annoyance 3:  Constraint exclusion doesn't always do what you expect it to!


One day we had a very slow running, straightforward query that should have been able to resolve quickly on an index scan on one of the partitions.  However, because the constraint criterion was being brought in via a subquery, it was not available at plan time, so the query was falling back on a sequential scan through another large partition.  Ouch.  We found the query and fixed it.
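One common way to work around this kind of problem is to resolve the partition key in a first query and then pass it to the second query as a plain value.  The sketch below assumes that shape; it is not necessarily the fix we applied.  The `jobs` and `batches` tables, the `batch_id` column, and the `run_query` callable are hypothetical stand-ins.  How parameters are planned depends on the driver and server settings, so it is worth confirming with EXPLAIN that only the expected partition is scanned.

```python
from typing import Any, Callable, Sequence

# run_query(sql, params) -> list of row tuples; a stand-in for a real driver call.
RunQuery = Callable[[str, Sequence[Any]], list]

LOOKUP_SQL = "SELECT batch_id FROM batches WHERE name = %s"
FETCH_SQL = "SELECT * FROM jobs WHERE batch_id = %s"


def fetch_jobs_for_batch(run_query: RunQuery, batch_name: str) -> list:
    """Look up the partition key first, then query with it as a known value."""
    rows = run_query(LOOKUP_SQL, (batch_name,))
    if not rows:
        return []  # no such batch, so there is nothing to fetch
    batch_id = rows[0][0]
    # The key is now supplied as a value rather than hidden inside a subquery.
    return run_query(FETCH_SQL, (batch_id,))
```

The cost is an extra round trip, which is usually small next to a sequential scan of a large partition.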

Annoyance 4:  Solving some performance problems puts more stress on the next bottleneck


The result of the initial success was increased db concurrency, which was great until it became clear our selection of rows to process and delete was leading to lots of indexes having huge numbers of dead tuples at their heads.  This meant that selecting rows actually became slower than before.  So we had to go back and engineer a new selection algorithm to avoid this problem.

Unrelated Annoyance:  Long running transactions causing autovacuum headaches


An interesting unrelated issue we had was the fact that at the time, we had transactions that would sometimes remain open for a week.  While the partitions directly affected were small, the problem is that autovacuum cannot clear tuples that are invalidated since the oldest transaction started, so higher processing throughput partitions were adversely affected.  After significant effort, we got the worst offenders corrected and now the longest running transactions take just over a day.  This is usually sufficient depending on the load of the system (but sometimes the duration spikes to 18 hours).
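The effect of one old transaction can be shown with a deliberately simplified toy model.  It treats a dead tuple as reclaimable only if the transaction that deleted it is older than every transaction still open.  Real PostgreSQL visibility rules involve snapshots and transaction ID wraparound, which this sketch ignores, and the `DeadTuple` type and the partition names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeadTuple:
    partition: str
    deleted_by_xid: int  # id of the transaction that deleted the row


def removable(dead, active_xids):
    """Dead tuples a vacuum could reclaim in this simplified model."""
    if not active_xids:
        return list(dead)  # nothing is open, so everything dead can go
    horizon = min(active_xids)  # the oldest open transaction sets the limit
    return [t for t in dead if t.deleted_by_xid < horizon]


dead = [
    DeadTuple("busy", 100),
    DeadTuple("busy", 900),
    DeadTuple("small", 950),
]
# A transaction opened at xid 500 and left running pins everything newer,
# including tuples in the busy partition it never touched.
stuck = removable(dead, [500, 1000])
```

In this model, closing the transaction at xid 500 lets all three tuples be reclaimed, which is why fixing a few long-running offenders helped the high-throughput partitions.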

Was the partitioning worth it?  Definitely!  However, it was a bit of a long road to get there.

Wednesday, February 10, 2016

Why Commons Should Not Have Ideological Litmus Tests

This will likely be my last post on this topic.  I would like to revive this blog on technical rather than ideological issues but there seems to be a real effort to force ideology in some cases.  I don't address this in terms of specific rights, but in terms of community function, and I have a few more things to say on this topic before I return to purely technical questions.

I am also going to say at the outset that LedgerSMB adopted the Ubuntu Code of Conduct very early (thanks to the suggestion of Joshua Drake) and this was a very good choice for our community.  The code of conduct provides a reminder for contributors, users, participants, and leadership alike to be civil and responsible in our dealings around the commons we create.  Our experience is we have had a very good and civil community with contributions from every walk of life and a wide range of political and cultural viewpoints.  I see this as an unqualified success.

Lately I have seen an increasing effort to codify a sort of political orthodoxy around open source participation.  The rationale is usually about trying to make people feel safe in a community, but these are usually culture war issues so invariably the goal is to exclude those with specific political viewpoints (most of the world) from full participation, or at least silence them in public life.  I see this as extremely dangerous.

On the Economic Nature of Open Source


Open source software is economically very different from the sorts of software developed by large software houses.  The dynamics are different in terms of the sort of investment taken on, and the returns are different.  This is particularly true for community projects like PostgreSQL and LedgerSMB, but it is true to a lesser extent even for corporate projects like MySQL.  The economic implications thus are very different.

With proprietary software, the software houses build the software and absorb the costs for doing so, and then later find ways to monetize that effort.  In open source, that is one strategy among many but software is built as a community and in some sense collectively owned (see more on the question of ownership below).

So with proprietary software, you may have limited ownership over the software, and this will be particularly limited when it comes to the use in economic production (software licenses, particularly for server software, are often written to demand additional fees for many connections etc).

Like the fields and pastures before enclosure, open source software is an economic commons we can all use in economic production.  We can all take the common software and apply it to our communities, creating value in those areas we value.  And we don't all have to share the same values to do it.  But it often feeds our families and more.

But acting as a community has certain requirements.  We have to treat each other with humanity generally.  That doesn't mean we have to agree on everything but it does mean that some degree of civility must be maintained and cultivated by those who have taken on that power in open source projects.

On the Nature of Economic Production, Ownership and Power (Functionally Defined)


I am going to start by defining some terms here because I am using these terms in functional rather than formal ways.

Economic Production:  Like all organisms we survive by transforming our environment and making it more conducive to our ability to live and thrive.  In the interpersonal setting, we would call this economic production.  Note that understood in this way, this is a very broad definition and includes everything from cooking dinner for one's family to helping people work together.  Some of this may be difficult to value but it can (what is the difference between eating out and eating at home?  How much does a manager contribute to success through coordination?).

Ownership:  Defining ownership in functional rather than formal terms is interesting.  It basically means the right to use and direct usage of something.  Seen in this way, ownership is rarely all or nothing.  Economic ownership is the right to utilize a resource in economic production.  The extent to which one is restricted in economic production using a piece of software the less one owns it, so CAL requirements in commercial software and anti-TIVOization clauses in the GPL v3 are both restrictions on functional ownership.

Economic Power:  Economic power is the power to direct or restrict economic production.  Since economic production is required for human life, economic power is power over life itself.  In an economic order dominated by corporations, corporations control every aspect of our lives.  In places where the state has taken over from the corporations, the state takes over this as well.  But such power is rarely complete because not all economic production can be centrally controlled.

I am going to come back to these below because my hesitation on kicking people out of the community due to ideological disagreements (no matter how wrong one side may seem to be) has to do with this fear of abuse of economic power.


On Meritocracy (and what should replace it)


Meritocracy is an idea popularized by Eric Raymond, that power in a community should be given to technical merit.  In short, one should judge the code, not the person.  The idea has obvious appeal and is on the surface hyper-inclusive.  We don't have to care about anything regarding each other except quality of code.  There is room for everyone.

More recently there has been push-back in some corners against the idea of meritocracy.  This push-back comes from a number of places, but what they have in common is questioning how inclusive it really is.

The most popular concern is that meritocracy suggests that we should tolerate people who actively make the community less welcoming, particularly for underrepresented groups, and that meritocracy therefore becomes a cover for excluding the same groups who are otherwise excluded in other social dimensions: the means of exclusion differs but who is excluded might not.

There is something to be said for the above concern, but advocates have often suggested that any nexus between community and hostile ideas is sufficient to raise a problem and therefore when an Italian Catholic expresses a view of gender based on his religion on Twitter, people not even involved in the project seek his removal from it on the grounds that the ideas are toxic.  For reasons that will become clear, that is vast overreach, and a legitimate complaint is thus made toxic by the actions of those who promote it.  And similarly toxic are the efforts by some to use social category to insist that their code should be included just to show a welcoming atmosphere.

A larger problem with meritocracy though is the way it sets up open source communities to be unbalanced, ruled by technical merit and thus not able to attract the other sorts of contributions needed to make most software successful.  In a community where technical merit is the measure by which we are judged, non-technical contributions are systematically devalued and undervalued.  How many open source communities produce software which is poorly documented and without a lot of attention to user interface?  If you devalue the efforts at documentation and UI design, how will you produce software which really meets people's needs?  If you don't value the business analysts and technical writers, how will you create economic opportunities for them in your community?  If you don't value them, how will you leverage their presence to deliver value to your own customers?  You can't if your values are skewed.

The successor to meritocracy should be economic communitarianism, i.e. the recognition that what is good for the community is economically good for all its members.  Rather than technical merit, the measure of a contribution and a contributor ought to be the value that a contribution brings the community.    Some of those will be highly technical but some will not.  Sometimes a very ordinary contribution that anyone could offer will turn the tide because only one person was brave enough to do it, or had the vision to see it as necessary.  Just because those are not technical does not mean that they are not valuable or should not be deeply respected.  I would argue that in many ways the most successful open source communities are the ones which have effectively interpreted meritocracy loosely as economic communitarianism.

On Feeling Safe in the Community


Let's face it: people need to feel safe and secure in the community regarding their physical safety and economic interests.  Is there any disagreement on this point?  If there is, please comment below.  But the community cannot be responsible for how someone feels, only for making sure that people are objectively physically and economically secure within it.  If someone feels unsafe in attending conferences, community members can help address security concerns, and if someone severely misbehaves in community space, then that has to be dealt with for the good of everyone.

I don't think the proponents of ideological safety measures have really thought things through entirely.  The world is a big place and it doesn't afford people ideological safety unless they don't go out and work with people they disagree with.  As soon as you go across an international border, disagreements will spring up everywhere and if you aren't comfortable with this then interacting on global projects is probably not for you.

Worse, when it comes to conduct outside of community circles, those in power in the community cannot really act constructively most of the time.  We don't have intimate knowledge and even if we do, our viewpoints have to be larger than the current conflict.

On "Cultural Relativism:" A welcoming community for all?


One of the points I have heard over and over in discussions regarding community codes of conduct is that welcoming people regardless of viewpoint (particularly on issues like abortion, sexuality, etc) is cultural relativism and thus not acceptable.  I guess the question is not acceptable to whom?  And do we really want an ideological orthodoxy on every culture war topic to be a part of an open source project?  Most people I have met do not want this.

But the overall question I have for people who push culture war codes of conduct is "when you say a welcoming community for all, do you really mean it?  Or do you just mean for everyone you agree with?  What if the majority changes their minds?"

In the end, as I will show below, trying to enforce an ideological orthodoxy in this way does not bring marginal groups into the community but necessarily forces a choice of which marginal groups to further exclude.  I don't think that is a good choice and I will go on record and say it is a choice I will steadfastly refuse to make.

A Hypothetical


Ideology is shaped by culture, and ideology of sexuality is shaped by family structures, so consequently where family structures are different, views on sexuality will be also.

So suppose someone on a community email list includes a pro-same-sex marriage email signature, something like:

"Marriage is an institution for the benefit of the spouses, not [to] bind parents to their children" -- Ted Olson, arguing for a right to same-sex marriage before the United States Supreme Court.

So a socially conservative software developer from southern India complains to the core committee that this is an attack on his culture, because it says that traditional Indian marriages are not real marriages.  Now, I assume most people would agree that it would be best for the core committee not to insist that the email signature be changed for someone to continue to participate.  So with such a decision, suppose the complainant changes his signature instead to read:

"If mutual consent makes a sexual act moral, whether within marriage or without, and, by parity of reasoning, even between members of the same sex, the whole basis of sexual morality is gone and nothing but misery and defect awaits the youth of the country… " -- Mohandas Gandhi

Now the first person decries the signature as homophobic and demands the Indian fellow be thrown off the email list.  And the community, if it has decided to follow the effort at ideological safety, has to resolve the issue.  Which group to exclude?  The sexual minority?  Or the group marginalized through a history of being on the business end of colonialism?  And if one chooses the latter, then what does that say about the state of the world?  Should Indians, Malaysians, Catholics, etc. band together to fork a competing project?  Is that worth it as a cost?  Doesn't that hurt everyone?

On Excluding People from the Commons


In my experience, excluding people from the commons carries with it massive cost, and this is a good thing because it keeps economic power from being abused.  I have watched the impact first hand.  LedgerSMB would not even exist if this weren't an issue with SQL-Ledger.  That we are now the only real living fork of SQL-Ledger and far more active than the project we forked from is a testament to the cost.

Of course in that case the issue was economic competition and a developer who did not want to leverage community development to build his own business.  I was periodically excluded from SQL-Ledger mailing lists etc. for building community documentation (he sold documentation).  Finally the fork happened because he wouldn't take security reports seriously.  And this is one of the reasons why I would like to push for an inclusive community.

But I also experienced economic ramifications from being excluded.  It was harder to find customers (again, the reason for exclusion was economic competition so that was the point).  In essence, I am deeply aware of the implications of kicking people out.

I have seen on email lists and tracker tickets the comparison of the goal of excluding people with problematic ideologies with McCarthyism.  The goal of McCarthyism was indeed similar, to make sure that if you had the wrong ideas you would be unable to continue a professional career.  I have had relatives who suffered because they defended the legal rights of the Communist Party during that time.  I am aware of cases where the government tried to take away their professional career (unsuccessfully).

Management of community is political and the cost of excluding someone is also political.  We already exist in some ways on the margins of the software industry.  Exclude too many people and you create your own nemesis.  That's what happened to SQL-Ledger and why LedgerSMB is successful today.

Notes on former FreeBSDGirl


One blog entry that comes from the other side of this issue is Randi Harper's piece on why she will no longer go to FreeBSD conferences or participate on IRC channels.  I am not familiar with the facts surrounding her complaints and frankly I don't have time to be, so I will not judge the nature of her harassment complaint.

There is however another side to the issue that is outside what she evidently has experience with, and that is the role of software maintainers in addressing the sorts of complaints she made.  Consequently I want to address that side and then discuss her main points at the bottom.

One thing to remember is that when people make accusations of bullying, harassment, etc. the people in charge are also the people with the least actual knowledge of what is going on.  Expecting justice from those in power in cases like this will lead, far more often than not, to feelings of betrayal.  This is not because of bad intentions but because of lack of knowledge.  This was one thing I learned navigating schoolyard bullies when I was growing up and we as project maintainers are in an even lower knowledge role than school administrators are.  Bullies are furthermore usually experts at navigating the system and take advantage of those who are not as politically adept, so the more enforcement you throw at the problem, the worse it gets.

So there is an idea that those in charge will stop people from treating each other badly.  That has to stop because it isn't really possible (as reasonable as it sounds).  What we can do is keep the peace in community settings and that is about it.  One needs bottom up solutions, not top down ones.

So if someone came to me as a maintainer of a project alleging harassment on Twitter and demanding that an active committer be removed, that demand would probably go nowhere.  If political statements were mentioned, the response would be "do we want a political orthodoxy?"  Yet LedgerSMB has avoided these problems largely because, I think, we are a community of small businesses and therefore are used to working through disagreements and maybe because we are used to seeing these sorts of things as political.

Her main points though are worth reading and pondering.  In some areas she is perfectly right and in some areas dangerously wrong.

Randi is most right in noting that personal friction cannot be handled like a technical problem.  It is a political problem and needs to be handled as such.  I don't think official processes are the primary protection here, and planning doesn't get you very far, but things do need to be handled delicately.

Secondly, there is a difference between telling someone to stay quiet and telling someone not to be shouting publicly.   I think it is worth noting that if mediation is going to work then one cannot have people trying to undermine that in public, but people do need friends and family for support and so it is important to avoid the impression that one is insisting on total confidentiality.

Randi is also correct that how one deals with conflict is a key gauge of how healthy an open source community is.  Insisting that people be banished because of politically offensive viewpoints however does not strike me as healthy or constructive.  Insisting that people behave themselves in community spaces does.  In very rare cases it may be necessary to mediate cases that involve behavior outside that, but insisting on strict enforcement of some sort of a codified policy will not bring peace or prosperity.

More controversially, I will point out that there is a point that Randi makes implicitly that is worth making explicit here, namely that there is a general tone-deafness to women's actual experiences in open source.  I think this is very valid.  I can remember a former colleague in LedgerSMB making a number of complaints about how women were treated in open source.  Her complaints included both unwanted sexual attention ("desperate geeks") and, more actionably, the fact that she was repeatedly asked how to attract more women to open source (she responded once on an IRC channel with "do you know how annoying that is?").  She ultimately moved on to other projects following a change in employment that moved LedgerSMB outside the scope of her duties, but one obvious lesson that those of us in open source can take from this is just to listen to complaints.  Many of these are not ones that policies can solve (you really want a policy aimed at telling people not to ask what needs to be done to attract more women to open source?) but if we listen, we can learn something.

One serious danger in the current push for more expansive codes of conduct is that it puts those who have the least knowledge in the greatest responsibility.  My view is that expansive codes of conduct, vesting greater power with maintainers over areas of political advocacy outside community fora, will lead to greater, not lesser, conflict.  So I am not keen on her proposed remedies.

How Codes of Conduct Should be Used


The final point I want to bring up here is how codes of conduct should be used.  These are not things which should be seen as pseudo-legal or process-oriented documents.  If you go this way, people will abuse the system.  It is better in my experience to vest responsibility with the maintainers for keeping the peace, not dispensing justice, and to have codes of conduct aimed at the former, not the latter.  Justice is a thorny issue, one philosophers around the world have been arguing about for millennia with no clear resolution.

A major problem is the simple fact that perception and reality don't always coincide.  I was reminded of this controversy while reading an article in The Local about the New Year's Eve sexual assaults, about work by a feminist scholar in Sweden to point out that actually men are more at risk from bodily violence than women are, and that men suffer disproportionately from crime but are the least likely to modify behavior to avoid being victimized.  The article is worth reading in light of the current issues.

So I think if one expects justice from a code of conduct, one expects too much.  If one expects fairness from a code of conduct, one expects too much.  If one expects peace and prosperity for all, then that may be attainable but that is not compatible with the idea that one has a right not to be confronted by people with dangerous ideologies.

Codes of conduct, used right, provide software maintainers with a valuable tool for keeping the peace.  Used wrong, they lead open source projects into ruin.  In the end, we have to be careful to be ideologically and culturally inclusive and that means that people cannot guarantee that they are safe from ideas they find threatening.