Tuesday, November 12, 2013

On CPAN, Community, and P: A Case Study in What Not to Do

I am going to try to write this piece as respectfully as I can.  I understand that people put a lot of work into developing things, and when they submit them and get panned, it can be difficult.  At the same time, community resources are community resources, and a failure to conduct case studies of things gone amiss can let the same problems recur.  Failure to get honest feedback can keep people from improving, but worse, it can leave beginners mistakenly believing that bad practices are best practices.  There is also a period of time after which bad practices become less easily excused.

So, somewhat reluctantly, I am going to undertake such a study here.  This is solely about community interfacing.  I am not going to critique code.  Rather, I hope this post helps in understanding some of the problems with community interfaces generally, whether CPAN, PGXN, or others.  The lessons apply regardless of language or environment, and the critiques I offer are at a very different level than critiques of code.

So with this, I critique the P CPAN module from a community coding perspective.  This module exports a single function called "P" which acts kind of like printf and sprintf.  It would be an interesting exercise in learning some deep aspects of Perl but from a community resource perspective, it suffers from enough issues to make it a worthy case study.

The gist of this is that community resources require contemplating how something fits into the community and working with the community in mind.  A cool idea or something one finds useful is not always a candidate for publishing as a community resource, at least not without modifications aimed at carefully thinking through how it fits into more general processes.

Four of my own CPAN modules are actually rewrites of code that I wrote in other contexts (particularly for LedgerSMB), and rewrote specifically for publication on CPAN.  In general there is a huge gulf between writing a module for one project or one developer and writing it for everyone.  I believe, looking at P, that it is just something the developer found useful personally and published as is without thinking through any of these issues.  This is all too common, so I hope that going through these issues will keep others from making the same mistakes.

Namespace Issues


The name 'P' is an extraordinarily bad choice of name for a public module.  Perl uses nested namespaces, and nesting implies a clear relationship, such as inheritance (though other relationships are possible too).

Taking a top-level namespace is generally discouraged on CPAN where a second or third level namespace will suffice.  There are times and places for top-level namespaces, for example for large projects like Moose, Moo, and the like.  In general these are brand names for conglomerates of modules, or they are functional categories.  They are not shorthand ways of referring to functionality to save typing.

'P' as a name is not helpful generally, and moreover it denies any future large project that namespace.  The project is not ambitious enough to warrant a top-level namespace.  There is no real room for sub-modules and so there are real problems with having this as a top-level module.

Proper Namespacing


It's helpful, I think, to look at three different cases for how to address namespacing.  All three of these are modules I maintain or am planning to write.  I believe they follow generally acceptable practices, although I have received some criticism for PGObject being a top-level namespace as well.

  • PGObject is a top-level namespace currently housing three other modules (perhaps ten by next year).  I chose to make it a top-level namespace because it is a framework for making object frameworks, and not a simple module.  While the top-level module is a thin "glue" module, it offers services which go in a number of different directions, defying simple categorization.

    Additionally, the top-level module is largely a framework for building object frameworks, which complicates the categorization further.  In this regard it is more like Moose than like Class::Struct.  Sub-modules include PGObject::Simple (a simple framework using PGObject, not a simple version of PGObject), PGObject::Simple::Role, and PGObject::Type::BigFloat.
  • Mail::RoundTrip is a module which allows web-based applications to request email verification by users.  The module offers only a few pieces of functionality and is not really built for extensibility.  This should not be a top-level module.
  • Device::POS::Printer is a module I have begun to write for point of sale printers, providing a basic interface for printing, controlling cash drawers, getting error messages, etc.  The module is likely to eventually have a large number of sub-modules, such as drivers for various printers, but putting Device:: in front does no real harm and adds clarity.  There's no reason to make it a top-level namespace.

The main point is thinking about how your module will fit into the community, how it will be found, etc.  'P' is a name which suggests these have not been considered.

Exports

The P module exports a single function, P(), which functions like printf and sprintf.  The major reason for this, according to the author, is both to add additional checking and to save typing.  Saving typing is not a worthy goal by itself, though neither is verbosity.  Condensing a function which takes over the roles of two different functions to a single letter, however, is not a recipe for good, readable code.  If others follow suit, you could get code like this:

P(A(R("This is a string", 3)));

Now maybe this code is supposed to print the ASCII representation of "This is a string" repeated three times.  However that is not obvious from the code, leading to code that is hard to read or debug.
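For contrast, here is a minimal sketch of the same idea with descriptive names.  The helpers repeat_string() and ascii_codes() are hypothetical; they are only my guess at what R() and A() might stand for, and are not taken from any real module.

```perl
use strict;
use warnings;

# Hypothetical helpers: the names and behavior are guesses at what
# R() and A() in the one-letter example might mean.

# Return $text repeated $count times.
sub repeat_string {
    my ($text, $count) = @_;
    return $text x $count;
}

# Return the ASCII code of each character, separated by spaces.
sub ascii_codes {
    my ($text) = @_;
    return join ' ', map { ord($_) } split //, $text;
}

# Prints "72 105 72 105 72 105"
print ascii_codes(repeat_string('Hi', 3)), "\n";
```

The call is longer to type, but a reader can tell what it does without opening the documentation for each letter.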

Proper Exports 


In Perl, exports affect the language.  Exports are thus to be used sparingly, as they can lead to conflicts which in turn lead to hard-to-maintain code.  Exports should be rare, well documented, and not terribly subject to name collision.  They should also be descriptive enough that they can be understood without tremendous prior knowledge of the module.  P() as an exported function meets none of these criteria.

A good example of exports done right would be a function like has() used by Moose, Mouse, and Moo.  The function is exported and used to declaratively define object properties.  The convention has become widespread because it is obvious what it does.  Again this does not matter so much for personal projects, but it does for published modules on a community repository.
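As a rough sketch of what sparing exports can look like, the following uses the core Exporter module with @EXPORT_OK, so that nothing enters the caller's namespace unless it is asked for by name.  The module name Text::CheckedFormat, the function names, and the undef-handling behavior are all hypothetical and chosen only for illustration; this is not how P itself works.

```perl
use strict;
use warnings;

# Hypothetical module and function names, for illustration only.
package Text::CheckedFormat;

use Exporter 'import';

# Nothing is exported by default; callers must request each name.
our @EXPORT_OK = qw(checked_sprintf checked_printf);

# Like sprintf, but undefined arguments are rendered as "(undef)"
# (an assumed example of "additional checking").
sub checked_sprintf {
    my ($format, @args) = @_;
    my @safe = map { defined($_) ? $_ : '(undef)' } @args;
    return sprintf($format, @safe);
}

# Print the formatted string; returns whatever print returns.
sub checked_printf {
    my ($format, @args) = @_;
    my $text = checked_sprintf($format, @args);
    return print $text;
}

package main;

# In a separate file this would be written as:
#   use Text::CheckedFormat qw(checked_sprintf);
Text::CheckedFormat->import(qw(checked_sprintf));

# Prints "cart has (undef) items"
my $line = checked_sprintf('%s has %s items', 'cart', undef);
print "$line\n";
```

Because the caller lists what it imports, the origin of checked_sprintf() is visible at the top of the file, and a name this specific is unlikely to collide with anything else.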

Test Failure Report Management


The CPANTesters page for P shows that every version on CPAN has had test failures.  This is unusual.  Most modules have clear passes most of the time.  Even complex modules like DBD::Pg show a general attention to test failures and a relatively good record.  A lack of this attention suggests a lack of interest in community use, and that fixing test failures, which people need in order to use the library, is just not a priority.  So if you manage a module, you really want to take every test failure seriously.


Conclusions


Resources like CPAN, CTAN, PGXN, and the like are subject to one important rule.  Just because something is good for your own personal use does not make it appropriate for publication as a community resource.  Writing something that fits the needs of a specific project, or a specific coder's style, is very different from writing something that helps a wide range of programmers solve a wide range of problems.  These community resources are not places to upload things just because one wrote them.  They are places to interface with the community through work.  Getting pre-review, post-review, and paying attention to the needs of others is critically important.

6 comments:

  1. While I agree with everything you've said here I'd really be more interested in your observations and conclusions regarding the mechanics of CPAN submission protocols and how they could be improved. Whether it is systemic or simply a technology failure I would fault the community moreso than the contributor. The fact this person wanted to contribute back to the community is a good thing; the fact their submission did not conform to some unwritten and/or unenforced standard should not be held against them. Even an experienced contributor is going to have significant bias toward their contribution, so their choices of naming and such are likely going to be naturally different than what the community would choose. To that end what are your suggestions for how one, newbie especially, should go about "Getting pre-review, post-review, and pay(ing) attention to the needs of others...". ISTM that a newcomer is likely to reasonably assume that CPAN would be the vehicle by which one introduces products to said community for their consideration.


    1. I think the inherent problems in community resources like this are generalizable to community problems on every level, and scaling is hard. But you are right about one thing in that the point isn't so much to blame the contributor of this module so much as to highlight a set of problematic patterns. Nonetheless I think that there are some things that are generally community problems and I think this is a productive discussion. I would frame it less about improving process though than improving the interconnectedness of the community.

      The first big issue, to be frank, is this idea that "if you build it, they will come." This leads people to think, "I wrote this and I find it useful so others will too." The problem is that writing interfaces with others in mind is different than writing them in a more narrow framework. This is a cultural issue with open source software in general but it is understood not to be the case implicitly by large, successful projects. It's not enough to build it. One has to design it for others. That's a cultural attitude in our industry that has to be confronted.

      Before I got ready to submit the PGObject modules (there are now 5 of them with two more under development, and in use by some of my non-CPAN projects), I had no idea that Prepan existed. It may not have existed when P was first built.

      As a new CPAN maintainer, though, there are some things that could be improved.

      1. PAUSE would do well to have a maintained list of community resources for things like pre- and post-review.

      2. The PAUSE account setup process would do well to introduce new account members to these resources, where to get help, ask questions, get community feedback, etc.

      3. Maybe there would be some value in a "Pause before you PAUSE" section in the PAUSE signup process? Something to remind new users that writing for the community is different than writing for personal or project use?

      To be honest I am not big on enforcing process. In my experience this kills the valuable distributed nature of open source development. But key reminders that the community is there to help and that there are a large number of resources available would help a great deal. Once you enforce process, the dynamic aspects of the community go out the window. But what is important is to get new users thinking about how other people will use their libraries.

    2. Might be nice if prepan was a forced stage...

  2. You should add your blog to http://ironman.enlightenedperl.org/ and then tag this with ironman or perl so that the Perl community can see it. (note: I'm considering resharing as a blog post)

  3. BTW, what do you think about the other one-letter modules on CPAN? Like L, V, B, or U :-)

  4. I think Stratopan.com could help with CPAN pollution. As a private CPAN, Stratopan gives authors a place to publish modules that aren't ready or appropriate for the public CPAN. Modules can be installed from Stratopan using the standard Perl tool chain, so it feels as natural as installing from the public CPAN.
