Perspectives on LedgerSMB: On CPAN, Community, and P: A Case Study in What Not to Do

I am going to try to write this piece as respectfully as I can. I understand that people put a lot of work into developing things, and when they submit them and get panned, it can be difficult. At the same time, community resources are community resources, and failing to study cases where things have gone amiss has real costs. Without honest feedback people do not improve, and worse, beginners can be left mistakenly believing that bad practices are best practices. There is also a period of time after which bad practices become less easily excused.

So somewhat reluctantly I am going to undertake such a study here. This is solely about community interfacing. I am not going to critique code. Rather, I hope this post helps in understanding some of the problems with community interfaces generally, whether CPAN, PGXN, or others. The lessons apply regardless of language or environment, and the critiques I offer are at a very different level than critiques of code.

So with this, I critique the P CPAN module from a community coding perspective. This module exports a single function called "P" which acts kind of like printf and sprintf. It would be an interesting exercise in learning some deep aspects of Perl but from a community resource perspective, it suffers from enough issues to make it a worthy case study.

The gist of this is that community resources require contemplating how something fits into the community and working with the community in mind. A cool idea, or something one finds useful, is not always a candidate for publishing as a community resource, at least not without modifications aimed at carefully thinking through how it fits into more general processes.

Four of my own CPAN modules are actually rewrites of code that I wrote in other contexts (particularly for LedgerSMB) and rewrote specifically for publication on CPAN. In general there is a huge gulf between writing a module for one project or one developer and writing it for everyone. I believe, looking at P, that it is just something the developer found personally useful and published as is, without thinking through any of these issues. This is all too common, and I hope that going through these issues will keep others from making the same mistakes.

Namespace Issues

The name 'P' is an extraordinarily bad choice of name for a public module. Perl uses nested namespaces, and nesting implies a clear relationship, such as inheritance (though other relationships are possible too).

Taking a top-level namespace is generally discouraged on CPAN where a second or third level namespace will suffice. There are times and places for top-level namespaces, for example for large projects like Moose, Moo, and the like. In general these are brand names for conglomerates of modules, or they are functional categories. They are not shorthand ways of referring to functionality to save typing.

'P' as a name is not helpful generally, and moreover it denies any future large project that namespace. The project is not ambitious enough to warrant a top-level namespace. There is no real room for sub-modules and so there are real problems with having this as a top-level module.

Proper Namespacing

It's helpful, I think, to look at three different cases for how to address namespacing. All three of these are ones I maintain or am planning to write. I believe they follow generally acceptable practices, although I have received some criticism for PGObject being a top-level namespace as well.

PGObject is a top-level namespace currently housing three other modules (perhaps ten by next year). I chose to make it a top-level namespace because it is a framework for making object frameworks, and not a simple module. While the top-level module is a thin "glue" module, it offers services which go in a number of different directions, defying simple categorization.

Additionally, the top-level module is largely a framework for building object frameworks, which complicates the categorization further. In this regard it is more like Moose than like Class::Struct. Sub-modules include PGObject::Simple (a simple framework using PGObject, not a simple version of PGObject), PGObject::Simple::Role, and PGObject::Type::BigFloat.

Mail::RoundTrip is a module which allows web-based applications to request email verification by users. The module offers only a few pieces of functionality and is not really built for extensibility. This should not be a top-level module.

Device::POS::Printer is a module I have begun to write for point of sale printers, providing a basic interface for printing, controlling cash drawers, getting error messages, etc. The module is likely to eventually have a large number of sub-modules (drivers for various printers and so on), but putting Device:: in front does no real harm and adds clarity. There's no reason to make it a top-level namespace.

The main point is thinking about how your module will fit into the community, how it will be found, etc. 'P' is a name which suggests these have not been considered.
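To make the nesting concrete, here is a minimal sketch of how Perl maps a module name to a file path under a library directory: each "::" becomes a directory separator and ".pm" is appended. The helper name module_to_path is purely illustrative and not part of any module discussed here.

```perl
use strict;
use warnings;

# Illustrative helper (hypothetical name): convert a module name to the
# relative path Perl looks for under each directory in @INC.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;   # "::" becomes a directory separator
    return "$path.pm";
}

# Module names taken from the discussion above.
printf "%-24s => %s\n", $_, module_to_path($_)
    for qw(P PGObject PGObject::Simple::Role Mail::RoundTrip Device::POS::Printer);
```

A one-letter top-level name sits at the root of the library tree with nothing around it to say what it does, while a nested name carries its category in the path itself.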

Exports

The P module exports a single function, P(), which functions like printf and sprintf. The major reasons for this, according to the author, are to add additional checking and to save typing. Saving typing is not a worthy goal by itself, though neither is verbosity. Condensing a function which takes over the roles of two different functions into a single letter, however, is not a recipe for good, readable code. If others follow suit, you could get code like this:

P(A(R("This is a string", 3)));

Now maybe this code is supposed to print the ASCII representation of "This is a string" repeated three times. However that is not obvious from the code, leading to code that is hard to read or debug.

Proper Exports

In Perl, exports affect the language. Exports are thus to be used sparingly, as they can lead to conflicts, which in turn lead to hard-to-maintain code. Exports should be rare, well documented, and not terribly subject to name collision. They should also be descriptive enough that they can be understood without tremendous prior knowledge of the module. P() as an exported function meets none of these criteria.

A good example of exports done right would be a function like has() used by Moose, Mouse, and Moo. The function is exported and used to declaratively define object properties. The convention has become widespread because it is obvious what it does. Again this does not matter so much for personal projects, but it does for published modules on a community repository.
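As a rough illustration of these criteria, the sketch below uses the core Exporter module with @EXPORT_OK, so that nothing is exported unless the caller asks for it by name. The module name Text::Format::Checked and the function checked_sprintf are hypothetical, invented for this example; the argument check is a deliberately naive stand-in and is not how P works.

```perl
use strict;
use warnings;

# Hypothetical module, defined inline so the sketch runs as a single file.
# In a real distribution it would live in lib/Text/Format/Checked.pm.
package Text::Format::Checked;

use Exporter 'import';

# Nothing is exported by default; callers must request names explicitly.
# (The BEGIN block is only needed because the package is defined inline.)
our @EXPORT_OK;
BEGIN { @EXPORT_OK = qw(checked_sprintf) }

# Like sprintf, but dies when the number of arguments does not match the
# number of conversions. Simplifying assumption: every "%" other than
# "%%" consumes exactly one argument (no "*" widths, no "%1$s" indexes).
sub checked_sprintf {
    my ($format, @args) = @_;
    (my $stripped = $format) =~ s/%%//g;      # ignore literal percent signs
    my $expected = () = $stripped =~ /%/g;    # count remaining conversions
    die sprintf("format expects %d argument(s), got %d\n",
                $expected, scalar @args)
        unless $expected == @args;
    return sprintf($format, @args);
}

package main;

# Equivalent to: use Text::Format::Checked qw(checked_sprintf);
BEGIN { Text::Format::Checked->import('checked_sprintf') }

my $line = checked_sprintf("%s has %d items\n", "cart", 3);
print $line;
```

The import line documents exactly which name enters the caller's namespace, and the name itself says what the function does, which makes a collision with another module's export unlikely.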

Test Failure Report Management

The CPANTesters page for P shows that every version on CPAN has had test failures. This is unusual. Most modules have clear passes most of the time. Even complex modules like DBD::Pg show a general attention to test failures and a relatively good record. A lack of such attention suggests a lack of interest in community use, and that fixing test failures, which people need in order to use the library, is just not a priority. So if you manage a module, you really want to take every test failure seriously.

Conclusions

Resources like CPAN, CTAN, PGXN, and the like are subject to one important rule. Just because something is good for your own personal use does not make it appropriate for publication as a community resource. Writing something that fits the needs of a specific project, or a specific coder's style, is very different from writing something that helps a wide range of programmers solve a wide range of problems. These community resources are not places to upload things just because one wrote them. They are places to interface with the community through work. Getting pre-review, post-review, and paying attention to the needs of others is critically important.

6 comments:

David J., November 14, 2013 at 9:00 AM
While I agree with everything you've said here I'd really be more interested in your observations and conclusions regarding the mechanics of CPAN submission protocols and how they could be improved. Whether it is systemic or simply a technology failure I would fault the community moreso than the contributor. The fact this person wanted to contribute back to the community is a good thing; the fact their submission did not conform to some unwritten and/or unenforced standard should not be held against them. Even an experienced contributor is going to have significant bias toward their contribution so their choices of naming and such are likely going to be naturally different than what the community would choose. To that end what are your suggestions for how one, newbie especially, should go about "Getting pre-review, post-review, and pay(ing) attention to the needs of others...". ISTM that a newcomer is likely to reasonably assume that CPAN would be the vehicle by which one introduces products to said community for their consideration.

Unknown, November 15, 2013 at 2:03 PM
You should add your blog to http://ironman.enlightenedperl.org/ and then tag this with ironman or perl so that the Perl community can see it. (note: I'm considering resharing as a blog post)

Steven Haryanto, January 14, 2014 at 10:15 AM
BTW, what do you think about the other one-letter modules on CPAN? Like L, V, B, or U :-)

Unknown, January 14, 2014 at 4:50 PM
I think Stratopan.com could help with CPAN pollution. As a private CPAN, Stratopan gives authors a place to publish modules that aren't ready or appropriate for the public CPAN. Modules can be installed from Stratopan using the standard Perl tool chain, so it feels as natural as installing from the public CPAN.

Tuesday, November 12, 2013
