In the next branch of Haskell Platform we’ll be adding and removing packages from the specification for the first time. The Haskell Platform steering committee will make recommendations for additions and removals based on individual proposals to add and remove packages from the list.
It is hard to come up with “notability” criteria for why a package should be added or removed. People use the Haskell Platform for many competing reasons, and they need different packages.
The goal, though, should be an almost fully automated set of criteria for determining when a package should be added, based on objective data. Then, combined with strategic and other concerns, packages will be added or, sometimes, removed.
Possible Criteria for Notability
A quick list of possible criteria by which to evaluate whether a package is “blessed”:
- How popular is the package in Hackage downloads?
- How many packages depend on it?
- Do any applications of note depend on it?
- Does it meet a stated end-user need?
- Do similar systems include such a library (e.g. Python)?
- Is it portable?
- Does it add additional C libraries?
- Does it follow the package versioning system?
- Is the code of good quality?
- Does it have a good development history?
- Is it on hackage?
- Does it provide haddock documentation?
- Does it come with examples?
- Does it have a test suite?
- Does it have a maintainer?
- Does it in turn require new Haskell dependencies?
- Does it have a simple/configure-based Cabal build?
- Does it conflict/compete with existing functionality?
- Does it reuse existing types?
- Does it follow the hierarchical naming conventions?
- Is it -Wall clean?
- Does it have declared correctness or performance statements?
- Is it BSD licensed?
- Is it thread-safe?
A Point System
One way of determining notability for a package would be to use a points system against an agreed-upon set of such criteria.
Does anyone know of similar examples, or would like to code up some programs to experiment with these ratings?
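As a starting point for such experiments, here is a minimal sketch of a points system in Python. The criterion names, the weights, and the two example packages are all hypothetical, chosen only to show the mechanics; an agreed-upon set of criteria and weights would have to replace them.

```python
# Hypothetical weights for a handful of the criteria listed above.
# Both the selection and the numbers are assumptions for illustration.
WEIGHTS = {
    "on_hackage": 1,
    "has_maintainer": 3,
    "has_haddock_docs": 2,
    "has_test_suite": 2,
    "follows_versioning_policy": 2,
    "bsd_licensed": 1,
    "wall_clean": 1,
}


def score(answers):
    """Sum the weights of every criterion the package satisfies.

    `answers` maps criterion name -> bool; missing criteria count as unmet.
    """
    return sum(w for name, w in WEIGHTS.items() if answers.get(name, False))


def rank(packages):
    """Return (name, score) pairs, highest score first, ties broken by name."""
    scored = [(name, score(answers)) for name, answers in packages.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Made-up answers for two made-up packages.
EXAMPLE = {
    "pkg-a": {"on_hackage": True, "has_maintainer": True, "has_test_suite": True},
    "pkg-b": {"on_hackage": True, "has_haddock_docs": True},
}

if __name__ == "__main__":
    total = sum(WEIGHTS.values())
    for name, points in rank(EXAMPLE):
        print(f"{name}: {points}/{total}")
```

Keeping the weights in a single table makes it easy to argue about them separately from the data, which is probably where most of the disagreement would be.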
Distro Page Rank
Another source of raw data may well be a sort of “Page Rank” across Unix distros for how often a package is used. On the Arch Linux distribution, we have three levels of support for Haskell. In the core system some Haskell apps and tools are provided in binary form. In the “community” binary repo there are yet more packages. Finally, in the user-contributed repository are around 1300 other packages (~90% of Hackage).
Does your distro have popularity statistics? Could you determine the top 100 Haskell packages by vote?
Most Popular Packages in Arch Linux
Some users install packages with the ‘yaourt’ tool, and some of those users opt in to voting when they install. Here are the top 100 packages sorted by votes in Arch Linux, with those already in the Haskell Platform marked YES:
| HP | Repository | Category | Library/Program | Votes | Synopsis | Notes |
|---|---|---|---|---|---|---|
| | Extra | | darcs | | Decentralized replacement for CVS with roots in quantum mechanics | |
| | Extra | | haskell-extensible-exceptions | | Extensible exceptions | darcs dep |
| | Extra | | haskell-hashed-storage | | Hashed file storage support code. | darcs dep |
| | Extra | | haskell-haskeline | | A command-line interface for user input, written in Haskell. | darcs dep |
| | Extra | | haskell-mmap | | Memory mapped files for POSIX and Windows | darcs dep |
| | Extra | | haskell-terminfo | | Haskell bindings to the terminfo library. | darcs dep |
| | Extra | | haskell-utf8-string | | Support for reading and writing UTF8 Strings | darcs dep |
| YES | Extra | | ghc | | The Glasgow Haskell Compiler | |
| | Extra | | hugs98 | | Haskell 98 interpreter | |
| YES | Extra | | happy | | The Parser Generator for Haskell | |
| YES | Community | | alex | | a lexical analyser generator for Haskell | |
| | Community | | gtk2hs | | A GTK+2 binding for Haskell | |
| YES | Community | | haskell-http | | A library for client-side HTTP | cabal dep |
| YES | Community | | cabal-install | | The command-line interface for Cabal and Hackage. | |
| | Community | | haskell-x11 | | A Haskell binding to the X11 graphics library. | xmonad dep |
| | Community | | haskell-x11-xft | | Bindings to the Xft, X Free Type interface library, and some Xrender parts | xmonad dep |
| YES | Community | | haskell-zlib | | Compression and decompression in the gzip and zlib formats | cabal dep |
| | Community | | pandoc | | Haskell library and program to convert one markup format to another | |
| | Community | | xmonad | | A lightweight X11 tiled window manager written in Haskell | |
| | Community | | xmonad-contrib | | Add-ons for xmonad | xmonad dep |
| | | lib | haskell-binary 0.5.0.1-1 | 98 | Binary serialisation for Haskell values using lazy ByteStrings | |
| YES | | lib | haskell-opengl 2.2.1.1-1 | 56 | A binding for the OpenGL graphics system | |
| | | lib | haskell-hslogger 1.0.7-2 | 51 | Versatile logging framework | |
| | | lib | haskell-puremd5 1.0.0.0-1 | 48 | MD5 implementations that should become part of a ByteString Crypto package. | |
| YES | | lib | haskell-syb 0.1.0.0-1 | 48 | Scrap Your Boilerplate | |
| YES | | devel | haddock 2.4.2-1 | 46 | A documentation-generation tool for Haskell libraries | |
| | | devel | haskell-xft 0.2-2 | 46 | Bindings to the Xft library, and some Xrender parts | |
| | | lib | haskell-ghc-paths 0.1.0.5-1 | 45 | Knowledge of GHC’s installation directories | |
| | | lib | haskell-haxml 1.13.3-1 | 42 | Utilities for manipulating XML documents | |
| | | lib | haskell-missingh 1.1.0-1 | 40 | Large utility library | |
| | | lib | haskell-testpack 1.0.2-1 | 36 | Test Utililty Pack for HUnit and QuickCheck | |
| YES | | lib | haskell-time 1.1.2.4-1 | 36 | A time library | |
| | | lib | haskell-uniplate 1.2.0.3-1 | 36 | Uniform type generic traversals. | |
| | | lib | haskell-diff 0.1.2-1 | 35 | O(ND) diff algorithm in haskell. | |
| YES | | lib | haskell-mtl 1.1.0.2-1 | 35 | Monad transformer library | |
| YES | | lib | haskell-regex-base 0.93.1-1 | 33 | Replaces/Enhances Text.Regex | |
| YES | | lib | haskell-parsec 3.0.0-1 | 32 | Monadic parser combinators | |
| | | devel | cpphs 1.7-1 | 31 | A liberalised re-implementation of cpp, the C pre-processor. | |
| | | lib | haskell-curl 1.3.5-1 | 31 | Haskell binding to libcurl | |
| | | lib | haskell-hinotify 0.2-1 | 31 | Haskell binding to INotify | |
| | | lib | haskell-transformers 0.1.4.0-1 | 31 | Concrete monad transformers | |
| | | lib | haskell-unix-compat 0.1.2.1-1 | 31 | Portable POSIX-compatibility layer. | |
| | | devel | cabal2arch 0.5.3-1 | 30 | Create Arch Linux packages from Cabal packages | |
| | | lib | haskell-fingertree 0.0.1.0-1 | 30 | Generic finger-tree structure, with example instances | |
| | | lib | haskell-haskell-src-exts 1.0.1-1 | 30 | Manipulating Haskell source: abstract syntax, lexer, parser, and pretty-printer | |
| YES | | lib | haskell-glut 2.1.1.2-1 | 29 | A binding for the OpenGL Utility Toolkit | |
| | | lib | haskell-pcre-light 0.3.1-2 | 29 | A small, efficient and portable regex library for Perl 5 compatible regular expressions | |
| | | lib | haskell-rosezipper 0.1-1 | 29 | Generic zipper implementation for Data.Tree | |
| | | devel | hscolour 1.13-1 | 28 | Colourise Haskell code. | |
| | | lib | haskell-data-accessor 0.2.0.2-1 | 26 | Utilities for accessing and manipulating fields of records | |
| | | lib | haskell-data-accessor-template 0.2.1.1-1 | 26 | Utilities for accessing and manipulating fields of records | |
| | | lib | haskell-regex-tdfa 1.1.2-2 | 26 | Replaces/Enhances Text.Regex | |
| | | lib | haskell-xml 1.3.4-1 | 26 | A simple XML library. | |
| | | lib | haskell-hsh 2.0.2-1 | 25 | Library to mix shell scripting with Haskell programs | |
| | | lib | haskell-split 0.1.1-1 | 25 | Combinator library for splitting lists. | |
| | | lib | haskell-utility-ht 0.0.5.1-1 | 25 | Various small helper functions for Lists, Maybes, Tuples, Functions | |
| | | lib | haskell-vty 3.1.8.4-1 | 25 | A simple terminal access library | |
| | | lib | haskell-syb-with-class 0.5.1-1 | 24 | Scrap Your Boilerplate With Class | |
| YES | | lib | haskell-cgi 3001.1.7.1-1 | 23 | A library for writing CGI programs | |
| YES | | lib | haskell-fgl 5.4.2.2-1 | 23 | Martin Erwig’s Functional Graph Library | |
| | | devel | derive 0.1.4-1 | 22 | A program and library to derive instances for data types | |
| | | lib | haskell-monads-fd 0.0.0.1-1 | 21 | Monad classes, using functional dependencies | |
| | | devel | haskell-pandoc 1.2.1-1 | 21 | Conversion between markup formats | |
| | | lib | haskell-safe 0.2-1 | 21 | Library for safe (pattern match free) functions | |
| | | lib | haskell-zip-archive 0.1.1.3-1 | 21 | Library for creating and modifying zip archives. | |
| YES | | lib | haskell-bytestring 0.9.1.4-1 | 20 | Fast, packed, strict and lazy byte arrays with a list interface | |
| | | lib | haskell-configfile 1.0.4-2 | 20 | Configuration file reading & writing | |
| | | lib | haskell-data-accessor-monads-fd 0.2-1 | 20 | Use Accessor to access state in monads-fd State monad class | |
| | | lib | haskell-hstringtemplate 0.6-1 | 20 | StringTemplate implementation in Haskell. | |
| | | lib | haskell-pointedlist 0.3.5-1 | 20 | A zipper-like comonad which works as a list, tracking a position. | |
| YES | | lib | haskell-quickcheck 2.1.0.1-2 | 20 | Automatic testing of Haskell programs | |
| | | lib | haskell-convertible 1.0.5-1 | 19 | Typeclasses and instances for converting between types | |
| | | lib | haskell-digest 0.0.0.6-1 | 19 | Various cryptographic hashes for bytestrings; CRC32 and Adler32 for now. | |
| | | lib | haskell-hdbc 2.1.1-1 | 19 | Haskell Database Connectivity | |
| | | network | twidge 0.99.3-1 | 19 | Unix Command-Line Twitter and Identica Client | |
| | | lib | haskell-hspread 0.3.3-1 | 18 | A client library for the spread toolkit | |
| | | lib | haskell-readline 1.0.1.0-1 | 17 | An interface to the GNU readline library | |
| | | lib | haskell-strict 0.3.2-2 | 17 | Strict data types and String IO. | |
| | | lib | haskell-happs-util 0.9.3-1 | 16 | Web framework | |
| | | devel | hoogle 4.0.7-1 | 16 | Haskell API Search | |
| | | editors | yi 0.6.1-1 | 16 | The Haskell-Scriptable Editor | |
| | | lib | haskell-findbin 0.0.2-1 | 15 | Locate directory of original program | |
| | | lib | haskell-glfw 0.3-1 | 15 | A binding for GLFW, An OpenGL Framework | |
| | | lib | haskell-json 0.4.3-1 | 15 | Support for serialising Haskell to and from JSON | |
| YES | | lib | haskell-network 2.2.1.4-1 | 15 | Networking-related facilities | |
| | | lib | haskell-stream 0.3.2-1 | 15 | A library for manipulating infinite lists. | |
| | | lib | haskell-tagsoup 0.6-2 | 15 | Parsing and extracting information from (possibly malformed) HTML documents | |
| YES | | lib | haskell-editline 0.2.1.0-2 | 14 | Bindings to the editline library (libedit). | |
| | | lib | haskell-sdl 0.5.5-1 | 14 | Binding to libSDL | |
| | | editors | leksah 0.6.1-1 | 14 | Haskell IDE written in Haskell | |
| | | devel | c2hs 0.16.0-1 | 13 | C->Haskell FFI tool that gives some cross-language type safety | |
| | | lib | haskell-hsx 0.5.6-1 | 13 | HSX (Haskell Source with XML) allows literal XML syntax to be used in Haskell source code. | |
| | | devel | hlint 1.6.4-1 | 13 | Source code suggestions | |
| | | lib | haskell-crypto 4.2.0-1 | 12 | Collects together existing Haskell cryptographic functions into a package | |
| | | lib | haskell-hdbc-sqlite3 2.1.0.2-1 | 12 | Sqlite v3 driver for HDBC | |
| | | lib | haskell-highlighting-kate 0.2.4-1 | 12 | Syntax highlighting | |
| | | lib | haskell-hjavascript 0.4.4-1 | 12 | HJavaScript is an abstract syntax for a typed subset of JavaScript. | |
| | | lib | haskell-hjscript 0.4.4-1 | 12 | HJScript is a Haskell EDSL for writing JavaScript programs. | |
| | | devel | mkcabal 0.4.2-2 | 12 | Generate cabal files for a Haskell project | |
| | | lib | haskell-arrows 0.4.1.1-1 | 11 | Arrow classes and transformers | |
| | | lib | haskell-filemanip 0.3.2-1 | 11 | Expressive file and directory manipulation for Haskell. | |
| | | lib | haskell-happs-data 0.9.3-1 | 11 | HAppS data manipulation libraries | |
| | | lib | haskell-happs-ixset 0.9.3-1 | 11 | | |
| | | lib | haskell-happs-state 0.9.3-1 | 11 | Event-based distributed state. | |
| | | lib | haskell-harp 0.4-1 | 11 | HaRP allows pattern-matching with regular expressions | |
| | | lib | haskell-lazysmallcheck 0.3-2 | 11 | A library for demand-driven testing of Haskell programs | |
| | | lib | haskell-typecompose 0.6.4-1 | 11 | Type composition classes & instances | |
| | | lib | haskell-dataenc 0.13.0.0-1 | 10 | Data encoding library | |
| | | lib | haskell-happstack-util 0.3.2-1 | 10 | Web framework | |
| | | lib | haskell-hxt 8.3.1-1 | 10 | A collection of tools for processing XML with Haskell. | |
| | | lib | haskell-maybet 0.1.2-1 | 10 | MaybeT monad transformer | |
| | | lib | haskell-platform 2009.2.0.2-1 | 10 | The Haskell Platform | |
| | | office | pdf2line 0.0.1-1 | 10 | Simple command-line utility to convert PDF into text | |
| | | lib | haskell-category-extras 0.53.5-1 | 9 | Various modules and constructs inspired by category theory | |
| | | lib | haskell-colour 2.2.1-1 | 9 | A model for human colour/color perception | |
| | | lib | haskell-datetime 0.1-1 | 9 | Utilities to make Data.Time.* easier to use. | |
| | | lib | haskell-happs-server 0.9.3-1 | 9 | Web related tools and services. | |
Now, one of the other constraints on the Haskell Platform is sustainable growth. We can’t add 1000 packages tomorrow and hope to maintain quality. Instead, something like 10-20% growth per release cycle seems plausible. This would mean adding 4 to 9 new packages.
If we were to judge only on download popularity, our first 5 new packages would be:
- haskell-extensible-exceptions
- haskell-hashed-storage
- haskell-haskeline
- haskell-mmap
- haskell-terminfo
These rank highly merely because one killer app, darcs, depends on them, and so they are widely built (they may also fail to satisfy many of the other criteria noted above).
If we ignore those packages popular for being dependencies, we get a different top 5:
Now we’re getting there. pandoc is both a library and a popular app, so we might treat it specially. gtk2hs is very popular, but not cabalised, so we might also set that aside, leaving (and I’ll ignore ghc-paths as it is used by ghc):
Which is starting to look like a plausible list. In turn however, you can find fault with all these packages in various dimensions (utf8-string may be obsoleted by Data.Text, haxml is LGPL licensed).
Coming up with an obvious list is non-trivial!
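The two mechanical filters used above (skip anything already in the platform, then skip anything that is popular mainly as a dependency) can be sketched in a few lines of Python. The rows below are a hand-copied excerpt of library rows from the table, kept in table order. Treating any Notes entry ending in "dep" as "popular only as a dependency" is a simplifying assumption, and the output is not meant to reproduce the hand-picked lists discussed above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    in_platform: bool  # the "HP" column
    note: str          # the "Notes" column, e.g. "darcs dep"


# Excerpt of library rows from the table, in table order (most popular first).
ROWS = [
    Row("haskell-extensible-exceptions", False, "darcs dep"),
    Row("haskell-hashed-storage", False, "darcs dep"),
    Row("haskell-haskeline", False, "darcs dep"),
    Row("haskell-mmap", False, "darcs dep"),
    Row("haskell-terminfo", False, "darcs dep"),
    Row("haskell-utf8-string", False, "darcs dep"),
    Row("haskell-http", True, "cabal dep"),
    Row("haskell-x11", False, "xmonad dep"),
    Row("haskell-zlib", True, "cabal dep"),
    Row("haskell-binary", False, ""),
    Row("haskell-opengl", True, ""),
    Row("haskell-hslogger", False, ""),
    Row("haskell-puremd5", False, ""),
    Row("haskell-syb", True, ""),
]


def candidates(rows, skip_dependencies=False, limit=5):
    """Return up to `limit` package names not already in the platform.

    Rows are assumed to be pre-sorted by popularity. With
    `skip_dependencies` set, rows whose note ends in "dep" are dropped too.
    """
    picked = []
    for row in rows:
        if row.in_platform:
            continue
        if skip_dependencies and row.note.endswith("dep"):
            continue
        picked.append(row.name)
    return picked[:limit]


if __name__ == "__main__":
    print(candidates(ROWS))
    print(candidates(ROWS, skip_dependencies=True))
```

Neither filter knows anything about licensing, overlap with Data.Text, or build systems, which is exactly why the mechanical result still needs a human pass.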
Finally, this is clearly only one very small data set, which should only have a small influence. If we step over and look at the Hackage download statistics, sorted by popularity, our top 5 new packages would be:
Popularity by Category
If instead we thought that having a comprehensive library set was the key goal, we might choose to include libraries by category, no matter how popular they are in the global list. This would yield, according to Hackage:
- Database: haskell-hdbc
- XML: haskell-haxml
- Binary: haskell-binary
- Text: haskell-utf8-string
- 2D Graphics: haskell-sdl
- Numerics: haskell-hmatrix
For example.
What Is The Decision Model?
So how do we decide what goes in? One model would be:
- Have people propose packages
- Sort them by category need
- Identify the top rank package in each category using a points system or page rank
- Add or remove packages based on this?
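The model above could be prototyped roughly as follows. This is only a sketch: the proposals, their categories, and their scores are made up for illustration, the score is assumed to come from some points system or rank computed elsewhere, and the growth cap borrows the 10-20% figure mentioned earlier as a parameter.

```python
def best_per_category(proposals):
    """Map each category to its highest-scoring proposed package.

    `proposals` is a list of (package, category, score) tuples. Ties are
    broken in favour of the alphabetically first package name.
    """
    best = {}
    for package, category, score in proposals:
        current = best.get(category)
        if current is None or (-score, package) < (-current[1], current[0]):
            best[category] = (package, score)
    return best


def select(proposals, platform_size, max_growth_percent=20):
    """Pick category winners, strongest first, capped by a growth budget.

    The budget is platform_size * max_growth_percent / 100, rounded down.
    """
    budget = platform_size * max_growth_percent // 100
    winners = sorted(
        best_per_category(proposals).items(),
        key=lambda item: (-item[1][1], item[1][0]),
    )
    return [(category, package) for category, (package, _score) in winners[:budget]]


# Hypothetical proposals: (package, category, score). Scores are invented.
PROPOSALS = [
    ("haskell-hdbc", "Database", 7),
    ("haskell-haxml", "XML", 6),
    ("haskell-hxt", "XML", 5),
    ("haskell-xml", "XML", 6),
    ("haskell-binary", "Binary", 9),
]

if __name__ == "__main__":
    # A platform of 15 packages is an arbitrary example size.
    for category, package in select(PROPOSALS, platform_size=15):
        print(f"{category}: {package}")
```

Removal is not modelled here; one option would be to run the same ranking over packages already in the platform and flag any that lose their category.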
What do you think? What is a good way to decide when a package is sufficiently notable to add to the Haskell Platform?
What criteria would you use to determine when a package is blessed?
A “best in category” seems simple. But how do we define which of the competing packages is best? Well, IMO, performance is very important, so each package would have to be run through a typical user scenario benchmark, or something. Next, functionality: compare the capabilities of both, including their future direction. Lastly, how well are they written? If it looks like messy code that is hard to maintain and extend then possibly performance and functionality don’t matter as much. I agree, it’s not a simple process. But it would be great (for Haskell in general) to have a highly elegant, fast, and rich platform to build upon without having to hunt down the good stuff.
Why so much BSD?
If it is a library LGPL should be safe.
Remember that GHC statically links libraries, which removes many of the reasons to use an LGPL license.
“Library longevity”, while not a common issue, is worth considering. For example: SHA, pureMD5, RSA, etc. should all disappear in the long run and be replaced with an improved Crypto library.
Don, I don’t think you should have a point system or anything of the sort. While community input should weigh into the decisions, I think you and the other platform maintainers should feel authorized to use your own subjective judgement as appropriate.
What I think matters the most about something in the platform is the quality of the code. I was flattered when you once encouraged me to contribute stuff to Hackage, but at the same time that told me that Hackage is populated by code written by relatively unwashed implementers like me, and therefore I don’t want to use Hackage code for anything serious (except for some specific packages that I know are well-regarded). Platform code should be:
1) reviewed by experts, who wouldn’t be expected to take responsibility for it themselves, but who would at least make sure that the implementation is reasonably complete and reasonably sane; and
2) well-documented, which not much of it is right now;
3) One significant priority should be on having enough functionality in the platform to enable easy and well-performing implementation of anything that can be done in competing languages like Python or Java.
Example of #3: bos’s Text module makes it possible for the first time to write competitive Unicode text-processing applications. So it should be treated as important.
I agree with the idea of an eventual crypto module, but functions like sha are important enough to include immediately. The module contents can later be replaced with calls to the TBD crypto module so that the existing API will keep working.
Also, a code signing capability should be added to cabal, similar to signatures on .jar files.
There should also be some refactoring of stuff between packages. For example, the MaybeT module should go away, and MaybeT should be included in Data.Maybe or mtl or wherever the standard place for such things is.
It may also be that there simply don’t exist as many as 4 libraries that are currently high-quality enough to go into the Platform, depending on how picky we want to be.
haddock2009, Note that 4 packages is 0.27% of Hackage :)
If it’s decided that a package should go in *but* (i.e. but it needs better test coverage, or but it needs to clean up its dependencies, etc.), then it seems that the right thing would be to make it a point to encourage package maintainers to bring it up to par. Additionally, I suspect package maintainers might not *want* to be blessed at a given stage. E.g., haxml might want to wait on resolving the two-version issue, etc.?
Data.Binary seems to be a gimme, but on the other hand, something like HDBC is more complicated — even though it should be the standard, I think, it needs a backend to do anything, and the backends in turn need to have libraries installed to bind to, which raises the question of why install the frontend if it can’t do anything on its own?
Also it would be particularly nice to bless hslogger, and a few other related packages designed for parsing config files and command line options, etc — the sort of basic machinery to help users get up and running quickly. I don’t know the space well, but I’m sure there’s a few people who can put the right thought and discussion into it.
..so it looks like any package will only be accepted if it has a maintainer who is willing to maintain it for Haskell Platform. Generally an existing maintainer would take on the rather light responsibilities (stick to PVP, make announcements when there are new releases, anything else?)… but if the maintainer thinks it’s not ready for HP, then it’s probably not, but theoretically someone could sort of fork it and maintain it for HP.
maybe that’s a sketch of the relationship we need to have with maintainers?
well, except some of the core packages are just maintained by “libraries@” which means they make decisions and various people do the busywork of applying patches when appropriate. And often fix them in regards to GHC changes also
Ubuntu and Debian have package statistics for users who have installed the popularity-contest package (Ubuntu, at least, offers to install it when setting up). Statistics are available at http://popcon.debian.org and http://popcon.ubuntu.com.
On these pages “inst” means number of installations, “vote” means number of installations *which have been used in the past n days* (I think n=30).
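A rough sketch of how such statistics could be turned into a “top Haskell packages by vote” list, in Python. It assumes a whitespace-separated text dump whose first four columns are rank, package name, inst, and vote, and it assumes Haskell libraries can be recognised by a `libghc` name prefix; both are assumptions to check against the real pages. The sample numbers and package rows are invented.

```python
def haskell_by_vote(lines, prefix="libghc", top=100):
    """Return (name, inst, vote) tuples for matching packages, by vote.

    Lines that do not start with a numeric rank (comments, headers,
    separators) are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        name, inst, vote = fields[1], fields[2], fields[3]
        if not (inst.isdigit() and vote.isdigit()):
            continue
        if name.startswith(prefix):
            rows.append((name, int(inst), int(vote)))
    # Highest vote first; ties broken by package name.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows[:top]


# Invented sample in the assumed layout.
SAMPLE = """\
#rank name inst vote old recent no-files (maintainer)
1 dpkg 1000 900 50 40 10 (Example Maintainer)
2 libghc6-mtl-dev 120 30 60 20 10 (Example Maintainer)
3 libghc6-parsec-dev 100 45 30 20 5 (Example Maintainer)
4 ghc6 300 150 100 40 10 (Example Maintainer)
-----
"""

if __name__ == "__main__":
    for name, inst, vote in haskell_by_vote(SAMPLE.splitlines()):
        print(f"{name}: inst={inst} vote={vote}")
```

Sorting on vote rather than inst follows the distinction drawn above: a package that is installed but never used says little about notability.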
What you folks are doing with the Haskell Platform is absolutely awesome.
I fully agree with solrize that a set of “benevolent dictators for life” taking input from the community would trump any point system.
Haskell is an absolutely awesome language but it strikes me that rounding out the full functionality for a “batteries included” set of “blessed” libraries to offer something competitive with Python and/or Java or .Net languages is what’s really needed (solrize’s point 3).