Documenting FOSS Projects
I've had a few thoughts recently on documenting one's code. Nothing original, really; more a synthesis of ideas that are already out there. I'm posting them here primarily as a reference for myself, but hopefully someone else will find this useful.
Audiences
When writing, I try to begin with my audience. I see four distinct audiences for a Free & Open-Source Software (AKA FOSS) project:
People Who Have Just Discovered Your Project & Might Want to Use It
This audience is served by the venerable README. Regardless of the project details, this should be a quick intro for someone who has stumbled upon the project and is asking themselves:
- What is it?
- What can it do for me?
- How do I get it?
Not only does this file serve as the home page for your project on sites like GitHub, it should also be distributed with the package. Per the GNU Coding Standards, "The distribution should contain a file named README with a general overview of the package:
- the name of the package;
- the version number of the package, or refer to where in the package the version can be found;
- a general description of what the package does;
- a reference to the file INSTALL, which should in turn contain an explanation of the installation procedure;
- a brief explanation of any unusual top-level directories or files, or other hints for readers to find their way around the source;
- a reference to the file which contains the copying conditions. The GNU GPL, if used, should be in a file called COPYING. If the GNU LGPL is used, it should be in a file called COPYING.LESSER."
I use a template for my READMEs consisting of:
- Introduction: what is this?
- License: licensing information (not a big deal for me since I release everything under the GPL)
- Pre-requisites: on what, if anything, does this thing depend?
- Installation: how do you install it?
- Status & roadmap: where is this project? Is it going to break anything that uses it soon? Where can I get more information?
Not all of my projects follow this template precisely, but it's a good starting point. For readers who haven't gone away already, I try to include links to user- and developer-oriented documentation (on which more below).
End-Users
This audience consists of users who have committed to your project, installed it, and are now ready to employ it in the service of some end. This body of writing, which I think of as "user docs", should be far more comprehensive than the README, but more than anything else, it should be organized around problems one can solve with your package. For myself, I like to distribute a Texinfo manual with the source distribution. This will compile & install a GNU Info manual on the target host, but I also like to host the Texinfo-generated HTML documents at my personal site for users' convenience. For a code library, I think user docs should include API documentation. Even so, I like the GNU Texinfo approach of organizing user documentation around topics and/or tasks, introducing the relevant functions within that context, and providing an index for reference.
This is distinct from the Doxygen or rustdoc approach of generating documentation from the source code, centered around classes & methods. The documentation for Rust crates available at docs.rs therefore tends toward the style of a reference manual, but even there good documentation authors use the crate and/or module pages to talk about what you can do with the crate (or module), and the rest of the doc "site" functionally behaves like the "function index" in a GNU Texinfo manual. Beyond that, I'm seeing such authors more frequently create "doc-only" modules to explore a topic: for instance Clap's _derive. When working on a Rust project, then, one should remain aware of the Rust tooling ecosystem: users expect to go to docs.rs and install your crate with Cargo. Even if the Clap authors had crafted a beautiful Texinfo manual, it would go wanting for readers because most of their users just said cargo add clap, and docs.rs only hosts rustdoc-generated documentation. They could self-host the Texinfo-generated HTML elsewhere, I suppose, but within the Rust community the convention seems to be to stay on docs.rs.
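To make the "doc-only module" idea concrete, here is a minimal sketch, not taken from Clap or any real crate. The function count_words and the module name _tutorial are hypothetical; the leading underscore simply imitates the naming of the _derive module mentioned above. The module exports nothing and exists only so that rustdoc renders a topic-oriented page next to the generated reference.

```rust
/// Counts the whitespace-separated words in `text`.
///
/// This is the reference-style entry: what the function takes and returns.
/// (Hypothetical function, used only to give the tutorial something to discuss.)
pub fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Tutorial: counting the words in a document.
///
/// This module is documentation-only: it contains no items. rustdoc still
/// renders a page for it, so it can be organized around a task rather than
/// around a type or function.
///
/// 1. Read the document into a `String`.
/// 2. Pass a reference to `count_words`.
/// 3. Runs of spaces, tabs and newlines are each treated as one separator.
pub mod _tutorial {}
```

Running cargo doc on a crate laid out like this should produce a page for _tutorial alongside the entry for count_words, which is roughly the split between "manual" and "function index" described above.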
Users At the Command-Line
If your project includes a CLI, this audience consists of users who are invoking the tool in their shell and need a reminder of options, sub-commands, arguments, and so forth. The tool can generate documentation via flags like --help, -h, and so on, but this is different from the "user documentation" described above in that:
- first, the setting is ill-suited to long-form documentation of general topics
- secondly, if they've gotten this far, the user knows what he or she is doing, but needs assistance with forgotten flags, side-effects, and so on (as an aside, git & scribbu conflate this with man pages; when you say git push --help, the git binary will locate the "git-push" man page and exec() a new process piping it through your pager; see "Man Pages" below)
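The man-page hand-off described in that aside can be sketched roughly as follows. This is an illustration under stated assumptions, not how git or scribbu actually implement it: the helper names (man_page_name, help_request, exec_man) are hypothetical, the page is assumed to be named tool-subcommand, and paging is left to man itself. It is Unix-only because it relies on exec.

```rust
use std::io;
use std::os::unix::process::CommandExt; // provides `exec` (Unix only)
use std::process::Command;

/// Builds the man page name for a sub-command: ("git", "push") -> "git-push".
/// (Assumes the tool-subcommand naming convention.)
pub fn man_page_name(tool: &str, subcommand: &str) -> String {
    format!("{tool}-{subcommand}")
}

/// Given the arguments after the program name, returns the sub-command
/// when the user asked for `<subcommand> ... --help`.
pub fn help_request(args: &[String]) -> Option<&str> {
    let (subcommand, rest) = args.split_first()?;
    if rest.iter().any(|a| a == "--help") {
        Some(subcommand.as_str())
    } else {
        None
    }
}

/// Replaces the current process with `man <page>`.
/// `exec` only returns if the replacement failed, so the return value
/// is always the error explaining why.
pub fn exec_man(page: &str) -> io::Error {
    Command::new("man").arg(page).exec()
}
```

A main function would collect std::env::args().skip(1), call help_request, and on a match hand man_page_name's result to exec_man, printing the returned error if man could not be started.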
Your Fellow Developers
This is the audience I've thought through the least (but then, I've had limited success enticing others to contribute to my projects). It consists of coders who want to hack on the project. I see a few components:
- tooling docs & checklists (e.g. "before committing, run the following linter…")
- one or more theory-of-operations documents; these could be or involve UML or other architectural diagrams
To be honest, I generally keep this private (under ~/doc/projects), but I've used the wiki which GitHub associates with your project for elfeed-score. I've sometimes let this sort of thing bleed into the API docs, e.g. using the Doxygen \page construct to give myself a space dedicated to the design of a module. On the one hand, it's good to keep this with the code, I suppose. On the other, the end-user is unlikely to be interested, while the developer, who is, will likely be looking elsewhere.
It might be better to produce such artifacts with dedicated tools, keep them with the developers' docs, and provide a reference in the code. The thing is, the code and the docs will tend to drift apart. Perhaps some sort of literate programming tool could extract the documentation from the source into a suitable format. Of course, when contributing to other, established projects, you don't always have this option available.
Other Well-Known Documents
Man Pages
The GNU Coding Standards explicitly prefer Texinfo to man: "In the GNU project, man pages are secondary. It is not necessary or expected for every GNU program to have a man page, but some of them do. It’s your choice whether to include a man page in your program."
Rather than selecting one or the other, I see them as having separate audiences. Texinfo is user documentation; a document one reads while sitting down with coffee in order to understand a topic. "Each manual should cover a coherent topic. For example, instead of a manual for diff and a manual for diff3, we have one manual for “comparison of files” which covers both of those programs, as well as cmp. By documenting these programs together, we can make the whole subject clearer" (per the GNU Manuals chapter of the GNU Coding Standards).
Man pages are what I turn to when I need a tip for using a particular tool ("What flag was that, again?"). This ties in nicely with the practice of having the --help flag display a man page.
In other words, I tend to provide both:
- a Texinfo manual along with the package, and the generated HTML docs hosted on my personal site
- man pages installed with the package; sometimes the --help option displays the man page
The NEWS file
Per the GNU Coding Standards, this is "a list of user-visible changes worth mentioning". As a user myself I've found this useful when troubleshooting, so I distribute one along with each of my packages. I really don't do much with it until I roll a release, where my release checklist always includes updating the file.
The ChangeLog File
Per the GNU Coding Standards: "Keep a change log to describe all the changes made to program source files. The purpose of this is so that people investigating bugs in the future will know about the changes that might have introduced the bug." This seems superfluous to me in the age of widespread SCM; and indeed for Autotools-based projects (where I need to provide a ChangeLog), I use git-to-changelog to produce one from the git history. Still, a user who's obtained your project as an Autotools-style source code distribution won't immediately have the commit history available to them, so providing the file seems helpful in that case.
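For a sense of what producing a ChangeLog from git history involves, here is a small sketch. It is not the implementation of the script named above; it assumes the history has already been dumped one commit per line with something like git log --date=short --pretty=format:'%ad%x09%an%x09%ae%x09%s' (date, author, email and subject separated by tabs), and it lays the entries out in the usual GNU style of a dated header followed by tab-indented items. The Commit struct and both function names are hypothetical.

```rust
/// One commit as parsed from a tab-separated `git log` line (hypothetical type).
pub struct Commit {
    pub date: String, // YYYY-MM-DD
    pub author: String,
    pub email: String,
    pub summary: String,
}

/// Parses "date<TAB>author<TAB>email<TAB>summary"; returns None if a field is missing.
pub fn parse_line(line: &str) -> Option<Commit> {
    let mut parts = line.splitn(4, '\t');
    Some(Commit {
        date: parts.next()?.to_string(),
        author: parts.next()?.to_string(),
        email: parts.next()?.to_string(),
        summary: parts.next()?.to_string(),
    })
}

/// Formats commits (newest first, as `git log` lists them) as ChangeLog text.
/// Consecutive commits sharing a date and author are grouped under one header.
pub fn format_changelog(commits: &[Commit]) -> String {
    let mut out = String::new();
    let mut last_header = String::new();
    for c in commits {
        let header = format!("{}  {}  <{}>", c.date, c.author, c.email);
        if header != last_header {
            if !out.is_empty() {
                out.push('\n'); // blank line between groups
            }
            out.push_str(&header);
            out.push_str("\n\n");
            last_header = header;
        }
        out.push_str(&format!("\t* {}\n", c.summary));
    }
    out
}
```

Only the subject line of each commit is used here; a fuller tool would also carry over the commit body and the list of files touched.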
09/05/22 07:25