My Open Source Software Principles

December 27 2024

Introduction

You might have noticed that I contribute a bit to open source projects (for example: brom_drake, trajax, etc.). I've talked about some of my open source work from a technical perspective before (i.e., talking about how things work). In this post, however, I want to take a step back and discuss software from a "300-foot" view.

In this post, I'll talk about what principles guide me when I'm contributing to these projects. In other words: instead of talking about "what" I'm contributing, I'm going to write about "how" I think about contributing.

The Principles

What follows is a list of the concepts that I use when developing open source code. As with any principle/rule, I recommend that you use each principle as a guide and not a law (i.e., don't confuse the map for the territory).

Also, I reserve the right to change my mind on any of these at any time. ;)

For every new feature, there should be a test

This principle often feels hard to follow. You'll probably ask yourself: "Why should I worry about tests when there are so many other features to work on?"

The short answer is: the other features can usually be developed faster when they're built on top of a well-tested foundation.

I can't count the number of times that I attempted to add a new feature Y to a piece of code, but couldn't because a previous feature X was written incorrectly. I would struggle to debug feature Y for hours until I realized that the true problem was actually in a separate feature that I thought was working. Instead of wasting time like I did, just add the test and run it when you think the feature is ready.
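As a minimal sketch of this habit (the function and test below are hypothetical, and I'm assuming a pytest-style suite where test functions are discovered by name), adding a test alongside a new feature can be as small as:

```python
# feature.py (hypothetical): the new feature is a small, testable unit
def clamp(value, low, high):
    """Clamp `value` into the inclusive range [low, high]."""
    if low > high:
        raise ValueError("low must be <= high")
    return max(low, min(value, high))


# test_feature.py (hypothetical): written at the same time as the feature
def test_clamp_inside_range():
    assert clamp(5, 0, 10) == 5


def test_clamp_outside_range():
    assert clamp(-3, 0, 10) == 0
    assert clamp(99, 0, 10) == 10


if __name__ == "__main__":
    test_clamp_inside_range()
    test_clamp_outside_range()
    print("all tests passed")
```

The point isn't the specific function: it's that when a later feature built on top of clamp misbehaves, these few lines let you rule clamp out (or in) as the culprit in seconds.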

This principle was previously much stricter (I used to prefer more test-driven development).

The User Should Use As Little of Your Library as Possible

I intentionally wrote this principle's name to be a bit "clickbait-y". Another name for it could be: Simplify, simplify, simplify.

The intent of this principle is that your open source tool should make it extremely simple to get the result that most users want.

I've seen too many repositories where the author expects you to learn their project-specific conventions before you can understand or use the library's features.

In my experience, most successful open source libraries (for example, wandb, tqdm, Euler) don't require you to learn a whole bunch of library-specific concepts in order to get what you want. They often rely on conventions of the field/industry, or on one-line functions that are descriptive enough to abstract the library away into a single black box.

If I've learned anything from being a user of open source tools and interacting with others in the open source ecosystem, it's that the potential users of your library are often under a time crunch, stressed, and/or learning other programming/library concepts while using yours. Make their lives easier: don't make them "dig in" to learn how to use your code.

An example: you don't need to learn very much about brom_drake to use the drakeify_my_urdf feature. Simply run the function: it changes your URDF and spits out the path to the new URDF file.

Simple examples are worth their weight in gold.

This principle is probably the least controversial, and yet it is one that is frequently messed up in practice.

Anyone who's tried to code, for classes or outside of them, knows how often we use examples to understand a new piece of software. In many cases, I (or friends of mine) have copied small functions verbatim from blogs or StackOverflow posts to see how algorithm X might solve problem Y. Having an example to use as a "starting point" reduces the friction of adopting an open source piece of code, often much more effectively than detailed documentation.

With this in mind, you're probably running to your computer to begin writing some example scripts. Wait!

I'd like to pause for a second to stress the first word in this principle: simple.

This word is somehow forgotten in many of the libraries I have seen. Usually, because it is easy, an author will copy an example of how they themselves use the library. Such examples are easy for the author to produce, but they often include unnecessary things that confuse a potential user (extra jargon, extra files, etc.). In the worst case, the extra libraries used in the example force the potential user to learn other libraries in addition to yours, which can be quite frustrating.

Before copying your own domain-specific code, think about the following questions:

  • Can I put this example in its own, isolated directory (outside of the main library's source code)?
    • In the process of moving your example to an isolated directory, you may find missing imports, hardcoded paths, or other problems that would confuse potential users.
  • What is the simplest version of this example?
    • Can the tool be demonstrated without another external library?
    • Can the code be demonstrated with a smaller input file than the one you've chosen? (e.g., instead of a robot with 7 links, can it be demonstrated on a 1-link robot?)
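To make the first question concrete, an isolated example directory (all names here are hypothetical) might look something like this:

```
my_library/
├── src/my_library/        # the library itself
├── tests/                 # the test suite
└── examples/
    └── minimal_demo.py    # imports only my_library and the standard library
```

If minimal_demo.py can't run from this directory without extra setup, you've just discovered exactly the friction your users would hit.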

Make your tests run regularly

In the spirit of talking about testing, let's talk about one of the most impactful things that I think you can do with your set of tests: Make them run automatically.

This principle is valuable because your code changes over time. New features are added. Old features are removed. Conventions change. It's important that you can certify that the entirety of the code still runs as expected after each change. So, testing only the feature or lines that you're working on is sometimes not enough.

For that reason, you should find a way to trigger a run of your ENTIRE set of tests on a regular basis. There are a number of ways to do this, including:

  • Create a git hook (or the equivalent in your version control system) that runs your test suite whenever you merge a branch, make a commit, or meet some other condition.
  • (My preference) Create a GitHub Action which runs your test suite.
  • Create a personal habit of running the test suite manually every week at a specific time in your morning routine.
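As a sketch of the git-hook option (this assumes a Git repository and a pytest-based suite; adapt the test command to your project), a pre-commit hook is just an executable script placed in .git/hooks:

```shell
#!/bin/sh
# Sketch: install a pre-commit hook that runs the whole test suite
# before every commit. Assumes the repo root as the working directory.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Abort the commit if any test fails.
pytest -q || exit 1
EOF
chmod +x .git/hooks/pre-commit
```

One caveat: hooks live outside version control, so every contributor has to install them locally; tools like pre-commit exist to automate that.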

My personal preference is to use GitHub Actions for this: not only do you get a nice GUI that shows your test results, but you can also integrate metrics like code coverage.
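A minimal workflow along these lines (the file name, install command, and pytest invocation are assumptions; adapt them to your project) could look like:

```yaml
# .github/workflows/tests.yml (hypothetical): run the full suite on
# every push and pull request.
name: tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -e ".[test]"
      - run: pytest
```

Once this file is committed, GitHub runs the entire suite on every change, with no personal discipline required.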

Add Code Coverage, if you can

This principle is about monitoring code coverage on any software project that you're working on. Code coverage is a measurement of how many lines of your code are exercised by the test suite you've defined.

In short, code coverage tools can quickly help you answer a simple question when testing: Does this test trigger X condition in my code?

Code coverage can thus be very useful when debugging. If you think that there is a bug in how one particular code path works, then you can check whether that particular part of the code is "covered" by your tests. If it is, then you can either rule out that part as a potential source of bugs or investigate whether it has any untested consequences.
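To illustrate what coverage tools measure (this is a toy sketch built on Python's standard-library trace module, not a real coverage service like Codecov), you can record which lines a "test" actually executes:

```python
import trace


def classify(x):
    """Toy function with two branches."""
    if x < 0:
        return "negative"  # only covered if some test passes a negative x
    return "non-negative"


# Run a "test" that only exercises the non-negative branch.
tracer = trace.Trace(count=True, trace=False)
result = tracer.runfunc(classify, 5)

# counts maps (filename, line_number) -> times executed; the line for the
# "negative" branch never appears, i.e. it is uncovered by this test.
executed_lines = {lineno for (_, lineno) in tracer.results().counts}
print(result, sorted(executed_lines))
```

Real coverage tools do essentially this across your whole suite, then report the uncovered lines so you can answer "does any test trigger X condition?" at a glance.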

There is an added benefit of using code coverage tools. Code coverage services like Codecov can create a "badge" that displays the current value of the code coverage metric. A potential user who sees this badge on your open source code often views it as a "certification of quality" (if the value is high enough).

Make sure the delightful things happen by default

This principle is actually just a sub-principle of "The User Should Use As Little of Your Library as Possible", but I wanted to take some time to discuss it outside of that principle.

Many of us write software that does amazing things and put it online, but the users of the library don't even know that these features exist! Often, the reason for this is simple: We created a very specific configuration/input that the user must provide to make "the delightful thing" happen.

I would say that this is not helpful for the user. Often, that magic feature you use the most is actually what the user wants (or is very close to it). So, instead of letting them get the wrong impression of your library ("this thing doesn't do what I want!"), always put your best foot forward and leave them feeling impressed ("this library does what I wanted and more!"). It is often better to let the user enjoy the library, get hooked, and then turn off the features they don't use as much. Prefer that to the user getting an underwhelming first experience and never using the software again (especially when the software could actually solve their problems).
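A sketch of this idea in code (everything here is hypothetical): make the delightful behavior opt-out rather than opt-in, so a first-time user calling the obvious thing gets the full experience:

```python
# Hypothetical API: the feature most users want is on by default.
def summarize(numbers, include_stats=True):
    """Return a summary string; the "delightful" stats are opt-out, not opt-in."""
    summary = f"{len(numbers)} values"
    if include_stats:
        summary += f", min={min(numbers)}, max={max(numbers)}"
    return summary


# A new user calling the obvious thing gets the full experience...
print(summarize([3, 1, 4]))  # -> "3 values, min=1, max=4"
# ...and power users can opt out once they're hooked.
print(summarize([3, 1, 4], include_stats=False))  # -> "3 values"
```

Compare that with `include_stats=False` as the default: the user who never reads the docs would conclude the library "doesn't do what I want" and move on.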

The Caveats/Addendums

I'm still fairly new to open source software and, besides following the Twitter accounts of open source developers and reading the book "Working in Public", I have never formally studied the subject in detail.

As usual, if you have any thoughts, then feel free to reach out to me via email!