WebOps Team, Pantheon
All Code is Debt
The way people use the phrase “technical debt” implies that good, tested code is debt-free. If you are building a software service, this is wrong; every line of custom code you maintain is debt.
To be competitive, software products must continually improve. You will be burdened, forever, with maintaining all of the custom code you wrote yesterday, rewrote today, and will write tomorrow. To build competitive software, you must weigh this cost when you decide what code gets written and what gets integrated from upstream.
Debt isn’t inherently bad. It’s often cheaper to buy than rent housing. Yet, no one considers mortgages assets, despite the work necessary to get them and what they make possible. Likewise, refactoring code is akin to refinancing. The burden may lighten, but it doesn’t magically move to the other side of the balance sheet.
So, if code is debt, how do we manage it well? Let’s take some cues from credit counseling.
- Understand obligations. Get a basic survey of your code. A tool like cloc or gitstats can give lines-of-code metrics excluding comments. Consider making it a key metric on a wallboard. Think about what's coming down the pipeline in the FOSS world and whether it's going to eclipse the value of what you maintain internally.
- Close charge accounts. This means finding and stamping out the worst sources of new code. A common offender is “not invented here” (NIH) syndrome, which is new code only subtly different from off-the-shelf and open-source options.
- Refinance. For the code you do want to keep around, minimize your burden. Refactor. Write better tests. Automate tests.
- Pay it down. Find areas where you could meet your needs either with an existing off-the-shelf option (which may be new since starting your project) or by contributing to a free, open-source software project in a way that keeps the code from being purely your team’s burden. As in regular debt repayment, it can make sense to start with smaller projects first to establish good habits and see early results.
- Recognize progress. Ensure the work above receives the same recognition as feature and bug fix contributions.
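The "understand obligations" step can start very small. As a rough sketch, and not a substitute for cloc or gitstats, the hypothetical helpers below (`count_code_lines` and `survey` are names made up for this example) count non-blank, non-comment lines in Python sources. The counting is deliberately naive: docstrings and multi-line strings are counted as code.

```python
import os


def count_code_lines(text):
    """Count lines that are neither blank nor pure '#' comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def survey(root):
    """Return {path: code_line_count} for every .py file under root."""
    totals = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    totals[path] = count_code_lines(handle.read())
    return totals
```

Summing `survey(".").values()` on a schedule gives a single number that could go on a wallboard, so the team can see whether the balance is growing or shrinking.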
Like debt in regular life, maintaining good practices isn’t a one-time thing. It’s a sort of hygiene for the way your team codes. Let’s look at some of the ways Pantheon’s paid down debt in our logging infrastructure.
First, we wanted to perform structured logging from our Python applications. Because the systemd journal is the backbone of logging on modern Fedora releases, we wanted to use its support for field-structured entries rather than just sending everything best-effort with UDP and Logstash.
The problem was that systemd only shipped with a C library and a specification of what to send to the journal socket. Python comes with batteries included, but directly calling C functions isn’t one of them. So, we built a module, integrated it upstream into systemd, and ported our applications to it once it was shipping with Fedora. Hundreds of lines of C and Python are now no longer uniquely burdening our developers.
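To give a feel for what "field-structured entries" means on the wire, here is a minimal sketch of the journal's native datagram format in pure Python. This is not the module we contributed upstream, only an illustration; `encode_entry` and `send_entry` are hypothetical names, upper-casing the field names is a choice made for this example, and the socket path is assumed to be the usual default on systemd hosts.

```python
import socket
import struct

# Assumed default location of the journal's datagram socket.
JOURNAL_SOCKET = "/run/systemd/journal/socket"


def encode_entry(fields):
    """Serialize a dict of fields into one native-protocol datagram."""
    chunks = []
    for key, value in fields.items():
        name = key.upper().encode("ascii")
        data = str(value).encode("utf-8")
        if b"\n" in data:
            # Values containing newlines are sent as: name, newline,
            # 64-bit little-endian length, raw data, newline.
            chunks.append(name + b"\n" + struct.pack("<Q", len(data)) + data + b"\n")
        else:
            # Simple values are sent as NAME=value followed by a newline.
            chunks.append(name + b"=" + data + b"\n")
    return b"".join(chunks)


def send_entry(fields, path=JOURNAL_SOCKET):
    """Send one structured entry to the journal over its Unix datagram socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_entry(fields), path)
```

A call such as `send_entry({"MESSAGE": "site created", "PRIORITY": 6, "SITE_ID": "abc123"})` would produce an entry whose fields can later be filtered individually. `SITE_ID` is an invented application field for illustration.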
Two wonderful things also happened in the process. First, our upstream contributions were a catalyst for major improvements in support of log reading and general interaction with systemd from Python. We’ve already ported our journal2gelf daemon to use the new Python APIs instead of wrapping a journalctl subprocess. Second, fail2ban and Fedora’s main SELinux auditing tool now support the journal via the new Python module, integration that helps us as users of those tools. This isn’t an unusual effect; Dan Bricklin wrote about it a decade ago in “The Cornucopia of the Commons.”
Upstream contributions are time-consuming, though. Often, the best answer is switching to the best alternative the community already offers. We recently switched our Chef cookbooks for Logstash, Kibana, and Elasticsearch to community-based ones. Not only did this allow dropping thousands of lines of in-house cookbooks, it also lets us track upstream improvements to support major new versions and updated best practices.
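For readers unfamiliar with what a journal-to-GELF bridge has to do, the core is a field mapping. The sketch below is not the actual journal2gelf code; `journal_to_gelf` is a hypothetical function that assumes a journal entry arrives as a plain dict, and it follows the GELF 1.1 convention of prefixing additional fields with an underscore.

```python
import json


def journal_to_gelf(entry):
    """Convert one journal entry (a dict of fields) into a GELF 1.1 JSON string."""
    message = str(entry.get("MESSAGE", ""))
    gelf = {
        "version": "1.1",
        "host": entry.get("_HOSTNAME", "unknown"),
        "short_message": message.split("\n", 1)[0],
        # Journal priorities use syslog levels, as GELF does; default to "info".
        "level": int(entry.get("PRIORITY", 6)),
    }
    if "\n" in message:
        gelf["full_message"] = message
    for key, value in entry.items():
        if key in ("MESSAGE", "PRIORITY", "_HOSTNAME"):
            continue
        # Additional GELF fields carry a single leading underscore.
        name = "_" + key.lstrip("_").lower()
        if name != "_id":  # GELF reserves _id
            gelf[name] = str(value)
    return json.dumps(gelf, sort_keys=True)
```

The mapping stays this small only because the journal already hands over structured fields; scraping the same data out of journalctl's text output is where the extra code used to live.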
Of course, a lot of code we write isn't something ready or intended for upstream. Sometimes, it's because it's just not worth the effort, at least now, to generalize or rewrite on top of the best FOSS option.
Here's some stuff at Pantheon in this middle category:
- Our wrapper around a couple of Node.js libraries to push events to the dashboard
- Our API for managing containers
- Many of our Chef recipes
- Our tools for managing database replication
The goal for these is to minimize our code to just a little Pantheon glue around popular FOSS projects, whether we're contributing that FOSS code or rebuilding on top of someone else's. In terms of debt, these parts are our auto loans. They allow us to get value now with ongoing cost, but the asset (code) behind them depreciates. The FOSS option is moving faster than we can internally (which even happens for the largest companies). Every day, our implementation declines a bit more in value relative to the most popular open source options. That doesn't mean the FOSS option is better yet, but it's clearly on its way.
Finally, there's the actual "secret sauce." This is code that provides special advantages to Pantheon customers, ensuring that we have something unique to sell (and then fund the tons of FOSS work we do).
Here's the secret sauce:
- The file system (Valhalla): faster, more efficient, more scalable, and more secure by fundamental design than GlusterFS or Ceph
- The API (Yggdrasil) for managing workflow (deployments, MultiDev, cloning the "content base"): FOSS PaaS options like OpenShift are still limited to deploying single stacks without interoperative state
- The edge router (Styx): lazy routing lookup/caching, automatic failover, in-band failure detection, and learning put it far ahead of other options
- The dashboard (Apollo): how we deliver unique capabilities like our workflow tools to developers and customers
In terms of value, this is our real estate. In terms of code maintenance, these are our mortgages. Unlike our "auto loans," the assets behind them can actually appreciate. For each of the examples above, we think we're both ahead of the FOSS offerings and moving faster. If that changes, we'll re-evaluate whether to invest more in our uniqueness or reclassify our implementation as depreciating (and slate it for FOSS replacement).
In short: what your company delivers is the asset, not the code you write to make that happen.