Invariant 4: Do What You Said You’d Do

In Invariant 3: Don’t Build Without a Plan, we wrote a lot of code. In fact, we wrote all the code. Because we had a strategy and a spec that included the requirement that no bugs were allowed in the code base, we’ve really covered all our bases and are ready to ship and move on with our lives. The only issue is that I’m a bit of a stickler and insist that before we move on with our lives, we actually confirm that we did all the things we set out to do. Believe me, I can already hear your criticisms: I have an AI pair programming with me, and it never makes mistakes, and it certainly would never confidently write nonsense code, and I’m a 10x’er, how dare you insinuate my work needs checking. Please, bear with me, even if all we’re doing is confirming whether our teammates who contributed code were scoundrels who ignored that no-bugs requirement.

Invariant 4: Do What You Said You’d Do lines up with the Quality Assurance atomic unit. The one that states: Make sure that what’s been built both satisfies the requirements and also works well within the existing system without unintended consequences. Show the software to people. Quality Assurance, or QA, can mean a lot of things and be performed by different people on your development team. We’re going to keep things broad for the purposes of this article and assume by QA that we mean the practice of checking product/feature updates in a controlled environment. Further, we’ll assume that the primary function of the QA checks is to confirm that the requirements written in the spec have been met, and that no regressions to unrelated functionality have occurred. Lastly, we’ll walk through some examples where there are dedicated resources for QA, and others where it falls to PMs, engineers, or other members of the software development team. Invariant 4 is titled the way it is because it’s the only invariant whose primary function is to hold accountable all the people who worked on the previous phases. This is also where software teams get to set their standard of excellence.

If you were building web apps in 2006-2015, you really could move fast and break things without damaging customer confidence, because most products weren’t very good. We were barely a decade into mass adoption of the internet, business models were finally working, and there was a proverbial land grab underway. Companies that worked toward high standards were often beaten to the punch, and their prospective customers weren’t punishing competitors who got to market sooner with sloppy products. Even as late as 2017, the preeminent builders of that time were fetishizing this mentality of getting to market with a strong product, or preferably a mildly embarrassing one, and limited attention to quality.

This is decidedly not the current landscape for web/mobile apps, or really any software, in 2024. Unless there is a major technological breakthrough with a product, as was the case with OpenAI’s ChatGPT, products are generally launched in highly polished states with advanced functionality, revenue potential, and aesthetic design. Doing What You Said You’d Do may functionally be QA in many ways, but it’s also a recognition that the days of going to market with buggy software and forgiving customers are largely gone.

Like many of the Invariants, Do What You Said You’d Do is important outside of just the QA phase. It’s even important outside the product development process altogether. When I used to interview for jobs and someone gave me a version of the “What are your greatest strengths?” or “Why are you the right person for this job?” question, I had a well-practiced answer: you’ll never be up late at night worrying, “I wonder if Will’s going to come through on X priority.” You won’t do that, because I do what I said I would do. This answer works well because it signals professionalism, which is what Invariant 4 also signals to a software development team. We do what we say, to a specific standard. This is what professional teams do (and mind you, I realize professionalism is expensive; we’ll get to that later). So…

We said what we were going to do in Strategy and Discovery, then we supposedly did those things in Build. Now it’s time to prove that we know what we’re doing, that our opinionated way of building software is paying off, and that our system actually works. That isn’t to suggest that nothing will go wrong prior to QA, or that if something does, the system or someone has failed. Things will go wrong. However, we need a sense of scale to give meaning to how this accountability phase should be adjudicated.

Sense of scale

When I say we need a sense of scale, what I mean is that we should be capable of identifying the types of things we expect to sometimes go wrong, what things should never go wrong, and how to react in either scenario. 

Things that we should expect to sometimes go wrong typically share these traits:

  • Small in nature and quick to recover from
  • Environment-based (tough to reproduce in a local setup)
  • Minor cross-browser issues, or other things only highly critical eyes would catch
  • Rooted in areas where the spec was weak, or silent altogether

Things that we hope never go wrong, even while testing:

  • Disregard of fundamental requirements from the spec (AKA not doing what you said you’d do)
  • Issues that clearly demonstrate the builders did not check their own work
  • Errors that are inconsistent with the talents of the people who built the software (e.g., security vulnerabilities from a highly regarded architect, or poor UX from a very senior front-end developer)

Adding to the sense of scale is the standard to which your software team is held. This is necessarily more subjective, and often a moving target in the early days of building a product, but it’s absolutely necessary for telling whether you’re living up to Do What You Said You’d Do. If you’re early, there is something about your team’s capabilities or the brand you’re building that is core. If a strong security background is core to your team, the standard to which you hold solutions on the security front should be exceedingly high. If you don’t have a designer on staff yet, you won’t sweat the inevitable CSS and usability bugs found in QA. If you’re a mature company/team, that standard will likely include just about everything. The sense of scale is important so you don’t sweat the small stuff and unnecessarily blame your processes for expected outcomes, and so you can appropriately live up to the standard you’ve set.

If you have a good sense of scale, then how you react to different issues arising in the QA phase will seem proportionate and ultimately be helpful to the team. Your goals in reacting are to:

  • Quickly fix the small things you’ve identified as just part of writing software, enabling you to meet the requirements and standards to launch.
  • Be introspective and evaluate the previous phases to understand why something went wrong in QA. For example:
    • Strategy: Was the strategy too ambitious, or even impossible to execute on?
    • Discovery: Did the PM write a bad spec? Did designs come through that were off brand?
    • Build: Is the engineer too junior or lacking skills to execute on these kinds of tasks?

Answering those could lead to coaching opportunities for teammates, reassignments, or firings. But before you get to that last one, make sure you have a good sense of scale.

What’s Actually Happening During QA?

Whether you have a dedicated QA team or not, someone has to be in charge of making sure you did what you said you would do. In smaller companies, it’s likely everyone. The CEO, CTO, and anyone else with a pulse. In larger companies with dedicated resources, it’s usually QA engineers and Product Managers. They are usually doing manual QA on top of whatever automated systems they’ve built. That typically looks like:

  1. Automated end-to-end testing as part of CI/CD (see the sketch after this list)
  2. Updating test environments with “fixtures” to set up eligible scenarios to test functionality
  3. Priority manual testing on changed functionality. Typically this means following the requirements in the spec, confirming the use cases, and then attempting to break them.
  4. Secondary manual testing on related functionality, sometimes called regression testing. Typically a developer gives the QA team a heads up that they changed code that could affect a tangential customer workflow.
  5. There’s often some back and forth between the person doing QA and the person who authored the spec. Oftentimes QA people find things, weird things, that were unexpected by anyone involved up to this point. They’ll negotiate, update docs, and sometimes update functionality based on those negotiations.
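
To make item 1 concrete, here’s a minimal sketch of what one of those automated end-to-end tests might look like, assuming Playwright as the runner; the route, selectors, and the “invoice export” feature are invented for illustration, not pulled from any real spec:

```typescript
// Hypothetical Playwright spec. Everything named here (the route, the button,
// the row limit) is illustrative; substitute your own spec's requirements.
import { test, expect } from "@playwright/test";

test("invoice export meets the spec's happy path", async ({ page }) => {
  await page.goto("/invoices");
  await page.getByRole("button", { name: "Export CSV" }).click();
  await expect(page.getByText("Export complete")).toBeVisible();
});

test("invoice export survives absurd input", async ({ page }) => {
  // Then try to break it: a row count no reasonable customer would request.
  await page.goto("/invoices?rows=99999999999");
  await expect(page.getByText("Something went wrong")).not.toBeVisible();
});
```

The shape is the point: confirm the happy path the spec promised, then deliberately feed the feature input the spec never imagined.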

And while there is structure, automation, scripts, and general competence at work during the QA phase, your best QA people also think like the QA engineer in this classic joke:

https://x.com/brenankeller/status/1068615953989087232

That leads us to the last, most overlooked thing that should happen during QA. To avoid the first real customer walking in, asking where the bathroom is, and the bar bursting into flames, maybe you should show the software to someone, or someones, prior to releasing it to everyone. It is in our definition of the QA atomic unit, after all: Show the software to people. While it may not be pragmatic or necessary to do this for every new feature or bug fix, showing software to real customers or prospects dramatically reduces the risk of things going wrong in the public release. I don’t even mean running a large alpha or beta program, which comes with a lot of overhead. I mean feature flagging a new feature, turning it on for 1 or 2 people, and essentially running a usability test in an uncontrolled but limited environment. The right people to turn it on for are customers you have a close relationship with, people who have signaled they like your product and enjoy testing things out, or the customers who actually requested the feature you’re about to release. As necessary as it is that you and your team spend serious time testing your product, no one will break it the way a customer can and will.
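
Here’s a minimal sketch of that kind of targeted flag in TypeScript, assuming a hardcoded allowlist for simplicity; the flag name and customer IDs are hypothetical, and in practice you’d likely reach for a flag service like LaunchDarkly or Unleash:

```typescript
// Hypothetical per-customer feature flag. Flag names and customer IDs are
// invented; a real system would back this with a flag service or a database.
const FLAG_ALLOWLISTS: Record<string, Set<string>> = {
  // The two customers who asked for the feature get it first.
  "new-invoice-export": new Set(["cust_4821", "cust_9907"]),
};

export function isFeatureEnabled(flag: string, customerId: string): boolean {
  return FLAG_ALLOWLISTS[flag]?.has(customerId) ?? false;
}

// At the call site, everyone else keeps the old behavior:
// isFeatureEnabled("new-invoice-export", customer.id) ? newExport() : oldExport();
```

Keeping it this simple means turning the feature off is a one-line change, which is exactly what you want when your friendliest customer finds the bar on fire.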

What is Everyone Else Doing During QA?

If the QA phase is all about making sure you did what you said you’d do, then what is everyone else doing?

First let’s be clear that having dedicated QA testers is a luxury in many organizations. I don’t have strong opinions on who does QA, just that it gets done in a formalized manner. The security posture of your product will typically dictate investment in QA. If you’re working on a financial services product like I did at Unchained, that investment comes very early in the product life cycle. If you’re building forum software like I did at Stack Overflow, it may come much later (or never). So for the purposes of this section, we’ll assume you do have dedicated QA testers and that there are several other people on the software development and tangential teams wondering what to do while they await news of what they got wrong.

This brings us back to what we discussed in Invariant 3: Don’t Build Without a Plan. Everyone else is pipelining. However, during QA there are several things we can be doing that we couldn’t during build.

Marketing - Coming out of Build, marketers finally get their hands on a “final product”. They can start updating their sales and external assets, finalize language on blog posts, and generally “finish” their participation in the Delivery atomic unit with low risk of major changes upsetting those plans.

Sales - They may be helping QA engineers and PMs set up a few clients so the QA team can finish testing.

Developers - They’re typically making themselves available for questions and updates from the QA team while beginning to immerse themselves in their next project.

PMs and Designers - Since in this case they aren’t running the QA phase themselves, they’re also making themselves available to QA engineers, maybe even with a daily catch-up, and are either helping marketers with Delivery of this project or focused on Discovery for something entirely different.

We’re always pipelining and never waiting around for someone else to finish. Psychologically, it is easy for members of the software development team to take a breath and wait for good or bad news from a QA team running their process. This is a great way to kill productivity, create a culture of anxiety and an expectation of failure, and generally slow everything down. Be available to the QA team, but always be pipelining.

Aside - Best Practices, Misconceptions, and the Future

There are a few extra points I want to add in before we wrap things up. 

Best Practices 

  • I strongly suggest you come up with some rituals prior to the QA phase that emphasize teams using, or showing off, the software as early as possible. An example of this is holding weekly, bi-weekly, or monthly demos. These are internal affairs that include demoing new strategy, specs and designs, or early versions of the software during Build. There are so many issues that working in public like this solves, but it’s most impactful when you go into QA. Each part of the feature or product has been presented, analyzed, and sometimes even used by many people outside the direct software development team doing the building. It’s similar to test-driven development, only for QA. If you begin from a critical (and sometimes adversarial) posture, what comes out the other side is hardened well beyond what would have been possible otherwise. You don’t want the first critical eyes on a feature or product to be QA engineers.
  • Likewise, I strongly suggest investing the time in creating fixtures, or pre-populated accounts with reasonable-looking data, to seed your test environments before QA begins (a sketch follows this list).
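
As a sketch of what that seeding might look like, here’s a hypothetical script using node-postgres; the table, columns, and accounts are all invented, and the real version should mirror your product’s actual data model:

```typescript
// Hypothetical fixture-seeding script using node-postgres (pg). Model the
// fixture accounts on the shapes of real customers you expect QA to test.
import { Client } from "pg";

const FIXTURE_ACCOUNTS = [
  { email: "qa-empty@example.com", plan: "free", invoices: 0 },   // brand-new account
  { email: "qa-typical@example.com", plan: "pro", invoices: 12 }, // realistic mid-size
  { email: "qa-heavy@example.com", plan: "pro", invoices: 5000 }, // stresses lists and paging
];

async function seed(): Promise<void> {
  const client = new Client({ connectionString: process.env.TEST_DATABASE_URL });
  await client.connect();
  try {
    for (const account of FIXTURE_ACCOUNTS) {
      await client.query(
        "INSERT INTO accounts (email, plan) VALUES ($1, $2) ON CONFLICT (email) DO NOTHING",
        [account.email, account.plan]
      );
      // ...then insert `account.invoices` invoice rows so the data looks lived-in.
    }
  } finally {
    await client.end();
  }
}

seed().catch((err) => {
  console.error(err);
  process.exit(1);
});
```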

Misconceptions - I’ve had several arguments in the past with people insisting that QA should simply be the acceptance criteria for exiting the Build phase, and does not need to be a phase of its own. While I think very early-stage teams naturally treat it this way, I strongly disagree once the team has any real scale. Combining QA with Build both undermines the independence you want from testers and minimizes the expertise and processes desired of them. Even if those testers double as software engineers or PMs on a smaller team, getting them into a testing mindset, outside of authoring specs and writing code, is the best way to test.

AI and QA - Whether you’re generally bullish or bearish on AI, I do think there are very positive outcomes for AI as it pertains to QA processes. Similar to automated end-to-end testing, LLMs can perform extremely expensive manual tasks far more efficiently, with particularly good outcomes for finding security vulnerabilities, performance issues, and things of that sort. Including these in the QA workflow can remove not just manual testing steps, but also a lot of the engineering intervention into nebulous performance and security issues that are difficult to discern. LLMs provide non-authors of code a massive shortcut in understanding what the code does and how it was reasoned to work the way it does. For a QA engineer or PM who doesn’t write production code but is technically minded, these models provide a superpower in assaying the code in a deeper way than previously possible.
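
As a speculative sketch of that superpower, here’s what asking a model to brief a QA engineer on a diff might look like, using the official OpenAI Node SDK; the model name, prompt, and file path are illustrative assumptions, not a prescription:

```typescript
// Hypothetical diff-briefing helper for a QA engineer who didn't write the code.
// Assumes the official OpenAI Node SDK; the model and prompt are illustrative.
import OpenAI from "openai";
import { readFileSync } from "node:fs";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function explainDiff(diffPath: string): Promise<string> {
  const diff = readFileSync(diffPath, "utf8");
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You are briefing a QA engineer who did not write this code. " +
          "Summarize what changed, which user-facing behavior it affects, " +
          "and any edge cases or security concerns worth manual testing.",
      },
      { role: "user", content: diff },
    ],
  });
  return response.choices[0].message.content ?? "";
}

explainDiff("feature.diff").then(console.log);
```

None of this replaces the manual passes described earlier; it just lets a non-author walk into them already oriented.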

How do we know we’re done with QA?

There’s so much going on that you really need a checklist of sorts, so I’ll give you one to start with:

  1. Have all identified critical bugs been resolved and retested?
  2. Have we reached feature completeness, i.e., have all the requirements from the spec been confirmed?
  3. Did we pass all automated and regression tests?
  4. Are we in line with any defined performance benchmarks?
  5. Did we complete all integration tests, manual or automated?
  6. Have we performed any kind of user acceptance testing (UAT) that’s appropriate for what’s being released?
  7. Have we cleared all security and compliance hurdles as part of QA?

You may want to take a couple of those off and add your own depending on what type of product you’re building and what size company you’re in, but your checklist will look something like this. What should be uncontroversial in all scenarios before you move to Delivery is that what is being shipped and what was spec’d are the same. Whether you got there by reaching feature completeness, negotiating the spec down late, or updating it based on tests, the spec must match the product that’s ready to deploy.

Speaking of Deploying…

Deploying, or the Delivery atomic unit, is finally upon us. Which brings us to Invariant 5: Make Products Earn Their Keep. Coming soon (and I won’t make you wait as long for the last invariant of the series).