Performance Driven Development

Say what you will about Test Driven Development (TDD), but it was an eye-opener for some of us the first time we heard about it. Sure, in practice it’s not always going to happen despite the best of intentions, but it did reframe how some developers went about their work and how some organizations ensured at least a modest, and sometimes increased, amount of testing earlier in the lifecycle. At an abstract level, thinking about TDD improved quality outcomes regardless of the tactical steps.

Recently, trying to describe the characteristics and skills needed for a new development team, I joked that we needed Performance Driven Development (PDD). I got some light agreement (mostly of the smiling emoticon variety), and at least one “Aha, that’s an awesome idea!” for each “That’s totally stupid. You clearly don’t understand either development or performance.” Yeah, OK. Sometimes good ideas seem stupid at first.

(Of course sometimes stupid ideas seem stupid at first but smart later, but then again, I probably shouldn’t point that out.)

Well, as a thought experiment, Performance Driven Development is a chance to express several qualities I want in the new organization we’re trying to build. We want performance engineering integrated into the product creation experience. Not a throw-it-over-the-wall afterthought. We do not want a separate performance team, self-organizing, working on a not-current release, using a synthetic workload, and employing custom-made testing tools. Yeah, we’ve all done that. Some of us got really good at it. There’s a lot of work to do in that environment, lots of good problems to find and solve. But it’s frustrating after a while.

Let’s consider the end result in those cases. Performance really was an afterthought. Code was baked and delivered, and then a team would come along later to “test at scale” or do “load testing.” A necessary, scary bunch of tasks was delayed to the end, when it was perhaps too late to do anything about it. Kind of like the way testing was before TDD.

If we’re trying to improve performance, TDD becomes an instructive model. We want the performance specialists (no longer an independent team) to be integrated into the development team, we want them to work on the most current release, we want them to use real customer workloads, and we want the tools they use to be part of the development experience or at least easy (easy-ish) to use off the shelf.

Performance Driven Development means that at every step, performance characteristics are noted and then used to make qualitative decisions. This means that there is some light measurement at the unit level, at the component level, etc., and that measurements occur at boundaries where one service or component hands off to another, and that the entire end-to-end user experience is measured as well.

(Skeptics will note that there’s no mention of load, scale or reliability yet, so bear with me.)

The first goal is that the development team must build not just features but the capability to lightly measure. By measure I mean timestamps or durations spent in calls, and where appropriate and relevant, payload size (how much data passed through, and/or at what rate, etc.). By light, I mean really simple logging or measurement which occurs automatically, and which doesn’t require complex debug flags being set in an .ini file or a special recompilation. The goal is that from release to release, as features change, it will be possible to note whether execution has slowed down or sped up, and relatedly whether the same calls are transferring more or less data.
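
To make “light” concrete, here’s a minimal sketch in Python. The names (measured, process_batch) and the log format are mine and purely illustrative, not a prescription for any particular library:

    # A sketch of always-on "light measurement": duration and payload size
    # logged for a call, no debug flags, no recompilation.
    import functools
    import logging
    import time

    log = logging.getLogger("perf")

    def measured(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Payload size only where it is cheap and meaningful to compute.
            size = len(result) if hasattr(result, "__len__") else -1
            log.info("%s duration_ms=%.2f payload_items=%d",
                     fn.__name__, elapsed_ms, size)
            return result
        return wrapper

    @measured
    def process_batch(records):
        # Stand-in for real work.
        return [r.upper() for r in records]

Because the measurement rides along in the ordinary application log, comparing release N to release N+1 is a matter of reading logs, not of setting up a special test harness.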

Unit tests would not just indicate { pass / fail }, but, when run iteratively, report { min / mean / max / std } values for call time or response time. Of course we’re talking about atomic transactions, but then again, tasks that are lengthy should be measured as well. (Actually, lengthy tasks demand light measurement to understand what’s occurring underneath: the rate of a single-threaded transformation, the depth of a queue and how it changes over time, etc.)
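
As a sketch of what that could look like, assuming nothing beyond the standard library and an invented unit under test:

    # Sketch: run the unit under test repeatedly and report summary timing
    # statistics alongside the usual pass/fail result.
    import statistics
    import time
    import unittest

    def timed_runs(fn, iterations=100):
        durations_ms = []
        for _ in range(iterations):
            start = time.perf_counter()
            fn()
            durations_ms.append((time.perf_counter() - start) * 1000)
        return {
            "min": min(durations_ms),
            "mean": statistics.mean(durations_ms),
            "max": max(durations_ms),
            "std": statistics.stdev(durations_ms),
        }

    class TestSortSpeed(unittest.TestCase):
        def test_sort_speed(self):
            stats = timed_runs(lambda: sorted(range(10_000)))
            print(f"sort timing (ms): {stats}")
            # A generous bound keeps pass/fail meaningful; the real value is
            # the recorded stats, compared release over release.
            self.assertLess(stats["mean"], 50.0)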

I’ve also been talking about performance boundaries. These are the points where different systems interact, where perhaps data crosses an integration point, where a database hands off to an app server, where a mobile client transmits to a web server, or, in complex systems, where ownership may change. These boundaries need to be understood and documented, and crucially, light measurement must occur at those points (timestamps, payload, etc.).
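
A hedged sketch of what boundary measurement might look like, with a hypothetical app-server-to-billing-service handoff and invented names throughout:

    # Sketch: light measurement at a boundary crossing, e.g. an app server
    # calling a downstream service. The URL and boundary label are placeholders.
    import json
    import logging
    import time
    import urllib.request

    log = logging.getLogger("perf.boundary")

    def call_downstream(url, payload):
        body = json.dumps(payload).encode("utf-8")
        start = time.perf_counter()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Timestamp, duration, and payload size in both directions at the boundary.
        log.info("boundary=app->billing duration_ms=%.1f bytes_out=%d bytes_in=%d",
                 elapsed_ms, len(body), len(raw))
        return json.loads(raw)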

Finally, the end-to-end experience must be measured. This aligns most closely with the majority of performance tools on the market (like LoadRunner and Rational Performance Tester) which record and play back HTTP/HTTPS traffic, simulating end-user behavior in a browser. There are many ways to measure the end-to-end user experience.
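
One simple, single-user flavor, sketched here with the requests library and placeholder routes standing in for a real user journey:

    # Sketch: a single-user, end-to-end timing of one user journey over HTTP.
    # The host and paths are placeholders, not a real application.
    import requests

    def measure_journey(base_url):
        timings_ms = {}
        with requests.Session() as session:
            for name, path in [("home", "/"),
                               ("search", "/search?q=widgets"),
                               ("product", "/products/123")]:
                resp = session.get(base_url + path)
                # requests records the round-trip time in resp.elapsed.
                timings_ms[name] = resp.elapsed.total_seconds() * 1000
        return timings_ms

    if __name__ == "__main__":
        print(measure_journey("https://app.example.test"))  # placeholder host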

In the PDD context, the performance engineer is responsible for ensuring that light measurement exists at the unit, component and boundary levels. The performance engineer assists the team in building the tools and communal expertise to capture and collect such measurements. The performance engineer leads the charge to identify when units or components become slower or the amount of data that passes through them has increased. The performance engineer will also need to use tools like Selenium, JMeter and Locust to produce repeatable actions, of either the single-user or multiple-user variety, that can be measured.
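
Since Locust is on that list, a minimal locustfile is a fair illustration of how small a repeatable, measurable action can be; the routes here are placeholders:

    # Minimal Locust sketch: repeatable user actions that can be run with one
    # simulated user or many. Paths stand in for real application routes.
    from locust import HttpUser, between, task

    class BrowsingUser(HttpUser):
        wait_time = between(1, 3)  # seconds of think time between tasks

        @task(3)
        def view_catalog(self):
            self.client.get("/catalog")

        @task(1)
        def view_item(self):
            self.client.get("/catalog/items/42")

The same file drives both the single-user and the multi-user cases, which is exactly the kind of reuse a development-integrated performance engineer should be aiming for.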

I know plenty of really good developers who, once you show them conclusive evidence that the routine they just finished polishing adds 0.01 seconds to a common transaction, will go back to the drawing board to get that 0.01 seconds back and more. They’ll grumble at first, but they’ll be grateful afterwards. Building that type of light measurement can’t be done after the fact. It must be designed in from the beginning. And if I’m being idealistic about a super-developer discovering extra time in code, then we all realize that enhancing features intrinsically gets expensive: adding more features generally makes things slower, and that’s a given tradeoff.

And so what about load and reliability and scalability and all those other things? I’ve spent a lot of time working with complex and messy systems. Solving performance problems and building capacity planning models invariably involves defining and building synthetic workloads. I don’t know how many times I’ve been in situations where the tested synthetic workload is nothing at all like what occurs in real life. I have grown convinced that dedicating a team of smart people to devising a monstrous synthetic workload is misdirected. Yes, interesting problems will be discovered, but many of them will not be relevant to real life. Sure, this work creates interesting data sheets and documents, but rarely does it yield useful tools for solving problems in the wild.

(You could argue that the performance engineer quickly learns the tools to solve these problems in the wild, which actually is a benefit of having a team doing this work. However, you’re not addressing the root cause; you’re training a SWAT team to solve problems you’ve made difficult to diagnose in the first place.)

So then how do you simulate real workloads in a test context? I honestly think that’s not the problem to solve anymore. I know this terrifies some people:

“What do you mean you can’t tell me if it will support 100 or 1000 users?!”

I know that some executives will think that without these user numbers the product won’t sell, that customers won’t put money down for something that will not support some imagined load target. (I’ve muttered elsewhere about how silly it is to measure performance in terms of users.)

Given that most of our workloads run in the cloud, and that even when they don’t, adding virtual or physical hardware capacity is relatively inexpensive, the problem becomes:

“How does overall performance change as I go from 100 to 200 users, and what can monitoring tell me about what that change looks like so I can predict the growth from N to N+100 users?”
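
To make that concrete, here is a deliberately naive sketch with invented numbers: fit the measurements that monitoring already gave you and extrapolate one step ahead.

    # Sketch with invented numbers: fit observed response time against user
    # count and extrapolate. Real data would come from monitoring, not a table.
    import statistics

    observed = {100: 180.0, 150: 210.0, 200: 255.0}  # users -> median response ms (hypothetical)

    def linear_fit(points):
        xs, ys = zip(*points.items())
        x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
        slope = (sum((x - x_bar) * (y - y_bar) for x, y in points.items())
                 / sum((x - x_bar) ** 2 for x in xs))
        return slope, y_bar - slope * x_bar

    slope, intercept = linear_fit(observed)
    for users in (250, 300):
        print(f"{users} users -> ~{slope * users + intercept:.0f} ms (if the trend holds)")

The fit itself is trivial; the hard, valuable part is having trustworthy measurements to feed it, which is the whole point of building light measurement in from the start.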

Yes, you want your car to go fast, but if you mean to keep that car a long time, you learn how it behaves (the sounds, the responsiveness) as it moves from second gear to third, or from fourth to fifth.

The problem we need to solve in the performance domain isn’t how to drive more load (although it can still be fun to knock things over); the problem to solve is:

“How do you build a complex system that reports meaningful measurement and monitoring information, and is serviceable enough that, as usage grows, light measurement provides a clear indication of how the system is behaving in real time under a real workload?”

This is what I’m trying to get at with the concept of Performance Driven Development. (I won’t lay claim to coining the phrase. It’s been out in the wild for a while. I’m just putting forth some definitions and concepts which I think it should stand for.)