Field notes: Unreasonable performance tests seen in the wild

Enterprise software deployments are complicated. Every customer goes about the process differently. For some, performance testing in a pre-production environment is a requirement before deployment. These customers are generally very mature and realize that performance testing provides the data needed to qualify an installation for production.

Other customers have invested in performance testing in their pre-production environments but, for whatever reason, haven’t done so consistently or have ignored some of the basic tenets of performance testing.

It looks good on paper…

I worked with a customer who assigned some Business Analysts the task of documenting a production workflow. These BAs didn’t fully understand the application or the domain, but went about their work anyway. They created a nice-looking spreadsheet listing the workflow steps and an estimate of how long they thought the workflow would take to execute.

Actually, they didn’t estimate the start-to-finish elapsed time of the workflow. They documented the number of times they thought the workflow could reasonably be completed within an hour.

There’s a difference in measurement and intention if I say:

“I can complete Task A in 60 seconds.”

or if I say:

“I can complete Task A 60 times in an hour.”

I may think I can complete Task A faster and perhaps shave a few seconds off my 60-second estimate. Or maybe I think I can execute more tasks in an hour.

In this particular case, the count of tasks within an hour was nudged upwards by successive layers of review and management. Perhaps the estimate was compared against existing metrics for validity, but what got passed along was a table, something like:

Transaction    Count    Duration
Triage         600      10

At first glance this doesn’t seem unreasonable, except that no units are provided with the values and it’s unclear how the numbers relate to anything. Expressed as a sentence, here is what that row actually meant:

“Team leads will triage 600 defects per hour for 10 hours a day”

These numbers were taken at face value by the team writing and executing the performance tests. They were told there would be several folks triaging defects, so they created several virtual users each working at this rate. You guessed it. The test and the system failed, lots of folks got upset, and the production deployment date slipped.

… but not in a calculator

Expecting any human to execute any triage task with software at a rate of one every 6 seconds (10 per minute) for 10 hours without a break is madness. It may be possible that the quickest a person could triage a defect is 6 seconds, but that pace is not sustainable. Once the test requirement was translated into natural language and folks realized the rate was humanly impossible, the test was redesigned, executed, and produced meaningful results.
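
Running the table’s numbers through an actual calculator makes the problem obvious. Here is the arithmetic as a few lines of Python, using only the two values from the table:

    # The two values from the workload table, with their units spelled out.
    count_per_hour = 600    # "Count" column: triages per hour
    hours_per_day = 10      # "Duration" column: hours per day

    seconds_per_task = 3600 / count_per_hour        # 6.0 seconds per triage
    tasks_per_day = count_per_hour * hours_per_day  # 6,000 triages per person per day

    print(f"{seconds_per_task:.0f} seconds per triage, {tasks_per_day} triages per person per day")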

How did the workflow rate get this insane? Was it successive layers of management review increasing or doubling a value? Were values unintentionally copied, pasted, or multiplied when the table was created? Regardless, the values were never checked against reality.

Are any tests better than no tests?

I worked with another customer who spent the time and effort to create performance tests in pre-production, but didn’t follow through with the other tenets of good performance testing. Performance work needs to be run in a stable, well-understood environment and executed as repeatably as possible.

Ideally, the work is done in an isolated lab and the test starts from a known, documented base state. After the test is run, the environment can be reset so the test can be run again and produce the same results. (In our tests, we use an isolated lab and filer technology so that the storage blocks can be reset to the base state.) This keeps variance down. If something has to be changed, changes should be made one at a time.

This customer didn’t reset the database or environment (so the database grew with every run) and didn’t operate on an isolated network (so they were exposed to whatever corporate traffic happened to be present). Their results varied wildly. Even starting a test 30 minutes earlier from one day to the next produced significantly different results.

This customer wasn’t bothered by the variance, even though some of us watching were extremely anxious and wanted to root out the details, lock down the environment, and understand every possible variable. For good reasons of their own, the customer was not interested in examining the variance. We struggled to explain that what they were doing wasn’t really performance testing.

What can we learn from these two examples?

#1: Never blindly put your entire faith in a workload. Always question it. Ask stakeholders to review it. Calculate both the hourly counts and the actual per-task rate. Use common sense. Compare your workload against reference data whenever you can.
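
One way to make that comparison concrete is a small check that converts a proposed workload row into a per-task rate and flags anything faster than real users have ever been observed to go. This is only a sketch; the 45-second reference floor below is a made-up placeholder, not a real measurement:

    # Compare a proposed workload row against observed reference data.
    proposed = {"transaction": "Triage", "count_per_hour": 600}
    observed_fastest_seconds = 45  # hypothetical: the quickest real triage ever timed

    seconds_per_task = 3600 / proposed["count_per_hour"]
    if seconds_per_task < observed_fastest_seconds:
        print(f"{proposed['transaction']}: {seconds_per_task:.0f}s per task is faster than "
              f"the fastest ever observed ({observed_fastest_seconds}s). Question this number.")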

#2: Effective performance testing is repeatable, and successive runs should show little variance. The process is simple: document the environment before the test (the base state), run the test, measure and analyze, reset to the base state, repeat. If changes need to be made, make them one at a time.
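
That process translates almost directly into a harness loop. Here is a minimal sketch, assuming hypothetical restore_base_state, run_test, and measure callables that wrap whatever your lab actually uses (a snapshot restore, the load generator, a metrics query); the 5% variance threshold is an arbitrary placeholder:

    import statistics

    def run_performance_suite(restore_base_state, run_test, measure, iterations=5):
        """Run the same test repeatedly from a known base state and report the variance."""
        results = []
        for _ in range(iterations):
            restore_base_state()        # back to the documented base state
            run_test()                  # same workload, same data, every run
            results.append(measure())   # one comparable number per run, e.g. mean response time in ms

        mean = statistics.mean(results)
        spread = statistics.pstdev(results)
        print(f"mean={mean:.1f}, stdev={spread:.1f} ({spread / mean:.1%} of mean)")
        if spread / mean > 0.05:        # placeholder threshold for "little variance"
            print("Variance is high: find out what changed between runs before trusting the numbers.")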