Software sizing isn’t easy

I’m going to quote pretty much the entirety of an introduction I wrote to an article just posted at the jazz.net Deployment wiki on CLM Sizing (https://jazz.net/wiki/bin/view/Deployment/CLMSizingStrategy):

Whether new users or seasoned experts, customers using IBM Jazz products all want the same thing: they want to use the Jazz products without worrying that their deployment will slow them down, and they want it to keep up with them as they add users and grow. A frequent question we hear, whether from a new administrator setting up Collaborative Lifecycle Management (CLM) for the first time or an experienced administrator tuning their Systems and Software Engineering (SSE) toolset, is “How many users will my environment support?”

Back when Rational Team Concert (RTC) was in its infancy, we built a comprehensive performance test environment based on what we thought was a representative workload. It was in fact based upon the workload the RTC and Jazz teams themselves used to develop the product. We published what we learned in our first Sizing Guide. Later sizing guides include: Collaborative Lifecycle Management 2011 Sizing Guide and Collaborative Lifecycle Management 2012 Sizing Report (Standard Topology E1). As features were added and the product grew with each release, we started to hear about what folks were doing in the field. The Jazz products, RTC especially, are so flexible that customers were using them with wonderfully different workloads than we had anticipated.

Consequently, we stepped back from proclaiming a one-size-fits-all approach, and moved to presenting case studies and specific test reports about the user workload simulations and the loads we tested. We have published these reports on the jazz.net Deployment wiki at Performance datasheets. We have tried to make a distinction between performance reports and sizing guides. Performance reports document a specific test with defined hardware, datashape and workload, whereas sizing guides suggest patterns or categories of hardware, datashape and workload. Sizing guides are not specific; they are general descriptions of topologies and estimates of the workloads they may support.

Throughout the many 4.0.x release cycles, we were still asked “How many users will my environment support?” Our reluctance to answer this apparently straightforward question frustrated customers new and old. Everyone thinks that as the Jazz experts we should know how to size our products. Finally, after some analysis and messing up countless whiteboards, we would like to present some sizing strategies and advice for the front-end applications in the Jazz platform: Rational Team Concert (RTC), Rational Requirements Composer (RRC)/Rational DOORS Next Generation (DNG) and Rational Quality Manager (RQM). These recommendations are based upon our product testing and analysis of customer deployments.


The article talks about how complex estimating software sizing can be. Besides the obligatory disclaimer, there’s a pointer to the CLM and SSE recommended topologies and a discussion of basic definitions. There’s also a table listing many of the non-product (or non-functional) factors which can wreak havoc with the ideal performance of a software deployment.

Most importantly, the article provides some user sizing basics for Rational Team Concert (RTC), Rational Requirements Composer (RRC)/Rational DOORS Next Generation (DNG) and Rational Quality Manager (RQM). Eventually we’ll talk a bit more about the strategies and concepts needed to determine whether you may need two CCMs or multiple application servers in your environment.

For now, I hope we’re taking a good step towards answering the perennial question “How many users will my environment support?” and explaining why it’s so hard to answer that question accurately.

As always, comments and questions are appreciated.

CLM 4.0.6 performance reports

A fresh batch of 4.0.6 datasheets was posted to the deployment wiki to coincide with the 4.0.6 release on February 28, 2014. Yes, that was a month or so ago, so I’m late in mentioning these timely performance reports, our largest batch yet. The team worked hard to get the data and compile the reports.

For those keen on migrating data from ClearCase to Rational Team Concert, the ClearCase Version Importer performance report: Rational Team Concert 4.0.6 release report shows basic data on how long an import may take.

The Collaborative Lifecycle Management performance report: RTC 4.0.6 release shows that 4.0.6 RTC performance is comparable to 4.0.5.


For the RTC for z/OS 4.0.6 release, the Rational Team Concert for z/OS Performance in 4.0.6 report shows that 4.0.6 performance is similar to 4.0.5. For 4.0.6, enhancements were made to RTC for z/OS queries so that they use the Java JFS API: “In the scenario of the incremental build (request build after changing all the copybook files), the ‘Collecting buildable files’ activity in preprocessing time improved by about 25%, resulting in about a 5% improvement in total run time.”

The Collaborative Lifecycle Management performance report: RRC 4.0.6 release report shows that RRC 4.0.6 performance is comparable to 4.0.5.

Similarly, the Collaborative Lifecycle Management performance report: Rational Quality Manager 4.0.6 release shows that RQM 4.0.6 performance is comparable to 4.0.5.

The CLM Reliability report for the 4.0.6 release demonstrates the capability of a Standard Topology (E1) configuration under sustained 7-day load.

The Collaborative Lifecycle Management performance report: Export Transform Load (ETL) 4.0.6 release report demonstrates that there are no performance regressions in 4.0.6 ETL performance compared to 4.0.5. The 4.0.6 ETL functionality is more comprehensive than it was in 4.0.5, so there are situations where ETL times may have increased, although the data now indexed is more complete and accurate.

Comments, questions? Use the comments box on the actual reports themselves.

 

Field notes: It isn’t always about virtualization, except when it is

Talking recently with some customers, we discussed the fallacy of always trying to solve a new problem with the same method that solved the preceding one. Some might recall the adage, “If you only have a hammer, every problem looks like a nail.”


I am often asked to help make sense of complex performance problems our customers encounter. I don’t always have the right answer, and usually by the time the problem gets to me, a lot of good folks who are trained in solving problems have spent a lot of time trying to sort things out. I can generally be counted upon for a big-picture perspective, some non-product ideas and a few other names of folks to ask once I’ve proved to be of no help.

A recent problem appeared to have no clear solution. The problem was easy to repeat and thus demonstrable. Logs didn’t look out of the ordinary. Servers didn’t appear to be under load. Yet transactions that should be fast, say well under 10 seconds, were taking on the order of minutes. Some hands-on testing had determined that slowness increased proportionally with the number of users attempting to do work (a single user executing a task took 30-60 seconds, two users at the same time took a minute to 90 seconds, three users took 2-3 minutes, etc.).

So I asked whether the environment used virtualized infrastructure, and if so, could we take a peek at the settings.

Yes, the environment was virtualized. No, they hadn’t looked into that yet. But yes, they would. It would take a day or two to reach the folks who could answer those questions and explain to them why we were asking.

But we never did get to ask them those questions. Their virtualization folks took a peek at the environment and discovered that the entire configuration of five servers was sharing the processing power customarily allocated to a single server. All five servers were sharing 4 GHz of processing power. They increased the resource pool to 60 GHz and the problem evaporated.

I can’t take credit for solving that one. It was simply a matter of time before someone else would have stepped back and asked the same questions. However, I did write it up for the deployment wiki. And I got to mention it here.
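For the arithmetic-minded, here is a minimal back-of-the-envelope sketch (mine, not from the original write-up) of why the resource pool settings explained everything; the GHz figures come from the story above, and the even split across VMs is an assumption:

```python
# Back-of-the-envelope check: how much CPU does each VM get from a shared pool?
# The pool sizes are the before/after values from the story; the even split
# across five servers is an assumption for illustration.

def per_vm_share(pool_ghz: float, vm_count: int) -> float:
    """Average processing power per VM when the pool is divided evenly."""
    return pool_ghz / vm_count

for pool_ghz in (4.0, 60.0):  # resource pool before and after the fix, in GHz
    share = per_vm_share(pool_ghz, vm_count=5)
    print(f"{pool_ghz:>5.1f} GHz pool across 5 VMs -> ~{share:.1f} GHz per VM")

# ~0.8 GHz per VM explains minute-long transactions; ~12.0 GHz per VM does not.
```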

CLM 4.0.5 performance reports

To coincide with the CLM 4.0.5 release, the performance team has produced six — that’s six! — reports.

Collaborative Lifecycle Management performance report: RTC 4.0.5 release compares the performance of an unclustered Rational Team Concert version 4.0.5 deployment to the previous 4.0.4 release. With the workload as described in the report, we found no performance regressions between the current release and the prior one.

RTC for z/OS recently introduced a Java API to access JFS (Jazz Foundation Services) instead of using the HTTP APIs, which provides the potential for significant performance improvements. The Rational Team Concert for z/OS performance impact of Java JFS API adoption in 4.0.5 report compares the performance before and after RTC for z/OS adopted the Java JFS API in part of the resource operations and queries. The comparison is made between the 4.0.5 RC1 development version and the previous 4.0.4 release.

Collaborative Lifecycle Management performance report: RRC 4.0.5 release compares the performance of an unclustered Rational Requirements Composer version 4.0.5 deployment to the previous 4.0.4 release.

Similarly, Collaborative Lifecycle Management performance report: Rational Quality Manager 4.0.5 release compares the performance of an unclustered Rational Quality Manager version 4.0.5 deployment to the previous 4.0.4 release.

There were design changes to ETL functionality for RM which are highlighted in Collaborative Lifecycle Management performance report: Export Transform Load (ETL) 4.0.5 release. This report presents the results of “Extract, Transform, and Load” (ETL) performance testing for CLM. The ETL types covered include Java ETL and DM ETL; the data loads include full load and delta load. The article focuses on ETL performance comparison between the 4.0.5 release and the 4.0.4 release.

Finally, the CLM Reliability report: CLM 4.0.5 release presents a sample of the results from a CLM 4.0.5 reliability test. Reliability testing is about exercising the CLM applications so that failures are discovered and removed before the system is deployed. There are many different combinations of pathways through the complex CLM application, so this test scenario exercises the most likely use cases. The use cases are put under constant load for a seven-day period to validate that the CLM application provides the expected level of service, without any downtime or degradation in overall system performance.

 

 

Yes, we renamed JazzMon

The tool formerly known as JazzMon has been renamed to JTSMon.

Given the tool’s popularity and widespread use, it made sense to align it more closely in name and version numbering with the CLM product family. Version 4.0 follows version 1.4.0 and includes the new Excel macro visualizer to help make sense of the data that JTSMon can capture.

The JTSMon FAQ offers lots to help you get started. There is a 32-page user manual, and if you’re really impatient, a single-page QuickStart sheet. A downloadable .zip has all the moving parts, and release notes are published separately. There’s also a 10-minute video demo (Ok, so we didn’t re-record it when we changed the name), and a shorter one.

The older 1.4.0 version is still available. It works with RTC 2.x whereas the new 4.0 version works only with RTC 3.x and 4.x.

If you have questions or comments, please ask them at the jazz.net forum. We’re still using the jazzmon tag there.

 

Be even smarter with virtualization

It took a bit of unplanned procrastination, but we finally got to the second and third parts of our in-depth investigation of virtualization as it relates to IBM Rational products.


Part two is now published here as Be smart with virtualization: Part 2. Best practices with IBM Rational Software.

Part three lives on the deployment wiki here as Troubleshooting problems in virtualized environments.

Part two presents two further case studies and a recap of the principles explored in Part 1. We took a stab at presenting the tradeoffs between different virtualization configurations. Virtualization is becoming more prevalent because it’s a powerful way to manage resources and squeeze efficiencies from hardware. Of course there are balances to strike, and Part two goes a bit deeper.

Part three moves to the deployment wiki and offers some specific situations we’ve solved in our labs and with customers. There are also screen shots from one of the main vendors’ tools which can guide you in identifying your own settings.

The Fall VoiCE Jam is ON

Every year we gather IBM Rational technical experts and customers to talk about product features and futures. This fall we’re trying something a bit different, which we’re calling a VoiCE Jam. Basically, we’re hosting several online discussion forums for you to suggest ideas and comment on others.

I’m working in the Deployment theme area (no surprise there). The topic is “Deployment for Administrators: Improving the Deployment experience of your IBM Rational products and solutions.” There are topics on performance, deployment and installation.

It would be great if you could take a few minutes to collaborate with the community and your peers on your ideas for simplifying the deployment and maintenance of Rational products and solutions. You can enter your own ideas. You can vote on others. You can also lurk without commenting or voting, but where’s the fun in that?

The VoiCE Jam is open until Friday, October 12, 2013. It costs nothing to register.

https://voicejam.ideajam.net/ideajam/hosted/voice/ideajam.nsf/

See you there!

 

Two new performance reports for CLM 4.0.4

Timed with the release of CLM 4.0.4, two new performance reports are posted on the Deployment wiki. These are part of Rational development’s and the Rational Performance Engineering team’s plan to release relevant and useful reports more frequently.

Collaborative Lifecycle Management performance report: RTC 4.0.4 release compares the 4.0.4 release to the previous 4.0.3 release. The performance goal is to verify that there are no performance regressions between the 4.0.4 and 4.0.3 releases when running tests using the 1100-1200 concurrent user workload as described. The report shows similar throughput for the 4.0.4 release, and nmon data comparing 4.0.3 and 4.0.4 shows similar CPU, memory and disk utilization on the application server. The database server shows similar CPU and disk utilization, but higher memory utilization.

Collaborative Lifecycle Management performance report: RRC 4.0.4 release compares the 4.0.4 release with the prior 4.0.3 release to verify there are no performance regressions using the 400-user workload as described. The report shows that there are several actions in 4.0.4 which are faster than in 4.0.3.

More reports are coming, so keep an eye on the Performance datasheets and sizing guidelines page.

New performance material at the jazz.net Deployment wiki

As promised, we have started to publish some datasheets and reports on the jazz.net Deployment wiki. In between the necessary work of qualifying and testing new releases, the team has explored some more complex scenarios. Some of these explorations are responses to customer requests, so keep letting us know what’s important to you. Others are topics which had sat dormant in our backlog until we were recently able to align resources to address them.

* * *


Plan loading improvements in RTC 4.0.3

Everybody works with plans in RTC, whether tiny team plans or large multi-year plans with thousands of items and complex team/owner/plan item relationships. When working with larger, more complex plans, some clients noted sub-optimal performance of plan loading. They sought techniques to optimize their plan usage, and also asked IBM to improve plan load response times.

For the Rational Team Concert 4.x releases, the development team made significant improvements to plan loading behavior; significant changes were made in 4.0.3. This case study compared identically structured plans of varying sizes with the goal of determining the difference in plan load time between RTC 3.0.1.3 and 4.0.3.

“Using the 3.0.1.3 release the larger plans took more than a minute to load while in 4.0.3 all plans, regardless of size or browser used, took less than a minute to load. In this study plans of all sizes loaded faster in 4.0.3 than in 3.0.1.3. Notably, plans with larger numbers of work items loaded proportionally faster in 4.0.3.”

See Case study: Comparing RTC 3.0.1.3 and 4.0.3 plan loading performance: https://jazz.net/wiki/bin/view/Deployment/CaseStudyRTCPlanLoadingPerformance

* * *

Sharing a JTS server between RTC, RQM and RRC

Key to a successful CLM implementation is having separate application servers (Rational Team Concert (RTC), Rational Quality Manager (RQM) and Rational Requirements Composer (RRC)) share the same JTS server. Folks have asked about the breaking point of a shared JTS server. From the report:

“Overall, for this set of machines and these workloads, the JTS never became a bottleneck. There was only a small amount of degradation in maximum throughputs (5-10%) even when all 5 CLM servers were at maximum load.”

Throughput was measured in transactions-per-second and graphs show the different combinations of servers connected to the single JTS and the relative loads and transaction rates.

Visit Performance impact of sharing a Jazz Team Server: https://jazz.net/wiki/bin/view/Deployment/PerformanceImpactOfSharingAJazzTeamServer

* * *

Sizing VMware

Everyone is using virtualization, and VMware’s ESX is popular for deploying Linux and Windows OSes. Our stated minimum suggested CPU sizes apply to both physical and virtual servers. This particular report looks at the performance impact of varying the vCPU sizes of VMware VMs which are serving Rational Team Concert.

“In this study, we were using dual 6-core physical hyper-threaded CPUs that were not able to be translated to 12 or 24 vCPUs within the virtual environment. We found better performance using 16 vCPUs in our Virtual Machines.”

Look at Performance impact of different vCPU sizes within a VMWare hypervisor: https://jazz.net/wiki/bin/view/Deployment/PerformanceImpactOfvCPUSizes
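As a side note, here is a quick sketch of the core counts implied by the hardware described in that quote. The arithmetic is mine (assuming two sockets of six hyper-threaded cores each); the report itself remains the reference for the actual findings:

```python
# Rough core math for the hardware described in the quote.
# Assumption: 2 sockets x 6 cores, hyper-threading enabled.
sockets, cores_per_socket, threads_per_core = 2, 6, 2

physical_cores = sockets * cores_per_socket              # 12 physical cores
logical_processors = physical_cores * threads_per_core   # 24 logical processors

print(f"{physical_cores} physical cores, {logical_processors} logical processors")
# Neither 12 nor 24 vCPUs mapped cleanly onto this hardware in that study;
# 16 vCPUs per VM performed best in their configuration.
```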

* * *

RRDI 2.0.x sizing guidelines

One of my Jumpstart colleagues wrote a note about sizing Rational Reporting for Development Intelligence (RRDI), an essential ingredient in most CLM deployments. Properly sizing RRDI requires understanding the approximate number of concurrent users and estimating how many of them might interact with reports.

Take a look at RRDI 2.0.x sizing guidelines: https://jazz.net/wiki/bin/view/Deployment/RRDISizingGuidelines

* * *

The future

We plan to post more reports as they are completed on the wiki here: https://jazz.net/wiki/bin/view/Deployment/PerformanceDatasheetsAndSizingGuidelines. As always, let us know what you think is missing or what you’re interested in hearing more about. You can ask here or on the wiki itself. Thanks.

 

 

Field notes: Unreasonable performance tests seen in the wild


Enterprise software deployments are complicated. Every customer goes about the process differently. For some, performance testing in a pre-production environment is a requirement before deployment. These customers are generally very mature and realize that performance testing provides the necessary data to enable and qualify the installation for production.

There are some customers who, for whatever reason, have invested in performance testing in their pre-production environments but haven’t done so consistently, or are ignoring some of the basic tenets of performance testing.

It looks good on paper…

I worked with a customer who assigned some Business Analysts the task of documenting a production workflow. These BAs didn’t fully understand the application or the domain, but went about their work anyway. They created a nice looking spreadsheet indicating workflow steps, and an estimate as to how long they thought the workflow would take to execute.

Actually, they didn’t create an estimate as to the start-to-finish elapsed time of the workflow. They documented the number of times they thought the workflow could be reasonably completed within an hour.

There’s a difference in measurement and intention if I say:

“I can complete Task A in 60 seconds.”

or if I say:

“I can complete Task A 60 times in an hour.”

I may think I can complete Task A faster and perhaps shave a few seconds off my 60-second estimate. Or maybe I think I can execute more tasks in an hour.

In this particular case, the count of tasks within an hour was nudged upwards by successive layers of review and management. Perhaps the estimate was compared against existing metrics for validity; in any case, the results were presented in a table, something like:

Transaction | Count | Duration
Triage | 600 | 10

At first glance this doesn’t seem unreasonable. Except that no units are provided with the values, and it’s unclear how the numbers actually relate to anything. Expressed as a sentence, here is what the workflow actually intended:

“Team leads will triage 600 defects per hour for 10 hours a day”

This data was taken directly, without question, by the team writing and executing the performance tests. They were told there would be several folks triaging defects, so they created several virtual users working at this rate. You guessed it. The test and the system failed. Lots of folks got upset and the deployment-to-production date slipped.

 … but not in a calculator

Expecting any human to execute any triage task with software at a rate of one every 6 seconds (10 per minute) for 10 hours without a break is madness. It may be possible that the quickest a person could triage a defect is 6 seconds, but that pace is not sustainable. Once the test requirement was translated into natural language, and folks realized that the rate was humanly impossible, the test was redesigned, executed and produced meaningful results.
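To make the arithmetic concrete, here is a tiny sketch (mine, not from the original engagement) that converts the table’s numbers into a per-task rate, using the units that were eventually spelled out in the sentence above:

```python
# Sanity-check a workload row by converting it into a per-task rate.
# The values are the "Triage 600 10" row above; the units (per hour,
# hours per day) are the interpretation spelled out in the sentence,
# not something the table itself stated.

count_per_hour = 600   # triage transactions per hour
hours_per_day = 10     # hours of triage per day

seconds_per_task = 3600 / count_per_hour
tasks_per_day = count_per_hour * hours_per_day

print(f"One triage every {seconds_per_task:.0f} seconds")  # 6 seconds
print(f"{tasks_per_day} triages per person per day")       # 6000 per day
```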

How did the workflow rate become so insane? Was it successive layers of management review increasing or doubling a value? Maybe when the table was created, values were unintentionally copied and pasted, or multiplied. Regardless, the values were not checked against reality.

Are any tests better than no tests?

I worked with another customer who spent the time and effort to create performance tests in pre-production, but didn’t follow through with the other tenets of good performance testing. Performance work needs to be executed in a stable, well-understood environment, and run as repeatably as possible.

Ideally, the work is done in an isolated lab and the test starts from a known and documented base state. After the test is run, the test environment can be reset so that the test can be run again and produce the same results. (In our tests, we use an isolated lab and filer technology so that the actual storage blocks can be reset to the base state.) Variances are reduced. If something has to be changed, then changes should be made one at a time.

This customer didn’t reset the database or environment (so the database always grew), and they did not operate on an isolated network (they were susceptible to any corporate network traffic). Their results varied wildly. Even starting their tests 30 minutes earlier from one day to the next produced significantly different results.

This customer wasn’t bothered by the resulting variance, even though some of us watching were extremely anxious and wanted to root out the details, lock down the environment and understand every possible variable. For good reasons, the customer was not interested in examining the variance. We struggled to explain that what they were doing wasn’t really performance testing.

What can we learn from these two examples?

#1: Never blindly put your entire faith in a workload. Always question it. Ask stakeholders for review. Calculate both the hourly counts and the actual per-task rate. Use common sense. Try to compare your workload against reference data.

#2: Effective performance testing is repeatable, and successive tests should show little variance. The performance testing process is simple: document the environment before the test (the base state), run the test, measure and analyze, reset to the base state, repeat. If changes need to be made, make them one at a time.
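To illustrate lesson #2, here is a minimal sketch of that loop. The reset_to_base_state() and run_workload() functions are hypothetical placeholders for whatever your lab actually uses (restoring a storage snapshot, driving a load tool); the simulated timing exists only so the sketch runs end to end:

```python
# Minimal sketch of the repeatable performance-test loop from lesson #2.
import random
import statistics

def reset_to_base_state() -> None:
    """Placeholder: restore the documented base state (e.g. revert a
    storage snapshot, reload the baseline database)."""

def run_workload() -> float:
    """Placeholder: run the identical workload and return elapsed seconds.
    Simulated here so the sketch is runnable end to end."""
    return 60.0 + random.uniform(-2.0, 2.0)

def run_performance_campaign(iterations: int = 5) -> None:
    results = []
    for i in range(iterations):
        reset_to_base_state()        # start from the known base state
        elapsed = run_workload()     # run the test
        results.append(elapsed)      # measure
        print(f"run {i + 1}: {elapsed:.1f} s")
    # Analyze: successive runs should show little variance; if they don't,
    # change one thing at a time and re-run before trusting the numbers.
    print(f"mean {statistics.mean(results):.1f} s, "
          f"std dev {statistics.pstdev(results):.1f} s")

run_performance_campaign()
```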