CLM 4.0.5 performance reports

To coincide with the CLM 4.0.5 release, the performance team has produced six — that’s six! — reports.

Collaborative Lifecycle Management performance report: RTC 4.0.5 release compares the performance of an unclustered Rational Team Concert version 4.0.5 deployment to the previous 4.0.4 release. With the workload as described in the report, we found no performance regressions between the current release and the prior release.

RTC for z/OS recently introduced a Java API for accessing JFS (Jazz Foundation Services) instead of the HTTP APIs, which offers the potential for significant performance improvements. The Rational Team Concert for z/OS performance impact of Java JFS API adoption in 4.0.5 report compares performance before and after RTC for z/OS adopted the Java JFS API for some of its resource operations and queries. The comparison is between the 4.0.5 RC1 development version and the previous 4.0.4 release.

Collaborative Lifecycle Management performance report: RRC 4.0.5 release compares the performance of an unclustered Rational Requirements Composer version 4.0.5 deployment to the previous 4.0.4 release.

Similarly, Collaborative Lifecycle Management performance report: Rational Quality Manager 4.0.5 release compares the performance of an unclustered Rational Quality Manager version 4.0.5 deployment to the previous 4.0.4 release.

There were design changes to the RM ETL functionality, which are highlighted in Collaborative Lifecycle Management performance report: Export Transform Load (ETL) 4.0.5 release. This report presents the results of “Extract, Transform, and Load” (ETL) performance testing for CLM, covering both Java ETL and DM ETL, and both full and delta data loads. The article focuses on the ETL performance comparison between the 4.0.5 release and the 4.0.4 release.

Finally, the CLM Reliability report: CLM 4.0.5 release presents a sample of the results from a CLM 4.0.5 reliability test. Reliability testing is about exercising the CLM applications so that failures are discovered and removed before the system is deployed. There are many different pathways through the complex CLM applications; this test scenario exercises the most likely use cases. The use cases are put under constant load for a seven-day period to validate that the CLM application provides the expected level of service, without any downtime or degradation in overall system performance.


Two new performance reports for CLM 4.0.4

Timed with the release of CLM 4.0.4, two new performance reports have been posted on the Deployment wiki. These are part of Rational development and the Rational Performance Engineering team’s plan to release relevant and useful reports more frequently.

Collaborative Lifecycle Management performance report: RTC 4.0.4 release compares the 4.0.4 release to the previous 4.0.3 release. The performance goal is to verify that there are no performance regressions between the 4.0.4 and 4.0.3 releases when running tests using the 1100-1200 concurrent user workload as described. The report shows similar throughput for the 4.0.4 release, and nmon data comparing 4.0.3 and 4.0.4 shows similar CPU, memory, and disk utilization on the application server. The database server shows similar CPU and disk utilization, but higher memory utilization.

Collaborative Lifecycle Management performance report: RRC 4.0.4 release compares the 4.0.4 release with the prior 4.0.3 release to verify there are no performance regressions using the 400-user workload as described. The report shows that several actions in 4.0.4 are faster than in 4.0.3.

More reports are coming, so keep an eye on the Performance datasheets and sizing guidelines page.

New performance material at the jazz.net Deployment wiki

As promised, we have started to publish some datasheets and reports on the jazz.net Deployment wiki. In between the necessary work of qualifying and testing new releases, the team has explored some more complex scenarios. Some of these explorations are responses to customer requests, so keep letting us know what’s important to you. Others are topics that sat dormant in our backlog until we were recently able to align resources to address them.

* * *


Plan loading improvements in RTC 4.0.3

Everybody works with plans in RTC, from tiny team plans to multi-year plans with thousands of items and complex team/owner/plan item relationships. When working with larger, more complex plans, some clients noted sub-optimal plan loading performance. They sought techniques to optimize their plan usage, and also asked IBM to improve plan load response times.

For the Rational Team Concert 4.x releases, the development team made significant improvements to plan loading behavior, with major changes landing in 4.0.3. This case study compared identically structured plans of varying sizes with the goal of determining the difference in plan load time between RTC 3.0.1.3 and 4.0.3.

“Using the 3.0.1.3 release the larger plans took more than a minute to load while in 4.0.3 all plans, regardless of size or browser used, took less than a minute to load. In this study plans of all sizes loaded faster in 4.0.3 than in 3.0.1.3. Notably, plans with larger numbers of work items loaded proportionally faster in 4.0.3.”

See Case study: Comparing RTC 3.0.1.3 and 4.0.3 plan loading performance: https://jazz.net/wiki/bin/view/Deployment/CaseStudyRTCPlanLoadingPerformance

* * *

Sharing a JTS server between RTC, RQM and RRC

Key to a successful CLM implementation is having separate application servers (Rational Team Concert (RTC), Rational Quality Manager (RQM), and Rational Requirements Composer (RRC)) share the same JTS server. Folks have asked about the breaking point of a shared JTS server. From the report:

“Overall, for this set of machines and these workloads, the JTS never became a bottleneck. There was only a small amount of degradation in maximum throughputs (5-10%) even when all 5 CLM servers were at maximum load.”

Throughput was measured in transactions-per-second and graphs show the different combinations of servers connected to the single JTS and the relative loads and transaction rates.

Visit Performance impact of sharing a Jazz Team Server: https://jazz.net/wiki/bin/view/Deployment/PerformanceImpactOfSharingAJazzTeamServer

* * *

Sizing VMware

Everyone is using virtualization, and VMware’s ESX is popular for deploying Linux and Windows OSes. We publish minimum suggested CPU sizes which apply to both physical and virtual servers. This particular report looks at the performance impact of varying the vCPU sizes of VMware VMs serving Rational Team Concert.

“In this study, we were using dual 6-core physical hyper-threaded CPUs that were not able to be translated to 12 or 24 vCPUs within the virtual environment. We found better performance using 16 vCPUs in our Virtual Machines.”

Look at Performance impact of different vCPU sizes within a VMWare hypervisor: https://jazz.net/wiki/bin/view/Deployment/PerformanceImpactOfvCPUSizes

* * *

RRDI 2.0.x sizing guidelines

One of my Jumpstart colleagues wrote a note about sizing Rational Reporting for Development Intelligence (RRDI), an essential ingredient in most CLM deployments. Properly sizing RRDI requires understanding the approximate number of concurrent users and estimating how many of them might interact with reports.

Take a look at RRDI 2.0.x sizing guidelines: https://jazz.net/wiki/bin/view/Deployment/RRDISizingGuidelines

* * *

The future

We plan to post more reports as they are completed on the wiki here: https://jazz.net/wiki/bin/view/Deployment/PerformanceDatasheetsAndSizingGuidelines. As always, let us know what you think is missing or what you’re interested in hearing more about. You can ask here or on the wiki itself. Thanks.


Field notes: Unreasonable performance tests seen in the wild

Enterprise software deployments are complicated. Every customer goes about the process differently. For some, performance testing in a pre-production environment is a requirement before deployment. These customers are generally very mature and realize that performance testing provides necessary data that will enable and qualify installation into production.

There are some customers who, for whatever reason, have invested in performance testing in their pre-production environments but haven’t done so consistently, or are ignoring some of the basic tenets of performance testing.

It looks good on paper…

I worked with a customer who assigned some Business Analysts the task of documenting a production workflow. These BAs didn’t fully understand the application or the domain, but went about their work anyway. They created a nice looking spreadsheet indicating workflow steps, and an estimate as to how long they thought the workflow would take to execute.

Actually, they didn’t create an estimate as to the start-to-finish elapsed time of the workflow. They documented the number of times they thought the workflow could be reasonably completed within an hour.

There’s a difference in measurement and intention if I say:

“I can complete Task A in 60 seconds.”

or if I say:

“I can complete Task A 60 times in an hour.”

I may think I can complete Task A faster and perhaps shave a few seconds off my 60-second estimate. Or maybe I think I can execute more tasks in an hour.

In this particular case, the count of tasks within an hour was nudged upwards by successive layers of review and management. Perhaps the estimate was compared against existing metrics for validity; in any case, the results were presented in a table, something like:

Transaction    Count    Duration
Triage         600      10

At first glance this doesn’t seem unreasonable. Except that no units are provided with the values, and it’s unclear how the numbers actually relate to anything. Expressed as a sentence, here is what the workflow actually intended:

“Team leads will triage 600 defects per hour for 10 hours a day”

This data was taken at face value by the team writing and executing the performance tests. They were told there would be several folks triaging defects, so they created several virtual users working at this rate. You guessed it: the test and the system failed. Lots of folks got upset, and the deployment-to-production date slipped.

 … but not in a calculator

Expecting any human to execute any triage task in software at a rate of one every 6 seconds (10 per minute) for 10 hours without a break is madness. It may be that the quickest a person could triage a defect is 6 seconds, but that pace is not sustainable. Once the test requirement was translated into natural language, and folks realized that the rate was humanly impossible, the test was redesigned, executed, and produced meaningful results.
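
For anyone who wants to sanity-check a workload row like the one in the table above, a few lines of arithmetic are enough. Here is a minimal sketch; the table values are from this example, and the one-action-per-minute warning threshold and helper name are mine:

    # Sanity-check a workload row: "Team leads will triage 600 defects
    # per hour for 10 hours a day."
    def seconds_per_action(count_per_hour):
        """How many seconds a single user has for each action at this rate."""
        return 3600.0 / count_per_hour

    rate = 600          # actions per hour (Count column)
    hours_per_day = 10  # Duration column, in hours

    pace = seconds_per_action(rate)
    print(f"{rate}/hour = one action every {pace:.0f} seconds, "
          f"{rate * hours_per_day} actions per day")

    # A human triaging a defect every 6 seconds for 10 hours is not credible;
    # flag anything faster than, say, one action per minute for review.
    if pace < 60:
        print("WARNING: rate is probably not humanly sustainable -- review it")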

How did this insane workflow rate get this way? Was it successive layers of management review increasing or doubling a value? Maybe when the table was created values were unintentionally copied and pasted, or multiplied. Regardless, the values were not checked against reality.

Are any tests better than no tests?

I worked with another customer who spent the time and effort to create performance tests in pre-production, but didn’t follow through with the other tenets of good performance testing. Performance work needs to run in a stable, well-understood environment and be executed as repeatably as possible.

Ideally, the work is done in an isolated lab and the test starts from a known and documented base state. After the test is run, the test environment can be reset so that the test can be run again and produce the same results. (In our tests, we use an isolated lab and filer technology so that the storage blocks can be reset to the base state.) Variances are reduced. If something has to be changed, then changes should be made one at a time.
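
As a sketch of what "repeatable" looks like in practice, the loop below captures the discipline described above: reset to the base state, run, collect, repeat, and compare the spread across runs. The helper functions are placeholders for whatever lab tooling you actually use (restoring a storage snapshot, driving a load tool, pulling monitoring data); the simulated response times are only there to make the sketch runnable:

    import random
    import statistics

    def reset_to_base_state():
        """Placeholder: restore databases and file systems to the documented base state."""
        pass

    def run_workload():
        """Placeholder: drive the workload; here we just simulate response times (seconds)."""
        return [random.gauss(2.0, 0.2) for _ in range(100)]

    run_means = []
    for run_id in range(5):        # several identical runs, changing one thing at a time
        reset_to_base_state()      # same starting point every run
        sample = run_workload()
        run_means.append(statistics.mean(sample))
        print(f"run {run_id}: mean response {run_means[-1]:.2f}s")

    # Low spread across runs is the goal; a large spread means the environment,
    # not the product, is driving the numbers.
    print(f"spread across runs (stdev): {statistics.stdev(run_means):.3f}s")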

This customer didn’t reset the database or environment (so the database grew continuously) and they did not operate on an isolated network (they were susceptible to whatever corporate network traffic happened to be present). Their results varied wildly. Even starting their tests 30 minutes earlier from one day to the next produced noticeably different results.

This customer wasn’t bothered by the resulting variance, even though some of us watching were extremely anxious and wanted to root out the details, lock down the environment and understand every possible variable. For good reasons, the customer was not interested in examining the variance. We struggled to explain that what they were doing wasn’t really performance testing.

What can we learn from these two examples?

#1: Never put blind faith in a workload. Always question it. Ask stakeholders to review it. Calculate both the hourly counts and the actual per-transaction rate. Use common sense. Try to compare your workload against reference data.

#2: Effective performance testing is repeatable, and successive tests should have little variance. The performance testing process is simple: document the environment before the test (base state), run the test, measure and analyze, reset to the base state, repeat. If changes need to be made, make them one at a time.


Field notes: Measuring capacity, users and rates


Small, Medium and Large

A frequent question concerns how we might characterize a company’s CLM deployment. Small, medium and large are great for tee-shirts, but not for enterprise software deployments. I admit we tried to use such sizing buckets at one point, but everyone said they were extra-large.

Sometimes we characterize a deployment’s size by the number of users that might interact with it. We try to distinguish between named and concurrent users.

Named users are all the folks permitted to use a product. These are registered users or users with licenses. This might be everyone in a division or all the folks on a project.

Concurrent users are the folks who actually are logged in and working at any given time. These are the folks who aren’t on vacation, but actually doing work like modifying requirements, checking in code, or toying with a report. Concurrent users are a subset of named users.

Generally we’ve seen the number of concurrent users hover around 15 to 25% of the named users. The percentage is closer to 15% in a global company whose users span time zones, and closer to 25% in a company where everyone is in one time zone.
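
To put that rule of thumb to work, the range translates into a quick estimate like the sketch below; the 3000 named users plugged in here are only illustrative:

    # Rough concurrency estimate from a named-user count, using the
    # 15-25% rule of thumb described above. The 3000 is illustrative.
    named_users = 3000
    low, high = 0.15, 0.25
    print(f"{named_users} named users -> roughly "
          f"{int(named_users * low)}-{int(named_users * high)} concurrent users")
    # -> roughly 450-750 concurrent users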

As important as it is to know how many users might interact with a system, user numbers aren’t always an accurate way to measure a system’s capacity over time. “My deployment supports 3000 users” feels like a useful characterization, but it can be misleading because no two users are the same.

Because it can lead to simple answers, I cringe whenever someone asks me of a particular application or deployment, “How many users does it support?” I know there’s often no easy way to characterize systems, and confess I often ask countless questions and end up providing simple numbers. (I’ve tried answering “It supports one user at a time,” but that didn’t go over so well.)

Magic Numbers

“How many users does it support?” is a Magic Number Question, encouraged by Product Managers and abetted by Marketing, because it’s simple and easy to print on the side of the box. Magic Numbers are especially frequent in web-application and website development. I’ve been on death marches (er, mature software development projects) where the goal was simply to hit a Magic Number so we could beat a competing product’s support statement and affix gold starbursts to the packaging proclaiming “We support 500 users.” (It didn’t matter if performance sucked; it was only important to cram all those sardines into the can.)

Let’s look at the Magic Number’s flaws. No two users do the same amount of work. The “500 users” figure is a misleading aggregate: one company may have 500 lazy employees while another might have caffeinated power-users running automated scripts and batch jobs. And even within the lazy company, no two users work at the same rate.

Ideally we’d use a unit of measure to describe work. A product might complete “500 Romitellis” in an hour, and we could translate end-user actions into this unit of measure. Creating a new defect might cost 1 Romitelli, executing a small query might be 3 Romitellis, and a large query could be 10 Romitellis. But even this model is flawed, as one user might enter a defect with the least amount of data, whereas another user might offer a novel-length description. The difference in data size might trigger extra server work. This method also doesn’t account for varied business logic, which might require more database activity for one project and less for another.
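
To make the (admittedly flawed) work-unit idea concrete, here is a small sketch. The action weights and the hour of activity are invented for illustration; the point is only that total load is the sum of actions times their cost, which is exactly where the model starts to leak, since data size and business logic aren't captured:

    # Illustrative work-unit model: translate user actions into "Romitellis".
    # The weights below are invented for this example, not measured values.
    COST = {
        "create_defect": 1,   # Romitellis per action
        "small_query":   3,
        "large_query":  10,
    }

    # A hypothetical hour of activity for one user
    actions = {"create_defect": 5, "small_query": 8, "large_query": 2}

    load = sum(COST[name] * count for name, count in actions.items())
    print(f"hourly load: {load} Romitellis")   # 5*1 + 8*3 + 2*10 = 49

    # The flaw: a create_defect with a novel-length description costs the
    # server more than one with minimal data, but both count as 1 here.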

Rates

Just as a set number of users is interesting but insufficient, a simple counter of usage doesn’t helpfully describe a system’s capacity. We need a combination of the two: a rate. A rate is a two-part measurement describing how much of an action occurred in how much time. But as essential as rates are, it’s important to realize how easily they can be misunderstood.

Consider this statement:

“I drank six glasses of wine.”

On its own, that may not seem so remarkable. But you may get the wrong impression of me. Suppose I say:

“I drank six glasses of wine in one night.”

Now you might have reason to worry. My wife would definitely be upset. And suppose I say:

“I drank six glasses of wine in the month of August.”

That would give a completely different impression. This is where rate comes into play. The first rate is 6-per-day; the second is 6-per-month. The first is roughly 30 times greater than the second. One rate is more likely to lead to disease and family disharmony than the other.

Let’s move this discussion to product sizing. Consider this statement:

“The bank completes 200 transactions.”

It’s just as unhelpful as the side-of-the-box legend, “The bank supports 200 users.” For this statement to be valuable it needs to be stated in a rate:

“The bank completes 200 transactions in a day.”

This seems reasonable at first glance, as it suggests a certain level of capability. But now we can offer another:

“The bank completes 200 transactions in a second.”

And now the first rate comes into perspective. Realize that rates can have some degree of meaningful variance. When a system is usable and working, it rarely functions at a consistent rate. Daily peaks and valleys are normal. Some months are busier than others. We may have an average rate, but we probably also need to specify an extreme rate:

“The bank completes 200 transactions in an hour on payday just before closing.”

Or even a less-than-average rate:

“The bank completes 20 transactions in an hour on a rainy Tuesday morning in August.”

Imagine we’re testing an Internet website designed to sell a popular gift. The transaction rate in the middle of May should be different than the transaction rate in the weeks between Thanksgiving and Christmas. Therefore we should try to articulate capacity at both average transaction rate and peak transaction rate.

TPH

There’s no perfect solution for measuring user capacity. What we like to do is describe work in units of transactions-per-hour-per-user. This somewhat gets beyond differences in users by characterizing work in terms of transactions, and it averages different types of users and data sizes to create a basic unit. Of course we could discuss all day what a transaction means; for now, take it to mean a self-contained action in a product (login, create an artifact, run a query, etc.). For many of our tests, we determined that an average rate was 15 transactions-per-hour-per-user. (I abbreviate it 15 tph/u.) A peak rate was three times as much, or 45 tph/u.

15 tph/u may not seem like very much activity. It’s approximately one transaction every four minutes. But we looked at customer logs, we looked at our own logs, and this is the number we came up with as an average transaction rate for some of our products. Imagine that within an hour you may complete 15 transactions; in practice you may complete more at the top of the hour, get distracted, and complete fewer as the hour winds down. Also, 15 is a convenient number for multiplying and relating to an hour. (Thank the Babylonians for putting 60 minutes into an hour.) Quadrupling the 15 tph/u rate means something happens once every minute.

Returning to the “How many users” question, it’s usually better to answer it in terms of transactions-per-hour: “We support 100 users working at 15 transactions-per-hour-per-user on average and 75 transactions-per-hour-per-user at peak periods.” Such statements can make Marketing glaze over, but they’re generally more accurate. Smart companies can look at that statement and say, “In our super-busy company, that’s only 50 users working at 30 transactions-per-hour-per-user on average,” and so forth.
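
A few lines of arithmetic make the trade-off in that last statement easy to check: users and per-user rates trade off against each other, and the system-level number is their product. The figures below are the ones from the paragraph above:

    # Capacity expressed as users x transactions-per-hour-per-user (tph/u).
    def total_tph(users, tph_per_user):
        return users * tph_per_user

    # The supported statement: 100 users at 15 tph/u average, 75 tph/u peak
    print(total_tph(100, 15))   # 1500 transactions per hour on average
    print(total_tph(100, 75))   # 7500 transactions per hour at peak

    # The "super-busy company" reading of the same average capacity:
    print(total_tph(50, 30))    # also 1500 transactions per hour

    # 15 tph/u is roughly one transaction every 4 minutes per user
    print(60 / 15, "minutes between transactions at 15 tph/u")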

When might you see us talking in these terms with such attention to rates? We’re working on it. We want to get it right.


Why browser performance doesn’t matter and why maybe you shouldn’t care

Thinking (and obsessing) about software performance, Part 2

When I first blogged about performance over at jazz.net, I received feedback on one hot topic in particular, namely browser performance. There’s no doubt the “Browser Wars” are big business for the competing players. Here in the U.S. during the 2012 Summer Olympics’ televised coverage we saw one vendor heavily tout its newest browser. Of course they claimed it was faster than ever before.

Earlier this year the Rational Performance Engineering team planned to address this topic with an article showing how Rational Team Concert behaves in the major browsers. We also wanted to include details about our testing method so folks could measure their own browsers’ response time and compare with ours. The raw material lives here.

The intent was to produce a graph something like this:

[Figure: Hypothetical Response Times for the Same Transaction in Three Browsers]

However, the data collected using the three browsers available at that time suggested that the battle for browser supremacy may be nearing its end. The gaps are closing, and some browsers appear to be better for different use cases.

[Figure: Six Common WebUI Use-Cases on Jazz.net (4.0.1 M3)]

Further experimentation revealed that the same task executed in the same browser may not complete consistently in the same amount of time. In fact, it’s possible that this variability could blur the distinction between browsers.

The more we tested and investigated, the more we realized that generalizing about Transaction Z’s performance in Browser B was nearly pointless because everyone’s browser experience is different. A more accurate representation of the first picture would actually be:

[Figure: Hypothetical Response Times for Three Browsers for Three Users]

Admittedly, these are imaginary numbers selected for dramatic effect. However, we will get to some facts later on. I’m not saying the CLM team has given up trying to optimize web application performance. A top task for the team is to investigate and deliver improvements in RTC plan loading. What I want to say is that obsessing over browser performance can be deceptive, if not fruitless, because as the major browsers’ performance becomes consistent, it appears that individual and corporate policies are starting to affect browser behavior more.

Let’s look at some things that can slow down your browser.

There have been documented reports that some hotel Wi-Fi services inject JavaScript into the pages your browser loads, which can alter every page you visit (search for “hotel wifi javascript injection”). At the time of this writing, this alteration is presumed to be benign, and may be less of an issue now. But it’s still conceivable that another organization offering Wi-Fi might do something similar. Setting aside the security implications, this injected JavaScript can slow you down. Imagine having to touch a particular flower pot on your neighbor’s porch each and every time you enter your own house.

The hotel Wi-Fi example is a known instance of what others might label spyware, which is more pervasive than any of us realize. Some corporations spy on their employees for all sorts of good and maybe not-so-good reasons. Some sites we visit may expose us to malware and spyware. These logging and caching systems can slow down not just your browser but your entire machine.

Staying connected with friends, family and even our employers in the Internet age may require being logged into many sites and applications such as Google and Facebook. An open session with Gmail while you’re filling out that online status report might slow you down a tiny fraction, as your browser requests are noted and tracked. Elsewhere, while you’re surfing, open sessions with Yahoo and Bing follow your trail and leave cookies for others. Why else do you suddenly see coupons for that TV you priced last week or hotel ads for places you researched flight times?

Our online crumbs create a profile which advertisers and others are keen to eavesdrop upon. Consequently, if we’re checking friends’ Facebook pages or simply surfing, our search results may be customized. Here are two Google searches run from the same machine and IP, one using Chrome’s “incognito” feature, the other from my heavily customized Firefox:

[Image: My search results for “quest clear” in incognito Chrome]

[Image: My search results for “quest clear” in customized Firefox]

They’re pretty similar except for the order of results. Firefox’s Google results correctly deduce I might be more interested in ClearQuest than World of Warcraft. What’s also interesting is that Google suggests Chrome’s search was 0.03 seconds faster. Is it the browser that’s faster, or is it that Google needs those hundredths of a second to filter its results based on what it knows about me?

Of course corporate applications handle cookies differently, if they use them at all. The fact that the same search from the same desktop can yield different results shows that the intervening browser might keep track of what you do and silently dispatch data to places that aren’t directly related to your immediate tasks. Any additional traffic, customized or not, can slow you down.

If you use an HTTP sniffer tool to log your HTTP activity (for example, Fiddler, HTTP Sniffer, HTTP Scoop, or even Firebug or HttpWatch), then you might be able to spot the excess traffic coming in and out of your system. My fully-loaded browser is constantly polling for stock updates, news feeds and weather reports. Less frequent traffic alerts me to changed web pages, new antivirus definitions, fresh shared calendar entries and application updates. Those special toolbars offer to make searching slightly easier, but they are probably keeping an eye on your every move.

Indeed, if you visit commercial sites with such a tool, you may see all the embedded JavaScript that tracks users firing requests to various sites which do nothing but data collection. A modest amount of Internet traffic has to be related to tracking. Some websites are slow simply because tracking scripts and data are set to load before the real human-consumable content.
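
If you’d rather not eyeball a sniffer trace, most browsers can export a page load as a HAR file (a JSON log of every request), and a few lines of scripting will show how much of the traffic went somewhere other than the site you thought you were visiting. In this sketch the file name and the choice of first-party domain are assumptions; export a HAR from your own browser’s developer tools and adjust both:

    # Count first-party vs. third-party requests in a HAR export.
    # "page_load.har" and the first-party domain are assumptions for this sketch.
    import json
    from urllib.parse import urlparse

    FIRST_PARTY = "jazz.net"

    with open("page_load.har", encoding="utf-8") as f:
        har = json.load(f)

    first, third = 0, 0
    for entry in har["log"]["entries"]:
        host = urlparse(entry["request"]["url"]).hostname or ""
        if host == FIRST_PARTY or host.endswith("." + FIRST_PARTY):
            first += 1
        else:
            third += 1

    print(f"first-party requests: {first}")
    print(f"third-party requests: {third}  (trackers, ads, analytics, ...)")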

Having fresh data at our fingertips comes at a price, perhaps small, but still measurable. If you can’t run your browser bare-bones, then we recommend having a sterile alternate browser handy. As indicated above, one of my browsers is fully loaded; the other knows very little about me and keeps no cookies, bookmarks or history.

Because browsers are customizable, and can be tampered with by the sites we visit or the services we use, some corporations take a stronger stance. In financial, consulting, or any organization that must adhere to government or industry standards, browser and desktop settings may be completely locked down and not changeable at the desktop. A corporation may use operating-system policies to limit the number of open connections and other TCP/IP settings, and others may put hard limits on the expiration of cookies and cached content.

Some companies route all Internet and Intranet traffic to a proxy server for optimization, security and logging. Some organizations are looking to protect customer information that passes through their hands and/or comply with EU privacy laws. We have seen situations where poor RTC performance was due to the extra trip to a corporate proxy server. Some organizations permit rerouting traffic, others don’t.

We have heard from customers who can’t follow our browser tuning recommendations because their organizations won’t let them. If this surprises you, it’s actually a fairly common practice. Years ago I worked as a consultant for a major petrochemical company. Every night corporate bots would scan desktops and uninstall, delete and reset any application or product customization that wasn’t permitted by corporate IT. Thank goodness the developers had software configuration management and checked-in their work at the end of each day.

Browsers allow us to explore the Internet and interact with applications. They can take us anywhere in cyberspace, and we can pretty much do anything we want once we get there. They are an infinitely customizable vehicle from which to explore the World Wide Web. But customization comes at a cost, and sometimes there are settings and behaviors we cannot change. As a software development organization, we’re becoming more aware of and sensitive to these concerns and are working to make our web interfaces behave better in controlled environments.

There’s one lurking topic I’ve not addressed head-on. Given how quickly new browsers hit the market, it’s very difficult for software vendors to promptly validate their products against each and every new release. Believe me, this is a headache for software vendors that is not easily solved.

At the desktop level, there are some things all of us can do right now to improve performance. Here’s a short list, and you’re probably already doing a few of them.

In your OS:

  • Keep your OS up-to-date and patched
  • Run and update your security software on a schedule

In your browser(s):

  • Keep your browser up-to-date and patched
  • Make sure any plug-ins or extensions are up-to-date and patched
  • Remove any plug-ins or extensions you do not use
  • Think twice before customizing your browser
  • Run as lean as possible

In your alternate, lean browser:

  • Keep your browser up-to-date and patched
  • Delete your cache after each session
  • Prevent cookies wherever possible
  • Avoid any customizations

Thanks,

Grant.


Thinking (and obsessing) about software performance [Reprint]

So maybe it’s cheating to include this jazz.net post, but I had meant to start a series over there, and now appear to be picking it up over here.

Here’s how the series kicks off…

When we interact with software, we require that it does what we want and that it responds to our wishes quickly. The domain of software performance is all about understanding where the time goes when we ask software to do something. Another way to look at software performance is to think of it as a scientific attempt to identify and remove anything that might slow software down.

For more, please go to https://jazz.net/blog/index.php/2012/07/16/performance-part-1/.

Thanks,

Grant.