Supermarket Circulars in Vegas

The IBM Rational user conference formerly known as Innovate was folded into a new IBM conference this year called InterConnect. During the last week of February customers and IBMers descended upon Las Vegas. We weren’t there to talk about connecting hi-fi components, like turntables to amplifiers (those sorts of interconnects, remember?), but about software and hardware and business and all the sorts of things IBM is into.

This was your author’s first Las Vegas experience. Way back when, Las Vegas was a thing to study and consider in the abstract, the “architecture of disruption.” Needless to say, nothing quite prepared me for the thing itself. Navigating the conference hotels and weaving through the Vegas strip was like trying to play Twister on a constantly changing supermarket circular. (Yeah, I know that makes no sense, but it illustrates my point.)

This year I spoke on Performance, Monitoring, and Capacity Planning for the Rational CLM Solution. The discussion highlighted performance updates delivered to the CLM products across the last few releases, and then moved on to a topic which I’ve been talking about in various contexts for quite some time. Customers use products and they want to be happy. If they use the products, that generally means a gradually increasing population of users with a gradually increasing amount of assets. As usage increases, capacity can dwindle, and sometimes performance suffers as a result. To address capacity planning, you must understand current usage and behavior, which requires monitoring the system at hand. In essence, ensuring a top-quality performance experience requires monitoring.

There was a detour to talk about our CLM Sizing Strategy, which I’ve pointed to here before and which took a long time to document (primarily because it contains a well-researched list of caveats and details on why sizing can be so difficult). I’m pleased with how it finally came together.

Discussing product improvements and fixes is important, as it shows our commitment to releasing excellent products and, when we don’t get it quite right the first time, our ability to react and improve. We’re also encouraging customers to tackle performance in a proactive way, by understanding existing system behavior and noting how a complex system changes over time.

See you next year? Well, more on that fairly soon. I admit there’s a certain sense of satisfaction to having “survived” Vegas’ absurd scale, disorienting patterns, and intentional obfuscation. Yeah, I’ll try it again if asked.

Looking ahead to InterConnect 2015

InterConnect 2015, in Las Vegas, NV, is a week away. The official conference starts on Monday, February 23, and pre-meetings and whatnot commence the day before, on Sunday, February 22.

I’m looking forward to being there. One reason is that I’m sitting in a Boston suburb wearing extra layers, including one of my thickest wool sweaters, sheepskin slippers and a scarf. I rarely ever wear a scarf in the house, but it is so cold. Maybe you saw in the news that we’re having a little problem with snowfall this year. Colleagues comfort me by pointing out how unseasonably warm it is for them in the great states of Washington, Texas, Florida and Colorado, to name a few. They mention two-digit temperatures well above freezing, which have become a rarity ’round here. But no one wants to hear someone complain about seven-foot snow drifts, so let me get back on topic.


My main-tent session this year is DRA-2104: Performance, Monitoring, and Capacity Planning for the Rational CLM Solution. It’s on Monday, Feb. 23 at 3:30 pm in the Islander Ballroom B at Mandalay Bay. Because this is the first year the formerly-known-as-Rational-Innovate-conference has become part of a larger conference and moved from Orlando, FL, to Las Vegas, NV, I will get there early. Because I don’t know where anything is. And the helpful messages from the conference management folks suggest that it takes at least 30 minutes to get from site to site.

DRA-2104: Performance, Monitoring, and Capacity Planning for the Rational CLM Solution will talk about all the awesome performance improvements that are in the CLM 5.0.2, 5.0.1 and 5.x releases. Some improvements made back in 4.x are so good, they will bear mentioning again.

I’ll also talk about our CLM Sizing Strategy and how proper monitoring of an existing system can lead towards an understanding of capacity planning. There will be time for questions, and discussions about the local weather.

My presentation is part of the larger Rational Deployment for Administrators track. If you are attending InterConnect and can log in to the event portal, https://ibm.biz/BdEC4B will take you to the entire track schedule.

We made slides for ourselves to cross-promote each other’s sessions:

[slides cross-promoting the track sessions]

On Wednesday, I’ll be moderating DRA-1970A: Best Practices for Using IBM Installation Manager in the Enterprise which has a great line-up of Installation Manager practitioners who are dying to share their experiences.

It’s time for me to get outside and shovel a bit. I hope to see you in Vegas next week. Be sure to say “Hi!”

 

JTSMon 5.0.2 is here

JTSMon 5.0.2 is now available for downloading from the Deployment Wiki FAQ site (https://jazz.net/wiki/bin/view/Deployment/JTSMonFAQ).

Some facts to note with this new release:
  • The appearance in CLM 5.0.2 of “scenario” (client-side use-case) based web service reporting will cause problems with earlier releases of JTSMon. The new version reads the new-format reports accurately, though it does not yet take advantage of scenario data.
  • A new 5.0.2 jazzdev baseline is included for comparison to user collected data.
  • RQM web service data is better broken down now for system component reporting.
  • Post-monitoring analysis can now be focused on a subset of the collected data.
  • Several additional defects have been fixed.

If you have questions or comments, please ask them at the jazz.net forum. We’re using the jazzmon tag there.

Field notes: It isn’t always about virtualization, except when it is

Talking recently with some customers, we discussed the fallacy of always trying to solve a new problem with the same method that solved the preceding problem. Some might recall the adage, “If you only have a hammer, every problem looks like a nail.”


I am often asked to help make sense of complex performance problems our customers encounter. I don’t always have the right answer, and usually by the time the problem gets to me, a lot of good folks who are trained in solving problems have spent a lot of time trying to sort things out. I can generally be counted upon for a big-picture perspective, some non-product ideas and a few other names of folks to ask once I’ve proved to be of no help.

A recent problem appeared to have no clear solution. The problem was easy to repeat and thus demonstrable. Logs didn’t look out of the ordinary. Servers didn’t appear to be under load. Yet transactions that should be fast, say well under 10 seconds, were taking on the order of minutes. Some hands-on testing had determined that slowness increased proportionally with the number of users attempting to do work (a single user executing a task took 30-60 seconds, two users at the same time took 1 minute to 90 seconds, three users took 2-3 minutes, etc.).

So I asked whether the environment used virtualized infrastructure, and if so, could we take a peek at the settings.

Yes, the environment was virtualized. No, they hadn’t looked into that yet. But yes, they would. It would take a day or two to reach the folks who could answer those questions and explain to them why we were asking.

But we never did get to ask them those questions. Their virtualization folks took a peek at the environment and discovered that the entire five-server configuration was sharing the processing power customarily allocated to a single server: all five servers were drawing from a 4 GHz resource pool. They increased the pool to 60 GHz and the problem evaporated.
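
If you want to see just how starved those servers were, here is a quick back-of-the-envelope sketch in Python. The pool sizes and server count come from the story above; the expected per-server figure is a hypothetical assumption, purely for illustration.

    # Back-of-the-envelope check: how much CPU does each server actually get
    # from a shared resource pool? Pool sizes and server count are from the
    # anecdote above; the expected per-server figure is a hypothetical assumption.
    servers = 5
    expected_per_server_ghz = 12.0  # hypothetical sizing expectation

    def share_per_server(pool_ghz, server_count):
        """Even split of a shared resource pool across the servers."""
        return pool_ghz / server_count

    for label, pool_ghz in [("before the fix", 4.0), ("after the fix", 60.0)]:
        share = share_per_server(pool_ghz, servers)
        print(f"{label}: {share:.1f} GHz per server "
              f"({share / expected_per_server_ghz:.0%} of the expected capacity)")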

I can’t take credit for solving that one. It was simply a matter of time before someone else would have stepped back and asked the same questions. However, I did write it up for the deployment wiki. And I got to mention it here.

Yes, we renamed JazzMon

The tool formerly known as JazzMon has been renamed to JTSMon.

Given the tool’s popularity and widespread use, it made sense to align it more closely in name and version numbering with the CLM product family. Version 4.0 follows version 1.4.0, which has the new Excel macro visualizer to help make sense of the data that JTSMon can capture.

The JTSMon FAQ offers lots to help you get started. There is a 32-page user manual, and if you’re really impatient, a single-page QuickStart sheet. A downloadable .zip has all the moving parts, and release notes are published separately. There’s also a 10-minute video demo (Ok, so we didn’t re-record it when we changed the name), and a shorter one.

The older 1.4.0 version is still available. It works with RTC 2.x whereas the new 4.0 version works only with RTC 3.x and 4.x.

If you have questions or comments, please ask them at the jazz.net forum. We’re still using the jazzmon tag there.

 

Be even smarter with virtualization

It took a bit of unplanned procrastination, but we finally got to the second and third parts of our in-depth investigation of virtualization as it relates to IBM Rational products.


Part two is now published here as Be smart with virtualization: Part 2. Best practices with IBM Rational Software.

Part three lives on the deployment wiki here as Troubleshooting problems in virtualized environments.

Part two presents two further case studies and a recap of the principles explored in Part 1. We took a stab at presenting the tradeoffs between different virtualization configurations. Virtualization is becoming more prevalent because it’s a powerful way to manage resources and squeeze efficiencies from hardware. Of course there are tradeoffs to balance, and Part two goes a bit deeper.

Part three moves to the deployment wiki and offers some specific situations we’ve solved in our labs and with customers. There are also screen shots from one of the main vendors’ tools, which can guide you in identifying your own settings.

Our changing audience

Really, who are you?
Not yet introducing the new jazz deployment wiki

In our corner of the world, some of us in the Jazz Jumpstart universe are wondering who will spill the beans and mention the new jazz Deployment wiki first. I don’t think it will be me.

We’re all working on a new way for the Jazz ecosystem to present information, specifically deployment information. Not just “Insert tab A into slot B” types of material, but the more opinionated, specific stuff you’ve told us you want to hear. We have folks working on Monitoring, Integrating, Install and Upgrade, and other deployment topics. I own the Performance Troubleshooting section.

When the actual wiki rolls out (and I can actually talk about it), I’ll talk about some of the structure and design questions we wrestled with. For now I want to talk about one of the reasons why we’re presenting information differently, and that’s because we think our audience has changed.

IT used to be simple

Ok, so maybe IT was never actually that simple, but it was certainly a lot easier to figure out what to do. One of IBM Rational’s strengths is that we’ve built strong relationships with our customers over the years. Personally, a lot of the customers I know (and who I think know me) started out as ClearCase or ClearQuest admins and over time have evolved into Jazz/CLM admins. Back when, there was pretty much a direct relationship with our product admins, who in turn knew their end users and had ownership of their hardware environments.

These pictures describe what I’m talking about (they’re from a slide deck we built in 2011 to talk about virtualization, some of which lives on elsewhere, but these pics are too good to abandon):

[diagram: the old IT model]

The relationship between Rational Support / Development and our customers remains strong and direct. Over the years it’s the context of our customers’ product administrators that has shifted in many cases:

[diagram: the new system model]

Consolidation, regulation, governance, compliance, etc., have all created additional IT domains which are often outside the customers’ product administration. There are cases where our relationship with our customers’ product administrators remains strong but we’ve lost sight of their context.

Here’s another way to look at the old model, specifically around hardware ownership:

[diagram: the old model of hardware ownership]

Back in the day, our customers’ product administrators would request hardware, say a Solaris box (yes, I am talking about many years ago…); the hardware would arrive, and the Rational product admin would get root privileges and start the installation. Nowadays, the hardware might be a VM, and there might be all sorts of settings which the admin can’t control, such as security, database, or, as is pertinent to this example, VMs.

[diagram: the new model of hardware ownership]

 

This is a long-winded way to say that we’re well aware we have multiple audiences, and need to remember that product administrators and IT administrators may no longer be the same people. Loving a product and managing how it’s used isn’t quite the same as it used to be. We’re trying to get better at getting useful information out there, which is one of the reasons for the new deployment wiki.

 

Virtualization demystified

Read about Rational’s perspective on virtualization over at IBM developerWorks

For the IBM Innovate 2011 conference, the Rational Performance Engineering team presented some of its research on virtualization. We had an accompanying slide deck too, and called it the Rational Virtualization Handbook.

It’s taken a bit of time, but we have finally fleshed out the slides and written a proper article.

Actually, the article has stretched into two parts, the first of which lives at Be smart with virtualization. Part 1. Best practices with IBM Rational software. Part 2 is in progress and will contain further examples and some troubleshooting suggestions. I can’t say for sure, but we have a topic lined up that would make a third part; there’s a lot of work ahead, though.

I’m tempted to repost large excerpts because I’m proud of the work the team did. It took a bit longer than expected to convert slideware into a real article, and the article took a lot of work. I won’t give away the secrets here… You’ll have to check out IBM developerWorks yourself. However, let me kickstart things with a history sidebar:

A brief history of virtualization

Despite its emergence as a compelling, necessary technology in the past few years, server virtualization has actually been around for quite some time. In the 1970s, IBM introduced hypervisor technology in the System z and System i® product lines. Logical partitions (LPARs) became possible on System p® in 2000. Virtual machines on System x and Intel-based x86 hardware appeared as early as 1999. In just the last few years, virtualization has become essential and inevitable in Microsoft Windows and Linux environments.

What products are supported in virtualized environments?

Very often we explain that asking whether a particular Rational product is supported with virtualization isn’t actually the right question. Yes, we’ve run on Power hardware and LPARs for several years now. Admittedly, KVM and VMware are newer to the scene. Some may recall how clock drift could really mess things up, but those problems seem to be behind us.

The question isn’t whether Rational products are supported on a particular flavor of virtualization: If we support a particular Windows OS or Linux OS, then we support that OS whether it’s physical or virtualized.

Virtualization is everywhere

Starting in 2010 at Innovate and other events, we routinely asked folks in the audience whether they were aware their organizations were using virtualization (the platform didn’t matter). In 2010 and 2011 we got a few hands, maybe two or three in a room of 20. Folks were asking us if virtualization was supported. Was it safe? Could they use it? What were our suggestions?

Two years later, in 2012, that ratio was reversed: Nearly every hand in the audience shot up. We got knowing looks from folks who had disaster stories of badly managed VMs. There were quite a few people who had figured out how to manage virtualization successfully. There were questions from folks looking for evidence and our suggestions to take back to their IT folks.

Well, finally, we have something a bit more detailed in print.

 

Field notes: Measuring capacity, users and rates


Small, Medium and Large

A frequent question concerns how we might characterize a company’s CLM deployment. Small, medium and large are great for tee-shirts, but not for enterprise software deployments. I admit we tried to use such sizing buckets at one point, but everyone said they were extra-large.

Sometimes we characterize a deployment’s size by the number of users that might interact with it. We try to distinguish between named and concurrent users.

Named users are all the folks permitted to use a product. These are registered users or users with licenses. This might be everyone in a division or all the folks on a project.

Concurrent users are the folks who actually are logged in and working at any given time. These are the folks who aren’t on vacation, but actually doing work like modifying requirements, checking in code, or toying with a report. Concurrent users are a subset of named users.

Generally we’ve seen the number of concurrent users hover around 15 to 25% of the named users. The percentage is closer to 15% in a global company whose users span time zones, and closer to 25% in a company where everyone is in one time zone.
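
Here’s a minimal sketch of that rule of thumb in Python. The 15-25% range is the observation above; the 3000-user deployment is just an example figure.

    # Rule of thumb: concurrent users run roughly 15-25% of named users,
    # closer to 15% when the user population spans many time zones.
    def estimate_concurrent(named_users, spans_time_zones=True):
        """Estimate concurrent users from a named-user count."""
        ratio = 0.15 if spans_time_zones else 0.25
        return round(named_users * ratio)

    print(estimate_concurrent(3000, spans_time_zones=True))   # ~450 concurrent users
    print(estimate_concurrent(3000, spans_time_zones=False))  # ~750 concurrent users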

As important as it is to know how many users might interact with a system, user numbers aren’t always an accurate way to measure a system’s capacity over time. “My deployment supports 3000 users” feels like a useful characterization, but it can be misleading because no two users are the same.

Because it invites simple answers, I cringe whenever someone asks me of a particular application or deployment, “How many users does it support?” I know there’s often no easy way to characterize systems, and confess I often ask countless questions and end up providing simple numbers. (I’ve tried answering “It supports one user at a time,” but that didn’t go over so well.)

Magic Numbers

“How many users does it support?” is a Magic Number Question, encouraged by Product Managers and abetted by Marketing, because it’s simple and easy to print on the side of the box. Magic Numbers are especially frequent in web-application and website development. I’ve been on death marches (sorry, mature software development projects) where the goal was simply to hit a Magic Number so we could beat a competing product’s support statement and affix gold starbursts to the packaging proclaiming “We support 500 users.” (It didn’t matter if performance sucked; it was only important to cram all those sardines into the can.)

Let’s look at the Magic Number’s flaws. No two users do the same amount of work. The “500 users” is a misleading aggregate, as one company may have 500 lazy employees while another might have caffeinated power-users running automated scripts and batch jobs. And even within the lazy company, no two users work at the same rate.

Ideally we’d use a unit of measure to describe work. A product might complete “500 Romitellis” in an hour, and we could translate end-user actions to this unit of measure. Creating a new defect might cost 1 Romitelli, executing a small query might be 3 Romitellis, and a large query could be 10 Romitellis. But even this model is flawed, as one user might enter a defect with the least amount of data, whereas another user might offer a novel-length description. The difference in data size might trigger extra server work. This method also doesn’t account for varied business logic, which might require more database activity for one project and less for another.
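
To make the idea concrete, here’s a minimal sketch of such a work-unit model in Python. The per-action costs are the ones in the example above; the action names and the sample session are made up for illustration.

    # Hypothetical work-unit ("Romitelli") costs, taken from the examples above.
    ROMITELLIS_PER_ACTION = {
        "create_defect": 1,
        "small_query": 3,
        "large_query": 10,
    }

    def workload_cost(actions):
        """Total work units for a sequence of end-user actions."""
        return sum(ROMITELLIS_PER_ACTION[action] for action in actions)

    # One user's sample session: 1 + 3 + 3 + 10 = 17 Romitellis.
    session = ["create_defect", "small_query", "small_query", "large_query"]
    print(workload_cost(session))

    # The flaw: two "create_defect" actions can cost very different amounts of
    # real server work (a bare-bones record vs. a novel-length description), so
    # a flat per-action cost is only a rough approximation.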

Rates

Just as a set number of users is interesting but insufficient, a simple counter to denote usage doesn’t helpfully describe a system’s capacity. We need a combination of the two, or a rate. Rate is a two-part measurement describing how much of an action occurred in how much time. But as essential as rates are, it’s important to realize how rates can be misunderstood.

Consider this statement:

“I drank six glasses of wine.”

On its own, that may not seem so remarkable. But you may get the wrong impression of me. Suppose I say:

“I drank six glasses of wine in one night.”

Now you might have reason to worry. My wife would definitely be upset. And suppose I say:

“I drank six glasses of wine in the month of August.”

That would give a completely different impression. This is where rate comes into play. The first rate is 6-per-day; the second is 6-per-month. The first is roughly 30 times greater than the second. One rate is more likely to lead to disease and family disharmony than the other.

Let’s move this discussion to product sizing. Consider this statement:

“The bank completes 200 transactions.”

It’s just as unhelpful as the side-of-the-box legend, “The bank supports 200 users.” For this statement to be valuable, it needs to be expressed as a rate:

“The bank completes 200 transactions in a day.”

This seems reasonable at first glance, as it suggests a certain level of capability. But now we can offer another:

“The bank completes 200 transactions in a second.”

And now the first rate comes into perspective. Realize that rates can have some degree of meaningful variance. When a system is usable and working, it rarely functions at a consistent rate. Daily peaks and valleys are normal. Some months are busier than others. We may have an average rate, but we probably also need to specify an extreme rate:

“The bank completes 200 transactions in an hour on payday just before closing.”

Or even a less-than-average rate:

“The bank completes 20 transactions in an hour on a rainy Tuesday morning in August.”

Imagine we’re testing an Internet website designed to sell a popular gift. The transaction rate in the middle of May should be different than the transaction rate in the weeks between Thanksgiving and Christmas. Therefore we should try to articulate capacity at both average transaction rate and peak transaction rate.

TPH

There’s no perfect solution for measuring user capacity. What we like to do is describe work in units of transactions-per-hour-per-user. This somewhat gets beyond differences in users by characterizing work in terms of transactions, and it averages across different types of users and data sizes to create a basic unit. Of course we could discuss all day what a transaction means. For now, take it to mean a self-contained action in a product (login, create an artifact, run a query, etc.). For much of our testing, we determined that an average rate was 15 transactions-per-hour-per-user. (I abbreviate it 15 tph/u.) A peak rate was three times as much, or 45 tph/u.

15 tph/u may not seem like very much activity. It’s approximately one transaction every four minutes. But we looked at customer logs, we looked at our own logs, and this is the number we came up with as an average transaction rate for some of our products. Imagine that within an hour you may complete 15 transactions; in practice you may complete more at the top of the hour, get distracted, and complete fewer as the hour wears on. Also, 15 is a convenient number for multiplying and relating to an hour. (Thank the Babylonians for putting 60 minutes into an hour.) Increasing the 15 tph/u rate fourfold means something happens once every minute.
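
If you want to check that arithmetic, here’s a one-function sketch in Python that converts a per-user rate into the average gap between transactions. The rates are the ones quoted above.

    def minutes_between_transactions(tph_per_user):
        """Average gap between transactions for a single user, in minutes."""
        return 60.0 / tph_per_user

    print(minutes_between_transactions(15))      # 4.0 minutes at the average rate
    print(minutes_between_transactions(15 * 4))  # 1.0 minute at four times that rate
    print(minutes_between_transactions(45))      # ~1.3 minutes at the peak rate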

Returning to the “How many users” question, it’s usually better to answer it in terms of transactions-per-hour: “We support 100 users working at 15 transactions-per-hour-per-user on average and 75 transactions-per-hour-per-user at peak periods.” Such statements can make Marketing glaze over, but they’re generally more accurate. Smart companies can look at that statement and say, “In our super-busy company, that’s only 50 users working at 30 transactions-per-hour-per-user on average,” and so forth.
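
The equivalence in that last statement is easy to verify, since what matters is the total system rate. Here’s a minimal sketch using the figures quoted above.

    def system_tph(users, tph_per_user):
        """Whole-system transaction rate, in transactions per hour."""
        return users * tph_per_user

    print(system_tph(100, 15))  # 1500 tph on average
    print(system_tph(100, 75))  # 7500 tph at peak periods
    print(system_tph(50, 30))   # 1500 tph -- the "super-busy company" equivalent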

When might you see us talking in these terms with such attention to rates? We’re working on it. We want to get it right.