Archive | February, 2012

Agile Succeeds Three Times More Often Than Waterfall

26 Feb

Agile projects are successful three times more often than non-agile projects, according to the 2011 CHAOS Manifesto from the Standish Group.

The report goes so far as to say, “The agile process is the universal remedy for software development project failure. Software applications developed through the agile process have three times the success rate of the traditional waterfall method and a much lower percentage of time and cost overruns.” (page 25)

If you know even a little about program or project management, you will sense that this is a big statement, in fact a very big one, with far-reaching consequences for the IT project management world. I know a lot of companies in management consulting and governance services that often use such research reports to build a case. I can feel the change that is coming: going forward, companies and many practitioners are going to prefer agile management techniques for building software over traditional ones.

I also know that many agile experts shy away from quoting the Standish report over its accuracy and validity, but personally I feel these reports have a market, which is why they keep getting published, and it is hard to ignore someone who has been in this business for more than 18 years.

I also feel these findings are one more positive development for the agile world after PMI started its agile programs last year. Days are going to be very busy for agile experts going forward.

I hope the IT industry will now do away with its command-and-control structures and carrot-and-stick policies, and that people will work with more accountability and transparency.


JDK Libraries

20 Feb

 

While browsing the NetBeans site, I came across the picture below, which neatly shows the various libraries that are part of the Java Development Kit, so I thought I would keep it on my blog. Having more information about the JDK sometimes helps in understanding the technologies involved and aids in steering research in the right direction.

jdk-diagram

Importance of IIS HTTP ERR logs in Performance Engineering

18 Feb

One of the many skills required to be a good performance engineer is the ability to read, interpret and analyze the logs of the backend servers and trace those events back to the load test executed during that time period. It often happens that, in spite of a perfect script and a perfect workload model, the response times of the tests do not meet the service level agreements, or fall well outside the range expected by business owners or application developers. There can be many reasons for this, but to pinpoint the exact cause and come up with conclusive data, it is essential to understand the importance of logging and of analyzing the backend server logs. So in this post I will focus on the importance of the HTTPERR logs that are part of IIS logging.

Since IIS hosts HTTP-based applications, errors originating from the HTTP API (http.sys) are logged in the HTTPERR logs. The most common errors logged here are dropped network connections, connections dropped by the application pool, Service Unavailable errors, bad client requests and so on. The best part of this log is that it tells you precisely which URL or application pool is responsible for the incorrect behavior, which is very valuable when you have tens or hundreds of sites sitting on the same IIS box or sharing the same application pool. All of this information can easily be correlated with load test results to build a strong case that the application under test has issues that need attention from the concerned team and may well have a performance impact.

HTTPERR logs come with the standard fields shown below and contain all the information required for troubleshooting any fatal issue that occurs during performance testing. One thing I would like to mention is that if your IIS server sits behind a load balancer, the IP you see in these logs might belong to the load balancer rather than to the end client; again, this depends on your infrastructure and environment design.

Date: The Date field follows the W3C format and is based on Coordinated Universal Time (UTC). It is always ten characters in the form YYYY-MM-DD. For example, May 1, 2003 is expressed as 2003-05-01.

Time: The Time field follows the W3C format and is based on UTC. It is always eight characters in the form HH:MM:SS. For example, 5:30 PM (UTC) is expressed as 17:30:00.

Client IP Address: The IP address of the affected client. The value can be either an IPv4 address or an IPv6 address. If the client IP address is an IPv6 address, the ScopeId field is also included in the address.

Client Port: The port number of the affected client.

Server IP Address: The IP address of the affected server. The value can be either an IPv4 address or an IPv6 address. If the server IP address is an IPv6 address, the ScopeId field is also included in the address.

Server Port: The port number of the affected server.

Protocol Version: The version of the protocol that is being used. If the connection has not been parsed sufficiently to determine the protocol version, a hyphen (0x002D) is used as a placeholder for the empty field. If either the major or the minor version number that is parsed is greater than or equal to 10, the version is logged as HTTP/?.

Verb: The verb carried by the last request that was parsed. Unknown verbs are included, but any verb longer than 255 bytes is truncated to that length. If a verb is not available, a hyphen (0x002D) is used as a placeholder for the empty field.

CookedURL + Query: The URL and any query associated with it are logged as one field, separated by a question mark (0x3F). This field is truncated at its length limit of 4096 bytes. If the URL has been parsed ("cooked"), it is logged with local code page conversion and treated as a Unicode field. If the URL has not been parsed ("cooked") at the time of logging, it is copied exactly, without any Unicode conversion. If the HTTP API cannot parse the URL, a hyphen (0x002D) is used as a placeholder for the empty field.

Protocol Status: The protocol status of the response to a request, if available; it cannot be greater than 999. If the protocol status is not available, a hyphen (0x002D) is used as a placeholder for the empty field.

Site ID: Not used in this version of the HTTP API. A placeholder hyphen (0x002D) always appears in this field.

Reason Phrase: A string that identifies the type of error being logged. This field is never left empty.

Queue Name: The request queue name.

Another good thing about HTTPERR logs is that they can be queried exactly the way we query IIS access logs, by using Log Parser.

Below is one example query we can use against the HTTPERR logs (the log path is a placeholder; -i:HTTPERR tells Log Parser to use its HTTPERR input format):

LogParser "SELECT TO_STRING(date,'yyyy-MM-dd'), TO_STRING(time,'hh:mm:ss'), c-ip, c-port, s-ip, s-port, cs-version, cs-method, cs-uri, sc-status, s-siteid, s-reason, s-queuename FROM '\\YOUR PATH TO HTTP ERROR LOG'" -i:HTTPERR

The HTTPERR logs are usually written to the %SystemRoot%\System32\LogFiles\HTTPERR folder on the IIS box; however, since this location is configurable, we may at times need to confirm the exact location with the server administrators. In some rare cases these logs might also have been disabled for one reason or another, in which case we would need to ask the server administrators to enable them, though I feel hardly anyone really disables them.

Once you query these logs with Log Parser, you should be able to get details like those shown below.

Httperror1

As you can see, the wealth of information lies in the s-reason codes, which help identify issues with the application and aid a lot in troubleshooting some of the hardest performance bottlenecks.
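For instance, a quick way to see which errors dominated a test window is to aggregate on s-reason. The query below is just a minimal sketch along those lines; the log path is a placeholder you would replace with your own.

LogParser "SELECT s-reason, COUNT(*) AS Hits FROM '\\YOUR PATH TO HTTP ERROR LOG' GROUP BY s-reason ORDER BY Hits DESC" -i:HTTPERR

Sorting by hit count makes it easy to spot whether, say, Timer_ConnectionIdle or QueueFull errors spiked during the load test.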

Below are the Microsoft definitions of the reason codes that appear in the HTTPERR logs:

AppOffline: A service unavailable error occurred (an HTTP error 503). The service is not available because application errors caused the application to be taken offline.

AppPoolTimer: A service unavailable error occurred (an HTTP error 503). The service is not available because the application pool process is too busy to handle the request.

AppShutdown : A service unavailable error occurred (an HTTP error 503). The service is not available because the application shut down automatically in response to administrator policy.

Bad Request : A parse error occurred while processing a request.

Client_Reset : The connection between the client and the server was closed before the request could be assigned to a worker process. The most common cause of this behavior is that the client prematurely closes its connection to the server.

Connection_Abandoned_By_AppPool : A worker process from the application pool has quit unexpectedly or orphaned a pending request by closing its handle.

Connection_Abandoned_By_ReqQueue : A worker process from the application pool has quit unexpectedly or orphaned a pending request by closing its handle. Specific to Windows Vista and Windows Server 2008.

Connection_Dropped : The connection between the client and the server was closed before the server could send its final response packet. The most common cause of this behavior is that the client prematurely closes its connection to the server.

ConnLimit : A service unavailable error occurred (an HTTP error 503). The service is not available because the site level connection limit has been reached or exceeded.

Connections_Refused : The kernel NonPagedPool memory has dropped below 20MB and http.sys has stopped receiving new connections.

Disabled : A service unavailable error occurred (an HTTP error 503). The service is not available because an administrator has taken the application offline.

EntityTooLarge : An entity exceeded the maximum size that is permitted.

FieldLength : A field length limit was exceeded.

Forbidden : A forbidden element or sequence was encountered while parsing.

Header : A parse error occurred in a header.

Hostname : A parse error occurred while processing a Hostname.

Internal : An internal server error occurred (an HTTP error 500).

Invalid_CR/LF : An illegal carriage return or line feed occurred.

N/A : A service unavailable error occurred (an HTTP error 503). The service is not available because an internal error (such as a memory allocation failure) occurred.

N/I : A not-implemented error occurred (an HTTP error 501), or a service unavailable error occurred (an HTTP error 503) because of an unknown transfer encoding.

Number : A parse error occurred while processing a number.

Precondition : A required precondition was missing.

QueueFull : A service unavailable error occurred (an HTTP error 503). The service is not available because the application request queue is full.

RequestLength : A request length limit was exceeded.

Timer_AppPool: The connection expired because a request waited too long in an application pool queue for a server application to dequeue and process it. This timeout duration is ConnectionTimeout. By default, this value is set to two minutes.

Timer_ConnectionIdle: The connection expired and remains idle. The default ConnectionTimeout duration is two minutes.

Timer_EntityBody: The connection expired before the request entity body arrived. When it is clear that a request has an entity body, the HTTP API turns on the Timer_EntityBody timer. Initially, the limit of this timer is set to the ConnectionTimeout value (typically two minutes). Each time another data indication is received on this request, the HTTP API resets the timer to give the connection two more minutes (or whatever is specified in ConnectionTimeout).

Timer_HeaderWait: The connection expired because the header parsing for a request took more time than the default limit of two minutes.

Timer_MinBytesPerSecond: The connection expired because the client was not receiving a response at a reasonable speed. The response send rate was slower than the default of 240 bytes/sec. This can be controlled with the MinFileBytesPerSec metabase property.

Timer_ReqQueue: The connection expired because a request waited too long in an application pool queue for a server application to dequeue. This timeout duration is ConnectionTimeout. By default, this value is set to two minutes. Specific to Windows Vista and Windows Server 2008.

Timer_Response : Reserved. Not currently used.

URL : A parse error occurred while processing a URL.

URL_Length : A URL exceeded the maximum permitted size.

Verb : A parse error occurred while processing a verb.

Version_N/S : A version-not-supported error occurred (an HTTP error 505).

So the next time you are load testing a .NET web application hosted on an IIS web server, make sure you ask for access to the IIS HTTPERR logs in addition to the IIS access logs. It will help you identify issues in a fraction of the time taken by the other commonly used methods.

IE 9 Browser Network Capture Tool – Not a Reliable Tool to Measure Latency

14 Feb

Recently one of the business executives from Catchpoint posted a link on the Performance Specialist list on LinkedIn about a potential issue with the IE browser network capture tool, where the tool was showing a high response time for a resource that was in fact being served in a matter of milliseconds. I found this case very interesting, given that I have seen many performance engineers and users relying on these tools to identify performance issues. Catchpoint had made its point very clear: there are cases where the browser network capture tools do not show the correct network capture time, and the times they show are influenced by other elements of the page.

Since they had also provided a test case showing how to reproduce the issue, I thought I would try to reproduce it myself, so I pulled out their script and loaded it in Dreamweaver. I was curious to understand which factors of the page were responsible for this abnormal behavior.

I made some changes to the script and pointed the iframe source to a test PHP page hosted on my local Apache server. The point of this change was to ensure that no actual network issues could affect the test case. In addition, I changed the loop count in order to check at what level of DOM manipulation the issue surfaces and starts to have a major impact on the browsers' network capture feature.

addDomLoadEvent(function() {
    var title = document.getElementById('title');
    var span;
    // Create the span elements on the fly; the loop limit was varied across test runs
    for (var i = 0; i <= 1000; i++) {
        span = document.createElement('span');
        span.innerHTML = i;
        document.getElementById('ctn').appendChild(span);
    }
});

The above piece of code is taken from Catchpoint's test case; the only change I made was to the loop value, so that it creates 10, 20, 100, 1000, 5000 and 10000 span elements on the fly, aligned inline with no styles applied. After making the changes, I took the network capture with the browser developer tools for IE 9, Firefox 9, Chrome 15 and Safari 5.1.

For creating 10 span elements, the response time reported by the IE capture tool is 32 ms.

IE 10

For creating 20 span elements, the response time reported by the IE capture tool is 31 ms.

IE 20

For creating 100 span elements, the response time reported by the IE capture tool is 46 ms.

IE 100

For creating 1000 span elements, the response time reported by the IE capture tool is 250 ms.

IE 1000

For creating 5000 span elements, the response time reported by the IE capture tool is 820 ms.

IE 5000

For creating 10000 span elements, the response time reported by the IE capture tool is 1.66 sec.

IE 10000

As you can see, the more span elements we create, the higher the iframe's reported response time. Since the test case was executed in a local environment with practically nothing to download over the network, we would have expected the iframe's response time to be the same in all cases. Unfortunately it was not. So I think it is fair to conclude that the IE 9 network capture tool cannot be used to measure response time, at least for iframe elements, and possibly for any element that generates network requests on heavily DOM-manipulated pages.
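One simple way to confirm that the extra time is DOM and rendering cost rather than network time is to time the span-creation loop directly. The snippet below is my own sketch, not part of Catchpoint's test case; it assumes the same 'ctn' container element used above.

var start = new Date().getTime();
var ctn = document.getElementById('ctn');
// Same DOM work as the test case, timed in script rather than in the network panel
for (var i = 0; i <= 10000; i++) {
    var span = document.createElement('span');
    span.innerHTML = i;
    ctn.appendChild(span);
}
alert('DOM work took ' + (new Date().getTime() - start) + ' ms');

If this number grows with the element count while the iframe itself is served locally, the inflated figures in the network panel are clearly coming from the page, not the wire.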

I would also like to add that this test case is not about writing perfect JavaScript, but about understanding whether the browser's capture tool is in fact independent of its rendering and parsing architecture. With this test case we can conclude that, at least in IE 9, the network capture tool does depend on the rendering and parsing architecture, which at times can give users an incorrect picture of network latency.

I hope someone from Microsoft takes note of this behavior and fixes it.

There were also several other interesting findings that I came across; maybe I will write about them in another post.

Finally, thanks to Catchpoint for highlighting this issue.


Siteminder and Some Performance Issues

11 Feb

Some time back I wrote about some interesting issues encountered while load testing an application integrated with CA SiteMinder. Load testing an application integrated with SiteMinder has its own challenges and limitations with regard to environment and scope of testing. I have observed that most projects do not test the SiteMinder integration as an independent component; rather, they club it with the applications and test it with an end-to-end approach, either because they believe it saves time or because they think it is of little importance. I would call that an incorrect, or at least incomplete, approach: if you have SiteMinder integrated with your applications, with tens of applications and thousands of users, then performance testing the SiteMinder integration individually as a separate project will probably give you a lot of mileage in the long run. SiteMinder is a complex product and deserves individual performance testing, given that it handles one of the most critical functions of the business, namely access management.

The purpose of this post is to highlight one of the interesting bottlenecks we encountered, investigated and fixed during SiteMinder load testing for one of our projects. While load testing one of the applications we were constantly getting 401 errors, and their nature was such that the request kept redirecting itself N number of times, in some cases going into infinite redirection loops. Reproducing this kind of error via the browser is harder, as the browser often hides the redirection events and you just see the progress bar in the status bar. With an HTTP debugging tool you can clearly see the redirection calls happening behind the scenes, but I believe browsers do not really follow redirects N number of times: in browsers I have seen a maximum redirection depth of around 40 to 50 per tab, after which the page simply stays still, presumably because the browser stops following further redirects. The redirection events looked like this in the Fiddler trace.

SM_Issue_001

In the above image we can clearly see that the SiteMinder agent installed on the server is redirecting the request URL N number of times. During load testing the issue looked like this:

SM_Issue_002

We had a number of calls with the CA team, and after we gave them this precise description of the issue, along with our vuser logs and the Fiddler traces we were seeing during our tests, they came back with the finding and fix below:

One out of four policy servers had a bad value for the encryption key; this resulted in requests sent to the bad Policy Server failing with "invalid key in use" – looping observed in the Fiddler traces for requests failing to authorize.

Once we fixed the encryption key in the sm.registry file, all requests processed as expected – no more looping (re-authentication process).

I am reproducing the text given by the CA R&D team so that anyone facing this issue can try this fix as well. However, please ensure that you see the same behavior we saw, that you do not have any issues with the SM cookie collectors and that they are doing their job fine, and that your users do not have access issues.

Know your user-Agent String

3 Feb

Sometimes during performance testing it becomes necessary for our script to send the correct user-agent string to the server, so that the server identifies the script and responds to it correctly. Most load testing tools available in the market today provide a way to customize the user-agent string. So in case you want to know which user-agent string to use, the JavaScript one-liner below can get you this information. All you need to do is copy and paste it into your browser's address bar and press Enter; a modal dialog will display the user-agent string of that browser.

javascript:alert(navigator.userAgent);

In fact, this string gives you a lot of information, such as the versions of the frameworks involved, the type of platform, the OS version and so on. Most browsers today send at least two to three lines' worth of information as part of the user-agent string, and they send it voluntarily with every request they make.
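As a small illustration, the same bookmarklet approach can pull out a few of these details individually using other standard navigator properties (just a sketch; the exact values vary by browser):

javascript:alert('User agent: ' + navigator.userAgent + '\nPlatform: ' + navigator.platform + '\nApp version: ' + navigator.appVersion);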

Sometimes I really wonder whether it is necessary, or even right, for browsers to attach all this information to every request they make. Isn't there a better way to track browser usage and OS penetration than choking the network with extra characters that very few people care about?


TypePerf – A Good Substitute for PerfMon

1 Feb

TypePerf is a command line tool that writes performance data to a log file or to the console window. I feel this tool can be useful to performance engineers who want to monitor a few stats, and to those who want to keep some distance from PerfMon. It queries the same performance objects as PerfMon does, so the data it retrieves is exactly the same. I find it easier to use than PerfMon in situations where we need only a very few counters to monitor, and monitoring can be automated with it exactly the way we do with PerfMon. To start using the tool, open a command prompt and type typeperf.

C:\>typeperf /?

Microsoft (R) TypePerf.exe (6.1.7600.16385)

Typeperf writes performance data to the command window or to a log file. To stop Typeperf, press CTRL+C.

Usage:
typeperf { <counter [counter ...]> | -cf <filename> | -q [object] | -qx [object] } [options]

Parameters:
<counter [counter ...]>   Performance counters to monitor.

Options:
-?                        Displays context sensitive help.
-f <CSV|TSV|BIN|SQL>      Output file format. Default is CSV.
-cf <filename>            File containing performance counters to monitor, one per line.
-si <[[hh:]mm:]ss>        Time between samples. Default is 1 second.
-o <filename>             Path of output file or SQL database. Default is STDOUT.
-q [object]               List installed counters (no instances). To list counters for one object, include the object name, such as Processor.
-qx [object]              List installed counters with instances. To list counters for one object, include the object name, such as Processor.
-sc <samples>             Number of samples to collect. Default is to sample until CTRL+C.
-config <filename>        Settings file containing command options.
-s <computer_name>        Server to monitor if no server is specified in the counter path.
-y                        Answer yes to all questions without prompting.

Note:
Counter is the full name of a performance counter in "\\<Computer>\<Object>(<Instance>)\<Counter>" format, such as "\\Server1\Processor(0)\% User Time".

Below are some sample queries I ran successfully to retrieve performance stats on my Windows 7 box:

C:\>typeperf "\Processor(_Total)\% Processor Time"

C:\>typeperf "\Memory\Available bytes" "\processor(_total)\% processor time"

C:\>typeperf "\Processor(*)\% Processor Time"
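Putting a few of the options above together, a typical collection run during a load test might look like the sketch below: sample every 5 seconds, collect 120 samples and write the output to a CSV file (the counters chosen and the file name are only illustrative).

C:\>typeperf "\Processor(_Total)\% Processor Time" "\Memory\Available Bytes" -si 5 -sc 120 -f CSV -o perfstats.csv -y

The resulting CSV can then be lined up against the load test timeline or imported into your analysis tool of choice.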

Finally, if you need more information about this command, I suggest visiting the Microsoft site and doing a quick search for typeperf.

http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/nt_command_typeperf.mspx?mfr=true

I believe this tool is supported by all versions of the Windows operating system from Windows XP through Windows 7.
