Performance,Dirty Reads ,Lost Updates and Locks

25 May

Recently I came across a very interesting issue, which I feel I should write about it. During load testing of web based application, I came across an issue where in response time for couple of the business process was exceeding a lot compare to SLA’s given to us. Test was almost clean with less than 0.1% error with no failed transactions in it. Server resources were also looking good, no abnormal behavior of JVM/CPU/Memory etc. Still the transaction response time was very high for the couple of business processes.

Let’s call those business processes for the sake of this post as A, B, C. Business process A does adding (Insert) a new record to the database, Business B does editing the record in the same table (We call edit as Update in DB Language) and Process C does downloading the latest information which includes both Process A and Process B. We ran these 3 business processes for about an hour with 1 user each. All the data sets used in load testing for these 3 business processes were unique. These business processes were fairly simple to script and web navigation was also fairly easy. Our job was to test these business processes with 1 user each so as to achieve close to 100 transactions for each process in an hour. Pretty easy task and we ran lot many tests with these processes and in all these tests response time was almost same with very little variance. The results were like puzzle. Thankfully we had HP Dia installed in that environment and we had some instrumentation done for this application. So I thought let’s see what application threads are doing, while I am running the test for this application.HP Diagnostics has a very good feature of showing you live thread count along with thread trace as what each threads are doing at that point of time. So I took the thread dump of the all threads running in this JVM. Believe me, taking thread dump with HP Diagnostic is as easy as clicking the link.

Vow, for the first 10 mins, most of the thread were runnable state, and as slowly users ramped up, then few of the threads went into the waiting state and then few were oscillating between runnable and waiting state. Another 10 mins in the test run, I could see lot many threads in waiting state. Now after seeing threads in waiting state I can understood as why I was seeing a very high response time for the most simple business process which involves nothing but simple insert and update operation. Response time was high because threads were waiting for something and that something was nothing but DB Execute update calls.

Now next puzzle for me was why on earth these threads are waiting at DB Execute calls and that too for cool 10 secs and with so less load (I had close to 17 scripts with one user each per script). Now given the limited access we had on database, and after discussing my findings with application development team, we decided to engage DBA to assist us in finding out the root cause of this wait. At this point I must appreciate the honestly of the application developer who agreed to this idea of engaging the DBA. There are very rare cases where developers agree with performance engineers and that’s reason I must thank this guy.

So we engaged the DBA and ran couple of tests and he collected the database stats during the test. He did some analysis and identified couple of stored procedure and dynamic queries which was in need of tuning. In addition to this he came back and said that response time for these 3 business process were high because, row level locking was getting escalated to table level lock after certain duration of time and again after certain time, some transactions were getting dirty read and still after couple of minutes, table gets locked and does not allow any insert or update. Fair enough analysis.

Hmm I was happy that finally we know as why we were seeing high response time. Our next step was discussing this with key stakeholders about our findings. Lot many things were discussed in the meeting, but the key question which I liked in an hour’s meeting was that of DBA asking “Can this application tolerate the dirty reads “. There was long pause for some time in the meeting and finally after some time, this developer guy said this application cannot tolerate the dirty reads. Hmm there was pause again for about 30 secs, after another 20 secs, another voice said, these 3 transactions are executed less than 100 times in an hour and there exists absolutely very little chance that we see this concurrency in Production. There was again some silence and later on everyone moved on to next topic.

Now the reason as why I feel this is interesting case and deserves some writing on this topic is that we have dirty reads/lost updates/phantom reads that can be found out only by proper load testing and there exists a very high probability that if we ignore this we are sending out incorrect information to the application users in production under load and this incorrect information at times could be used by those users to sue the company back. These are typical cases where data integrity issues takes precedence to performance.So if you cannot redesign the queries, than I would suggest you to sacrifice the performance.These types of issues should become high priority fix at least in cases where we are dealing with financial applications. I also feel that at times these are critical bugs given that we have table level lock and this lock impacts all the users if one user locks the table for any reason.

Maybe later on I will write another short post on dirty read/lost updates. These are interesting cases where only proper load testing reproduces the issue and strong foundational skills of performance engineering helps to identify the root cause of the issue. .

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: