How often do you reboot your servers?

17 Apr

This post is all about server reboots: the benefits of periodic reboots and how they improve the overall performance of a system.

Let’s start with some background. Below are some questions and gut feelings that you, as a performance engineer, have probably run into, consciously or subconsciously, at some point in your career:

1. Did you ever have a gut feeling (the kind you get after years of load testing applications) that the application performs fine, yet the test results look bad?

2. Did you ever feel that you are investigating issues for which no sensible root cause exists? Quite often HP Diagnostics shows us the method call or box that is failing, but when we query the logs, the failure rates for those URLs are very low or negligible.

3. Do you spend a hell of a lot of time on bugs and on chasing application owners to fix performance issues, only for the development team to come back and say “It works for me in less than a second in the Dev environment with the same set of data” or “It’s already working in production and response times are in milliseconds”?

4. Does the application response time vary wildly across days with the same test setup, in spite of a powerful environment that matches Production in almost every sense?

5. Did you ever feel that the environment is just too big and complicated to monitor, with 60+ known servers or boxes, Y unknown servers, and tens of tickets implemented every day across the set of applications? (Some of these are so-called hot deployments, which do not require reboots: you just undeploy the application WAR file and redeploy the updated version on the server. Yet if the code is badly written, this can leave behind orphaned JDBC connection threads or HTTP connector threads.) Every day there is some application for which a DLL is missing or has the wrong version, or which needs some kind of fix deployed.

6. Did you ever feel that a lot of software is installed and uninstalled in the environment every day, and that this software is quite often profilers, debugger tools, framework patches, security patches, and so on?
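The day-to-day variance from question 4 can be put into a number instead of being left as a feeling. Here is a minimal Python sketch, assuming you already have the mean response time of the same test for each day. The names `coefficient_of_variation` and `flag_unstable`, the 0.25 tolerance, and the sample figures are all made up for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Return stdev / mean for a list of response times in the same unit."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    avg = mean(samples)
    if avg == 0:
        raise ValueError("mean response time is zero")
    return stdev(samples) / avg


def flag_unstable(daily_means, max_cv=0.25):
    """True when day-to-day variation exceeds the (assumed) tolerance."""
    return coefficient_of_variation(daily_means) > max_cv


# Hypothetical mean response times (seconds) for the same test on five days.
stable = [1.10, 1.05, 1.12, 1.08, 1.11]
erratic = [1.10, 2.90, 0.95, 4.20, 1.30]

print(flag_unstable(stable))
print(flag_unstable(erratic))
```

A high value on an unchanged build and test setup points at the environment rather than the application, which is the kind of evidence that is useful when talking to application owners.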

If your answer to all of these questions is yes, then you are probably testing in an environment where the servers have been up and running for more than 3 weeks or a month, or which has a large set of boxes that have not been rebooted for the last 6 months or so, or which is simply managed badly, with no health checks being done at all.
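A quick way to see whether an environment fits this description is to list the boxes whose uptime is past a threshold. The sketch below is only an illustration: how the uptime figures are collected is left out, and the function name `overdue_for_reboot`, the host names, and the 21-day default (taken from the "3 weeks" above) are assumptions.

```python
def overdue_for_reboot(uptime_days_by_host, max_days=21):
    """Return host names whose uptime exceeds max_days, longest uptime first."""
    overdue = [
        (days, host)
        for host, days in uptime_days_by_host.items()
        if days > max_days
    ]
    return [host for days, host in sorted(overdue, reverse=True)]


# Hypothetical inventory: host name -> days since last reboot.
inventory = {"app01": 3, "app02": 45, "db01": 190, "web01": 21}

print(overdue_for_reboot(inventory))
```

With the sample inventory, `db01` and `app02` are reported, and `web01` is not because it sits exactly on the threshold.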

It is a good thing that boxes stay up and running for a long time; it tells us that these servers can give us 99.99% service availability. In fact, most server manufacturers sell their servers with such SLAs, and I feel that is precisely why there is no official policy or guideline on server reboots. Below is a comment from a person I follow, who I believe was associated with Sun Microsystems at some point:

There is no official policy at Sun regarding rebooting – mainly because we sell enterprise class machines. The intent of an enterprise class machine is to stay up at all cost. Our enterprise class servers have the ability to add and remove memory, IO, and CPU without a reboot. I know it is common practice to reboot “windows” based machines on a regular schedule, but this simply does NOT apply to Enterprise class Sun servers.
For sake of discussion, I will post what I think our server reboot policy should entail.
Purposed Sun Server Reboot Policy:
“Only reboot Sun servers when installing SW or HW that requires a reboot. It is not necessary to reboot servers on a regular schedule like Windows servers.”

Looking back at his comment, he is saying to reboot only when SW/HW that requires a reboot is installed. That is a fair comment, given that they sell servers that need to stay up and running at all times under all circumstances. In fact, I have known some Solaris boxes that were up and running for quite a long time, maybe a month as far as I can recollect. But does this mean that Solaris boxes do not need a reboot for a month, even while we keep installing software that doesn’t require reboots?

I would say no. We still need periodic reboots for Solaris, assuming that we are doing a lot of testing on those boxes. There is always one issue or another: something in the TCP stack that leaves connection threads hanging (thread abort issues); a memory leak caused by bad code, where the memory is not released any time soon (yes, sometimes we detect memory leaks and sometimes they are left alone, given that memory is cheap now or that we are just too busy or lazy); or some piece of software that, when uninstalled, leaves some settings messed up.
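Hanging connections of this kind can often be spotted before they hurt a test run. As a rough sketch, the Python below counts TCP connections per state from text in the style of `netstat -an` output on Linux or Windows. Solaris prints a different layout, so the parsing would need adjusting there. The function name `tcp_state_counts` and the sample output are made up for illustration.

```python
from collections import Counter


def tcp_state_counts(netstat_text):
    """Count TCP connections per state in `netstat -an` style text.

    Assumes each TCP line starts with "tcp"/"TCP" and ends with the state.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        state = fields[-1]
        # Skip lines whose last column is not a state name such as CLOSE_WAIT.
        if state.replace("_", "").isalpha():
            counts[state] += 1
    return counts


# Hypothetical sample in the Linux `netstat -an` layout.
SAMPLE = """\
tcp        0      0 10.0.0.5:8080    10.0.0.9:51234   ESTABLISHED
tcp        0      0 10.0.0.5:8080    10.0.0.9:51240   CLOSE_WAIT
tcp        1      0 10.0.0.5:8080    10.0.0.7:40022   CLOSE_WAIT
tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN
udp        0      0 0.0.0.0:68       0.0.0.0:*
"""

for state, count in tcp_state_counts(SAMPLE).most_common():
    print(state, count)
```

A CLOSE_WAIT count that keeps growing between test runs usually means the local application is not closing its sockets, which is worth fixing in the code rather than only clearing with a reboot.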

Moving on, let’s talk about Windows boxes. Windows servers are among the best in terms of usability and UI; everything is easy to find and fix with the sexy GUI. However, I have often heard admins say that the more Windows servers we have in the environment, the more strictly we need to adhere to periodic maintenance of those servers. One reason why installing or uninstalling software so often requires a reboot is the risk that some shared DLL or registry setting is left behind, which might slow the server down drastically (e.g. profiler traces). It is my own observation that installing or uninstalling software on Windows boxes often leaves orphaned traces in the system, and their impact can be major or minor depending on the software involved. One can see these traces clearly by installing CCleaner, which can clean orphaned keys out of the registry.

If you install software on a Windows box and that software uses any of the Windows Win32 DLLs, it might not work as desired without a reboot, because a shared DLL does not accept registration while some process is holding a lock on it. In most cases only a reboot resolves those issues. (You can confirm this with Process Monitor.)

I would say that server reboots need to be done at periodic intervals, depending on the amount of load your systems handle. It is a good practice at all times, but I suggest you don’t do it just to hide problems or use it as a band-aid. It does not hurt to have some downtime and take proper care of the servers.

So the next time someone selling enterprise software or servers tells you that their server or software installation/uninstallation does not require reboots, you probably need to dig deeper and ask for solid technical proof to back up the statement.

As a performance engineer, I will always suggest that periodic reboots of the servers in the environment are a good thing to do.

PS: A server reboot does not mean that you simply shut down and restart the box. It means following the right shutdown process and making sure that the box is ready to be shut down.
