reCaptcha, Simple Captcha, JavaScript Captchas and Testing/Performance Engineering

10 May

I am sure that most of us who browse the internet have at some point come across captcha images and validation. Captcha validation is used with HTML forms on web sites to prevent automated spam. I recently implemented a feature that uses Google reCaptcha for this purpose, built on the Google reCaptcha Java/JSP plugin. While implementing it I made some observations that I feel can help the performance engineering community, since I believe there are knowledge gaps in the way performance engineers understand captchas. I say this because I have come across people who misunderstand captchas and ask me how to capture or correlate them, and at times I have seen people claim to have defeated captchas during their load testing assignments, of course without disabling the captcha validation.

Google reCaptcha is essentially a web service call to the Google servers, and it uses some interesting techniques to validate the user's input. With Google reCaptcha there are two things we need to take care of: first, displaying the captcha in the front end, which can be done using the code below; and second, verifying the input given by the user on the server side by passing the required parameters.

 

<script type="text/javascript"
     src="http://www.google.com/recaptcha/api/challenge?k=yourpublickey">
  </script>
  <noscript>
     <iframe src="http://www.google.com/recaptcha/api/noscript?k=yourpublickey"
         height="300" width="500" frameborder="0"></iframe><br>
     <textarea name="recaptcha_challenge_field" rows="3" cols="40">
     </textarea>
     <input type="hidden" name="recaptcha_response_field"
         value="manual_challenge">
  </noscript>

The above code displays the captcha image in your web page and builds a container holding the captcha details, along with a text field for typing the words shown in the image. Place the code inside the HTML form, at the point where the captcha should appear, since the widget is rendered where the script tag sits. The reCaptcha should look like below,

 

image

The above captcha consists of two words. One of them is the verification word (9th) and the other is scanned text from a book, which might be some ancient book, a famous book, a flop, or some super technical publication that only Google folks read. This scanned text is distorted to ensure that no OCR can read it correctly, and only once it is confirmed that the text has failed the OCR test are such images fed into reCaptcha's image database.

Now, in order to fetch these captcha images in the front end, we see a minimum of 8 calls:

image

 

One of these 8 calls carries the challenge field (recaptcha_challenge_field), which corresponds to the verification word seen in the front end. All we know about it is that it is a long unique string generated on the server side to uniquely identify the image and its answer.

So if you believe that correlating or capturing this long string means you have solved the captcha puzzle, then I would say you are terribly wrong. Read on to know why.

Now, coming to the server side of validation, let's use servlets to understand this process.

The first thing to do is to import the classes below in your validating servlet:

import net.tanesha.recaptcha.ReCaptchaImpl;
import net.tanesha.recaptcha.ReCaptchaResponse;

These two classes provide a method that tells us whether the captcha validation succeeded or failed, and a method that returns the error message encountered during validation.

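String remoteAddr = request.getRemoteAddr();
ReCaptchaImpl reCaptcha = new ReCaptchaImpl();
reCaptcha.setPrivateKey("yourprivatekey");
String challenge = request.getParameter("recaptcha_challenge_field");
String uresponse = request.getParameter("recaptcha_response_field");
// Sends the values to the reCaptcha servers and returns the verdict.
ReCaptchaResponse reCaptchaResponse = reCaptcha.checkAnswer(remoteAddr, challenge, uresponse);
// Debug output to confirm what arrived from the front end.
System.out.println("challenge is '" + challenge + "'");
System.out.println("uresponse is '" + uresponse + "'");
System.out.println("remoteAddr is '" + remoteAddr + "'");
// The verdict itself: reject the submission when validation fails.
if (!reCaptchaResponse.isValid()) {
    System.out.println("captcha failed: " + reCaptchaResponse.getErrorMessage());
}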

The above piece of code can be written in the doPost or doGet method of the servlet, depending on the method used by the form. I have added a couple of print statements to confirm that we are indeed getting the correct values back from the front end.

If you look closely at the above code, you can see that we need at least 4 parameters, some of which are collected from the user's request:

  • Remote IP Address of the client
  • Private Key of the site
  • User Response given by the user after seeing the captcha image at the front end
  • Challenge field (the same long string that was delivered to the front end)

Only after these 4 parameters are passed correctly to the Google reCaptcha servers, in the format they expect, and the user's response matches, can we say that the user is verified as a human and not an automated bot.
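To make the role of the four values concrete, here is a rough sketch of what a verification call might carry on the wire. The class and method names are made up for illustration, and the parameter names (privatekey, remoteip, challenge, response) and the plain-text reply format are assumptions based on how the original reCaptcha verify API is commonly described, so check them against the documentation of the version you use. The plugin's checkAnswer() does this work for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not part of the reCaptcha plugin.
final class VerifyRequestSketch {

    // Builds a URL-encoded POST body from the four values.
    // Parameter names are assumptions, see the paragraph above.
    static String buildBody(String privateKey, String remoteIp,
                            String challenge, String response) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("privatekey", privateKey);
        params.put("remoteip", remoteIp);
        params.put("challenge", challenge);
        params.put("response", response);

        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return body.toString();
    }

    // URL-encodes a value, treating null as an empty string.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value == null ? "" : value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    // Assumed reply format: "true" or "false" on the first line.
    static boolean isHuman(String replyBody) {
        if (replyBody == null) {
            return false;
        }
        String firstLine = replyBody.split("\\r?\\n", 2)[0].trim();
        return "true".equals(firstLine);
    }

    public static void main(String[] args) {
        System.out.println(buildBody("yourprivatekey", "192.0.2.10",
                "long-challenge-string", "two words"));
    }
}
```

Notice that the challenge string is only one of the four values. A load test script that captures and replays it still has nothing valid to put in the response field, which is why correlating the challenge alone does not solve the captcha.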

Now, what does all this mean?

  • If the captcha is a third-party service, then we probably need unique IP addresses in case we decide to test it.
  • We have the option of testing the captcha as an isolated component, given that most captcha solutions are server-side programs.
  • From the front end, it is not possible to read the text embedded in the image, since most mature captcha libraries on the market serve only images, and those images are distorted so that OCR software cannot read them. So testing a captcha from the front end is not a technically feasible approach.

So if someone tells you that they have tested an application with captcha functionality in it, you probably need to ask how they read the text embedded in the image.

Moving on, is it right to exclude captchas from the testing effort? I would say no, and I have certain reasons for this:
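One way to treat the captcha as an isolated component is to hide the verification behind a small interface, so that a load test can swap in a stand-in with a controlled delay. The sketch below is only an illustration under that assumption: CaptchaVerifier, StubVerifier, measure and percentile are hypothetical names, not part of any captcha library, and the stub's fixed sleep is a placeholder for real service latency.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CaptchaIsolationSketch {

    // Hypothetical seam: the servlet would depend on this interface
    // rather than on the captcha library directly.
    interface CaptchaVerifier {
        boolean verify(String remoteAddr, String challenge, String response);
    }

    // Test double standing in for the remote captcha service.
    static final class StubVerifier implements CaptchaVerifier {
        private final long delayMillis;

        StubVerifier(long delayMillis) {
            this.delayMillis = delayMillis;
        }

        @Override
        public boolean verify(String remoteAddr, String challenge, String response) {
            try {
                Thread.sleep(delayMillis); // simulated service latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            return "expected".equals(response);
        }
    }

    // Runs `calls` verifications on `threads` workers and returns
    // the per-call latencies in milliseconds, sorted ascending.
    static List<Long> measure(final CaptchaVerifier verifier, int threads, int calls) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pendingResults = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                final int n = i;
                Callable<Long> task = () -> {
                    long start = System.nanoTime();
                    // 192.0.2.0/24 is a documentation range; the address varies per call.
                    verifier.verify("192.0.2." + (n % 250 + 1), "challenge-" + n, "expected");
                    return (System.nanoTime() - start) / 1_000_000L;
                };
                pendingResults.add(pool.submit(task));
            }
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : pendingResults) {
                latencies.add(f.get());
            }
            Collections.sort(latencies);
            return latencies;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("measurement failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Nearest-rank percentile over an already sorted, non-empty list.
    static long percentile(List<Long> sorted, int p) {
        int index = (int) Math.ceil(p * sorted.size() / 100.0) - 1;
        return sorted.get(Math.max(0, Math.min(index, sorted.size() - 1)));
    }

    public static void main(String[] args) {
        List<Long> latencies = measure(new StubVerifier(5), 4, 40);
        System.out.println("p50=" + percentile(latencies, 50)
                + "ms p90=" + percentile(latencies, 90) + "ms");
    }
}
```

With a seam like this, the rest of the application can be load tested against the stub, while the real verifier is exercised separately and at a volume the third-party service is likely to tolerate.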

  • Captchas are blocking functionality, which to me means that a failed captcha can block other working functionality. If the captcha goes down for any reason, the user cannot submit the form or complete any transaction on the page that benefits the site. So it is a loss to the site.
  • Captchas often have a blocking UI. I have seen this at least in my environment with reCaptcha, where certain portions of the page remain blank until the captcha has loaded.
  • Since captchas are server-side programs, they suffer from the same performance issues seen in most server programs, such as server unavailable, client abort exceptions, 503 errors, etc.
  • If we are using a custom, purely client-side captcha solution, such as one generated with JavaScript, then there is a risk that some day a smart user will simply disable JavaScript at his end and then do multiple submits to the site. Yes, it is possible to bypass client-side captcha solutions that rely on JavaScript with no server-side verification. So if we are using this type of solution, then obviously end-to-end testing is required.

Captchas are an interesting technique and they do help to reduce spam, but keeping them outside the scope of testing is not the right thing to do.
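The last point is easier to see with a small sketch of what server-side verification adds. Assume a home-grown captcha where the server issues a challenge id and keeps the expected answer to itself; the class and its methods are hypothetical and only illustrate the idea, not any particular library. Because the answer never leaves the server and each id can be used only once, disabling JavaScript or replaying an old submission gets the user nowhere.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of a server-side, single-use captcha check.
final class ServerSideCaptchaSketch {

    // Challenge id -> expected answer, kept only on the server.
    private final Map<String, String> pending = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Issues a new challenge id. Only the id is sent to the browser.
    String issue(String expectedAnswer) {
        String id = Long.toHexString(random.nextLong())
                + Long.toHexString(random.nextLong());
        pending.put(id, expectedAnswer);
        return id;
    }

    // Checks the answer. The entry is removed on every attempt,
    // so an id cannot be replayed whether or not the answer matched.
    boolean verify(String id, String userAnswer) {
        if (id == null || userAnswer == null) {
            return false;
        }
        String expected = pending.remove(id);
        return expected != null && expected.equalsIgnoreCase(userAnswer.trim());
    }

    public static void main(String[] args) {
        ServerSideCaptchaSketch captcha = new ServerSideCaptchaSketch();
        String id = captcha.issue("k7pq");
        System.out.println("first attempt:  " + captcha.verify(id, "k7pq"));
        System.out.println("replay attempt: " + captcha.verify(id, "k7pq"));
    }
}
```

A real implementation would also expire unanswered challenges and tie them to the user's session; those parts are left out here to keep the sketch short.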
