Hoodwinking CAPTCHAs with a Web Scraper

If you’ve ever worked with web scrapers, you’ll know that the most irritating thing to see is a CAPTCHA. It is put in place to prevent exactly what we’re building, which is pretty infuriating! So I set out to build a system that could, quite simply, beat the CAPTCHA.

CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart

They are used to tell computers and humans apart by posing a challenge that, in principle, only humans can solve. You have probably seen the CAPTCHA symbol around the internet. The challenge might look like one of the following.

A reCAPTCHA Challenge
An example from the FedEx website
An example from the Indian motor vehicle registry “Vahan”

Step 1: Preparing the Scraper

The first thing to do is import all the required modules in your Python file. We’re using Selenium for scraping, Requests to talk to the OCR API, and PIL to work with images.
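A minimal sketch of the setup might look like the following. The helper name `open_page` and the choice of Chrome are assumptions for illustration; any browser Selenium supports would do. Selenium is imported inside the function, and the later steps pull in `requests` and `PIL` the same way, only where they are needed.

```python
def open_page(url, driver=None):
    """Load `url` in a browser and return the driver that was used.

    `driver` can be any Selenium WebDriver. When it is omitted, a Chrome
    driver is created (an assumption, not a requirement).
    """
    if driver is None:
        # Imported lazily so this helper can be loaded without Selenium.
        from selenium import webdriver
        driver = webdriver.Chrome()
    driver.get(url)
    return driver
```

Calling `driver = open_page(url)` with the address of the form you want to scrape opens the page and hands back the driver for the next steps.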

Step 2: Hunting for the CAPTCHA

Now that we can see the webpage, it’s time to hunt for the CAPTCHA. We won’t be able to extract the text directly from the page source, so we take a different approach: screenshot the whole page and crop out the CAPTCHA.

Webpage screenshot
Cropped CAPTCHA
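The screenshot-and-crop idea can be sketched roughly as below. The function names, file paths, and the `scale` parameter are assumptions for illustration; `scale` is there because on high-DPI displays the screenshot can be larger than the page coordinates Selenium reports.

```python
def crop_box(location, size, scale=1.0):
    """Turn an element's location and size into a PIL crop box.

    `location` and `size` are the dicts Selenium exposes on a WebElement
    ({"x", "y"} and {"width", "height"}). Returns (left, top, right, bottom).
    """
    left = int(location["x"] * scale)
    top = int(location["y"] * scale)
    right = int((location["x"] + size["width"]) * scale)
    bottom = int((location["y"] + size["height"]) * scale)
    return (left, top, right, bottom)


def save_captcha(driver, element, shot_path="page.png",
                 out_path="captcha.png", scale=1.0):
    """Screenshot the page, crop out `element`, and save the crop."""
    # Imported lazily so crop_box stays usable without Pillow installed.
    from PIL import Image

    driver.save_screenshot(shot_path)
    box = crop_box(element.location, element.size, scale)
    with Image.open(shot_path) as page:
        page.crop(box).save(out_path)
    return out_path
```

Here `element` would be whatever `driver.find_element(...)` returns for the CAPTCHA image on your target page; the right locator depends entirely on that page's markup.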

Step 3: Extracting text from the CAPTCHA

There are multiple ways to go about this, as it is now effectively an OCR problem. You can use a cloud ML provider or run a model on your local machine. Here I use the Azure Read API.

Sending the request to Read API
Polling for the result
The original CAPTCHA
The processed output from Azure Read API
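The submit-then-poll flow can be sketched as follows. This assumes version 3.2 of the Read REST API (the `/vision/v3.2/read/analyze` path); the function names, the timeout values, and the injectable `fetch` parameter are my own choices for illustration, and `endpoint` and `key` come from your own Azure resource.

```python
import time


def submit_image(endpoint, key, image_bytes):
    """POST the image to the Read API and return the URL to poll."""
    import requests  # imported lazily; only needed for real calls

    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream",
    }
    response = requests.post(url, headers=headers, data=image_bytes, timeout=30)
    response.raise_for_status()
    # The service replies 202 and points to the result in this header.
    return response.headers["Operation-Location"]


def poll_result(operation_url, key, fetch=None, delay=1.0, attempts=10):
    """Poll until the Read operation has succeeded or failed."""
    if fetch is None:
        import requests

        def fetch(url, headers):
            return requests.get(url, headers=headers, timeout=30).json()

    headers = {"Ocp-Apim-Subscription-Key": key}
    for _ in range(attempts):
        result = fetch(operation_url, headers)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(delay)
    raise TimeoutError("Read operation did not finish in time")


def extract_text(result):
    """Join every recognised line in the result into one string."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return " ".join(
        line["text"] for page in pages for line in page.get("lines", [])
    )
```

Reading the cropped image as bytes, passing it to `submit_image`, and feeding the returned URL to `poll_result` gives a result that `extract_text` flattens into the CAPTCHA text.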

Step 4: Solving the CAPTCHA

Now, depending on how your CAPTCHA looks, the method of solving it may be different. You’ll have to do a bit of recon and run a few trials to figure it out. In my case, there are 6 possible CAPTCHAs (Greater, Lesser, +, -, *, /).
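A solver for those six cases could look something like this. The exact wording of the challenge text is an assumption: the sketch expects the OCR output to contain the word "greater" or "lesser" alongside the numbers, or two numbers around an operator. It also treats a stray "x" as multiplication and assumes divisions come out exact.

```python
import re


def solve_captcha(text):
    """Return the integer answer for one OCR'd CAPTCHA string."""
    lowered = text.lower()
    numbers = [int(n) for n in re.findall(r"\d+", lowered)]
    if not numbers:
        raise ValueError(f"no numbers found in {text!r}")

    # Comparison CAPTCHAs: pick the larger or smaller number shown.
    if "greater" in lowered:
        return max(numbers)
    if "lesser" in lowered:
        return min(numbers)

    # Arithmetic CAPTCHAs: two operands around a single operator.
    match = re.search(r"(\d+)\s*([-+*/x])\s*(\d+)", lowered)
    if match is None:
        raise ValueError(f"unrecognised CAPTCHA: {text!r}")
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op in ("*", "x"):
        return a * b
    return a // b  # assumes the division is exact
```

Raising on anything unrecognised is deliberate: it is better to reload the page and fetch a fresh CAPTCHA than to submit a guess.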

Step 5: Proceed with Scraping!

Now that the CAPTCHA is cracked, submitting the form with any value we would like is not a problem. No obstacles to stop this bot!
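Filling in the answer and submitting could be sketched like this. The function name and the locators are hypothetical; each locator is a `(by, value)` pair such as `("id", "captcha")`, which is the form Selenium's `find_element` accepts, and the real values depend on the page you are scraping.

```python
def submit_answer(driver, answer, input_locator, button_locator):
    """Type the CAPTCHA answer into its field and click submit."""
    field = driver.find_element(*input_locator)
    field.clear()  # remove anything already typed in the field
    field.send_keys(str(answer))
    driver.find_element(*button_locator).click()
```

Any other form fields can be filled the same way before the final click.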

Hey, I’m Srujan. A Student, Developer and Perpetual Learner!
