One of the internet’s favorite ways of verifying that traffic is coming from actual people and not bots is the captcha. Those little boxes with pictures of street signs have proven notoriously difficult for robots to crack, despite steady progress in machine-learning image recognition. Recently, however, researchers at the University of Maryland have figured out a new, easier way to crack these pesky security measures.
Rather than looking at the images provided by Google, their new system, UnCaptcha, uses the audio version of the captcha, sidestepping the complexities of image processing.
The program works by passing the audio played by the captcha to several speech-to-text services. It uses Bing Speech Recognition, IBM, and the Google Cloud API, along with phonetic processing that matches exact and near homophones, and then plugs the result back into the captcha.
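To make the idea of combining several transcripts with homophone matching concrete, here is a minimal Python sketch. It is not the researchers' code. It assumes the audio challenge reads out digits, that each service has already returned one guess per spoken digit, and that the guesses are reconciled by a simple majority vote. The `HOMOPHONES` table and the `to_digit`, `vote`, and `solve` helpers are hypothetical names invented for this illustration.

```python
from collections import Counter

# Hypothetical table of exact and near homophones for spoken digits.
# A real system would likely use a much larger, tuned list.
HOMOPHONES = {
    "0": {"zero", "hero"},
    "1": {"one", "won"},
    "2": {"two", "to", "too"},
    "3": {"three", "tree", "free"},
    "4": {"four", "for", "fore"},
    "5": {"five", "hive"},
    "6": {"six", "sicks"},
    "7": {"seven"},
    "8": {"eight", "ate"},
    "9": {"nine", "mine"},
}

# Invert the table so each word maps straight to its digit.
WORD_TO_DIGIT = {
    word: digit for digit, words in HOMOPHONES.items() for word in words
}


def to_digit(word):
    """Map one transcribed word to a digit, or None if nothing matches."""
    cleaned = word.lower().strip(".,!?")
    if len(cleaned) == 1 and cleaned.isdigit():
        return cleaned
    return WORD_TO_DIGIT.get(cleaned)


def vote(transcripts):
    """Majority vote over the services' guesses for one spoken digit."""
    guesses = [d for d in map(to_digit, transcripts) if d is not None]
    if not guesses:
        return None
    return Counter(guesses).most_common(1)[0][0]


def solve(segments):
    """Combine per-digit guesses into an answer; '?' marks unresolved digits."""
    return "".join(vote(segment) or "?" for segment in segments)


# Each inner list stands in for what three services returned for one digit.
answer = solve([["to", "two", "blue"], ["for", "4", "door"], ["ate", "hate", "eight"]])
print(answer)
```

With the made-up transcripts above, the sketch prints `248`: words such as "to" and "ate" are recovered through the homophone table, while "blue", "door", and "hate" are simply discarded as non-matches.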
The researchers have achieved about 85% accuracy with this system, which is available on GitHub. Google has noticed the release of this captcha cracker and recently started adding bits of spoken text to its audio recordings.