Stop Spam, Read Books: Turning You into a Data Capturer

The Google-owned reCaptcha system is something you are probably all too familiar with, and something you may well use every day online, even if you have no idea what it is called. reCaptcha is one of many Captcha security options that web forums, blogs and forms use to protect themselves from spam robots.

These systems work by presenting an obscured series of letters, words or numbers, which the user must identify and type in to prove that they are human and not a spam robot. Often these are just randomised letters and words that serve no purpose other than preventing robots from filling out your forms and commenting on your blog. reCaptcha, however, has found an innovative new use for these programmes, one that has changed the face of data capturing and, in essence, turned every Internet user into a data capture technician.

Google has taken advantage of the millions of captchas completed every day to advance its book digitisation process and improve the accuracy of the data it captures. Internet users now act as its data capture technicians, helping to transcribe and decipher text in millions of books and images, and even on Google Maps.

The initial process uses OCR (optical character recognition), in which a computer programme reads a hard-copy text document, digitises the text and stores it on a server. This process relies on very little human interaction and uses no human effort for the actual capturing and digitisation of text. While very advanced, it cannot deliver the level of accuracy that a human-driven double capture system can.

This is where reCaptcha is changing the face of optical recognition data capturing, and making the world’s Internet users a force for good. Internet users are literally helping the robots learn to read. In so doing, they are not only helping to improve OCR technology and the accuracy of the data that is captured, but also improving the quality of digitised content on the Internet.
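To picture how many people typing the same word can settle what OCR could not read, here is a minimal Python sketch. It is an illustration only and not Google's actual implementation: the function name `resolve_word` and the thresholds `min_votes` and `min_share` are assumptions made for this example.

```python
from collections import Counter


def resolve_word(answers, min_votes=3, min_share=0.75):
    """Return the transcription most people agree on, or None if undecided.

    answers   -- what different users typed for the same scanned word
    min_votes -- hypothetical minimum number of answers before deciding
    min_share -- hypothetical share of answers that must agree
    """
    # Ignore empty answers and differences in case or surrounding spaces.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough people have seen this word yet

    word, count = Counter(cleaned).most_common(1)[0]
    # Accept the word only when a clear majority typed the same thing.
    return word if count / len(cleaned) >= min_share else None


if __name__ == "__main__":
    print(resolve_word(["Morning", "morning ", "morning", "mourning"]))
    print(resolve_word(["cat", "cot", "cut"]))
```

A word that nobody agrees on simply stays unresolved until more people have typed it.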

So although data capturing might seem far removed from your own life, or even something that you don’t quite understand, it is in fact not just right under your nose but something that you help with and participate in every day.

The reCaptcha method is an extremely intelligent use of a system that was already being used every day. What it really shows, though, is that OCR alone cannot match the accuracy of human-captured data. To get truly accurate data capture results for business purposes, quickly and professionally, you need to rely on human-based data capture methods. Our clients work to tight deadlines, require accurate results and demand accountability. They can’t afford to work with the margin of error that OCR creates, nor do they have the time to wait for bugs and inaccuracies to be fixed.

The OutProsys human-based double capture method is, in our view, the best approach. For handwritten text, one operator first captures the data manually, and a second, more experienced operator then re-captures each character blind. This ensures that all data is double captured, complete and accurate. We combine our leading-edge recognition technology (OCR) with highly trained and experienced data capture operators to capture typed and handwritten text, in multiple currencies and languages, from scanned electronic document images.
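The idea behind blind double capture can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the OutProsys production system: the function names `find_mismatches` and `double_capture` and the sample invoice number are hypothetical.

```python
from itertools import zip_longest


def find_mismatches(first_pass, second_pass):
    """Compare two captures character by character.

    Returns a list of (position, first_char, second_char) for every
    position where the two operators typed something different.
    """
    mismatches = []
    # zip_longest also flags extra or missing characters at the end.
    pairs = zip_longest(first_pass, second_pass, fillvalue="")
    for position, (a, b) in enumerate(pairs):
        if a != b:
            mismatches.append((position, a, b))
    return mismatches


def double_capture(first_pass, second_pass):
    """Accept a field only when both blind captures agree exactly."""
    mismatches = find_mismatches(first_pass, second_pass)
    return {
        "accepted": not mismatches,
        "value": first_pass if not mismatches else None,
        "review": mismatches,  # positions a person needs to re-check
    }


if __name__ == "__main__":
    # The first operator typed the letter O where the second typed a zero.
    print(double_capture("INV-1O24", "INV-1024"))
```

Because the second operator never sees the first capture, a disagreement at any position is a strong signal that one of the two made a slip, and only those characters need a further look.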


do what you do best, outsource the rest