Teaching a computer how to ‘see’ is no small feat. You can slap a camera on a PC, but that won’t give it sight. For a machine to actually view the world as people or animals do, it needs computer vision and image recognition.
Computer vision is what powers a bar code scanner’s ability to “see” a bunch of stripes in a UPC. It’s also how Apple’s Face ID can tell whether a face its camera is looking at is yours. Basically, whenever a machine processes raw visual input – such as a JPEG file or a camera feed – it’s using computer vision to understand what it’s seeing. It’s easiest to think of computer vision as the part of the human brain that processes the information received by the eyes – not the eyes themselves.
One of the most interesting uses of computer vision, from an AI standpoint, is image recognition, which gives a machine the ability to interpret the input received through computer vision and categorize what it “sees.”
Here are some examples of image recognition at work:
1. The eBay app lets you search for items using your camera
2. This neural network turns pitch black photos into bright images
3. Facebook’s AI knows a lot about your photos
4. How about an AI that can read your mind?
There’s also an app that uses your smartphone camera to determine whether an object is a hotdog or not. It’s called Not Hotdog, and it uses computer vision and image recognition to make its judgments. That may not seem impressive; after all, a small child can tell you whether something is a hotdog or not. But the process of learning to recognize images is quite complex, both in the human brain and in computers.
AI, at this point, is much like a small child. Computer vision gives it the sense of sight, but that doesn’t come with an inherent understanding of the physical universe. For that, an AI needs training just like children do. If you show a child a number or letter enough times, the child will learn to recognize it.
Surprisingly, many toddlers can immediately recognize letters and numbers upside down once they’ve learned them right side up. Our biological neural networks are pretty good at interpreting visual information even if the image we’re processing doesn’t look exactly how we expect it to.
It’s easy enough to make a computer recognize a specific image, like a QR code, but computers are bad at recognizing things in states they don’t expect. Enter image recognition.
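To make that brittleness concrete, here is a minimal sketch in Python. The 5x5 `MARKER` grid and the helper names are made up for illustration; they are not taken from any real scanner. It shows a "recognizer" that only accepts an exact pixel-for-pixel copy, and how it fails as soon as the same shape moves over by a single pixel.

```python
# Hypothetical 5x5 black-and-white marker (1 = dark pixel, 0 = light pixel).
MARKER = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]


def shift_right(image):
    """Return a copy of the image moved one pixel to the right."""
    return [[0] + row[:-1] for row in image]


def is_exact_match(image, template):
    """Recognize the image only if every pixel equals the template."""
    return image == template


print(is_exact_match(MARKER, MARKER))               # True
print(is_exact_match(shift_right(MARKER), MARKER))  # False
```

A person would call both grids "the same square," but the exact comparison has no notion of that, which is the gap image recognition is meant to fill.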
Image recognition typically works by creating a neural network that processes the individual pixels of an image. Researchers feed these networks as many pre-labelled images as they can, in order to “teach” them how to recognize similar images.
In the hotdog example above, the developers would have fed an AI thousands of labelled pictures, some of hotdogs and some of other things. From these, the AI develops a general idea of what a picture of a hotdog should have in it, stored as numeric weights rather than as copies of the pictures themselves. When you feed it a new image, it runs that image’s pixels through those weights to produce a score for how hotdog-like the image is. If the score meets a minimum threshold, the AI declares it a hotdog.
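As a rough illustration of training on labelled images and then applying a threshold, here is a toy sketch in plain Python. It is not how Not Hotdog is actually built. It uses a single artificial neuron instead of a full neural network, and everything in it is an assumption made for the example: the 3x3 "images," the rule that a hotdog is a bright horizontal bar, the function names, and the 0.5 threshold. In practice a network boils what it has learned down to a set of weights and scores each new image with them, which is what this sketch does on a tiny scale.

```python
import math
import random

random.seed(0)  # make the toy run repeatable


def make_image(is_hotdog):
    """Build a noisy 3x3 image, flattened to 9 pixel values between 0 and 1.

    Toy assumption: a "hotdog" is a bright horizontal bar (pixels 3, 4, 5);
    anything else is a bright vertical bar (pixels 1, 4, 7).
    """
    bar = {3, 4, 5} if is_hotdog else {1, 4, 7}
    return [(0.9 if i in bar else 0.1) + random.uniform(-0.1, 0.1)
            for i in range(9)]


def score(weights, bias, pixels):
    """Return a value between 0 and 1: how hotdog-like the pixels look."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-total))


def train(examples, epochs=50, learning_rate=0.5):
    """Nudge the weights a little after every labelled example."""
    weights, bias = [0.0] * 9, 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            error = score(weights, bias, pixels) - label
            weights = [w - learning_rate * error * p
                       for w, p in zip(weights, pixels)]
            bias -= learning_rate * error
    return weights, bias


# Pre-labelled training set: 1 = hotdog, 0 = not hotdog.
labelled = ([(make_image(True), 1) for _ in range(50)]
            + [(make_image(False), 0) for _ in range(50)])
random.shuffle(labelled)
weights, bias = train(labelled)

THRESHOLD = 0.5  # assumed cut-off for declaring "hotdog"


def is_hotdog(pixels):
    """Declare a hotdog when the score meets the minimum threshold."""
    return score(weights, bias, pixels) >= THRESHOLD


print(is_hotdog(make_image(True)))
print(is_hotdog(make_image(False)))
```

The two test images at the end are freshly generated, so the classifier has never seen those exact pixels; it is relying on the general pattern it picked up from the labelled examples. Raising `THRESHOLD` would make it more cautious about declaring something a hotdog.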
Nearly any AI system that processes visual information relies on computer vision, and those capable of identifying specific objects or categorizing images based on their content are performing image recognition.
This is incredibly important for robots that need to quickly and accurately recognize and categorize different objects in their environment. Driverless cars, for example, use computer vision and image recognition to identify pedestrians, signs, and other vehicles.