One of the world's leading computer vision competitions, called the ImageNet Challenge, will encourage participants to develop algorithms for 3D objects next year. If ImageNet reaches its ambitious goal, robots could soon see and talk about their surrounding environment with as much clarity as human beings.
So far, developers who build image-recognition algorithms for the ImageNet database have worked only with 2D images. Since the competition began in 2010, ImageNet has collected an impressive catalogue of over 14 million 2D images.
Although it was difficult at first, it is becoming easier for programmers to write accurate 2D recognition algorithms for ImageNet's vast database. Indeed, in the 2015 challenge a Microsoft team built a visual recognition system that classified images more accurately than humans do.
Thanks to ImageNet's database, companies like Apple and Google have been able to help people search for photos with greater ease. People can now use words like "baby" or "happiness" to find specific pictures in their photo library.
One of the competition's organizers, Alex Berg, said he's thrilled that ImageNet's database is being incorporated into "products that millions of people are using."
Microsoft's stunning 2015 performance got the ImageNet team thinking about how they could spur further breakthroughs in the computer vision field. For many of them, the obvious answer was to push for accurate 3D computer vision.
All of the images in ImageNet's database are currently labeled by hand, and they carry no depth information at all. Images for the 3D challenge in 2018 will need to take depth into account, and the work will probably begin with 360-degree photos. The first 3D images developers are set to work with will show the interior spaces of homes and offices.
As of today, there is very little scientific literature on 3D algorithms to draw on. Participants will have to work almost from scratch to design algorithms that faithfully represent 3D images.
Although no one expects the 3D algorithms to be perfected at the 2018 competition, many people in the tech field think the time has come to get serious about 3D computer imaging.
Andrew Davison, a professor at Imperial College London, believes there's great potential for this technology, especially for domestic cleaning robots more sophisticated than today's robotic vacuum cleaners. The technology could also be used to advance the exciting field of virtual reality.
Not only do organizers want robots to see in 3D, they also want them to communicate what they are seeing with clarity. In the future, they hope robots will be able to understand what they see and describe the world around them using syntactically complex sentences.
Berg and other organizers don't expect to make any real progress on this issue for a few years. However, ImageNet wants to get developers around the world thinking about it as soon as possible. The specifics of the competition have yet to be finalized, and Berg told reporters that he doesn't expect to see robots that can perceive and interpret the 3D world around them for at least five years.