Setbacks
Vision recognition is a very new field with far more open problems than solutions. Older, better-studied fields tend to be mostly complete, with only a few unresolved gaps; vision recognition is closer to one large gap that is still being filled. Theories and implementations so far have been rudimentary, and we have only scratched the surface of what is possible. Research has proceeded largely by trial and error.
Many problems and setbacks have plagued, and continue to plague, vision recognition researchers. The specific problems can number in the thousands, but three stand out as major ones that can limit and shape the direction the field takes:
-Finding 3D objects in 2D images
-Converting raw raster images into usable representations
-Incomplete understanding of our own vision
Currently, it is very difficult, if not practically impossible, for a computer to look at a 2-dimensional image and see an object within it as 3-dimensional. To work around this problem, many researchers have equipped their robots with binocular vision systems (that is, two cameras), which let them see from two different angles and thus in 3 dimensions. But this approach only works for robots that can have new hardware installed. It does nothing for a piece of software that receives only a single 2D image: such a program is still limited to what it can infer from that one picture.
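To illustrate why a second camera helps, here is a minimal sketch of the standard pinhole-camera relation between disparity and depth. It assumes two identical, parallel cameras whose images are already rectified, and that the same point has already been matched in both images (which is the hard part in practice). The function name and the numbers in the usage line are hypothetical, chosen only for illustration.

```python
def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
    """Estimate the distance to a point seen by two parallel cameras.

    Assumptions (not from the text): pinhole cameras, rectified images,
    focal length given in pixels, baseline (camera separation) in metres.
    """
    # Disparity: how far the point shifts horizontally between the two views.
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("point must appear further left in the right image")
    # Nearby points shift a lot, distant points shift very little.
    return focal_px * baseline_m / disparity


# Hypothetical rig: 700 px focal length, cameras 10 cm apart.
# The point appears at x=420 in the left image and x=400 in the right one.
distance = depth_from_disparity(700, 0.1, 420, 400)
print(f"{distance:.2f} m")
```

With a single image there is no disparity to measure, which is why a program given one 2D picture cannot recover depth this way.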
More generally, there is the problem of making raster (pixel) images usable by a computer. Usually all the computer has to go on is raw pixel data, and it must somehow find meaning in that data; otherwise the computer is nearly useless.

Adult humans have mastered the skill of finding meaning and recognizing objects within an image, and usually seem to do it without having to think about what they are doing. A computer, however, must rely on complex algorithms and computations to perform the same task. For example, to recognize an object in an image of a classroom, the computer or robot must first distinguish the object from its surroundings, then find the shape of the object, and only then can it attempt to recognize the object for what it is. So the computer must first take the raw pixel data and use, for example, edge detection algorithms to find out where one object ends and another starts. Once it has isolated the object, it can use the raw pixel data to find a general shape for the object. Finally, it can compare this shape to one in a database and thus identify it.
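The three steps described above can be sketched in a few lines of code. This is only a toy illustration under strong assumptions: the image is a small grid of brightness values between 0 and 1, it contains a single bright object on a dark background, and the "database" is a hand-written table of two numbers per shape. All names here (detect_edges, describe_shape, SHAPE_DB, recognize) are hypothetical.

```python
def detect_edges(image, threshold=0.5):
    """Step 1: find object pixels that border a much darker neighbour."""
    h, w = len(image), len(image[0])
    edges = set()
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < h and 0 <= nx < w
                if inside and image[y][x] - image[ny][nx] > threshold:
                    edges.add((y, x))
                    break
    return edges


def describe_shape(image, edges, threshold=0.5):
    """Step 2: reduce the isolated object to a crude shape descriptor."""
    ys = [y for y, _ in edges]
    xs = [x for _, x in edges]
    top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)
    height, width = bottom - top + 1, right - left + 1
    # Count bright pixels inside the bounding box of the edge pixels.
    bright = sum(
        1
        for y in range(top, bottom + 1)
        for x in range(left, right + 1)
        if image[y][x] > threshold
    )
    # Descriptor: (aspect ratio, fraction of the bounding box that is filled).
    return (width / height, bright / (width * height))


# Hypothetical database of known shapes: name -> (aspect ratio, fill fraction).
SHAPE_DB = {
    "square": (1.0, 1.0),
    "wide bar": (3.0, 1.0),
    "disc": (1.0, 0.785),
}


def recognize(image):
    """Step 3: return the database entry closest to the observed descriptor."""
    edges = detect_edges(image)
    if not edges:
        return None  # nothing stands out from the background
    aspect, fill = describe_shape(image, edges)
    return min(
        SHAPE_DB,
        key=lambda name: (SHAPE_DB[name][0] - aspect) ** 2
        + (SHAPE_DB[name][1] - fill) ** 2,
    )


# An 8x8 image with a bright 4x4 block in the middle.
square = [[1.0 if 2 <= y <= 5 and 2 <= x <= 5 else 0.0 for x in range(8)]
          for y in range(8)]
print(recognize(square))  # prints "square"
```

Even this toy version shows how much is hidden in each step: real images contain noise, shadows, overlapping objects and varying viewpoints, none of which this sketch handles.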
Vision recognition researchers are constantly limited by what we as humans do not understand about our own vision. Our understanding of human vision is incomplete and subject to change, and since so much of vision recognition is modeled on biological vision, this poses a major problem. Researchers developing new vision technologies tend to pick out single abilities and emulate each one by writing an algorithm for it. But natural vision is not really a collection of discrete, isolated systems that each do one thing; it is closer to one large, continuous process. So research in this field has largely consisted of scientists writing algorithms to imitate specific human processes, processes that they might not even fully understand.