Reductionism is a beloved research strategy in many areas of modern science. It says that if you cannot solve a problem, you should divide it into smaller components, since any complex system is nothing but the sum of its parts. Early vision researchers practiced this methodology in the 1980s, producing numerous methods for edge detection, segmentation, shape-from-X, and so on.
But people found that even the simplest problem, such as edge detection, could not be solved on its own, because the definition of an edge depends on higher-level tasks, and even human labelers cannot agree on whether there is an edge unless the task level is specified. Unlike physicists, who can choose to study a system or phenomenon at a given scale or state, computer vision researchers found themselves very unfortunate: every single image contains so many patterns and tasks across many levels!
The figure below shows how much we humans can infer, parse, and reason about in space, time, and causality from a single image.
This is a figure that I drew for our MURI 2015 project: Understanding Scenes and Events by Joint Parsing, Cognitive Reasoning and Lifelong Learning.
The table below lists a set of questions that we must answer, all together, in order to understand a single image. So we go in the opposite direction: if you cannot solve a simple problem, you may have to solve a complex one! This motivated our work on developing a unified representation, the spatial, temporal, and causal and-or graph, and on joint parsing of all the tasks in the table (see our demo page). This reminds me of a loud slogan in machine learning, attributed to Vapnik: "You should never solve a problem more than is necessary." It was used to argue for discriminative models against generative models. There is nothing wrong with the slogan itself, but unfortunately we just don't have such well-defined problems to solve in computer vision! Face detection is perhaps a rare exception, as long as you do not consider the image context. Edge detection was thought to be a classification problem, but it is not. I am also reminded that physicists have been taking our approach lately. For example, the concept of dark matter and dark energy posits a more complex system than the one we can see, and in superstring theory people go to a 10-dimensional space in order to reconcile relativity theory with quantum mechanics.
This table lists the aspects of scene understanding that we promised in 2010 to study in the ONR MURI project.