In images and videos, many entities and relations are infeasible to detect by their appearances using existing approaches, and most of them do not even show in any pixels. Yet, they are pervasive and govern the placement and motion of the visible entities that are relatively easier to detect. By analogy, they are like the dark matters and dark energy in cosmology which physicists study in a standard cosmology model.
Studying such "dark entities" and "dark relations" in vision are crucial for filling the performance gaps in the recognition of objects, scenes, actions and events. More specificallty, “dark matter” corresponds to entities which are infeasible to recognize by visual appearances. This includes
i) status of an agent (human and animal)’s goals and intents, like hungry, thirsty, which trigger actions;
ii) status of an object, such as a door is “locked”;
and iii) stuff like water which has no specific geometric shape or appearance.
“Dark energy” refers to hidden relations which drive the motion of objects in a video. To name a few:
i) physical forces like gravity and supporting relations between objects;
ii) causal effects and causal relations between actions and the changing object statuses;
and iii) attraction relations between an object (like food) and an agent (hungry); and so on.
In computer vision, two prevailing representational paradigms are: (I) the view-centered and appearance-based models; and (II) theobject/scene centered geometric based representation. I argue that a deep level for representation is task-centered based on the dark matters and dark energies. Therefore we must integrate the "visible" (geometry and appearance) and the "dark" (FPIC), and develop algorithms for joint inference and reasoning. This also explains why we cannot solve those "simple" problems and must tackle the "complex" one jointly.
时段 | 个数 |
---|---|
{{f.startingTime}}点 - {{f.endTime}}点 | {{f.fileCount}} |