Research methodologies in vision (and in other sciences and engineering) can be summarized as three approaches, or stages: Hack, Math, and Stat. Hacks are heuristics, things that somehow work somewhere, though you cannot tell exactly how or where they work. Math sits at the opposite end: it tells us that under certain conditions, things can be derived analytically or with a guarantee of performance, but those conditions are often quite restrictive and do not apply to general situations in the real world. Stat is essentially regression: with enough parameters you can eventually fit any data, but the fit lacks a physical explanation. Hack, Math, and Stat are therefore different levels of interpretation, or different kinds of models.
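To make the point about Stat concrete, here is a minimal illustrative sketch (my addition, not taken from the talk): a polynomial with as many coefficients as there are data points passes exactly through every point, even when the points are pure random noise. The function name `lagrange_interpolate` and the synthetic data are hypothetical choices made only for this example.

```python
import random


def lagrange_interpolate(xs, ys):
    """Return a function p(x) passing exactly through every (xs[i], ys[i]).

    Hypothetical helper for illustration: a degree n-1 polynomial has n
    free parameters, so it can match n arbitrary data points.
    """
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            # Basis polynomial L_i(x): equals 1 at xs[i] and 0 at every other xs[j].
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return p


random.seed(0)
xs = [float(k) for k in range(8)]
ys = [random.uniform(-1.0, 1.0) for _ in xs]  # pure noise, no underlying law
p = lagrange_interpolate(xs, ys)

max_residual = max(abs(p(x) - y) for x, y in zip(xs, ys))
print(f"max training residual: {max_residual:.2e}")  # zero up to rounding
print(f"prediction at x=10: {p(10.0):.2f}")          # extrapolation is meaningless
```

The zero training residual says nothing about the value predicted at x = 10, which is the sense in which a perfect fit can still lack any explanation.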
It is interesting to look at examples from disciplines with a longer history, say physics. The Chinese expeditions [1405-1433] of the Ming dynasty were the most advanced of their time: the fleets sailed across two-thirds of the world, reaching Africa and Europe, without even knowing that the Earth is round! The technique they used is called celestial navigation (see the picture below), which I call a "hack" here. Navigators used the constellations to find north and to determine latitude. The constellations are very much like the shape features we use today for object recognition. The method was not precise, but it worked to some extent in practice.
A beautiful mathematical theory appeared in the 1680s* when Newton invented the theory of gravitation, which is simple and explains the movements of stars and planets. But the mathematics was not sophisticated enough to fully explain the motion of the Moon. Newton was reported to have said that the lunar theory "made his head ache and kept him awake so often that he would think of it no more"**. In the 1750s, it was European mathematicians like Euler and a few others who came to the rescue. They fit the observational data closely with a least-squares style regression (see the equation below). Such regression equations look very familiar in machine learning today.
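For readers who want to see what such a regression looks like in code, here is a minimal sketch of linear least squares, written for illustration and not reproducing any historical computation. It chooses coefficients c_k that minimize the sum of squared residuals Σ_i (y_i − Σ_k c_k φ_k(t_i))², which leads to the normal equations AᵀA c = Aᵀy, where A is the design matrix with entries φ_k(t_i). The basis (a constant plus one sine and one cosine term), the "true" coefficients, and the noise level are all assumptions chosen for the example; the observations are synthetic.

```python
import math
import random


def solve_linear(m, v):
    """Solve the square system m @ c = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]  # augmented matrix [m | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][k] * c[k] for k in range(r + 1, n))
        c[r] = (aug[r][n] - s) / aug[r][r]
    return c


def least_squares(ts, ys, basis):
    """Fit y ~ sum_k c_k * basis[k](t) by solving the normal equations A^T A c = A^T y."""
    a = [[phi(t) for phi in basis] for t in ts]  # design matrix A
    n = len(basis)
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    aty = [sum(row[i] * y for row, y in zip(a, ys)) for i in range(n)]
    return solve_linear(ata, aty)


# Synthetic "observations": y = 1.0 + 0.3 sin(t) - 0.2 cos(t) plus small Gaussian noise.
random.seed(1)
ts = [0.1 * k for k in range(200)]
true_coeffs = [1.0, 0.3, -0.2]
basis = [lambda t: 1.0, math.sin, math.cos]
ys = [sum(c * phi(t) for c, phi in zip(true_coeffs, basis)) + random.gauss(0.0, 0.01)
      for t in ts]

coeffs = least_squares(ts, ys, basis)
print([round(c, 3) for c in coeffs])  # close to [1.0, 0.3, -0.2]
```

The fit recovers the coefficients well, but note that the choice of basis functions is supplied from outside; the regression itself offers no explanation of why those terms should be there.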
Hack, Math, and Stat are all useful tools, and a complex solution often integrates all three. For example, image compression and coding has math at its core: information theory, wavelets, and computational harmonic analysis. We also use statistics, such as the frequencies of the various elements in the code book. Finally, the coding schemes contain numerous engineering hacks that make them work on real images and video, as in JPEG and MPEG. The solution to vision will likely rely on all three aspects, and you need to be good at all three if you are serious about solving vision problems.
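As a small illustration of the "statistics" ingredient, here is a minimal sketch of Huffman coding, one classic way of turning symbol frequencies into a code book in which frequent symbols get shorter codewords. The function name `huffman_code` and the toy string are hypothetical; real codecs add many engineering details that this sketch leaves out.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix code from symbol frequencies: frequent symbols get shorter codewords."""
    freq = Counter(text)
    # Heap entries: (total count, tie-breaker, {symbol: codeword built so far}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs a 1-bit code
        return {sym: "0" for sym in freq}
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codewords with 0 and 1.
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


text = "aaaaaaaabbbccd"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code)
print(len(encoded), "bits vs", 8 * len(text), "bits at 8 bits per symbol")
```

With frequencies a=8, b=3, c=2, d=1, the most common symbol gets a 1-bit codeword and the string encodes in 23 bits instead of 112.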
From a talk I gave at the Frontiers of Vision in Boston in 2011. The full slides (.pptx) can be downloaded at XXXXXXXXXXXXXXXXXXXXXXXX/~sczhu/Talks/Zhu_Hack_Stat_Math_2011.pptx/