The amount of video on the Web is growing at an incredible rate. Effectively searching online video, however, remains difficult. Microsoft researcher Xian-Sheng Hua hopes to crack the problem by teaching computers to recognize objects, scenes, events, and other elements of digital images.
Hua uses machine-learning techniques and annotated videos to train computers to categorize new videos automatically. The general approach isn't new, but Hua's system allows multiple labels for each video segment. It also relies not only on tags applied by experts but on descriptions written by large numbers of grassroots Internet users. These user-generated tags are gathered through online games, "pay for labeling" schemes, analysis of how people search for video, and other methods. Hua runs the labels through automated filters to help ensure their quality.
The system, which runs online, is first trained on videos tagged by experts; it's then periodically updated and retrained using the grassroots labels. This "online active learning" makes the algorithm more accurate and several times faster than previous systems; applying multiple labels to each video increases the speed further. The technology should aid searches for still images, too. Some of the techniques involved are already being incorporated into Microsoft's Live Search Video. Ultimately, Hua says, the technology should improve not only online video and image searches but also video surveillance and digital media management. --Erika Jonietz