Jointly Summarizing Large-Scale Web Images and Videos for Storyline Reconstruction

In this paper, we address the problem of jointly summarizing large-scale Flickr images and YouTube user videos.

June 23, 2014
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2014


Authors

Gunhee Kim (Disney Research)

Leonid Sigal (Disney Research)

Eric P. Xing (Carnegie Mellon University)

Abstract

In this paper, we address the problem of jointly summarizing large-scale Flickr images and YouTube user videos. Starting from the intuition that the characteristics of the two media are different yet complementary, we develop a fast and easily parallelizable approach for creating not only high-quality video summaries but also novel structural summaries of online images in the form of storyline graphs, which illustrate the various events or activities associated with the topic as a branching network. In our approach, video summarization is achieved by diversity ranking on the similarity graphs between images and video frames. The reconstruction of storyline graphs is formulated as the inference of sparse time-varying directed graphs from a set of photo streams with the assistance of videos. For evaluation, we create datasets of 20 outdoor recreational activities, consisting of 2.7M Flickr images and 16K YouTube user videos. Due to the large-scale nature of the problem, we evaluate our algorithm via crowdsourcing on Amazon Mechanical Turk. Our experiments demonstrate that the proposed joint summarization approach outperforms other important baselines as well as our own methods that use videos or images alone.
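
To make the diversity-ranking step concrete: the abstract does not spell out the ranking objective, so the sketch below uses one standard diversity-ranking scheme, greedy facility-location selection over a pairwise similarity matrix. The function name and toy data are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    def diversity_rank(sim, k):
        """Greedily select k items maximizing a facility-location
        coverage score over a pairwise similarity matrix.

        sim : (n, n) array of pairwise similarities, e.g. between
              video frames and/or Flickr images.
        k   : number of summary items to select.
        """
        n = sim.shape[0]
        selected = []
        coverage = np.zeros(n)  # best similarity of each item to the summary
        for _ in range(k):
            # marginal coverage gain of adding each candidate
            gains = np.maximum(sim, coverage[None, :]).sum(axis=1) - coverage.sum()
            gains[selected] = -np.inf  # never re-pick a selected item
            best = int(np.argmax(gains))
            selected.append(best)
            coverage = np.maximum(coverage, sim[best])
        return selected

    # Toy usage: five frames forming two visual clusters; the two
    # selected frames come from different clusters.
    sim = np.array([[1.0, 0.9, 0.1, 0.2, 0.1],
                    [0.9, 1.0, 0.2, 0.1, 0.1],
                    [0.1, 0.2, 1.0, 0.8, 0.9],
                    [0.2, 0.1, 0.8, 1.0, 0.9],
                    [0.1, 0.1, 0.9, 0.9, 1.0]])
    print(diversity_rank(sim, 2))  # e.g. [2, 0]

Each greedy step adds the frame that best covers the remaining items, so near-duplicate frames contribute almost no marginal gain and the summary stays diverse.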
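
Likewise, for the storyline side: sparse time-varying directed graphs are often estimated by kernel-reweighted l1-regularized (lasso) regression, fit separately around each query time. The sketch below follows that generic recipe; the function name, the Gaussian kernel, and the assumption that consecutive rows of a photo stream are successive observations are all illustrative, not the paper's exact optimization.

    import numpy as np
    from sklearn.linear_model import Lasso

    def storyline_graph_at(X, times, t, bandwidth=5.0, alpha=0.05):
        """Estimate a sparse directed graph at query time t via
        kernel-reweighted lasso regression: each node's next-step
        activation is regressed on all nodes' current activations,
        with observations near time t weighted most heavily.

        X     : (n_obs, n_nodes) node activations; row i at times[i]
                is followed by row i + 1 (successive photos).
        times : (n_obs,) timestamps of the observations.
        Returns A where A[i, j] != 0 means a directed edge j -> i.
        """
        w = np.exp(-0.5 * ((times[:-1] - t) / bandwidth) ** 2)  # Gaussian kernel
        sw = np.sqrt(w)[:, None]
        # weighted lasso == ordinary lasso on sqrt(w)-rescaled rows
        Xw, Yw = X[:-1] * sw, X[1:] * sw
        n_nodes = X.shape[1]
        A = np.zeros((n_nodes, n_nodes))
        for i in range(n_nodes):
            A[i] = Lasso(alpha=alpha, fit_intercept=False).fit(Xw, Yw[:, i]).coef_
        return A

The l1 penalty zeroes out most coefficients, so only a few strong temporal transitions survive at each time point, which yields the sparse, branching structure a storyline graph requires.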

Copyright Notice