2. Related work
A thorough review of concept-based video retrieval has been presented by Snoek et al. The approach currently
achieving the best performance in semantic concept annotation in competitions such as TRECVid and PASCAL VOC
is the bag-of-words (BoW) model. It was initially proposed for text document categorization, but can be
applied to visual content analysis by treating an image or keyframe as the visual analog of a document,
represented by a bag of quantized descriptors (e.g. SIFT) referred to as visual words.
A comprehensive study of this approach for the classification of object categories has been presented by
Zhang et al.; a review of action detection and recognition methods has recently been provided by Ballan et
al. Since the BoW approach disregards the spatial layout of the features, several methods have been proposed
to incorporate spatial information and improve classification results: Lazebnik et al. have proposed to
perform pyramid matching in the two-dimensional image space, while Bronstein et al. have proposed the
construction of vocabularies of spatial relations between features. Regarding CBIR, we refer the reader to the
comprehensive review by Datta et al.
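As an illustration of the BoW representation described above, the following minimal sketch quantizes local descriptors against a visual vocabulary and builds a normalized histogram of visual words. It is not the implementation used by any of the cited works: the function names (nearest_word, bow_histogram, grid_bow) are hypothetical, the vocabulary is assumed to have been learned beforehand (e.g. by clustering descriptors), and hard nearest-neighbour assignment with L1 normalization is only one possible choice. The grid_bow function shows a single level of a spatial grid, which is a strong simplification of the pyramid matching proposed by Lazebnik et al. (no multi-level weighting, no matching kernel).

```python
import math


def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary entry closest to the descriptor
    (squared Euclidean distance, hard assignment)."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(vocabulary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, vocabulary):
    """Build an L1-normalized histogram of visual-word occurrences.
    Returns an all-zero histogram when there are no descriptors."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


def grid_bow(points, descriptors, vocabulary, width, height, grid=2):
    """Assumed simplification of a spatial pyramid: split the image into
    grid x grid cells and concatenate one BoW histogram per cell
    (row-major cell order)."""
    cells = [[] for _ in range(grid * grid)]
    for (x, y), descriptor in zip(points, descriptors):
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        cells[row * grid + col].append(descriptor)
    features = []
    for cell in cells:
        features.extend(bow_histogram(cell, vocabulary))
    return features


# Toy example: 2-D "descriptors" and a two-word vocabulary (illustrative values only).
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0)]
points = [(1, 1), (90, 90), (10, 5)]  # keypoint positions in a 100 x 100 image

global_hist = bow_histogram(descriptors, vocabulary)
spatial_hist = grid_bow(points, descriptors, vocabulary, width=100, height=100)
```

The global histogram discards keypoint positions entirely, whereas the grid variant keeps a coarse record of where each visual word occurred, at the cost of a feature vector that grows with the number of cells.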
A web-based video search system with video streaming delivery has been presented by Halversen et al. to
search videos obtained from PowerPoint presentations using the associated metadata. A crowd-sourcing system for
retrieval of rock’n’roll multimedia data has been presented by Snoek et al., which relies on online users to
improve, extend, and share automatically detected results in video fragments. Another approach to web-based
collaborative creation of ground truth video data has been presented by Yuen et al., extending the successful
LabelMe project from images to videos. A mobile video browsing and retrieval application, based on HTML and
JavaScript, has been shown by Bursuc et al. An interactive video browsing tool for supporting content
management and selection in postproduction has been presented by Bailer et al., comparing the usability of a full-featured
desktop application with a limited web-based interface of the same system.