3.1. Media annotation and indexing
Text indexing and annotation are performed by services that provide: (i) language classification, based on n-grams
and Naïve Bayes classifiers, which, despite their simplicity, have been shown to work effectively even on short
fragments [39]; (ii) topic detection based on LDA; (iii) named entity extraction based on gazetteers and a rule-based
system, the latter handling entities that have not yet been added to the gazetteer lists. Topic detection and named
entity identification can also be applied to the output of speech transcription services.
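
To make the language classification step concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier over character n-grams. It is an illustration under stated assumptions, not the implementation used by the service: the class name NgramNaiveBayes, the helper char_ngrams, the choice of trigrams, the Laplace smoothing constant and the toy training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lower-cased string.

    The string is padded with spaces so that word-initial and word-final
    characters also produce n-grams, which helps on very short fragments.
    """
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-grams."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n                          # n-gram order (assumed: trigrams)
        self.alpha = alpha                  # Laplace smoothing constant (assumed)
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.doc_counts = Counter()         # language -> number of training texts
        self.vocab = set()                  # all n-grams seen during training

    def fit(self, samples):
        """Accumulate n-gram counts from (text, language) pairs."""
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.doc_counts[lang] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Return the unnormalised log-posterior of each language."""
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for lang, counter in self.counts.items():
            total = sum(counter.values())
            # Log prior estimated from the share of training texts.
            score = math.log(self.doc_counts[lang] / total_docs)
            # Smoothed log likelihood of each n-gram; unseen n-grams
            # receive a small non-zero probability.
            for gram in grams:
                score += math.log(
                    (counter[gram] + self.alpha)
                    / (total + self.alpha * vocab_size)
                )
            scores[lang] = score
        return scores

    def predict(self, text):
        """Return the language with the highest log-posterior."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
TRAIN = [
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("this is a short sentence about the weather", "en"),
    ("il gatto dorme sul divano della nonna", "it"),
    ("questa è una frase breve sul tempo", "it"),
    ("el perro corre por la calle del pueblo", "es"),
    ("esta es una frase corta sobre el tiempo", "es"),
]

model = NgramNaiveBayes(n=3).fit(TRAIN)
```

Calling model.predict on a short fragment such as "the weather" returns the language label with the highest score. Character n-grams are used here rather than words because a fragment of two or three words still yields a dozen or more n-grams, which is one plausible reason such simple models remain usable on short text.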