Topic Detection, Tracking and Trend Analysis of Social Media Data
Supervisor: Dr. J. Gonzalo
Online Reputation Management (ORM) systems make it possible to monitor and manage the opinions that internet users express about individuals, companies or products. Their main objective is to detect topics that can affect an organization's reputation, either positively or negatively. This research is centered on that scenario, and its main goals are: topic/event detection for online reputation management; techniques for processing short texts; and classification of topics by priority/relevance. So far, we have two main results. The first is a new LDA-based technique for clustering a collection of tweets about a specific entity that were posted within a short time span. This approach relies on transfer learning: it contextualizes the target collection with a large set of unlabeled tweets to perform the clustering. The second is the design of an LDA-based model in which each topic is associated with a continuous distribution over timestamps, so that, for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. As work in progress, we will use data from past events/topics to learn a stochastic model that predicts whether a new event is likely to occur in the very near future. To this end, we will use different types of features extracted from the tweet information (length, keywords, trustworthiness, number of followers, location, timeliness, correlation with past events, etc.). We are also working on a system that uses syntactic and semantic approaches to classify topics according to their relevance to the entity.
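As a rough illustration of the first result, the sketch below shows one possible reading of "contextualizing" a small target collection: a plain LDA model is fitted by collapsed Gibbs sampling over the target tweets together with a set of unlabeled background tweets, and each target tweet is then assigned to its dominant topic. This is not the technique developed in the project. The function names (lda_gibbs, cluster_target), the hyperparameter values and the toy tweets are all assumptions made for the example.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Plain LDA via collapsed Gibbs sampling (illustrative, not the project's model).

    docs is a list of token lists; returns per-document topic counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                        # n(k)
    assign = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    # Resample each token's topic from its conditional distribution.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic


def cluster_target(target, background, num_topics=2, **kwargs):
    """Fit LDA on target + background tweets; label each target tweet by dominant topic."""
    docs = [text.lower().split() for text in target + background]
    doc_topic = lda_gibbs(docs, num_topics, **kwargs)
    return [
        max(range(num_topics), key=lambda k: row[k])
        for row in doc_topic[: len(target)]
    ]


# Toy data, invented for the example.
target = [
    "acme phone battery overheats",
    "acme battery recall announced",
    "acme shares fall after earnings",
    "acme earnings miss forecast",
]
background = [
    "phone battery life tips",
    "battery recall affects phone owners",
    "shares rise after strong earnings",
    "earnings forecast for tech shares",
]
labels = cluster_target(target, background, num_topics=2, seed=0)
print(labels)
```

The background tweets add word co-occurrence evidence that the four short target tweets cannot supply on their own, which is the intuition behind contextualizing the collection. With data this small the resulting clusters are not meaningful, so the sketch only shows the mechanics.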
Tamára Martín-Wanton, PhD Student