Unsupervised Textual Knowledge Acquisition and Inference

Advisor: Dr. A. Peñas

A central challenge in NLP is how to deal with the information that a text leaves implicit. The current trend is to gather this background knowledge from large corpora. Still, how to gather and use this knowledge efficiently remains an open question. Moreover, the evaluation of this process depends heavily on the concrete task in which the knowledge is applied.

In the first part of this thesis we develop a procedure to transform natural language text into a graph-based representation. The representation is built from the output of a standard dependency parser, which is subsequently normalized and enriched with several techniques. We test this representation in two main tasks:

1. Relation extraction: We participated in a relation extraction task using the representation to generate features, and achieved results similar to those of state-of-the-art systems.
2. Proposition extraction: Our method applies patterns to extract knowledge from the graph representation. This knowledge is structured as predicate-argument and instance-semantic class propositions. Moreover, we took a large collection of documents represented as graphs and built a proposition store by aggregating the occurrences.

The second part of the thesis exploits the representation to perform textual inference tasks. These tasks are:

1. Unsupervised correction of appositive dependencies: We have defined a method that exploits the instance-semantic class propositions to improve the parsing of appositive structures.
2. Interpretation of eventive propositions: The main purpose of this task is to automatically unfold the meaning of eventive propositions. Specifically, given a proposition, we retrieve a pair of propositions that represent a plausible explanation of the sub-events involved.
3. CogALex Shared Task: In this task we evaluate the performance of several language models at retrieving the target word suggested by a set of stimuli.
4. Semantic parsing on linked data: Our goal here is to tie the graph representation of an utterance to its canonical representation in different structured knowledge bases. To do so, we train a classifier that maps from our initial graphs to the graphs that we build from two popular linked data sources, Freebase and DBpedia. We also plan to enrich our graph representation with topic models in order to ground latent semantic representations.
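
As an illustration only, the proposition extraction step of the first part (applying patterns to graphs and aggregating the occurrences into a proposition store) might be sketched as follows. The toy graphs, the edge labels (`nsubj`, `dobj`, `appos`), the two patterns and all function names are assumptions made for this sketch; they are not the representation or the patterns developed in the thesis.

```python
from collections import Counter

# Hypothetical toy dependency graphs: each graph is a list of
# (head, label, dependent) edges. Labels follow common dependency
# conventions and are an assumption of this sketch.
GRAPHS = [
    [("signed", "nsubj", "player"), ("signed", "dobj", "contract")],
    [("signed", "nsubj", "player"), ("signed", "dobj", "contract"),
     ("Messi", "appos", "player")],
    [("scored", "nsubj", "Messi"), ("scored", "dobj", "goal")],
]


def extract_propositions(edges):
    """Apply two illustrative patterns to one graph and yield propositions."""
    subjects, objects = {}, {}
    for head, label, dep in edges:
        if label == "nsubj":
            subjects.setdefault(head, []).append(dep)
        elif label == "dobj":
            objects.setdefault(head, []).append(dep)
        elif label == "appos":
            # Instance-semantic class pattern: the appositive names the class.
            yield ("is-a", head, dep)
    # Predicate-argument pattern: subject-verb-object triples.
    for verb, subjs in subjects.items():
        for subj in subjs:
            for obj in objects.get(verb, []):
                yield ("pred", subj, verb, obj)


def build_store(graphs):
    """Aggregate proposition occurrences over a collection of graphs."""
    store = Counter()
    for edges in graphs:
        store.update(extract_propositions(edges))
    return store


store = build_store(GRAPHS)
for proposition, count in store.most_common():
    print(count, proposition)
```

In a sketch like this, the counts in the store give a simple frequency signal: propositions observed many times across the collection are better candidates for background knowledge than those seen once.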

Bernardo Cabaleiro Barciela