This framework is based on this paper: Relation Extraction: Perspective from Convolutional Neural Networks
Review of the paper:
Traditional approaches - relation extractor uses features generated by linguistic analysis modules.
This paper - departs from complicated feature engineering by introducing a convolutional neural network that automatically learns features from sentences and minimizes the dependence on external toolkits and resources.
model - multiple window sizes for filters
- pre-trained word embeddings as an initializer on a non-static architecture to improve the performance
- unbalanced corpus (non-relation examples far outnumber examples of the other relations)
Traditional techniques - feature-based methods and kernel-based methods
They use a large body of linguistic analysis and knowledge resources to transform relation mentions into a rich representation for a statistical classifier such as an SVM or Maximum Entropy. The preprocessing relies on existing NLP modules: tokenization, part-of-speech tagging, chunking, name tagging, and parsing.
- Propagates errors from the supervised NLP toolkits.
- Loses performance on out-of-domain data.
- System is provided with raw sentences marked with the two entities of interest
- Word embeddings as an initializer (captures latent semantic and syntactic properties)
- CNN - recognizes specific classes of n-grams and induces more abstract representations.
- various window sizes for the convolutional filters (captures a wider range of n-grams)
- rather than initializing the word embeddings randomly, we use pretrained word embeddings for initialization and optimize both word embeddings and position embeddings as model parameters
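The embedding setup described in the list above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes that position embeddings are indexed by the relative distance of each token to the two marked entities (the notes do not spell this out), and the names build_word_table, build_position_table, encode, and the toy vectors are all hypothetical.

```python
import random

def build_word_table(vocab, pretrained, dim, seed=0):
    """Start from pretrained vectors where available; other words get small random vectors."""
    rng = random.Random(seed)
    table = {}
    for word in vocab:
        if word in pretrained:
            # Copy the vector: in a non-static model it is updated during training.
            table[word] = list(pretrained[word])
        else:
            table[word] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
    return table

def build_position_table(max_dist, dim, seed=1):
    """One randomly initialized vector per relative distance in [-max_dist, max_dist]."""
    rng = random.Random(seed)
    return {d: [rng.uniform(-0.25, 0.25) for _ in range(dim)]
            for d in range(-max_dist, max_dist + 1)}

def encode(tokens, e1, e2, word_table, pos_table, max_dist):
    """Concatenate each word vector with the position vectors for both entities."""
    def clip(d):
        return max(-max_dist, min(max_dist, d))
    rows = []
    for i, tok in enumerate(tokens):
        rows.append(word_table[tok] + pos_table[clip(i - e1)] + pos_table[clip(i - e2)])
    return rows

tokens = ["Obama", "was", "born", "in", "Hawaii"]
pretrained = {"born": [0.1, 0.2, 0.3], "in": [0.0, -0.1, 0.4]}  # toy values, assumed
word_table = build_word_table(tokens, pretrained, dim=3)
pos_table = build_position_table(max_dist=4, dim=2)
# Entities are assumed to sit at token indices 0 and 4.
matrix = encode(tokens, e1=0, e2=4, word_table=word_table, pos_table=pos_table, max_dist=4)
print(len(matrix), len(matrix[0]))  # rows = tokens, columns = 3 + 2 + 2
```

In a real model both tables would be trainable parameters; here they are plain dictionaries so the lookup and concatenation are easy to follow.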
CNN layers:
- look-up tables to encode words in sentences by real-valued vectors
- convolutional layer to recognize n-grams
- pooling layer to determine the most relevant features
- a logistic regression layer (a fully connected neural network with a softmax at the end) to perform classification
- Input: sentences marked with two entities
- CNNs work with fixed-length inputs => compute the maximal separation between entity mentions linked by a relation and choose an input width greater than this distance.
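The layers and the fixed-width input listed above can be sketched as a forward pass in plain Python. This is an illustrative sketch under several assumptions: the tanh nonlinearity, the window sizes 2 to 5, the random weights, and the helper names (input_width, pad, conv_max, forward) are not taken from the paper, and the look-up step is replaced by a random sentence matrix.

```python
import math
import random

def input_width(entity_pairs, margin=1):
    """Pick a width larger than the largest gap between the two entity positions."""
    return max(abs(e2 - e1) for e1, e2 in entity_pairs) + margin

def pad(matrix, width, dim):
    """Trim or zero-pad a sentence matrix to a fixed number of rows."""
    rows = matrix[:width]
    return rows + [[0.0] * dim for _ in range(width - len(rows))]

def conv_max(matrix, filt, bias):
    """Slide one filter spanning len(filt) rows over the matrix, then max-pool over time."""
    w = len(filt)
    best = -math.inf
    for start in range(len(matrix) - w + 1):
        s = bias
        for k in range(w):
            s += sum(a * b for a, b in zip(filt[k], matrix[start + k]))
        best = max(best, math.tanh(s))  # tanh is an assumed nonlinearity
    return best

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(matrix, filters, weights, biases):
    """Convolution + pooling, then a fully connected layer with softmax."""
    pooled = [conv_max(matrix, f, b) for f, b in filters]
    scores = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

rng = random.Random(0)
dim, n_classes = 4, 3
width = input_width([(0, 4), (1, 7)], margin=2)  # toy entity positions
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
x = pad(sentence, width, dim)
# Two filters for each window size; the sizes are illustrative.
filters = [([[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(w)], 0.0)
           for w in (2, 3, 4, 5) for _ in range(2)]
weights = [[rng.uniform(-0.5, 0.5) for _ in filters] for _ in range(n_classes)]
biases = [0.0] * n_classes
probs = forward(x, filters, weights, biases)
print(len(probs), sum(probs))
```

Each filter contributes one pooled value, so using several window sizes simply widens the feature vector that feeds the final classifier.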