Prism

Prism is a neural network library for natural language processing written in pure Clojure. This library get you a distributed representation of words.

Now prism is ready to work with following models.

Word2Vec by skip-gram with negative-sampling
RNNLM with negative-sampling
Skip-Thought

This library also includes some basic neural network model (e.g. feedforward, RNN(LSTM, GRU), encoder-decoder(LSTM, GRU)). See demonstration section and examples for more details.

And prism also provides visualization tool that makes figure like above one. See wiki.

Usage

git clone https://github.com/Jah524/prism.git and lein repl. You can find how to do with prism at wiki

If you want to work on trained model in your project, add following dependency to your project.clj,

[jah524/prism "0.9.1"]

models for NLP

Word2vec

;make word embeddings
(require '[prism.nlp.word2vec :as w2v])
(def model (w2v/make-word2vec "your-training-path.txt" "model-save-path.w2v" 100 {:workers 2}))

;use word embeddings
(require '[prism.util :as util])
(def em (util/load-model "your-save-path.w2v.em"))
(get em "word") ;=> #vectorz/vector [0.06990837345068174,-0.09570045605373989 ....]

(util/similarity (get em "word1") (get em "word2") true) ;=> 0.03489215427695168

(require '[clojure.core.matrix :as m])
(m/set-current-implementation :vectorz)

example using pretrained model

(require '[prism.util :as util])
(require '[prism.nlp.word2vec :as word2vec])
(require '[clojure.core.matrix :as m])
(m/set-current-implementation :vectorz)

(def em (util/load-model "https://s3.amazonaws.com/prism-archive/pretrained-model/1-billion-word-language-modeling-benchmark_200h_ns5_min10.w2v.em"))
;; it takes a minutes (222.0MB)

; or
; download the model and
(def em (util/load-model "your path to pretrained model"))


(def target-list ["Japan" "Tokyo" "China" "Beijing" "Bangkok" "Thai" "Singapore" "France" "Paris" "Italy" "Rome" "Spain" "Madrid"])

(def v1 (m/add (m/sub (get em "Japan") (get em "Tokyo")) (get em "Paris")))
(word2vec/most-sim em v1 target-list)
;=> ({:word "France", :sim 0.7725763} {:word "Italy", :sim 0.6931164} {:word "Spain", :sim 0.6633791} {:word "Paris", :sim 0.64103466} {:word "Rome", :sim 0.49774215})

(def v2 (m/add (m/sub (get em "Japan") (get em "Tokyo")) (get em "Beijing")))
(word2vec/most-sim em v2 target-list)
;=> ({:word "China", :sim 0.97199464} {:word "Beijing", :sim 0.8846489} {:word "Japan", :sim 0.7798172} {:word "Italy", :sim 0.44583768} {:word "Singapore", :sim 0.41647854})

See Word2vec for more details.

RNNLM

(require '[prism.nlp.rnnlm :as rnnlm])
(def rnnlm (rnnlm/make-rnnlm "your-training-path" "model-save-path.rnnlm" 100 :gru {:workers 2 :negative 5}))

(require '[prism.util :as util])
(def rnnlm (util/load-model "your-save-path.rnnlm"))

;; you can take distributed representation of text or phrase
(rnnlm/text-vector rnnlm ["word1" "word2" "word3"]) ;=> #vectorz/vector [0.5559875183548029,0.6338452816753448,0.49570920352227194 ...]

;; and resume train, model-path represents your-save-path.rnnlm
(rnnlm/resume-train "your-training-path" "model-path.rnnlm" {:workers 4})

See RNNLM for more details.

Skip Thought

(require '[prism.nlp.skip-thought :as st])

(def m (st/make-skip-thought "your-training-path" "your-embedding-path" "model-save-path.st" 200 100 100 :gru {:workers 4 :interval-ms 10000}))

;; get encoder
(def en (st/get-encoder m))

;; get words vector
(st/skip-thought-vector en ["word1" "word2" "word3"]) ; => #vectorz/vector [0.08737680011079421,0.07707453000181463, ...]

;; save and load encoder
(require '[prism.util :as u])
(u/save-model en "output-path.st.encoder")
(def en (u/load-model "output-path.st.encoder"))

See Skip-Thoght Vectors for more details.

Visualization

Move on to your console.

git clone https://github.com/Jah524/prism.git
lein run -m server.handler word2vec `path-to-your-embedding` --port 3003

go to 127.0.1.1:3003 on your browser and you can see visuallization tool

Items

First, you put together items you want visualize. Next, give them by text file or input manually. You can try to use example.txt (body has took from clojure.org).

t-sne parameters

We use t-sne for dimensionality reduction to visualize. Perplexity and iteration are needed at least. The perplexity should not too much bigger or smaller, and iterations also should not be too small. You can get to know more about parameters in T-SNE-Java.

You ready? Then you just put green button below.

Result

You might get result figure top of the page. Keep in mind we use t-sne(non-deterministic) so that a result is different by trials. When something doesn't work, check your messages on console.

Note, we use plotly.js for visuallization.

Basic neural networks

Feed Forward

approximation of sin function with 3 hidden units

lein run -m  examples.feedforward.sin3

RNN

you can choose rnn model from #{"standard" "lstm" "gru"} for each example

with dense input

lein run -m examples.rnn.simple-prediction standard

with sparse inputs

lein run -m examples.rnn.sparse gru

multi class classification

lein run -m examples.rnn.multi-class lstm

License

Distributed under the Eclipse Public License either version 1.0

Name		Name	Last commit message	Last commit date
Latest commit History 316 Commits
resources/public/js		resources/public/js
src		src
test		test
.gitattributes		.gitattributes
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
README.md		README.md
project.clj		project.clj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Prism

Usage

models for NLP

Word2vec

RNNLM

Skip Thought

Visualization

Items

t-sne parameters

Result

Basic neural networks

Feed Forward

RNN

License

About

Releases

Packages

Languages

Jah524/prism

Folders and files

Latest commit

History

Repository files navigation

Prism

Usage

models for NLP

Word2vec

RNNLM

Skip Thought

Visualization

Items

t-sne parameters

Result

Basic neural networks

Feed Forward

RNN

License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages