A small repo for producing query embeddings, developed as part of thesis work.
First, one needs the train, valid, and test triples for a dataset, for example FB15k_237: https://www.microsoft.com/en-us/download/details.aspx?id=52312
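Each split file contains one triple per line; for FB15k_237 these are tab-separated `head relation tail` Freebase identifiers, along these lines:

```
/m/027rn	/location/country/form_of_government	/m/06cx9
```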
To generate the data for training and testing, as well as the appropriate filters, one can prepare a script along the following lines:
```bash
#!/usr/bin/env bash

dataset_path="./datasets/FB15k_237/"
qa_folder="qa_example" #149689

train_path=$dataset_path"train.txt"
val_path=$dataset_path"valid.txt"
test_path=$dataset_path"test.txt"

# Query orders per split, as (order, sample count) pairs;
# -1 presumably means "take all available queries".
train_query_orders="[[(1, -1), (2, 50000), (2, 50000), (3, 50000), (3, 50000)]]"
val_query_orders="[[(1, -1), (2, 5000), (2, 5000), (3, 5000), (3, 5000)]]"
test_query_orders="[[], [(2, 5000)], [(2, 5000)], [(3, 5000)], [(3, 5000)], [(3, 5000)], [(3, 5000)]]"

# Query structures to include in each split, in the usual query-embedding
# notation: "1p" = one-hop, "2p"/"3p" = paths, "2i"/"3i" = intersections,
# "ip"/"pi" = path-intersection combinations.
include_train='[["1p", "2p", "2i", "3p", "3i"]]'
include_val='[["1p", "2p", "2i", "3p", "3i"]]'
include_test='[["1p"], ["2p"], ["2i"], ["3p"], ["3i"], ["ip"], ["pi"]]'

# Generate the query-answer datasets.
python ./query2vec/graph.py $train_path $val_path $test_path \
  --qa_folder=$qa_folder --train_query_orders="$train_query_orders" \
  --val_query_orders="$val_query_orders" --test_query_orders="$test_query_orders" \
  --include_train="$include_train" --include_val="$include_val" --include_test="$include_test" \
  --add_inverse=true

# Build the evaluation filters from the generated query-answer files.
test_filter=$dataset_path$qa_folder"/filter.pkl"
val_filter=$dataset_path$qa_folder"/val_filter.pkl"
train_ds=$dataset_path$qa_folder"/train_qa_1.txt"
val_ds=$dataset_path$qa_folder"/val_qa_1.txt"

python ./query2vec/create_filter.py $test_filter $train_ds $val_ds $val_filter
```
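Judging from the paths used above, the generated artifacts should end up under the qa folder, roughly like this (the exact set of files depends on the query orders requested):

```
datasets/FB15k_237/qa_example/
├── train_qa_1.txt
├── val_qa_1.txt
├── filter.pkl
└── val_filter.pkl
```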
Then, to run an experiment, one can prepare a `model.json` and a `train_config.json` (an example of each can be found in the `example` folder) and simply run the command below.
Attention: currently one should only run the grouped hits@N and MRR evaluation (running everything together takes too long). The `--all_scaled` flag enables this:

```bash
python ./query2vec/run.py ./query2vec/example --all_scaled
```
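Here `./query2vec/example` is the folder holding the `model.json`/`train_config.json` pair described above.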
One can also prepare multiple runs by creating a folder (e.g. `runs`) whose subfolders each contain a `model.json` and a `train_config.json` (see the layout sketch below). To run them all:

```bash
python ./query2vec/run.py -f ./runs --all_scaled
```
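The expected layout, with made-up run names, would be something like:

```
runs/
├── run_a/
│   ├── model.json
│   └── train_config.json
└── run_b/
    ├── model.json
    └── train_config.json
```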
To monitor the experiments, one can use MLflow (run from the location where `mlruns` is located):

```bash
mlflow server
```

This starts a server on `localhost:5000`.
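If `mlruns` is stored elsewhere, or the default port is taken, the standard MLflow server options can be used instead, for example:

```bash
mlflow server --backend-store-uri ./mlruns --port 5001
```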