Problem with simple_tokenizer #1

Open
mxnno opened this issue Feb 28, 2022 · 9 comments

Comments

@mxnno

mxnno commented Feb 28, 2022

Hi,
I want to use your approach, but when I try to train (train_protoinfomax_kws_sentiment.sh) I get the error: No module named 'simple_tokenizer'. There is no simple_tokenizer.py file, and I can't find any tokenizer called 'simple_tokenizer'. Is this module outdated, or am I doing something wrong?
Thanks!

@inimah
Owner

inimah commented Feb 28, 2022

Thanks for pointing it out!
The missing tokenizer file is called from ./basic_utils/vocabulary_cls.py
I have just added it.

@mxnno
Author

mxnno commented Mar 1, 2022

Thanks!
Another small issue:
train_fasttext.py has a SyntaxError in line 466: "Amazondat/"

@mxnno
Author

mxnno commented Mar 1, 2022

wikipedia2vec is missing in requirements.txt
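Missing modules like this one can be listed in a single pass instead of being found one failed run at a time. The following is a minimal sketch, not part of the repository: it parses every .py file under a directory and reports top-level imports that the current environment cannot resolve. The function names `top_level_imports` and `unresolved_imports` are hypothetical.

```python
import ast
import importlib.util
from pathlib import Path


def top_level_imports(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are ignored
            names.add(node.module.split(".")[0])
    return names


def unresolved_imports(project_dir):
    """Map each top-level module that cannot be found to the files importing it."""
    missing = {}
    for path in sorted(Path(project_dir).rglob("*.py")):
        try:
            imported = top_level_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # files that do not parse are skipped, not reported
        for name in sorted(imported):
            try:
                found = importlib.util.find_spec(name) is not None
            except (ImportError, ValueError):
                found = False
            if not found:
                missing.setdefault(name, []).append(str(path))
    return missing
```

Project-local packages are also reported unless their parent directory is on `sys.path`, so the scan is best run from the repository root. Note that an import name does not always match the package name used in requirements.txt.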

@inimah
Owner

inimah commented Mar 1, 2022

OK. Please let me know about any other issues as well; they may be due to naming conventions, typos, or the migration from our server to the GitHub repo.

@mxnno
Author

mxnno commented Mar 7, 2022

If I run train_imax_kw_intent.py, I get a FileNotFoundError:
protoinfomax/embeddings/tfidf_sparse_vec_intent.pkl is missing.
It doesn't get created anywhere; only tfidf_sparse_vec_cls.pkl is. Is it the same file, or how can I get tfidf_sparse_vec_intent.pkl?

@inimah
Owner

inimah commented Mar 8, 2022

In train_imax_kw_intent.py:

[image: screenshot, not preserved]

I have added tfidf_sparse_vec_intent.pkl in ./embeddings/

For sentiment data, this file is created from ./src/extract_sentiment.py
For intent data, please adapt the code accordingly by calling the corresponding dataset. (The *.pkl I provided is for intent data because it is smaller than the one for the sentiment domain.)

This tfidf_sparse_vec_intent.pkl file is mainly used to extract keywords per sentence.

However, the Torch DataLoader prefers a numeric array representation that can be reshaped during a batch training episode.
That is why we use a TF-IDF representation when loading "Kws_xx.train" via ./workspace/workspace_intent_kw.py.
Note that params["tfidf_transformer"] and params["cv"] are called in workspace_intent_kw.
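To illustrate the idea, here is a minimal standard-library sketch of turning tokenized sentences into fixed-length TF-IDF vectors and picking keywords per sentence. It does not reproduce the repository code: the thread does not show what params["cv"] and params["tfidf_transformer"] contain, and the names `fit_idf`, `tfidf_vector`, and `top_keywords` as well as the smoothed IDF formula are assumptions made for this example.

```python
import math
from collections import Counter


def fit_idf(tokenized_sentences):
    """Compute smoothed inverse document frequencies over tokenized sentences."""
    n_docs = len(tokenized_sentences)
    doc_freq = Counter()
    for tokens in tokenized_sentences:
        doc_freq.update(set(tokens))  # count each token once per sentence
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, vocab, idf):
    """Return one weight per vocabulary entry, so every sentence has the same length."""
    counts = Counter(tokens)
    return [counts[t] * idf[t] for t in vocab]


def top_keywords(tokens, vocab, idf, k=2):
    """Return up to k tokens of the sentence with the highest TF-IDF weight."""
    weights = tfidf_vector(tokens, vocab, idf)
    # Sort by descending weight; ties are broken alphabetically
    ranked = sorted(zip(vocab, weights), key=lambda pair: (-pair[1], pair[0]))
    return [token for token, weight in ranked[:k] if weight > 0]


sentences = [["book", "a", "flight"], ["play", "a", "song"], ["book", "a", "table"]]
idf = fit_idf(sentences)
vocab = sorted(idf)
vectors = [tfidf_vector(s, vocab, idf) for s in sentences]
```

Because every vector has `len(vocab)` entries, the list can be stacked into a single array and batched, which is the property the data loader needs; raw keyword strings of varying length cannot be stacked this way.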

@abtuo

abtuo commented Apr 12, 2022

Hello,

I am also trying to reproduce your work. After taking into account the issues discussed above, I get this new error:

Traceback (most recent call last):
  File "train_oproto_intent.py", line 18, in <module>
    from basic_utils.utils_torch_intent import compute_values, get_data, compute_values_eval
  File "../basic_utils/utils_torch_intent.py", line 5, in <module>
    from workspace.workspace_intent_rl import workspace
ModuleNotFoundError: No module named 'workspace.workspace_intent_rl'

It seems that the _rl files are missing from the workspace folder.

Thanks !

@abtuo

abtuo commented Apr 12, 2022

I removed the _rl at the end of the module name and it seems to work. Several other modules were also missing, such as tensorflow, keras, and sklearn, which are not listed in the requirements.txt file.
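An alternative to editing the import by hand is to try the original module name first and fall back to the renamed one. This is only a sketch of that workaround: the helper `import_first_available` is hypothetical, and the fallback name `workspace.workspace_intent` is inferred from dropping the `_rl` suffix.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports successfully."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            errors.append(f"{name}: {exc}")
    raise ModuleNotFoundError(
        "none of the candidates could be imported: " + "; ".join(errors)
    )


# Intended use inside the repository (module names taken from the traceback above):
# module = import_first_available(
#     ["workspace.workspace_intent_rl", "workspace.workspace_intent"]
# )
# workspace = module.workspace
```

The fallback keeps the script working whether or not the _rl files are added to the repository later.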

I have a new error with a package 'utility' that I haven't yet fixed.

Traceback (most recent call last):
  File "train_imax_intent.py", line 18, in <module>
    from basic_utils.utils_torch_intent import compute_values, get_data, compute_values_eval
  File "../basic_utils/utils_torch_intent.py", line 7, in <module>
    from utils.cal_methods import HistogramBinning, TemperatureScaling, evaluate, cal_results
  File "../utils/cal_methods.py", line 17, in <module>
    from utility.evaluation import ECE, MCE
ModuleNotFoundError: No module named 'utility'

@inimah
Owner

inimah commented Apr 13, 2022

Thanks.
It might be because Python cannot find or read the directory "utility".
A quick workaround is to call the functions directly in utils.cal_methods.
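If the directory exists but is simply not on the import path, another option is to add its parent directory to `sys.path` before importing. The sketch below assumes that situation; the helper name `import_with_fallback` is hypothetical, and where "utility" actually lives in the repository is not stated in this thread.

```python
import importlib
import sys
from pathlib import Path


def import_with_fallback(module_name, search_dir):
    """Import module_name; if it is not found, add search_dir to sys.path and retry."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        resolved = str(Path(search_dir).resolve())
        if resolved not in sys.path:
            sys.path.insert(0, resolved)
        importlib.invalidate_caches()  # make sure the new path entry is scanned
        return importlib.import_module(module_name)


# Intended use (the directory is a placeholder for wherever "utility" is located):
# evaluation = import_with_fallback("utility.evaluation", "path/to/parent/of/utility")
# ECE, MCE = evaluation.ECE, evaluation.MCE
```

If the second attempt also fails, the ModuleNotFoundError is raised as usual, which indicates the directory is really missing rather than just unreachable.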
