Problem with simple_tokenizer #1

mxnno · 2022-02-28T15:45:16Z

Hi,
I want to use your approach, but when I try to train (train_protoinfomax_kws_sentiment.sh), I get the error: No module named 'simple_tokenizer'. There is no simple_tokenizer.py file, and I can't find any tokenizer called 'simple_tokenizer'. Is this module outdated, or am I doing something wrong?
Thanks!

inimah · 2022-02-28T17:44:55Z

Thanks for pointing it out!
The missing tokenizer file is imported by ./basic_utils/vocabulary_cls.py.
I have just added it.

mxnno · 2022-03-01T13:10:13Z

Thanks!
Another small issue:
train_fasttext.py has a SyntaxError on line 466: "Amazondat/"
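Errors like this can be caught before a long training run by parsing every script in the repository first. A minimal sketch using only the standard library; the helper name `find_syntax_errors` is hypothetical and not part of the repository:

```python
import pathlib


def find_syntax_errors(root):
    """Parse every .py file under `root` and report the ones that fail.

    Returns a list of (path, line number, message) tuples. The files are
    compiled but never executed.
    """
    problems = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        try:
            # compile() only parses and byte-compiles the source; it does not run it.
            compile(source, str(path), "exec")
        except SyntaxError as err:
            problems.append((str(path), err.lineno, err.msg))
    return problems
```

For example, `find_syntax_errors(".")` run from the repository root would list each broken file with its line number. The standard `python -m compileall -q .` does a similar check, but it also writes .pyc files.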

mxnno · 2022-03-01T13:30:27Z

wikipedia2vec is missing from requirements.txt
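One way to find every module that is not installed before starting a run is to look each one up with `importlib.util.find_spec`, which locates a top-level module without importing it. A minimal sketch; `missing_modules` is a hypothetical helper, and the module list is only the set of packages reported missing in this thread, not an authoritative dependency list:

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found on sys.path."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # For dotted names, find_spec imports the parent package first,
            # which raises ModuleNotFoundError if that parent is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names, not pip names: "sklearn" is installed as scikit-learn.
    print(missing_modules(["wikipedia2vec", "tensorflow", "keras", "sklearn"]))
```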

inimah · 2022-03-01T14:11:53Z

OK. Please let me know about other issues as well; some may be due to file renaming, typos, or the migration from our server to the GitHub repo.

mxnno · 2022-03-07T19:37:05Z

If I run train_imax_kw_intent.py, I get a FileNotFoundError:
protoinfomax/embeddings/tfidf_sparse_vec_intent.pkl is missing.
It isn't created anywhere; only tfidf_sparse_vec_cls.pkl is. Is it the same file, or how can I get tfidf_sparse_vec_intent.pkl?

inimah · 2022-03-08T15:31:32Z

Regarding train_imax_kw_intent.py: I have added tfidf_sparse_vec_intent.pkl to ./embeddings/.

For the sentiment data, this file is created by ./src/extract_sentiment.py.
For the intent data, please adapt that code accordingly so that it loads the corresponding dataset. (The *.pkl file I provided is for the intent data because it is smaller than the one for the sentiment domain.)

This tfidf_sparse_vec_intent.pkl file is mainly used to extract keywords from each sentence.
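To illustrate how a fitted TF-IDF model can pick keywords from a sentence, here is a dependency-free sketch. It is not the repository's code: the tokenization (lower-cased whitespace split), the smoothed IDF formula, and the `top_k` cutoff are assumptions, and in the repository the fitted model is loaded from the .pkl file rather than refit.

```python
import math
from collections import Counter


def fit_idf(sentences):
    """Compute a smoothed inverse document frequency for every token in the corpus."""
    docs = [set(s.lower().split()) for s in sentences]
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in doc)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, the form scikit-learn uses
    # when smooth_idf=True.
    return {tok: math.log((1 + n_docs) / (1 + count)) + 1 for tok, count in df.items()}


def extract_keywords(sentence, idf, top_k=2):
    """Return the top_k tokens of `sentence` ranked by TF-IDF weight."""
    counts = Counter(sentence.lower().split())
    # Tokens never seen while fitting have no IDF and are ignored.
    scores = {tok: tf * idf[tok] for tok, tf in counts.items() if tok in idf}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tok for tok, _ in ranked[:top_k]]
```

For example, after `idf = fit_idf(["book a flight to paris", "play some music", "book a table for two"])`, the words "book" and "a" occur in two sentences and get a lower weight, so they are never chosen over the rarer words of the first sentence.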

However, the PyTorch DataLoader prefers a numeric array representation that can be reshaped during each batch training episode.
That is why we use a TF-IDF representation when loading "Kws_xx.train" via ./workspace/workspace_intent_kw.py.
Note that params["tfidf_transformer"] and params["cv"] are used in workspace_intent_kw.
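The point about reshaping can be illustrated without PyTorch. Each sentence's sparse TF-IDF vector, stored here as a {column index: weight} mapping, is expanded to a dense row whose length is the vocabulary size, so every batch becomes a rectangular batch_size × vocab_size array. A minimal sketch; `to_dense` and `batches` are hypothetical helpers, and in practice each batch would be converted to a tensor (for example with `torch.tensor`) before training:

```python
def to_dense(sparse_row, vocab_size):
    """Expand a {column_index: weight} mapping into a list of length vocab_size."""
    row = [0.0] * vocab_size
    for col, weight in sparse_row.items():
        if not 0 <= col < vocab_size:
            raise IndexError(f"column {col} outside vocabulary of size {vocab_size}")
        row[col] = weight
    return row


def batches(sparse_rows, vocab_size, batch_size):
    """Yield lists of dense rows; all rows share one length, so each batch is rectangular."""
    dense = [to_dense(r, vocab_size) for r in sparse_rows]
    for start in range(0, len(dense), batch_size):
        yield dense[start:start + batch_size]
```

Fixing the row length to the vocabulary size is what makes reshaping safe: sentences with different numbers of keywords still produce rows of identical shape.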

abtuo · 2022-04-12T15:01:06Z

Hello,

I am also trying to reproduce your work. After taking into account the issues discussed above, I get this new error:

Traceback (most recent call last):
  File "train_oproto_intent.py", line 18, in <module>
    from basic_utils.utils_torch_intent import compute_values, get_data, compute_values_eval
  File "../basic_utils/utils_torch_intent.py", line 5, in <module>
    from workspace.workspace_intent_rl import workspace
ModuleNotFoundError: No module named 'workspace.workspace_intent_rl'
It seems that there are missing _rl files in the workspace folder.

Thanks !

abtuo · 2022-04-12T16:02:39Z

I removed the _rl suffix from the module name, and it seems to work. Several other modules, such as tensorflow, keras, and sklearn, were also missing; they are not listed in the requirements.txt file.
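Instead of editing each import by hand, an import can fall back to an alternative module name when the first one does not exist. A minimal sketch; `import_first` is a hypothetical helper, and the usage line assumes, as the workaround above suggests, that workspace/workspace_intent.py is the file the _rl import was meant to load:

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that exists.

    Raises ModuleNotFoundError if none of the candidates can be found.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as err:
            # Skip the candidate only if it (or one of its parent packages) is
            # the missing module; a module that exists but fails on one of its
            # own imports is re-raised so the real problem stays visible.
            if err.name is None or not (name == err.name or name.startswith(err.name + ".")):
                raise
    raise ModuleNotFoundError(f"none of {module_names!r} could be imported")


# Hypothetical usage inside basic_utils/utils_torch_intent.py:
# workspace = import_first("workspace.workspace_intent_rl", "workspace.workspace_intent").workspace
```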

I now have a new error involving a package called 'utility' that I haven't fixed yet.

Traceback (most recent call last):
  File "train_imax_intent.py", line 18, in <module>
    from basic_utils.utils_torch_intent import compute_values, get_data, compute_values_eval
  File "../basic_utils/utils_torch_intent.py", line 7, in <module>
    from utils.cal_methods import HistogramBinning, TemperatureScaling, evaluate, cal_results
  File "../utils/cal_methods.py", line 17, in <module>
    from utility.evaluation import ECE, MCE
ModuleNotFoundError: No module named 'utility'

inimah · 2022-04-13T06:52:12Z

Thanks.
It might be because Python cannot find the "utility" directory.
A quick workaround is to make the functions (ECE, MCE) available directly in utils.cal_methods instead of importing them from utility.evaluation.
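If the cause is indeed that Python cannot locate the utility directory, an alternative to copying the functions is to put the directory that contains utility/ on sys.path before utils.cal_methods is imported. The sketch below assumes utility/ sits at the repository root and that the training scripts are run from a subdirectory (the ../basic_utils and ../utils paths in the traceback suggest this); `add_repo_root` is a hypothetical helper:

```python
import sys
from pathlib import Path


def add_repo_root(root):
    """Prepend `root` to sys.path (only once) so top-level packages under it are importable."""
    root = str(Path(root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Hypothetical usage at the top of train_imax_intent.py, before the basic_utils import:
# add_repo_root(Path(__file__).resolve().parent.parent)
```

Setting the PYTHONPATH environment variable to the repository root has the same effect without changing any code.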
