Error running Pipeline with BasicReferenceRecognizer #60

Open
@xesaad

Hi there! I am a new and frequent user of this great package, which also comes with a few inevitable GitHub issues 😅

When I initialize the pipeline as follows:

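# Imports are assumed to have been run in an earlier notebook cell,
# since the traceback numbers the lines starting from `name = ...`.
import aspect_based_sentiment_analysis as absa
from transformers import BertTokenizer

name = "absa/classifier-rest-0.2"
model = absa.BertABSClassifier.from_pretrained(name)
tokenizer = BertTokenizer.from_pretrained(name)
reference_recognizer = absa.aux_models.BasicReferenceRecognizer()
professor = absa.Professor(reference_recognizer)
nlp = absa.Pipeline(model=model, tokenizer=tokenizer, professor=professor)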

I receive the following error:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_514/72277120.py in <module>
      2 model = absa.BertABSClassifier.from_pretrained(name)
      3 tokenizer = BertTokenizer.from_pretrained(name)
----> 4 reference_recognizer = absa.aux_models.BasicReferenceRecognizer()
      5 professor = absa.Professor(reference_recognizer)
      6 nlp = absa.Pipeline(model=model, tokenizer=tokenizer, professor=professor)

TypeError: __init__() missing 1 required positional argument: 'weights'

I realise this is because the BasicReferenceRecognizer needs to be trained in order to select weights. This leads me to two questions/issues:

  1. The BasicReferenceRecognizer class has no train method. Is there another way to train it, or a way to load a pretrained model from the package? In the unit tests for BasicReferenceRecognizer I found two pretrained models, 'absa/basic_reference_recognizer-rest-0.1' and 'absa/basic_reference_recognizer-lapt-0.1', but when I tried to initialize with these I received an ImportError.
  2. I also tried initializing the BasicReferenceRecognizer directly with weights=(-0.025, 44), as is done in this line. However, when making predictions I get an error in the Pipeline at the postprocess step:
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_514/3162923628.py in <module>
      3 for row in df.itertuples():
      4     print(row)
----> 5     prediction = predict(row.Review, row.Aspect)
      6     sentiment = get_sentiment(prediction)
      7     certainty_score = get_certainty_score(prediction)

/tmp/ipykernel_514/1002360698.py in predict(text, aspect)
     16         output_batch = nlp.predict(input_batch)
     17         predictions = nlp.review(tokenized_examples, output_batch)
---> 18         completed_task = nlp.postprocess(task, predictions)
     19         completed_subtask = completed_task.subtasks[aspect]
     20         return completed_subtask

/pyenv/versions/3.8.5/envs/seo-advice-page/lib/python3.8/site-packages/aspect_based_sentiment_analysis/pipelines.py in postprocess(task, batch_examples)
    301             aspect, = {e.aspect for e in examples}
    302             scores = np.max([e.scores for e in examples], axis=0)
--> 303             scores /= np.linalg.norm(scores, ord=1)
    304             sentiment_id = np.argmax(scores).astype(int)
    305             aspect_document = CompletedSubTask(

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

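I believe this error comes from an int/float mismatch: the in-place division on line 303 of pipelines.py produces floats (typecode 'd'), but scores appears to be an integer array (typecode 'l'), so the result cannot be written back into it. If I instead initialize with weights=(1, 1), for example, I receive no error.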
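
The casting failure can be reproduced with NumPy alone, which may help narrow things down. The sketch below is only an illustration: it assumes the scores reaching postprocess are plain integers (the array `int_scores` is made up for the example), and the helper names `normalize_in_place` and `normalize_safely` are hypothetical, not part of the package.

```python
import numpy as np

# Made-up integer scores, standing in for whatever the pipeline produced.
int_scores = np.array([0, 1, 0])


def normalize_in_place(scores):
    """Mirror the failing line: in-place division keeps the array's dtype."""
    scores /= np.linalg.norm(scores, ord=1)
    return scores


def normalize_safely(scores):
    """Cast to float first, then divide into a new array."""
    scores = np.asarray(scores, dtype=float)
    return scores / np.linalg.norm(scores, ord=1)


try:
    normalize_in_place(int_scores.copy())
    in_place_failed = False
except TypeError as error:
    # NumPy refuses to write float results into an integer array.
    in_place_failed = True
    print("in-place division failed:", error)

safe = normalize_safely(int_scores)
print(safe.dtype, safe)
```

If this matches what happens inside the pipeline, casting the scores to float before the division (or replacing `/=` with a plain `/`) would avoid the error, though whether that is the right fix for the package is for the maintainers to judge.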

I wanted to flag these issues for your awareness. Thank you very much for any advice you can provide 😄
