Vector preprocessing happens during initialization rather than after calling .featurize() (Should it be moved?) #358

@xehu

While testing a local instance of the FeatureBuilder, I found that the preprocessing for vectors/sentiments runs at initialization rather than when .featurize() is called. Running it in .featurize() would make more sense.

Here's the output after merely initializing the FB:

>>> import pandas as pd
>>> df_test = pd.DataFrame({
... "conversation_num": [1,2,3],
... "speaker_nickname": ["foo","foo","bar"],
... "message": ["lorem", "ipsum", "dolor"]
... })
>>> my_fb = FeatureBuilder(input_df = df_test)
Initializing Featurization...
Confirmed that data has conversation_id_col column: conversation_num!
Confirmed that data has speaker_id_col column: speaker_nickname!
Confirmed that data has message_col column: message!
Generating RoBERTa sentiments...
100%|██████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.55it/s]

Was this the behavior all along, or did we previously start processing only after calling .featurize()? Creating this issue to document/address.
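
For discussion, here is a minimal sketch of what deferring the expensive step to .featurize() could look like. It is not the actual FeatureBuilder code: the class name `LazyFeatureBuilderSketch`, the `_generate_sentiments` helper, and the `preprocess_calls` counter are all hypothetical, and the "sentiment" computation is a cheap placeholder standing in for the RoBERTa pass.

```python
class LazyFeatureBuilderSketch:
    """Hypothetical stand-in for illustration; not the real FeatureBuilder."""

    def __init__(self, messages):
        # Only cheap validation and bookkeeping happen at initialization.
        if not all(isinstance(m, str) for m in messages):
            raise TypeError("every message must be a string")
        self.messages = list(messages)
        self.preprocess_calls = 0  # counter that makes the timing observable
        self._sentiments = None    # filled in lazily by featurize()

    def _generate_sentiments(self):
        # Placeholder for the expensive step (assumed: a model forward pass).
        self.preprocess_calls += 1
        return [len(m) for m in self.messages]

    def featurize(self):
        # Run the expensive step on first use and cache the result.
        if self._sentiments is None:
            self._sentiments = self._generate_sentiments()
        return self._sentiments


fb = LazyFeatureBuilderSketch(["lorem", "ipsum", "dolor"])
calls_after_init = fb.preprocess_calls       # nothing expensive has run yet
features = fb.featurize()                    # expensive step runs here
fb.featurize()                               # second call reuses the cache
calls_after_featurize = fb.preprocess_calls
```

With this shape, constructing the object stays fast, and the cached result means repeated .featurize() calls do not redo the preprocessing. Whether caching is the right choice for the real FeatureBuilder depends on how it handles changed inputs, which this sketch does not address.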

Labels

question: Further information is requested
