Allow the user to ONLY calculate aggregates while reading in an existing utterance-level dataframe #357

@xehu

Description

Currently, running the FeatureBuilder always executes the entire pipeline: it generates utterance-level features from scratch, then conversation-level features, then conversation- and speaker-level aggregates. Some users may already have a featurized utterance-level dataset and want only the aggregated features. Letting them pass that dataset in could make the FeatureBuilder (FB) run faster, because it would skip the utterance-level featurization step instead of repeating it.
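
A minimal sketch of the aggregation-only path, assuming the utterance-level data arrives as a list of row dicts (a stand-in for a dataframe) with hypothetical column names `conversation_id`, `speaker_id`, and `num_words`. The function `aggregate_utterance_features` is hypothetical and is not part of the FeatureBuilder's API; it only groups rows and summarizes each numeric feature column.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_utterance_features(rows, group_cols, feature_cols):
    """Summarize precomputed utterance-level features per group (hypothetical helper).

    rows: list of dicts, one per utterance (stand-in for a dataframe).
    group_cols: columns that define a group, e.g. ["conversation_id"] for
        conversation-level or ["conversation_id", "speaker_id"] for
        speaker-level aggregates (hypothetical column names).
    feature_cols: numeric feature columns to aggregate.
    Returns {group_key_tuple: {"<feature>_mean": ..., "<feature>_min": ..., ...}}.
    """
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        grouped[key].append(row)

    result = {}
    for key, group_rows in grouped.items():
        summary = {}
        for col in feature_cols:
            values = [r[col] for r in group_rows]
            summary[f"{col}_mean"] = mean(values)
            summary[f"{col}_min"] = min(values)
            summary[f"{col}_max"] = max(values)
            # Population standard deviation, so single-utterance groups give 0.0.
            summary[f"{col}_std"] = pstdev(values)
        result[key] = summary
    return result
```

Passing `["conversation_id", "speaker_id"]` as `group_cols` yields speaker-level aggregates with the same function, so one code path could serve both aggregation levels.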

Major issue: how would we validate the utterance-level data the user passes in? What if it is junk? Right now, because we control the entire pipeline, we can guarantee that the utterance-level features are legitimate. If users can supply their own version, we need to decide how to validate it; otherwise, the aggregates are garbage in, garbage out.
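
One possible answer to the validation question, sketched under assumptions: check the schema, check that every feature value is a finite number, and spot-check features that are cheap to recompute (here, a hypothetical word count) against the values the user supplied. `validate_utterance_features` and all column names are hypothetical, not existing FeatureBuilder code.

```python
import math
import random


def validate_utterance_features(rows, required_cols, feature_cols,
                                recompute=None, sample_size=20,
                                tol=1e-6, seed=0):
    """Check a user-supplied utterance-level dataset before aggregating it.

    recompute: optional {feature_col: fn(row) -> value} for features that are
        cheap to recompute; a random sample of rows is spot-checked.
    Returns a list of problem descriptions; an empty list means every check passed.
    """
    if not rows:
        return ["dataset is empty"]
    problems = []
    incomplete = set()
    # Feature columns are required too; dict.fromkeys dedupes while keeping order.
    all_cols = list(dict.fromkeys([*required_cols, *feature_cols]))

    # 1. Schema: every required and feature column must be present in every row.
    for i, row in enumerate(rows):
        missing = [c for c in all_cols if c not in row]
        if missing:
            incomplete.add(i)
            problems.append(f"row {i}: missing columns {missing}")

    # 2. Values: features must be finite numbers (no strings, None, NaN, inf).
    for i, row in enumerate(rows):
        for col in feature_cols:
            if col not in row:
                continue  # already reported by the schema check
            value = row[col]
            if not isinstance(value, (int, float)) or not math.isfinite(value):
                problems.append(f"row {i}: {col}={value!r} is not a finite number")

    # 3. Spot check: recompute selected features on a random sample of rows.
    if recompute:
        rng = random.Random(seed)
        sampled = rng.sample(range(len(rows)), min(sample_size, len(rows)))
        for i in sorted(sampled):
            if i in incomplete:
                continue  # cannot recompute without the required columns
            row = rows[i]
            for col, fn in recompute.items():
                if col not in row:
                    continue
                reported, expected = row[col], fn(row)
                # Non-numeric or NaN values were already flagged in step 2.
                if isinstance(reported, (int, float)) and abs(reported - expected) > tol:
                    problems.append(f"row {i}: {col}={reported!r}, recomputed {expected!r}")
    return problems
```

Spot-checking can catch mislabeled or stale columns, but it cannot prove the whole dataset is correct; `sample_size` trades runtime against coverage.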

Another variation: what if the user wants only the conversation-level features and wants to skip generating the chat-level features?
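
For the conversation-level-only variation, some conversation-level features could in principle be computed directly from raw messages. As a hypothetical illustration (not a claim about how the FeatureBuilder computes any of its features), a participation-inequality score such as the Gini coefficient over each speaker's message count needs only the conversation and speaker columns. `conversation_gini`, `participation_gini`, and the column names are assumptions.

```python
from collections import Counter, defaultdict


def participation_gini(speakers):
    """Gini coefficient of per-speaker message counts in one conversation.

    0.0 means all speakers sent equally many messages; larger values mean
    participation is concentrated among fewer speakers.
    """
    counts = list(Counter(speakers).values())
    n = len(counts)
    if n == 0:
        return 0.0
    total = sum(counts)
    # Sum of |x_i - x_j| over all ordered pairs, divided by 2 * n^2 * mean
    # (which simplifies to 2 * n * total).
    pair_diffs = sum(abs(a - b) for a in counts for b in counts)
    return pair_diffs / (2 * n * total)


def conversation_gini(messages, conversation_col, speaker_col):
    """Compute participation_gini per conversation straight from raw messages."""
    speakers_by_conv = defaultdict(list)
    for m in messages:
        speakers_by_conv[m[conversation_col]].append(m[speaker_col])
    return {conv: participation_gini(s) for conv, s in speakers_by_conv.items()}
```

Features defined as summaries of utterance-level values would still need those values computed first, so a conversation-level-only mode would likely cover only a subset of features.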

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request); future (Longer-range goals that should take place in the future but are not immediate or upcoming)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions