Description
Currently, when the FeatureBuilder is run, we go through the entire process: generating utterance-level features from scratch, then conversation-level features, then conversation- and speaker-level aggregates. A user might instead want to pass in an already-featurized utterance-level dataset and get back only the aggregated features. This would potentially allow the FB to run faster, since it would not have to go through the whole featurization process again.
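To make the "aggregates only" path concrete, here is a minimal sketch of averaging precomputed utterance-level features per conversation. It is not the FeatureBuilder's actual code: the function `aggregate_by_conversation`, the column names (`conversation_num`, `num_words`, `positivity`), and the choice of a mean as the only aggregate are all assumptions for illustration. Rows are plain dicts so the example has no dependencies; a pandas DataFrame could be converted with `df.to_dict("records")`.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_conversation(rows, feature_cols, conv_col="conversation_num"):
    """Average each utterance-level feature within a conversation.

    Hypothetical helper: `rows` is a list of dicts, one per utterance,
    that already contain the precomputed feature columns.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[conv_col]].append(row)

    return {
        conv_id: {
            f"mean_{col}": mean(r[col] for r in conv_rows)
            for col in feature_cols
        }
        for conv_id, conv_rows in grouped.items()
    }


# Illustrative, made-up utterance-level data.
rows = [
    {"conversation_num": "A", "num_words": 4, "positivity": 0.5},
    {"conversation_num": "A", "num_words": 6, "positivity": 1.0},
    {"conversation_num": "B", "num_words": 10, "positivity": 0.0},
]
aggregates = aggregate_by_conversation(rows, ["num_words", "positivity"])
```

The point of the sketch is that this step only needs the feature columns and a conversation identifier, so in principle it can run without the utterance-level featurization having happened in the same call.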
Major issue: how would we validate the utterance-level data the user passes in? What if they pass in junk? Right now, since we control the entire pipeline, we can guarantee that the utterance-level features are legitimate. If we allow users to pass in their own version, we need to decide how to validate it; otherwise, the aggregates are just garbage in, garbage out.
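One possible starting point for validation is a structural check: confirm that the identifier and feature columns exist and that every feature value is a finite number. The sketch below assumes hypothetical names (`validate_utterance_rows`, `conversation_num`, `speaker`, `num_words`) and again uses plain dicts for rows. It only catches malformed input; it cannot tell whether a well-formed number was computed correctly, so it would not fully solve the garbage-in problem on its own.

```python
import math
from numbers import Real


def validate_utterance_rows(rows, id_cols, feature_cols):
    """Return a list of human-readable problems; an empty list means the
    rows passed these (purely structural) checks.

    Hypothetical helper, not part of the existing FeatureBuilder.
    """
    if not rows:
        return ["no rows supplied"]

    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in (*id_cols, *feature_cols) if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue  # cannot check values of columns that are absent
        for col in feature_cols:
            value = row[col]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                problems.append(f"row {i}: {col} is not numeric ({value!r})")
            elif math.isnan(value) or math.isinf(value):
                problems.append(f"row {i}: {col} is not finite")
    return problems


# Illustrative, made-up inputs.
good_rows = [{"conversation_num": "A", "speaker": "s1", "num_words": 4}]
bad_rows = [
    {"conversation_num": "A", "speaker": "s1", "num_words": "four"},
    {"conversation_num": "A", "speaker": "s1"},
    {"conversation_num": "A", "speaker": "s2", "num_words": float("nan")},
]

good_problems = validate_utterance_rows(
    good_rows, ["conversation_num", "speaker"], ["num_words"]
)
bad_problems = validate_utterance_rows(
    bad_rows, ["conversation_num", "speaker"], ["num_words"]
)
```

Returning a list of problems rather than raising on the first one is a design choice here: it lets the caller report everything wrong with the dataset at once. A stricter option, not shown, would be to recompute features on a small sample of rows and compare them with the supplied values.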
Another variation of this: what if the user wants only the conversation-level features and wants to skip generating the chat-level (utterance-level) features?