Replies: 3 comments
-
Just a side note on proposal #3. If we end up stuffing all extracted properties or data into the I suggest considering keeping the strongly typed properties or attributes (maybe |
-
Good points; however, annotations are just pairs (RULE_ID, TEXTUAL_EXPLANATION), so for now they are essentially pairs of strings. The problem you refer to will only arise if we decide to put more complex structures in there, which I do not see the need for at this time. I agree that if we go down that route, we will have to use types to our advantage. But I would not touch that in this refactoring; rather, the goal would be to streamline the process by removing the overlap in the responsibilities of the different stages of the process (commit processing vs. rule application). To me, the |
-
This is all implemented as of 3122cb6 |
-
It has become evident that the roles of feature extraction and rule application overlap somewhat; in some cases we had to deal with duplicated code because a feature extraction routine would do more or less the same thing a rule needed, but not in quite the same way...
Also, in these first few months of the "Prospector 2" development, the priority has shifted from a pure ML-based approach to a more pragmatic (but equally or more effective) rule-based approach. The question is how to reconcile the two. Initially, the CommitWithFeature class was meant to represent data records that would end up in dataframe rows, for ML use. But what rules do, at the end of the day, is also feature extraction; the only difference is that rules compute features that are immediately meaningful to humans and come with a human-readable explanation.
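To make the "rules are also feature extraction" point concrete, here is a minimal sketch. It is not the actual Prospector code: `CommitRecord`, the two rule functions, the rule IDs, and `to_feature_row` are all hypothetical names made up for illustration. The only assumption taken from this thread is that an annotation is a (RULE_ID, TEXTUAL_EXPLANATION) pair of strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


# Hypothetical stand-in for a commit record; NOT the real CommitWithFeature class.
@dataclass
class CommitRecord:
    commit_id: str
    message: str
    changed_files: List[str] = field(default_factory=list)
    # Annotations are plain string pairs: RULE_ID -> TEXTUAL_EXPLANATION.
    annotations: Dict[str, str] = field(default_factory=dict)


# A rule returns a human-readable explanation when it matches, None otherwise.
Rule = Callable[[CommitRecord, str], Optional[str]]


def rule_vuln_id_in_message(commit: CommitRecord, vuln_id: str) -> Optional[str]:
    """Hypothetical rule: the commit message mentions the vulnerability ID."""
    if vuln_id.lower() in commit.message.lower():
        return f"The commit message mentions {vuln_id}"
    return None


def rule_changes_tests(commit: CommitRecord, vuln_id: str) -> Optional[str]:
    """Hypothetical rule: the commit touches files that look like tests."""
    tests = [f for f in commit.changed_files if "test" in f.lower()]
    if tests:
        return "The commit modifies test files: " + ", ".join(tests)
    return None


# Hypothetical rule registry, keyed by rule ID.
RULES: Dict[str, Rule] = {
    "VULN_ID_IN_MESSAGE": rule_vuln_id_in_message,
    "CHANGES_TESTS": rule_changes_tests,
}


def apply_rules(commit: CommitRecord, vuln_id: str,
                rules: Dict[str, Rule] = RULES) -> CommitRecord:
    """Run every rule and store an annotation for each one that matches."""
    for rule_id, rule in rules.items():
        explanation = rule(commit, vuln_id)
        if explanation is not None:
            commit.annotations[rule_id] = explanation
    return commit


def to_feature_row(commit: CommitRecord,
                   rules: Dict[str, Rule] = RULES) -> Dict[str, int]:
    """Flatten the annotations into a 0/1 row, e.g. for a dataframe."""
    return {rule_id: int(rule_id in commit.annotations) for rule_id in rules}
```

With something along these lines, the same rule output serves both purposes: the annotations carry the explanation shown to a human, and `to_feature_row` gives a numeric record that could still feed an ML model later, so no separate extraction routine has to duplicate the rule's logic.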
Proposal