-
Notifications
You must be signed in to change notification settings - Fork 84
docs: adr for fractional #1640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: adr for fractional #1640
Conversation
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
✅ Deploy Preview for polite-licorice-3db33c ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
``` | ||
|
||
This approach supports flexible weight ratios; weights of [25, 50, 25] translate to 25%, 50%, and 25% distribution respectively as do [1, 2, 1]. | ||
It's worth noting that the maximum bucket resolution is 1/100, meaning that the maximum ratio between variant distributions is 1:99 (ie: a weight distribution of [1, 100000] behaves the same as [1, 100]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is a limitation we should consider improving. We could make this dynamic based on the weights, to support a practically unlimited resolution (this might come with other costs, like reducing stability of assignments) or we could choose a higher number, like 1000, or even make this number an optional parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also might want to specify a max integer size in any case, considering precision and range constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can default to 100
and then allow the user, by configuration, to change the ratios. We would need to make sure the sum is respected and what happens when it is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to change the bucket size to something larger (e.g. 10_000) as resizing the bucket size can lead to unexpected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or even make this number an optional parameter.
This sounds a good approach - it's harder for users to accidentally change the distribution as they have to specify the total weight, and meanwhile we're not limiting the precision. We could start by allowing a list of values as the parameter (100, 1000, 100,000 etc) or setting a maximum value allowed.
* Good, because Murmur3 is fast, has good avalanche properties, and we don't need "cryptographic" randomness | ||
* Good, because we have flexibility but also simple shorthand | ||
* Good, because our bucketing algorithm is relatively stable when new variants are added | ||
* Bad, because we only support string bucketing values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think non-string bucketing values were of interest?
* Good, because our bucketing algorithm is relatively stable when new variants are added | ||
* Bad, because we only support string bucketing values | ||
* Bad, because we don't have bucket resolution finer than 1:99 | ||
* Bad because we don't support JSONLogic expressions within bucket definitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was a limitation that was also questioned.
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
The bucking value is an optional first value to the operator (it may be a JSONLogic expression, other than an array). | ||
This allows enables targeting based on arbitrary attributes (individual users, companies/tenants, etc). | ||
If not specified, the bucketing value is a JSONLogic expression concatenating the `$flagd.flagKey` and the extracted [targeting key](https://openfeature.dev/specification/glossary/#targeting-key) (`targetingKey`) from the context (the inclusion of the flag key prevents users from landing in the same "bucket index" for all flags with the same number of buckets). | ||
If the bucking value does not resolve to a string, or the `targeting key` is undefined, the evaluation is considered erroneous. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The targeting key
check is only valid when users haven't specified a custom bucketing value. In those situations, the users are responsible for checking if the value exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right but I wonder if we should improve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the goals of this ADR to capture the current state of the fractional operator or the ideal state? If it's the latter, I'd like to define the bucket size and hashing key constraints before approving.
Co-authored-by: Michael Beemer <beeme1mr@users.noreply.github.com> Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
@beeme1mr my thinking with this was the capture the current state, and we could immediately open a new PR to outline our hopes. |
- **Ease of use**: must be easy to use and understand for basic use-cases | ||
- **Customization**: must support customization, such as specifying a particular context attribute to "bucket" on | ||
- **Stability**: adding new variants should result in new assignments for as small a section of the audience as possible | ||
- **Strong avalanche effect**: slight input changes should result in relatively high chance of differential bucket assignment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that some of the requirements here are the consequences of the hash algorithm we picked, not necessarily requirements.
Wearing the hat of a user, I would like the actual distributions as close as what I specified in the weights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an advantage to this variability from an A/B testing perspective. It wasn't a primary reason for choosing this algorithm but it was at least considered when evaluating the various options.
|
||
## Considered Options | ||
|
||
- We considered various "more common" hash algos, such as `sha1` and `md5`, but they were frequently slower than `Murmur3`, and didn't offer better performance for our purposes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is hashing the bottleneck of an evaluation? If not, this does not sound a good reason to not use other algorithms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a bottleneck since hashing can be fairly CPU-intensive, and this could run multiple times for every request.
The bucking value is an optional first value to the operator (it may be a JSONLogic expression, other than an array). | ||
This allows enables targeting based on arbitrary attributes (individual users, companies/tenants, etc). | ||
If not specified, the bucketing value is a JSONLogic expression concatenating the `$flagd.flagKey` and the extracted [targeting key](https://openfeature.dev/specification/glossary/#targeting-key) (`targetingKey`) from the context (the inclusion of the flag key prevents users from landing in the same "bucket index" for all flags with the same number of buckets). | ||
If the bucking value does not resolve to a string, or the `targeting key` is undefined, the evaluation is considered erroneous. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you help me understand more about only allowing string?
|
||
#### Bucketing strategy implementation | ||
|
||
After retrieving the bucketing value, and hashing it to a [0, 100] range, the algorithm iterates through variants, accumulating their relative weights until finding the bucket containing the hash value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: [0, 100) or [0, 99]
``` | ||
|
||
This approach supports flexible weight ratios; weights of [25, 50, 25] translate to 25%, 50%, and 25% distribution respectively as do [1, 2, 1]. | ||
It's worth noting that the maximum bucket resolution is 1/100, meaning that the maximum ratio between variant distributions is 1:99 (ie: a weight distribution of [1, 100000] behaves the same as [1, 100]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or even make this number an optional parameter.
This sounds a good approach - it's harder for users to accidentally change the distribution as they have to specify the total weight, and meanwhile we're not limiting the precision. We could start by allowing a list of values as the parameter (100, 1000, 100,000 etc) or setting a maximum value allowed.
|
||
```json | ||
{ | ||
"fractional": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having variants as the only allowed value won't support some use cases of A/B testing straightforward (they could probably write some conversions to generate the expression), for example,
I would like 30% of population to assign to group A, and 70% to group B.
- For group A, if user is from OpenFeature, assign variant P, otherwise Q
- For group B, if user is from OpenFeature, assign variant M, otherwise N.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes a lot of sense. I think it's reasonable to want this enhancement.
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Since this is historical, I'm merging this. That doesn't mean it can't be edited or superseded, it's simply an explanation of a previous decision and reflects the current implementation. |
Adds ADR for fractional.
As written this is historical, however, we can modify this one in this PR if you would like. I know @tangenti @dominikhaska @cupofcat are interested in some changes here. Feel free to suggest them and if we have a consensus we can turn the changes into roadmap items.