Adds List.sample and List.sampleN #3398

Merged
merged 2 commits into from
Sep 30, 2024

OAGr
Contributor

@OAGr OAGr commented Sep 30, 2024

Closes #3337

@OAGr OAGr requested a review from berekuk as a code owner September 30, 2024 14:29

changeset-bot bot commented Sep 30, 2024

🦋 Changeset detected

Latest commit: a241b8b

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages
Name                                   Type
@quri/squiggle-lang                    Patch
@quri/squiggle-ai                      Patch
@quri/squiggle-components              Patch
@quri/prettier-plugin-squiggle         Patch
@quri/versioned-squiggle-components    Patch
vscode-squiggle                        Patch
@quri/squiggle-textmate-grammar        Patch


vercel bot commented Sep 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
quri-ui 🛑 Canceled (Inspect) Sep 30, 2024 2:38pm
squiggle-website ✅ Ready (Inspect) Visit Preview Sep 30, 2024 2:38pm
2 Skipped Deployments
Name Status Preview Updated (UTC)
quri-hub ⬜️ Ignored (Inspect) Visit Preview Sep 30, 2024 2:38pm
squiggle-components ⬜️ Ignored (Inspect) Visit Preview Sep 30, 2024 2:38pm

export function sampleN<T>(array: readonly T[], n: number, rng: PRNG): T[] {
  const size = Math.max(0, Math.floor(n));
  if (size === 0) return [];
  if (size >= array.length) return shuffle(array, rng);
Collaborator

Sampling 10 items from a list of 3 items returns just those 3 items (shuffled). Is this intentional?

I'd expect it to create duplicates, which would be consistent with how Dist.sampleN works.

Collaborator

Ok, I see that this defaults to sampling without replacement on lower N values, so it makes sense to keep it consistent.

It's a pity that it's inconsistent with how Dist.sampleN works, though. I assume it was intentional, and also the reason why you left the un-namespaced sampleN non-polymorphic?

Seems like a potential footgun...

How about this: we rename the version without replacement to pickN, and make sampleN consistent for lists and dists (with replacement)?
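
To make the two behaviours being discussed concrete, here is a minimal sketch of what the split could look like. The names `sampleNWithReplacement` and `pickN`, the `PRNG` type alias, and the seeded `mulberry32` helper are all assumptions for illustration; this is not the code in the PR.

```typescript
// Assumption: PRNG is a function returning a float in [0, 1), matching how rng() is called in the diff.
type PRNG = () => number;

// Hypothetical seeded generator (mulberry32 variant) so runs are reproducible.
function mulberry32(seed: number): PRNG {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// With replacement: every draw is independent, so duplicates are possible
// and n may exceed the list length.
function sampleNWithReplacement<T>(array: readonly T[], n: number, rng: PRNG): T[] {
  const size = Math.max(0, Math.floor(n));
  if (array.length === 0) return [];
  const result: T[] = [];
  for (let i = 0; i < size; i++) {
    result.push(array[Math.floor(rng() * array.length)]);
  }
  return result;
}

// Without replacement (hypothetical "pickN"): partial Fisher-Yates shuffle on a copy.
// The result never has more items than the input list.
function pickN<T>(array: readonly T[], n: number, rng: PRNG): T[] {
  const size = Math.min(array.length, Math.max(0, Math.floor(n)));
  const pool = array.slice();
  for (let i = 0; i < size; i++) {
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, size);
}
```

Under this sketch, asking for 10 items from a 3-item list yields 10 items (with repeats) from `sampleNWithReplacement` and 3 distinct items from `pickN`.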

Contributor Author

Good points. Agreed it's gnarly.
Another option is to add config options. Maybe something like:
sampleN(samples, {withReplacement: boolean, shuffle: boolean})
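
A rough sketch of how such an options argument might behave is below. Everything here is an assumption: the explicit `n` and `rng` parameters, the `SampleNOptions` name, both defaults, and the reading of `shuffle: false` as "return the chosen items in their original list order". The seeded `mulberry32` helper is also hypothetical and only there for reproducibility.

```typescript
// Assumption: PRNG returns a float in [0, 1).
type PRNG = () => number;

// Hypothetical seeded generator for reproducible runs.
function mulberry32(seed: number): PRNG {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical options object; both defaults are assumptions.
interface SampleNOptions {
  withReplacement?: boolean;
  shuffle?: boolean;
}

function sampleN<T>(
  array: readonly T[],
  n: number,
  rng: PRNG,
  options: SampleNOptions = {}
): T[] {
  const { withReplacement = true, shuffle = true } = options;
  const size = Math.max(0, Math.floor(n));
  if (array.length === 0 || size === 0) return [];

  let indices: number[];
  if (withReplacement) {
    // Independent draws: duplicates allowed, size may exceed array.length.
    indices = Array.from({ length: size }, () => Math.floor(rng() * array.length));
  } else {
    // Partial Fisher-Yates over the index range: distinct indices only.
    const pool = array.map((_, i) => i);
    const k = Math.min(size, pool.length);
    for (let i = 0; i < k; i++) {
      const j = i + Math.floor(rng() * (pool.length - i));
      [pool[i], pool[j]] = [pool[j], pool[i]];
    }
    indices = pool.slice(0, k);
  }

  // Assumed meaning of shuffle: false, keep the original list order.
  if (!shuffle) indices.sort((a, b) => a - b);
  return indices.map((i) => array[i]);
}
```

Note that a single options object does not by itself settle which default a polymorphic `sampleN` should use for lists versus dists.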

Collaborator

Yeah, options could work, but if we want to combine this with polymorphism on dists vs lists (which seems natural and good), and have different replacement defaults, then it's still problematic.

const index = Math.floor(rng() * array.length);
if (!indices.has(index)) {
  indices.add(index);
  result.push(array[index]);
Collaborator

This doesn't look optimal, but I tested it on List.make(n,1) -> List.sampleN(n-1), and even for n = 1M it takes only 14M loop iterations, so I guess it's fine.

(The asymptotic cost is probably something like O(n*log(n)).)
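
The loop count above lines up with a coupon-collector style estimate: if each iteration draws a uniform index and retries on a repeat, the expected number of iterations to collect k distinct indices out of n is the sum of n / (n - i) for i from 0 to k - 1. For k = n - 1 that is n * (H_n - 1), which grows like n * ln(n) and comes to roughly 13.4 million for n = 1M. The sketch below computes that estimate and counts iterations empirically; `expectedDraws`, `countDraws`, and the seeded `mulberry32` helper are hypothetical names, not part of the PR.

```typescript
// Assumption: PRNG returns a float in [0, 1).
type PRNG = () => number;

// Hypothetical seeded generator for reproducible runs.
function mulberry32(seed: number): PRNG {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Expected iterations of a retry-on-duplicate loop that collects
// k distinct indices from a list of length n (requires k <= n).
function expectedDraws(n: number, k: number): number {
  if (k > n) throw new RangeError("k must not exceed n");
  let total = 0;
  for (let i = 0; i < k; i++) {
    // With i indices already taken, a draw succeeds with probability (n - i) / n.
    total += n / (n - i);
  }
  return total;
}

// Empirical iteration count for the same retry-on-duplicate strategy.
function countDraws(n: number, k: number, rng: PRNG): number {
  if (k > n) throw new RangeError("k must not exceed n");
  const seen = new Set<number>();
  let draws = 0;
  while (seen.size < k) {
    draws++;
    seen.add(Math.floor(rng() * n));
  }
  return draws;
}
```

If the retries ever became a bottleneck, a partial Fisher-Yates shuffle would need exactly k iterations, at the cost of copying the list first.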

Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

List.sample() and List.sampleN() methods
2 participants