Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collections again #185

Open
apblack opened this issue Dec 31, 2019 · 7 comments
Open

Collections again #185

apblack opened this issue Dec 31, 2019 · 7 comments

Comments

@apblack
Copy link
Contributor

apblack commented Dec 31, 2019

I just closed #104 because it was old, and the particular issues discussed there have been resolved. I've recently run up against some new ones.

Recently, I was wanting to extract an arbitrary element from a set, and could not see how. So I added anyone to sets. Then, later, I found this specification in type Collection⟦T⟧

first -> T
    // The first element of self; raises BoundsError if there is none.  
    // If self is unordered, then first returns an arbitrary element. 

I had forgotten about this. This question I want to raise is: which is preferable?

  1. Having first on unordered collections return an arbitrary element, and on collections with numeric keys be equivalent to at 1
  2. Having anyone work on any collection, and return an arbitrary element, and have first work only on collections with numeric keys, where is is always equivalent to at 1

I also noticed that find(predicate) ... and includes(predicate), which look for an element that satisfies predicate and either return it (find) or tell us whether or not the search was successful (includes), existed on sets but not other collections. This seems wrong; there is no reason not to implement them on all collections, and indeed the same code would work for all. But I have trouble remembering the distinction between contains and includes (perhaps because Smalltalk uses includes for what Grace calls contains). So I'm suggesting

anySatisfies(predicate: Function1⟦T,Boolean⟧) -> Boolean
    // true if predicate holds for any of the elements of self

allSatisfy(predicate: Function1⟦T,Boolean⟧) -> Boolean
    // true if predicate holds for all of the elements of self

where anySatisfies is the new name for includes, and allSatisfy is a new method.

The tension here is between putting more stuff into collections, and thus making the interface better for sophisticated programmers, but harder for novices to wrap their minds around, or leaving stuff out, and thus providing opportunities for student assignments that implement methods like anySatisfies and includes.

We can have the best of both worlds, by creating a beginning dialect that exposes simpleCollections, for example. But then someone has to maintain them.

@kjx
Copy link
Contributor

kjx commented Jan 1, 2020

leaving stuff out, and this providing opportunities for student assignments that implement methods like anySatisfies and includes.

In a world in which Grace is open source, popular, widely used, then the source code is available anyway, and answers to every assignment question will be on stackoverflow

harder for novices to wrap their minds around

to me this is the most important reason not to add things in.

Although, again, perhaps it matters less than we think, if documentation can present the core methods first, without requiring novices to learn the entire lot at once. If we had autocompletion (static types? statistics?) then we could always offer the core methods first. Still: we have dialects, be a shame not to use them.

@KimBruce
Copy link
Contributor

KimBruce commented Jan 4, 2020

Is there any reason why we don't just use iterators for all this? Each collection class has an iterator and there is a filter method as well. Having first for an unordered collection seems wrong (even though it is easily implemented from the iterator)!

anySatisfies is just filter with a check for empty. allSatisfies can also easily be implemented with iterators.

One thing we should worry about is libraries that are too complicated for novices. I found in teaching Java-based data structures that it was nearly impossible to avoid talking about wild card types (though I hated them!). It should be easy to write a simple, easy-to-maintain dialect that restricts what can be invoked from collections and hence solves that problem.

@apblack
Copy link
Contributor Author

apblack commented Jan 5, 2020

As for first not being appropriate for unordered collections, I agree: it seems wrong.

And yes, all of these things can be implemented with iterations. Indeed, they are so implemented. What it comes down to is: should we have just one collection module, or two, and if two, should they be “collections” and “beginningCollections”, or “advancedCollections” and “collections”. And which should be part of “standard”?

@kjx
Copy link
Contributor

kjx commented Jan 6, 2020

Is 'any' or 'random' an option rather than 'first'?
The real problem with 'first' isn't 'first' - it's that it implies there should be a 'second'.

@kjx
Copy link
Contributor

kjx commented Jan 6, 2020

There's a difference I think between implementing things with iterators (C++ STL perhaps), vs making iterator-like things a first-class part of the library design (e.g. Java Streams). The random example here would give a stream that sampled a collection presumably without replacement.

@kjx
Copy link
Contributor

kjx commented Jan 6, 2020

I'm not sure it matters whether we have one or two or even three collection modules.
Either or both can be part of the standard.

Given an Advanced Collections library; it should just be a Small Matter of Programming to use it to implement a much simpler beginner's library - wrappers are probably easiest.

Given a basic collections library, it will be relatively easy to implement new kinds of collections;
implementing more behaviour / algorithms into every existing collection will be tricker unless that's done as external methods, although with somewhat sneaky modularisation or inheritance / traits even internal extensions shouldn't be too hard. (Either the collections inherit from parallel empty traits in a "collections-extension-hook" module that can be replaced by programmers, or an extended collections module can import the individual leaf collections and extend each one. Supporting something like family polymorphism gets more of this for free, but will most likely break the manifest rule - although again, it may be possible with careful design of replaceable modules).

What's most important is that we actually have a collections library (yay Andrew);
that it's clearly described; ideally portable;
and that the interface to the underlying VM is also clearly described, portable, and ideally minimalist.
Kernan has an interface to a primitive mutable collection (think Java array) via a "platform/memory" module which is a factory for the primitive collection objects that understand only size, at(_), at(_)put(_) - user level collections are implemented in terms of those primitive objects...

(this is another reason why I don't much like the [1,2,3] notation because that's where the circularity comes in --- unless again [1,2,3] is always re-written to something like literalHook.newSequence(1,2,3) (doesn't have to be that verbose) where literalHook is presumably a small module with factories for sequences (and probably ranges too) but which can be replaced where necessary so that [1,2,3] notation can work with any particular library. Or something.

@apblack
Copy link
Contributor Author

apblack commented Dec 11, 2020

Nice observation, @kjx, that if we once have a complete collections bundle, then creating a restricted bundle is a simple application of excludes. This is much better than two independent source pools, or extension methods.

@apblack apblack changed the title Collections agains Collections again Nov 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants