Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Array iterator classes private to hide implementation from client #193

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

pmetras
Copy link

@pmetras pmetras commented Jan 9, 2022

Proposal to make Array iterator classes ArrayKeys, ArrayValues and ArrayPairs private. Instead Array function keys, values and pairs will return general Iterator interface like other stdlib classes that implement theses functions. Hiding the implementation of these iterators follow good design principles.

A new interface Rewindable has been added for when an iterator can be rewinded.

These changes have been implemented in my repository of the Pony compiler and all tests pass.

EDIT: Rendered RFC: https://github.com/pmetras/rfcs/blob/rfc-array/text/0000-array-private-classes.md

@SeanTAllen
Copy link
Member

If this is discussed at a sync that I'm not present at before I have a chance to comment more fully, I would like it noted that I am not in favor of this change.

I want to note this note as I probably won't have time to put together a full write up within the next few days.

@jemc
Copy link
Member

jemc commented Jan 9, 2022

Apart from the naming suggestion I gave above, I am in favor of this change 👍

@pmetras
Copy link
Author

pmetras commented Jan 9, 2022

A bit of context for this RFC.

I'm presently learning Pony, reading all documentation and particularly the one of the stdlib. Looking at the Array class, I was surprised that the API was unveiling array iterators implementations, while traditional API design promotes encapsulation. Looking at other classes in the stdlib, I saw that there were other classes that make public their internal implementation while others did not.

 $ find . -type f -name '*.pony' -exec grep "fun values" {} \; -print
  fun values(): Iterator[this->ByteSeq box]
./packages/builtin/std_stream.pony
  fun values(): (Iterator[this->A] & Rewindable[this->A]) =>
./packages/builtin/array.pony
  fun values(): Iterator[this->A]^
./packages/builtin/read_seq.pony
  fun values(): StringBytes^ =>
./packages/builtin/string.pony
  fun values(): Iterator[this->A]^
./packages/builtin/seq.pony
  fun values(): ListValues[A, this->ListNode[A]]^ =>
./packages/collections/list.pony
  fun values(): MapValues[K, V, H, this->HashMap[K, V, H]]^ =>
./packages/collections/map.pony
  fun values(): Iterator[val->A]^ =>
  fun values(): Iterator[val->A]^ =>
./packages/collections/persistent/list.pony
  fun values(): Iterator[A]^ =>
./packages/collections/persistent/set.pony
  fun values(): SetValues[A, H, this->HashSet[A, H]]^ =>
./packages/collections/set.pony
  fun values(): Iterator[this->A]^ =>
./packages/collections/heap.pony
  fun values(): Iterator[this->String]^ => strings.values()
./packages/cli/command_spec.pony

I couldn't find the logic but the fact that some results concern interfaces and some concrete classes. When designing classes, exposing internals is often considered bad design. But it could be related to some Pony constraints that I don't know about yet. So I decided to ask the question on Zulip regarding the class Array. Sean T Allen listed his arguments but I did not understood them as they were neither in favour or against such API design decision. And he decided that he wouldn't answer my questions on the subject but if I open an RFC.

A few days passed and I was still thinking about that API decision, and because I think that there must be some rational in any decision, I thought that it could be because of a technical constraint imposed by the language. So I decided to test it by myself to see if there was something in Pony that prevented it. I did the changes in the Array class in the compiler and ran all the tests, and they passed without problem. So that was probably about something else I don't understand...

Here is the link to my branch of Pony compiler that implement the changes: https://github.com/pmetras/ponyc/tree/rfc-array

Following Sean T Allen's recommendation that the discussion should continue with a RFC, that's what I did.

One of Sean's argument is that naming things is important: "I personally find naming the thing that implements the interface to be a good thing. I have a thing I can refer to. A name that can be reused."

Thinking about this argument, from a type or object perspectives is not important, in my opinion, when client code call values or keys functions. The client only needs to know that she is getting an opaque object of an unknown type to her (that's the private class), that respect the Iterator* return type of the functions...
I thought that this could be different in the situation of using composition, instead of inheritance, to extend this opaque object. But first, that's the Array data structure that a client generally wants to extend and not ArrayValues, and secondly composition extends through redefinition of functions in the new types. So hidden classes still remain hidden. And like I've shown in the RFC example of code using ArrayValues, client have no reasons to use these types.

All my attempts to find a rational failed. I must say I don't understand Sean's argument. I agree that naming thing for public types are important, but I don't see why you need to name things that clients of your class don't need to know (private implementation classes). That's against information hiding/encapsulation. I don't think that Pony syntax or stdlib promotes not enforcing this design principle.

@pmetras
Copy link
Author

pmetras commented Jan 9, 2022

And I forgot to say that Array is only one situation where this occurs. If one would want to simplify the stdlib API, this change could be applied in other places like List, Map, Set, etc.

@ergl
Copy link
Member

ergl commented Jan 11, 2022

I don't have a strong opinion on this RFC, but if Array already returned an Iterator interface, I'd probably vote against an RFC that wanted to change them to return concrete types as it does now.

My only concern about this RFC is adding the overhead of virtual calls for these iterators. I assume iterating over arrays is very common, so these specific classes are used a lot, even if users don't name them.

@jemc
Copy link
Member

jemc commented Jan 11, 2022

@ergl - I had the same concern about virtual calls, though I think in practice most iterator-using use cases are direct enough (e.g. passing straight into a for loop) that LLVM "should" be able to inline enough of the code to devirtualize the calls by seeing the concrete type descriptor being written in the same context where the otherwise-virtual calls are being made.

This is a hypothesis we should probably test prior to accepting the RFC. I can do some testing of that if it looks like the RFC has enough support to be accepted.

@jemc
Copy link
Member

jemc commented Jan 11, 2022

@SeanTAllen mentioned in the sync call that he'd like to see one of two things happen with this RFC:

  • 1️⃣ Update the RFC to apply to state that it will apply to all collection iterators in the standard library
  • 2️⃣ Update the reasoning provided in the RFC, to explain why it will apply only to builtin and not set a precedent that should apply to other packages in the standard library

He wants to hone in on one of those two directions so that he can better focus his energy in stating his objections, either to the general case (:one:) or to the builtin-specific reasoning (:two:)

I think this a reasonable request, and agree.

@pmetras
Copy link
Author

pmetras commented Jan 12, 2022

My initial concern was only to get an answer to the question in the title of this RFP that I could understand. It could have been "It's that way because I decided that it is", without any later discussion... But I was intrigued by the fact that the design decision is against the Sanctus encapsulation/information hiding principle of computer science. And I decided to open this RFC to get a definite answer.

Now, if I had to select between direction 1️⃣ or 2️⃣, I think that applying a design principle only on part of a library would be a bad decision by creating an exception (the less exceptions, the better the language). The remaining paths are either correct everything ( 2️⃣) or do nothing (cancel and close the RFC).

Personally, I think that 2️⃣ is the best move, but I don't know yet what could be the impact on existing code or other consequences of such change. But I'm ready to investigate the impacts and update the text of the RFC in that direction if others think it's worth my effort.

@jemc
Copy link
Member

jemc commented Jan 18, 2022

@pmetras - I'm confused by your use of numbering - can you please check the numbers in the above comment I gave to make sure they match the use of your numbers - I think you meant 1️⃣ when you said 2️⃣ - can you confirm?

We discussed this on today's sync call, and Sean suggested you go ahead and try to update the RFC to make your arguments in favor of the approach for either 1️⃣ or 2️⃣ - that's not a guarantee the RFC will be accepted but it's to ensure he (and others) fully understand the points being argued so that he (and others) can voice any relevant objections clearly.

@pmetras
Copy link
Author

pmetras commented Jan 18, 2022

I'm confused by your use of numbering - can you please check the numbers in the above comment I gave to make sure they match the use of your numbers - I think you meant when you said - can you confirm?

Effectively, I mixed the numbers. I meant [one], that is applying the design change to all stdlib.

@SeanTAllen
Copy link
Member

This is very hard to read in GitHub due to lack of line breaks. Can you add hard line breaks around column 80 or so to make it easier to read in the GitHub web UI?

@SeanTAllen
Copy link
Member

I think this RFC doesn't consider the cost of breaking user code enough. Even if I was to be convinced of the value of this change, I think we need to give more weight to considering that this will break people's code and a good justification beyond "design principles" needs to be given for breaking code.

Regardless of how I feel about this specific RFC, I would make the same point for any RFC. I would like to see the breaking change taking more seriously and consideration given to justifying that breaking change.

RFCs set precedent for future changes and the breaking change in here feels somewhat cavalierly presented.

Note: To remain consistent with `Array` behaviour, functions `keys` and `pairs` should return a `RewindableIterator` too but we limited the API change to minimum as we did not understood why it was not already the case.

```pony
interface RewindableIterator[A] is Iterator[A]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for is Iterator[A] as far as I can tell here. Is this an attempt to make sure in the future that RewindableIterator doesn't diverge at all from Iterator?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true for interfacebut I just kept with the coding style of the stdlib where numerous classes (i.e. Array, String, List, Range, Reverse, HashSet, etc.) explicitly name the interface they implement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the case of those, it made sure that if the Iterator interface was changed that all those concrete classes would fail compilation, is that the intention here? If yes, I think there is a larger discussion about the idea of duplicating the methods from Iterator in RewindableIterator and then using is to enforce the Iterator interface.

It feels odd. It might be worth it to avoid having an interface that is only rewind and the returning the two interfaces, but, it does feel odd and I think the reasoning is worth discussion.

I know you had some conversation around this with @jemc. I think you should discuss with him some supporting justification for this. Is this a one off? Is this establishing a pattern that we want to use in the standard library? Is it a pattern that is already being used in the standard library?

Copy link
Member

@jemc jemc Jan 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion would be to keep the is Iterator[A] part and remove the explicit next and has_next definitions.

For an interface, that roughly reads as:

  • "To implement RewindableIterator[A], you need to implement everything Iterator[A] requires (which doesn't need to be repeated here), but you additionally need to implement this rewind method."

Or from the perspective of the interface user:

  • "If you get a RewindableIterator[A], it can do everything Iterator[A] can do (which doesn't need to be repeated here), but you additionally can use it to rewind"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted below by me, I prefer the alternative design of not having RewindableIterator[A] overlap with Iterator[A] and instead have Iterator[A] & Rewindable[A] define RewindableIterator[A].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what is finally desired here. What would be the RewindableIterator (or CollectionIterator) definition so that I can include it in the RFC text?

@SeanTAllen
Copy link
Member

@pmetras thank you for the linebreaks. its much easier to read now.

Comment on lines 52 to 57
- The interface `RewindableIterator` is added to create rewindable iterators
(can be re-start from first value).

This change remains compatible with the existing code base but for client code
that is directly using the classes `*Keys`, `*Values` and `*Pairs`. A search on
Github shows that the impact is very limited.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be motivation and should be worked into design as these are notes and statements about a some specific design points.

@ergl
Copy link
Member

ergl commented Feb 10, 2022

@pmetras This part:

I don't see a major advantage in creating a new type of iterator for collections only, when both iterators are builtin. Iterator could be updated to define rewind with a default implementation, and neither client's nor implementer's code won't be impacted at all...

would not work for iterators that define a streaming sequence. You could, potentially, define an iterator over an infinite stream of events that can't possibly be rewinded, unless you kept all previous elements in memory. We don't want to constraint future uses of the Iterator interface so that all implementations have to be rewindable.

@pmetras
Copy link
Author

pmetras commented Feb 10, 2022

@ergl , perhaps I did not express correctly my concern. The important word is for collection only. I have previously suggested naming it RewindableIterator or better, in my opinion, having a Rewindable interface.

If Iterator is defined as an interface with a default implementation of rewind that raises an error or does nothing or create a new iterator (behaviour to be defined later), this type could be used even by streams or infinite sequenses that can't be rewound. For instance:

interface Iterator[A}
  fun has_next(): Bool
  fun next(): A ?
  fun rewind() ? => error
    """By default, iterators can't be reset"""

I don't think that such interface is the best solution, but at least it prevents having two types with one that is difficult to limit to collections usage.

@SeanTAllen
Copy link
Member

@pmetras This RFC should not modify Iterator[A]. No one is suggesting anything that would result in a change to Iterator.

@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Feb 15, 2022
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Feb 24, 2022
@pmetras
Copy link
Author

pmetras commented Feb 24, 2022

I've added the rendered RFC at https://pmetras.github.io/rfcs/ and updated the text.

Now, I've just to find some time to see how to implement it and measure the impact.

@rhagenson
Copy link
Member

rhagenson commented Feb 24, 2022

@pmetras I can only speak for myself, but I am not going to accept an RFC which breaks from our RFC procedure. There is no need to add HTML commits as GitHub will render the RFC's Markdown directly. You can link to that rendering like I did in #190

@pmetras
Copy link
Author

pmetras commented Feb 24, 2022

@rhagenson , please explain how to have it rendered and how to get a rendered URL. I wasn't able to find it automatically rendered as I wanted to be able to read it and I would have save a few hours today. So I used pandoc and followed Github documentation...

@rhagenson
Copy link
Member

@pmetras Thank you! Will you update your first comment with the link to the GitHub rendered Markdown?

Copy link
Member

@rhagenson rhagenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left a large number of comments. My overall impression is still that I am in favor of the simpler return types, but there are issues that need to be addressed with respect to performance impact and ensuring the motivation for these changes are not setting a precedent for hiding all implementation details based on a notion of strict information hiding.

* `Vec`
* persistent `Vec`
* `Set`
* `Itertools`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

itertools seems out of place to me here. I see no internals of itertools being exposed. Iter is the only public type in itertools.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this below. There are references to ArrayValues within itertools which would need to change. Having itertools in this list is still out of place though as this list is for "other collection classes that expose internals too" which itertools is neither "[an]other collection class" nor does it "expose internals too".

Comment on lines +66 to +69
To quote Antoine de Saint-Exupéry:

> Perfection is achieved, not when there is nothing more to add, but when there
> is nothing left to take away.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To quote from the Pony Philosophy:

Performance

Runtime speed is more important than everything except correctness. If performance must be sacrificed for correctness, try to come up with a new way to do things. The faster the program can get stuff done, the better. This is more important than anything except a correct result.

Simplicity

Simplicity can be sacrificed for performance. It is more important for the interface to be simple than the implementation. The faster the programmer can get stuff done, the better. It’s ok to make things a bit harder on the programmer to improve performance, but it’s more important to make things easier on the programmer than it is to make things easier on the language/runtime.

If making this change to abstract types has a performance impact than it is not inline with our driving philosophy which places performance at a higher priority than simplicity.

Comment on lines +71 to +76
Currently the API makes a guarantee that the return value of each of these functions
is a specific concrete class with a specific public name. That particular guarantee
does not actually serve any practical needs of users (except, arguably, the virtual
calls caveat that was mentioned on [Github discussion](https://github.com/ponylang/rfcs/pull/193#issuecomment-1009733064)). If it is not helpful to users in practice,
it can be removed. And if it can be removed, then it should be removed, in the
name of taking another step toward stabilizing the Pony standard library's API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This linked issue comment is in line with my previous review comment concerning performance taking priority over simplicity in the Pony Philosophy. I would like to see some evidence that the compiler as it is today devirtualize the calls and as such we get both performance (no worse than today) and simplicity (improved by this RFC).

Comment on lines +85 to +92
It'll always be easier to get more specific with these return types than more
general. So we should default to offering the most general return type that is
still useful to users, and ratchet down to more specific subtypes as needed (as
tangible use cases for more specific return types are discovered). Ratcheting down
will always be possible without a breaking change, but ratcheting up will be a
breaking change each time. It's best to make such "ratchet up" breaking changes
like this RFC before we reach Pony 1.0.0 as part of a general effort to make Pony
and its standard library stable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not agree with this as motivation. It is always easier to start with many concrete implementations/returns which are then used to derive a generic implementation/return and trying to be generic before concrete presents more room for instability.


# Detailed design

Iterating functions in collections `keys`, `values` and `pairs` are changed to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your example they return a CollectionIterator, not an Iterator.


# Alternatives

1. Stay as is. Continue the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we continue the conversation of keeping this as is? If no change is needed then no conversation is needed? Are you suggesting other changes which should be pulled into this RFC from Zulip?

| a. Simpler stdlib API | --- | --- | + | +++ | +++ |
| b. Simpler function signatures | --- | --- | + | +++ | +++ |
| c. Stay compatible with existing stdlib | +++ | +++ | + | +++ | + |
| d. Evolutive | --- | + | +++ | +++ | + |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes for the sake of changes is not evolution -- there must be positive or negative selection which lead microevolution into macroevolution. Put another way, if all the suggested changes have negative effects than it is still "evolution" to do nothing as we do not bring those negative effects forward with us.

Side note: I am a career "full stack biologist" who has done work ranging from cancer to conservation to COVID-19 -- Pony is the project that I volunteer my time with when not working on the "evolving nature" of biological problems.

| c. Stay compatible with existing stdlib | +++ | +++ | + | +++ | + |
| d. Evolutive | --- | + | +++ | +++ | + |
| e. No impact on compiler | +++ | +++ | - | +++ | - |
| f. No impact on performance | +++ | +++ | +++ | +++ | +++ |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you benchmarked these changes to ensure no performance impact?

| e. No impact on compiler | +++ | +++ | - | +++ | - |
| f. No impact on performance | +++ | +++ | +++ | +++ | +++ |
| g. Limit stdlib pollution with interfaces | +++ | +++ | + | + | +++ |
| g. Limit stdlib pollution with classes | --- | --- | +++ | ++ | +++ |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is "g" repeated?

Comment on lines +359 to +366
# Unresolved questions

- Must analyze how `Range` and `Reverse` would be impacted if defined as
`CollectionIterator`. Particularly, impact on existing client code.
- Possible candidates to be analyzed: `StringBytes`, `StringRunes`, `ByteSeqIter`
and probably others will be found in stdlib code.
- Understand why the intersection type proposal is not considered as it seems to
provide a better flexibility and is easier to understand.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these are questions.

Comment on lines +12 to +17
Following information hiding design principle, the builtin classes of collection
data structures must not be made visible through iterator functions like `ArrayKeys`,
`ArrayValues` and `ArrayPairs`. These classes can be made private as they are
only used as return types for `Array` functions `keys`, `values` and `pairs`.
The return values for these functions are changed to the more general interface
`Iterator`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without "Following information hiding design principle"

being removed. I can't support this RFC. There's no "design principle" that I want to codify into Pony via RFC as a precedent for making changes.

Comment on lines +22 to +23
This design principle is applied to the other collection classes that expose
internals too like:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no, I am not in favor of anything in here that I am in favor of based on an abstract design principle. I can't support this language being included.

@SeanTAllen
Copy link
Member

I've repeated numerous times that I am against setting precedent based on a design principle. I didn't read the entire new version of this RFC. I read up to the 3rd mention of this being done because of a design principle and stopped.

I'm against this RFC and at this point, I'm not going to invest more time in it. I can't support the motivation given, do not support it being used to set precedent going forward and have stated that more than once. I have many other Pony related things to do and so I am voicing my opposition to this RFC. I would welcome an RFC that meets the reasoning stated here: #193 (comment).

As is, it seems clear to me this RFC is not going to be written taking that motivation into account.

@pmetras
Copy link
Author

pmetras commented Mar 5, 2022

I won't update this text anymore. It seems my English vocabulary and wording are not adequate to describe the objective of the RFC. Anybody who wants to support this proposal can take over and update this text.

@rhagenson
Copy link
Member

@pmetras Thank you for the work you have put into this RFC.

I am in support of the core of the RFC so I am still hoping to see it through to the end. However, I am already working on another RFC (#200) which I expect will draw most of my attention. Is there anybody else in the Pony community with the bandwidth needed to take over this RFC?

@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Mar 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants