Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update test262 +test262.properties file (re)generator #930

Merged
merged 17 commits into from
Jun 23, 2021

Conversation

p-bakker
Copy link
Collaborator

@p-bakker p-bakker commented Jun 16, 2021

As we now have Template Literal support, I wanted to update the version of the test2626 TestSuite we run against to the latest commit in master.

As I didn't want to manually update test262.properties accordingly, I set out to write some code to generate test262.properties from the results of running the TestSuite at the standard optLevels (-1, 0 and 9).

In order to make things easier, I changed the format a bit:

  • removed the ! in front of every .js file: if the file is listed in test262.properties, it's already an exclusion, so the ! is unneeded
    This step makes the directory/file path a key in the Java Properties file format, instead of a commented line. As a result, it's now possible to add comments after the path and if your editor supports the Java Properties file format, you get nice colors
  • changed from # to ~ for excluding a directory
  • support for comments after the folder/.js file name

I've also changed the behavior a bit:

  • removed the logic that allowed excluding a similarly named file in multiple subfolders, by excluding the parent path
    While excluding this logic made it easier to (re)generate the properties file, I think it was also very confusing behavior to begin with
  • on .js files that only fail in some strict-mode/optLevel combinations, details which variations fail is added as the first comment:
    • strict: fails only in strict mode on all optLevels
    • non-strict: fails only in non-strict mode on all optLevels
    • non-interpreted: always passes in interpreted mode, but fails on all other optLevels
    • {unsupported: [...]}: skipped due to depending on unsupported features
    • {strict: [1,9], non-strict: [9]}: any other scenario not matching the ones above
  • if all tests fail for a subfolder (i.e. built-in/Array/prototype/xxxx), the .js files can be omitted and replaced by an * in front of the subfolder
  • all tests that fail due to depending on unsupported features are included (by default), in order to:
    • have a full picture
    • easier to comprehend the delta between the old and new .properties file when regenerating it after updating the version of the test262 Test Suite
  • on (rolled-up) folder entries stats are included as the first part in the comment # of failed tests vs. the total # of tests

Check out the updated ./testsrc/README.md for more details, but the basic gist of regenerating the test262.properties file is ./gradlew test --tests org.mozilla.javascript.tests.Test262SuiteTest --rerun-tasks -DupdateTest262properties

The only downside of regenerating the test262.properties file is that any lines that are just comments are lost. To me, that is a reasonable tradeoff though

If the changes in this PR seem overwhelming (especially comparing test262.properties against master): I did things in steps (1 through 6), see the individual commits

@rbri I hope you don't mind but I took your work in #910 as the basis of my work. I think this PR makes yours obsolete, as it adds all the relevant info directly as comments on the individual test cases in the test262.properties file.

When attempting to update to the latest commit in the main branch of test262 I discovered that since September last year, the test262 harness utilizes rest parameters, so unfortunately wasn't able to update all the way to the latest commit on their main branch.

Some open questions:

  • when (re)generating the test262.properties file, there are a couple of flags that affect the output (see readme). The default is to do rollup, include stats and include failing tests due to unsupported features. To me this makes the most sense, but the default (thus what we should also commit) could be changed
  • the stats show the % failed test. Could change that to show the % passed. Dunno which number makes us happier :-)

Next steps (can be a future PR(s)):

  • add rest parameter support to Rhino, so we can attempt to update the version of test262 we test against to the latest commit on their main branch
  • move excluded features list out of Test262TestSuite and into test262.properties
  • move excluded flags list out of Test262TestSuite and into test262.properties
  • when (re)generating the .properties file, include all test262/test// folders in the newly generated .properties file (so new folders added in the test262 repo will automatically be included)

Hope you guys see this as a good step forwards.

closes #817

@rbri
Copy link
Collaborator

rbri commented Jun 16, 2021

removed the logic that allowed excluding a similarly named file in multiple subfolders, by excluding the parent path

Great

ll tests that fail due to depending on unsupported features are included (by default), in order to:

Yes you made my day

Hope you guys see this as a good step forwards.

From my point of view this is more than one good step forward - Thanks a lot

@gbrail
Copy link
Collaborator

gbrail commented Jun 16, 2021

This looks very promising. Can we talk for a second about how we'll use it going forward?

Does the logic to regenerate test262.properties rewrite the whole file, or does it preserve what's there currently? In particular, I see that you added a lot of helpful metadata to test262.properties about why a particular test is failing. Does all that get thrown away when the file is regenerated?

If so, I'd suggest that we'll really only use the generator once, to build the files for this PR, and then after that we'll edit the file manually in individual PRs like we do today.

@gbrail
Copy link
Collaborator

gbrail commented Jun 16, 2021

Second question -- can we use a transpiler to make the test262 files run for us even though they use rest parameters?

Copy link
Collaborator

@gbrail gbrail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have a few questions about the syntax that I put in with the changes to the properties file.

S15.1.3.1_A2.4_T1.js
S15.1.3.1_A5.2.js
```
Folders can be prefixed with a `~` to mark the folder to be skipped entirely, for example because it contains all tests for a feature not yet supported by Rhino or because running the tests take a long time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tilde marks the whole folder to be skipped, doesn't that imply that all the files in it would also be skipped?

In other words, in the example above, wouldn't the same behavior result if the three individual file names (name.js and so on) were not in the file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but built-ins/decodeURI is excluded because of performance reasons. The three files were failing at the time when the entire folder was excluded (or so I assumed). And I decided we do not want to loose this info (as it was there in the file before I made changes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean on regeneration if the directory is marked as skipped, then it retains the files listed below it?

Copy link
Collaborator Author

@p-bakker p-bakker Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup: marking a folder to be skipped using the ~ is a manual action. If you as a dev decide to not remove the failing files of that folder, then when regenerating the file, it'll copy them over

As I mentioned in the description of the PR and the README: the only things lost when regenerating the file are lines that just contain comments (first non-whitespace character on the line being either an ! or a #)

```
If all files in a (sub) folder are expected to fail, instead of listing all files explicitly, the folder can be marked as expected to contain only failing tests by prefixing the folder path with an `*`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A syntax that folks might understand better would be "prototype/flatMap/*". Or even use "**" to indicate multiple levels of directory. Is that a big change -- it might require creating a regex.

Copy link
Collaborator Author

@p-bakker p-bakker Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly easy to change and I had thought about it, but I chose to go with a prefix instead of a suffix because if you glance over the file, the prefix stands out more than a suffix would.

But if people feel a suffix makes more logical sense, I can change it, np

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how much weight I have, but I vote to leave it as a prefix for the reason mentioned.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for suffix

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess @gbrail will be the deciding vote?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, but I guess the question still holds. Is there a way to skip a subset of a folder? I have no idea if this is really necessary, and it may not be.

Do you even need a symbol to mark a subfolder as failing? The files don't have one, and the subfolder is listed in the same way, correct? Does the symbol in this case only serve to identify it as a folder instead of a file?

Copy link
Collaborator Author

@p-bakker p-bakker Jun 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you even need a symbol to mark a subfolder as failing? The files don't have one, and the subfolder is listed in the same way, correct? Does the symbol in this case only serve to identify it as a folder instead of a file?

True, didn't think of that. So we have the following options to choose from then:

! prefix (current behavior)
built-ins/Array
    *prototype/flatMap 21/21 (100.0%)
! nothing
built-ins/Array
    prototype/flatMap 21/21 (100.0%)
! / suffix
built-ins/Array
    prototype/flatMap/ 21/21 (100.0%)
! /* suffix
built-ins/Array
    prototype/flatMap/* 21/21 (100.0%)
! /** suffix
built-ins/Array
    prototype/flatMap/** 21/21 (100.0%)

My mistake, but I guess the question still holds. Is there a way to skip a subset of a folder? I have no idea if this is really necessary, and it may not be.

ATM there are no subfolders or .js files that are being skipped in test262.properties, only what I call topLevel folders (direct children of test262/test/).
Had a look and concluded that attempting to skip subfolders or individual .js files under those would currently cause issues, both in running the test suite and in regenerating the .properties file.
So for now I'll ahead and make sure skipping folders is only supported for topLevel folders. If the need arises in the future to be able to skip on a lower level, that could be added.

Note that most subfolders that only contain failing tests are related to unsupported features, which, if properly tagged as such in test262 and properly excluded by us (we have a hardcoded list of unsupported features), they will be skipped anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just go with the !nothing option. A prefix is not necessary. There is already a de facto "suffix" because you can tell by the absence of .js at the end that it is a folder, plus we don't have trailing slashes for the top level folders.

Will the generator create these folder exceptions instead of listing each file individually, or is this only a manual action?

Copy link
Collaborator Author

@p-bakker p-bakker Jun 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the 'nothing' if everyone else agrees with that as well

If the generator concludes that all files in a folder (that is not marked to be skipped) fail, it'll just include the folder and not the individual files (unless you not enable 'rollup' through the command line (by default 'rollup' is enabled))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trailing / is also an option for me, maybe a bit more readable.

Adding a trailing / to the topLevel folder entries for consistency wouldn't be difficult

built-ins/Array <- include all the tests in the test262/built-ins/Array folder
from/calling-from-valid-1-noStrict.js <- but expect that this tests fails
```
If `built-ins/Array/from/calling-from-valid-1-noStrict.js` indeed fails, it wont fail the test suite. If it passes, this will be logged while running the test suite
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about the default behavior. I understand that if "built-ins/Array" is included, then we run all the files in that directory except for the ones called out below. But if "built-ins/Array" is not included in the file, then do we run those tests anyway? The PR comments seemed to indicate that we do.

Copy link
Collaborator Author

@p-bakker p-bakker Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the PR comments are then misleading: both before my changes and now with my changes, only testfiles from the test262/test/ subfolders listed in the test262.property at the top level (i.e. built-ins/Array or language/expressions/function) will be included.

At the bottom of my PR description I did mention that in a future task we should include ALL test262/test subfolders into the test262.properties file when regenerating it, so we dont miss out on newly added folders in the test262 project

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that "nothing" is OK. If there's no .js suffix, then we can assume that it's a prefix, and if there is a .js suffix, then we can assume that it's a fail. I don't feel strongly, but if you ask me, that makes the most sense and is the simplest.

@p-bakker
Copy link
Collaborator Author

p-bakker commented Jun 16, 2021

Does the logic to regenerate test262.properties rewrite the whole file, or does it preserve what's there currently? In particular, I see that you added a lot of helpful metadata to test262.properties about why a particular test is failing. Does all that get thrown away when the file is regenerated?

It rewrites the whole file, but looks at the existing file for user/dev provided comments to carry over. All the other stuff I added (x/y (zzz%), strict/non-strict/non-interpreted/{unsupported: [...]}/{strict: [...], non-strict: [...]} is all regenerated based on the actual test results when regenerating the file

If you check out my branch and run the command to update test262.properties, the regenerated file is identical to what it was before. If you fix some code so tests stop failing, their entry would be removed on regeneration. If you make some changes to Rhino code that result in more broken tests, the broken tests would get added as exclusions on regeneration.

Basically: if you regenerate test262.properties and run the tests afterwards, the Test Suite will pass without warnings

If so, I'd suggest that we'll really only use the generator once, to build the files for this PR, and then after that we'll edit the file manually in individual PRs like we do today.

That is not my idea: while you can manually edit the test262.properties file, you're better off generating it (guess it depends on the # of edits). Benefit of regenerating the file everytime is that the stats are up to date and if you update the version of the test262 suite, the delta between the old and regenerated version is just the delta due to the changes in test262

can we use a transpiler to make the test262 files run for us even though they use rest parameters?

While I'm sure we can, I think we're better off adding rest parameter support :-) Semantics aren't that complicated, just need to figure out where to hook it in, see #817 (comment)

Tnx for the questions @gbrail

@p-bakker
Copy link
Collaborator Author

Got rest params going :-)

Only tests that fail are on functions using non-simple params (rest/spread/destructuring) not allowed to have 'use strict' in the function body. The current destructuring parameters implementation seems to also fail those tests. Will see what can be done about those.

So, if we can wrap up this PR, I can create a PR to add Rest Param support and update test262 to the last commit on their main branch hopefully (haven't tested if there's another feature we're missing that prevents us from doing so)

@p-bakker
Copy link
Collaborator Author

p-bakker commented Jun 18, 2021

In response to the whole prefix/suffix discussion: while trying to implement the nothing option today, I concluded there are some challenges with the nothing option:

Example:

built-ins/Map 7/145 (4.83%) <- topLevel folder
    Symbol.species 4/4 (100.0%) <- subfolder of the built-ins/Map topLevel folder
    iterable-calls-set.js
    iterator-is-undefined-throws.js
    proto-from-ctor-realm.js {unsupported: [Reflect, cross-realm]}

Both built-ins/Map and Symbol.species refer to folders on disk, but their meaning within test262.properties is different:

  • built-ins/Map refers to a topLevel folder and means: include all tests in built-ins/Map and all of its subfolders
  • Symbol.species refers to a subfolder of built-ins/Map and means: all tests in built-ins/Map/Symbol.species/** fail

When programatically reading the test262.properties file, the code needs to determine whether a line containing a folder path is a topLevel folder or a subfolder within a topLevel folder. If not using any prefix or suffix to distinguish the two, we need another means to determine this. I see the following options:

  • whitespace: could assume a line with a folder path that has no leading whitespace is a topLevel folder, otherwise a subfolder
  • hardcoded topLevel folder names: test262 (currently) comes with 5 topLevel folders in the /test directory (annexB, built-ins, harness, intl402 and language). Could assume that if a folder path starts with any of these 5 names, it must be a topLevel folder
  • folder existence testing: when encountering a line with a folder path, could test whether that path exists as a subfolder of the current topLevel folder being processed. If it doesn't exist, assume a topLevel folder

Personally, I prefer the first option (whitespace testing) as:

  • depending on hardcoded topLevel folder names might cause issues if subfolders with the same name occur
  • folder existence testing will reach the wrong conclusions if a new version of test262 has removed the subfolder

Thoughts?

If we choose one of these options (instead of using a prefix/suffix) it implies that it will not be possible to roll up all files in a topLevel folder if all tests in a topLevel folder fail (but we can mark the topLevel folder to be skipped using the ~), as a line with a topLevel folder path means 'include all tests within this folder'.

Eventually I want to read all folders from disk when regenerating the test262.properties file and include then in the regenerated file, so we don't 'forget' to include new folders being added to test262. I guess we could decide to not add new topLevel folders to test if all tests in them fail in that case or include them as excluded

@tonygermano
Copy link
Contributor

I actually just assumed from the way the file was laid out that whitespace had meaning.

Don't kill me, but is subfolder rollup really needed? What's the problem with listing each file individually, especially if it's going to be auto-generated anyway? It's probably simpler to just do files.

Referencing your example above, what should be the appropriate action if an entire subfolder is excluded and commented and then a code change allows some of the files in the directory to pass? When the properties file is regenerated the directory needs to be replaced with the remaining failing tests, but do they inherit or ignore the directory comment?

All that being said, whatever you decide on at this point I will probably find acceptable.

@p-bakker
Copy link
Collaborator Author

p-bakker commented Jun 19, 2021

Think the roll-up is handy (gives you the likely insight that a certain method/feature isn't implemented). If you just see a list of files, you won't know if it's all or a subset of the files in the folder

A comment on a rolled up directory would be lost if some of the tests in the directory starts passing and the file is regenerated, as the individual remaining failing tests would be listed, instead of the rolled up folder

Will implement the whitespace option next week, unless I hear from anyone who thinks I should not 🙂

p-bakker added 14 commits June 21, 2021 20:15
Cleaning up existing test262.properties file for larger changes to come,
removing wildcard matching from the 262Test runner

- Removed wildcard matching: now need to be explicit: 'name.js' wont
match any file in any subdirectory that ends with `name.js` anymore
- added all files with full path for current "wildcard" entries and
removed wildcard entries (xxxx.js, matching **/*xxxx.js)
- removed exclusions of test files that depend on unsupported features
(are filtered out/skipped anyway when running the tests)
- made sure test files within a directory are listed in a certain order
(as to be able to better compare the current test262.properties with a
generated one later on)
- marked folders containing only failing tests as excluded (instead of
listing all failing files individually)
- removed some comments
- removed unneeded ! in front of every .js file: every .js file is an
exclusion regardless and by removing the !, the filePath becomes a key
within the Java Properties format
- use ~ instead of # for excluding directories
- no need to exclude files under an already excluded directory
- add generateTest262properties commandline flag to regenerate the used
.properties file based on the test results of running the
Test262SuiteTest
on unsupported features

+ added a few more features to the unsupported feature list
See tc39/test262@f94fc66

Unfortunately, as of the next commit the sta.js file from the harness is
using rest parameters, so crashes all tests
@p-bakker p-bakker force-pushed the test262_prop_file_generator branch from d3f147d to 7395bda Compare June 21, 2021 18:43
@p-bakker
Copy link
Collaborator Author

Made the adjustments to not have a prefix or suffix on rolled up subfolders (and instead rely on whitespace to differentiate between topLevel folders and subfolders).

From my POV this is now ready to merge

@p-bakker
Copy link
Collaborator Author

@gbrail understand you've been busy getting the Promise PR ready, but could you give a quick update on where you stand regarding this PR?

Asking because I'd like to get this one out of the way before moving onto some of the other cases I'd like to tackle

@gbrail
Copy link
Collaborator

gbrail commented Jun 23, 2021

I think this is great and we should do it. However right now, at least on my machine, about 2000 tests are failing. I updated my test262 submodule using "git submodule update" to f94fc660cc3c59b1f2f9f122fc4d44b4434b935c. Can you check? I'll also re-run ci and see if anything else happened. If anything it might need you to merge the current master, which I won't push to until this is merged.

@p-bakker
Copy link
Collaborator Author

That is strange, don't get why you would get 2000 failing tests.

My branch is rebased on top of the current master and CI passes without failures.

I ran git checkout f94fc66 within the /test262 folder to update test262 to the proper commit

@p-bakker
Copy link
Collaborator Author

Are you getting 2000 failing test after checking out my branch and the proper test262 version and then running all tests?

Or 2000 new failing tests when running the tests with -DupdateTest262properties?

Either way I have no idea what could be going on though

@gbrail
Copy link
Collaborator

gbrail commented Jun 23, 2021 via email

@gbrail gbrail merged commit 263655b into mozilla:master Jun 23, 2021
@p-bakker p-bakker added this to the Release 1.7.14 milestone Oct 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can we discuss how to update test262?
4 participants