
improving compliance tests #2585

Closed
sasa1977 opened this issue Apr 10, 2018 · 10 comments

Comments

@sasa1977
Contributor

@sebastian @cristianberneanu @obrok

Based on my experience reported in #2522 about compliance with a large user set, I'd like to open a discussion about how we can improve the detected_problems / running_time ratio of our compliance tests.

Currently, our compliance tests seem to conflate two separate concerns: detecting discrepancies between different data backends (including the emulator layer), and informal performance/stress testing. I believe we should treat these cases separately and, in particular, use compliance tests only for the former purpose (detecting discrepancies). I think this could allow us to reduce the input set for compliance, and therefore improve both the running time of the compliance tests and the confidence we have in them.

One approach worth considering is to reduce the input domain (the .txt files with names and words). Currently, this domain is fairly large, so some problematic inputs (e.g. the name ┬─┬ノ( º _ ºノ) ( ͡° ͜ʖ ͡°)) are anonymized away in smaller user sets and essentially go untested.

I think we'd be better off if we used only a couple of names, say one with the standard English alphabet and a couple more with Unicode characters. If we had fewer than 10 names, it's very probable that every input would surface in the output, even on a smaller test set of 100 users. As a result, every error that can be caused by some input would be detected.

I propose we use the same approach for the words used in fields such as note title, content, and note change. In addition, I think we should generate fewer words per field to reduce the pressure on the row-splitter functions.
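To make the idea concrete, here's a minimal sketch of what such a reduced generator could look like (the module name, pool contents, and functions are all hypothetical, not the project's actual code):

```elixir
defmodule ComplianceData do
  # A deliberately tiny input domain: a handful of names covering plain
  # ASCII plus a few Unicode edge cases. With fewer than 10 distinct
  # values and ~100 users, every value almost certainly clears the
  # anonymization threshold and surfaces in the output.
  @names ["John Smith", "Saša Jurić", "Ștefan Günther", "王 عبد الله"]

  # A small word pool keeps multi-word fields short, reducing pressure
  # on the row-splitter functions while still exercising them.
  @words ["alpha", "beta", "gamma", "delta"]

  def name, do: Enum.random(@names)

  def text(word_count \\ 2) do
    @words |> Enum.take_random(word_count) |> Enum.join(" ")
  end
end
```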

As an added benefit, a smaller data input should also produce smaller error output, which is hopefully easier to understand. In the current version, some errors produce a huge left-vs-right diff that is quite hard to analyze.

Finally, I wonder if we need to test the encoded version for every data source. As far as I understand, in the encoded compliance test we simply fetch all the data from the database, and everything else is emulated. We currently use 9 data sources, with plans to add more. If we only used the encoded version for one data source (say the first one), we might cut the running time in half, if not more, given that queries on an encoded data source are usually slower.
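A minimal sketch of that idea, assuming a hypothetical list of data source names (the actual test setup surely differs):

```elixir
data_sources = ["postgresql", "mysql", "sqlserver", "mongodb"]

# Plain tests run against every data source; the encoded variant runs
# only against the first one, since everything past the initial fetch
# is emulated anyway.
variants =
  for {source, index} <- Enum.with_index(data_sources),
      encoded? <- [false, true],
      not encoded? or index == 0,
      do: {source, encoded?}
```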

In summary, I believe that with the proposed changes we could reduce the compliance running time, improve our confidence in the compliance tests, and make the error output easier to understand. I think we don't need gazillions of users and a plethora of random inputs to get there.

Thoughts?

@cristianberneanu
Member

Reducing the number of rows in the input domain sounds good to me, as does reducing the number of encoded data sources, since they seem redundant.
I would also make the performance tests separate, since the compliance tests have too much concurrency and are not reliable. Maybe put the result of the performance tests into the final CI test message, so we see the result each time we change something.

@obrok
Contributor

obrok commented Apr 10, 2018

👍 I agree that compliance should be limited to checking... well... compliance, and not try to be a performance test at the same time, and the steps you outline move us towards that.

@sasa1977
Contributor Author

It looks like encoded tests are already disabled for many data sources in this pull request. I think we could possibly also remove the encoded tests for mongodb. WDYT?

@sasa1977
Contributor Author

As a case in point, I tried a quick experiment using the following names.txt:

Anna Laurențiu
Ștefan Günther
Saša Jurić
王 عبد الله‎‎ Ⱥ Ⱦ İstanbul
Иль
,./;'[]\-= <>?:"{}|_+ !@#$%^&*()`~
┬─┬ノ( º _ ºノ)  ( ͡° ͜ʖ ͡°)
  123

In the official version, the user name is a combination of a random number of random entries from names.txt. I changed that to use exactly one random entry from the file.
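Roughly, the change amounts to something like this (a sketch, not the actual generator code):

```elixir
names = "names.txt" |> File.read!() |> String.split("\n", trim: true)

# Before: a user name was built from a random number of random entries,
# yielding a huge space of distinct names.
old_name = names |> Enum.take_random(:rand.uniform(3)) |> Enum.join(" ")

# After: exactly one random entry, so the name domain is just the lines
# of the file.
new_name = Enum.random(names)
```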

With these changes I was able to locally reproduce all the Unicode errors I discovered in the long test mentioned in #2522. Compared to that massive test, this one required only 100 users, and I was able to use all data sources. The compliance test finished in 675 seconds, using only 4 CPU cores on my machine.

I didn't reproduce the SQL Server mismatches, but I think that could also be done if we generated a small (<= 10) list of random numbers and then took a random element from that list when generating each numerical entry. Perhaps the same could be done with string values too.
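For instance (a hypothetical sketch of the numeric case):

```elixir
# Build a fixed pool of at most 10 random numbers once per test run...
number_pool = for _ <- 1..10, do: :rand.uniform() * 1000

# ...then draw every numerical entry from that pool, so each value
# repeats often enough to expose backend-specific rounding or
# comparison mismatches.
numeric_value = Enum.random(number_pool)
```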

Also, as an internal improvement, we could consider using StreamData to generate random values instead of our custom generators. This will not lead to any particular visible improvement, but it will remove our custom reinvention of readily available logic, which is likely to be integrated into Elixir in the next version.
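For illustration, the same fixed-pool idea expressed with StreamData generators (assuming the library is available as a test dependency; the pools are made-up values):

```elixir
import StreamData

# member_of/1 replaces a hand-rolled Enum.random/1 over a fixed pool.
name_gen = member_of(["Saša Jurić", "Ștefan Günther", "123"])

# Short multi-word texts: 1 to 3 words drawn from a small pool.
note_gen = list_of(member_of(["alpha", "beta", "gamma"]), length: 1..3)

# StreamData generators are enumerable, so sampling is just Enum.take/2.
Enum.take(name_gen, 5)
```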

@obrok
Contributor

obrok commented Apr 10, 2018

The compliance test finished in 675 seconds, using only 4 CPU cores on my machine.

🎉 Seems like a good enough reason to go with this to me.

@sasa1977
Contributor Author

In addition, here's a sample error output. I find it much easier to understand than the huge outputs produced with "free-form" random values:

(screenshot: sample compliance error output)

Also, I quickly removed the encoded tests from all the mongodb data sources, and that reduced the running time to 513 seconds. The same tests failed as in the previous iteration.

@sebastian
Member

  • Reducing the input domain to get faster tests that surface problems sooner is good, let's do it
  • Reducing the number of words in the multi-word texts too (but keep some multi-word columns so we can still exercise extract_words)
  • Let's reduce the number of rows per user too
  • Let's use a separate data generator for the performance tests

Also, as an internal improvement, we could consider using StreamData to generate random values instead of our custom generators. This will not lead to any particular visible improvement, but it will remove our custom reinvention of readily available logic, which is likely to be integrated into Elixir in the next version.

We have a decently working data generator. It is fairly simple. Let us not spend resources on a rewrite with StreamData until we find a compelling reason to do so.

Maybe put the result of the performance tests into the final CI test message, so we see the result each time we change something.

I like that idea. Even though this isn't really supposed to be a performance test, it still gives an interesting view of how the different data sources compare.

@sebastian
Member

@sasa1977 can this be considered complete now?

@sasa1977
Contributor Author

I still have a bit of work I'd like to do here. I'll try to wrap it up this week.

@sasa1977
Contributor Author

I think most of the work on compliance is done with #2635 and a couple of earlier PRs. Note, however, that there are some known bugs, as described in #2634.

I haven't touched performance tests, but technically they're not a part of this issue, so I think we can close this one?
