Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit tests should normalize whitespace before sorting lines, rather than after #21

Open
helloworld12321 opened this issue Sep 17, 2021 · 6 comments

Comments

@helloworld12321
Copy link

helloworld12321 commented Sep 17, 2021

Right now, process_client_logs.bats and process_logs.bats each take the actual result, sort it, and diff it with the expected result ignoring whitespace.

  # process_client_logs.bats
  sort data/discovery/failed_login_data.txt > data/discovery_sorted.txt
  sort tests/discovery_failed_login_data.txt > data/target_sorted.txt
  run diff -wbB data/target_sorted.txt data/discovery_sorted.txt
  assert_success
  # process_logs.bats
  sort tests/summary_plots.html > targets/target.txt
  sort failed_login_summary.html > targets/sorted.txt
  run diff -wbB targets/target.txt targets/sorted.txt
  assert_success

There's a problem, though—if a line has extra whitespace, it might get sorted differently than the canonical, whitespace-normalized version!

Screen Shot 2021-09-17 at 5 44 02 PM

So, to compare two files in a whitespace-insensitive way, we need to normalize whitespace and then sort :)

@helloworld12321
Copy link
Author

helloworld12321 commented Sep 17, 2021

How exactly sort treats whitespace varies from system to system.

On the lab computers, sort ignores whitespace. But on most other unix systems, sort is whitespace-sensitive. (The lab computers even differ from other Fedora systems!)

This explains why some tests that pass on the lab computers fail on GH actions.

I'm not sure why sort is weird on the lab computers—googling suggests that it might be locale-related ¯\_(ツ)_/¯

@helloworld12321
Copy link
Author

Lab computers:
Screenshot from 2021-09-17 17-10-16

Ubuntu:
Screen Shot 2021-09-17 at 5 08 24 PM

fresh Fedora installation:
Screen Shot 2021-09-17 at 5 17 38 PM

@helloworld12321 helloworld12321 changed the title Unit tests should normalize whitespace before sorting lines, rather than after. Unit tests should normalize whitespace before sorting lines, rather than after Sep 17, 2021
@NicMcPhee
Copy link
Contributor

Wow – huge thanks to @helloworld12321 for the awesome write-up! We only started with GitHub actions on this lab last year, and I don't remember anyone raising (or being aware of) this issue then. It's possible that it just never came up, or that people ran into the issue and just worked around it in some way.

I have no idea why sort behaves differently in the lab. We're (slowly – thanks COVID) in the process of upgrading the lab boxes, and potentially switching to Ubuntu, so it'll be interesting to see what, if any effect that has.

I just checked and the BSD sort on MacOS seems to behave like the Ubuntu, etc., options you tested above:

Screen Shot 2021-09-18 at 4 19 00 PM

I'm not sure how we'd "normalize whitespace" before sorting, TBH. diff is nice because we can just give it several flags and it will ignore most whitespace. sort doesn't really have that option, so we'd essentially have to use something like sed to replace runs of spaces/tabs with a single space. We could do that, but I'm not sure we should as it seems that kind of change might end up "covering up" certain kinds of errors? One could argue, for example, that folks should just eliminate the weird two-space-thing between months and single digit days, and that the tests are "right" in that regard. If that's the case, then maybe we should create a new issue to say something in the write-up about there being an expectation that things like Aug  9 (two spaces) are converted to Aug 9 (with just one space).

I suspect this never came up for me because I matched the month and the day using two separate regexes, with \s+ between them.

@helloworld12321
Copy link
Author

helloworld12321 commented Sep 18, 2021

Yeah, that's totally fair :)

I had suggested normalizing before sorting since it seemed like the "intent" of the tests was to ignore whitespace (eg, diff had the -wbB flags). But, I agree that changing the tests to be whitespace-sensitive everywhere might be better. The big thing is just to make it consistent across systems. 🙂

@NicMcPhee
Copy link
Contributor

NicMcPhee commented Sep 18, 2021

Agreed. The systems differences is definitely super weird. I don't immediately know how to make it consistent across systems, though, since I don't know why our lab sort behaves differently. I poked at some of the local environment variables mentioned in sort documentation, and none of them seem to be set in the lab. Weirdz.

@helloworld12321
Copy link
Author

helloworld12321 commented Sep 19, 2021

Maybe we could get rid of the the -wb flags to diff?

Then the sort behavior shouldn't matter. If the whitespace matches, the tests will pass; if the whitespace is different, the tests will fail. ¯\_(ツ)_/¯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants