Testing support #57
1] I agree. So something like:
2] I think we can use named arguments to implement optional test parameters. For example:
3] 4] I don't have any strong opinions about these at the moment.
5] I think we want a … The plan is to allow specifying multiple compiler commands one after another. This allows e.g. …
6] I agree that the placement of the tests should be up to the programmer.
7] Yes, tests should definitely be able to access private members.
8] Tests for uncompilable code: I agree this would be nice to have. The tests could possibly even check that the error message is what's expected.
9] Actually, you don't need to import things from the same project even in normal code. So tests would be able to do this as well.
10] 11] 12] 13] Yes to all.
20] I think this is useful for finding out if a change broke just a few tests, or if it broke all of them. Although it should also be possible to exit after the first N errors. So there should be a way to easily toggle between these two behaviors (and a way to set the error limit N).
So many bullet points :) I'll check the remaining ones more closely later.
That would be a very handy feature. A litany of
Two problems I see:
Being able to add a customized test runner is like having a universal toolset right in the language. (E.g. for checking for performance regressions, without external tools and scripts. I remember one guy who, as his PhD, wrote an advanced performance-regression checking tool for the Linux kernel, in Perl. It was not used: too much nuisance, maintenance headache, instinctive avoidance of 3rd-party tools.)
I once got curious whether one can implement a test runner in D. I asked, and got a relevant answer pointing to this library:
When I looked inside the code, I was scared. An external build tool is needed, plus heavy metaprogramming wizardry, just to run some tests.
I remembered one more feature, potentially useful for very heavy testing. (Something like this was discussed for the Zig language.) It may look like:
and this would be transformed into:
So just to clarify: in my mind a test suite consists of one or more "tests", each of which contains one or more "asserts". Currently a failing assert in normal (non-test) code just terminates the process. So to keep the same behavior in tests, the test runner could run each test as a separate process and then check its exit status to know whether it succeeded. Or use a flag, as you mentioned, to avoid spawning so many processes. Or keep a record of which test is currently running, and if the process exits, continue from the next one. There are many possibilities, and being able to run the tests in parallel would also be good to take into consideration.
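For what it's worth, here is a minimal sketch of the process-per-test variant in C++ on POSIX; the helper and test names are invented for illustration:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstdio>
#include <vector>

// A test is just a function; a failing assert aborts the whole process.
using TestFn = void (*)();

void passing_test() { assert(1 + 1 == 2); }
void failing_test() { assert(false); }

// Run one test in a forked child so its abort() cannot kill the runner.
bool run_test_in_child(TestFn test) {
    pid_t pid = fork();
    if (pid < 0) return false;   // fork failed
    if (pid == 0) {              // child: run the test, exit cleanly if it survives
        test();
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

int main() {
    std::vector<TestFn> tests = {passing_test, failing_test};
    int failed = 0;
    for (TestFn t : tests)
        if (!run_test_in_child(t)) ++failed;
    std::printf("%d test(s) failed\n", failed);
    return failed != 0;
}
```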
I think it would be best to stop the test after the first failing assert, to not have to deal with this. It's probably reasonable to assume that every test can be run standalone, and doesn't depend on running other tests first.
Yeah, I think it wouldn't be too hard to implement. I could imagine having a library function that can be called in the test runner's main to return a list of the tests, and then you do what you want with them. And then another library function to submit the test results for reporting (except when you want to report them yourself). And/or letting you register a callback function that's called for each test, allowing you to do performance timing or whatever. But I haven't thought about it deeply yet, so I'm not 100% sure it's really that simple in practice. Defining common set-up and tear-down code for multiple tests is a common testing-framework feature, which I think is useful to have.
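A rough sketch of what such a library interface could look like, in C++; every name here (TestCase, all_tests, report_results) is hypothetical, just to make the shape of the idea visible:

```cpp
#include <chrono>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Hypothetical shapes for the library hooks described above.
struct TestCase {
    std::string name;                 // or file/line if tests stay unnamed
    std::function<void()> body;
};

struct TestResult {
    std::string name;
    bool passed;
    std::chrono::milliseconds duration;
};

// "Library" side: the registry of tests and a default reporter.
std::vector<TestCase>& all_tests() {
    static std::vector<TestCase> tests = {
        {"arithmetic", [] { /* asserts would go here */ }},
    };
    return tests;
}

void report_results(const std::vector<TestResult>& results) {
    for (const TestResult& r : results)
        std::printf("%-12s %s (%lld ms)\n", r.name.c_str(),
                    r.passed ? "ok" : "FAILED",
                    static_cast<long long>(r.duration.count()));
}

// "User" side: a custom runner is ordinary code called from main(),
// here timing each test before handing the results back for reporting.
int main() {
    std::vector<TestResult> results;
    for (const TestCase& t : all_tests()) {
        auto start = std::chrono::steady_clock::now();
        t.body();                     // a failing assert would stop the run here
        auto end = std::chrono::steady_clock::now();
        results.push_back({t.name, true,
            std::chrono::duration_cast<std::chrono::milliseconds>(end - start)});
    }
    report_results(results);
}
```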
Yes.
Yes.
This is not IMO needed. A test which does not assert is supposed not to screw up the program state, so the next test could then be run safely. If an assert fails within a test, that is the end of the program, an unrecoverable error. One more feature I forgot to mention was to run tests in random order. This could, in theory, discover interdependence bugs (a test changing program state, another test failing because of this). I once implemented random execution of tests, but it was more a nuisance than a help (I never managed to find a bug this way). The seed for the randomizer should be recorded somewhere, to allow reproducing a found problem. Running tests as separate programs is possible, with a custom test runner, but I never needed this for my purposes.
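The random-order idea with a reproducible seed is easy to sketch in C++ (the test names are invented):

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

using TestFn = void (*)();

void test_a() {}
void test_b() {}
void test_c() {}

int main() {
    std::vector<TestFn> tests = {test_a, test_b, test_c};

    // Print the seed first, so a failing order can be reproduced later
    // by hard-coding the same value instead of using random_device.
    std::random_device rd;
    unsigned seed = rd();
    std::printf("test order seed: %u\n", seed);

    std::shuffle(tests.begin(), tests.end(), std::mt19937(seed));
    for (TestFn t : tests) t();
}
```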
Yes. The idea is that each test is a de-facto standalone program, independent of anything else.
It is not very hard. It is possible to implement it in C++, in C and in Nim. The Nim implementation of a (simple) test library took just a dozen lines:
What about a concise, integrated test syntax like Zig's? The compiler executes all the test blocks, so a block can fail in any arbitrary way and is reported in a nice message. The std has helper functions like expect:

```zig
test "integer overflow at compile time" {
    const x: u8 = 255;
    const y = x + 1;
    assert(false);
    expect(true);
}
```
Zig's test system is rather primitive.
Long ago, I suggested better test support for Zig (ziglang/zig#128 and others), but not much, if anything, was implemented. D's tests are better than Zig's (the language allows for cleanup after failure, but it complicates the code a lot). Nim can probably implement many of the proposed features, but the language is very hard to grok.
Further, besides a description, a test needs a name in order to execute individual unit tests. There could be two alternatives:

test test_name("test with a name", timeout: ...) {}

or

test (name: test_name, "test with a name", timeout: ...) {}

In the terminal:
Here's a long list of what thorough testing support could be. It is more about tooling support than language features.
1] It should be really easy to write a new test.
The main point is: one doesn't need to invent a unique name for a test.
The usual alternatives:
or:
give the false perception that a test is a function (it is not; no one calls a test). Inventing a name is hard in large programs, and the IDE browser may be cluttered with these useless names.
Uppercase TEST makes it easy to spot.
2] Tests may have optional parameters. For example, the maximum expected time for the test. If it takes longer, there's something wrong, and an error could be shown.
Test parameters are not like ordinary function parameters; they should be free-form. E.g.:
There should be several implicit parameters available: __FILE__/__LINE__ for every test.
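To make points 1] and 2] a bit more concrete, here is a rough C++ analogue of nameless TEST blocks with an optional timeout parameter and implicit file/line; the macro and registry are invented for illustration, not an existing library:

```cpp
#include <cstdio>
#include <functional>
#include <vector>

// One registered test: no name, just its location, an optional timeout and a body.
struct Test {
    const char* file;
    int line;
    int timeout_ms;
    std::function<void()> body;
};

std::vector<Test>& registry() {
    static std::vector<Test> tests;
    return tests;
}

// Registration happens as a side effect of constructing a static object.
struct Register {
    Register(const char* file, int line, int timeout_ms, std::function<void()> body) {
        registry().push_back({file, line, timeout_ms, std::move(body)});
    }
};

// Uppercase TEST, easy to spot; file and line are captured implicitly,
// so the programmer never invents a name.
#define CAT2(a, b) a##b
#define CAT(a, b) CAT2(a, b)
#define TEST(...) \
    static Register CAT(test_at_line_, __LINE__){__FILE__, __LINE__, __VA_ARGS__}

TEST(/*timeout_ms=*/1000, [] {
    // asserts go here
});

int main() {
    for (const Test& t : registry())
        std::printf("test at %s:%d (timeout %d ms)\n", t.file, t.line, t.timeout_ms);
}
```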
3] The last component of the test system is the "test runner". It collects all tests, reads their parameters (if any), parses these parameters (giving an error if it doesn't understand them) and then executes all or some of the tests.
There should be a default test runner, which understands a couple of the most common parameters (name, timeout, ...).
Its API could be:
The default test runner should be able to verify timeout constraints, if present. A default timeout could be e.g. 1 second.
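As a loose C++ illustration of such a default runner enforcing per-test timeouts (the Test shape and run_all are invented, and a real runner would isolate tests better than a thread does):

```cpp
#include <chrono>
#include <cstdio>
#include <functional>
#include <future>
#include <string>
#include <vector>

struct Test {
    std::string name;                      // or file/line for unnamed tests
    std::chrono::milliseconds timeout;     // default could be 1 second
    std::function<void()> body;
};

// Hypothetical default runner: run every test, flag the ones that
// exceed their timeout.
void run_all(const std::vector<Test>& tests) {
    for (const Test& t : tests) {
        auto fut = std::async(std::launch::async, t.body);
        if (fut.wait_for(t.timeout) == std::future_status::timeout)
            std::printf("%s: TIMED OUT after %lld ms\n", t.name.c_str(),
                        static_cast<long long>(t.timeout.count()));
        else
            std::printf("%s: ok\n", t.name.c_str());
        // NOTE: this only detects a timeout; a real runner would kill the
        // runaway test (e.g. run it in a separate process) instead of waiting.
        fut.wait();
    }
}

int main() {
    run_all({
        {"fast test", std::chrono::seconds(1), [] { /* asserts here */ }},
    });
}
```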
4] A custom test runner may be used e.g. to run those tests used for performance regressions. These tests would be identified by a parameter, their durations would be stored somewhere, and if they get worse, the programmer would be notified.
Another use case for a custom test runner is code coverage. Certain tests could be labeled as such, and run only when code coverage is needed (because they would take a lot of time).
Another use case is to exclude platform-specific tests.
5] It should be up to the programmer to invoke the test runner at the beginning of the application: either the default one (e.g. run-recent-tests), or a custom variant. There should be no special "test invocation mode" for the compiler. It is a hassle, it complicates things, and it is inflexible. Just let the programmer do it explicitly in main, exactly as he wishes.
6] Where should tests be placed? There are several options:
IMO the programmer should have the choice. The most important small tests could be placed right after the functions; long tests checking minute details could be put into a separate file. Let's call it a companion test file.
A companion test file should have a name similar to its "parent" source file, should have full access to it without needing to import anything, and should behave as if it were copy-pasted at the end of the source file.
A companion test file should define only tests (plus maybe helpers), should not export anything, and should not be importable elsewhere. It should be just convenient storage for tests, nothing else.
7] Tests should have access to everything. Nothing should be private to them.
8] Since the language plans to have generics, there could be tests for uncompilable code:
The parameter does-not-compile cannot be combined with any other parameter. The compiler should make sure the code really doesn't compile, but NOT because of some trivial error like an unbalanced parenthesis or an invalid name.
9] If possible, a test should be able to use any part of the project, without an explicit need to import anything. This keeps the code clean and avoids polluting it because of the tests. If all tests are removed, the rest of the code should not need any modification.
E.g.
I'm not sure whether this is feasible, but it would help a lot.
10] With the usual compilation modes debug and release, it should be possible to use unit tests both in debug and in release (e.g. for performance checks, and to make sure optimization didn't screw something up).
11] All tests should do memory checking. Whatever was allocated within a certain test should also be deallocated right there. I'd implemented such a feature in a C and C++ testing library, and it does wonders for keeping an application free of leaks.
I know there are big complicated tools that try to do the same, but having such a tool inside every test allows leaks to be identified very quickly.
Such checking against leaks could be a major "selling point" for the language.
Intentionally leaking tests could be annotated with a parameter.
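A minimal sketch of per-test leak checking in C++, counting global allocations around each test body; the counter and the run_leak_checked helper are invented for illustration:

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Count every allocation and deallocation by replacing the global operators.
static long g_live_allocations = 0;

void* operator new(std::size_t size) {
    ++g_live_allocations;
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept {
    if (p) { --g_live_allocations; std::free(p); }
}

// Wrap a test body and fail it if it leaked anything.
template <typename Fn>
bool run_leak_checked(Fn test) {
    long before = g_live_allocations;
    test();
    long after = g_live_allocations;
    if (after != before) {
        std::printf("test leaked %ld allocation(s)\n", after - before);
        return false;
    }
    return true;
}

int main() {
    bool ok = run_leak_checked([] {
        int* leaked = new int(42);   // intentional leak to trigger the check
        (void)leaked;
    });
    return ok ? 0 : 1;
}
```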
12] assert could be seen as yet another tool, not just as a function. When this fails:
assert(x == y)
you may be interested in what the values of x and y were. An advanced assert could help:
assert(x == y | x = %x y = %y, z = %z); // z is an interesting value visible in this scope
Here, if the assert fails, it would print the x, y and z values. The syntax of assert should not be restricted by language rules; whatever helps should be available. E.g. I could imagine something like:
This feature would help a lot against Heisenbugs.
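Without language support, a limited approximation of such a value-printing assert is possible in C++; the ASSERT_WITH macro below is hypothetical and prints only the extra values, not their names:

```cpp
#include <cstdio>
#include <cstdlib>
#include <sstream>

// assert_with(cond, extra values...) aborts and prints the listed values
// when cond is false, roughly approximating the proposed
//   assert(x == y | x = %x y = %y, z = %z);
template <typename... Vals>
void assert_with(bool cond, const char* cond_text, const char* file, int line,
                 const Vals&... vals) {
    if (cond) return;
    std::ostringstream msg;
    msg << file << ":" << line << ": assertion failed: " << cond_text;
    ((msg << " [" << vals << "]"), ...);   // append every extra value
    std::fprintf(stderr, "%s\n", msg.str().c_str());
    std::abort();
}

// At least one extra value is expected here; a fuller version would also
// stringize the value names and handle the zero-extras case.
#define ASSERT_WITH(cond, ...) \
    assert_with((cond), #cond, __FILE__, __LINE__, __VA_ARGS__)

int main() {
    int x = 2, y = 3, z = 7;
    ASSERT_WITH(x == y, x, y, z);   // on failure, prints the values of x, y and z
}
```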
13] When an assert fires while a test is running, it should show the exact location of the test (it should not be hidden inside a nearly useless long stack trace).
14] There are different kinds of assert.
The good old ordinary assert. It should be used a lot, should be active in debug mode, and should be compiled away in release mode. Nothing unusual here.
The assert used in a test, at the top level of such a test: if tests are allowed in release mode, then such asserts should not be compiled away. There are two options:
a) Use a different name (e.g. verify). It would be the same as assert, but allowed only in tests, at the top level, nowhere else. It would be present even in release mode.
b) The compiler should keep asserts in tests (at the top level) intact, even in release mode.
There could be asserts "just to be sure", placed to notify the user that something impossible did actually happen. E.g.:
However, when one does truly defensive testing, such situations should be arranged and tested. But how to distinguish the situation where we intentionally provoked it from an unexpected error? There are two options:
a) If assert(false) fires, it would check whether a test is running. If it is, it would assume it was triggered intentionally and would not show the error. If a test is not running, an impossible bug was spotted, so show it.
b) There could be a special form of assert, e.g. impossible_assert(). If invoked during a test it does nothing; outside of a test it shows an error.
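A rough C++ model of the verify and impossible_assert variants described above; the two names come from the proposal, while the g_test_running flag (which the test runner would maintain) is invented for illustration:

```cpp
#include <cstdio>
#include <cstdlib>

// Flag maintained by the (hypothetical) test runner.
static bool g_test_running = false;

// verify: like assert, but never compiled away, intended for test bodies.
#define verify(cond)                                                      \
    do {                                                                  \
        if (!(cond)) {                                                    \
            std::fprintf(stderr, "%s:%d: verify failed: %s\n",            \
                         __FILE__, __LINE__, #cond);                      \
            std::abort();                                                 \
        }                                                                 \
    } while (0)

// impossible_assert: flags "can't happen" situations, but stays silent
// while a test is deliberately provoking them.
#define impossible_assert()                                               \
    do {                                                                  \
        if (!g_test_running) {                                            \
            std::fprintf(stderr, "%s:%d: impossible situation reached\n", \
                         __FILE__, __LINE__);                             \
            std::abort();                                                 \
        }                                                                 \
    } while (0)

int main() {
    g_test_running = true;        // set by the runner for the duration of a test
    verify(2 + 2 == 4);
    impossible_assert();          // ignored, because a test is running
    g_test_running = false;
    std::puts("ok");
}
```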
15] The language should standardize how to distinguish between debug and release modes, and between tests compiled in and tests not present. It should avoid the C/C++ mess with NDEBUG, DEBUG, _DEBUG and other inconsistent inventions.
16] Mocking support. The biggest feature I can imagine: the ability to replace specified functions for the duration of a test:
It could be implemented using function pointers. In release mode, without tests, there would be no performance penalty at all.
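A small sketch of that function-pointer seam in C++; read_sensor and sensor_is_hot are invented example names:

```cpp
#include <cassert>
#include <cstdio>

// Production code calls through a function pointer, so a test can swap
// the implementation in and restore it afterwards.
static int real_read_sensor() { return 42; }          // talks to hardware
static int (*read_sensor)() = &real_read_sensor;      // the mockable seam

int sensor_is_hot() { return read_sensor() > 100; }   // code under test

int main() {
    // Test: replace read_sensor for the duration of the test.
    auto* saved = read_sensor;
    read_sensor = [] { return 150; };                  // mock returning a hot value
    assert(sensor_is_hot());
    read_sensor = saved;                               // restore the real function

    assert(!sensor_is_hot());                          // real sensor reads 42
    std::puts("ok");
}
```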
17] Mocking support could be extended even to constants, e.g. a timeout constant, to keep test duration down.
It could also be implemented using function pointers. The constant would become a variable accessed by calling that function pointer.
18] Support for white box testing. Another potentially huge feature.
The problem: I want to be sure that this code invoked exactly 2 allocations and 2 deallocations and does NOT invoke any TCP/IP calls.
This could be handled by making the code very complicated, or by using the "tracing" feature proposed here:
http://akkartik.name/post/tracing-tests
Basically, it could work this way:
E.g.
A trace could be created explicitly:
or implicitly, by the compiler inserting a trace when a function is called. The compiler would know (by checking the "register-trace" usages within all tests) which functions could possibly be checked and which could not. Implicit traces reduce clutter in the code.
The compiler should check the traced strings to make sure there's no typo, and should really do the implicit traces (to avoid unneeded clutter in the code).
The whole mechanism should be well optimized, so as not to slow down the tests too much and to keep the trace data compact.
When tests are not running, no traces would be collected. There would be only a small impact on performance.
When tests are not compiled in, there would be no traces and no impact at all on the performance.
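An explicit-trace version of this can be sketched in plain C++; the trace() helper and the event names are invented, and the compiler-inserted implicit traces from the proposal are not modeled:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// Global trace buffer, filled only while a test is collecting it.
static std::vector<std::string>* g_trace = nullptr;

void trace(const std::string& event) {
    if (g_trace) g_trace->push_back(event);
}

// Code under test, instrumented with explicit traces.
void* my_alloc(std::size_t n) { trace("alloc"); return ::operator new(n); }
void my_free(void* p)         { trace("free");  ::operator delete(p); }

void do_work() {
    void* a = my_alloc(16);
    void* b = my_alloc(32);
    my_free(a);
    my_free(b);
}

int main() {
    // White-box test: exactly 2 allocations, 2 frees, and no "tcp" events.
    std::vector<std::string> events;
    g_trace = &events;
    do_work();
    g_trace = nullptr;

    assert(std::count(events.begin(), events.end(), "alloc") == 2);
    assert(std::count(events.begin(), events.end(), "free") == 2);
    assert(std::count(events.begin(), events.end(), "tcp") == 0);
}
```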
19] Tests should run strictly serially. Parallel execution is virtually impossible (every single piece of code would need to be guarded against multithreading bugs). If something multithreaded has to be tested, a test should spawn such threads.
Running only the recent tests automatically whenever the program is started takes little time and guards against bugs well.
I routinely do:
If someone really, really needs parallel execution of tests, he could write his own test runner and blame himself for the problems.
20] Having a failing test and then running the remaining ones should not be supported. It's a misfeature.
If someone really, really needs this, he could write his own test runner, plus probably overload the assert.
21] There could be support for also invoking tests from foreign C code, or for allowing invocation of the tests from C code. This feels to me like a trivial thing.
22] Tests are expected to run at the start of main. However, nothing stops one from invoking some of the tests at any moment, even interactively. This may help to deal with Heisenbugs. Tests automatically checking against leaks would make this adventure less risky.
Multithreaded applications would need to protect tests invoked in the middle of an application run, but this is unavoidable.
23] People who do not like testing wouldn't be forced to use it; those who want their own unique testing framework could implement one. The ability to overload or overwrite assert would be desirable in this case.
AFAIK no language known to me supports all these features, not even half of them. Most only pay lip service to testing; a few stopped in the middle (e.g. the D language allows easy definition of tests, but supports neither test parameters nor custom test runners).
Some of the proposed features were implemented in C++ and even in C. It is possible to write:
It was also possible to implement whitebox testing, but without compiler support it gets too clumsy.