Testing support #57

Open
PavelVozenilek opened this issue Apr 10, 2020 · 7 comments

@PavelVozenilek

Here's a long list of what thorough testing support could look like. It is more about tooling support than language features.


1] It should be really easy to write a new test.

TEST()
{
  assert(2 + 2 == 4);
}

TEST()
{
  assert(2 + 2 == 5);
}

The main point is: one doesn't need to invent a unique name for a test.

The usual alternatives:

@test
void foo()
{
  assert(2 + 2 == 4);
}

or:

void test_foo()
{
  assert(2 + 2 == 5);
}

give the false perception that a test is a function (it is not; no one calls a test). Inventing names is hard in large programs, and the IDE browser may get cluttered with these useless names.

Uppercase TEST makes it easy to spot.
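For comparison, nameless tests can already be approximated in today's C: a macro can derive a unique symbol from __LINE__ and register the test body before main runs, so the programmer never invents a name. A minimal, self-contained sketch, assuming a GCC/Clang-style constructor attribute (all helper names are hypothetical):

/* test_sketch.c -- hypothetical approximation of nameless TEST() blocks in C */
#include <assert.h>
#include <stdio.h>

typedef void (*test_fn)(void);

static test_fn tests[256];
static const char *test_files[256];
static int test_lines[256];
static int test_count;

static void register_test(test_fn fn, const char *file, int line)
{
    tests[test_count] = fn;
    test_files[test_count] = file;
    test_lines[test_count] = line;
    test_count++;
}

#define TEST_CAT2(a, b) a##b
#define TEST_CAT(a, b) TEST_CAT2(a, b)

/* TEST() expands to a uniquely named static function plus a constructor
   that registers it -- no hand-written test name anywhere. */
#define TEST() \
    static void TEST_CAT(test_body_, __LINE__)(void); \
    __attribute__((constructor)) \
    static void TEST_CAT(test_reg_, __LINE__)(void) { \
        register_test(TEST_CAT(test_body_, __LINE__), __FILE__, __LINE__); \
    } \
    static void TEST_CAT(test_body_, __LINE__)(void)

TEST()
{
    assert(2 + 2 == 4);
}

int main(void)
{
    for (int i = 0; i < test_count; i++) {
        printf("running test at %s:%d\n", test_files[i], test_lines[i]);
        tests[i]();
    }
    printf("%d test(s) passed\n", test_count);
    return 0;
}

The __FILE__/__LINE__ pair captured at registration is also exactly the implicit location parameter requested under point 2 below.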


2] Tests may have optional parameters, for example the maximum expected time for the test. If it takes longer, something is wrong and an error could be shown.

Test parameters are not like ordinary function parameters; they should be free form. E.g.:

TEST() // no parameters
{
}

TEST(time < 10 ms) // one parameter
{
  ...
}

TEST(10 ms < time < 30 ms) // still only one parameter
{
   ...
}

// two parameters: a name, in case we want to run the test individually, and a max time, plus some separator
TEST(name = "xyz" | time < 1 s)
{
   ...
}

// three parameters
TEST( qwerty | asdf | x = true)
{
  ...
}

There should be several implicit parameters available:

  • something like __FILE__/__LINE__ for every test
  • how long ago (e.g. in seconds) the source file with the current test was modified. Useful when running only tests from recently modified source files.

3] The last component of the test system is the "test runner". It collects all tests, reads their parameters (if any), parses these parameters (giving an error if it doesn't understand them) and then executes all or some of the tests.

There should be a default test runner, which understands a couple of the most common parameters (name, timeout, ...).

Its API could be:

// returns # of executed tests
uint run-all-tests(bool show_results_dialog_on_success);
uint run-recent-tests(uint recent_in_seconds, bool show_results_dialog_on_success);

The default test runner should be able to verify timeout constraints, if present. A default timeout could be e.g. 1 second.
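A rough sketch of what the default runner could do with the timeout constraint, with the proposal's hyphenated names rendered as C identifiers; the test list is assumed to be provided by a registration mechanism such as the one sketched under point 1:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef void (*test_fn)(void);

/* Assumed to be filled in by the registration mechanism. */
extern test_fn registered_tests[];
extern int registered_test_count;

unsigned int run_all_tests(bool show_results_dialog_on_success)
{
    for (int i = 0; i < registered_test_count; i++) {
        clock_t start = clock();
        registered_tests[i]();
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (secs > 1.0)                          /* default 1 s timeout */
            fprintf(stderr, "test %d exceeded the default timeout (%.2f s)\n", i, secs);
    }
    if (show_results_dialog_on_success)
        printf("%d test(s) executed\n", registered_test_count);  /* stand-in for a dialog */
    return (unsigned int)registered_test_count;
}

run_recent_tests would additionally filter the list by the source-file modification time mentioned among the implicit parameters under point 2.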


4] A custom test runner may be used, e.g., to run the tests used for performance regressions. These tests would be identified by a parameter, their durations would be stored somewhere, and if a duration gets worse, the programmer would be notified.

Another use case for a custom test runner is code coverage. Certain tests could be labeled accordingly, and run only when code coverage is needed (because they would take a lot of time).

Another use case is to exclude platform specific tests.


5] It should be up to the programmer to invoke the test runner at the beginning of the application, either the default one (e.g. run-recent-tests) or a custom variant.

There should be no special "test invocation mode" for the compiler. It is a hassle, it complicates things, it is inflexible. Just let the programmer do it explicitly in main, exactly as he wishes.


6] Where should tests be placed? There are several options:

  • after the relevant code (e.g. after a function)
  • at the end of the source file (this one I do not recommend, it gets messy very fast)
  • in a separate file (not to overcrowd the source file)

IMO the programmer should have a choice. The most important small tests could be placed after the functions; those that are long or check minutiae could be put into a separate file. Let's call it a companion test file.

A companion test file should have a name similar to its "parent source file" and should have full access to it, w/o the need to import anything. It should behave as if it was copy-pasted at the end of the source file.

A companion test file should define only tests (plus maybe helpers), should not export anything, and should not be importable elsewhere. It should be just a convenient storage for tests, nothing else.


7] Tests should have access to everything. Nothing should be private to them.


8] Since the language plans to have generics, there could be tests for uncompilable code:

// this parameter says the code inside the test must NOT compile
TEST(does-not-compile) 
{
   ...

}

The does-not-compile parameter cannot be combined with any other parameter.

The compiler should make sure the code really doesn't compile, but NOT because of some trivial error, like unbalanced parentheses or an invalid name.


9] If possible, a test should be able to use any part of the project, w/o the explicit need to import anything. This is to keep the code clean and not pollute it because of the tests. If all tests are removed, the rest of the code should not need any modification.

E.g.

TEST()
{
  // this should not require an explicit import, if the compiler is able to infer what it is
  SomethingFromThisProject x;   
  assert(foo() == x);
}

I'm not sure whether this is feasible, but it would help a lot.


10] With the usual compilation modes debug and release, it should be possible to use unit tests both in debug and in release (e.g. for performance checks, and to make sure optimization didn't screw up something).


11] All tests should do memory checking. Whatever was allocated within a certain test should also be deallocated right there. I implemented such a feature in a C and C++ testing library, and it does wonders to keep an application free of leaks.

I know there are big complicated tools that try to do the same, but having such a tool inside every test allows leaks to be identified very quickly.

Such checking against leaks could be a major "selling point" for the language.

Intentionally leaking tests could be annotated with a parameter.
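A minimal sketch of such per-test leak checking in C, assuming all project code allocates through a pair of wrappers (xmalloc/xfree are hypothetical names):

#include <assert.h>
#include <stdlib.h>

static long live_allocations;     /* net count of outstanding allocations */

void *xmalloc(size_t bytes)
{
    live_allocations++;
    return malloc(bytes);
}

void xfree(void *p)
{
    if (p)
        live_allocations--;
    free(p);
}

/* The test runner snapshots the counter before a test and checks it
   afterwards; a test that leaks fails on the spot. */
void run_one_test_with_leak_check(void (*test)(void))
{
    long before = live_allocations;
    test();
    assert(live_allocations == before && "test leaked memory");
}

A test annotated as intentionally leaking would simply skip the final check.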


12] assert could be seen as yet another tool, not just as a function:

When this fails:

assert(x == y)

you may be interested in what the values of x and y were. An advanced assert could help:

assert(x == y | x = %x y = %y, z = %z); // z is an interesting value visible in this scope

Here, if the assert fails, it would print the x, y and z values. The syntax of assert should not be restricted by language rules. Whatever helps should be available. E.g. I could imagine something like:

assert(x == y) 
{ // code block accompanying the assert
  a = x.foo();
  b = y.bar();
  assert_print(Because x has %a and y has %b it failed here);
}

This feature would help a lot against Heisenbugs.
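In today's C this can only be approximated per type with a macro; a sketch of an assert that reports the values of both operands when it fails (hypothetical name, integers only):

#include <stdio.h>
#include <stdlib.h>

#define ASSERT_EQ_INT(x, y)                                                   \
    do {                                                                      \
        long long ax_ = (x), ay_ = (y);                                       \
        if (ax_ != ay_) {                                                     \
            fprintf(stderr, "%s:%d: assert failed: %s == %s (%lld vs %lld)\n",\
                    __FILE__, __LINE__, #x, #y, ax_, ay_);                    \
            abort();                                                          \
        }                                                                     \
    } while (0)

ASSERT_EQ_INT(x, y) prints both sides on failure; extra context values (the z above) would still need an explicit print in the accompanying block, which is exactly what the proposed assert_print would clean up.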


13] When an assert fires while a test is running, it should show the exact location of the test (it should not be hidden inside a nearly useless long stack trace).


14] There are different kinds of assert.

  1. The good old ordinary assert. Should be used a lot, should be active in debug mode, should be compiled away in release mode. Nothing unusual here.

  2. The assert used in a test, at the top level of such test:

    TEST()
    {
      assert(2 + 2 == 4);
    }
    
    

    If tests are allowed in release mode, then such asserts should not be compiled away. There are two options:
    a) use a different name (e.g. verify). It would be the same as assert, but allowed only in tests, at the top level, nowhere else. It would be present even in release mode.
    b) the compiler should keep asserts in tests (at the top level) intact, even in release mode

  3. There could be asserts "just to be sure", placed to notify the user that something impossible did actually happen. E.g.:

       if (something-impossible) {
          assert(false); // cannot happen
          return -1;
       }
    

    However, when one does truly defensive testing, such situations should be arranged and tested.
    But how do we distinguish a situation we intentionally provoked from an unexpected error? There are two options:

    a) If assert(false) fires, it would check whether a test is running. If it is, then it would assume it was triggered intentionally, and it would not show the error. If a test is not running, an impossible bug was spotted; show it.
    b) there could be a special form of assert, e.g. impossible_assert(). If invoked during a test, it does nothing; outside of a test it shows an error. (A minimal sketch of this option follows below.)
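A minimal C sketch of option b, assuming the test runner maintains a test_running flag (both names are hypothetical):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

extern bool test_running;   /* set and cleared by the (hypothetical) test runner */

/* Silent while a test deliberately provokes the "impossible" branch;
   a hard error anywhere else. */
#define impossible_assert(cond)                                           \
    do {                                                                  \
        if (!(cond) && !test_running) {                                   \
            fprintf(stderr, "%s:%d: impossible condition reached: %s\n",  \
                    __FILE__, __LINE__, #cond);                           \
            abort();                                                      \
        }                                                                 \
    } while (0)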


15] The language should standardize how to distinguish between debug and release modes, and between tests compiled in and tests not present. It should avoid the C/C++ mess of NDEBUG, DEBUG, _DEBUG and other inconsistent inventions.


16] Mocking support. The biggest feature I can imagine: the ability to replace specified functions for the duration of a test:

TEST()
{
  mock fopen = ... // fopen dummy reimplementation
  mock fclose = ...

   // code doing a lot of fopen/fclose, but using the mocks instead

}

It could be implemented using function pointers. In release mode without tests there would be no performance penalty at all.
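A sketch of that function-pointer approach in plain C (names hypothetical); the real feature would presumably hide the indirection behind the mock syntax and restore the original automatically when the test ends:

#include <stdio.h>

typedef FILE *(*fopen_fn)(const char *path, const char *mode);

/* Production code calls current_fopen(...) instead of fopen(...) directly. */
static fopen_fn current_fopen = fopen;

static FILE *fake_fopen(const char *path, const char *mode)
{
    (void)path; (void)mode;
    return tmpfile();            /* dummy reimplementation for the test */
}

void test_code_that_opens_files(void)
{
    fopen_fn saved = current_fopen;
    current_fopen = fake_fopen;  /* mock fopen for the duration of the test */

    /* ... code doing lots of fopen/fclose, now hitting the mock ... */

    current_fopen = saved;       /* restore the real implementation */
}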


17] Mocking support could be extended even to constants, e.g. a timeout constant, to keep test duration down.

It could also be implemented using function pointers. The constant would become a variable accessed by calling that function pointer.


18] Support for white box testing. Another potentially huge feature.

The problem: I want to be sure that this code invokes exactly 2 allocations and 2 deallocations and does NOT invoke any TCP/IP calls.

This could be handled by making the code very complicated, or by using "tracing" feature proposed here:
http://akkartik.name/post/tracing-tests

Basically, it could work this way:

  1. Within a test you specify your intent, to watch for certain "traces". Here it would be "allocate", "deallocate", "socket", etc.
  2. If such a trace happens (either it is logged explicitly somewhere in the code, or implicitly, e.g. on a function call), then the trace would be stored somewhere.
  3. When the test ends, the test goes through the stored traces. It makes sure there are 2 "allocate", 2 "deallocate", no "socket" etc. It could check their proper order, and it could check parameters (like bytes allocated).

E.g.

TEST()
{
  // I'm interested in these calls - record them
  register-trace("allocate", "deallocate", "socket"); 

   ...
   ... code to be tested
   ...

  // now I do check whether my expectations were correct
  assert(check-for-trace("socket") == false); // no TCP/IP?

  // 2 allocations and then 2 deallocations?
  assert(check-for-trace("allocate"));
  assert(check-for-trace("allocate"));
  assert(check-for-trace("allocate") == false); // not 3

  assert(check-for-trace("deallocate"));
  assert(check-for-trace("deallocate"));
  assert(check-for-trace("deallocate") == false);

  // all collected traces would be wiped out at the end of the test
}

A trace could be created explicitly:

void* allocate(uint bytes)
{
   ...
   TRACE("allocated %u bytes", bytes);
   return p;
}

or implicitly, by the compiler inserting a trace when a function is called. The compiler would know (by checking the "register-trace" calls within all tests) which functions could possibly be checked and which could not.

void* allocate(uint bytes)
{
   ...
   TRACE("allocate()"); // inserted implicitly by the compiler
   ...
}

Implicit traces reduce clutter in the code.

The compiler should check the traced strings to make sure there's no typo, and should really insert the implicit traces (to avoid unneeded clutter in the code).

The whole mechanism should be well optimized, so that it doesn't slow down the tests too much and keeps trace data compact.

When tests are not running, no traces would be collected. There would be only a small impact on performance.

When tests are not compiled in, there would be no traces and no impact at all on the performance.
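A minimal sketch of the trace store in C (the proposal's hyphenated names rendered as C identifiers; formatted trace messages and the register-trace filtering are omitted for brevity):

#include <stdbool.h>
#include <string.h>

#define MAX_TRACES 1024

static const char *traces[MAX_TRACES];
static bool consumed[MAX_TRACES];
static int trace_count;

void TRACE(const char *tag)            /* explicit (or compiler-inserted) trace point */
{
    if (trace_count < MAX_TRACES)
        traces[trace_count++] = tag;
}

bool check_for_trace(const char *tag)  /* consumes the next matching trace */
{
    for (int i = 0; i < trace_count; i++) {
        if (!consumed[i] && strcmp(traces[i], tag) == 0) {
            consumed[i] = true;
            return true;
        }
    }
    return false;
}

void clear_traces(void)                /* wipe all traces at the end of a test */
{
    trace_count = 0;
    memset(consumed, 0, sizeof consumed);
}

Because check_for_trace consumes one matching entry per call, calling it repeatedly counts occurrences, which is exactly how the "2 allocations, not 3" checks in the test above work.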


19] Tests should run strictly serially. Parallel execution is virtually impossible (every single piece of code would need to be guarded against multithreading bugs). If something multithreaded has to be tested, the test should spawn such threads itself.

Running only the recent tests automatically whenever the program is started takes little time and guards against bugs well.

I routinely do:

int main(void) {
#ifdef DEBUG
  (void)run-recent-tests(
    recent_in_seconds: 60 * 60 /* hour */,
    show_results_dialog_on_success: false);
#endif
  ...
}

If someone really, really needs parallel execution of tests, he could write his own test runner and blame himself for the problems.


20] Having a failing test and then running the remaining ones should not be supported. It's a misfeature.

If someone really, really needs this, he could write his own test runner, plus probably overload the assert.


21] There could be support to also invoke the tests from foreign C code, i.e. to allow invocation of the tests from C code. This feels to me like a trivial thing.


22] Tests are expected to run at the start of main. However, nothing stops one from invoking some of the tests at any moment, even interactively. This may help to deal with Heisenbugs.

Tests checking automatically against leaks would make this adventure less risky.

Multithreaded applications would need to protect the tests invoked in the middle of an application run, but this is unavoidable.


23] People who do not like tests wouldn't be forced to use them; those who want their own unique testing framework could implement one. The ability to overload or overwrite assert would be desirable in this case.


AFAIK no language known to me supports all these features, not even half of them. Most only pay lip service to testing; a few stopped in the middle (e.g. the D language allows easy definition of tests, but supports neither test parameters nor custom test runners).

Some of the proposed features were implemented in C++ and even in C. It is possible to write:

#ifdef DEBUG
TEST()
{
  assert(2 + 2 == 4);
}

TEST()
{
  assert(2 + 2 == 5);
}
#endif

It was also possible to implement whitebox testing, but w/o compiler support it gets too clumsy.

@emlai
Member

emlai commented Apr 11, 2020

1]

one doesn't need to invent a unique name for a test.

I agree. So something like test() {} would be good for that. I think lowercase is more suitable, to keep it consistent with the rest of the language. Syntax highlighting / editor support can be used to make it easier to spot.

2] I think we can use named arguments to implement optional test parameters. For example:

test(timeout: 10.milliseconds)
{
  ...
}

test("test with a name", timeout: ...) 
{
  ...
}

3] 4] I don't have any strong opinions about these at the moment.

5] I think we want a delta test command to allow running the tests without running the whole program.

The plan is to allow specifying multiple compiler commands one after another. This allows e.g. delta test run which would run the tests and then run the program if the tests were successful. (By itself delta run would just run the program.)

6] I agree that the placement of the tests should be up to the programmer.

7] Yes, tests should definitely be able to access private members.

8] Tests for uncompilable code: I agree this would be nice to have. The tests could possibly even check that the error message is what's expected.

9] Actually, you don't need to import things from the same project even in normal code. So tests would be able to do this as well.

10] 11] 12] 13] Yes to all.

20]

Having a failing test and then running the remaining ones should not be supported.

I think this is useful for finding out if a change broke just a few tests, or if it broke all of them. Although it should also be possible to exit after the first N errors. So there should be a way to easily toggle between these two behaviors (and a way to set the error limit N).

So many bullet points :) I'll check the remaining ones more closely later.

@PavelVozenilek
Author

9] Actually, you don't need to import things from the same project even in normal code. So tests would be able to do this as well.

That would be a very handy feature. A litany of #includes at the beginning of a source file is a worthless nuisance.


20] Having a failing test and then running the remaining ones should not be supported.

I think this is useful for finding out if a change broke just a few tests, or if it broke all of them. Although it should also be possible to exit after the first N errors. So there should be a way to easily toggle between these two behaviors (and a way to set the error limit N).

Two problems I see:

  1. How to signal that a test failed? The language doesn't have exceptions (at least I hope so). Probably assert would need to set a flag, and test runners would need to deal with such a flag.
  2. What if an assert somewhere deep down fails? Who is going to restore the system into a reasonably stable state, so the tests could continue? C++ is able to do this, but only at the very high cost of exception safety.

3] 4] I don't have any strong opinions about these [custom test runners] at the moment.

Being able to add a customized test runner is like having a universal toolset right in the language.

(E.g. for checking for performance regressions, without external tools and scripts. I remember one guy who, for his PhD, wrote an advanced performance regression checking tool for the Linux kernel, in Perl. It was not used: too much of a nuisance, a maintenance headache, instinctive avoidance of 3rd party tools.)

I once got curious whether one can implement a test runner in D. I asked:
https://forum.dlang.org/post/vblsrdsqbaxthuomikqa@forum.dlang.org

and got a relevant answer, pointing to this library:
https://gitlab.com/AntonMeep/silly/-/blob/master/silly.d

When I looked inside the code, I was scared. An external build tool is needed, plus heavy metaprogramming wizardry, just to run some tests.


I remembered one more feature, potentially useful for very heavy testing. (Something like this was discussed for the Zig language.) It may look like:

TEST()
{
   set-up-code

   TEST()
   {
     test1-code
   }

   TEST()
   {
     test2-code
   }

   tear-down-code
}

and this would be transformed into:

TEST()
{
   set-up-code
   test1-code
   tear-down-code
}

TEST()
{
   set-up-code
   test2-code
   tear-down-code
}

@emlai
Member

emlai commented Apr 15, 2020

How to signal that a test failed?

So just to clarify, in my mind a test suite consists of one or more "tests", each of which contains one or more "asserts".

Currently a failing assert in normal (non-test) code just terminates the process. So to keep the same behavior also in tests, the test runner could run each test as a separate process and then check their exit status to know if they succeeded.

Or use a flag as you mentioned to avoid spawning so many processes. Or keep a record of which test is currently running, and if the process exits, continue from the next one.

There are many possibilities, and being able to run the tests in parallel would also be good to take into consideration.

What if an assert somewhere deep down fails?

I think it would be best to stop the test after the first failing assert, to not have to deal with this. It's probably reasonable to assume that every test can be run standalone, and doesn't depend on running other tests first.


Being able to add a customized test runner is like having a universal toolset right in the language.

Yeah I think it wouldn't be too hard to implement. I could imagine having a library function that can be called in the test runner main to return a list of the tests, and then do what you want with them. And then another library function to submit the test results for reporting (except when you want to report them yourself). And/or letting you register a callback function that's called for each test, allowing you to do performance timing or whatever. But I haven't thought about it deeply yet, so I'm not 100% sure if it's really that simple in practice.
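To make that concrete, a rough sketch of what such a library interface could look like, written as C declarations for brevity; every name here is hypothetical, since no API was decided in this thread:

#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;        /* optional; NULL for nameless tests */
    const char *file;
    int         line;
    void      (*run)(void);
} TestInfo;

/* Returns the compiled-in tests so a custom runner can pick among them. */
size_t get_all_tests(TestInfo **out_tests);

/* Hands a result back for default reporting, unless the runner reports it itself. */
void submit_test_result(const TestInfo *test, bool passed, double seconds);

/* Optional per-test hooks, e.g. for performance timing. */
void set_test_callbacks(void (*before)(const TestInfo *),
                        void (*after)(const TestInfo *, bool passed));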


Defining common set-up and tear-down code for multiple tests is a common testing framework feature, which I think is useful to have.

@emlai emlai added the compiler label Apr 15, 2020
@PavelVozenilek
Author

So just to clarify, in my mind a test suite consists of one or more "tests", each of which contains one or more "asserts".

Yes.

Currently a failing assert in normal (non-test) code just terminates the process.

Yes.

So to keep the same behavior also in tests, the test runner could run each test as a separate process and then check their exit status to know if they succeeded.

This is IMO not needed. A test which does not assert is supposed not to screw up the program state. The next test can then be run safely.

If an assert fails within a test, that is the end of the program, an unrecoverable error.


One more feature I forgot to mention: running tests in random order. This could, in theory, discover interdependence bugs (a test changing program state, another test failing because of this).

I once implemented random execution of tests, but it was more of a nuisance than a help (I never managed to find a bug this way). The seed for the randomizer should be recorded somewhere, to allow reproducibility of a found problem.
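For reference, a small C sketch of reproducible random ordering, where the seed is printed (and can be passed back in) so a failing order can be replayed:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef void (*test_fn)(void);

void run_tests_shuffled(test_fn *tests, int count, unsigned int seed)
{
    if (seed == 0)
        seed = (unsigned int)time(NULL);
    printf("test order seed: %u\n", seed);    /* record for reproducibility */
    srand(seed);

    for (int i = count - 1; i > 0; i--) {     /* Fisher-Yates shuffle */
        int j = rand() % (i + 1);
        test_fn tmp = tests[i];
        tests[i] = tests[j];
        tests[j] = tmp;
    }
    for (int i = 0; i < count; i++)
        tests[i]();
}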


Running tests as separate programs is possible, with a custom test runner, but I never needed this for my purposes.

What if an assert somewhere deep down fails?

I think it would be best to stop the test after the first failing assert, to not have to deal with this. It's probably reasonable to assume that every test can be run standalone, and doesn't depend on running other tests first.

Yes. The idea is that each test is a de-facto standalone program, independent of anything else.

Being able to add a customized test runner is like having a universal toolset right in the language.

Yeah I think it wouldn't be too hard to implement. I could imagine having a library function that can be called in the test runner main to return a list of the tests, and then do what you want with them. And then another library function to submit the test results for reporting (except when you want to report them yourself). And/or letting you register a callback function that's called for each test, allowing you to do performance timing or whatever. But I haven't thought about it deeply yet, so I'm not 100% sure if it's really that simple in practice.

It is not very hard. It is possible to implement this in C++, in C and in Nim.

A Nim implementation of a (simple) test library took just a dozen lines:
https://forum.nim-lang.org/t/653

@pushqrdx

pushqrdx commented Apr 17, 2020

What about a concise integrated test syntax like Zig's? The compiler executes all the test blocks, so a block can fail in any arbitrary way and be reported with a nice message. The std has helper functions like expect and assert that can be used too:

test "integer overflow at compile time" {
    const x: u8 = 255;
    const y = x + 1;

    assert(false);
    expect(true);
}

@PavelVozenilek
Author

Zig's test system is rather primitive.

  • It has no way to implement a custom test runner, or to select which tests to run. All or none are the only options. Not good for programs with hundreds/thousands of tests. It also produces a one-line log for every passed test, with no way to switch it off or customize it.
  • Zig doesn't allow tests to have parameters, crippling the versatility of this tool.
  • I suspect Zig code doesn't do proper recovery from a failed test (like cleaning up resources). This could affect the state of the system and result in false bugs elsewhere.

Long ago, I suggested better test support for Zig (ziglang/zig#128 and others), but not much, if anything, was implemented.

D's tests are better than Zig's (the language allows for cleanup after failure, but it complicates the code a lot). Nim can probably implement many of the proposed features, but the language is very hard to grok.

@igotfr

igotfr commented Jun 10, 2021

1]

one doesn't need to invent a unique name for a test.

I agree. So something like test() {} would be good for that. I think lowercase is more suitable, to keep it consistent with the rest of the language. Syntax highlighting / editor support can be used to make it easier to spot.

2] I think we can use named arguments to implement optional test parameters. For example:

test(timeout: 10.milliseconds)
{
  ...
}

test("test with a name", timeout: ...) 
{
  ...
}

In addition to a description, a test needs a name in order to execute individual unit tests. There could be 2 alternatives:

test test_name("test with a name", timeout: ...) {}

or

test (name: test_name, "test with a name", timeout: ...) {}

In the terminal:

# zig test file.zig -t test_name other_test
