Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can table nodes really contain scalar cell values? #285

Open
uuf6429 opened this issue Feb 10, 2025 · 7 comments
Open

Can table nodes really contain scalar cell values? #285

uuf6429 opened this issue Feb 10, 2025 · 7 comments

Comments

@uuf6429
Copy link
Contributor

uuf6429 commented Feb 10, 2025

If we look at the following code:

if (!is_scalar($string)) {
it gives the impression that we can have tables with scalar values, even though AFAIK we parse a table from a string without any type conversion (therefore can only have string cells).

Why is this significant? While implementing PHPStan, I started with scalar cell values, but later on this becomes more and more ambiguous.

For example:

  1. We calculate the cell length just two lines below:
    $this->maxLineLength[$column] = max($this->maxLineLength[$column], mb_strlen($string, 'utf8'));
    Assuming the type gets converted, the length of, for instance false is 0, because PHP casts it to an empty string (instead of '0').
  2. Elsewhere we replace tokens with str_replace() - which produces some odd results: https://3v4l.org/QTtAK
    $table[$line][$col] = $this->replaceTextTokens($table[$line][$col]);

So, should I...

  1. keep scalar and fix/improve the strange usages (e.g. explicit typecast in str_replace)
  2. OR should I replace is_scalar with is_string?

In any case, how much of a BC break is it?

@uuf6429
Copy link
Contributor Author

uuf6429 commented Feb 11, 2025

Seems to relate to #121 - it doesn't explain or provide a good reason why scalar is allowed (other than tests passing numbers). Also slightly relates to #134, which was closed as duplicate (even though I can't find the duplicate).

@acoulton
Copy link
Contributor

So I think this depends on how you view the TableNode class.

  • If it is something that is only created by parsing Gherkin, then you're right, the values can only be strings, and tightening that (and fixing the unit tests that pass ints) would be OK
  • If it is something that end-users should also be able to create, then tightening it up would indeed be a BC break.

I don't think the project has ever marked TableNode as an internal / consume-only class, so I think we have to keep supporting the second case.

I know that in our projects we do create TableNode - for example because we have a library for comparing two TableNodes (e.g. one from a scenario, one built in our contexts from the actual result). That lets us do things like ignore column sequence / columns that aren't specified in the expectation for that specific step / etc, and it outputs any failures in a table format. I realise we could do that with plain arrays, but that's not the way we went back in 2015... For these, we generally have to cast to string before building the table to pass the assertions, so the impact of the BC break would be small (there are doubtless some columns we've missed).

We also sometimes create TableNode as an efficient/effective way to render tabular data for failure / debug messages. This would be more of a BC issue for us (we're fine with the current behaviour where false becomes a zero-width string).

I think from bug reports / comments I've seen over the years, there are other people creating tables themselves too.

However I would actually be in favour of loosening the requirement to is_scalar || is_null as per #134 / #135 and #130 (which is the duplicate of the first pair, and was rejected only considering how Gherkin itself parses tables). I think it would be useful to have the flexibility to include them when creating a table as an end-user (based on raw values, or e.g. by transforming a ~~NULL~~ placeholder string), without having to do that after converting the table to an array.

@stof
Copy link
Member

stof commented Feb 12, 2025

I would be in favor of deprecating passing non-string values in TableNode (deprecating only, so that it is not a BC break yet) and casting them to string early, so that the implementation after that only deals with strings.

@acoulton
Copy link
Contributor

I guess I just feel there are valid cases for using TableNode to hold and work with any (or maybe any stringable) table-like structure of key=>value data, either as rows or columns, and the stringification is only really a concern at the point of formatting the data back to a table string. In effect maxLineLength is being calculated earlier than it needs to be - in many cases the table will never be rendered as a string and the property never used. I'm not sure where the str_replace @uuf6429 mentions happens - I can't find it?

However I recognise using it as a general data structure is beyond the scope of it purely being a representation of a Gherkin table, which is always strings. So I'm happy to be outvoted.

Bear in mind that adding a runtime deprecation (e.g. trigger_error) would effectively be a BC break, at least as far as Behat is concerned, given by default Behat will fail on E_DEPRECATED. However I'd have no objection to deprecating it with a comment / stricter phpdoc.

@uuf6429
Copy link
Contributor Author

uuf6429 commented Feb 13, 2025

I'm not sure where the str_replace @uuf6429 mentions happens - I can't find it?

Sorry, I forgot to link that, here's how:


I'm fine either way. I've been thinking maybe I could use phpstan generics, I'll see if I can set up an example in the next few days.

@acoulton
Copy link
Contributor

Ah OK. So IMO the usage by ExampleNode is specific to a table that came from Gherkin, and therefore we know it will definitely only be strings in that replaceTextTokens method.

I appreciate that knowledge may be too complex to represent for phpstan, and maybe having the strict type safety for ourselves is more important than supporting the generic usecase for other users. But if it was possible to do both, personally I think that'd be good.

@uuf6429
Copy link
Contributor Author

uuf6429 commented Feb 14, 2025

Here's how that can be determined by the class type - I would need to change the class hierarchy a bit though, but it should be fine in terms of BC:
https://phpstan.org/r/195d867a-523e-47b2-b62b-7f7c7bc85b70

Alternatively, something similar can be done purely in PHPDoc - this approach might make it more complicated to actually check cell values as we do now:
https://phpstan.org/r/61f61832-64a4-47b4-b8e3-4dcebc43549a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants