Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent array functions from escaping taints #9041

Open
wants to merge 7 commits into
base: 5.x
Choose a base branch
from

Conversation

mmcev106
Copy link
Collaborator

@mmcev106 mmcev106 commented Jan 2, 2023

These changes allow tainted array function arguments to flow through to return values. Don't hesitate to speak up if there are any concerns or requests. I'm hoping you'll appreciate buildDataSets() as it significantly reduces boilerplate. Once approved & merged, I plan to create similar PRs for other core functions.

@orklah
Copy link
Collaborator

orklah commented Jan 2, 2023

Looks promising :) Mind fixing CI failures?

@weirdan
Copy link
Collaborator

weirdan commented Jan 3, 2023

@orklah with new functions added to stubs without @param/@return tags, aren't there chances we're now missing some more detailed signatures from the call map?

@mmcev106
Copy link
Collaborator Author

mmcev106 commented Jan 3, 2023

I will look into both.

@orklah
Copy link
Collaborator

orklah commented Jan 3, 2023

@weirdan indeed, we need to make sure the stubs are at least as detailed as the callmap, including php versions.

I wonder if it would make sense to have a separate way to document taint flow without having to make new stubs

@mmcev106
Copy link
Collaborator Author

mmcev106 commented Jan 4, 2023

@orklah & @weirdan, it does seem odd that we have two sources of truth (callmap vs. stubs). Do we want to consider exploring that rabbit hole now? Or, would it make more sense for me to duplicate the callmap details in the stubs (in this & upcoming PRs)?

@orklah
Copy link
Collaborator

orklah commented Jan 4, 2023

Well, we have 3 sources of truth in fact. Callmaps, Stubs and TypeProviders. Each override the previous one and the complexity augment each time

  • Callmaps are a quick and easy way to document signatures en masse. The PHP versioning is built-in and it's an efficient way to process all functions in one go
  • Stubs allow more complexity through conditional returns, taint annotations and basically any annotation allowed for user code and more. There's some drawbacks though. The syntax is such a pain for complex expressions: https://github.com/vimeo/psalm/blob/master/stubs/CoreGenericFunctions.phpstub#L675, phpdoc needs to be parsed so it's way less efficient than a built-in array and PHP versioning was not considered on stubs so it's a pain (for example, you can't document that a param was not existing for a given php version unless you actually duplicate stubs and have exclusive stub files by version
  • ReturnTypeProviders/ParamTypeProviders use Psalm's internals to do basically anything, very powerful but pretty complex for basic needs

I don't think there's a lot of space for improvements here, removing any of those 3 steps would be a massive work and I don't think it would be for the best.

What bothered me for this PR is that you had to create stubs for functions that are perfectly fine in Callmaps just for tainting. This will force you to actually reproduce the infos from callmaps in the stubs and that's a downgrade

I think a new Callmap-like file with taint could be a good addition, it would allow removing some stubs in favor of a simpler and more efficient syntax

@mmcev106
Copy link
Collaborator Author

mmcev106 commented Jan 4, 2023

@orklah & @weirdan, don't feel like you need to rush to respond. I'm sure this is low priority for you.

Consolidating the three sources of truth all at once does sound intimidating. However, it may not be that difficult to lay the groundwork for a new format that could eventually consolidate all of them. The functions in this PR could be the only things there to start, and everything else could be moved there incrementally over time.

If you were designing psalm for the first time today, what would be the ideal way to represent our 3 sources of truth in a single place?

Maybe the CallMap format, with expanded support for more advanced features like so?

'array_splice' => [
  'getFunctionReturnType' => function(FunctionReturnTypeProviderEvent $event): array{
    // Insert TypeProvider implementation
  },
  'arguments' => [
    'array',
    '&rw_array' => [
      'returnType' => 'array',
      'taintFlow' => true,
      // Maybe another item to replace the odd 'rw' prefix
    ],
    'offset' => 'int',
    'length=' => '?int',
    'replacement=' =>  [
      'returnType' => 'array|string',
      'taintFlow' => true,
    ]
  ]
],

I also wonder if we're unnecessarily reinventing the wheel by trying to describe PHP behaviors in something other than PHP... Would it make more sense to go all in on the stubs feature, including analyzing implementations like we would any other code that psalm analyzes?

I'm sure I'm oversimplifying things, but dumb questions sometimes help get the wheels turning. There has to be an elegant solution if we can only imagine it, and an incremental way to transition toward it.

@mmcev106 mmcev106 changed the base branch from master to 5.x May 30, 2024 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants