Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLSelect regression with subqueries #11276

Closed
2 tasks done
marwan38 opened this issue Jun 12, 2024 · 16 comments
Closed
2 tasks done

SQLSelect regression with subqueries #11276

marwan38 opened this issue Jun 12, 2024 · 16 comments

Comments

@marwan38
Copy link

marwan38 commented Jun 12, 2024

Module version(s) affected

5

Description

Regression within the SQLSelect class causes sub queries to be duplicated. This can be very easily reproduced in a blank silverstripe project.

How to reproduce

The regression itself:

Creating the select statement

$sql = SQLSelect::create('*', '( SELECT DISTINCT "SiteTree"."ClassName" FROM "SiteTree" ) as FINAL');

Output

SELECT *
 FROM ( SELECT DISTINCT "SiteTree"."ClassName" FROM "SiteTree" ) as FINAL AS "( SELECT DISTINCT
SiteTree.ClassName FROM SiteTree ) as FINAL" 

Note that the subquery is being duplicated.

Possible Solution

no response

Additional Context

I noticed that the issue cannot be reproduced when double quotations (") are left out of the query. Following the same reproduction steps WITHOUT the double quotations results in the correct output:

$sql = SQLSelect::create('*', '( SELECT DISTINCT SiteTree.ClassName FROM SiteTree ) as FINAL');

Output

SELECT *
 FROM ( SELECT DISTINCT SiteTree.ClassName FROM SiteTree ) as FINAL 

The reproduction code is an MVP, our actual use case is a little bit more complex with additional where statements, and froms. This worked just fine in silverstripe v4. A closer reproduction would be something like

$siteTree = SiteTree::get();
$query = $siteTree->dataQuery()->sql();
$finalQuery = SQLSelect::create('*', '(' . $query . ') as FINAL');
Debug::dump($finalQuery->sql());

Output (Note the duplication after the aliasing "AS FINAL AS "...."):

SELECT *
 FROM (SELECT DISTINCT "SiteTree_Live"."ClassName", "SiteTree_Live"."LastEdited",
"SiteTree_Live"."Created", "SiteTree_Live"."CanViewType", "SiteTree_Live"."CanEditType",
"SiteTree_Live"."Version", "SiteTree_Live"."URLSegment", "SiteTree_Live"."Title",
"SiteTree_Live"."MenuTitle", "SiteTree_Live"."Content", "SiteTree_Live"."MetaDescription",
"SiteTree_Live"."ExtraMeta", "SiteTree_Live"."ShowInMenus", "SiteTree_Live"."ShowInSearch",
"SiteTree_Live"."Sort", "SiteTree_Live"."HasBrokenFile", "SiteTree_Live"."HasBrokenLink",
"SiteTree_Live"."ReportClass", "SiteTree_Live"."ParentID", "SiteTree_Live"."ID", 
			CASE WHEN "SiteTree_Live"."ClassName" IS NOT NULL THEN "SiteTree_Live"."ClassName"
			ELSE 'SilverStripe\\CMS\\Model\\SiteTree' END AS "RecordClassName"
 FROM "SiteTree_Live" 
 ORDER BY "SiteTree_Live"."Sort" ASC) as FINAL AS "(SELECT DISTINCT SiteTree_Live.ClassName,
SiteTree_Live.LastEdited, SiteTree_Live.Created, SiteTree_Live.CanViewType,
SiteTree_Live.CanEditType, SiteTree_Live.Version, SiteTree_Live.URLSegment, SiteTree_Live.Title,
SiteTree_Live.MenuTitle, SiteTree_Live.Content, SiteTree_Live.MetaDescription,
SiteTree_Live.ExtraMeta, SiteTree_Live.ShowInMenus, SiteTree_Live.ShowInSearch, SiteTree_Live.Sort,
SiteTree_Live.HasBrokenFile, SiteTree_Live.HasBrokenLink, SiteTree_Live.ReportClass,
SiteTree_Live.ParentID, SiteTree_Live.ID, 
			CASE WHEN SiteTree_Live.ClassName IS NOT NULL THEN SiteTree_Live.ClassName
			ELSE 'SilverStripe\\CMS\\Model\\SiteTree' END AS RecordClassName
 FROM SiteTree_Live 
 ORDER BY SiteTree_Live.Sort ASC) as FINAL" 

Changing the alias to be resolved as the array index, like

SQLSelect::create('*', ['Final' => '(' . $siteTreeSql . ')']);

resolves the same way.

Validations

  • Check that there isn't already an issue that reports the same bug
  • Double check that your reproduction steps work in a fresh installation of silverstripe/installer (with any code examples you've provided)

Pull requests

@GuySartorelli
Copy link
Member

You say this is a regression - what is the latest version you saw this working in?

@marwan38
Copy link
Author

You say this is a regression - what is the latest version you saw this working in?

We're on version 4.13.42 of the framework I believe. Upgrading straight to 5.2

@michalkleiner
Copy link
Contributor

michalkleiner commented Jun 12, 2024

@marwan38 the example SQL ($sql = SQLSelect::create('*', '( SELECT DISTINCT "SiteTree"."ClassName" FROM "SiteTree" ) as FINAL');) is the same in both how to reproduce and possible solution. It won't fix the issue, but maybe will give more context on what you found out so far. Perhaps should be different?

@marwan38
Copy link
Author

marwan38 commented Jun 12, 2024

@michalkleiner

It's mentioned just above the example,.. my mistake I've updated it. An important note right below is that it's not exactly a "possible solution" but rather something I discovered which may help debug the issue.

@GuySartorelli
Copy link
Member

I'm going to move that into the "additional context" section then - since that's what you're saying it is.

@maxime-rainville
Copy link
Contributor

I've been asked to look into fixing this at my new job. So I'll have glance to see how difficult - or not difficult it is.

@maxime-rainville
Copy link
Contributor

I'm making a start on this.

First, I just replicated the behaviour. Adding this unit test to vendor/silverstripe/framework/tests/php/ORM/SQLSelectTest.php replicates the issue.

public function testSubqueries()
{
    $query = new SQLSelect('*', '( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO" ) as FINAL');

   $this->assertSQLEquals(
       'SELECT * FROM ( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO" ) as FINAL',
       $query->sql()
   );

}

The test passes on 4.13.x-dev. On 5.2.x-dev the test fails with:

1) SilverStripe\ORM\Tests\SQLSelectTest::testSubqueries
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'SELECT * FROM ( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO" ) as FINAL'
+'SELECT * FROM ( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO" ) as FINAL AS "( SELECT DISTINCT SQLSelectTest_DO.ClassName FROM SQLSelectTest_DO ) as FINAL"'

/var/www/html/vendor/silverstripe/framework/src/Dev/SapphireTest.php:808
/var/www/html/vendor/silverstripe/framework/tests/php/ORM/SQLSelectTest.php:1335
/var/www/html/vendor/bin/phpunit:122

@maxime-rainville
Copy link
Contributor

Here's what I found so far.

When did the issue get introduced

This behaviour was introduced by this PR: #10462

From the unit test that got added, it looks like we were trying to resolved an issue where someone would define an alias in the subquery's SQL and in the array key. Doesn't look like this was meant to affect the scenario where a SQLSelect is created with a from provided as string.

Why this is happening

Let's say you instanciate an SQLSelect this way: new SQLSelect('*', '( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO" ) as FINAL');,

The $from parameter can be provided as a string or an array. If you provide an associative array, the keys will be considered as the alias.

The problem is that if you provide a string, the key is assumed to be equal to the string. Presumably, this behaviour is there so that "MyTableName" is automatically aliased to itself. For our example, our $from parameter gets converted to:

[
    "'( SELECT DISTINCT \"SQLSelectTest_DO\".\"ClassName\" FROM \"SQLSelectTest_DO\" ) as FINAL')" =>
         "'( SELECT DISTINCT \"SQLSelectTest_DO\".\"ClassName\" FROM \"SQLSelectTest_DO\" ) as FINAL')"
]

That wasn't a problem in CMS 4 because the key was ignored if the subquery already contained an AS statement.

Bypassing the problem

If you wrap the sub query string in an array, it's happy because SQLSelect doesn't try to generate a matching key.

In the example below, all the scenarios except no-alias-string pass.

    public function subqueryProvider()
    {
        return [
            'no-alias-string' => ['( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO") AS "FINAL"'],
            'no-alias-array' => [['( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO") AS "FINAL"']],
            'no-alias-array-numeric-key' => [[0 => '( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO") AS "FINAL"']],
            'explicit-alias-string' => [['FINAL' => '( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO")']],
        ];
    }

    /**
     * @dataProvider subqueryProvider
     */
    public function testSubqueries($subquery)
    {
        // var_dump($subquery);
        $query = new SQLSelect('*', $subquery);

        $actualSQL = $query->sql();

        $this->assertSQLEquals(
            'SELECT * FROM ( SELECT DISTINCT "SQLSelectTest_DO"."ClassName" FROM "SQLSelectTest_DO") AS "FINAL"',
            $actualSQL
        );
    }

Questions or Possible solution

  • Do we think the current behaviour is actually the correct one?
    • I don't think so. Seems really weird to implicitly assume aliases in this scenario

Assuming we agree that this is behaviour is a bug, then the next step would be to decide how to fix it:

  1. We could revert Rescue Master Branch PR: Fix SQLConditionalExpression::getJoins so it always adds explicit aliases #10462.
  2. We update SQLSelect (and other related SQLExpression) to not implicitly add alias when the $from parameter is provided as a string.

I'm leaning towards option 2, but that may have a lot of other implications, which I'm not aware of.

@GuySartorelli
Copy link
Member

GuySartorelli commented Jul 24, 2024

Great investigation.

Yup sounds like a bug to me. IMO reverting isn't really viable, so some variation of option 2 sounds good. Haven't fully wrapped my head around it so I'm not sure if there are edge cases where that'll cause a new problem, but it still sounds like the better way forward even with that possibility.

@maxime-rainville
Copy link
Contributor

I'm pretty sure it's coming from this line:

$this->from[str_replace(['"','`'], '', $from)] = $from;

I'll try hacking the line and I'll throw the kitchen sink at it. We'll see what breaks.

@maxime-rainville
Copy link
Contributor

Here's my initial results. Just documenting here so I remember where I got up to.

Try number 1

First I tried doing this change to SQLConditionExpression:

-            $this->from[str_replace(['"','`'], '', $from)] = $from;
+            $this->from[] = $from;

My assumption was that the alias wouldn't matter because it would be equal to the table name anyway.

However, Versioned didn't like that. Versioned is apparently very concerned about those table aliases.

This is the specific Versioned bit that appear to trip up my test, but there's other places in Versioned that do stuff to the table aliases.

Not obviously clear to me why Versioned works this way.

Try number two

For try number 2, decided to retain the current logic for $from that looked like plain table names.

Other $from are pushed on top of the array without an alias. That seem like a fair choice since those table aliases will be non-sense anyway.

            $from = trim($from);
            if (preg_match('/^"?([a-zA-Z0-9_]+)"?$/', $from)) {
                $this->from[str_replace(['"','`'], '', $from)] = $from;
            } else {
                $this->from[] = $from;
            }

I ran builds on those changes:

@michalkleiner
Copy link
Contributor

Could the second approach open up some SQL injection vectors if we use it as it is?

@maxime-rainville
Copy link
Contributor

@michalkleiner I don't think so ... or more accurately, it's not introducing any SQL Injection vector that wouldn't be there already.

SQLConditionalExpression::addFrom() is a low level API. Very few developers will be interacting with this method directly.

setFrom and addFrom make no effort to validate their inputs. If you're interacting with SQLConditionalExpression or its descendants, you are more or less writing raw SQL. I think it's fair to put it back on the developer to sanitised their inputs in this case.

@maxime-rainville
Copy link
Contributor

maxime-rainville commented Jul 25, 2024

My kitchen-sink build mostly passes.

The PHP 8.3 fails because I had to lock to PHP 8.1, so that's expected.

There's a few Behat failures, but those are on the regular 5.2 kitchen-sink as well ... apparently some people aren't keeping up on top of their build dashboard since their boss quit on them 😜.

I'll do a PR with my changes.

@maxime-rainville
Copy link
Contributor

Pull request ready for review: #11312

@GuySartorelli GuySartorelli self-assigned this Jul 28, 2024
@GuySartorelli
Copy link
Member

PR merged. Will be automatically tagged by GitHub Actions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants