Output binding only upserting columns present in the first row #888

Closed
rc-ig opened this issue Jul 20, 2023 · 5 comments
@rc-ig

rc-ig commented Jul 20, 2023

I have a Python function where I want to output multiple rows that may have different sets of columns. I am using an SQL output binding, passing .set() an SqlRowList.

The function code looks like this:

example_records = [
    {'pk': 1, 'column1': value1, 'column2': value2 },
    {'pk': 2, 'column1': value3, 'column2': value4, 'column3': value5 }
]

output_rows = func.SqlRowList([func.SqlRow.from_dict(row) for row in example_records])

output_table.set(output_rows)

Running this function produces a log like the following:

TID:31 BEGIN UpsertRowsAsync
TID:31 BEGIN OpenUpsertRowsAsyncConnection
TID:31 END OpenUpsertRowsAsyncConnection
TID:31 BEGIN GetServerTelemetryProperties Query=SELECT SERVERPROPERTY('EngineEdition'), SERVERPROPERTY('Edition')
Sending event TableInfoCacheMiss
TID:31 BEGIN RetrieveTableInformationAsync
TID:31 BEGIN GetColumnDefinitions Query="
    select
        COLUMN_NAME, DATA_TYPE +
            case
                when CHARACTER_MAXIMUM_LENGTH = -1 then '(max)'
                when CHARACTER_MAXIMUM_LENGTH <> -1 then '(' + cast(CHARACTER_MAXIMUM_LENGTH as varchar(4)) + ')'
                when DATETIME_PRECISION is not null and DATA_TYPE not in ('datetime', 'date', 'smalldatetime') then '(' + cast(DATETIME_PRECISION as varchar(1)) + ')'
                when DATA_TYPE in ('decimal', 'numeric') then '(' + cast(NUMERIC_PRECISION as varchar(9)) + ',' + + cast(NUMERIC_SCALE as varchar(9)) + ')'
                else ''
            end as COLUMN_DEFINITION
    from
        INFORMATION_SCHEMA.COLUMNS c
    where
        c.TABLE_NAME = 'destination_sql_table'
    and
        c.TABLE_SCHEMA = 'dbo'"
Sending event GetColumnDefinitions
TID:31 END GetColumnDefinitions Duration=4ms
TID:31 BEGIN GetPrimaryKeys Query="
    SELECT
        ccu.COLUMN_NAME,
        c.is_identity,
        case
            when isc.COLUMN_DEFAULT = NULL then 'false'
            else 'true'
        end as has_default
    FROM
        INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
    INNER JOIN
        INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu ON ccu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME AND ccu.TABLE_NAME = tc.TABLE_NAME
    INNER JOIN
        sys.columns c ON c.object_id = OBJECT_ID('[dbo].[destination_sql_table]') AND c.name = ccu.COLUMN_NAME
    INNER JOIN
        INFORMATION_SCHEMA.COLUMNS isc ON isc.TABLE_NAME = 'destination_sql_table' AND isc.COLUMN_NAME = ccu.COLUMN_NAME
    WHERE
        tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
    and
        tc.TABLE_NAME = 'destination_sql_table'
    and
        tc.TABLE_SCHEMA = 'dbo'"
Sending event GetPrimaryKeys
TID:31 END GetPrimaryKeys Duration=286ms
Sending event GetTableInfo
TID:31 END RetrieveTableInformationAsync Duration=291ms DB and Table: AzureSQLDB.dbo.destination_sql_table. Primary keys: [pk]. SQL Column and Definitions:  [[pk, char(4)],[column1, char(3)],[column2, char(2)],[column3, int]],Object columns: [pk, column1, column2]
TID:31 BEGIN UpsertRowsTransaction
TID:31 UpsertRowsTransactionBatch - Query=WITH cte AS ( SELECT * FROM OPENJSON(@rowData) WITH ([pk] char(4),[column2] char(2),[column1] char(3)) ) 
    MERGE INTO [dbo].[destination_sql_table] WITH (HOLDLOCK)
        AS ExistingData
    USING cte
        AS NewData
    ON
        ExistingData.[pk] = NewData.[pk]
    WHEN MATCHED THEN
        UPDATE SET  ExistingData.[pk] = NewData.[pk], ExistingData.[column2] = NewData.[column2], ExistingData.[column1] = NewData.[column1]
    WHEN NOT MATCHED THEN
        INSERT ([pk],[column2],[column1]) VALUES ([pk],[column2],[column1]);
Sending event Upsert
TID:31 END UpsertRowsTransaction Duration=34ms Upserted 36 row(s) into database: AzureSQLDB and table: dbo.destination_sql_table.
TID:31 END UpsertRowsAsync Duration=329ms

The server audit log shows that the following statement was executed:

WITH cte AS ( SELECT * FROM OPENJSON(@rowData) WITH ([pk] char(4),[column1] char(3),[column2] char(2)) ) 
MERGE INTO [dbo].[destination_sql_table] WITH (HOLDLOCK)
    AS ExistingData
USING cte
    AS NewData
ON
    ExistingData.[pk] = NewData.[pk]
WHEN MATCHED THEN
    UPDATE SET  ExistingData.[pk] = NewData.[pk], ExistingData.[column2] = NewData.[column2], ExistingData.[column1] = NewData.[column1]
WHEN NOT MATCHED THEN
    INSERT ([pk],[column2],[column1]) VALUES ([pk],[column2],[column1]);',N'@rowData nvarchar(max) ',@rowData=N'[{'pk': 1, 'column1': value1, 'column2': value2 },{'pk': 2, 'column1': value3, 'column2': value4, 'column3': value5 }]'

It seems to me that the output_table.set(output_rows) statement is only upserting the columns that are present in the first row. Is this the expected behaviour? Is there any way to upsert multiple rows with differing columns to a table in one function call?

I have also tried setting individual SqlRows inside a loop, but with that approach only the final call to .set() actually reaches the database.

@Charles-Gagnon
Contributor

Could you provide the CREATE TABLE script for your DB?

@rc-ig
Author

rc-ig commented Jul 21, 2023

Hi Charles,

The data for the function in question is confidential, so I'm using placeholder names for the columns here, and I'm only showing the first few columns of the table.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[destination_sql_table](
	[pk] [char](4) NOT NULL,
	[column1] [char](3) NULL,
	[column2] [char](2) NULL,
	[column3] [int] NULL,
) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
ALTER TABLE [dbo].[destination_sql_table] ADD  CONSTRAINT [PK_destination_sql_table] PRIMARY KEY CLUSTERED 
(
	[pk] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

I don't think there's an issue with the database columns themselves. There have been function executions in the past that have inserted values into column3 without any issues. For those executions the first row being inserted had a value present for column3, and the SQL Audit Logs show references to it in the SELECT * FROM and UPDATE SET parts of the executed statements.

For some of those working executions the full column set was not present for all rows, but when it worked it was always present in the first row.

@Charles-Gagnon
Contributor

Thanks, I just wanted to confirm that you had a nullable extra column.

I can confirm that we are indeed only looking at the first row when determining the columns to upsert - you can see the relevant code here: https://github.com/Azure/azure-functions-sql-extension/blob/main/src/SqlAsyncCollector.cs#L229

It's currently expected that all items passed in have the same set of columns - and that any optional values are still defined for every row. You can accomplish this by using None in your example:

example_records = [
    {'pk': 1, 'column1': value1, 'column2': value2, 'column3': None },
    {'pk': 2, 'column1': value3, 'column2': value4, 'column3': value5 }
]
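If the records are built dynamically and you can't hand-write the None entries, the same padding can be done generically before constructing the SqlRowList. A minimal sketch in plain Python (normalize_rows is an illustrative helper, not part of the extension):

```python
def normalize_rows(rows):
    """Ensure every row dict has the same keys, filling gaps with None.

    The SQL output binding derives the column list from the first row,
    so padding missing keys keeps later columns from being dropped.
    """
    all_keys = set()
    for row in rows:
        all_keys.update(row)
    return [{key: row.get(key) for key in sorted(all_keys)} for row in rows]

example_records = [
    {'pk': 1, 'column1': 'a', 'column2': 'bb'},
    {'pk': 2, 'column1': 'c', 'column2': 'dd', 'column3': 5},
]

normalized = normalize_rows(example_records)
# Every row now carries column3; the first row gets column3 = None,
# so the generated MERGE statement includes all four columns.
```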

Even better would be to create a class for your object - that way you can easily guarantee all the required columns are present and that optional columns have a default value. You can see an example of this in our samples with the Product class: https://github.com/Azure/azure-functions-sql-extension/blob/main/samples/samples-python/Common/product.py
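A sketch of that class-based approach using a standard-library dataclass (DestinationRow and its fields are hypothetical placeholders mirroring the table in this issue):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class DestinationRow:
    # Required primary key; optional columns default to None, so every
    # serialized row exposes the full column set - including the first row,
    # which is the one the binding inspects for column names.
    pk: int
    column1: Optional[str] = None
    column2: Optional[str] = None
    column3: Optional[int] = None

rows = [
    DestinationRow(pk=1, column1='a', column2='bb'),
    DestinationRow(pk=2, column1='c', column2='dd', column3=5),
]

# Each dict now has identical keys, ready to pass to
# func.SqlRow.from_dict(...) when building the SqlRowList.
row_dicts = [asdict(r) for r in rows]
```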

Let us know if this doesn't work for you and we can help try to find a better solution!

@Charles-Gagnon
Contributor

I just sent out a PR to update the docs to hopefully make this clearer:

#889

@rc-ig
Author

rc-ig commented Jul 24, 2023

Thanks Charles, I've adjusted my code and this seems to be working as expected.

@rc-ig rc-ig closed this as completed Jul 24, 2023