Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Investigation] Possible fanout in int_stripe__account_daily when using connected accounts #80

Open
fivetran-catfritz opened this issue May 23, 2024 · 3 comments

Comments

@fivetran-catfritz
Copy link
Contributor

fivetran-catfritz commented May 23, 2024

What to investigate

This was discovered from an investigation from error query exceeded resource limits in int_stripe__account_daily.

Upon review, this join does not join on some sort of account_id or connected_account_id. For typical cases where only one account is in use, this is no issue, however if a user is using connected accounts, this would cause a fanout since the same balance_transaction would be repeated for every account. Also, this fanout could be what is causing resource issues.

Because of internal data limitations, there is uncertainty on how to correctly address this. Ideally we want to review data from a user with connected accounts. (If that's you and would like to help us, please let us know in this thread!)

Possible Solution

For model int_stripe__account_daily, in my initial investigation I thought to update the CTE daily_account_balance_transactions with a filter like:

...
    from date_spine
    left join balance_transaction
        on cast(balance_transaction.date as date) = date_spine.date_day
        and balance_transaction.source_relation = date_spine.source_relation
        and balance_transaction.connected_account_id = 
            case when balance_transaction.connected_account_id is not null
                then date_spine.account_id
                else null end -- necessary for cases where an account is not a connected account. We don't want to erroneously filter transactions out.
    group by 1,2,3

However the issue is still that we don't have appropriate data to test if this is accurate. Just posting here, so it isn't lost.

@fivetran-catfritz fivetran-catfritz changed the title [Investigation] Possible fanout in int_stripe__account_daily [Investigation] Possible fanout in int_stripe__account_daily when using connected accounts May 23, 2024
@jjspratt
Copy link

jjspratt commented Sep 3, 2024

This might be tertiary but I have issues running this stripe__daily_overview due to connected_accounts. It will run but it takes 20-30 minutes.

It would be really convenient if there was a var to filter for a primary account_id. In our use case, the connected accounts are not relevant to our interest in the package.

@fivetran-catfritz
Copy link
Contributor Author

Hi @jjspratt thanks for commenting on this issue! Are you running this in dbt Core/Cloud or via Fivetran Quickstart? We are interested in adding this filter, however our biggest roadblock has been finding a way to automatically identify the primary account, since Quickstart does not currently support vars. Stripe does not include a flag for the primary account nor could we identify any other markers that would help. If you're not using Quickstart, then we can explore a var where you manually specify the main account.

@jjspratt
Copy link

jjspratt commented Sep 3, 2024

@fivetran-catfritz, the identifier, I believe, is that you can call accounts without specifying an account_id and it will return the primary account object. After I posted this I realized I could duplicate the account table and only insert the records I care about.

Source includes a var for table identifier. I haven't tested it yet but I feel a little silly that I hadn't thought of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants