Skip to content

feat(source-linkedin-ads): Remove custom cursors and retrievers from analytics streams to enable concurrent processing #58114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

brianjlai
Copy link
Contributor

https://github.com/airbytehq/airbyte-internal-issues/issues/12144

What

We have 11 analytics streams that used to implement CustomRetriever and CustomIncrementalSync components. Because they use a custom cursor, we forced this to use the synchronous engine because custom cursors aren't compatible with the concurrent framework.

With the introduction of the property chunking feature introduced in the CDK in version 6.45.0, the custom component functionality now exists in the low-code framework without the need for custom cursors.

How

All of the 11 analytics streams follow roughly the same pattern so each change should apply to the others

  • Update the dateRange query parameter to read from the current slice to get year/month/day. The custom components has custom behavior to inject start.year into slices. Slices actually already have the correct information using jinja datetime functions like .year, .month
  • Move the transformers from being defined in $parameters and instead the idiomatic place under DeclarativeStream.transformations. We originaly did this because we did not have a good way to inject the transformation into our custom retriever. However, now that we don't have a custom retriever, we can define them as we do for other low-code connectors
  • Replace the custom component definitions with default implementations
  • Define QueryProperties component and use it in all analytics streams

User Impact

None

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link

vercel bot commented Apr 17, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 18, 2025 8:29pm

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one question on something that I don't think is a problem and I would like to learn about

Also, can you compare speed before and after this change? It can be just one stream as a sample but it would be interesting to have a high level estimate of how much it helped that we can share to support

property_chunking:
type: PropertyChunking
property_limit_type: property_count
property_limit: 18
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the property_limit 18 because of dateRange and pivotValue which are not included in the count to 20?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's correct. the actual max is 20, but we must include those two values in every request in order to merge records together correctly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like it would have been a better UX to have the property chunking consider the values that are always present so that the limit aligns with the API restrictions. For property_count, it is probably fine but for number of bytes, it becomes a bit more annoying to calculate. Overall, it is probably a nit but still wanted to raise my concern here

@brianjlai
Copy link
Contributor Author

Passing regression test run:
https://github.com/airbytehq/airbyte/actions/runs/14526014907

linkedin_results

@brianjlai
Copy link
Contributor Author

regression test notes:

  • There are some differing values on one of the other test runs, but after running and inspecting some results, it seems like the mismatches might be drift between when control vs. candidate runs are kicked off. The exact counts of records are the same. I also looked at some local syncs of our test data to verify that fields exist in the final emitted record
  • And on the attached run above, there were exact correct record counts or all data matched. I am inclined to release this w/ a progressive rollout

@brianjlai brianjlai merged commit c723a2e into master Apr 18, 2025
27 checks passed
@brianjlai brianjlai deleted the brian/linkedin_ads_remove_custom_retriever_and_cursor branch April 18, 2025 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants