feat(source-linkedin-ads): Remove custom cursors and retrievers from analytics streams to enable concurrent processing #58114
Conversation
LGTM! Just one question about something that I don't think is a problem but that I would like to learn about.
Also, can you compare speed before and after this change? It can be just one stream as a sample, but it would be interesting to have a high-level estimate of how much this helped that we can share with support.
property_chunking:
  type: PropertyChunking
  property_limit_type: property_count
  property_limit: 18
Is the property_limit 18 because of dateRange and pivotValue, which are not included in the count and bring the total to 20?
Yes, that's correct. The actual max is 20, but we must include those two values in every request in order to merge records together correctly.
It feels like it would have been better UX to have the property chunking consider the values that are always present, so that the limit aligns with the API restriction. For property_count it is probably fine, but for a byte-based limit it becomes more annoying to calculate. Overall it is probably a nit, but I still wanted to raise my concern here.
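For context, a minimal sketch of how the chunking configuration relates to the two always-present fields, assuming the chunked field list is passed through the `fields` request parameter; the property list and merge key shown are illustrative, not copied from the actual manifest:

```yaml
requester:
  type: HttpRequester
  request_parameters:
    fields:
      type: QueryProperties
      property_list:
        - impressions  # illustrative metric fields; the real list is much longer
        - clicks
      always_include_properties:
        - dateRange    # required on every request to merge chunked records
        - pivotValues  # required on every request to merge chunked records
      property_chunking:
        type: PropertyChunking
        property_limit_type: property_count
        property_limit: 18  # the API allows 20 fields; 2 are taken by the always-present values
        record_merge_strategy:
          type: GroupByKeyMergeStrategy
          key: ["dateRange", "pivotValues"]  # illustrative merge key
```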
Passing regression test run: [screenshot]
Regression test notes: [screenshot]
https://github.com/airbytehq/airbyte-internal-issues/issues/12144
What
We have 11 analytics streams that used to implement CustomRetriever and CustomIncrementalSync components. Because they use a custom cursor, we forced the source to use the synchronous engine, since custom cursors aren't compatible with the concurrent framework. With the property chunking feature introduced in the CDK in version 6.45.0, the custom component functionality now exists in the low-code framework without the need for custom cursors.
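With the custom cursor gone, the source can run on the concurrent engine. As a rough illustration of what opting in looks like in a declarative manifest (the ConcurrencyLevel values below are illustrative assumptions, not taken from this PR):

```yaml
concurrency_level:
  type: ConcurrencyLevel
  default_concurrency: "{{ config.get('num_workers', 10) }}"  # illustrative default
  max_concurrency: 25  # illustrative cap
```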
How
All of the 11 analytics streams follow roughly the same pattern, so each change applies to all of them (a condensed sketch of the result follows this list):
- Rework the dateRange query parameter to read year/month/day from the current slice. The custom components had custom behavior to inject start.year into slices, but slices actually already have the correct information via jinja datetime functions like .year and .month.
- Move transformations out of $parameters and into the idiomatic place under DeclarativeStream.transformations. We originally put them in $parameters because we did not have a good way to inject the transformation into our custom retriever; now that we don't have a custom retriever, we can define them as we do for other low-code connectors.
- Define the QueryProperties component and use it in all analytics streams.
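Putting these together, a condensed sketch of the post-change shape of one analytics stream (the stream name, fields, cursor details, and jinja expressions are illustrative rather than copied from the manifest):

```yaml
ad_campaign_analytics:
  type: DeclarativeStream
  retriever:
    type: SimpleRetriever
    requester:
      type: HttpRequester
      url_base: "https://api.linkedin.com/rest"
      path: "adAnalytics"
      request_parameters:
        # Year/month/day now come straight from the slice via jinja datetime
        # attributes, replacing the custom component that injected start.year etc.
        dateRange: >-
          (start:(year:{{ stream_interval.start_time.year }},month:{{ stream_interval.start_time.month }},day:{{ stream_interval.start_time.day }}),end:(year:{{ stream_interval.end_time.year }},month:{{ stream_interval.end_time.month }},day:{{ stream_interval.end_time.day }}))
        fields:
          type: QueryProperties  # shared definition reused by all 11 analytics streams
          property_list: ["impressions", "clicks"]  # trimmed for brevity
          always_include_properties: ["dateRange", "pivotValues"]
          property_chunking:
            type: PropertyChunking
            property_limit_type: property_count
            property_limit: 18
  incremental_sync:
    type: DatetimeBasedCursor  # a standard cursor instead of CustomIncrementalSync
    cursor_field: "end_date"   # illustrative cursor field
    datetime_format: "%Y-%m-%d"
    start_datetime: "{{ config['start_date'] }}"
  transformations:
    # Declared in the idiomatic spot instead of being passed through $parameters.
    - type: AddFields
      fields:
        - path: ["start_date"]
          value: "{{ stream_interval.start_time }}"  # illustrative transformation
```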
User Impact
None
Can this PR be safely reverted and rolled back?