feat: support asof join #15411

Open: wants to merge 187 commits into main


@zenus (Contributor) commented May 6, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  1. Thanks to @xudong963 for the nice work on range join.
  2. Thanks to DuckDB for the nice idea of refactoring window functions to implement asof join.

Currently, because of the way range join is implemented, the order of the rows returned by asof join is nondeterministic. I may need help from @xudong963 with this, which is why I have not added any test cases yet.
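For reference, the semantics being implemented: an asof join matches each probe row to the build row with the same equality key and the closest preceding (here, less-than-or-equal) timestamp. The following is a minimal single-node sketch of those semantics in Rust, assuming a left-outer flavor and a hypothetical (key, ts, value) row layout. It uses per-key sorting plus binary search, which is a swapped-in technique for illustration, not the range-join or window-based approach used in this PR.

```rust
use std::collections::HashMap;

/// Left asof join: for each probe row (key, ts), return the value of the build
/// row with the same key and the greatest timestamp <= ts, or None if there is none.
/// Row layouts are hypothetical: build = (key, ts, value), probe = (key, ts).
fn asof_join(
    build: &[(u32, i64, &'static str)],
    probe: &[(u32, i64)],
) -> Vec<((u32, i64), Option<&'static str>)> {
    // Group build rows by the equality key, then sort each group by timestamp.
    let mut groups: HashMap<u32, Vec<(i64, &'static str)>> = HashMap::new();
    for &(key, ts, value) in build {
        groups.entry(key).or_default().push((ts, value));
    }
    for rows in groups.values_mut() {
        // Stable sort: among equal timestamps, the last-inserted row wins below.
        rows.sort_by_key(|&(ts, _)| ts);
    }

    probe
        .iter()
        .map(|&(key, ts)| {
            let matched = groups.get(&key).and_then(|rows| {
                // Count of build rows with timestamp <= ts; the last of them is the match.
                let n = rows.partition_point(|&(build_ts, _)| build_ts <= ts);
                n.checked_sub(1).map(|i| rows[i].1)
            });
            ((key, ts), matched)
        })
        .collect()
}

fn main() {
    let build: Vec<(u32, i64, &'static str)> = vec![(1, 10, "a"), (1, 20, "b"), (2, 15, "c")];
    let probe: Vec<(u32, i64)> = vec![(1, 5), (1, 12), (1, 25), (2, 15), (3, 30)];
    // Output follows probe order; unmatched probe rows carry None.
    for (probe_row, matched) in asof_join(&build, &probe) {
        println!("{:?} -> {:?}", probe_row, matched);
    }
}
```

This sketch happens to emit rows in probe order; a different algorithm can return the same set of rows in a different order, which is why tests should not rely on row order unless the query specifies one.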

Benchmark

[Benchmark screenshot]

Build side: 50,000 rows; probe side: 50,000 rows.

[Benchmark screenshot]

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) on May 6, 2024

@xudong963 (Member) left a comment

Thanks @zenus

I think we don't need to guarantee the order of results. It is reasonable for different databases to return results in different orders.

I've skimmed through the code and there are a few points to note:

  1. The todo!() calls need to be replaced with real logic or with a specific error message.
  2. The build_asof_join method can be further split into smaller functions.
  3. Regarding tests, if you are going to borrow the test set from DuckDB, you can put the tests in the directory https://github.com/datafuselabs/databend/tree/main/tests/sqllogictests/suites/duckdb/asof_join
  4. AsOf join can serve as an alternative to window functions for expressing temporal relationships. If so, can you provide a performance comparison of asof join vs. window functions?
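As a sketch of point 1, the snippet below returns a descriptive error for an unsupported case instead of panicking with todo!(). The error type, the join-kind enum, and plan_asof_join are hypothetical names for illustration only; they are not Databend's own error type or planner API, and which join kinds count as supported is an assumption.

```rust
use std::fmt;

/// Hypothetical error type for this sketch (not Databend's own error type).
#[derive(Debug, PartialEq)]
enum JoinError {
    Unimplemented(String),
}

impl fmt::Display for JoinError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            JoinError::Unimplemented(msg) => write!(f, "not implemented: {msg}"),
        }
    }
}

impl std::error::Error for JoinError {}

/// Hypothetical join kinds; only some are assumed to be supported.
#[derive(Debug, Clone, Copy)]
enum AsofJoinKind {
    Inner,
    Left,
    Right,
}

/// Hypothetical planner entry point. Instead of `todo!()`, which panics at
/// runtime, the unsupported branch returns an error the user can act on.
fn plan_asof_join(kind: AsofJoinKind) -> Result<&'static str, JoinError> {
    match kind {
        AsofJoinKind::Inner => Ok("inner asof plan"),
        AsofJoinKind::Left => Ok("left asof plan"),
        other => Err(JoinError::Unimplemented(format!(
            "{other:?} ASOF JOIN is not supported yet"
        ))),
    }
}

fn main() {
    for kind in [AsofJoinKind::Inner, AsofJoinKind::Left, AsofJoinKind::Right] {
        match plan_asof_join(kind) {
            Ok(plan) => println!("{kind:?}: {plan}"),
            Err(err) => println!("{kind:?}: error: {err}"),
        }
    }
}
```

Returning a Result lets the caller surface a clear message to the user, whereas todo!() aborts the query with a generic panic.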

In addition, I've opened a tracking issue on asof join, if you have time you can continue to finish some sub-issues in it.

@xudong963 marked this pull request as draft on May 7, 2024 11:12

@zenus (Contributor, Author) commented May 7, 2024

Nice advice, my pleasure.

@zenus (Contributor, Author) commented Jul 18, 2024

@Dousir9 could you help me? I cannot get all the test cases to pass, although all the new test cases pass on my laptop.

@Dousir9 (Member) commented Jul 22, 2024

@zenus Yeah, I will review this PR today.

@Dousir9 (Member) commented Jul 24, 2024

@zenus Sorry for the wait; I've been really busy. Let's continue this excellent work.

  1. For test_asof_join_ints.test:65, we can fix it by adding an ORDER BY clause to this sqllogictest.
  2. For test_asof_join_inequal.test:23, it looks like we need to support distributed execution for asof join. Could you show the query plan for this sqllogictest in distributed mode? You can use scripts/ci/deploy/databend-query-cluster-3-nodes.sh to start a cluster, and then execute explain ....
  3. Resolve the conflicting files with the main branch.
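Regarding item 2: one common way to distribute an asof join that has an equality condition is to shuffle both inputs on the equality key, so that each node holds every build row for the keys it probes and can run the join locally. A join with only an inequality condition cannot be split this way and would need another strategy, such as broadcasting the build side. The Rust sketch below assumes the key-shuffle strategy, uses a hypothetical modulo routing function (node_of) in place of a real exchange, and checks that the partitioned result matches the single-node result once both are sorted. It is not a description of how Databend plans this query.

```rust
use std::collections::HashMap;

type BuildRow = (u32, i64, i32); // hypothetical layout: (key, ts, value)
type ProbeRow = (u32, i64); // hypothetical layout: (key, ts)
type OutRow = (u32, i64, Option<i32>);

/// Local (single-node) left asof join: for each probe row, pick the build row
/// with the same key and the greatest ts <= probe ts.
fn local_asof(build: &[BuildRow], probe: &[ProbeRow]) -> Vec<OutRow> {
    let mut groups: HashMap<u32, Vec<(i64, i32)>> = HashMap::new();
    for &(key, ts, value) in build {
        groups.entry(key).or_default().push((ts, value));
    }
    for group in groups.values_mut() {
        // Sort by (ts, value) so ties resolve the same way on every node.
        group.sort();
    }
    probe
        .iter()
        .map(|&(key, ts)| {
            let matched = groups.get(&key).and_then(|group| {
                let n = group.partition_point(|&(build_ts, _)| build_ts <= ts);
                n.checked_sub(1).map(|i| group[i].1)
            });
            (key, ts, matched)
        })
        .collect()
}

/// Hypothetical routing: send each row to a node chosen by its equality key.
fn node_of(key: u32, nodes: usize) -> usize {
    key as usize % nodes
}

fn distributed_asof(build: &[BuildRow], probe: &[ProbeRow], nodes: usize) -> Vec<OutRow> {
    let mut build_parts: Vec<Vec<BuildRow>> = vec![Vec::new(); nodes];
    let mut probe_parts: Vec<Vec<ProbeRow>> = vec![Vec::new(); nodes];
    for &row in build {
        build_parts[node_of(row.0, nodes)].push(row);
    }
    for &row in probe {
        probe_parts[node_of(row.0, nodes)].push(row);
    }
    // Each "node" joins only its own partition; rows sharing a key are co-located,
    // so no node needs data from another node.
    let mut out: Vec<OutRow> = (0..nodes)
        .flat_map(|i| local_asof(&build_parts[i], &probe_parts[i]))
        .collect();
    out.sort(); // impose a deterministic order for comparison
    out
}

fn main() {
    let build: Vec<BuildRow> = vec![(1, 10, 100), (1, 20, 200), (2, 5, 50), (3, 7, 70)];
    let probe: Vec<ProbeRow> = vec![(1, 15), (1, 25), (2, 4), (3, 8), (4, 1)];
    let mut single = local_asof(&build, &probe);
    single.sort();
    let distributed = distributed_asof(&build, &probe, 3);
    assert_eq!(single, distributed);
    println!("{:?}", distributed);
}
```

Sorting before the comparison plays the same role as the ORDER BY fix in item 1: without it, the partitioned and single-node outputs contain the same rows but may list them in different orders.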

@zenus (Contributor, Author) commented Aug 13, 2024

@Dousir9 I have fixed the first and third items. For the second one, I still have no idea how to proceed, even after spending a week reading the code.

@zenus (Contributor, Author) commented Aug 26, 2024

@xudong963 could you help me out ?
