Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"function lineage" #620

Open
fomcl opened this issue May 27, 2024 · 1 comment
Open

"function lineage" #620

fomcl opened this issue May 27, 2024 · 1 comment
Labels
enhancement New feature or request question Further information is requested

Comments

@fomcl
Copy link

fomcl commented May 27, 2024

Is your feature request related to a problem? Please describe.
If an alias was used for a function call, I would like to know what the original function call looked like.

# sample code
from sqllineage.runner import LineageRunner

sql = """CREATE TABLE result AS
SELECT procedures.fahrenheit(c) AS f, a, b AS b_name
FROM tbl1
INNER JOIN tbl2
ON tbl1.some_id = tbl2.some_id;"""
result = LineageRunner(sql, dialect="greenplum")
result.print_column_lineage()
lineage = result.get_column_lineage()

# actual output
<default>.result.a <- a
<default>.result.b_name <- b
<default>.result.f <- c

# desired output
<default>.result.a <- a
<default>.result.b_name <- b
<default>.result.f <- <default>.result.procedures.fahrenheit(c) <- c

Describe the solution you'd like
See above. Right now, there's no way (or perhaps I'm doing something wrong!) of telling if c is a copy of f (select f as c) or something else.

By the way, before I forget: awesome package. Thank you very much for creating this!

@fomcl fomcl added the enhancement New feature or request label May 27, 2024
@reata
Copy link
Owner

reata commented Jun 8, 2024

Thanks for loving sqllineage.

We have a separate issue #443 to track a similar request, and function is considered a subset of column expression there. For your example, there are other SQL patterns like:

-- directly using the formula to convert Celcius to Fahrenheit
select (c * 9 / 5+ 32;
-- using case clause together with function.
select case when t == 'Celsius' then procedures.fahrenheit(c) when t == 'Fahrenheit' then c else NULL end;

All this would be too much to add to the column lineage path, if possible.

Thinking of lineage as a graph, the preferred approach would be to add the raw expression as a property to the column lineage edge (and somehow expose it to the user via some new API), instead of adding another node.

We're nowhere near getting #443 done yet. But do let me know your thoughts.

@reata reata added the question Further information is requested label Jun 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants