[FEATURE] Support load CSV in PPL (inputlookup or search) #638
Comments
+1 on (Preferred) Upload CSV files to external URL.
I agree with @penghuo that this is a possible security concern. I would propose a different approach:
I hate to be that guy, but I know community members who would want to load the CSV into their index, as well as those who would want to load it into cloud storage. From a priority perspective, the index should come first, as it is the easiest (assuming the analyst has write access to the cluster); dealing with cloud storage introduces permissions friction.
A straightforward solution is allowing the
Yes, I got the priorities. We have the
Support the functionality of loading data from a CSV file.
File location
There are two options for where a CSV file can be stored:

1. Upload CSV files to a Spark local directory, set by the `SPARK_LOCAL_DIRS` environment variable or the `spark.local.dir` config, for example `$SPARK_LOCAL_DIRS/<some_identities>/lookups/test.csv`. But uploading to a local dir could introduce potential security issues, especially if the Spark application runs on a cloud service.
2. (Preferred) Upload CSV files to an external URL, for example `s3://<bucket>/foo/bar/test.csv` or `file:///foo/bar/test.csv`.

PPL syntax
There are also two options to support this feature:
A. Introduce a new command `inputlookup` or `input`:

Usage:
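A hedged sketch of what the new command could look like (the exact syntax and the S3 path are illustrative assumptions, not a settled design; `flights.csv` and `FlightDelay` come from the example in this issue):

```
| inputlookup "s3://<bucket>/lookups/flights.csv"
| where FlightDelay > 500
```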
The `FlightDelay > 500` only works when flights.csv contains a CSV header.

B. Modify the current `search` command to support a file source:

Usage:
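A possible shape for the extended command (hypothetical; the `file =` parameter name and the path are assumptions for illustration):

```
search file = "s3://<bucket>/lookups/flights.csv" FlightDelay > 500
```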
PS: the current `search` command syntax is

Both options A and B could be used in a sub-search: