Fix fromjson() to support reading from stdin#667
Fix fromjson() to support reading from stdin#667yaniv-aknin wants to merge 2 commits intopetl-developers:masterfrom
Conversation
Pull Request Test Coverage Report for Build 9050397434Details
💛 - Coveralls |
|
|
||
|
|
||
| def fromjson(source, *args, **kwargs): | ||
| def fromjson(source=None, *args, **kwargs): |
Check warning
Code scanning / Prospector (reported by Codacy)
Keyword argument before variable positional arguments list in the definition of fromjson function (keyword-arg-before-vararg)
There was a problem hiding this comment.
Code scanning / Prospector (reported by Codacy)
Keyword argument before variable positional arguments list in the definition of fromjson function (keyword-arg-before-vararg)
This wouldn't break existing code?
Maybe read_source_from_arg can be taught to read even when stdin is passed as argument as in:
table1 = etl.fromjson(sys.stdin, header=["foo", "bar"])
print(table1)There was a problem hiding this comment.
I might be wrong, but I don't think it should break existing code.
All these assertions pass -
>>> def f(s, *a, **kw):
... return (s, a, kw)
...
>>> def fn(s=None, *a, **kw):
... return (s, a, kw)
...
>>> assert f("hello") == fn("hello")
>>> assert f(None) == fn(None)
>>> assert f("hello", 1, 2) == fn("hello", 1, 2)
>>> assert f("hello", k=1) == fn("hello", k=1)
>>> assert f(s="hello") == fn(s="hello")
>>> assert f("hello", 1, 2, k=1, j=2) == fn("hello", 1, 2, k=1, j=2)re. read_source_from_arg() - my interest is in the petl executable, i.e., I would love for syntax like this to work in shell: echo '{"a":2, "b":4}' | petl 'fromjson(lines=True)'.
What do you think?
There was a problem hiding this comment.
I think that:
- Using petl as a command line tool is an interesting use case and we should give it some love by:
a. Adding more examples in the documentation.
b. Mention this usage pattern in the repository Readme/Frontage - Currently fromjson() source argument is not consistent with other functions like:
a. tojson() and tojsonarrays()
b. fromcsv() - We value not breaking existing code.
a. But I haven't had the time to check if this is the case yet.
b. If not it will be a good change for API consistency
c. I would love to hear more opinions about this change.
|
Adding to the investigation: The fromjson() parameters args and kwargs are forwarded to the json library call: def fromjson(source, *args, **kwargs):
"""..."""
...
...
dicts = json.load(f, *self.args, **self.kwargs)
... |
|
But the json library calls used take only one positional argument followed by others keyword arguments: def json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
...
def json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
...It looks like the args parameter is simply ignored by json.load and json.loads calls. Now I'm wondering if the function syntax shouldn't be: def fromjson(source=None, *, **kwargs):
"""..."""
...
|
|
After looking at the source code, I've concluded that your proposal looks good. Also, some considerations:
So, what are your plans for this PR? |
|
Thanks @juarezr , this sounds good.
Does that sound reasonable and I can update the PR? |
|
After checking it, I've found that the change in signature won't work: >>> def fromjson(source=None, *, **kwargs):
... print("source:", source, "kwargs:", kwargs)
...
File "<stdin>", line 1
SyntaxError: named arguments must follow bare *So we should keep the args argument as it is currently, and proceed. Eventually, we could remove the self.args property and the forwarding to json.load functions as it is pointless in: dicts = json.load(f, *self.args, **self.kwargs)But it shouldn't be mandatory. |
- adds a simple test invoking the petl executable - installs the package in CI so the executable is available.
| """, shell=True, check=True, capture_output=True) | ||
| assert result.stdout == b'foo\r\na\r\n' | ||
|
|
||
| def test_json_stdin(): |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing function or method docstring
| @@ -0,0 +1,22 @@ | |||
| from __future__ import print_function, division, absolute_import | |||
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing module docstring
|
|
||
| import subprocess | ||
|
|
||
| def test_executable(): |
Check warning
Code scanning / Pylintpython3 (reported by Codacy)
Missing function or method docstring
|
|
||
| import subprocess | ||
|
|
||
| def test_executable(): |
Check warning
Code scanning / Pylint (reported by Codacy)
Missing function docstring
| @@ -0,0 +1,22 @@ | |||
| from __future__ import print_function, division, absolute_import | |||
Check warning
Code scanning / Pylint (reported by Codacy)
Missing module docstring
| """, shell=True, check=True, capture_output=True) | ||
| assert result.stdout == b'foo\r\na\r\n' | ||
|
|
||
| def test_json_stdin(): |
Check warning
Code scanning / Pylint (reported by Codacy)
Missing function docstring
| ( echo '{"foo": "a", "bar": "b"}' ; echo '{"foo": "c", "bar": "d"}' ) | | ||
| petl 'fromjson(lines=True).tocsv()' | ||
| """, shell=True) | ||
| assert result == b'foo,bar\r\na,b\r\nc,d\r\n' |
Check notice
Code scanning / Semgrep (reported by Codacy)
The application was found using `assert` in non-test code.
| (echo foo,bar ; echo a,b; echo c,d) | | ||
| petl 'fromcsv().cut("foo").head(1).tocsv()' | ||
| """, shell=True) | ||
| assert result == b'foo\r\na\r\n' |
Check notice
Code scanning / Semgrep (reported by Codacy)
The application was found using `assert` in non-test code.
| echo '[{"foo": "a", "bar": "b"}]' | | ||
| petl 'fromjson().tocsv()' | ||
| """, shell=True) | ||
| assert result == b'foo,bar\r\na,b\r\n' |
Check notice
Code scanning / Semgrep (reported by Codacy)
The application was found using `assert` in non-test code.
| assert result == b'foo\r\na\r\n' | ||
|
|
||
| def test_json_stdin(): | ||
| result = subprocess.check_output(""" |
Check warning
Code scanning / Bandit (reported by Codacy)
Starting a process with a partial executable path
| import subprocess | ||
|
|
||
| def test_executable(): | ||
| result = subprocess.check_output(""" |
Check warning
Code scanning / Bandit (reported by Codacy)
Starting a process with a partial executable path
| echo '[{"foo": "a", "bar": "b"}]' | | ||
| petl 'fromjson().tocsv()' | ||
| """, shell=True) | ||
| assert result == b'foo,bar\r\na,b\r\n' |
Check warning
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
| result = subprocess.check_output(""" | ||
| echo '[{"foo": "a", "bar": "b"}]' | | ||
| petl 'fromjson().tocsv()' | ||
| """, shell=True) |
Check failure
Code scanning / Bandit (reported by Codacy)
subprocess call with shell=True seems safe, but may be changed in the future, consider rewriting without shell
| petl 'fromjson().tocsv()' | ||
| """, shell=True) | ||
| assert result == b'foo,bar\r\na,b\r\n' | ||
| result = subprocess.check_output(""" |
Check warning
Code scanning / Bandit (reported by Codacy)
Starting a process with a partial executable path
| (echo foo,bar ; echo a,b; echo c,d) | | ||
| petl 'fromcsv().cut("foo").head(1).tocsv()' | ||
| """, shell=True) | ||
| assert result == b'foo\r\na\r\n' |
Check warning
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
| ( echo '{"foo": "a", "bar": "b"}' ; echo '{"foo": "c", "bar": "d"}' ) | | ||
| petl 'fromjson(lines=True).tocsv()' | ||
| """, shell=True) | ||
| assert result == b'foo,bar\r\na,b\r\nc,d\r\n' |
Check warning
Code scanning / Bandit (reported by Codacy)
Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
| result = subprocess.check_output(""" | ||
| (echo foo,bar ; echo a,b; echo c,d) | | ||
| petl 'fromcsv().cut("foo").head(1).tocsv()' | ||
| """, shell=True) |
Check failure
Code scanning / Bandit (reported by Codacy)
subprocess call with shell=True seems safe, but may be changed in the future, consider rewriting without shell
| result = subprocess.check_output(""" | ||
| ( echo '{"foo": "a", "bar": "b"}' ; echo '{"foo": "c", "bar": "d"}' ) | | ||
| petl 'fromjson(lines=True).tocsv()' | ||
| """, shell=True) |
Check failure
Code scanning / Bandit (reported by Codacy)
subprocess call with shell=True seems safe, but may be changed in the future, consider rewriting without shell
|
I've made some progress, but not all checks pass yet.
I'll try to look into this again next week, but admit I'm beginning to question how much time I can put into this; small fix, lots of overhead 😅 |
|
@CodiumAI-Agent /review |
Trivial PR to support reading from stdin (
source=None) infromjson().Two things you may want me to do before merging -
source=Nonein anyiomodule (e.g., also not incsvetc). Generally, perhaps I missed them but I didn't see tests that fork/exec thebin/petlexecutable at all...?source=Noneto other modules. I rangit grep 'def from' | grep source | grep -v source=Noneand found a few more that miss it, let me know if you'd like me to add it to any/all of them.Modules which appear to be missing
source=None:avro,bcolz,pytables(maybe),xmlChecklist
Use this checklist to ensure the quality of pull requests that include new code and/or make changes to existing code.
tox/pytestmasterbranch and tested before sending the PR