'Safe printing' format is ambiguous and undocumented, and it should not be used on NUL-delimited output #130

Open
JustAnotherArchivist opened this issue Nov 9, 2020 · 2 comments
Labels
enhancement New feature or request target:4.96.0

JustAnotherArchivist commented Nov 9, 2020

When printing filenames (and possibly other things), lsof replaces some characters with an escaped form. However, this format is ambiguous:

> touch $'a\x0b'
> touch 'a^K'
> # Open these files, e.g. with less in two other terminals:  less $'a\x0b'  +  less 'a^K'
> lsof -Fn $'a\x0b' 'a^K'
p534276
f4
na^K
p534277
f4
na^K

Even when using the NUL-delimited output – which can safely include any possible character in a filename, since NUL bytes are not allowed there – the escaping still takes place:

> lsof -F0n $'a\x0b' 'a^K' | xxd
00000000: 7035 3334 3237 3600 0a66 3400 6e61 5e4b  p534276..f4.na^K
00000010: 000a 7035 3334 3237 3700 0a66 3400 6e61  ..p534277..f4.na
00000020: 5e4b 000a                                ^K..

There are three issues here, in my opinion:

  1. The format is ambiguous on ^ escapes.
  2. The format is undocumented.
  3. The format should not be used on NUL-delimited output for programs because it significantly and needlessly complicates the downstream parsing.
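
To illustrate the third point: the hex dump above suggests that with -F0 each field ends in a NUL byte and each process or file set ends in a newline. Assuming that layout, a downstream parser could be as small as the sketch below, provided names were emitted raw. `parse_f0` is a hypothetical helper written for this example, not part of lsof.

```python
def parse_f0(data: bytes) -> list[tuple[str, bytes]]:
    """Split `lsof -F0` style output into (field id, raw value) pairs.

    Assumed layout, inferred from the hex dump in this issue: every field
    ends with NUL, and every process/file set ends with a newline.
    """
    fields = []
    for chunk in data.split(b"\0"):
        # A newline right after a NUL terminates the previous set;
        # it is not part of the next field.
        if chunk.startswith(b"\n"):
            chunk = chunk[1:]
        if chunk:
            # The first byte is the field identifier, the rest is the value.
            fields.append((chr(chunk[0]), chunk[1:]))
    return fields


# The exact bytes from the xxd output above.
sample = b"p534276\0\nf4\0na^K\0\np534277\0\nf4\0na^K\0\n"
for field_id, value in parse_f0(sample):
    print(field_id, value)
```

With the escaping still applied, both `n` fields come back as `a^K`, so the consumer would need a second decoding pass, and that pass cannot tell the two files apart.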

masatake commented Nov 9, 2020

Thank you for reporting.

  3. The format should not be used on NUL-delimited output for programs because it significantly and needlessly complicates the downstream parsing.

I agree. If -F0 is given, lsof should print the filename as-is. I will work on this item.

  2. The format is undocumented.

I found the following description:

OUTPUT
...
       Lsof only outputs printable (declared so by isprint(3)) 8 bit
       characters.  Non-printable characters are printed in one of three
       forms: the C ``\[bfrnt]'' form; the control character `^' form
       (e.g., ``^@''); or hexadecimal leading ``\x'' form (e.g., ``\xab'').
       Space is non-printable in the COMMAND column (``\x20'') and
       printable elsewhere.

Is this not enough?
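
As a rough illustration, the quoted paragraph can be sketched in Python as below. This is not lsof's actual implementation: `safe_print` is a hypothetical name, and the sketch assumes that isprint(3) accepts only ASCII 0x20 to 0x7e and that every byte from 0x7f upward takes the `\x` form. It follows only the quoted text, so it does nothing special for `^` or backslash.

```python
# C-style escapes named in the man page excerpt: \b \f \r \n \t
C_ESCAPES = {0x08: "\\b", 0x0C: "\\f", 0x0D: "\\r", 0x0A: "\\n", 0x09: "\\t"}


def safe_print(name: bytes) -> str:
    """Approximate the 'safe printing' rules from the quoted description."""
    out = []
    for byte in name:
        if byte in C_ESCAPES:
            out.append(C_ESCAPES[byte])
        elif byte < 0x20:
            # Control character form: ^@ for 0x00, ^K for 0x0b, ^^ for 0x1e.
            out.append("^" + chr(byte + 0x40))
        elif byte < 0x7F:
            # Assumed printable range; a literal ^ passes through unchanged.
            out.append(chr(byte))
        else:
            # Assumption: everything else uses the hexadecimal form.
            out.append(f"\\x{byte:02x}")
    return "".join(out)


# The two file names from the report produce the same text.
print(safe_print(b"a\x0b"))
print(safe_print(b"a^K"))
```

Under these assumptions both calls print `a^K`, which is the ambiguity reported in the first point.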

  1. The format is ambiguous on ^ escapes.

My understanding of your point is that the format cannot represent a literal ^ itself.
Is that correct?

masatake added the enhancement label on Nov 9, 2020
JustAnotherArchivist commented

  1. Yes, I think that's effectively the issue. Escaping a literal ^ with a backslash should be sufficient. (^^ won't work as that's used for 0x1e.)
  2. Ah, thank you, I missed that as I was searching for \n etc. directly. Yes, that's fine. The only thing missing from it, as far as I can tell, is that backslashes are also escaped as \\.
  3. Excellent, thank you!
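
A minimal sketch of the suggestion in point 1, to show that escaping a literal ^ as `\^` (alongside the existing `\\`) would make the format reversible. `encode` and `decode` are hypothetical helpers, `\^` is only the proposal from this thread and not current lsof behavior, and the printable range is assumed to be ASCII 0x20 to 0x7e.

```python
# C-style escapes: raw byte -> two-byte escaped form
C_ESCAPES = {0x08: b"\\b", 0x0C: b"\\f", 0x0D: b"\\r", 0x0A: b"\\n", 0x09: b"\\t"}
# Reverse map: escape letter (as int) -> raw byte
C_UNESCAPES = {esc[1]: raw for raw, esc in C_ESCAPES.items()}

BACKSLASH, CARET = 0x5C, 0x5E


def encode(name: bytes) -> bytes:
    """Escape a name, additionally writing a literal ^ as \\^ (proposed)."""
    out = bytearray()
    for byte in name:
        if byte in (BACKSLASH, CARET):
            out += bytes([BACKSLASH, byte])
        elif byte in C_ESCAPES:
            out += C_ESCAPES[byte]
        elif byte < 0x20:
            out += bytes([CARET, byte + 0x40])  # e.g. 0x1e -> ^^
        elif byte < 0x7F:
            out.append(byte)
        else:
            out += b"\\x%02x" % byte
    return bytes(out)


def decode(text: bytes) -> bytes:
    """Invert encode(); assumes well-formed input."""
    out = bytearray()
    i = 0
    while i < len(text):
        byte = text[i]
        if byte == CARET:
            # ^X control form: the next byte minus 0x40 is the raw value.
            out.append(text[i + 1] - 0x40)
            i += 2
        elif byte == BACKSLASH:
            nxt = text[i + 1]
            if nxt == ord("x"):
                out.append(int(text[i + 2:i + 4], 16))
                i += 4
            else:
                # C escape letter, or an escaped literal backslash / caret.
                out.append(C_UNESCAPES.get(nxt, nxt))
                i += 2
        else:
            out.append(byte)
            i += 1
    return bytes(out)


print(encode(b"a\x0b"))  # control character, ^ form
print(encode(b"a^K"))    # literal caret, backslash-escaped
```

With this scheme the two names from the report encode differently, and `^^` still means 0x1e because a literal caret never appears unescaped.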
