Allow specifying line ending style to Unicode filter #34

Open
todofixthis opened this issue Oct 28, 2017 · 1 comment

todofixthis commented Oct 28, 2017

When initialising the Unicode filter, the user should have the ability to specify which line endings to use:

# Unix line endings (default)
>>> f.Unicode(convert_newlines='\n').apply('Foo\nBar\rBaz\r\n')
'Foo\nBar\nBaz\n'

# Windows line endings
>>> f.Unicode(convert_newlines='\r\n').apply('Foo\nBar\rBaz\r\n')
'Foo\r\nBar\r\nBaz\r\n'

# Custom line endings
>>> f.Unicode(convert_newlines='|').apply('Foo\nBar\rBaz\r\n')
'Foo|Bar|Baz|'

# Line endings unmodified
>>> f.Unicode(convert_newlines=False).apply('Foo\nBar\rBaz\r\n')
'Foo\nBar\rBaz\r\n'
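
For illustration, the behaviour in the examples above could be implemented roughly as follows. This is only a sketch, not the library's code: `convert_newlines` is written here as a hypothetical standalone function rather than as an argument to the Unicode filter, and it assumes that `\r\n`, `\r` and `\n` are the only sequences treated as line endings.

```python
import re

# Match '\r\n' first so that a Windows line ending counts as one
# newline rather than two.
_NEWLINE = re.compile(r'\r\n|\r|\n')


def convert_newlines(value, newline='\n'):
    """Replace every line ending in ``value`` with ``newline``.

    Hypothetical helper; passing ``newline=False`` leaves the value
    unmodified, mirroring the proposed ``convert_newlines=False``.
    """
    if newline is False:
        return value

    # A callable replacement avoids re.sub interpreting backslashes
    # in a custom ``newline`` string as escape sequences.
    return _NEWLINE.sub(lambda _match: newline, value)
```

Because every line ending is replaced, a trailing newline in the input is converted as well, so a custom separator such as `'|'` also appears at the end of the result.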

Note that this change potentially conflicts with the existing normalize argument, so we'll probably need to make some additional changes to support this new functionality:

  • Add a remove_unprintables argument.
  • Rename normalize to normalize_unicode.
  • Deprecate the normalize argument.

todofixthis commented

Might make more sense to create 3 new filters (StripUnprintables, NormalizeUnicode, ConvertNewlines), and then convert the Unicode filter into a macro. Just sayin'... 😸
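
A rough sketch of what that split might look like, using plain functions instead of the library's filter classes. All names here (`strip_unprintables`, `normalize_unicode`, `convert_newlines`, `compose`, `unicode_macro`) are hypothetical, and treating every Unicode "C*" category character other than tab, CR and LF as unprintable is an assumption, not something the issue specifies.

```python
import re
import unicodedata


def strip_unprintables(value):
    """Drop control/format characters, keeping tabs and line breaks.

    Assumption: "unprintable" means Unicode general categories
    starting with 'C' (Cc, Cf, Cs, Co, Cn).
    """
    return ''.join(
        ch for ch in value
        if ch in '\t\r\n' or not unicodedata.category(ch).startswith('C')
    )


def normalize_unicode(value, form='NFC'):
    """Apply a Unicode normalization form (NFC assumed as default)."""
    return unicodedata.normalize(form, value)


def convert_newlines(value, newline='\n'):
    """Replace '\\r\\n', '\\r' and '\\n' with ``newline``."""
    return re.sub(r'\r\n|\r|\n', lambda _match: newline, value)


def compose(*steps):
    """Chain single-argument functions, applied left to right."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


# The "macro": one callable built from the three smaller steps.
unicode_macro = compose(strip_unprintables, normalize_unicode, convert_newlines)
```

With this shape, each step can be tested and reused on its own, and the combined behaviour is just the composition of the three.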
