Allow forced reading of files as UTF-8 #8

Open
rossjones opened this issue Dec 3, 2013 · 2 comments

Comments

@rossjones

I get fed up of having to use codecs.open() and fiddling about between ascii, 8859-1 and utf-8 just to read a text file into a unicode string. 99% of what I get is one of ['utf-8', 'latin1', 'ascii']. All I ever want is 'utf-8'.

Would be really nice if FFS handled all of this so I could open a file and know that whatever I read is utf-8/unicode sans faff.
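
For illustration, a minimal sketch of the behaviour being asked for. The name `read_unicode` and its `encodings` parameter are hypothetical and not part of FFS; the sketch assumes it is acceptable to try UTF-8 first (which also covers plain ASCII) and fall back to Latin-1, which accepts any byte sequence and so may silently mis-decode files in other encodings.

```python
import io


def read_unicode(path, encodings=("utf-8", "latin-1")):
    """Hypothetical helper (not an FFS API): return a file's contents as text.

    Tries each encoding in order and returns the first successful decode.
    """
    # Read the raw bytes once, then attempt each candidate encoding.
    with io.open(path, "rb") as fh:
        raw = fh.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode %r with any of %r" % (path, encodings))
```

Because ASCII is a subset of UTF-8, the three encodings listed above reduce to two attempts. Guessing is only a heuristic; a caller who knows the encoding would pass a single-item `encodings` tuple.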

@davidmiller
Owner

Do you have sample files / outputs / pseudocode?

Love regards etc

David Miller
http://www.deadpansincerity.com
07854 880 883

@rossjones
Author

So problem is:

If you use open().read() on a file that has a non-ASCII character in it (say a þ), it comes out as escaped bytes such as '\xc3\xbe', because read() returns the raw bytes without decoding them.
I then have to arse about decoding it or do

import codecs
codecs.open(filename, 'r', 'utf-8')

And then .read() returns a unicode string.
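
To make the difference concrete, here is a small sketch that writes a one-character test file and reads it back both ways. It uses `io.open` rather than `codecs.open` (an assumption made for the example, since `io.open` takes an `encoding` argument on both Python 2.6+ and Python 3); the temporary file name is made up.

```python
import io
import os
import tempfile

# Write a throwaway file containing a single thorn character, encoded as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "thorn.txt")
with io.open(path, "w", encoding="utf-8") as fh:
    fh.write(u"\u00fe")

# Binary read: no decoding happens, so we get the two UTF-8 bytes 0xC3 0xBE.
with io.open(path, "rb") as fh:
    raw = fh.read()

# Text read with an explicit encoding: we get one decoded character, U+00FE.
with io.open(path, "r", encoding="utf-8") as fh:
    text = fh.read()

print(repr(raw))
print(repr(text))
```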

Also, this email is the test file.

R

