Allowed forced reading of files as UTF-8 #8

rossjones · 2013-12-03T11:23:21Z

I get fed up of having to use codecs.open() and the fiddling about between ascii, ISO-8859-1 and utf-8 just to read a text file into a unicode string. 99% of what I get is one of ['utf-8', 'latin1', 'ascii']. All I ever want is 'utf-8'.

Would be really nice if FFS handled all of this so I could open a file and know that whatever I read comes back decoded as utf-8/unicode, sans faff.
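A minimal sketch of the behaviour being asked for, assuming Python 3 and a hypothetical helper `read_text` (not an existing FFS function): read the raw bytes once, then try each encoding in turn. ASCII needs no separate attempt because it is a subset of UTF-8, and latin1 accepts every byte value, so it works as a final fallback. The trade-off is that latin1 will silently mis-decode text that was really in some other encoding.

```python
import io


def read_text(path, encodings=('utf-8', 'latin-1')):
    """Hypothetical helper (not part of FFS): return the file's contents as text.

    Each encoding is tried in order on the same raw bytes. ASCII is a
    subset of UTF-8, so it needs no separate attempt; latin-1 maps every
    byte to a character, so it never fails and acts as the fallback.
    """
    with io.open(path, 'rb') as fh:
        raw = fh.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError('could not decode %r with any of %r' % (path, encodings))
```

With the defaults, a UTF-8 file containing þ decodes on the first attempt, while a latin1 file (where þ is the single byte 0xFE, which is never valid in UTF-8) falls through to the second.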

davidmiller · 2013-12-05T18:24:52Z

Do you have sample files / outputs / pseudocode?

rossjones · 2013-12-05T18:27:40Z

So the problem is:

If you use open().read() on a file that has an accented character in it (say a þ), it comes out as escaped bytes such as '\xc3\xbe' rather than as the character, because read() returns a byte string and does no decoding.
I then have to arse about decoding it or do

import codecs
codecs.open(filename, 'r', 'utf-8')

And then .read() returns a unicode string.
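To make the bytes-versus-text difference concrete, here is a small sketch assuming Python 3. It swaps in `io.open` (available since Python 2.6, and the same as the built-in `open` on Python 3) for `codecs.open`; both take an encoding and decode on read. The file name `thorn.txt` is only for illustration.

```python
import io
import os
import tempfile

# Write a file whose only character is the thorn (þ, U+00FE), encoded as UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'thorn.txt')
with io.open(path, 'w', encoding='utf-8') as fh:
    fh.write(u'\u00fe')

# Binary mode returns the raw UTF-8 bytes, not a character.
with io.open(path, 'rb') as fh:
    raw = fh.read()
print(repr(raw))  # b'\xc3\xbe' on Python 3

# Passing an encoding decodes on read, like codecs.open(filename, 'r', 'utf-8').
with io.open(path, 'r', encoding='utf-8') as fh:
    text = fh.read()
print(repr(text))
```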

Also, this email is the test file.

R
