Message 71855 - Python tracker

Author	brett.cannon
Recipients	benjamin.peterson, brett.cannon, loewis
Date	2008-08-24.19:53:26

Content

The test_imp stuff has to do with PyTokenizer_FindEncoding().
imp.find_module() only opens the file, passes the file descriptor to
PyTokenizer_FindEncoding(), and then returns a file object opened with
the encoding that was found.
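
For illustration only, the open / detect / reopen flow described above can be sketched in pure Python. This is not the imp implementation: open_with_declared_encoding() and write_sample() are hypothetical helpers, and tokenize.detect_encoding() stands in for the C-level PyTokenizer_FindEncoding().

```python
import os
import tempfile
import tokenize


def open_with_declared_encoding(path):
    """Hypothetical helper: reopen `path` as text using its declared encoding."""
    # Read the first lines as bytes and look for a BOM or a PEP 263 coding cookie.
    with open(path, "rb") as raw:
        encoding, _ = tokenize.detect_encoding(raw.readline)
    # Hand back a text-mode file object that decodes with that encoding.
    return open(path, "r", encoding=encoding)


def write_sample(source_bytes):
    """Write bytes to a temporary .py file and return its path (demo only)."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "wb") as handle:
        handle.write(source_bytes)
    return path


# A Latin-1 source file: the byte 0xE9 is "e with acute accent" in Latin-1.
sample = write_sample(b"# -*- coding: latin-1 -*-\nname = '\xe9'\n")
with open_with_declared_encoding(sample) as source:
    declared = source.encoding
    text = source.read()
os.remove(sample)
```

When detection works, the returned file object carries the declared encoding and the non-ASCII byte decodes cleanly.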

The problem is that (as issue 3594 points out) PyTokenizer_FindEncoding()
always fails. That means it assumes only the raw encodings are okay.
Since Latin-1 is one of them, it returns the file opened as Latin-1,
which is correct. Removing that case here means
PyTokenizer_FindEncoding() fails and thus assumes UTF-8 is the only
legitimate encoding, so it opens the files with the UTF-8 encoding.
Obviously, it took a while to find these two bugs. =)
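
To make the consequence concrete, here is a small sketch (plain bytes decoding, not the tokenizer code itself) of what happens when a Latin-1 source is treated as UTF-8. The sample bytes are made up for the demonstration.

```python
# Made-up Latin-1 source bytes; 0xE9 is not a valid standalone UTF-8 sequence.
data = b"# -*- coding: latin-1 -*-\nname = '\xe9'\n"

# Decoding with the declared encoding works.
as_latin1 = data.decode("latin-1")

# Falling back to UTF-8, as happens when encoding detection fails, does not.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```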

History
Date	User	Action	Args
2008-08-24 19:53:28	brett.cannon	set	recipients: + brett.cannon, loewis, benjamin.peterson
2008-08-24 19:53:27	brett.cannon	set	messageid: <[email protected]>
2008-08-24 19:53:27	brett.cannon	link	issue3574 messages
2008-08-24 19:53:26	brett.cannon	create