Merged
14 changes: 14 additions & 0 deletions Doc/library/stdtypes.rst
@@ -1559,9 +1559,16 @@ expression support in the :mod:`re` module).
   :func:`codecs.register_error`, see section :ref:`error-handlers`. For a
   list of possible encodings, see section :ref:`standard-encodings`.

   By default, the *errors* argument is not checked, for best performance; it
   is only used at the first encoding error. Enable the development mode
   (:option:`-X` ``dev`` option), or use a debug build, to check *errors*.

   .. versionchanged:: 3.1
      Support for keyword arguments added.

   .. versionchanged:: 3.9
      The *errors* argument is now checked in development mode and in debug
      builds.

.. method:: str.endswith(suffix[, start[, end]])

@@ -2575,6 +2582,10 @@ arbitrary binary data.
   :func:`codecs.register_error`, see section :ref:`error-handlers`. For a
   list of possible encodings, see section :ref:`standard-encodings`.

   By default, the *errors* argument is not checked, for best performance; it
   is only used at the first decoding error. Enable the development mode
   (:option:`-X` ``dev`` option), or use a debug build, to check *errors*.

   .. note::

      Passing the *encoding* argument to :class:`str` allows decoding any
@@ -2584,6 +2595,9 @@ arbitrary binary data.
   .. versionchanged:: 3.1
      Added support for keyword arguments.

   .. versionchanged:: 3.9
      The *errors* argument is now checked in development mode and in debug
      builds.


.. method:: bytes.endswith(suffix[, start[, end]])
            bytearray.endswith(suffix[, start[, end]])
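
The documented behaviour can be observed from a script. The sketch below is not part of the patch: it runs the same snippet in a child interpreter with and without `-X dev`. The helper `run_snippet` and the handler name `no-such-handler` are made up for illustration, and a Python 3.9 or newer interpreter is assumed.

```python
import os
import subprocess
import sys

# ASCII-only input never triggers an encoding error, so the bogus handler
# name is only noticed if the interpreter validates *errors* eagerly.
SNIPPET = """
try:
    "abc".encode("ascii", errors="no-such-handler")
except LookupError:
    print("LookupError")
else:
    print("accepted")
"""


def run_snippet(dev_mode):
    """Run SNIPPET in a child interpreter and return what it printed."""
    args = [sys.executable]
    if dev_mode:
        args += ["-X", "dev"]
    env = dict(os.environ)
    env.pop("PYTHONDEVMODE", None)  # keep the non-dev run really non-dev
    proc = subprocess.run(args + ["-c", SNIPPET], env=env,
                          capture_output=True, text=True, check=True)
    return proc.stdout.strip()


dev_result = run_snippet(dev_mode=True)
default_result = run_snippet(dev_mode=False)
print("with -X dev:", dev_result)
print("default:    ", default_result)
```

On a release build the default run is expected to accept the bogus name, because no encoding error ever occurs; on a debug build both runs should report `LookupError`.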
7 changes: 7 additions & 0 deletions Doc/using/cmdline.rst
@@ -429,6 +429,9 @@ Miscellaneous options
     not be more verbose than the default if the code is correct: new warnings
     are only emitted when an issue is detected. Effect of the developer mode:

     * Check *encoding* and *errors* arguments on string encoding and decoding
       operations. Examples: :func:`open`, :meth:`str.encode` and
       :meth:`bytes.decode`.
     * Add ``default`` warning filter, as :option:`-W` ``default``.
     * Install debug hooks on memory allocators: see the
       :c:func:`PyMem_SetupDebugHooks` C function.
@@ -469,6 +472,10 @@ Miscellaneous options
      The ``-X pycache_prefix`` option. The ``-X dev`` option now logs
      ``close()`` exceptions in :class:`io.IOBase` destructor.

   .. versionchanged:: 3.9
      The ``-X dev`` option now checks the *encoding* and *errors* arguments
      on string encoding and decoding operations.

Options you shouldn't use
~~~~~~~~~~~~~~~~~~~~~~~~~
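
Whether these checks are active in a given process can be read from `sys.flags.dev_mode`. The sketch below is an illustration, not part of the patch. It starts child interpreters and reports the flag, once with the `-X dev` option and once with the `PYTHONDEVMODE` environment variable, which this diff does not mention but which CPython also honours. The helper name `child_dev_mode` is hypothetical.

```python
import os
import subprocess
import sys


def child_dev_mode(options=(), env_overrides=None):
    """Return sys.flags.dev_mode as seen by a child interpreter."""
    env = dict(os.environ)
    env.pop("PYTHONDEVMODE", None)  # start from a known state
    env.update(env_overrides or {})
    proc = subprocess.run(
        [sys.executable, *options, "-c",
         "import sys; print(sys.flags.dev_mode)"],
        env=env, capture_output=True, text=True, check=True)
    return proc.stdout.strip() == "True"


enabled_by_option = child_dev_mode(options=["-X", "dev"])
enabled_by_env = child_dev_mode(env_overrides={"PYTHONDEVMODE": "1"})
print(enabled_by_option, enabled_by_env)
```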
9 changes: 9 additions & 0 deletions Doc/whatsnew/3.9.rst
@@ -84,6 +84,15 @@ Other Language Changes
  this case.
  (Contributed by Victor Stinner in :issue:`20443`.)

* In development mode and in debug builds, the *encoding* and *errors*
  arguments are now checked on string encoding and decoding operations.
  Examples: :func:`open`, :meth:`str.encode` and :meth:`bytes.decode`.

  By default, for best performance, the *errors* argument is only checked at
  the first encoding/decoding error, and the *encoding* argument is sometimes
  ignored for empty strings.
  (Contributed by Victor Stinner in :issue:`37388`.)


New Modules
===========
4 changes: 4 additions & 0 deletions Lib/_pyio.py
@@ -36,6 +36,8 @@
# Does io.IOBase finalizer log the exception if the close() method fails?
# The exception is ignored silently by default in release build.
_IOBASE_EMITS_UNRAISABLE = (hasattr(sys, "gettotalrefcount") or sys.flags.dev_mode)
# Does open() check its 'errors' argument?
_CHECK_ERRORS = _IOBASE_EMITS_UNRAISABLE


def open(file, mode="r", buffering=-1, encoding=None, errors=None,
@@ -2022,6 +2024,8 @@ def __init__(self, buffer, encoding=None, errors=None, newline=None,
        else:
            if not isinstance(errors, str):
                raise ValueError("invalid errors: %r" % errors)
            if _CHECK_ERRORS:
                codecs.lookup_error(errors)

        self._buffer = buffer
        self._decoded_chars = ''  # buffer for text returned from decoder
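
The check added here relies on `codecs.lookup_error()`, which consults the registry of error handlers rather than a fixed list of names. A minimal sketch of that behaviour follows; the handler `question_mark_handler`, its registered name and the helper `is_registered` are invented for this example and do not appear in the patch.

```python
import codecs

HANDLER_NAME = "demo-question-mark"  # hypothetical name, not a built-in handler


def question_mark_handler(exc):
    """Replace each unencodable character with '?' and resume after it."""
    if isinstance(exc, UnicodeEncodeError):
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


def is_registered(name):
    """Report whether codecs.lookup_error() accepts the given name."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True


known_before = is_registered(HANDLER_NAME)
codecs.register_error(HANDLER_NAME, question_mark_handler)
known_after = is_registered(HANDLER_NAME)

# Once registered, the name passes the lookup and can be used for encoding.
encoded = "caf\u00e9".encode("ascii", errors=HANDLER_NAME)
print(known_before, known_after, encoded)
```

A custom handler therefore has to be registered before a `TextIOWrapper` that names it is created in development mode; otherwise the constructor is expected to fail with `LookupError`.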
58 changes: 58 additions & 0 deletions Lib/test/test_bytes.py
@@ -12,12 +12,14 @@
import functools
import pickle
import tempfile
import textwrap
import unittest

import test.support
import test.string_tests
import test.list_tests
from test.support import bigaddrspacetest, MAX_Py_ssize_t
from test.support.script_helper import assert_python_failure


if sys.flags.bytes_warning:
@@ -315,6 +317,62 @@ def test_decode(self):
        # Default encoding is utf-8
        self.assertEqual(self.type2test(b'\xe2\x98\x83').decode(), '\u2603')

    def test_check_encoding_errors(self):
        # bpo-37388: bytes(str) and bytes.decode() must check encoding
        # and errors arguments in dev mode
        invalid = 'Boom, Shaka Laka, Boom!'
        encodings = ('ascii', 'utf8', 'latin1')
        code = textwrap.dedent(f'''
            import sys
            type2test = {self.type2test.__name__}
            encodings = {encodings!r}

            for data in ('', 'short string'):
                try:
                    type2test(data, encoding={invalid!r})
                except LookupError:
                    pass
                else:
                    sys.exit(21)

                for encoding in encodings:
                    try:
                        type2test(data, encoding=encoding, errors={invalid!r})
                    except LookupError:
                        pass
                    else:
                        sys.exit(22)

            for data in (b'', b'short string'):
                data = type2test(data)
                print(repr(data))
                try:
                    data.decode(encoding={invalid!r})
                except LookupError:
                    pass
                else:
                    sys.exit(23)

                try:
                    data.decode(errors={invalid!r})
                except LookupError:
                    pass
                else:
                    sys.exit(24)

                for encoding in encodings:
                    try:
                        data.decode(encoding=encoding, errors={invalid!r})
                    except LookupError:
                        pass
                    else:
                        sys.exit(25)

            sys.exit(10)
        ''')
        proc = assert_python_failure('-X', 'dev', '-c', code)
        self.assertEqual(proc.rc, 10, proc)

    def test_from_int(self):
        b = self.type2test(0)
        self.assertEqual(b, self.type2test())
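
The test above uses a sentinel exit code: the child script exits with a distinct code for each check that wrongly succeeds, and with 10 only when every check raised `LookupError`. Since 10 is non-zero, the run counts as a "failure" for `assert_python_failure`, and the parent then compares the return code. The same pattern can be sketched with only `subprocess`, outside the CPython test suite; the script below is a reduced illustration and the encoding name `no-such-encoding` is made up.

```python
import subprocess
import sys
import textwrap

# Child script: one check, one dedicated exit code, plus the sentinel.
child_code = textwrap.dedent('''
    import sys

    try:
        "text".encode(encoding="no-such-encoding")
    except LookupError:
        pass
    else:
        sys.exit(21)  # the bogus encoding was accepted

    sys.exit(10)  # sentinel: every check behaved as expected
''')

proc = subprocess.run([sys.executable, "-X", "dev", "-c", child_code])
print("child exit code:", proc.returncode)
```

Using a sentinel instead of exit code 0 guards against the child ending early for unrelated reasons and being mistaken for a pass.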
49 changes: 48 additions & 1 deletion Lib/test/test_io.py
@@ -29,6 +29,7 @@
import signal
import sys
import sysconfig
import textwrap
import threading
import time
import unittest
@@ -37,7 +38,8 @@
from collections import deque, UserList
from itertools import cycle, count
from test import support
from test.support.script_helper import assert_python_ok, run_python_until_end
from test.support.script_helper import (
    assert_python_ok, assert_python_failure, run_python_until_end)
from test.support import FakePath

import codecs
@@ -4130,6 +4132,51 @@ def test_open_allargs(self):
        # there used to be a buffer overflow in the parser for rawmode
        self.assertRaises(ValueError, self.open, support.TESTFN, 'rwax+')

    def test_check_encoding_errors(self):
        # bpo-37388: open() and TextIOWrapper must check encoding and errors
        # arguments in dev mode
        mod = self.io.__name__
        filename = __file__
        invalid = 'Boom, Shaka Laka, Boom!'
        code = textwrap.dedent(f'''
            import sys
            from {mod} import open, TextIOWrapper

            try:
                open({filename!r}, encoding={invalid!r})
            except LookupError:
                pass
            else:
                sys.exit(21)

            try:
                open({filename!r}, errors={invalid!r})
            except LookupError:
                pass
            else:
                sys.exit(22)

            fp = open({filename!r}, "rb")
            with fp:
                try:
                    TextIOWrapper(fp, encoding={invalid!r})
                except LookupError:
                    pass
                else:
                    sys.exit(23)

                try:
                    TextIOWrapper(fp, errors={invalid!r})
                except LookupError:
                    pass
                else:
                    sys.exit(24)

            sys.exit(10)
        ''')
        proc = assert_python_failure('-X', 'dev', '-c', code)
        self.assertEqual(proc.rc, 10, proc)


class CMiscIOTest(MiscIOTest):
    io = io
62 changes: 62 additions & 0 deletions Lib/test/test_unicode.py
@@ -11,9 +11,11 @@
import operator
import struct
import sys
import textwrap
import unittest
import warnings
from test import support, string_tests
from test.support.script_helper import assert_python_failure

# Error handling (bad decoder return)
def search_function(encoding):
@@ -2436,6 +2438,66 @@ def test_free_after_iterating(self):
        support.check_free_after_iterating(self, iter, str)
        support.check_free_after_iterating(self, reversed, str)

    def test_check_encoding_errors(self):
        # bpo-37388: str(bytes) and str.encode() must check encoding and errors
        # arguments in dev mode
        encodings = ('ascii', 'utf8', 'latin1')
        invalid = 'Boom, Shaka Laka, Boom!'
        code = textwrap.dedent(f'''
            import sys
            encodings = {encodings!r}

            for data in (b'', b'short string'):
                try:
                    str(data, encoding={invalid!r})
                except LookupError:
                    pass
                else:
                    sys.exit(21)

                try:
                    str(data, errors={invalid!r})
                except LookupError:
                    pass
                else:
                    sys.exit(22)

                for encoding in encodings:
                    try:
                        str(data, encoding, errors={invalid!r})
                    except LookupError:
                        pass
                    else:
                        sys.exit(22)

            for data in ('', 'short string'):
                try:
                    data.encode(encoding={invalid!r})
                except LookupError:
                    pass
                else:
                    sys.exit(23)

                try:
                    data.encode(errors={invalid!r})
                except LookupError:
                    pass
                else:
                    sys.exit(24)

                for encoding in encodings:
                    try:
                        data.encode(encoding, errors={invalid!r})
                    except LookupError:
                        pass
                    else:
                        sys.exit(24)

            sys.exit(10)
        ''')
        proc = assert_python_failure('-X', 'dev', '-c', code)
        self.assertEqual(proc.rc, 10, proc)


class CAPITest(unittest.TestCase):

@@ -0,0 +1,7 @@
In development mode and in debug builds, the *encoding* and *errors* arguments
are now checked on string encoding and decoding operations. Examples:
:func:`open`, :meth:`str.encode` and :meth:`bytes.decode`.

By default, for best performance, the *errors* argument is only checked at the
first encoding/decoding error, and the *encoding* argument is sometimes ignored
for empty strings.
43 changes: 43 additions & 0 deletions Modules/_io/textio.c
@@ -988,6 +988,46 @@ _textiowrapper_fix_encoder_state(textio *self)
    return 0;
}

static int
io_check_errors(PyObject *errors)
{
    assert(errors != NULL && errors != Py_None);

    PyInterpreterState *interp = _PyInterpreterState_GET_UNSAFE();
#ifndef Py_DEBUG
    /* In release mode, only check in development mode (-X dev) */
    if (!interp->config.dev_mode) {
        return 0;
    }
#else
    /* Always check in debug mode */
#endif

    /* Avoid calling PyCodec_LookupError() before the codec registry is ready:
       before _PyUnicode_InitEncodings() is called. */
    if (!interp->fs_codec.encoding) {
        return 0;
    }

    Py_ssize_t name_length;
    const char *name = PyUnicode_AsUTF8AndSize(errors, &name_length);
    if (name == NULL) {
        return -1;
    }
    if (strlen(name) != (size_t)name_length) {
        PyErr_SetString(PyExc_ValueError, "embedded null character in errors");
        return -1;
    }
    PyObject *handler = PyCodec_LookupError(name);
    if (handler != NULL) {
        Py_DECREF(handler);
        return 0;
    }
    return -1;
}



/*[clinic input]
_io.TextIOWrapper.__init__
buffer: object
@@ -1057,6 +1097,9 @@ _io_TextIOWrapper___init___impl(textio *self, PyObject *buffer,
                     errors->ob_type->tp_name);
        return -1;
    }
    else if (io_check_errors(errors)) {
        return -1;
    }

    if (validate_newline(newline) < 0) {
        return -1;
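
For readers more at home in Python than in the C API, the control flow of `io_check_errors()` can be approximated as follows. This is a rough sketch, not the implementation: the function name `check_errors` and its `enabled` parameter are invented here, and the C function's guard for a codec registry that is not yet initialised has no Python equivalent and is left out.

```python
import codecs
import sys


def check_errors(errors, *, enabled=None):
    """Approximate io_check_errors(): validate an error handler name.

    `enabled` is a hypothetical switch for experimenting; when it is None
    the decision mirrors the patch (debug build or -X dev).
    """
    if enabled is None:
        # Debug builds expose sys.gettotalrefcount; -X dev sets dev_mode.
        enabled = hasattr(sys, "gettotalrefcount") or sys.flags.dev_mode
    if not enabled:
        return  # release build without -X dev: skip the check
    if "\0" in errors:
        # Counterpart of the strlen() != name_length comparison in C.
        raise ValueError("embedded null character in errors")
    codecs.lookup_error(errors)  # raises LookupError for unknown names


check_errors("strict", enabled=True)            # known handler: passes
check_errors("no-such-handler", enabled=False)  # check skipped: passes
```

With `enabled=True`, an unknown name such as `"no-such-handler"` raises `LookupError`, and a name containing `"\0"` raises `ValueError` before the registry is consulted.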