Issue Details (XML | Word | Printable)

Key: PYFB-76
Type: Bug Bug
Status: Open Open
Priority: Minor Minor
Assignee: Pavel Cisar
Reporter: Joern Ungermann
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Firebird driver for Python

Firebird database with malformed strings cannot be accessed

Created: 22/Nov/18 01:22 PM   Updated: 23/Nov/18 02:15 PM
Component/s: None
Affects Version/s: 2.0
Fix Version/s: None

Environment: anaconda3 with fdb version 2.0.0 py0


 Description  « Hide
We have a firebird database containing a few malformed ASCII strings. These are byte streams, presumably ASCII coded, stored as is from a rather unreliable wireless transmission. Thus, there are bit flips, which causes the strings to contain some strange characters.
Using python 2, we had no problem accessing such databases. Using python 3, everything needs to be decoded/encoded.
Using the default access to the database using sqlalchemy and fdb causes a UnicodeDecodingError upon accessing an entry with a malformed string:

[...]
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 2659, in __xsqlda2tuple
    value = b2u(value, self.__python_charset)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 480, in b2u
    return st.decode(charset)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 1: invalid start byte

Playing around with the "charset" option of sqlalchemy could open some files, where the malformed strings contained only values available in the given charset, but introduced other problems. Looking into the code, there seems to be an "OCTETS" option, which supposedly parse the bytestream through, but fails to work (the python-translated-charset is None, which is not a valid option for encode/decode)

I'd prefer either of two options:
a) implement (optionally?) an option to influence the error behavious of decoding (either replace or ignore would suit our usecase)
b) implement a pass-through option for the byte-stream

If I oversaw some option to solve my use-case wth the given code-base, I'd be very happy to hear about it.

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Pavel Cisar added a comment - 22/Nov/18 03:24 PM
It's strange, because client charset OCTETS should work as pass-through for all strings as b2u() returns the string (bytes in this case) unchanged when passed charset is None (which should be according to ibase.charset_map). But P3 application or upper layer has to be prepared to handle bytes instead unicode string on output.

Anyway, this is tricky problem. I don't like the idea to hide decode errors (especially if it's easy to use this option on global scale) because then such errors can slip unnoticed to cause problems elsewhere, that are then much harder to detect. A pass-through solution that would return byte-stream instead unicode when error is encountered is the same problem, as it means that users must expect various result types.

But optional connection option (disabled by default) for decoding error resolution appears to be good enough solution, and I'll implement it for next release. However, usability depends on how this would be exposed by other layers like sqlalchemy.

As a hotfix, you can change the b2u() function in fbcore.py to replace or ignore bad characters.

Joern Ungermann added a comment - 22/Nov/18 06:28 PM - edited
Thanks, solving this would be marvelous, but I do not see an easy solution for this including all the stacks, either. The quick fix works as intended, but doesn't scale very well, as it breaks our automatic deployment. We really should switch to docker or somesuch.

If I try to use the charset=OCTETS option in sqlalchemy to bypass the coding, I get the following error:

  File "/home/icg173/bin/gloripy_V01781_R7633e89/gloripy/data/database.py", line 792, in __init__
    self.metadata.reflect(self.engine)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/schema.py", line 3908, in reflect
    with bind.connect() as conn:
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2102, in connect
    return self._connection_cls(self, **kwargs)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 90, in __init__
    if connection is not None else engine.raw_connection()
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2188, in raw_connection
    self.pool.unique_connection, _connection)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2158, in _wrap_pool_connect
    return fn()
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 342, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 788, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 529, in checkout
    rec = pool._do_get()
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1193, in _do_get
    self._dec_overflow()
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
    raise value
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1190, in _do_get
    return self._create_connection()
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 347, in _create_connection
    return _ConnectionRecord(self)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 474, in __init__
    self.__connect(first_connect_check=True)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 671, in __connect
    connection = pool._invoke_creator(self)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 106, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 412, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 827, in connect
    no_reserve, db_key_scope, no_gc, no_db_triggers, no_linger)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 760, in build_dpb
    dpb.add_string_parameter(isc_dpb_user_name, user)
  File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 625, in add_string_parameter
    value = value.encode(charset_map.get(self.charset, self.charset))
TypeError: encode() argument 1 must be str, not None

The "OCTETS" charset from "firebird+fdb://{}:{}@{}/{}?charset=OCTETS" is translated to "None" in fdb.ibase.charset_map.
The None is then passed as argument to the encode method, which doesn't like it. In this case, the value should be directly returned as bytes object?
I am not sure what fdbcore is doing there, but it would certainly break later as well, when None is passed to the decode function, which, probably, should be bypassed as well in the case of OCTETS?

Pavel Cisar added a comment - 23/Nov/18 02:15 PM
I see, FDB clearly does not handle OCTETS properly. Using it as client charset is very wild idea that should nobody even try, but anyway. The problem with OCTESTS handling is that it adds another level of complexity to the code that is already trying to handle P2/P3 string difference in split-brain fashion (i,.e. P3 support in FDB is somewhat crude and unclean thanks to its history coming back to early P3 era). I worry that it would not get better anytime soon (it's planned as part of transition from P2 centered to P3 centered development of FDB over 2.x series).

Ok, I'll try to add optional support for replace/ignore option into bytes2unicode conversions in FDB as soon as possible. However, it happens at several places in various contexts, so it may take some time as I'd like do it properly.