We have a Firebird database containing a few malformed ASCII strings. These are byte streams, presumably ASCII-encoded, stored as is from a rather unreliable wireless transmission. Thus, there are bit flips, which cause the strings to contain some strange characters.
Using Python 2, we had no problem accessing such databases. Using Python 3, everything needs to be decoded/encoded.
Accessing the database with the default sqlalchemy and fdb setup raises a UnicodeDecodeError as soon as an entry with a malformed string is read:
File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 2659, in __xsqlda2tuple
value = b2u(value, self.__python_charset)
File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 480, in b2u
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 1: invalid start byte
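For reference, the same error can be reproduced without any database. This is only a minimal sketch: the sample bytes are made up (an ASCII string with one byte flipped to 0x90), not taken from our data.

```python
# Hypothetical sample: ASCII text in which one byte was corrupted to 0x90,
# matching the byte and position reported in the traceback.
raw = b"A\x90BC"

try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # 0x90 is a UTF-8 continuation byte, so it cannot start a character.
    error_message = str(exc)

print(error_message)
```

So strict UTF-8 decoding fails on the first flipped byte above 0x7F that does not happen to form a valid sequence.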
Playing around with the "charset" option of sqlalchemy let us open some files, namely those whose malformed strings contained only values available in the given charset, but it introduced other problems. Looking into the code, there seems to be an "OCTETS" option, which supposedly passes the byte stream through, but it fails to work (the Python-translated charset is None, which is not a valid argument for encode/decode).
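To illustrate both observations in plain Python (a sketch only; the codec names below are Python codec names, not Firebird charset names, and the sample bytes and the helper `try_decode` are made up for illustration):

```python
# Hypothetical sample with one corrupted byte.
raw = b"A\x90BC"

def try_decode(data, charset):
    """Return the decoded text, or None if the charset cannot decode the bytes."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None

# Whether a file opens depends on whether every byte is defined in the codec.
results = {name: try_decode(raw, name)
           for name in ("ascii", "utf-8", "cp1252", "latin-1")}

# Passing None as the encoding is rejected by bytes.decode itself.
try:
    raw.decode(None)
    none_is_valid = True
except TypeError:
    none_is_valid = False

print(results, none_is_valid)
```

Only a codec that defines all 256 byte values (such as latin-1) decodes every possible corrupted string; the others work only as long as the flipped bytes happen to be valid in them.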
I'd prefer either of two options:
a) implement an (optional?) setting to control the error behaviour of decoding (either replace or ignore would suit our use case)
b) implement a pass-through option for the byte-stream
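To make the two requests concrete, here is what I have in mind, sketched with the standard Python codec machinery only. This is not fdb code, and how such an option would be exposed in fdb or sqlalchemy is left open; the sample bytes are made up.

```python
# Hypothetical sample with one corrupted byte.
raw = b"A\x90BC"

# Option a): choose an error handler instead of the default "strict".
replaced = raw.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD
ignored = raw.decode("utf-8", errors="ignore")    # bad byte is dropped

# Option b): pass-through. Either hand back the bytes object unchanged,
# or decode losslessly so the original bytes can be recovered later.
passed_through = raw.decode("latin-1")
round_trip = passed_through.encode("latin-1")

# "surrogateescape" is another lossless handler from the standard library.
escaped = raw.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")

print(repr(replaced), repr(ignored), round_trip == raw, restored == raw)
```

Either a) or b) would let us read the affected rows; b) has the advantage that no information is lost.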
If I overlooked an existing option that solves my use case with the current code base, I'd be very happy to hear about it.