I want to read a binary PNM image file from stdin. The file contains a header, which is encoded as ASCII text, and a payload, which is binary. As a simplified example of reading the header, I have created the following snippet:
#! /usr/bin/env python3
import sys
header = sys.stdin.readline()
print("header=["+header.strip()+"]")
I run it as "test.py" (from a Bash shell), and it works fine in this case:
$ printf "P5 1 1 255\n\x41" |./test.py
header=[P5 1 1 255]
However, a small change in the binary payload breaks it:
$ printf "P5 1 1 255\n\x81" |./test.py
Traceback (most recent call last):
File "./test.py", line 3, in <module>
header = sys.stdin.readline()
File "/usr/lib/python3.4/codecs.py", line 313, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 11: invalid start byte
Is there an easy way to make this work in Python 3?
To read binary data, you should use a binary stream, e.g., the one returned by the TextIOBase.detach() method:
#!/usr/bin/env python3
import sys
sys.stdin = sys.stdin.detach() # convert to binary stream
header = sys.stdin.readline().decode('ascii') # b'\n'-terminated
print(header, end='')
print(repr(sys.stdin.read()))
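The difference between the two kinds of stream can be reproduced without a pipe. The following is a minimal sketch, assuming io.BytesIO as a stand-in for stdin and UTF-8 as the text encoding (as in the traceback from the question); the helper names header_via_text and header_via_detach are made up for illustration.

```python
import io

def header_via_text(data):
    """Read the first line through a UTF-8 text wrapper; None if decoding fails."""
    text_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
    try:
        return text_stream.readline()
    except UnicodeDecodeError:
        # The wrapper decodes a whole chunk, so payload bytes can break the header read.
        return None

def header_via_detach(data):
    """Detach the binary stream first, then decode only the header line."""
    text_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
    binary_stream = text_stream.detach()
    header = binary_stream.readline().decode("ascii")
    return header, binary_stream.read()

good = b"P5 1 1 255\n\x41"   # payload byte is valid UTF-8
bad = b"P5 1 1 255\n\x81"    # payload byte is not a valid UTF-8 start byte

print(header_via_text(good))
print(header_via_text(bad))
print(header_via_detach(bad))
```

The text wrapper fails on the second input even though only the header line was requested, because it decodes the buffered payload byte along with the header. After detach(), nothing is decoded until the script asks for it.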
From the docs, it is possible to read binary data (as type bytes) from stdin with sys.stdin.buffer.read():
To write or read binary data from/to the standard streams, use the
underlying binary buffer object. For example, to write bytes to
stdout, use sys.stdout.buffer.write(b'abc').
So this is one direction that you can take: read the data in binary mode. readline() and various other functions still work. Once you have captured the ASCII string, it can be converted to text, using decode('ASCII'), for additional text-specific processing.
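As a rough sketch of that approach, the function below reads the header line as bytes, decodes it, and splits it into fields. It assumes the simplified one-line header from the question ("magic width height maxval"); real PNM files may spread the header over several lines and include comments, which this does not handle. The name read_simple_pnm is hypothetical, and io.BytesIO stands in for the pipe.

```python
import io

def read_simple_pnm(stream):
    """Parse a one-line 'magic width height maxval' header, then read the payload.

    `stream` must be a binary stream, such as sys.stdin.buffer or io.BytesIO.
    """
    header = stream.readline().decode("ascii")   # header is ASCII by definition
    magic, width, height, maxval = header.split()
    payload = stream.read()                      # remaining bytes, left undecoded
    return magic, int(width), int(height), int(maxval), payload

demo = io.BytesIO(b"P5 1 1 255\n\x81")
print(read_simple_pnm(demo))
```

In a script fed by a pipe, sys.stdin.buffer would be passed in place of the io.BytesIO object.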
Alternatively, you can use io.TextIOWrapper() to indicate the use of the latin-1 character set on the input stream. With this, the implicit decode operation is essentially a pass-through: the data will be of type str (which represents text), but each character maps 1-to-1 to an input byte (although it could be using more than one storage byte per input byte). For the payload to survive unchanged, newline translation must also be turned off with newline='\n', because by default TextIOWrapper converts '\r' and '\r\n' to '\n' on input.
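The 1-to-1 mapping can be checked directly on an in-memory bytes object, with no streams involved. This is only a small illustration; the variable names are arbitrary.

```python
# Every possible byte value, 0 through 255.
all_bytes = bytes(range(256))

# latin-1 maps byte value N to the character with code point N.
as_text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in as_text]

# Encoding back with the same codec restores the original bytes.
round_trip = as_text.encode("latin-1")

print(code_points == list(range(256)))
print(round_trip == all_bytes)
```

Both comparisons print True, which is why ord() on each character recovers the original byte value.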
Here's code that works in either mode:
#!/usr/bin/python3
import io
import sys

BINARY = True  # either way works

if BINARY:
    istream = sys.stdin.buffer
else:
    # newline='\n' keeps '\r' bytes in the payload from being translated to '\n'
    istream = io.TextIOWrapper(sys.stdin.buffer, encoding='latin-1', newline='\n')

header = istream.readline()
if BINARY:
    header = header.decode('ASCII')
print("header=[" + header.strip() + "]")

payload = istream.read()
print("len=" + str(len(payload)))
for i in payload:
    print(i if BINARY else ord(i))  # bytes iterate as ints, str as characters
Test every possible 1-pixel payload with the following Bash command:
for i in $(seq 0 255) ; do printf "P5 1 1 255\n\x$(printf %02x $i)" |./test.py ; done
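The same check can also be run in-process. The sketch below assumes io.BytesIO as a stand-in for the pipe and uses a hypothetical helper, read_image, that mirrors both modes. It also shows why the newline argument matters in text mode: with the default newline=None, a carriage-return pixel (13) comes back as 10.

```python
import io

HEADER = b"P5 1 1 255\n"

def read_image(data, binary, newline="\n"):
    """Return (header, pixel values) for raw PNM bytes, in either mode."""
    raw = io.BytesIO(data)
    if binary:
        stream = raw
    else:
        stream = io.TextIOWrapper(raw, encoding="latin-1", newline=newline)
    header = stream.readline()
    if binary:
        header = header.decode("ascii")
    payload = stream.read()
    pixels = list(payload) if binary else [ord(ch) for ch in payload]
    return header.strip(), pixels

# Every possible 1-pixel payload must come back unchanged in both modes.
for value in range(256):
    data = HEADER + bytes([value])
    assert read_image(data, binary=True) == ("P5 1 1 255", [value])
    assert read_image(data, binary=False) == ("P5 1 1 255", [value])

# With universal-newline translation left on, '\r' is rewritten to '\n'.
print(read_image(HEADER + b"\r", binary=False, newline=None))
```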