What is the best way to read a text file from a tar archive as strings (not byte strings)?

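I have some text files in a tar archive that I need to process (but not extract). I have working Python 2 code that I am trying to upgrade to Python 3. Unfortunately, Python 3 returns byte strings, which the other code does not handle correctly. I need to convert the byte strings to strings. A simple example looks like this: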

import tarfile
with tarfile.open("file.tar") as tar:
    with tar.extractfile("test.txt") as extracted:
        lines = extracted.readlines()
        print(lines)
The result is:
['a\n', 'test\n', 'file\n']    # python 2
[b'a\n', b'test\n', b'file\n'] # python 3
Below are my current attempts at a fix. They all work, but having to use a with statement, a list comprehension, or map just to read some text feels clunky.
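# Option 1: wrap the binary stream so it decodes on read (needs "import io")
with io.TextIOWrapper(extracted) as txtextracted:
    lines = txtextracted.readlines()
# Option 2: decode each byte line with a list comprehension
lines = [i.decode("utf-8") for i in lines]
# Option 3: decode each byte line with map
lines = list(map(lambda x: x.decode("utf-8"), lines))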
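I could not find a neater solution in the documentation for io.BufferedReader (the type of object that TarFile.extractfile returns). I have tried to come up with some solutions, but none is as neat as the Python 2 version. Is there a neat, Pythonic way to parse the io.BufferedReader object from a tar file as strings?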

The title should be revised: at first glance it asks about the best way, but what you are actually asking is how to port Python 2 code to Python 3.

Anon Coward: Is wrapping the txtextracted object in an io.TextIOWrapper instance good enough?

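Karl Knechtel: "Is there a neat, Pythonic way to parse the io.BufferedReader object from a tar file as strings?" Not really, at least as far as I know. Fundamentally, the tarfile library has no reason to expect that the files in an archive represent text data; and explicit is better than implicit. I suppose you could try to .extract directly to a temporary file and then parse that. That is just two context managers, and they do not have to be nested. It does mean a temporary file, though.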
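
For what it's worth, here is a rough sketch of the temporary-file route described in this comment. It is only one way to read the suggestion, not the commenter's own code. The helper names build_sample_tar and read_via_temp_file, and the small sample archive, are assumptions made so the example runs on its own; the hasattr check for the extraction filter is there because older Python versions do not accept the filter argument.

```python
import io
import os
import tarfile
import tempfile


def build_sample_tar(path):
    """Write a tiny archive holding test.txt (hypothetical test data)."""
    data = "a\ntest\nfile\n".encode("utf-8")
    info = tarfile.TarInfo(name="test.txt")
    info.size = len(data)
    with tarfile.open(path, "w") as tar:
        tar.addfile(info, io.BytesIO(data))


def read_via_temp_file(tar_path, member):
    """Extract one member into a temporary directory, then read it as text."""
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(tar_path) as tar:
            # Newer Python versions accept an extraction filter; older ones do not.
            if hasattr(tarfile, "data_filter"):
                tar.extract(member, path=tmp, filter="data")
            else:
                tar.extract(member, path=tmp)
        # This second context manager is not nested inside the tarfile one.
        with open(os.path.join(tmp, member), encoding="utf-8") as handle:
            return handle.readlines()


with tempfile.TemporaryDirectory() as workdir:
    archive = os.path.join(workdir, "file.tar")
    build_sample_tar(archive)
    lines = read_via_temp_file(archive, "test.txt")

print(lines)
```

The trade-off is the one the comment already names: the data touches the disk, which the in-memory approaches avoid.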

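Karl Knechtel: If you don't like the design of the standard library, there is a mailing list where that can be discussed. But "best way" questions are usually not a good fit for Stack Overflow.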

Marco Bonelli: Well, you can reduce those two with statements to one. [replacement code 1]
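
The snippet that went with this comment did not survive here. As a guess at the general idea (not necessarily what the commenter wrote), merging the two with statements from the question might look like the sketch below. The build_sample_tar and read_lines helpers and the throwaway archive are assumptions added only to make the example self-contained.

```python
import io
import os
import tarfile
import tempfile


def build_sample_tar(path):
    """Write a tiny archive holding test.txt (hypothetical test data)."""
    data = "a\ntest\nfile\n".encode("utf-8")
    info = tarfile.TarInfo(name="test.txt")
    info.size = len(data)
    with tarfile.open(path, "w") as tar:
        tar.addfile(info, io.BytesIO(data))


def read_lines(tar_path, member):
    """Open the archive and the member in a single with statement."""
    with tarfile.open(tar_path) as tar, tar.extractfile(member) as extracted:
        # Iterating the binary file object yields byte lines; decode each one.
        return [line.decode("utf-8") for line in extracted]


with tempfile.TemporaryDirectory() as workdir:
    archive = os.path.join(workdir, "file.tar")
    build_sample_tar(archive)
    lines = read_lines(archive, "test.txt")

print(lines)
```

This removes one level of nesting but still needs an explicit decode step, so it only addresses part of what the question calls clunky.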

Tags: python, python-3.x, tarfile

Karl Knechtel, posted on 2022-01-19

Accepted

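The with statement allows multiple context managers, and it turns out that the construction of each one can depend on earlier items in the chain. For example:
class manager: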
    def __init__(self, name, child=None):
        self.name, self.child = name, child
    def __exit__(self, t, value, traceback):
        print('exiting', self)
    def __enter__(self):
        print('entering', self)
        return self
    def __str__(self):
        childname = None if self.child is None else f"'{self.child.name}'"
        return f"manager '{self.name}' with child {childname}"
Testing it:
>>> with manager('x') as x, manager('y', x) as y, manager('z', y) as z: pass
entering manager 'x' with child None
entering manager 'y' with child 'x'
entering manager 'z' with child 'y'
exiting manager 'z' with child 'y'
exiting manager 'y' with child 'x'
exiting manager 'x' with child None
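
Applied to the tar case from the question, the same chaining could look roughly like the sketch below: each manager in the single with statement is built from the one before it, and io.TextIOWrapper does the decoding. Treat it as an illustration under assumptions rather than a definitive recipe; the helper names build_sample_tar and read_text_lines, the sample archive, and the UTF-8 default are all made up for the example.

```python
import io
import os
import tarfile
import tempfile


def build_sample_tar(path):
    """Write a tiny archive holding test.txt (hypothetical test data)."""
    data = "a\ntest\nfile\n".encode("utf-8")
    info = tarfile.TarInfo(name="test.txt")
    info.size = len(data)
    with tarfile.open(path, "w") as tar:
        tar.addfile(info, io.BytesIO(data))


def read_text_lines(tar_path, member, encoding="utf-8"):
    """Chain three context managers; each depends on the previous one."""
    # Note: extractfile returns None for members that are not regular
    # files or links, in which case this with statement would fail.
    with tarfile.open(tar_path) as tar, \
            tar.extractfile(member) as extracted, \
            io.TextIOWrapper(extracted, encoding=encoding) as text:
        return text.readlines()


with tempfile.TemporaryDirectory() as workdir:
    archive = os.path.join(workdir, "file.tar")
    build_sample_tar(archive)
    lines = read_text_lines(archive, "test.txt")

print(lines)
```

The managers are exited in reverse order, as in the test output above: the text wrapper first, then the extracted member, then the archive.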