Showing posts from January, 2018

S3 boto3 'StreamingBody' object has no attribute 'tell'

I was recently trying to feed an S3 object from the Common Crawl bucket directly into the Python package warcio.

import boto3
from warcio.archiveiterator import ArchiveIterator

s3 = boto3.client('s3')
r = s3.get_object(Key='crawl-data/file....', Bucket='commoncrawl')
for record in ArchiveIterator(r['Body']):
    pass

However, this fails with the error:
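    self.offset = self.fh.tell()
AttributeError: 'StreamingBody' object has no attribute 'tell'

The reason is that the 'Body' returned by get_object is a StreamingBody, which doesn't support tell(), while warcio calls tell() on its input to keep track of offsets. It's easily fixable by wrapping the body in a tiny class that counts the bytes read so far: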

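class S3ObjectWithTell:
    def __init__(self, s3object):
        self.s3object = s3object
        self.offset = 0  # number of bytes read so far

    def read(self, amount=None):
        # Delegate to the wrapped body and advance our own position counter.
        result = self.s3object.read(amount)
        self.offset += len(result)
        return result

    def close(self):
        self.s3object.close()

    def tell(self):
        return self.offset
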
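
To convince yourself the offset bookkeeping is right without touching S3, here is a minimal sketch. NoTellStream is a hypothetical stand-in I made up for this check (it only mimics the read/close part of a StreamingBody); the wrapper is repeated so the snippet runs on its own.

```python
import io


class S3ObjectWithTell:
    """Same wrapper as above, repeated so this snippet is self-contained."""

    def __init__(self, s3object):
        self.s3object = s3object
        self.offset = 0

    def read(self, amount=None):
        result = self.s3object.read(amount)
        self.offset += len(result)
        return result

    def close(self):
        self.s3object.close()

    def tell(self):
        return self.offset


class NoTellStream:
    """Hypothetical stand-in for a StreamingBody: read() and close() only."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, amount=None):
        return self._buffer.read(amount)

    def close(self):
        self._buffer.close()


raw = NoTellStream(b"0123456789")
has_tell_before = hasattr(raw, "tell")  # the stand-in has no tell()

wrapped = S3ObjectWithTell(raw)
first = wrapped.read(4)              # read a fixed number of bytes
offset_after_first = wrapped.tell()  # position after the partial read
rest = wrapped.read()                # read everything that is left
offset_at_end = wrapped.tell()       # position equals total bytes read
wrapped.close()
```

The wrapper only counts bytes that pass through read(), so tell() is accurate as long as all reads go through the wrapper and nothing reads from the underlying body directly.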
You can now use this class and change

for record in ArchiveIterator(r['Body']):

into

for record in ArchiveIterator(…