Sunday, 14 January 2018

S3 boto3 'StreamingBody' object has no attribute 'tell'

I was recently trying to use the Python package warcio by feeding an S3 object from the Common Crawl bucket directly into it.

import boto3
from warcio.archiveiterator import ArchiveIterator

s3 = boto3.client('s3')
r = s3.get_object(Key='crawl-data/file....', Bucket='commoncrawl')
for record in ArchiveIterator(r['Body']):
    pass

However, this fails with the error:
self.offset = self.fh.tell()
AttributeError: 'StreamingBody' object has no attribute 'tell'

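The reason is that the 'StreamingBody' object boto3 returns as the response body doesn't implement tell(), which warcio calls to keep track of its offset in the stream. It's easily fixable with a tiny wrapper class that counts the bytes read: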

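class S3ObjectWithTell:
    """Wrap a streaming body and count the bytes read so tell() works."""

    def __init__(self, s3object):
        self.s3object = s3object
        self.offset = 0  # bytes read so far

    def read(self, amount=None):
        result = self.s3object.read(amount)
        self.offset += len(result)
        return result

    def close(self):
        self.s3object.close()

    def tell(self):
        # Position in the stream, i.e. the number of bytes consumed so far.
        return self.offset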

To use it, wrap the response body before passing it to ArchiveIterator, changing
for record in ArchiveIterator(r['Body']):
into

for record in ArchiveIterator(S3ObjectWithTell(r['Body'])):
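
If you want to check the offset tracking without touching S3, here is a minimal sketch. It repeats the wrapper class so that it runs on its own, and it uses a made-up FakeStreamingBody class (my own stand-in, not part of boto3 or warcio) that only offers read() and close(), just like the object that caused the error:

```python
import io


class S3ObjectWithTell:
    """Copy of the wrapper above: counts bytes read so tell() can report them."""

    def __init__(self, s3object):
        self.s3object = s3object
        self.offset = 0

    def read(self, amount=None):
        result = self.s3object.read(amount)
        self.offset += len(result)
        return result

    def close(self):
        self.s3object.close()

    def tell(self):
        return self.offset


class FakeStreamingBody:
    """Hypothetical stand-in for a streaming body: read() and close(), no tell()."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, amount=None):
        return self._buffer.read(amount)

    def close(self):
        self._buffer.close()


body = S3ObjectWithTell(FakeStreamingBody(b"0123456789"))
first = body.read(4)        # b"0123"
after_first = body.tell()   # 4 bytes consumed so far
rest = body.read()          # b"456789", the remainder of the stream
after_rest = body.tell()    # 10 bytes consumed in total
body.close()
```

Note that the wrapper only counts bytes that pass through read(), so it reports a correct position as long as every read goes through the wrapper and nothing consumes the underlying body directly.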