Thoughts and experiments in programming by a startup CTO in Amsterdam.
Why this blog?
Whenever I come across a problem I search Google for answers. Every now and then the answer is not straightforward and I have to do some ingenious thinking. This blog allows me to share my findings with other developers.
While playing around with the Web Speech API I discovered something interesting: if you let the Dutch voice speak English text, it sounds remarkably like the typical "Dungrish" accent (a Dutch person speaking English). I didn't expect this. Obviously the AI was not trained to reproduce foreign accents, but it still does! It is very interesting to see what happens if you let a voice speak a language it was not trained on.

You can play around with it here if your browser supports it, or try the examples below. Select the voice (the list might be empty if your browser doesn't support text to speech) and enter the text you want spoken, for example: Hello good sir, can you please tell me what twenty times ten is?

Or try these ready-made examples:
English text spoken by Dutch voice
English text spoken by Hindi voice
English text spoken by Italian voice
English text spoken by French voice
English text spoken by German voice
English text
Last May I was working on a hobby project similar to this: https://github.com/zakjan/cert-chain-resolver/ . Because I found the cert-chain-resolver project a couple of days later, I did nothing with the results. However, I recently got some nice comments on this HN thread about how I used 1 VM to download & process 10TB in a couple of hours, so I decided to do a write-up on the process and publish the data. See the parts below:

Part 1: downloading 10TB of metadata in 4 hours
Part 2: fetching a ****load of certificates
Part 3: playing with the data
Total costs

My approach was somewhat different from the GitHub project above: instead of using the AIA extension, I wanted to brute-force the solution by finding all known intermediate and root certificates in advance. Based on the checksum of the issuer/subject fields I could look up which certificates "claimed" to be the signer of the certificate, and then, using the signature, I could filter out which ones actually were. You can us
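The lookup idea described above (index the known certificates by a checksum of their subject, look a certificate's issuer up in that index, then keep only the candidates whose signature actually checks out) can be sketched roughly as follows. This is a toy illustration and not the code used for the write-up: `ToyCert`, `name_digest`, `build_index`, `find_signers` and the sample data are all hypothetical names, SHA-256 is an assumed choice of checksum, and the signature check is replaced by a simple key-id comparison so the example runs without any certificate library.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCert:
    """Hypothetical stand-in for a parsed certificate."""
    name: str         # label used only for display
    subject: bytes    # stand-in for the encoded subject field
    issuer: bytes     # stand-in for the encoded issuer field
    key_id: str       # stand-in for this certificate's own key
    signed_with: str  # stand-in for the key that produced the signature


def name_digest(name: bytes) -> str:
    """Checksum of a subject/issuer field (SHA-256 is an assumption)."""
    return hashlib.sha256(name).hexdigest()


def build_index(known_cas):
    """Map subject checksum -> all known CA certificates with that subject."""
    index = defaultdict(list)
    for ca in known_cas:
        index[name_digest(ca.subject)].append(ca)
    return index


def toy_signature_ok(child, candidate):
    """Placeholder for real signature verification."""
    return child.signed_with == candidate.key_id


def find_signers(child, index, verify=toy_signature_ok):
    """Return the candidates that claim to be the issuer AND pass verification."""
    claimed = index.get(name_digest(child.issuer), [])
    return [ca for ca in claimed if verify(child, ca)]


# Two CAs share a subject (for example after a key rollover); only one signed the leaf.
KNOWN_CAS = [
    ToyCert("ca-old", b"CN=Example CA", b"CN=Example Root", "key-1", "root-key"),
    ToyCert("ca-new", b"CN=Example CA", b"CN=Example Root", "key-2", "root-key"),
]
LEAF = ToyCert("leaf", b"CN=example.org", b"CN=Example CA", "key-9", "key-2")

INDEX = build_index(KNOWN_CAS)
SIGNERS = [ca.name for ca in find_signers(LEAF, INDEX)]
```

With real certificates, `toy_signature_ok` would be swapped for an actual cryptographic check of the child's signature against the candidate's public key; the two-step structure (cheap checksum lookup first, expensive verification second) stays the same.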
I was recently trying to work with the Python package warcio, feeding an S3 object from the Common Crawl bucket directly into it.

    import boto3
    from warcio.archiveiterator import ArchiveIterator

    s3 = boto3.client('s3')
    r = s3.get_object(Key='crawl-data/file....', Bucket='commoncrawl')
    for record in ArchiveIterator(r['Body']):
        pass

However, this fails with the error:

    self.offset = self.fh.tell()
    AttributeError: 'StreamingBody' object has no attribute 'tell'

The reason is that boto3 S3 objects don't support tell. It's easily fixable by creating a tiny class:

    class S3ObjectWithTell:
        def __init__(self, s3object):
            self.s3object = s3object
            self.offset = 0  # number of bytes read so far

        def read(self, amount=None):
            result = self.s3object.read(amount)
            self.offset += len(result)
            return result

        def close(self):
            self.s3object.close()

        def tell(self):
            # warcio only needs the current position in the stream
            return self.offset

You can now use this class and change for record in ArchiveItera
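To see what the wrapper does without touching S3 at all, here is a minimal sketch that wraps a stand-in stream. `FakeStreamingBody` and `demo_offsets` are hypothetical helpers invented for this illustration; the fake only mimics the relevant property of the real body object, namely that it has `read()` and `close()` but no `tell()`.

```python
import io


class FakeStreamingBody:
    """Hypothetical stand-in for a stream with read()/close() but no tell()."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, amount=None):
        return self._buffer.read(amount)

    def close(self):
        self._buffer.close()


class S3ObjectWithTell:
    """Same wrapper as in the post: counts the bytes read so tell() can report them."""

    def __init__(self, s3object):
        self.s3object = s3object
        self.offset = 0

    def read(self, amount=None):
        result = self.s3object.read(amount)
        self.offset += len(result)
        return result

    def close(self):
        self.s3object.close()

    def tell(self):
        return self.offset


def demo_offsets():
    """Read a 10-byte stream in two steps and record tell() after each step."""
    body = S3ObjectWithTell(FakeStreamingBody(b"0123456789"))
    offsets = [body.tell()]
    body.read(4)
    offsets.append(body.tell())
    body.read()  # read the remainder
    offsets.append(body.tell())
    body.close()
    return offsets


print(demo_offsets())  # [0, 4, 10]
```

The wrapper only tracks forward reads, which is enough for a sequential reader; it does not add seeking.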