Posts

Showing posts from January, 2016

Parsing 10TB of Metadata, 26M Domain Names and 1.4M SSL Certs for $10 on AWS

Last May I was working on hobby project similar to this:  https://github.com/zakjan/cert-chain-resolver/  . As I found the cert-chain-resolver project a couple of days later I did nothing with the results, but I got some nice comments on how I used 1 VM to download & process 10TB in a couple of hours on this HN thread recently so I decided to do a write up on the process and publish the data. See the parts below: Part 1: downloading 10TB of metadata in 4 hours Part 2: fetching a ****load of certificates Part 3: playing with the data Total costs My approach was somewhat different from the github project above, instead of using the AIA extension I wanted to brute-force the solution by finding all known intermediate and root certificates in advance. Based on the checksum of the issuer/subject fields I could look up which certificates "claimed" to be the signer of the certificate and then using the signature I could filter out which ones actually were. You can us