Parsing 10TB of Metadata, 26M Domain Names and 1.4M SSL Certs for $10 on AWS
Last May I was working on a hobby project similar to this: https://github.com/zakjan/cert-chain-resolver/. Because I found the cert-chain-resolver project a couple of days later, I did nothing with the results. However, I recently got some nice comments on this HN thread about how I used 1 VM to download and process 10TB in a couple of hours, so I decided to do a write-up of the process and publish the data.

See the parts below:

Part 1: downloading 10TB of metadata in 4 hours
Part 2: fetching a ****load of certificates
Part 3: playing with the data
Total costs

My approach was somewhat different from the GitHub project above: instead of using the AIA extension, I wanted to brute-force the solution by finding all known intermediate and root certificates in advance. Based on the checksum of the issuer/subject fields, I could look up which certificates "claimed" to be the signer of a given certificate, and then use the signature to filter out which ones actually were. You can us...