[TSTIL] newnode.py
[ This is a part of "The Software That I Love", a series of posts about Software that I created or had a small part in ]
2011 - newnode.py
At Utrecht University I had done a dual BSc. degree in Computer Science and Cognitive Artificial Intelligence. Most if it was technical but I had done plenty of courses in the humanities as well. Some psychology, history of philosophy and Arabic. I was also lucky enough to have joined a book club where we read the classics. I made great friends there. One of them invited me to an honours program in Rhetoric in Amsterdam. That class had about 30 students, and Alexander Klöpping was one of them. I remember that he flew to NYC to get the first iPad and then talked about it on national television in "De Wereld Draait Door". He was in a very different world from all the other students and it was really cool to see.
When I started working life I knew I was going to miss the humanities. I registered for an evening program for a BA in English Language & Culture at Leiden University, My first job was with my dad, and it didn't really work out. It was just me and him and I struggled with motivation or even to get out of bed. People in my English classes saw I wasn't happy. Being slightly alcoholic, I joined a gang that went out to a bar every Thursday after class. Annemarie, Maryssa and prof. Liebregts were there most of the time and we had lots of fun.
One of the assignments for class was to write a job application in English. Annemarie wasn't happy in her job either and had met Derek, the CEO at this tech startup at a party of a friend. He was hiring sales people. She decided to write an actual application and got hired. Within a couple of weeks she told me that they were looking for developers too. It sounded cool and she introduced me to Johan, the CTO. After an email exchange I visited the old Mendix office on the Westzeedijk. I didn't realize it at the time, but actually Johan had only taken over the role from Roald a few months before. I think I was one of the first hires he did as CTO. Working at Mendix was an experience for which I will be forever grateful to everyone there, and a special mention to Annemarie for suggesting it to me. I ended up staying close to 7 years in various positions.
Because I was running Ubuntu on my computer, Johan said, "oh Linux, just like the cloud guys. Let's put you in that team." I got hired, but actually I didn't know anything about cloud, servers, or even Linux. I just really didn't like Microsoft and Apple.
We had two "cloud" teams. One was the SysOps team that was responsible for customer hosting & internal IT. The guys there were extremely good system engineers, and had built a rock-solid hosting operation in two colocation data centers in NL. The CEO didn't like it. By 2011 the cloud was the cool new thing and servers in a data center were obviously not the cloud. You know the old saying: If the CEO says Cloud, you just say "which one". They had looked at AWS and some others and had decided on Linode. It was more affordable and their culture fitted well with the hacker ethos of Hans, the senior systems engineer.
As the 3-headed SysOps team was already drowning in work, a new team was formed. This was Arjen and Achiel, and later myself. Once I joined, Arjen went back to Runtime, his job in the cloud team was done. Achiel and I had to build the Cloud, while the SysOps team would be maintaining the serious production systems.
I loved working with Achiel, he was so much fun. We had Mario stickers on the wall and were laughing all the time. We had a running joke that we could only use Dutch software to build the cloud: Python, Mendix and vim. He is a great engineer and now PM at CloudFlare, and he taught me so much. But if we're honest we both knew very little about systems and hosting. We were both just regular developers, he senior, I junior. In the cloud team we had to build productions systems and work with a lot of things out of our comfort zone:
- Hosting Reverse Proxies, JVM based apps, Databases
- Networking protocols: DNS, TCP/IP, HTTP, SSL/TLS, XMPP, Load Balancers, NTP
- Servers: RAM, CPU, hard drives
- Services: Backups, Monitoring, Alerting
- Distributed Systems and operating them 24/7
- Scripting in Bash, Python etc.
It was a lot. I would say that on a 0-10 scale, the job required a skill level of 6 on all these, and I was a 1, maybe 2. Achiel was a bit ahead but we really needed the SysOps guys for a lot of guidance. By now I think I'm at 8 for most of them. Over time everything went wrong so often that I simply had to learn how it worked.
Within a couple of weeks of me joining, Johan took a holiday and Roald started giving us feedback in a sprint review. Johan was always nice, but Roald gave us so much (justified) crap that I will remember that meeting for the rest of my life. It was a heated discussion and having recently joined I only observed. I still got a bit scared. It took me years to make a 180 and now I sometimes wish I was as clear and as passionate about user needs as Roald.
My first task was to automate the creation of new cloud VMs. We had inherited a wiki page with instructions on how to install a virtual machine according to Hans' specifications. We would need at least 100s of VMs, and indeed, a couple of years later we had 6000+. Being a good lazy developer, I wanted to automate this. I created `newnode.py` which would spin up two new VMs on Linode (1 for app, 1 for database) and set up the infrastructure to run one Mendix app. We had a naming scheme for different data centers. Linode had Japanese dishes (appnode-unadon), Rackspace US: Star Wars (appnode-aldebaraan), Rackspace UK: Star Trek (appnode-stargazer).
The script touched all the services & protocols above. The things that could not be properly implemented were using `expect`. I learned about "when using 'expect', expect trouble" the hard way. Despite my lack of knowledge with any of the Ops stuff, and being very new to Python, it was finished within a couple of weeks (or months?). I had this big smirk on my face, because I could do in a couple of minutes what the Ops guys were doing manually in an hour. Now to be fair, the script outputted a lot of information that we had to manually put in our DNS servers and other systems which only the SysOps guys had access to. It still took some years to automate everything end-to-end. Also, setting up a server is easy, maintaining it over time is hard. Nevertheless, it was a great step.
Hans was very much against going to the Cloud in the way that we were doing it. A bit of Not-Invented-Here syndrome maybe, but also because of costs. He could run a more stable operation for about 5x less money because the hardware he set up was tailored to exactly our workloads. It helped that he probably forgot to include his own salary in the calculation, but it was still a lot cheaper ;) After a couple of years we ended up running the same kind of VMs on our own infrastructure in three different Data Centers in NL. It was the financially sensible thing to do. In the US we kept running on cloud VMs because we had no boots on the ground. In both setups we kept using newnode.py or an evolution of it. Hans' vision was always that if you go cloud, you should really go cloud-native instead of running the same old "One VM per app" approach. And until we would have that, we should stick with hardware. In that he was completely right. It would take years before we could get to that next generation infrastructure.
The SysOps and Dev spirits kept clashing a bit over the years, until we decided to join forces and become the DevOps team together. Achiel and Johan had read "The Phoenix project" and were sold. We took the best of both worlds and it was great, and I think everyone on the team is still nostalgic for that special time in our lives. We won the Team Excellence award for it a couple years later.
next: 2011 - Social Gravity
previous: 2010 - Snapp
Comments
Post a Comment