[TSTIL] Mendix Cloud v4 - Part 1 - How it got started

[ This is a part of "The Software That I Love", a series of posts about Software that I created or had a small part in ]

2016 - Mendix Cloud v4 - Part 1 - How it got started

We had been building a big new cloud for a year or two, but we did not have a name. I was the Product Manager for "it". Should we call it the Real Mendix Cloud? The Scalable Mendix Cloud? The Next-Gen Mendix Cloud? Or give it a brand name, like Heroku, HP Helion and IBM Bluemix were doing? After brainstorming with Roald it dawned on me that this was actually our 4th generation cloud offering. Why not simply call it v4? Customers had always been confused about where their apps were hosted. They called it "the Mendix Cloud", the "Achiel Cloud" or the "Hans Cloud". If we now said "you are on v2 (or v3), and you need to go to v4 because of X", that'd be a pretty simple explanation.

I really liked the simplicity of the v4 name and pushed for it. No one had a better alternative so it was done. In the end the name worked well and I was very proud of my find. So how did we get here? A quick recap.

It had been a long-time ambition of Derek and Johan to become a cloud company. We told customers that they could create their software faster, and maintain it better, but deployment and hosting were still as slow as always. So a couple of years before I joined, something happened that started our hosting adventure. We were already running some standard company IT, like an Exchange server and a WordPress website, when a customer was going live with a big Mendix project. They had a hard deadline because of compliance with a new Dutch insurance law. The only problem was that their IT team could only deliver a new hosting server in about a year. The deadline was the end of the month. Not good.

I was not there for any of this, but the story I heard is as follows:

The Mendix guys, creative sales people as always, said "we can do this for you!". They went to the Mendix sysadmin (Hans) and asked if he could also host a Mendix app. "Yes we can!". It was a cool engineering challenge, and so customer hosting started. Within a year or two there were all kinds of apps running on virtual machines on colocated hardware that Hans installed himself in a data center. In the beginning there was no budget for anything, so the servers were really cheap Supermicro machines and the switches were cheap as hell. Disks crashed and at some point the entire SAN was running without any redundancy for a couple of days. Exciting times. We now call this "Mendix Cloud v1".

After some more years there were 3 SysOps engineers with about 100 customer applications, which had been more or less "standardized" to a blueprint. The hardware had been expanded and the team was bigger (Frank and Mark had joined). The old hardware was phased out and VMs moved to new infrastructure. Things were looking good. But then C-level started talking about Cloud, and said that this "hardware thing" was not cloud. Something was brewing.

The requirements for the applications were increasingly "enterprisey". Uptime requirements were very high, VPN access was sometimes mandatory, real-time failovers had to be built. We were building mission-critical stuff. This was "Mendix Cloud v2": mostly standardized hosting, but also some mission-critical systems with customizations. There were also still some non-standardized "v1" services running that slowly had to be migrated to v2.

At this time I joined, as described above in "newnode.py", and Achiel and I started building the "Mendix Cloud". This was when we officially started using that name. Now the main thing about "Mendix Cloud" was to take the hardware blueprint, glue it together in an admin portal (built in Mendix) and let customers manage apps instead of servers. Ironically the Mendix Cloud Portal ran on Cloud v2, which made sense, because we had to be able to start it over SSH if it was down. Of course, with Achiel's and my limited skill set, we should not host any mission-critical stuff yet. So we took on new simple apps, and also started migrating some apps from "v2" to the brand new "Mendix Cloud", retrospectively called "v3". Sometimes it turned out that apps had mission-critical requirements and we had to move them back to v2. Or the other way round. Fun.

We ran the "Mendix Cloud" not on our own hardware, but on cloud providers that were less reliable and much more expensive than the "v2" setup. It was officially "cloud" though, because we did not run the servers ourselves, and the costs were OpEx instead of CapEx. We were at a crossroads. Cloud v3 was growing big and expensive, and was not capable of serving mission-critical stuff. Hans said if we go cloud, we should go really cloud. That basically meant a 12-factor platform and architecture. Now we did not have a 12-factor platform, or even a runtime that could run on it, so we would have to build a lot. That was for now a no-go. Instead we decided to move v3 to our own infrastructure, which was more stable, cheaper and able to support mission-critical apps. We went all in on more expensive hardware (HP ProLiant DL380, NetApp SANs and Cisco switches) and built about 5 full racks worth of equipment in 3 data centers in NL. One by one we moved apps off of Rackspace and Linode and onto our own server environments.

As a medior engineer I did not have much say in these strategic choices, but by this time we started getting the first Product Managers in the organization. Andrej was the first Product Manager, and after that I immediately applied. Everyone was very much like "wtf I thought you were a nerd that only likes programming". I was told to reconsider and come back in a week. A week later I came back without a changed mind. For some stupid reason I was really determined to do something other than programming. I wanted to have a say in the strategic direction of the product. Johan said he would have to think about it, as Product Managers are typically kind of senior and I was 26 or so. He was 3 years older and had been CTO since he was 25, so he always made a point that age did not matter. He managed to convince some people and it worked out: I switched to PM without any additional pay, training or clear responsibilities. Fun times!

I knew Johan and Derek wanted to go to the "real cloud", go global and support massive scale. We had sort of bet on Cloud Foundry already, so we started experimenting with it to see if we could run a Cloud Foundry cluster ourselves. I don't recall that that choice was ever explicitly made, so it probably was made higher up. As PM I was not sure what my responsibilities were, but I assumed my "mission from God" was to get us onto a real cloud using Cloud Foundry. Daniel, Xiwen, Frank and Riccardo started building and playing around with Cloud Foundry and AWS a lot. Once we had some apps we figured out how to build service brokers. Now we knew what we had to build, but it would be a lot.

As PM I was wearing an insane number of hats, and I loved it. I was doing quite a bit of cloud v4 technical architecture with the team, customer service escalations, joining CSMs on business review calls, managing v1 to v2 to v3 migrations (badly), convincing customers to not build new VPNs, thinking of a pricing strategy for cloud v4, figuring out a staged launch plan and finally creating a migration plan from v3 to v4. All this while v3 was still growing about 50% to 100% year over year and infrastructure was running into new limits all the time. We also needed marketing materials and analyst demos. I had to manage a lot of customer escalations and I did a lot of on-call 24/7 support. I was not supposed to be doing on-call duty, but I had the skills, could use the extra cash and didn't want to burden the team, which was already stretched very thin. My to-do list was overflowing, I was in over my head, it was pretty stressful and rough, but I'm happy to say that we managed in the end.

Jump to part 2 where we look at how we launched Mendix Cloud v4, or continue reading "The Software that I love" chronologically using the next link below.

