DevEx vs. Platform Engineering

# Platform Engineering Vs. DevEx

A trend that even mainstream companies are opening their eyes to is *Platform Engineering* and *Developer Experience* (DevEx). At [DevOpsDays Aarhus](https://devopsdays.org/events/2022-aarhus/welcome/), [Justin Reock](https://twitter.com/jreock) introduced us to the concept of Developer Productivity Engineering. The basic premise is, of course: what if we apply engineering principles to developer productivity? As Justin Reock represents Gradle, a lot of his talk was on lowering build times, but I still appreciate the overarching theme. This resonated well with the crowd, and tons of interesting discussions were had.

I surmise that many orgs have platform teams or are establishing them. However, as we have seen with agile, DevOps and SRE, I fear that the success stories and excellent practices and principles will be perversely transformed by well-meaning enterprises that fail to apply radical change to their systems. This will of course be fuelled by not quite so well-meaning snake oil peddlers.

This post does not try to define anything, or be the definitive truth about anything, but hopes to offer another lens through which to see this.

The premise is, fundamentally, that a primary constraint is *tech talent*, or software engineers. This means that we want to create a platform for our teams to become more efficient.

![[Pasted image 20220528223150.png]]

Of course, at scale there is also the goal of scaling superlinearly with the number of engineers added, such that each engineer becomes more productive per additional engineer rather than less. Figure 5.1 from Accelerate, shown below, illustrates this dream.

![[Pasted image 20220528223436.png]]

## You Build It, You Run It

*You build it, you run it* comes from Amazon and to me is the principle of having teams own their code in production, meaning you own how the software you build performs in production, both in technical and in business terms.
Unfortunately, many orgs take this as an opportunity to push more responsibility onto software teams, without bundling ownership, autonomy and mandate with it. But I digress.

The point is that we have to create some software, and that software needs to run somewhere. If we take our inspiration from Docker, of container fame, we need to Build, Ship and Run our software. This means that the platform we want to support our engineering teams with can be decomposed as in the following figure:

![[Pasted image 20220528223944.png]]

As code moves from the developer towards the users, it moves from left to right. There are many possible ways this pie can be cut between different teams and themes, so first off I will suggest some activities that could happen in each area.

### Build

Here is where code and tests are written. We add and remove dependencies. We could run a linter. We interact with our IDE. We interact with version control. We explore and learn. We design APIs, we implement. This is the primary place of creation. This also means that providing tooling and processes for the above is the primary way to reduce friction during *Build*.

### Ship

This is where code is verified to not just *work on my machine*. Reviews might occur. It could also be that larger test suites are run. The unit tests are run on central machines. Dependencies could be scanned for vulnerabilities or license violations. We may deploy to an environment and run automated tests on it. Often, this is referred to as Continuous Integration or Continuous Delivery. Whether these names are misnomers is left as an exercise for the reader. I think, at least, it is a useful intuition to have.

This means the primary goal here could be working on software delivery pipelines. Managing version control systems and processes to enable developer velocity is also a fun topic here.
Just think about the many discussions for and against controversial topics such as pull requests, or whether monorepos are just a fad again.

### Run

Running the code in production should be the end goal, because how else is it supposed to provide value to end users? Activities here may include the final delivery to production environments, though one could also argue that is part of shipping. I am not zealous about any distinction here. Making sure the software is behaving as it should is the primary concern. Often teams forget there is more to this question than the binary *Running* or *Stopped*. We must also ask questions such as: what progress is the system making, and is it in the right direction? Do we have any current incidents or degradations?

Here is often where we see people introduce Kubernetes and claim they have built a platform, while in reality they have created a cluster and put a bunch of cognitive load on the software teams trying to build-ship-run their way to success. Sometimes, orgs may decide that because they are in the cloud they already have all the platform they need for running their software. Clever organizations will build domain-specific abstractions on top of public cloud providers to accelerate software teams. It is of course also possible to provide a platform experience on-prem, as some large enterprises are doing successfully. However, claiming "we're on-prem so we can't..." should always be a red flag.

In summary, there are different activities in each of the themes, and the end goal is to reduce friction for software teams in putting value in the hands of their users. So let's look at a few ways to cut the Platform cake. Please note that this post is a thinking tool for me, and many nuances are lost.

## Model 1 - The DevOps Team

I call this model "The DevOps team" partly in jest.
![[Pasted image 20220528230331.png]]

In my experience, a very common scenario is that someone ends up needing to maintain the Jenkins servers, perhaps even a Jira instance should they be so unlucky. As these teams become more and more critical infrastructure, and bored, they want to move their stack to Kubernetes, because that is interesting tech. They do so, and as such move from owning just Ship to also owning Run. This may be a transitional period where there is no platform where software is running, but then some clever engineer figures out that it is much easier to run production code on the DevOps team's existing Kubernetes cluster than on whatever the official alternative is. And so suddenly, accidentally even, we have something that has grown into critical production infrastructure without anyone noticing.

Common traps in this setup are not having designated responsibilities and a heavy dose of NIH (not-invented-here) syndrome. Often this will not be a metric-driven approach, but rather a gut-feeling-driven one. This will lead to an underinvestment in developer experience all around, and to frustrated developers. Likely, we end up building our stuff at the wrong abstraction level. On the other hand, we may be moving towards a more modern setup.

## Model 2 - Tale of Two Silos

For some larger endeavours, organizations end up having a DevEx team and a Platform team. One thing that is certain about this model is that there will be a very distinct line drawn between the teams, separating the specific responsibilities of each. This means that development teams will find themselves frustrated as they bounce Jira tickets between teams, trying to figure out how their software fails in new and spectacular ways.

![[Pasted image 20220528230938.png]]

We have a clear distinction: DevEx improves the experience of developing the services, and the Platform team improves the experience of operating them. Somewhere during CI/CD there is a handoff, and the ownership switches from the DevEx to the Platform team.
This line could be that the Platform team owns the Jenkins servers, but DevEx owns the pipelines and libraries. This model trivially expands to *A Tale of Three Silos*.

![[Pasted image 20220528231338.png]]

Problems here stem from each of the individual teams caring most about solving the interesting problems in their own domain, leading to suboptimizations rather than an overall improvement in developer productivity.

## Model 3 - Naive Idealism

Ideally, every team contributes its specializations to all the different themes.

![[Pasted image 20220528231652.png]]

A platform team may provide a library that makes it trivial to hook into the observability stack, along with the tools to explore any data provided. The "DevOps" team creates plugins for the IDE, such that common operations are trivial and feedback can be shown quickly and inline in the editor. The DevEx team measures build times locally on developer machines, and makes sure any wins propagate through the stack. Everyone is cooperating towards high developer productivity.

## Model 4 - Naive Realism

I do believe there is something to focusing on a platform and focusing on the developer experience as two different but synergistic themes. But how do we do it? I think it is a mess. Hopefully a beautiful mess.

![[Pasted image 20220528232201.png]]

Probably, we have at least two teams: one for what goes on on developer laptops, and one for what goes on on servers. As an organization scales, each will be decomposed into subthemes. I do not know what the right distinction is, but it is naive to believe that you can isolate yourself at one end of the spectrum. If you are delivering a platform, make sure you are also delivering the handles to allow engineers to work effectively with your platform.
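
To make the idea of measuring build times on developer machines a little more concrete, here is a minimal sketch of what such a measurement could look like. It is only an illustration, not a recommendation of any particular tool: the function names `time_command` and `summarize` are made up for this post, and the build command is a placeholder that you would swap for your real Gradle, Make or similar invocation.

```python
import statistics
import subprocess
import sys
import time


def time_command(command, runs=3):
    """Run `command` `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises on a failing build, so failures are never counted as fast builds.
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce raw durations to a few numbers a team could track over time."""
    return {
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Placeholder build command (an assumption): replace with your actual build invocation.
    build_command = [sys.executable, "-c", "pass"]
    print(summarize(time_command(build_command)))
```

The point of the sketch is the habit rather than the script: a number collected where developers actually work gives the metric-driven footing that gut feeling does not.
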
In summary, my key points are:

- It is worth investing in developer productivity
- There are so many things that can go wrong
- Make sure you build at the right level of abstraction
- Do not work from the premise that what you are building is valuable regardless of whether developers experience friction
- If something is annoying for your users to do, do it for them until you have changed it to be not annoying.