Cloud lock-in feels like something the market does to you. Mostly, it’s a series of small architecture defaults you accepted without weighing the trade-offs.
Two of them do most of the damage: too much cloud-specific infrastructure as code (IaC), and too many cloud-specific managed services. The providers steer you there by default, the tutorials assume it, and the path of least resistance quietly becomes the path of permanent residence.
Both are choices. Both are fixable, provided you decide they matter early, while fixing them is still cheap.
Problem 1: Cloud-specific IaC
Every cloud ships its own native IaC and its own native CI/CD, and each one is genuinely pleasant to use right up until the day you want to leave. On the IaC side that means CloudFormation on AWS, ARM/Bicep on Azure, and Deployment Manager on Google Cloud. On the pipeline side it means GitHub Actions workflows wired to one cloud’s identity model, Azure Pipelines tasks, and CodePipeline/CodeBuild.
The pipeline layer is where this gets worse, because pipeline code is sneaky. People think of it as glue rather than infrastructure, so it doesn’t get the same scrutiny, and it metastasizes. My last client had approximately 80% of their automation living in Azure Pipelines: not just deployment steps, but environment logic, secret handling, approval gates, the works. None of that is portable. The day they want a second cloud, or a credible negotiating position with their first one, that 80% is a rewrite, not a migration. That’s nuts, and it’s also completely normal, which is the problem.
The fix is boring and it works: write as much of your IaC as possible in Terraform (or OpenTofu) and Ansible, and treat anything cloud-specific as a liability you have to justify rather than a default you reach for.
- Terraform owns the infrastructure: networks, compute, IAM, storage, the lot. One language, one state model, one mental model across every provider.
- Ansible owns configuration and the imperative bits Terraform is bad at.
- CI/CD becomes a thin orchestration layer whose only job is to call Terraform and Ansible. The actual logic lives in version-controlled scripts and modules, not in proprietary pipeline YAML. If your pipeline definition is more than a few steps long, you’re putting business logic in the one place you can’t take with you.
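Here is a rough sketch of what that thin layer can look like: one version-controlled script that the pipeline calls and nothing else. The directory name infra, the inventory path, the playbook name, and the function names are all hypothetical, so adjust them to your repository. The Terraform and Ansible flags used are the standard non-interactive ones, and the script defaults to a dry run that only prints the commands.

```python
"""Thin deploy entry point: the CI/CD system calls this script and nothing else."""
import subprocess
import sys


def build_steps(infra_dir, inventory, playbook, tf_bin="terraform"):
    """Return the commands to run, in order.

    All paths are hypothetical. Pass tf_bin="tofu" if you use OpenTofu.
    """
    tf = [tf_bin, f"-chdir={infra_dir}"]
    return [
        tf + ["init", "-input=false"],
        tf + ["plan", "-input=false", "-out=tfplan"],
        tf + ["apply", "-input=false", "tfplan"],  # applies the saved plan
        ["ansible-playbook", "-i", inventory, playbook],
    ]


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first failing step


if __name__ == "__main__":
    steps = build_steps("infra", "inventories/prod.ini", "site.yml")
    run(steps, dry_run="--apply" not in sys.argv)
```

With something like this in the repository, the pipeline definition in any CI system shrinks to a single step that runs the script with --apply. Moving to a different CI product then means rewriting that one step, not the deployment logic.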
Now the honest caveat, because anyone who’s actually done this will call me on it otherwise: Terraform is not a magic portability button. The HCL, the workflow, the state management, the module patterns, your team’s muscle memory all transfer. But a Terraform aws_* resource is still an AWS resource. The provider blocks are cloud-specific by definition, and you don’t get to write resource "generic_database" and have it land on three clouds. What you get is a single tool and a single workflow wrapping every provider, so a migration becomes “rewrite the resource definitions” instead of “rewrite the resource definitions and relearn the entire tooling stack and port all the pipeline logic.” That’s a dramatically smaller blast radius, and it’s the difference between a switch you can credibly threaten and one you can’t.
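One way to make that blast radius concrete is to count how many resource blocks belong to each provider. The sketch below is a rough regex scan, not a real HCL parser, and it assumes the usual convention that a resource type starts with its provider name followed by an underscore. The sample configuration and the function name are made up for illustration and trimmed well below what a working config would need.

```python
import re
from collections import Counter

# Matches HCL block headers such as: resource "aws_instance" "web" {
# Group 1 captures the provider prefix (the text before the first underscore).
RESOURCE_RE = re.compile(
    r'^\s*resource\s+"([a-z0-9]+)_[a-z0-9_]+"\s+"[^"]+"',
    re.MULTILINE,
)


def resources_by_provider(hcl_text):
    """Count resource blocks by provider prefix. Regex-based, so approximate."""
    return Counter(RESOURCE_RE.findall(hcl_text))


# Hypothetical, heavily trimmed configuration used only to exercise the scan.
SAMPLE = '''
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_db_instance" "orders" {
  engine = "postgres"
}

resource "helm_release" "broker" {
  name  = "broker"
  chart = "rabbitmq"
}
'''

counts = resources_by_provider(SAMPLE)
for provider, n in counts.most_common():
    print(f"{provider}: {n} resource block(s)")
```

In this sample the two aws blocks are what a migration would force you to rewrite, while the helm block is the kind of definition that can come along largely unchanged. Run against a real repository, a tally like this gives you a crude but honest number for how much of your Terraform is tied to one provider.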
Problem 2: Cloud-specific managed services
This is the deeper hook, and it’s deeper precisely because the services are good. Aurora, DynamoDB, Cosmos DB, BigQuery, SQS, the whole managed-everything catalog: these are genuinely excellent, and they remove real operational pain. That’s exactly what makes them the stickiest form of lock-in. You don’t notice the dependency forming, because every individual decision to use one was the sensible one.
The portable alternative is to run open-source services yourself and let your IaC manage them: PostgreSQL instead of a proprietary managed database, an open message broker instead of the cloud-native queue. Package them as Helm charts, manage those charts with Terraform (through its Helm provider), and the definition of your stateful services becomes close to provider-agnostic. Kubernetes ends up as the portability substrate: the cluster looks roughly the same whether the nodes underneath it are EC2, Azure VMs, or GCE, so the workload definitions ride along to wherever you point them.
And here’s where I have to argue against my own enthusiasm, because “just self-host Postgres on Kubernetes” is the kind of advice that sounds clean on social media and bites you at 3 a.m.
- Managed services exist for a reason. When you self-host PostgreSQL, you now own backups, failover, point-in-time recovery, version upgrades, patching, and the pager. The cloud was charging you for that, and a lot of teams genuinely come out ahead paying the premium rather than staffing the expertise. Portability is a benefit with a recurring operational cost attached, and you should price both sides before deciding.
- Kubernetes is overkill for plenty of projects. If you’re running a handful of services, standing up a cluster to win portability you’ll never exercise is a bad trade. Terraform as a tool and workflow stays portable either way, which is the real point, so you can get a lot of the benefit without committing to K8s as your runtime for everything.
- Stateful workloads on Kubernetes are their own discipline. Operators have made this far more reasonable than it was five years ago, but a database on K8s is not a fire-and-forget proposition. Go in with eyes open.
So this isn’t “never touch a managed service.” It’s “decide deliberately.” Use a proprietary managed service when the operational savings clearly beat the lock-in cost; just make that trade knowingly, and write it down, rather than discovering it the day you try to leave.
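Pricing both sides can be as simple as a few lines of arithmetic kept next to the architecture docs. The sketch below is one hypothetical way to do it. The class, its field names, and every number in the example are invented for illustration. It also deliberately charges the full exit cost to the managed option, which is the pessimistic assumption that you will eventually leave.

```python
from dataclasses import dataclass


@dataclass
class ServiceDecision:
    """A written record of one managed-vs-self-hosted trade. All inputs are estimates."""
    component: str
    managed_annual_cost: float      # what the provider charges per year
    self_hosted_annual_cost: float  # infra plus staff time: backups, upgrades, on-call
    exit_cost: float                # one-off estimate to migrate off the managed service
    horizon_years: float            # how long you expect to run this component

    def managed_total(self):
        """Managed cost over the horizon, with the exit priced in up front."""
        return self.managed_annual_cost * self.horizon_years + self.exit_cost

    def self_hosted_total(self):
        return self.self_hosted_annual_cost * self.horizon_years

    def recommendation(self):
        if self.managed_total() <= self.self_hosted_total():
            return "managed"
        return "self-host"

    def record(self):
        """One line you can paste into a decision log."""
        return (
            f"{self.component}: {self.recommendation()} "
            f"(managed {self.managed_total():,.0f} vs "
            f"self-hosted {self.self_hosted_total():,.0f} "
            f"over {self.horizon_years:g} years, exit cost included)"
        )


# Hypothetical numbers, for illustration only.
orders_db = ServiceDecision(
    component="orders database",
    managed_annual_cost=30_000,
    self_hosted_annual_cost=55_000,
    exit_cost=60_000,
    horizon_years=3,
)
print(orders_db.record())
```

With these made-up inputs the managed service still wins even after the exit is priced in. Double the exit estimate and the answer flips. The model is crude, but it forces the exit cost into the conversation at the moment the decision is made.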
How to actually decide
The useful question isn’t “cloud-specific or portable?” in the abstract. It’s: for this specific component, what does the exit cost, and is the convenience worth that price?
A rough hierarchy that’s served me well:
- Default to portable for the foundation: IaC tooling, CI/CD logic, networking and identity patterns, your stateful core. This is the stuff that’s expensive to unwind later and cheap to get right now.
- Allow cloud-specific where the value is genuinely differentiated and the alternative would be a heroic amount of undifferentiated heavy lifting. Some managed services really are better than anything you’d run yourself, and dogma here just costs you money.
- Refuse to let pipeline code become load-bearing. This is the cheapest win on the list and the one people skip. Keep the proprietary layer thin.
The strategic reason this is worth the discipline ties straight back to the competition story. When a recent report called Canada’s cloud market “broken”, its sharpest recommendation wasn’t about breaking up the incumbents; it was about forcing compatibility so customers can actually switch. Regulators are circling egress fees and data portability for the same reason: lock-in gives hyperscalers pricing power over customers who can’t credibly leave. But you don’t have to wait for a regulator to hand you leverage. An architecture that could move is a negotiating position whether or not you ever pull the trigger, and most of the time you won’t need to, because the provider knows you could. The cheapest exit is the one you designed in from day one. The most expensive one is the rewrite you start the morning you finally decide you’ve had enough.
The goal was never to leave. It’s to stay because you decided to, not because you had no other option. That choice gets made early, in the boring decisions, or it doesn’t get made at all.