Cloud cost optimization, the execution layer of FinOps, is the work of cutting what you waste, sizing what you keep, and understanding what each unit of your business costs to run. It matters because most cloud bills carry a large share of waste and arrive without explaining what drove them. Argus Root runs that discipline continuously, in the order the levers actually pay back, and stays honest about the cases where optimising the cloud is no longer the cheapest path.
In short
- The levers have an order of payback: rightsize and clear idle resources first (payback in days, and 10–20% on its own), then commit. The rule is rightsize first, commit second, so you never lock in waste.
- Commitments do the heavy lifting on rate: Reserved Instances and Savings Plans cut 30–72% off on-demand for steady workloads, with a coverage target around 70–80%.
- Spot and preemptible capacity saves 70–90% for fault-tolerant work — batch, CI, stateless and ML with checkpointing — in exchange for interruption.
- Scheduling non-production to stop nights and weekends takes up to 80% off the dev and test layer, one of the cleanest quick wins there is.
- All three layers together — visibility, waste removal and commitments — deliver a 30–50% total reduction, against an industry average of roughly 32% waste.
Cloud cost optimization, from killing waste to cost per customer.
Most cloud bills carry a third in waste, and the monthly invoice rarely tells you what it costs to serve one customer. We run the FinOps discipline that removes the waste, rightsizes what is left, and turns spend into unit economics, and we will tell you when the cheaper answer is to leave the cloud.
Most cloud bills carry a third in waste.
A typical FinOps programme recovers 30 to 40% of spend by removing idle and oversized resources, rightsizing what remains and buying the right commitments. The harder shift in 2026 is moving past the total bill to unit economics (the cost per customer, per transaction, per inference), because AI and GPU workloads scale unpredictably and a runaway job can spend a month's budget before the invoice arrives. Real-time anomaly detection catches that early. None of it works without tagging: what is not attributed cannot be optimised.
| Lever | What it targets | When it is the move |
|---|---|---|
| Kill waste | Idle, orphaned and oversized resources | First, for the fastest money |
| Rightsizing | Capacity matched to real load | Continuously, not once a quarter |
| Commitments | Savings plans and spot capacity | Once usage is predictable |
| Anomaly detection | A runaway cost before month-end | In real time, not on the invoice |
| Unit economics | Cost per customer or transaction | When you need margin, not a lower bill |
| Repatriation | Steady-state and AI workloads | When optimising the cloud is not enough |
The waste is the starting point because it is large and well measured. The FinOps Foundation's 2026 survey of more than a thousand organisations puts cloud waste at 32 to 40% of spend in companies without a cost discipline, falling to 15 to 20% in mature ones. The gap between those two figures is the prize, and it is why a serious programme commonly reduces a monthly bill by 25 to 30%, with the first cuts landing within weeks rather than quarters. The money is there; what is usually missing is anyone whose job it is to go and get it.
The savings come in a predictable order. A quick-win sweep (shutting down idle non-production environments, deleting unattached volumes and orphaned snapshots, rightsizing the most over-provisioned instances) typically recovers 15 to 20% within the first month. Then the structural levers: rightsizing to real demand returns another 15 to 25%, scheduling non-production hours 10 to 20%, and committing the predictable base to reserved capacity or savings plans cuts 30 to 72% off on-demand pricing for that portion. None of it is a single dramatic optimisation; it is a stack of disciplined ones, which is why it has to run continuously rather than once.
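Because each lever acts on what the previous one left behind, the percentages do not simply add up. The following is a minimal sketch of that arithmetic, assuming every saving is expressed as a share of the bill remaining at that step; the function name `stacked_reduction` and the figures are illustrative, not measurements from any engagement.

```python
def stacked_reduction(bill: float, steps: list[tuple[str, float]]) -> tuple[float, float]:
    """Apply each saving as a fraction of what remains after the previous step.

    Returns (final_bill, total_reduction_fraction).
    """
    remaining = bill
    for name, fraction in steps:
        if not 0.0 <= fraction < 1.0:
            raise ValueError(f"{name}: fraction must be in [0, 1)")
        remaining *= 1.0 - fraction
    return remaining, 1.0 - remaining / bill


# Illustrative only: three levers, each a share of the bill that remains.
final_bill, total = stacked_reduction(
    100_000.0,
    [("quick wins", 0.15), ("rightsizing", 0.15), ("commitments", 0.10)],
)
print(f"final bill: {final_bill:,.0f}  total reduction: {total:.1%}")
```

With these example inputs the total lands at roughly 35%, not the 40% a naive sum would suggest, which is one reason a reported saving should always be stated against the original baseline.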
Sometimes the cheapest fix is to leave.
Most FinOps providers optimise the bill and stop there, because moving a workload is not a lever they can pull. We will tell you when the structural answer is to move it. For steady-state systems and high-volume AI inference, the economics often flip: past a certain scale, self-hosting open-weight models or repatriating to EU-resident infrastructure beats any discount the hyperscaler will offer.
That is a recommendation a pure-cloud optimiser has no reason to make, and one we can both make and carry out. The move runs through our cloud migration work, the destination is our managed cloud, and for the AI side the self-hosting sits with our production AI on open-weight models in the EU.
AI sharpens the point. GPU and inference costs scale unpredictably, often with 30 to 50% of GPU capacity over-provisioned, and a single poor reservation can double a bill in a week; IDC warns that organisations underestimating AI infrastructure face compounding budget risk as agentic workloads grow. There is a measurable crossover, too: past roughly a hundred million tokens a month, self-hosting an open-weight model usually wins on unit economics over paying per call. Knowing where that line sits for your workloads, and being able to act on it, is the difference between optimising an AI bill and structurally fixing it.
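The crossover is a break-even calculation: a fixed monthly cost for self-hosting set against a per-token price for the API. The sketch below assumes a flat API price and a flat fixed cost, both hypothetical; `breakeven_tokens_per_month` and the prices in the example are placeholders chosen to illustrate the shape of the sum, not quotes from any provider.

```python
def breakeven_tokens_per_month(
    api_price_per_million: float,
    selfhost_fixed_monthly: float,
    selfhost_price_per_million: float = 0.0,
) -> float:
    """Monthly token volume above which self-hosting is cheaper than paying per call."""
    margin = api_price_per_million - selfhost_price_per_million
    if margin <= 0:
        return float("inf")  # self-hosting never catches up on marginal cost
    return selfhost_fixed_monthly / margin * 1_000_000


# Hypothetical prices, not quotes: 10.00 per million tokens via the API,
# 900.00 a month fixed for a GPU server, 1.00 per million in power and operations.
tokens = breakeven_tokens_per_month(10.0, 900.0, 1.0)
print(f"break-even at about {tokens / 1e6:.0f} million tokens a month")
```

The real line moves with model size, utilisation and staffing cost, so the inputs have to come from your own workloads rather than from a rule of thumb.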
What do we run?
A continuous practice, with cost visible where decisions are made rather than in a month-end spreadsheet.
Cost visibility & tagging
A reliable cost spine across your accounts, with tagging that attributes every euro to a team, product or customer, because untagged spend cannot be cut with confidence.
Waste elimination
Idle instances, orphaned disks, forgotten environments and oversized resources found and removed, which is where the first and fastest savings live.
Rightsizing
Capacity matched to real demand on a continuous basis, so you stop paying for headroom that last year's load needed and this year's does not.
Commitment planning
Savings plans and reserved capacity for the predictable base, spot capacity for the fault-tolerant work, sized so a commitment saves money rather than locking in waste.
Anomaly detection
Spend watched in real time against your normal pattern, so a runaway script or a model change is flagged in minutes rather than discovered at month-end.
Unit economics & AI cost
Spend mapped to cost per customer, transaction and inference, so the conversation is about margin and value per euro rather than a single bill total.
We run our own infrastructure on thin margins.
We operate a portfolio of brands on infrastructure we own, which only works because we keep the cost per unit low. Cost discipline is how we survive rather than a framework we read about, and it is the same discipline we bring to your bill. We will not promise a fixed percentage before seeing your environment, because an honest FinOps engagement reports what it recovered against a baseline rather than a number from a sales deck.
    # 1. usage optimisation first: never commit to waste
    idle:
      stop_after: 7d
    rightsize:
      cpu_target: 60%
      mem_headroom: 25%
    schedule:
      non_prod: stop_nights_weekends   # up to 80% off dev/test

    # 2. rate optimisation second: on the steady baseline only
    commitments:
      coverage_target: 80%             # RI / Savings Plans, 30-72% off
      keep_on_demand: bursty
    spot:
      use_for: [batch, ci, stateless]  # 70-90% off, interruptible

    # 3. govern it so it does not drift back
    anomaly:
      alert_if: daily_spend > 1.3x_7d_avg
    unit_economics:
      track: cost_per_customer         # not cost per server
Inform, optimise, operate.
FinOps runs in three phases, and skipping the first is why most cost efforts fade. Inform is establishing visibility: tagging, cost allocation and dashboards that attribute spend to the teams, products and customers that drove it, because a bill nobody can break down is a bill nobody can argue with. Optimise is where the action happens, removing waste, rightsizing, buying the right commitments and lowering the unit cost of each thing the business delivers. Operate is making it stick, the cadence and the ownership that stop the savings drifting back the moment attention moves on.
The discipline matures in stages, from a first crawl of basic visibility to a run of automated, continuous optimisation. We meet you where you are and move you up: a company with no tagging and a mystery bill needs the inform phase first, while one already attributing spend is ready to push on unit economics and commitment strategy. The phases are not a one-time project but a loop that keeps turning, because cloud spend grows and drifts continuously and a cost discipline that runs once is overtaken within a quarter.
You cannot cut what you cannot attribute.
A cloud bill arrives as one large number, and without attribution that is all it ever is. Tagging is what turns it into something you can question: every resource labelled with the team, product, environment and customer it belongs to, so a rise in the total can be traced to the thing that caused it rather than argued about in the abstract. Untagged spend cannot be cut with confidence, because no one can say what cutting it would break.
We enforce tagging at the point resources are created, through policy rather than good intentions, since tags added later are never complete and a convention nobody enforces decays within weeks. The aim is the high tag-compliance that makes every other part of the work possible: resources that do not meet the tagging rules are not provisioned in the first place. With that cost spine in place, the spend maps cleanly to the business, and a question about the bill has an answer instead of a shrug.
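As a rough illustration of enforcement at creation time, the sketch below rejects a resource whose tags are incomplete. It is plain Python rather than any provider's policy engine; the names `REQUIRED_TAGS`, `missing_tags` and `admit` are hypothetical, and the tag keys follow the team, product, environment and customer labels described above.

```python
REQUIRED_TAGS = ("team", "product", "environment", "customer")


def missing_tags(tags: dict[str, str], required: tuple[str, ...] = REQUIRED_TAGS) -> list[str]:
    """Return the required tag keys that are absent or blank."""
    return [key for key in required if not tags.get(key, "").strip()]


def admit(resource_name: str, tags: dict[str, str]) -> None:
    """Refuse to provision a resource that does not meet the tagging rules."""
    missing = missing_tags(tags)
    if missing:
        raise ValueError(f"{resource_name}: missing required tags {missing}")


# A complete tag set passes silently; an incomplete one is rejected before it costs anything.
admit("api-server", {"team": "payments", "product": "checkout",
                     "environment": "prod", "customer": "shared"})
try:
    admit("scratch-vm", {"team": "payments"})
except ValueError as err:
    print(err)
```

In practice the same rule would live in whatever policy mechanism your provider or deployment pipeline offers; the point is that the check runs before provisioning, not after the invoice.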
Which levers pay back first?
Cost work pays best in a particular sequence, and pulling the levers out of order wastes effort. Waste elimination comes first because it is pure gain with no downside: idle instances, orphaned disks, forgotten environments and oversized resources cost money and serve no one, and removing them recovers the fast 15 to 20% that builds momentum and funds the rest. Nothing about the business changes; the bill simply stops paying for things nobody uses.
Then rightsizing matches capacity to real demand, and only after that do commitments make sense, because committing to reserved capacity before you have removed the waste locks in the waste at a discount. Architectural changes, the deeper and slower wins, come last, weighed against their cost to implement. Running the levers in this order means each one funds the next and none of them commits you to a mistake, which is the difference between a programme that compounds and a one-off cleanup that erodes.
Commitments, without locking in waste.
Reserved capacity and savings plans are the largest single lever, cutting 30 to 72% off on-demand pricing, and fewer than half of organisations use them at all. The caution is justified: a commitment is a bet on future usage, and committing to capacity you do not need locks in waste for one to three years rather than removing it. The discipline is to commit only the predictable base, the always-on workloads whose floor is known, and to leave the variable layer flexible.
So we size commitments to the steady base, use on-demand for the bursts and deployments, and put fault-tolerant work (batch jobs, training, anything that can restart) on spot capacity that saves 70 to 90% in exchange for the risk of interruption. The mix follows the shape of the workload rather than a blanket policy, and it is revisited as that shape changes. A commitment portfolio that was right last year quietly becomes waste as usage shifts, which is why this is a standing review rather than an annual purchase.
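Sizing to the steady base can be checked with two ratios: coverage (how much usage runs at the committed rate) and utilisation (how much of the commitment is actually used). The sketch below assumes a flat hourly commitment and a list of observed hourly usage in the same unit; `commitment_summary`, `steady_base` and the sample data are hypothetical.

```python
def commitment_summary(hourly_usage: list[float], commit: float) -> dict[str, float]:
    """Coverage and utilisation of a flat hourly commitment against observed usage."""
    total = sum(hourly_usage)
    covered = sum(min(used, commit) for used in hourly_usage)
    committed = commit * len(hourly_usage)
    return {
        # share of all usage billed at the committed rate
        "coverage": covered / total if total else 0.0,
        # share of the commitment that was actually consumed
        "utilisation": covered / committed if committed else 0.0,
    }


def steady_base(hourly_usage: list[float]) -> float:
    """The usage floor: the lowest hourly level observed in the window."""
    return min(hourly_usage)


usage = [10, 10, 12, 20, 10, 14, 10, 10]          # illustrative hourly usage
print(commitment_summary(usage, steady_base(usage)))  # commit to the floor
print(commitment_summary(usage, 14))                  # commit above the floor
```

Committing above the floor raises coverage but drops utilisation below 100%, which is the paid-for, unused capacity the paragraph above warns about.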
How often should you rightsize?
Most resources are provisioned for a peak that rarely arrives and then never revisited, so they sit half-empty and fully billed. Rightsizing matches each resource to its real demand, and done once it recovers a useful 15 to 25%, but done once it also decays: demand moves, workloads change, and last year's right size is this year's over-provision. The saving is real only if the matching keeps happening as the load evolves.
We run rightsizing continuously rather than as a quarterly cleanup, watching actual utilisation and adjusting capacity to it, scaling down the over-provisioned and up the genuinely constrained. The aim is an estate that tracks its real demand rather than the guess made when each resource was created, so you stop paying for headroom that a past load needed and a current one does not. It is unglamorous and constant, which is exactly why it is the kind of work that gets dropped without someone owning it.
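One way to turn observed utilisation into a size is to ask what capacity would put peak load at a target utilisation. The sketch below assumes CPU utilisation samples as fractions of current capacity, uses the 95th percentile as the peak, and borrows the 60% CPU target from the example policy on this page; `percentile` and `recommended_vcpus` are hypothetical helpers, and memory headroom is left out for brevity.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommended_vcpus(cpu_util: list[float], current_vcpus: int, target: float = 0.60) -> int:
    """vCPU count that would put 95th-percentile load at the target utilisation."""
    peak_used = percentile(cpu_util, 95) * current_vcpus  # vCPUs busy at p95
    return max(1, math.ceil(peak_used / target))


# Illustrative samples from a 16-vCPU instance that rarely passes 22% CPU.
samples = [0.10, 0.12, 0.15, 0.11, 0.20, 0.18, 0.14, 0.13, 0.16, 0.22]
print(recommended_vcpus(samples, current_vcpus=16))
```

The same function scales a constrained resource up as well as an idle one down, so running it on a schedule keeps the estate tracking demand in both directions.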
Catching the runaway in hours, not at month-end.
The bill that hurts most is the one nobody saw coming: a misconfigured auto-scaling group, a runaway data-transfer job, an instance type changed in haste, a model left running. Discovered at month-end, it is a month of unnecessary spend already gone. Watched in real time against your normal pattern, the same event shows up as an anomaly within hours, while it can still be stopped before it does real damage.
This is one place where automated detection genuinely earns its keep: cloud spend has enough signal that machine-learning anomaly detection, native to the major providers, flags an unexpected increase far faster than any manual review. We wire that into the same observability we run for everything else, so a cost spike is an alert that reaches someone, tied to an owner who investigates it, rather than a line on an invoice four weeks later. Cost anomalies, like outages, are cheapest to fix while they are still small.
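The provider-native detectors are more sophisticated than a fixed threshold, but the simplest useful rule is easy to state: flag any day whose spend exceeds 1.3 times the average of the previous seven days. The sketch below implements that rule only, as an illustration; `spend_anomalies` and the sample figures are hypothetical.

```python
from statistics import mean


def spend_anomalies(daily_spend: list[float], window: int = 7, factor: float = 1.3) -> list[int]:
    """Indices of days whose spend exceeds `factor` times the mean of the previous `window` days."""
    flagged = []
    for day in range(window, len(daily_spend)):
        baseline = mean(daily_spend[day - window:day])
        if daily_spend[day] > factor * baseline:
            flagged.append(day)
    return flagged


# Illustrative daily spend: a steady week, then a spike on day index 8.
history = [100, 100, 100, 100, 100, 100, 100, 105, 180, 110]
print(spend_anomalies(history))
```

A flagged index is only useful if it reaches an owner, so in practice the output would feed the alerting path described above rather than a print statement.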
Cost per customer, not cost per server.
The mature question is not how big the bill is but what each unit of the business costs to run. Spend mapped to cost per customer, per transaction, per inference turns an abstract total into a margin conversation: whether a product is profitable at its current price, whether a heavy customer is worth what they pay, whether a feature earns its infrastructure. A falling bill can still hide worsening economics if volume is falling faster, and a rising bill on a growing business can be healthy; only unit cost tells you which.
We build that view on top of the tagging spine, so the cost of serving a customer or running a feature is a number you can see and act on rather than infer. It changes what optimisation is for: instead of chasing a smaller bill for its own sake, the goal becomes a better cost per unit of value, which sometimes means spending more where it earns more. That is the shift from cost-cutting to cost intelligence, and it is where the discipline stops being about saying no and starts being about funding growth deliberately.
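Given tagged billing data, the unit view is a grouping and a division. The sketch below assumes each line item is a dictionary with a `cost` and a `tags` mapping; that shape, and the names `cost_per_customer` and `unit_cost`, are assumptions for illustration rather than any provider's export format.

```python
from collections import defaultdict


def cost_per_customer(line_items: list[dict]) -> dict[str, float]:
    """Sum tagged spend by customer; untagged spend is reported as 'unattributed'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        customer = item.get("tags", {}).get("customer") or "unattributed"
        totals[customer] += item["cost"]
    return dict(totals)


def unit_cost(total_cost: float, units: int) -> float:
    """Cost per unit of business output (customer, transaction or inference)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


# Illustrative: the bill rises from 50,000 to 60,000 while customers grow from 1,000 to 1,500.
print(unit_cost(50_000, 1_000), unit_cost(60_000, 1_500))
```

In this made-up example the bill rises by a fifth while the cost per customer falls, which is the case a total-only view would misread as a problem.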
AI and GPU: the least predictable line on the bill.
AI has become the hardest part of a cloud bill to control, enough that the FinOps discipline formally widened to cover it. GPU capacity is routinely 30 to 50% over-provisioned, inference scales with demand in ways a steady service never did, and a single poor reservation decision can double costs in a week. Measured the old way, by GPU-hour or API call, the spend is opaque; the bill grows and nobody can say which model, endpoint or feature drove it.
So AI cost needs its own measurement: cost per inference, per query, per successful outcome, tied to the business metric rather than the raw resource, with every model and endpoint assigned an owner. We track it that way, watch for the anomalies that AI workloads throw more than most, and model the crossover where self-hosting an open-weight model on EU infrastructure beats paying per call, which past about a hundred million tokens a month it usually does. The structural fix for a runaway AI bill is often a placement decision, and it sits with our production AI work rather than a discount nobody offered.
When the big rocks are gone.
A mature cost programme runs into honest diminishing returns. Once the obvious waste is cleared (the idle servers off, the instances rightsized, the tags cleaned), what remains is a high volume of small opportunities that each take more effort to capture than they return. Practitioners describe hitting the big rocks and then facing gravel; one team reported reaching 97% optimisation and deliberately leaving the last 3% unactioned because the business reasons to keep it outweighed the saving.
We will tell you when you have reached that point rather than invent work to justify a retainer. Past it, the value moves from cutting the bill to understanding it: holding the line against drift, keeping the unit economics healthy as the business changes, and making the deliberate trade-offs where a little more spend buys reliability or speed worth more than the saving. A cost programme that knows when to stop chasing pennies and start managing value is a more honest one than a programme that pretends there is always another big cut to make.
The cadence that stops it drifting back.
Cost savings decay without a rhythm to hold them. New resources are provisioned, commitments expire, workloads grow, and an estate optimised once drifts back toward waste within a quarter if nobody is watching. So the work runs on a cadence: weekly reviews at team level to catch the new waste early, monthly reviews of the commitment portfolio as usage shifts, and quarterly architecture reviews where cost is an explicit criterion alongside performance and reliability.
The other half is shifting the thinking earlier. Cost estimation built into infrastructure change reviews and a cost line in sprint planning make spend a decision taken when a resource is designed, rather than a surprise discovered when the bill arrives. The cheapest waste to remove is the waste never created, and that only happens when cost is a first-class engineering concern from the start rather than a finance problem handed over after the fact. The cadence is what turns optimisation from an event into a property of how the estate is run.
Engineering and finance, speaking the same language.
Cloud cost is diffuse by nature: any engineer can provision spend with a click, and the bill lands on finance, who cannot see what any line is for. That split is why cost visibility is poor and accountability vague in most organisations, and it is a cultural problem as much as a technical one. The fix is a shared language and clear ownership, where every resource has an owner and every anomaly has someone responsible for investigating it, so a cost review produces actions rather than observations.
We bring the practice that connects the two sides: the engineers see the cost of what they build, finance sees what the spend buys, and the trade-offs are made by people who can see both. The point is not to restrict engineering, which kills the speed cloud exists to provide, but to fund it intelligently, so the team can move fast while the spend stays understood. FinOps done well makes cloud a strategic asset that the business steers deliberately, rather than a runaway expense it discovers monthly.
Who needs this, and when is there little left to cut?
The clearest fit is an organisation whose cloud bill has grown faster than its understanding of it: spend rising, no clear attribution, and a suspicion that a good share is waste, which the baseline figures suggest is usually right. A business scaling on cloud, one with a large and unpredictable AI bill, or one where margin depends on the cost of serving each customer, all have a cost problem worth a dedicated discipline rather than an occasional cleanup.
Where there is genuinely little left to cut, we say so. A small, well-run estate with modest, predictable spend may not warrant a standing programme, and a business that already attributes its costs cleanly and commits sensibly is beyond the point where we would add much. We would rather tell you the waste is already low than sell a programme to chase savings that are not there, because the honesty is the same one that makes us recommend leaving the cloud when that, not another optimisation, is the real answer.
Can you know the bill before it arrives?
A cost discipline that only looks backward is half a discipline. Forecasting turns the spend from a monthly surprise into a number you can plan against: projecting where the bill is heading from the current trend and the known changes ahead, so a budget is set with evidence rather than hope and a coming overspend is seen in time to act. The teams that run cloud well know roughly what next month costs before it starts, which is the opposite of the common experience of opening the invoice and flinching.
The forecast also makes trade-offs visible before they are committed. A planned launch, a new model, a growth in traffic each has a cost that can be estimated in advance and weighed against its value, rather than discovered after it has run for a month. We build that forward view on the same attributed data as the rest, so the question shifts from explaining last month's bill to deciding next month's deliberately. A bill you predicted is one you chose; a bill you only explain is one that happened to you.
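A first-pass forecast combines the two inputs named above: the current trend and the known changes ahead. The sketch below fits a least-squares straight line to past monthly totals and adds an estimate for planned changes; it is a deliberately simple stand-in for a real forecasting model, and `forecast_next` and the sample numbers are hypothetical.

```python
def forecast_next(monthly_spend: list[float], planned_changes: float = 0.0) -> float:
    """Project next month's spend from a least-squares linear trend plus known changes."""
    n = len(monthly_spend)
    if n < 2:
        raise ValueError("need at least two months of history")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(monthly_spend) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, monthly_spend)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    # Evaluate the fitted line one step past the last observed month.
    return y_mean + slope * (n - x_mean) + planned_changes


# Illustrative: four months of steady growth, plus a launch estimated at 15 a month.
print(forecast_next([100.0, 110.0, 120.0, 130.0], planned_changes=15.0))
```

A straight line ignores seasonality and step changes, so the planned-changes term is where launches, new models and expiring commitments have to be entered by someone who knows they are coming.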
The practice, not another dashboard.
The FinOps tool market has grown from a handful of products to more than a hundred vendors, and a dashboard on its own changes nothing. Plenty of organisations buy a cost tool, watch the spend in higher resolution, and carry on wasting it, because seeing the waste is not the same as acting on it. The tool reports; someone still has to decide what to cut, make the change, and hold the line, which is the work the licence does not do.
We bring the practice and use whatever tooling fits your estate rather than sell you a platform, since the right tool for Kubernetes attribution differs from the right one for commitment automation or unit economics, and the wrong one is an expensive duplicate. What matters is that the spend is attributed, the levers are pulled in order, and the actions genuinely happen, which is a discipline carried by people rather than a feature you switch on. A tool can show you the problem; closing it is the service.
What does good look like?
A cost programme should be judged on what it recovered against a clear baseline, not on a percentage quoted in a pitch. We establish the starting spend, attribute it, and then measure the savings truly delivered against it, so the value of the work is a number you can verify rather than a claim you take on faith. That is why we will not quote a fixed saving before seeing your environment: the honest figure is the one the engagement proves, and a real bill might not support a number invented in advance.
It also keeps the goal honest as the programme matures. Early on, the measure is the waste removed; later, it becomes the unit economics held steady or improved as the business grows, and the deliberate trade-offs where a little more spend bought something worth more. We report in those terms rather than a single vanity percentage, because a programme measured only on cuts will eventually cut something it should not. The proof is spend that reflects value, demonstrated against where you started, rather than a headline that flatters the work.
What does FinOps cover beyond the cloud bill?
In 2026 the discipline has outgrown the name. FinOps began as cloud cost management — the continuous cycle of visibility, allocation, reporting and governance, with optimisation as the action subset inside it — but the FinOps Foundation's own reading of the field now shows it expanding past pure cloud spend to the costs that behave the same way: SaaS licences that quietly renew, data-platform consumption, and above all the AI and GPU spend that scales faster and less predictably than anything before it. Treating those as separate problems is how an organisation ends up with a well-run cloud bill and an unmanaged everything-else; treating them as one practice is what the term now means.
That practice is a service rather than a project, which is the reframe that matters. A one-off cleanup recovers waste once and watches it return the moment attention moves on; the value is in running the cycle continuously, and most mature organisations now operate it as centralised enablement with federated execution: a central function sets the practice and the guardrails while the engineering teams act on them. We can be that central function for an organisation that has the cloud but not the discipline, and increasingly the discipline is shifting left into the platform itself, so cost awareness lives in the platform engineering golden paths rather than being audited after the fact. On Kubernetes in particular, where cost is notoriously opaque, that approach routinely takes a double-digit bite out of spend. FinOps, run this way, is the difference between knowing what you wasted and not wasting it in the first place.
Questions buyers ask.
What is FinOps, or cloud cost optimization?
How much can we save?
Why doesn't our cloud bill explain itself?
What about AI and GPU costs?
When is leaving the cloud the cheaper option?
How is this different from managed cloud?
What are the three phases of FinOps?
Why does tagging matter so much?
Should we buy reserved instances or savings plans?
How do you control AI and GPU costs?
What is unit economics in cloud cost?
How often does this need doing?
What if our spend is already well managed?
Can you forecast our cloud spend?
Do we need to buy a FinOps tool?
Do you guarantee a percentage of savings?
How quickly do we see savings?
Does this cover multicloud and SaaS spend too?
Is FinOps a one-off project or an ongoing service?
Show us your cloud bill. We'll show you the waste.
Share your current cloud spend and we map where it goes, flag the waste and the missing commitments, and show you the cost per unit of your business, along with any workload that would be cheaper to move, before you commit to anything.