Server management that keeps Linux patched, hardened, and documented.
Most server outages come from a missed patch or a configuration no one tracked, and under NIS2 patching is now a duty you have to evidence. We administer your Linux servers wherever they run: patched on time, hardened to a baseline, and watched before a problem reaches your users.
Server management is the ongoing administration that keeps a server available, secure and performing: applying OS patches, hardening the configuration, controlling drift, watching for trouble, and verifying the backups. It matters because most outages are self-inflicted — configuration errors cause a large share of them and unpatched software remains the first way in. Argus Root administers Linux servers wherever they run, yours or ours, inside the EU, and keeps a fleet at a measurable baseline rather than letting it drift.
In short
- Configuration errors cause roughly 41% of cloud outages, and unpatched vulnerabilities remain the first exploited way in — most failures are self-inflicted and preventable.
- Hardening is measured, not asserted: a CIS Benchmark baseline scored with OpenSCAP and Lynis turns "secure enough" into a number you can track.
- NIS2 turned patching from hygiene into a provable obligation — security updates within a reasonable window, with the evidence to show it.
- Configuration-as-code stops snowflake servers and drift: the hardened state is defined once, applied identically, and re-enforced on a schedule.
- A backup is only real once the restore is tested; an untested backup is an assumption, not a recovery plan.
What causes most server outages?
The expensive failures are rarely exotic. Configuration errors account for roughly 41% of cloud outages, unpatched vulnerabilities are the first exploit vector, and unplanned downtime runs into thousands of euros a minute. The jump from a 99.9% to a 99.99% target is the difference between nearly nine hours of downtime a year and under an hour. Server management is the unglamorous work that decides which side of that line you land on.
| | Unmanaged server | Under management |
|---|---|---|
| OS patching | When someone remembers | Within the NIS2 window, documented |
| Hardening | Vendor defaults | CIS baseline, SSH keys, MFA |
| Configuration | Drifts silently | Controlled and version-tracked |
| Trouble | You learn when it is down | Caught before client impact |
| Recovery | Improvised under pressure | Tested, with backups verified |
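The availability figures quoted above follow directly from the target percentage. As a quick check, the sketch below converts a target into a yearly downtime budget; it assumes a 365-day year and counts all downtime against the target, which a real SLA may define differently.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Compare the two targets discussed in the text.
for target in (99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% allows {hours:.2f} hours ({hours * 60:.0f} minutes) per year")
```

A 99.9% target leaves about 8.76 hours a year and a 99.99% target about 53 minutes, which is the "nearly nine hours" versus "under an hour" gap.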
The pattern holds beyond outages. Most breaches come from known flaws nobody got round to fixing rather than from exotic zero-days: a 2025 study found that around 60% of breaches exploited a vulnerability for which a patch already existed but had not been applied. The weakness was understood, the fix was available, and the gap was the operational one between the two. Closing that gap is what server management exists to do, and it is the cheapest security work an organisation can buy.
Drift is the quieter version of the same problem. A fleet that started uniform diverges as someone makes a one-off change here and skips an update there, until no two servers are quite alike and nobody can say with confidence what is running where. A drifted estate fails audits, because a regulator wants proof that controls are applied consistently, and it fails operationally, because a fix tested on one box behaves differently on another that quietly diverged. Keeping a fleet to a known state is less glamorous than chasing threats and prevents more of them.
Why must patching now be provable?
NIS2 turned patching from hygiene into an obligation. Security updates have to be installed within a reasonable window and verifiably, which means prioritised, tested, documented and rolled out in a way you can show an auditor, rather than applied at some point and forgotten. We run patch management that produces that record by default, on a hardening baseline drawn from the CIS Benchmarks, with configuration managed through Ansible so nothing drifts between one server and the next.
That keeps the server layer audit-ready and feeds the wider picture. Detection and response above the operating system sit with our managed security, the monitoring with observability, the restore side with backup and disaster recovery, and the evidence trail with compliance.
The proof is the part most teams cannot produce on demand. An auditor does not want a claim that you follow best practice; they want measurable evidence with timestamps: the CIS benchmark score for each host, the date a given CVE was patched, the rationale where one was deferred. We generate that from the tooling rather than assemble it by hand before an assessment, running OpenSCAP CIS scans that produce a rule-level compliance score, so the answer to "show me" is a report rather than a scramble. Patching on Linux is a discipline you maintain rather than a problem you solve once, and the record is what shows you have maintained it.
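One way to picture that evidence is as a record per host and CVE that can answer "was this patched in time, and if not, why". The sketch below is illustrative only: the `PatchRecord` fields, the `WINDOW_DAYS` values and the placeholder CVE identifier are assumptions, not our tooling's schema, and NIS2 itself does not prescribe these day counts.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical remediation windows in days, keyed by severity.
# These are placeholders for whatever policy has been agreed.
WINDOW_DAYS = {"critical": 7, "high": 14, "medium": 30, "low": 90}


@dataclass
class PatchRecord:
    host: str
    cve: str
    severity: str
    published: date
    patched: Optional[date] = None         # None means not yet applied
    deferral_reason: Optional[str] = None  # rationale if deliberately deferred

    def status(self, today: date) -> str:
        """Classify the record against the window for its severity."""
        window = WINDOW_DAYS[self.severity]
        age = ((self.patched or today) - self.published).days
        if self.patched:
            return "patched in window" if age <= window else "patched late"
        if self.deferral_reason:
            return "deferred (documented)"
        return "overdue" if age > window else "pending"


record = PatchRecord("web-01", "CVE-0000-0001", "critical",
                     published=date(2025, 1, 1), patched=date(2025, 1, 5))
print(record.host, record.cve, record.status(today=date(2025, 2, 1)))
```

The point of the structure is that every state an auditor might ask about, including a deliberate deferral, has a dated, explicit entry rather than an absence of data.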
What we manage.
The full server layer, on Linux, across bare metal, virtual machines and cloud instances.
OS patching
Updates prioritised by severity, tested, and applied within the NIS2 window, with a documented record of what was patched and when.
Hardening
A baseline from the CIS Benchmarks: SSH keys and MFA, a tightened firewall, disabled defaults, and mandatory access control with SELinux or AppArmor.
Configuration management
Server state defined in Ansible and version-tracked, so a fleet stays consistent and a change is deliberate rather than a one-off someone forgets.
Proactive monitoring
Resource, service and availability checks that surface a problem before it reaches your users, wired into the same observability stack we run. See observability →
Backup coordination
Backups scheduled, and the restore genuinely tested, so recovery is a rehearsed step rather than the first time anyone tries it. See backup & DR →
Performance & capacity
Tuning, resource sizing and a read on growth, so the server is sized for the load ahead instead of the load it had last year.
We wrote the tooling we manage with.
Server administration is the work we have done longest. We run fleets of Linux servers on our own infrastructure and built the management framework we use to do it: thousands of lines of tooling for patching, hardening and operations, refined against real production. We are honest about the lane: this is Linux administration, on AlmaLinux, Ubuntu and Debian, with response driven by runbooks and proactive monitoring rather than a promise of a 24/7 analyst desk we do not staff. The result is a server that is patched, consistent and accounted for, run by the people who built their own way of doing it.
    # /etc/ssh/sshd_config: keys only, no root, fail fast
    PermitRootLogin no
    PasswordAuthentication no
    KbdInteractiveAuthentication no
    LoginGraceTime 20
    MaxAuthTries 3
    AllowGroups ops

    # /etc/sysctl.d/60-hardening.conf: drop spoofed + redirected packets
    net.ipv4.conf.all.rp_filter = 1
    net.ipv4.conf.all.accept_redirects = 0
    net.ipv4.icmp_echo_ignore_broadcasts = 1

    # validate before reload (the unit is "ssh" on Debian/Ubuntu, "sshd" on AlmaLinux):
    sshd -t && systemctl reload ssh
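The sshd settings above can also be checked mechanically rather than by eye. The following is a minimal sketch, not our production tooling: it assumes a plain `Keyword value` file, stops at the first `Match` block, ignores `Include` files and the `Keyword=value` form, and does not model OpenSSH defaults, so a missing keyword is reported as a finding.

```python
# Settings the text treats as mandatory for key-only, no-root logins.
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "kbdinteractiveauthentication": "no",
}


def effective_settings(config_text: str) -> dict:
    """Collect global keyword values; sshd keeps the first value it sees."""
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "match":
            break  # everything after this is conditional, not global
        settings.setdefault(keyword, value)
    return settings


def violations(config_text: str) -> list:
    """Return a human-readable finding for each expected setting not met."""
    found = effective_settings(config_text)
    return [
        f"{key}: expected {want!r}, found {found.get(key)!r}"
        for key, want in EXPECTED.items()
        if found.get(key, "").lower() != want
    ]


sample = "PermitRootLogin no\nPasswordAuthentication no\nKbdInteractiveAuthentication no\n"
print(violations(sample) or "no findings")
```

On a live host, the output of `sshd -T` is a better input than the raw file, because it prints the effective configuration after includes and defaults are resolved.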
How do you patch without risking uptime?
Patching is where most routine-maintenance outages start, and the cause is rarely the update and nearly always the way it was applied. A blind update with no staging, no snapshot and no rollback plan is the risk: a new OpenSSL breaks an old binary, a kernel update changes how a network card behaves, a maintainer quietly changes a default, a service restarts in the middle of traffic. The failure looks obvious afterwards and was avoidable before.
So we do not patch blind. Updates are validated on a staging system that mirrors production (the OS, the kernel flavour, the major package versions and the service layout) before they touch a live server, and a snapshot is taken so there is a fast way back if something breaks anyway. Detection and dry-run are automated for speed, but a change to a critical system passes a human approval gate rather than rolling out unattended. The result is security fixes applied quickly, without the maintenance window becoming the outage you were trying to prevent.
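The order of those checks can be written down as a small decision function. This is a sketch of the logic as described, with hypothetical names (`PatchPlan`, `next_action`); the real gates live in the rollout tooling and the change process, not in a single function.

```python
from dataclasses import dataclass


@dataclass
class PatchPlan:
    host: str
    critical_system: bool
    staging_passed: bool     # update validated on a production-like staging host
    snapshot_taken: bool     # rollback point exists
    human_approved: bool = False


def next_action(plan: PatchPlan) -> str:
    """Return what should happen next for this host, in gate order."""
    if not plan.staging_passed:
        return "hold: validate on staging first"
    if not plan.snapshot_taken:
        return "hold: take a snapshot for rollback"
    if plan.critical_system and not plan.human_approved:
        return "hold: awaiting human approval"
    return "apply"


print(next_action(PatchPlan("db-01", critical_system=True,
                            staging_passed=True, snapshot_taken=True)))
```

Non-critical hosts pass straight through once staging and the snapshot are in place; critical ones stop at the approval gate until a person signs off.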
Hardening to a baseline you can measure.
A freshly installed server is not a secure one. It ships with default accounts, open services, permissive settings and password logins, all of which an attacker enumerates within minutes of it appearing online. Hardening removes that easy attack surface before anything else is done: SSH keys and MFA in place of passwords, a tightened firewall, the unused services and defaults disabled, and mandatory access control through SELinux or AppArmor so a compromised process is boxed in rather than free to roam the host.
We harden to the CIS Benchmarks, the published, specific baselines for each operating system, rather than to a private notion of good practice, because a named standard is auditable and a personal one is not. The compliance is measured with OpenSCAP scans that score each host against the benchmark and flag where it falls short, so hardening becomes a number to track and improve rather than a one-time checklist. A well-hardened server is also a more stable one, which is part of why it survives a patch with fewer surprises and presents a smaller target in the first place.
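To show what "a number to track" means, the sketch below turns per-rule results into a percentage. It is a deliberately simplified flat ratio, not OpenSCAP's own scoring model (which can weight rules), and the rule identifiers are hypothetical. The result values are the standard XCCDF ones.

```python
PASSING = {"pass", "fixed"}
FAILING = {"fail", "error", "unknown"}
# Other XCCDF results (notapplicable, notselected, notchecked, informational)
# are left out of the score in this sketch.


def compliance_score(results: dict) -> float:
    """Percentage of scored rules that pass, rounded to one decimal."""
    passed = sum(1 for r in results.values() if r in PASSING)
    failed = sum(1 for r in results.values() if r in FAILING)
    scored = passed + failed
    if scored == 0:
        raise ValueError("no scored rules in results")
    return round(100 * passed / scored, 1)


def failing_rules(results: dict) -> list:
    """Rule IDs that need attention, sorted for stable reports."""
    return sorted(rule for rule, r in results.items() if r in FAILING)


# Hypothetical rule IDs for one host.
host_results = {
    "sshd_disable_root_login": "pass",
    "sshd_disable_password_auth": "pass",
    "firewall_default_deny": "fail",
    "selinux_enforcing": "pass",
    "grub_password": "notapplicable",
}
print(compliance_score(host_results), failing_rules(host_results))
```

Tracking this figure per host over time is what turns hardening from a one-off checklist into a trend that can improve or regress.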
Configuration as code, not snowflake servers.
A server configured by hand is a snowflake: unique, undocumented and impossible to reproduce when it dies. We define server state in Ansible instead, version-tracked the same way as software, so the configuration of every host is written down, reviewable and applied identically across the fleet. A change becomes a deliberate edit to the desired state rather than someone logging into a live box and adjusting it, which is how drift creeps in unnoticed.
Defining the state also means it can be enforced and restored. Ansible re-applies the desired configuration on a schedule so a host that drifts is pulled back into line, file-integrity monitoring flags an unauthorised change to a critical file, and rebuilding a lost server is running a playbook rather than reconstructing it from memory. The estate stays consistent, the next engineer can read what every server is meant to be, and a known-good state is always one command away rather than a thing that lived only in the head of whoever first set it up.
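Stripped of the tooling, drift detection is a comparison between the declared state and what a host actually reports. The sketch below illustrates that idea with plain dictionaries; it is not how Ansible works internally, and the setting names are made up for the example.

```python
MISSING = "<missing>"


def find_drift(desired: dict, actual: dict) -> dict:
    """Map each drifted key to (desired value, actual value).

    Only keys declared in the desired state are checked; settings the host
    has but the code does not mention are out of scope for this sketch.
    """
    drift = {}
    for key, want in desired.items():
        have = actual.get(key, MISSING)
        if have != want:
            drift[key] = (want, have)
    return drift


desired = {"sshd.PermitRootLogin": "no", "sysctl.rp_filter": "1", "ntp.enabled": "yes"}
actual = {"sshd.PermitRootLogin": "yes", "sysctl.rp_filter": "1"}

for key, (want, have) in find_drift(desired, actual).items():
    print(f"{key}: want {want}, have {have}")
```

An empty result means the host matches the declared state; anything else is either an unauthorised change to revert or a deliberate one that belongs in the code.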
Can you patch without rebooting?
Some patches need a reboot, and a reboot is its own risk: a server that has run for a year may not come back cleanly, and the downtime has to be scheduled around the people who depend on it. Kernel updates are the usual culprit, and the choice between rebooting now and staying exposed is a real one when the flaw is being exploited. Where it fits, live kernel patching applies the fix to a running kernel without a reboot, closing the vulnerability without the outage or the gamble of a restart.
Where a reboot is unavoidable, it is planned rather than sprung: the dependent services identified, the restart sequence known, the rollback ready, and a tool like needrestart used to catch the processes still running old libraries after an update, so the patch is genuinely in effect rather than merely installed. The aim is an update that is fully applied and a system that comes back exactly as expected, instead of a patched server sitting one reboot away from a surprise nobody scheduled.
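The "installed but not in effect" case is visible on Linux because a process that still maps a replaced library shows it as deleted in `/proc/<pid>/maps`. The sketch below parses that format to list such libraries. It approximates the kind of check a tool like needrestart performs and is not its implementation; the sample mapping lines are invented.

```python
DELETED_SUFFIX = " (deleted)"


def stale_libraries(maps_text: str) -> list:
    """Return shared libraries a process still maps after they were replaced."""
    stale = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no pathname
        path = fields[5].strip()
        if not path.endswith(DELETED_SUFFIX):
            continue
        path = path[: -len(DELETED_SUFFIX)]
        if ".so" in path.rsplit("/", 1)[-1]:
            stale.add(path)
    return sorted(stale)


sample = """\
55d0c0a00000-55d0c0a20000 r-xp 00000000 fd:01 1234 /usr/sbin/nginx
7f1c2a000000-7f1c2a1e0000 r-xp 00000000 fd:01 5678 /usr/lib/libssl.so.3 (deleted)
7f1c2b000000-7f1c2b021000 rw-p 00000000 00:00 0
"""
print(stale_libraries(sample))
```

On a real host the input would come from reading `/proc/<pid>/maps` for each process of interest, which typically needs root for processes owned by other users.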
The end-of-life trap.
An operating system past its end of life stops receiving security patches, which turns every new vulnerability in it into a permanent, unfixable exposure. The risk is not theoretical: a large share of the vulnerabilities actively exploited today target software released years ago and no longer supported, the boxes a scanner flags forever because there is no patch to apply. The CentOS shift left a generation of servers stranded in exactly this position.
We treat EOL as a migration to plan rather than a finding to log. Where a system can move, we migrate it to a supported distribution (AlmaLinux, Ubuntu or Debian) on a timeline rather than in a panic; where it genuinely cannot move yet, extended-support patches for end-of-life Linux keep it receiving fixes while the migration is arranged. Either way the answer is a route off the unpatchable platform, because an EOL server is not a steady state to be monitored indefinitely but a risk running on a clock.
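Treating EOL as a clock can be made literal with a lifecycle check against an inventory. The sketch below is an illustration under assumptions: the `EOL_DATES` table, the release keys and the 180-day warning threshold are placeholders, and any date used in practice should be verified against the vendor's published lifecycle page.

```python
from datetime import date

# Illustrative end-of-life dates; verify against the vendor's lifecycle page.
EOL_DATES = {
    "centos-7": date(2024, 6, 30),
    "centos-8": date(2021, 12, 31),
}


def eol_status(os_release: str, today: date, warn_days: int = 180) -> str:
    """Classify a release as supported, approaching EOL, or past it."""
    eol = EOL_DATES.get(os_release)
    if eol is None:
        return "unknown: no lifecycle date recorded"
    remaining = (eol - today).days
    if remaining < 0:
        return f"past end of life by {-remaining} days"
    if remaining <= warn_days:
        return f"end of life in {remaining} days: plan migration"
    return "supported"


for release in ("centos-7", "centos-8", "some-other-os"):
    print(release, "->", eol_status(release, today=date.today()))
```

The warning threshold is the useful part: it surfaces a server while there is still time to migrate on a plan rather than after the patches have stopped.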
Anywhere the metal runs.
Server management is the discipline, not a tie to where the server lives. We administer Linux on your own hardware, in your cloud account, or on our infrastructure, and the patching, hardening and configuration are the same regardless of where the metal sits. That separates this from a hosting plan, where the management comes bundled with the data centre and ends at its door.
The hybrid estate that most organisations really run, a mix of cloud instances, on-premise boxes and the occasional legacy machine, is the harder case and the one this is built for. Managing updates and configuration consistently across that fragmented estate is the real work, and doing it from one defined state rather than three separate habits is what keeps a mixed estate coherent. You decide where your servers run; we keep them in order wherever that is.
How does onboarding work?
Onboarding starts by taking stock rather than changing things. We inventory the servers, read their current patch level and configuration, and produce a baseline that usually surfaces the drift, the missing patches and the EOL boxes nobody had counted. From there the fleet is brought to a hardened, version-controlled baseline in a planned sequence rather than all at once, so nothing breaks in the name of tidying up.
Then it becomes the standing routine: patching on the agreed cadence with the evidence trail, configuration enforced from code, monitoring wired in, backups verified, and a periodic audit against the SLAs. We work with your existing access and change process rather than impose a parallel one. The aim is a fleet that stays patched, consistent and documented as a matter of course, so the server layer stops being the thing that quietly fails an audit or an uptime target.
Monitoring that catches it first.
An unmanaged server tells you it is in trouble by going down. A managed one is watched so the trouble is caught while it is still a warning: a disk filling, a service flapping, memory creeping up, a daemon that died and did not restart. Resource, service and availability checks run continuously and surface the problem to us before it reaches your users, which is the difference between a quiet fix at 2am and an outage everyone notices by 9.
The monitoring is wired into the same observability stack we run rather than a separate dashboard nobody watches, so a server signal sits alongside the application and infrastructure picture and a pattern across them is visible rather than scattered. The point is to act on the few metrics that predict a failure rather than collect numbers for their own sake, turning maintenance from reactive firefighting into the unremarkable business of fixing small things before they become large ones.
A backup is only real once the restore works.
Most backup failures are discovered at the worst possible moment: during a restore that does not work. A backup that has never been tested is a hope rather than a safeguard, and the list of ways it quietly fails is long: an incomplete dataset, a missing dependency, a key nobody kept, a job that stopped succeeding months ago and alerted no one. A schedule that runs is not the same as a recovery that works.
So we coordinate backups as part of administering the server and verify the restore rather than assume it, because the restore is the only part that matters when something is lost. The deeper recovery planning (the objectives, the failover, the disaster scenarios) sits with our backup and disaster recovery work; at the server layer the job is making sure the data is captured correctly and can be brought back, so a failed disk or a bad deploy is a restore rather than a catastrophe.
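At its simplest, verifying a restore means restoring into a scratch location and checking that every file came back intact. The sketch below compares SHA-256 digests between a source tree and a restored tree. It is a minimal illustration, not our backup tooling: it reads whole files into memory, and it compares against the live source, whereas in practice the reference would be a manifest captured at backup time.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def verify_restore(source: Path, restored: Path) -> dict:
    """Report files that are missing or differ in the restored tree."""
    src, dst = file_digests(source), file_digests(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "corrupted": sorted(p for p in set(src) & set(dst) if src[p] != dst[p]),
    }
```

A call such as `verify_restore(Path("/srv/data"), Path("/mnt/restore-test"))` (both paths hypothetical) returns two empty lists when the restore is complete and intact; anything else is a finding to fix before the backup is needed in earnest.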
Performance and capacity, before it bites.
A server sized for last year's load is a slow outage waiting to happen. Performance work is the unglamorous maintenance that keeps a system responsive as it grows: tuning the configuration to the workload, right-sizing the resources so you are neither starved nor paying for idle capacity, and watching the trend so the wall is seen coming rather than hit. A database that has quietly outgrown its memory, a disk approaching full, a connection limit about to be reached: each is a failure that announces itself in advance to anyone watching the right number.
We read the growth rather than react to the crash, so capacity is added on a plan and tuning happens before a slowdown becomes a complaint. The same view feeds sensible decisions about when a workload should move to bigger hardware, scale out, or simply be configured better on what it already has. Most performance emergencies were predictable months earlier; the work is noticing in time and acting while the fix is still cheap.
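"Reading the growth" can be as simple as fitting a straight line to recent usage and asking when it crosses capacity. The sketch below does that with an ordinary least-squares fit over `(day, used_gb)` samples. It assumes roughly linear growth, which is a simplification; bursty or seasonal workloads need a better model, and the sample numbers are invented.

```python
from typing import Optional


def days_until_full(samples: list, capacity_gb: float) -> Optional[float]:
    """Estimate days from the last sample until usage reaches capacity.

    samples is a list of (day_number, used_gb) pairs.
    Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("samples must span more than one day")
    # Least-squares slope (GB per day) and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / var_x
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity_gb - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, full_day - last_day)


usage = [(0, 100.0), (10, 110.0), (20, 120.0)]  # grows 1 GB per day
print(days_until_full(usage, capacity_gb=200.0))
```

With growth of 1 GB a day and 80 GB of headroom left, the estimate is 80 days, which is the kind of lead time that lets capacity be added on a plan.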
Linux is the lane, and we say where it ends.
We administer Linux, principally AlmaLinux, Ubuntu and Debian, and we say so plainly rather than claim a breadth we do not run day to day. For the depth that comes from operating Linux fleets ourselves, including the automation we wrote and use in production, it is the right fit; for a Windows-first estate it is not, and we will tell you that rather than take the work and learn on your servers.
The same honesty applies to the boundary with the rest of what we offer. Server management is administration, rather than detection, recovery planning or the vulnerability program; each of those is its own discipline that this one feeds and is fed by. Where a problem belongs to another layer we say which, and where it belongs to a provider better suited than us we say that too. A clearly drawn lane is what makes the work inside it dependable.
Questions buyers ask.
What is server management?
What is the difference between a managed and an unmanaged server?
Do you manage servers you do not host?
How does server management relate to NIS2?
Do you manage Linux only?
Is this the same as your managed security?
How do you patch without causing an outage?
Can you patch the kernel without rebooting?
What about our end-of-life servers?
How do you prevent configuration drift across a fleet?
What evidence do you produce for an audit?
Do you monitor the servers as well?
Do you handle backups, or is that separate?
What does the SLA look like?
How is server management priced?
Can you take over servers another provider set up?
Do you work with our existing change process?
Do you manage containers and VMs, or bare metal alone?
Will you tell us if we do not need managed server administration?
Give us your servers. We'll keep them patched and standing.
Point us at the Linux servers you are running. We review their patch state, hardening and configuration, show you where the risk sits, and lay out what bringing them under management takes, before you commit to anything.