Agentic Engineering: From Vibe Coding to a Plan, Execute, Verify Discipline

Posted on 21st June 2026 by Rodrigo Silva

For about a year, “vibe coding” was the most fun anyone had had with a keyboard. You describe what you want, an agent writes it, you skim the result, you ship. Andrej Karpathy’s original framing was almost gleeful: you “give in to the vibes” and barely read the diffs.

Then teams tried to put vibe-coded software into production, and the bill came due.

The failure mode now has a name — AI slop: code that looks reasonable on the surface but lacks error handling, quietly introduces security vulnerabilities, breaks something three modules over, or produces an architecture nobody can maintain. It’s not that the model is dumb. It’s that “prompt and hope” has no step where anything gets checked against reality before it lands.

The numbers make the discomfort concrete. In Sonar’s 2026 developer survey, 96% of developers said they don’t fully trust the output of AI coding agents. A late-2025 Stack Overflow survey found nearly half of developers were frustrated by AI solutions that are “almost right, but not quite” — which is arguably the most expensive kind of wrong, because it survives a casual review and fails in production.

In early 2026, Karpathy named the thing that comes next: agentic engineering — the discipline of designing systems where AI agents plan, write, test, and ship code under structured human oversight. Not casual prompting. Not hope-and-check. An actual engineering methodology built for AI-first development.

This series is a hands-on guide to that discipline. By the end you’ll have built a loop you’d actually trust near your codebase.

What vibe coding is missing

Strip a vibe-coding session down and you get:

You: "Add rate limiting to the API"
Agent: <writes 80 lines>
You: <skim, looks fine, paste it in>

Three things are absent, and each is a place production breaks:

No explicit plan. The agent inferred intent and committed to an approach in a single shot. You never saw the approach, so you couldn’t catch that it chose an in-memory limiter that won’t survive a multi-instance deploy.
No isolation. The code went straight onto your working branch. When it’s wrong, untangling it from your own changes is a chore.
No verification gate. “Looks fine” is the only test. There’s no point where tests, a linter, or a security scanner can block the change before it reaches you.

Agentic engineering reintroduces all three as first-class steps. The core workflow that replaces “prompt and hope” is the Plan → Execute → Verify loop, usually shortened to PEV.

The PEV loop

[Figure: flowchart of the PEV loop with four stages (Plan, Execute, Verify, Ship), wrapped in human oversight, with feedback loops on failure.]

Each phase has a job:

Plan turns a fuzzy request into an explicit, reviewable artifact: a spec and a task breakdown. This is where a human can intervene cheaply — catching a bad approach before a line of code exists costs seconds; catching it after costs a debugging session.
Execute writes the code, but in an isolated environment (a fresh branch, a sandboxed workspace) so a bad run can be thrown away with zero blast radius.
Verify runs objective checks — the test suite, linters, type checks, security scanners — and either passes the work forward or kicks it back to Plan with the failure as new context.

The arrow wrapping the whole thing is the part people skip and shouldn’t: human oversight. The human isn’t in the inner loop typing code. They’re at the boundaries — approving the plan, adjudicating verify failures the agent can’t resolve, and owning the final merge decision.

That distinction — humans at the boundaries, not in the middle — is the whole philosophy. It’s what lets the loop run fast without running blind.

A minimal PEV loop you can run

Let’s make this concrete with the smallest thing that’s still genuinely a PEV loop. We’ll give an agent a tiny task, force it to plan first, let it execute, then actually verify with a real test run instead of vibes.

We’ll use Python and the Anthropic Messages API, but nothing here is provider-specific — swap in any model SDK.

import subprocess, json, tempfile, os
from anthropic import Anthropic

client = Anthropic()
MODEL = "claude-sonnet-4-6"

def call(system, user):
    msg = client.messages.create(
        model=MODEL, max_tokens=2000,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return "".join(b.text for b in msg.content if b.type == "text")

# ---- PLAN ---------------------------------------------------------------
def plan(task):
    system = (
        "You are a senior engineer. Given a task, produce a short, explicit "
        "plan as JSON with keys: 'approach' (one sentence) and 'steps' (list). "
        "Do not write code yet. Output JSON only, no prose, no code fences."
    )
    return json.loads(call(system, task))

# ---- EXECUTE ------------------------------------------------------------
def execute(task, the_plan):
    system = (
        "You are a careful engineer. Implement the task following the plan. "
        "Return JSON only with keys 'code' (the module) and 'tests' (pytest "
        "tests for it). No prose, no code fences."
    )
    user = f"TASK:\n{task}\n\nPLAN:\n{json.dumps(the_plan, indent=2)}"
    return json.loads(call(system, user))

# ---- VERIFY -------------------------------------------------------------
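def verify(artifact):
    """Run the agent's tests in an isolated temp dir. Real check, not vibes."""
    with tempfile.TemporaryDirectory() as d:
        # Write the module and its tests; `with` closes each file before pytest runs.
        for name, key in (("solution.py", "code"), ("test_solution.py", "tests")):
            with open(os.path.join(d, name), "w", encoding="utf-8") as f:
                f.write(artifact[key])
        try:
            result = subprocess.run(
                ["python", "-m", "pytest", "-q"],
                cwd=d, capture_output=True, text=True, timeout=60,
            )
        except subprocess.TimeoutExpired:
            # A hung test run is a failed verification, not a crash of the loop.
            return False, "pytest timed out after 60s"
        return result.returncode == 0, result.stdout + result.stderr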

# ---- THE LOOP -----------------------------------------------------------
def pev(task, max_attempts=3):
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        print(f"\n=== Attempt {attempt} ===")
        the_plan = plan(task + feedback)
        print("PLAN:", the_plan["approach"])

        artifact = execute(task, the_plan)
        passed, log = verify(artifact)

        if passed:
            print("VERIFY: passed ✅  (awaiting human review before merge)")
            return artifact
        print("VERIFY: failed ❌  — replanning with the failure as context")
        feedback = f"\n\nPrevious attempt failed these tests:\n{log[-1500:]}"

    print("Gave up after max attempts — escalate to a human.")
    return None

if __name__ == "__main__":
    pev("Write a function `slugify(s)` that lowercases, strips punctuation, "
        "and replaces runs of whitespace with single hyphens.")

Run it and watch the difference from vibe coding: the agent is forced to state an approach before coding, the work runs in a throwaway directory, and the loop only declares success when a real pytest process exits zero. A failure doesn’t get shipped; it gets fed back in as context for the next plan.

What this toy is still missing

This is a real loop, but it’s a toy, and the gaps are exactly the subject of later sections:

The plan is unreviewed. A human should be able to approve or edit it before execution.
Verification is shallow. Passing the agent’s own tests proves very little — the agent can write weak tests. We need independent tests, linters, type checks, and security scanning.
No real isolation. A temp dir works for one function; real work needs branch-per-task and git worktrees so parallel runs don’t collide.
One agent does everything. Author and grader being the same model is a conflict of interest. Splitting roles — author, tester, reviewer, security — catches far more.
No audit trail. In production, why an agent did something becomes the constraint, not whether it could.

Building a production-grade PEV loop

Above, we built a PEV loop that ran an agent’s own tests in a temp directory. It was a real loop but a weak one: the plan was unreviewed, the verification trusted the agent to grade itself, and “isolation” was a folder that vanished. We’ll build each phase properly: a plan you can read and approve, an execute phase that runs in a real isolated workspace, and a verify gate the agent cannot talk its way past.

Phase 1 — Plan: make intent explicit and reviewable

The single highest-leverage move in agentic engineering is forcing a planning step before any code exists. A bad approach caught at the plan stage costs seconds. The same mistake caught after execution costs a debugging session — and after merge, an incident.

A good plan is a structured artifact, not a paragraph. Structure makes it reviewable, diff-able, and machine-checkable. A practical shape:

from dataclasses import dataclass, field
from typing import Literal

@dataclass
class PlanStep:
    id: str
    description: str
    files_touched: list[str]
    risk: Literal["low", "medium", "high"]

@dataclass
class Plan:
    goal: str
    approach: str                       # one sentence the human can sanity-check
    non_goals: list[str]                # what we are deliberately NOT doing
    steps: list[PlanStep]
    acceptance_criteria: list[str]      # the verify phase will check these
    open_questions: list[str] = field(default_factory=list)

Two fields earn their keep. non_goals stops scope creep — the most common way an agent turns “add rate limiting” into a rewrite of your middleware stack. acceptance_criteria is the contract the Verify phase will hold the code to; writing it during planning means “done” is defined before a line is written, not rationalized afterward.

The prompt that produces this should be explicit about the spec-first stance:

PLAN_SYSTEM = """You are a staff engineer doing spec-driven development.
Given a task and the relevant repo context, produce a Plan object as JSON.
Rules:
- Decompose into the smallest steps that each leave the repo green.
- State non_goals explicitly to bound scope.
- Write acceptance_criteria as concrete, testable statements.
- If the task is ambiguous, populate open_questions instead of guessing.
Output JSON only."""

Notice the last rule. A vibe-coding agent guesses when it’s unsure and you find out later. A planning agent is told to surface ambiguity as open_questions — which becomes the natural place for a human to intervene. If open_questions is non-empty, the loop pauses for an answer instead of charging ahead on an assumption.
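
One way to wire that pause up is to parse the model’s JSON into the Plan dataclass and check open_questions before anything else runs. The sketch below is a minimal version under stated assumptions: parse_plan and needs_human are hypothetical helper names (not part of any library), the validation rules are illustrative, and the dataclasses are repeated only so the block runs on its own.

```python
import json
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class PlanStep:
    id: str
    description: str
    files_touched: list[str]
    risk: Literal["low", "medium", "high"]

@dataclass
class Plan:
    goal: str
    approach: str
    non_goals: list[str]
    steps: list[PlanStep]
    acceptance_criteria: list[str]
    open_questions: list[str] = field(default_factory=list)

def parse_plan(raw: str) -> Plan:
    """Parse model output into a Plan (hypothetical helper).

    Raises KeyError/TypeError on missing or unexpected keys and
    ValueError on an unknown risk level or empty acceptance criteria.
    """
    data = json.loads(raw)
    steps = [PlanStep(**s) for s in data.pop("steps")]
    for s in steps:
        if s.risk not in ("low", "medium", "high"):
            raise ValueError(f"step {s.id}: unknown risk {s.risk!r}")
    plan = Plan(steps=steps, **data)
    if not plan.acceptance_criteria:
        raise ValueError("plan has no acceptance_criteria")
    return plan

def needs_human(plan: Plan) -> bool:
    """The loop should pause when the planner surfaced any open question."""
    return bool(plan.open_questions)
```

Rejecting a malformed plan at this point is cheap: the loop can simply ask the planner again before any worktree exists.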

Phase 2 — Execute: isolation is non-negotiable

The reason vibe coding feels dangerous is that the agent writes directly onto your working tree. Get isolation right and a botched run becomes a deleted branch instead of an afternoon of git reset archaeology.

The right primitive is one git worktree per task. A worktree gives each agent run its own checked-out branch in its own directory, all backed by the same repo — so parallel agents can work without colliding, and merging back is an ordinary PR.

import subprocess, uuid, pathlib

def make_worktree(repo: str, base: str = "main") -> tuple[str, str]:
    """Create an isolated worktree on a fresh branch. Returns (path, branch)."""
    branch = f"agent/{uuid.uuid4().hex[:8]}"
    path = pathlib.Path(repo).parent / "worktrees" / branch.replace("/", "-")
    subprocess.run(["git", "-C", repo, "worktree", "add", "-b", branch,
                    str(path), base], check=True)
    return str(path), branch

def teardown_worktree(repo: str, path: str, branch: str, keep: bool):
    if not keep:
        subprocess.run(["git", "-C", repo, "worktree", "remove", "--force", path])
        subprocess.run(["git", "-C", repo, "branch", "-D", branch])

Now execution happens inside that directory. The agent gets tools scoped to the worktree — read a file, write a file, run a shell command — and nothing it does can touch your main checkout. This is also where you cap blast radius with the principle of least privilege: the execution sandbox should have only the filesystem, network, and credential access the task genuinely needs. An agent implementing slugify does not need your production database URL in its environment.

def execute_in_worktree(client, task, plan, wt_path):
    """Agentic execution with file + shell tools, confined to wt_path."""
    tools = [
        {"name": "write_file", "description": "Write a file (path relative to repo root).",
         "input_schema": {"type": "object",
            "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"]}},
        {"name": "run", "description": "Run a shell command in the repo root.",
         "input_schema": {"type": "object",
            "properties": {"cmd": {"type": "string"}}, "required": ["cmd"]}},
    ]
    # ... a standard tool-use loop: call the model, dispatch tool calls against
    # wt_path with subprocess(cwd=wt_path), feed results back until the model
    # signals completion. Every shell command is confined to the worktree.

The implementation detail that matters: every tool call executes with cwd=wt_path, and the write_file handler rejects any path that resolves outside the worktree (guard against ../ escapes). Isolation you can bypass isn’t isolation.
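
A minimal sketch of that path guard, assuming Python 3.9+ for Path.is_relative_to. The names resolve_inside and write_file are illustrative handlers, not part of any SDK.

```python
from pathlib import Path

def resolve_inside(root: str, relative: str) -> Path:
    """Resolve `relative` under `root`, refusing anything that escapes it."""
    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below sees the real destination. An absolute `relative` replaces the
    # root entirely when joined, and is rejected by the same check.
    target = (root_path / relative).resolve()
    if not target.is_relative_to(root_path):
        raise PermissionError(f"path escapes worktree: {relative}")
    return target

def write_file(root: str, relative: str, content: str) -> None:
    """Tool handler: write a file, but only inside the worktree."""
    target = resolve_inside(root, relative)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
```

The same resolve_inside check can sit in front of a read_file handler, so reads are confined in the same way as writes.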

Phase 3 — Verify: the gate the agent can’t sweet-talk

This is where the previous loop was weakest. Letting the agent write and grade its own work is a conflict of interest — it’ll write tests that pass. Production verification needs to be objective and independent of the author, and it needs multiple lenses, because each catches a different class of slop.

Think of the gate as a pipeline of checks, each of which can block:

@dataclass
class Check:
    name: str
    cmd: list[str]
    blocking: bool = True

GATE = [
    Check("format",   ["ruff", "format", "--check", "."]),
    Check("lint",     ["ruff", "check", "."]),
    Check("types",    ["mypy", "."]),
    Check("tests",    ["pytest", "-q", "--cov", "--cov-fail-under=80"]),
    Check("security", ["bandit", "-r", ".", "-ll"]),   # catches injected vulns
    Check("deps",     ["pip-audit"]),                   # known-CVE dependencies
]

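def verify(wt_path) -> tuple[bool, dict]:
    results, ok = {}, True
    for c in GATE:
        try:
            r = subprocess.run(c.cmd, cwd=wt_path, capture_output=True,
                               text=True, timeout=300)
            passed, log = r.returncode == 0, r.stdout + r.stderr
        except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
            # A missing tool or a hung check counts as a failure, never a pass.
            passed, log = False, f"{c.name} could not complete: {exc}"
        results[c.name] = {"passed": passed, "log": log[-2000:]}
        if c.blocking and not passed:
            ok = False
    return ok, results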

A few deliberate choices:

A coverage floor (--cov-fail-under=80) stops the agent from “passing” by writing one trivial test. It has to actually exercise the code.
A security scanner (bandit) and a dependency auditor (pip-audit) are not optional niceties. As we’ll see later, an agent producing code at volume produces vulnerabilities at volume unless something blocks them. Security belongs in the gate, not in a later review.
Independent tests matter. A strong setup has a second agent (or a human-owned test suite) write tests the author agent never sees. Self-graded tests are a starting point, not the finish line.

Wiring it together

def production_pev(repo, task, max_attempts=3):
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        plan = make_plan(task + feedback)          # Phase 1
        if plan.open_questions:
            return pause_for_human(plan)           # don't guess: escalate

        wt_path, branch = make_worktree(repo)      # Phase 2: isolate
        ok = False                                 # stays False if execute/verify raises
        try:
            execute_in_worktree(client, task, plan, wt_path)
            ok, results = verify(wt_path)          # Phase 3: gate
            if ok:
                open_pull_request(branch, plan, results)   # human merges
                return branch                      # `finally` keeps the worktree for review
            # failed gate → feed the specific failures back into the next plan
            feedback = summarize_failures(results)
        finally:
            # Single teardown point: keep the worktree only when the gate passed.
            teardown_worktree(repo, wt_path, branch, keep=ok)
    escalate_to_human(task, feedback)

The shape is the same loop from earlier, but every phase now has teeth: the plan is a reviewable artifact that refuses to guess, execution is confined to a disposable worktree with least-privilege access, and verification is an independent multi-check gate with a coverage floor and security scanning. Crucially, a failure doesn’t ship — it becomes precise feedback (summarize_failures extracts the actual failing test names and scanner findings) that sharpens the next plan.
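
As a rough sketch of what summarize_failures could look like: it takes the results dict returned by verify and keeps only what the next plan needs. The extraction rule is an assumption. It relies on pytest printing "FAILED ..." and "ERROR ..." lines in its short summary, and falls back to the tail of the log for every other check.

```python
def summarize_failures(results: dict, max_lines: int = 20) -> str:
    """Condense failed gate checks into short feedback for the next plan."""
    sections = []
    for name, r in results.items():
        if r["passed"]:
            continue
        lines = r["log"].splitlines()
        picked = []
        if name == "tests":
            # pytest's short summary lists one "FAILED path::test - reason" per failure
            picked = [ln for ln in lines if ln.startswith(("FAILED", "ERROR"))]
        if not picked:
            # Other tools: keep the last non-empty lines, where findings usually land
            picked = [ln for ln in lines if ln.strip()][-max_lines:]
        sections.append(f"[{name}]\n" + "\n".join(picked[:max_lines]))
    if not sections:
        return ""
    return "\n\nPrevious attempt failed these checks:\n" + "\n\n".join(sections)
```

Keeping the feedback short matters as much as keeping it accurate: the planner gets the failing test names and findings, not two thousand characters of progress dots.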

What’s still open

We now have a loop that’s safe to point at a real repo. But two things still rely on hand-waving:

“Human merges” and “escalate to a human” — we keep deferring to a human at the boundaries without saying how that handoff should work. Where exactly do humans belong, and how do you keep them effective without making them a bottleneck?
One agent still does the work. We hinted that independent test-writing helps. The full version splits the job across specialized agents — author, tester, reviewer, security — and that orchestration is its own design problem.

Human Oversight, Multi-Agent Orchestration & Shipping Safely

We have a loop that plans explicitly, executes in isolation, and verifies objectively. Left alone, though, it has two unresolved weaknesses: it still treats “a human handles it” as a magic step, and it still has one agent doing — and grading — all the work. Fixing both is what separates a demo from something you’d run against your main branch a hundred times a day.

Where humans actually belong

The initial instinct is to put a human “in the loop.” That’s the wrong picture. A human reviewing every diff an agent produces is just slow vibe coding — you’ve added a bottleneck without adding rigor, and reviewers rubber-stamp under volume anyway.

The right model is humans at the boundaries, agents in the loop. There are exactly three boundaries worth a human’s attention:

Plan approval (the cheap gate). Reviewing a one-paragraph approach plus non_goals and acceptance_criteria takes thirty seconds and catches the most expensive mistakes — wrong approach, wrong scope — before any code exists. This is the single highest-ROI place to spend human attention. Pair it with the open_questions mechanism from earlier: if the agent is unsure, it asks here instead of guessing.
Verify escalation (the exception gate). When the loop exhausts its attempts or hits a failure it can’t resolve, a human adjudicates. The key design rule: the human should receive the structured failure — which checks failed, the actual scanner findings, what the agent already tried — not a raw transcript. Make the escalation legible and it takes a minute; dump a 4,000-line log and it takes an hour.
The merge decision (the accountability gate). A passing verify gate produces a pull request, not a merge. A human owns the decision to land it. This isn’t ceremony — it’s where accountability lives. You can’t fire a bot; someone human is answerable for what reached production.

PullRequest = {
    "branch": branch,
    "plan": plan,                  # the approved approach + acceptance criteria
    "gate_results": results,       # every check, pass/fail, with logs
    "diff_stat": diff_summary,     # what actually changed
    "agent_trail": run_id,         # link to the full audit trail (see below)
}

Everything else — the writing, the testing, the iterating — happens without a human in the inner loop. That’s what lets the system move fast. The boundaries are where speed and safety get reconciled.

Multi-agent orchestration: stop letting the author grade itself

Earlier, we flagged the conflict of interest in one agent writing and testing its own code. The production answer is to split the job into specialized roles, each a focused agent with its own prompt, its own context, and — importantly — no incentive to cover for the others.

A battle-tested division of labor:

[Figure: flowchart of the multi-agent pipeline with the roles Planner, Author, Tester, Reviewer, and Security, covering implementation, testing, and security checks, and ending in a pull request.]

Planner owns the spec (Phase 1). It does not write code.
Author implements against the plan. It does not write its own acceptance tests.
Tester writes tests from the plan’s acceptance criteria, not from the author’s code — so the tests check intended behavior, not whatever the author happened to build. This single separation kills a huge fraction of “passes its own tests, fails in prod” slop.
Reviewer reads code + tests against the plan and can reject back to Planner with reasons. It’s looking for the things scanners miss: bad architecture, missing edge cases, misread requirements.
Security runs as its own role and as part of the automated gate. It looks specifically for injected vulnerabilities, secrets, and unsafe dependencies.

You don’t need heavyweight frameworks to start — each role is a function that calls a model with a role-specific system prompt and passes a structured artifact to the next. Orchestration can be a plain state machine. Reach for an agent framework only when you actually need durable state, parallelism, or cross-process coordination; premature orchestration infrastructure is its own kind of slop.
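
To make “a plain state machine” concrete, here is a minimal sketch. Each role is an ordinary function that receives the shared artifact and returns it with an accept/reject flag; in a real system the body of each role would call a model with its role-specific prompt. Artifact, Role, and run_pipeline are illustrative names, and restarting from the first role on any rejection is one simple policy among several.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Artifact:
    task: str
    plan: str = ""
    code: str = ""
    tests: str = ""
    notes: list[str] = field(default_factory=list)

# A role takes the artifact and returns (updated artifact, accepted?).
Role = Callable[[Artifact], tuple[Artifact, bool]]

def run_pipeline(task: str, roles: dict[str, Role], order: list[str],
                 max_rounds: int = 3) -> tuple[Artifact, bool]:
    """Run roles in order; any rejection restarts from the first role."""
    art = Artifact(task=task)
    for _ in range(max_rounds):
        for name in order:
            art, accepted = roles[name](art)
            if not accepted:
                art.notes.append(f"{name} rejected")   # context for the next round
                break
        else:
            return art, True      # every role accepted: ready for a pull request
    return art, False             # rounds exhausted: escalate to a human
```

Because the roles are plain callables, they can be unit-tested with stubs long before a model is wired in.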

The security math you can’t argue with

Here’s the calculation that makes everything above non-negotiable rather than nice-to-have. Anthropic’s 2026 agentic coding guidance puts it bluntly:

An agent producing 1,000 pull requests a week at a 1% vulnerability rate ships 10 new vulnerabilities every week.

Manual review cannot keep pace with that — which is the whole point. The same scaling that makes agentic engineering powerful for you makes it powerful for an attacker, and it makes your own agents a vulnerability factory unless something blocks bad output automatically.
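
The arithmetic behind the quote is simple enough to put in a function, which also makes it easy to see what a blocking gate buys you. The gate_catch_rate parameter and the 90% figure in the usage lines are hypothetical illustrations, not measured numbers.

```python
def weekly_vulns(prs_per_week: int, vuln_rate: float,
                 gate_catch_rate: float = 0.0) -> float:
    """Expected vulnerabilities per week that get past the automated gate."""
    return prs_per_week * vuln_rate * (1.0 - gate_catch_rate)

quoted = weekly_vulns(1000, 0.01)            # the scenario in the quote: about 10 per week
with_gate = weekly_vulns(1000, 0.01, 0.9)    # hypothetical gate blocking 90%: about 1 per week
```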

Three consequences:

Security lives in the harness, not in a later review. Every PEV cycle runs security scanning as a blocking check (gate). Bolting it on afterward means you’ve already shipped the slop.
Least privilege is structural. Each execution sandbox gets only the filesystem, network, and credentials its task needs. An agent’s expanded attack surface — it touches APIs, databases, external services — is exactly what a scoped sandbox contains.
New attack classes are real. “Living off the agent” — hijacking an enterprise AI’s own permissions to act maliciously — is an emerging 2026 tactic. Treat the agent’s credentials and tool access as a primary attack surface, not plumbing.

Auditability is the real constraint

There’s a counterintuitive lesson from teams running this at scale: the bottleneck stops being whether agents can do the work and becomes whether you can account for what they did. As agentic dev tooling has boomed through 2026, workflow auditability has become the binding constraint.

Every run should emit an immutable trail: the task, the approved plan, which agent did what, every tool call and its result, the full gate output, and who approved the merge. This isn’t bureaucracy — it’s what makes incidents debuggable, makes compliance possible in regulated environments, and makes the merge gate meaningful (a human approving a PR needs to be able to see why the agent did what it did).

			
from datetime import datetime, timezone

def emit_trail(run_id, event, payload):
    record = {"run_id": run_id, "ts": datetime.now(timezone.utc).isoformat(),
              "event": event, "payload": payload}
    append_only_log.write(record)     # placeholder sink: tamper-evident, queryable, retained

If you build one thing beyond the loop itself, build this. The teams that succeed with agentic engineering aren’t the ones with the cleverest agents; they’re the ones who can answer “what happened and why” for any change that reached production.

Anti-patterns to avoid

A few failure modes show up repeatedly:

Automating a broken process. Gartner projects that ~40% of agentic projects will be cancelled by 2027 — largely not because the tech fails, but because teams automate workflows that were already broken. PEV makes a bad process faster, not better. Fix the process first.
Human-in-the-inner-loop. Reviewing every diff doesn’t scale and trains reviewers to rubber-stamp. Move humans to the three boundaries.
Self-graded work. If the author writes the tests, you’re measuring the author’s confidence, not correctness. Separate the roles.
Optional security. At agent volume, “we’ll add scanning later” means shipping vulnerabilities now.
Premature orchestration. Don’t reach for a multi-agent framework on day one. Start with a single PEV loop and one human boundary; add roles when a specific failure mode demands one.

A production-readiness checklist

Before you point an agentic loop at a repo that matters:

[ ] Planning produces a reviewable, structured spec with explicit non_goals and acceptance_criteria.
[ ] The agent surfaces ambiguity as questions instead of guessing.
[ ] Every run executes in an isolated, disposable worktree.
[ ] The execution sandbox has least-privilege filesystem/network/credential access.
[ ] The verify gate is independent of the author and includes tests with a coverage floor, type checks, linting, security scanning, and dependency auditing.
[ ] Tests are written from acceptance criteria, not from the author’s code.
[ ] Humans sit at exactly three boundaries: plan approval, verify escalation, merge.
[ ] A passing gate produces a PR, never an auto-merge.
[ ] Every run emits an immutable, queryable audit trail.
[ ] You can answer “what happened and why” for any agent-made change.

Where this leaves you

Agentic engineering isn’t about replacing developers — it’s about multiplying what one developer can responsibly oversee. The teams pulling ahead aren’t the ones who let agents run wildest; they’re the ones who turned “prompt and hope” into Plan → Execute → Verify, kept humans at the boundaries where their judgment compounds, and made every run accountable.

That’s the whole discipline. The agents will keep getting better. The engineering — the plan, the isolation, the gate, the oversight, the trail — is the part that’s yours, and it’s the part that decides whether all that capability ships value or ships slop.

Eyes Everywhere: Secure Logging and Alerting for Modern Systems – Part III

Posted on 27th April 2026 by Rodrigo Silva

Logging and alerting become truly powerful only when they are embedded into a well-designed security architecture. Individual applications emitting logs or isolated detection rules provide only partial visibility. Modern organisations operate distributed systems composed of microservices, cloud infrastructure, container platforms, and external APIs. In such environments, security visibility requires a unified architecture capable of collecting, analysing, and responding to telemetry across the entire system.

Designing Systems That Detect and Respond to Threats

A secure observability architecture combines several layers. Applications generate structured logs. Infrastructure produces telemetry about hosts, containers, and network activity. These signals are collected and aggregated through centralized pipelines, analyzed by security analytics platforms, and ultimately transformed into alerts and automated responses.

Application Layer Logging

Application logs represent the most detailed source of telemetry because they capture business logic events. These events include authentication attempts, financial transactions, configuration changes, and access to sensitive resources.

Applications should emit structured logs that can be easily parsed by downstream analytics systems. A simple implementation using Python might produce JSON-formatted events.

import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("app")

def log_user_action(user_id, action, resource):
    # Build a flat, JSON-serialisable event so downstream parsers need no regexes.
    event = {
        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "user_action",
        "user_id": user_id,
        "action": action,
        "resource": resource,
        "service": "account-service"
    }

    logger.info(json.dumps(event))

Such structured events allow centralized systems to search and correlate activity across multiple services. A user accessing a resource through an API gateway, for example, may generate logs in multiple backend services. A consistent schema allows analysts to reconstruct the entire request path.

Application logs also provide the richest context for security detection because they capture events at the business logic level rather than merely technical operations.

Infrastructure Telemetry

While application logs capture internal logic, infrastructure telemetry reveals the behavior of the underlying environment. Systems must monitor operating systems, network activity, and runtime platforms to detect suspicious behavior that occurs outside the application layer.

Operating systems produce authentication logs, process execution records, and network activity events. On a Linux system, failed SSH login attempts may appear in the authentication log.

Failed password for invalid user root from 203.0.113.11 port 42122 ssh2

Infrastructure monitoring agents collect these events and forward them to centralized logging systems.

A simple agent configuration might stream logs to a central collector.

filebeat.inputs:
- type: log
  paths:
    - /var/log/auth.log

output.elasticsearch:
  hosts: ["https://log-server.example.com:9200"]

Network telemetry is also crucial. Firewall and network gateway logs reveal scanning attempts, suspicious connections, and unusual traffic flows.

An example network connection event might appear as structured telemetry.

{
  "event_type": "network_connection",
  "source_ip": "198.51.100.44",
  "destination_port": 22,
  "protocol": "TCP",
  "timestamp": "2026-06-01T13:42:18Z"
}

Combining application and infrastructure telemetry provides comprehensive security visibility.

Aggregation Layer

Large systems generate logs from thousands of sources. Without aggregation, these logs remain scattered across machines and services.

The aggregation layer collects logs from applications, infrastructure components, and network devices. Log collectors such as Fluent Bit, Logstash, or Vector ingest telemetry and forward it to centralized platforms.

A typical log collection configuration might look like the following Fluent Bit pipeline.

fluent-bit \
  -i tail \
  -p path=/var/log/app.log \
  -o http \
  -p host=logging.example.com \
  -p port=8080

Collectors normalize log formats and enrich events with metadata such as host identifiers, container IDs, or geographic location. This enrichment improves searchability and correlation across systems.

Aggregation pipelines also buffer logs during network disruptions, which reduces the risk of losing events while the destination is unreachable.
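
The enrichment step can be sketched in a few lines of Python. This is an illustration of the idea rather than how Fluent Bit, Logstash, or Vector implement it; the enrich function and the host and ingested_at field names are assumptions.

```python
import json
import socket
from datetime import datetime, timezone
from typing import Optional

def enrich(raw_line: str, service: str, host: Optional[str] = None) -> dict:
    """Parse a JSON log line and attach collector-side metadata."""
    try:
        event = json.loads(raw_line)
        if not isinstance(event, dict):
            event = {"message": raw_line}
    except json.JSONDecodeError:
        # Keep unparseable lines as plain messages rather than dropping them.
        event = {"message": raw_line}
    event.setdefault("service", service)             # do not overwrite the emitter's value
    event["host"] = host or socket.gethostname()
    event["ingested_at"] = datetime.now(timezone.utc).isoformat()
    return event
```

Stamping the ingestion time separately from the event’s own timestamp makes clock skew and delivery delays visible later.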

Analysis Layer

Once collected, logs must be analyzed to detect suspicious behavior. This is the role of the analysis layer.

Security analytics platforms ingest aggregated telemetry and apply detection rules, statistical models, and anomaly detection algorithms.

A SIEM platform might evaluate login activity using a query such as the following.

SELECT username, COUNT(*) AS failures
FROM logs
WHERE event_type = 'authentication_failure'
AND timestamp > NOW() - INTERVAL '10 minutes'
GROUP BY username
HAVING COUNT(*) > 20

If this query produces results, the platform triggers an alert.
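
The same rule can be expressed over in-memory events, which is handy for unit-testing detection logic before it goes into a SIEM. A minimal sketch: the field names mirror the query above, while the function name and the shape of the event dicts are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def failed_login_alerts(events, now, window=timedelta(minutes=10), threshold=20):
    """Return {username: failures} for users above the threshold in the window."""
    cutoff = now - window
    counts = Counter(
        e["username"] for e in events
        if e["event_type"] == "authentication_failure" and e["timestamp"] > cutoff
    )
    # Strictly greater than the threshold, matching HAVING COUNT(*) > 20
    return {user: n for user, n in counts.items() if n > threshold}
```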

More advanced analytics engines also perform behavioral analysis. These systems learn typical usage patterns and detect deviations.

An anomaly detection algorithm implemented in Python might evaluate whether a user’s activity deviates significantly from historical behavior.

def detect_anomaly(current_activity, baseline):
    """Flag activity that exceeds five times the user's baseline."""
    threshold = baseline * 5
    return current_activity > threshold

These analytic processes convert massive volumes of telemetry into actionable intelligence.

Response Layer

Detection alone is insufficient. Systems must also respond to threats.

The response layer integrates alerting systems with operational tools such as incident management platforms, messaging systems, and automated security controls.

When an alert is triggered, the system may send notifications to security teams.

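from datetime import datetime, timezone

def send_alert(message):
    alert = {
        "alert_type": "security_event",
        "message": message,
        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12.
        "timestamp": datetime.now(timezone.utc).isoformat()
    }
    # notify_security_team stands in for the team's paging or chat integration.
    notify_security_team(alert)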

Automated responses may also be triggered for critical threats. For example, a brute-force attack might result in immediate blocking of the source IP address.

def block_ip(ip_address):
    firewall.block(ip_address)

These automated defenses reduce response time and limit the impact of attacks.

Securing the Logging Pipeline

Protecting Log Transport

Log data often travels across networks before reaching centralized systems. If this communication is not secured, attackers could intercept or manipulate logs.

Secure logging pipelines encrypt log transport using TLS.

A log collector configuration may enforce encrypted transport.

output:
  elasticsearch:
    hosts: ["https://logs.example.com:9200"]
    ssl.certificate_authorities: ["/etc/certs/ca.pem"]

TLS protects the confidentiality and integrity of telemetry in transit, so logs cannot be read or silently altered on the wire.

Mutual authentication can also verify that only trusted systems send logs to the platform.
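
One way to picture mutual authentication is a receiving endpoint that refuses any sender without a valid client certificate. The sketch below uses Python's standard ssl module. The function name and the file paths in the comments are placeholders, and a real collector would normally set this up through its own configuration rather than custom code.

```python
import ssl


def build_collector_tls_context(require_client_cert=True):
    """Server-side TLS context for a log ingestion endpoint."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if require_client_cert:
        # Reject any sender that does not present a certificate.
        context.verify_mode = ssl.CERT_REQUIRED
    # In a real deployment, also load the server's identity and the CA that
    # signs sender certificates (paths are placeholders):
    #   context.load_cert_chain("server.pem", "server.key")
    #   context.load_verify_locations("ca.pem")
    return context


context = build_collector_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
```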

Preventing Log Tampering

Attackers frequently attempt to erase or modify logs to hide evidence of their activity.

To protect against this, logging systems implement append-only storage and integrity verification.

One approach uses cryptographic hashing to chain log entries together.

import hashlib

def generate_log_hash(entry, previous_hash):
    combined = entry + previous_hash
    return hashlib.sha256(combined.encode()).hexdigest()

Each log entry includes the hash of the previous entry. If an attacker modifies an entry, the hash chain becomes invalid, revealing the tampering.
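
To make the verification side concrete, here is a minimal sketch that builds a chain with the same construction as above and then walks it to find the first entry that no longer matches. The helper names and the all-zero starting hash are assumptions chosen for the example.

```python
import hashlib


def entry_hash(entry, previous_hash):
    # Hash the entry together with the previous hash, as in the example above.
    return hashlib.sha256((entry + previous_hash).encode()).hexdigest()


def build_chain(entries, genesis="0" * 64):
    """Return (entry, hash) pairs, each hash covering the previous one."""
    chain, previous = [], genesis
    for entry in entries:
        previous = entry_hash(entry, previous)
        chain.append((entry, previous))
    return chain


def verify_chain(chain, genesis="0" * 64):
    """Recompute every hash; return the index of the first bad entry or None."""
    previous = genesis
    for index, (entry, stored_hash) in enumerate(chain):
        if entry_hash(entry, previous) != stored_hash:
            return index
        previous = stored_hash
    return None


chain = build_chain(["login alice", "role change bob", "logout alice"])
print(verify_chain(chain))  # None: the chain is intact
chain[1] = ("role change eve", chain[1][1])  # tamper with the second entry
print(verify_chain(chain))  # 1: first entry that no longer matches
```

A chain like this only reveals tampering if the latest hash is stored somewhere the attacker cannot rewrite, which is why it is usually paired with the write-once storage described next.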

Write-once storage systems further protect logs by preventing modification after ingestion.

Isolation of Logging Infrastructure

Security telemetry must be isolated from application environments. If attackers gain access to the same infrastructure that stores logs, they may attempt to manipulate or delete evidence.

Organizations often deploy logging infrastructure in dedicated environments accessible only to security teams.

A simplified architecture may route logs from production systems to a separate security network.

app_server -> log_collector -> security_logging_cluster

Strict access control policies ensure that application administrators cannot modify stored logs.

Isolation ensures the integrity of forensic data during incident investigations.

Cloud-Native Logging and Alerting

Observability in Microservices Architectures

Microservices architectures introduce new challenges for observability. A single user request may traverse dozens of services before completing.

Distributed tracing provides visibility into these interactions by assigning correlation identifiers to requests.

A service might attach a request ID to every log entry.

const requestId = generateRequestId();

logger.info({
    request_id: requestId,
    event: "api_request_received"
});

Downstream services propagate this identifier so analysts can trace the entire execution path.
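
The same idea can be sketched in Python using the standard contextvars and logging modules. The logger name, the start_request helper, and the log format are assumptions for illustration. In a real service the incoming ID would be read from a request header agreed on between services.

```python
import contextvars
import logging
import uuid

# Holds the request ID for the current request context.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def start_request(incoming_id=None):
    """Reuse the caller's ID when one was propagated, otherwise create one."""
    request_id = incoming_id or uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id


logger = logging.getLogger("payments")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

start_request("req-123")  # e.g. taken from an incoming request header
logger.info("api_request_received")
```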

Kubernetes and Container Logs

Container orchestration platforms generate extensive telemetry about container lifecycles, pod scheduling, and cluster activity.

Kubernetes exposes logs through its API.

kubectl logs deployment/payment-service

Security-relevant cluster events may include pod creation, container crashes, and unexpected resource modifications.

A cluster event might look like the following.

{
  "event_type": "pod_created",
  "namespace": "production",
  "pod_name": "api-server-6f45",
  "timestamp": "2026-06-01T14:22:33Z"
}

Monitoring these events allows organizations to detect suspicious deployments or unauthorized configuration changes.

Serverless Logging Challenges

Serverless environments introduce additional complexity because execution environments are ephemeral. Functions may run for only a few milliseconds before terminating.

As a result, logs must be exported immediately to centralized systems.

A serverless function might log activity using a cloud-native logging service.

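import json
import logging
from datetime import datetime, timezone

logging.getLogger().setLevel(logging.INFO)

def handler(event, context):
    # Emit one JSON line so the logging service can parse the fields.
    # The time comes from the clock rather than from the context object.
    logging.info(json.dumps({
        "event_type": "function_execution",
        "function": "payment_handler",
        "timestamp": datetime.now(timezone.utc).isoformat()
    }))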

Centralized logging ensures that transient environments do not lose telemetry.

Automation and AI in Security Monitoring

Automated Detection Pipelines

Security monitoring increasingly relies on automated detection pipelines capable of processing large volumes of telemetry in real time.

Machine learning models can identify patterns that traditional rule-based systems might miss.

A simple anomaly detection model might analyze login frequency.

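from sklearn.ensemble import IsolationForest

# training_data and new_login_data are 2D arrays of login features (one row per sample).
model = IsolationForest()
model.fit(training_data)
predictions = model.predict(new_login_data)  # -1 marks an outlier, 1 an inlier
if (predictions == -1).any():
    trigger_alert("Login anomaly detected")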

Such models can detect subtle deviations in user behavior.

AI-Assisted Threat Analysis

Artificial intelligence systems can assist analysts by prioritizing alerts and correlating events across large datasets.

Such a system might weigh several signals together. In its simplest form, the final decision reduces to a rule like the following.

if unusual_login and new_device and large_data_access:
    alert("Possible account compromise")

By correlating signals across logs, AI systems can identify complex attack patterns.

Automated Response

Automation can also mitigate attacks automatically.

For example, if a system detects repeated authentication failures from a specific IP address, it may block that address.

if failed_attempts > 50:
    firewall.block(source_ip)

If suspicious activity occurs on a user account, automated controls may disable the account temporarily.

def disable_account(user_id):
    account_service.disable(user_id)

Automation reduces the time between detection and response, limiting the damage attackers can cause.

Logging and Alerting Best Practices Checklist

Principles for Secure Observability

Effective observability systems focus on collecting meaningful telemetry, protecting the integrity of logs, and designing alerts that enable rapid response.

Applications should log security-relevant events using structured formats. Infrastructure telemetry must complement application logs to provide full visibility into system behavior. Detection rules must evolve continuously as new threats emerge.

Security monitoring is not a static system but an evolving process.

Continuous Improvement

Attackers constantly change tactics, techniques, and procedures. Security monitoring systems must therefore evolve continuously.

Detection rules should be refined based on real-world incidents and emerging threat intelligence.

Security teams often review historical incidents and update detection logic accordingly.

def update_detection_rules(new_patterns):
    rules_engine.add(new_patterns)

This iterative improvement ensures that monitoring systems remain effective against modern threats.

Observability as a Security Strategy

Modern software systems produce enormous volumes of telemetry. Yet data alone does not provide protection. Security emerges only when organizations design observability systems that transform telemetry into actionable intelligence.

Secure logging provides the foundation. Alerting transforms logs into early warning signals. Automated responses limit the impact of attacks.

When implemented correctly, logging and alerting systems become the digital equivalent of surveillance infrastructure, continuously monitoring the environment for suspicious activity.

In an era where attackers move rapidly and stealthily, secure observability gives defenders the visibility required to detect threats early, investigate incidents effectively, and protect critical systems before damage occurs.

Eyes Everywhere: Secure Logging and Alerting for Modern Systems – Part II

Posted on 22nd April 2026 by Rodrigo Silva

Logging is the foundation of security visibility, but logs alone do not defend systems. A modern production environment may generate millions or even billions of log entries per day. Hidden within this massive stream of telemetry are the signals that reveal active attacks, compromised accounts, and data exfiltration attempts.

Without intelligent processing, these signals remain buried inside an ocean of noise.

Alerting transforms raw logs into actionable intelligence. It is the mechanism through which suspicious events are detected, prioritized, and escalated to the people or systems capable of responding.

In a mature security architecture, logging produces the raw telemetry, while alerting converts that telemetry into security awareness.

From Logs to Security Intelligence

Modern software systems produce enormous quantities of telemetry. A single API gateway may process tens of thousands of requests per second, while a Kubernetes cluster might generate hundreds of infrastructure events every minute. Each of these activities produces logs.

A simplified example of application logging may look like the following Python implementation.

import logging
import json
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)

def log_api_request(user_id, endpoint, ip):
    event = {
        "event_type": "api_request",
        "user_id": user_id,
        "endpoint": endpoint,
        "source_ip": ip,
        "timestamp": datetime.now(timezone.utc).isoformat()
    }

    logging.info(json.dumps(event))

This code produces useful telemetry, but by itself it does not provide any defense capability. The system will continue logging events even if an attacker performs malicious actions.

To detect attacks, systems must interpret patterns in logs.

For example, a single failed login attempt is normal. Hundreds of failed login attempts from the same IP address within seconds indicate a brute-force attack.

A detection rule might analyze logs to identify such behavior.

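def detect_bruteforce_attempt(log_events, threshold=10):
    failed_attempts = {}
    flagged = set()

    for event in log_events:
        if event["event_type"] == "authentication_failure":
            ip = event["source_ip"]
            failed_attempts[ip] = failed_attempts.get(ip, 0) + 1

            # Alert once per IP instead of on every failure past the threshold.
            if failed_attempts[ip] > threshold and ip not in flagged:
                flagged.add(ip)
                print(f"ALERT: Possible brute force attack from {ip}")

    return flagged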

In practice, such analysis occurs inside centralized analytics platforms rather than application code. The example illustrates how raw telemetry becomes security intelligence only when interpreted.

Detection Engineering

Detection engineering is the discipline of designing rules and analytics that transform logs into security detections.

A detection rule describes a pattern that indicates suspicious activity.

Consider a typical credential stuffing scenario where attackers attempt to log into many accounts using stolen credentials.

A detection rule may look like the following example expressed in pseudo-SIEM query language.

SELECT source_ip, COUNT(*) AS failures
FROM authentication_logs
WHERE event_type = 'authentication_failure'
AND timestamp > NOW() - INTERVAL '5 minutes'
GROUP BY source_ip
HAVING COUNT(*) > 50

If the query returns results, it means a single IP address generated more than fifty authentication failures within five minutes. This pattern strongly suggests automated attack activity.
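
For readers who think more easily in code than in queries, the windowed count in the query above can be approximated with a sliding window per source IP. This is a sketch under stated assumptions: the class name is invented, timestamps are plain seconds, and a production SIEM would evaluate the rule inside its own engine.

```python
from collections import defaultdict, deque


class FailureWindow:
    """Track failures per source IP inside a sliding time window."""

    def __init__(self, window_seconds=300, threshold=50):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._failures = defaultdict(deque)  # ip -> timestamps in window

    def record(self, source_ip, timestamp):
        """Record one failure; return True when the IP exceeds the threshold."""
        window = self._failures[source_ip]
        window.append(timestamp)
        # Drop failures that fell out of the window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) > self.threshold


detector = FailureWindow(window_seconds=300, threshold=50)
# 60 failures, one per second, from a documentation-range address.
hits = [detector.record("203.0.113.10", t) for t in range(60)]
print(hits.index(True))  # 50: the 51st failure is the first to exceed 50
```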

Detection engineering often involves continuous refinement. As attackers change tactics, detection rules must evolve to identify new patterns.

Another example might detect privilege escalation events.

SELECT user_id, COUNT(*) AS role_changes
FROM audit_logs
WHERE event_type = 'role_assignment'
AND timestamp > NOW() - INTERVAL '10 minutes'
GROUP BY user_id
HAVING COUNT(*) > 5

Multiple role changes within a short time window may indicate suspicious administrative activity.

These rules convert raw telemetry into detectable threats.

Security Analytics Platforms

Manual log analysis quickly becomes impossible in modern environments. Organizations therefore rely on specialized security analytics platforms to process telemetry.

Security Information and Event Management systems, commonly known as SIEM platforms, collect logs from multiple systems and apply detection logic.

Logs might be shipped to such a platform using log collectors.

fluent-bit -i tail -p path=/var/log/app.log -o http://siem.example.com/ingest

Once ingested, the SIEM platform normalizes logs and runs detection rules.

An example event inside such a system might look like the following JSON structure.

{
  "event_type": "authentication_failure",
  "username": "alice",
  "source_ip": "198.51.100.44",
  "service": "login-api",
  "timestamp": "2026-05-12T14:18:22Z"
}

Behavioral analytics tools can also detect anomalies. Instead of relying only on fixed rules, they analyze historical patterns and identify deviations.

For example, a system might detect that a user account suddenly accesses resources from a different country.

if user_country != last_known_country:
    generate_alert("Geographic anomaly detected")

These analytics systems convert telemetry into meaningful signals that security teams can investigate.

Designing Effective Security Alerts

Not all alerts are useful. Poorly designed alerts generate noise rather than insight.

A meaningful alert must provide enough information for responders to understand the event.

A minimal alert message might include context such as the affected user, IP address, and event type.

{
  "alert_type": "brute_force_attack",
  "source_ip": "203.0.113.10",
  "failed_attempts": 120,
  "time_window": "5 minutes",
  "timestamp": "2026-05-12T15:01:22Z"
}

Such an alert provides investigators with the context required to understand the threat quickly.

Alerts must also avoid excessive false positives. If every minor anomaly generates a notification, responders quickly learn to ignore them.

Alert Severity Levels

Security alerts typically fall into several severity levels depending on their impact and urgency.

An informational event may indicate activity worth recording but not immediate action.

A suspicious event suggests unusual behavior requiring investigation.

A critical alert indicates a likely security incident that demands immediate response.

An example alert structure might encode severity explicitly. Many systems use a finer-grained scale, such as low, medium, high, and critical.

{
  "severity": "high",
  "alert": "privilege_escalation_detected",
  "user": "admin_17",
  "action": "granted_admin_role",
  "target_user": "user_448",
  "timestamp": "2026-05-12T16:20:11Z"
}

Severity classification allows security systems to prioritize responses.
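
A small sketch of how prioritization might work, using the three levels described above. The Severity enum, its numeric values, and the prioritize helper are assumptions for this example. A system with a different scale would define its own ordering.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFORMATIONAL = 1
    SUSPICIOUS = 2
    CRITICAL = 3


def prioritize(alerts):
    """Order alerts from most to least severe, keeping arrival order on ties."""
    return sorted(
        alerts,
        key=lambda alert: Severity[alert["severity"].upper()],
        reverse=True,
    )


alerts = [
    {"alert": "login_from_new_device", "severity": "suspicious"},
    {"alert": "config_viewed", "severity": "informational"},
    {"alert": "privilege_escalation_detected", "severity": "critical"},
]
for alert in prioritize(alerts):
    print(alert["severity"], alert["alert"])
```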

Mapping Alerts to Attack Scenarios

Effective alerting requires understanding how attacks unfold.

Credential stuffing attacks generate many login failures.

if failed_attempts > 100:
    alert("Credential stuffing attack suspected")

Privilege escalation attempts involve changes to user roles.

if event["event_type"] == "role_assignment" and event["role"] == "admin":
    alert("Administrative privilege granted")

Data exfiltration often involves unusually large data transfers.

if download_size > 100000000:
    alert("Large data export detected")

By mapping alerts to real-world attack behaviors, systems detect threats earlier in the attack lifecycle.

Reducing Alert Fatigue

Alert fatigue is one of the most significant challenges in security monitoring.

When systems generate excessive alerts, security teams become overwhelmed. Investigators cannot review every alert, and critical events may be missed.

Consider an example where every failed login triggers an alert.

if event["event_type"] == "authentication_failure":
    alert("Login failure detected")

Such a rule would generate thousands of alerts per day in a busy system.

Effective alerting must filter noise and focus on patterns.
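
One common way to filter noise is to suppress repeats of the same alert for a cooldown period. The sketch below is a minimal version of that idea. The class name, the ten-minute default, and the choice of (alert type, source IP) as the deduplication key are assumptions.

```python
class AlertSuppressor:
    """Allow one alert per key within a cooldown period."""

    def __init__(self, cooldown_seconds=600):
        self.cooldown_seconds = cooldown_seconds
        self._last_sent = {}  # key -> time the last alert was emitted

    def should_send(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still inside the cooldown: suppress the duplicate
        self._last_sent[key] = now
        return True


suppressor = AlertSuppressor(cooldown_seconds=600)
key = ("brute_force_attack", "203.0.113.10")
print(suppressor.should_send(key, now=0))    # True
print(suppressor.should_send(key, now=120))  # False
print(suppressor.should_send(key, now=900))  # True
```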

Tuning Detection Rules

Detection rules often rely on thresholds.

For example, a system might trigger an alert only after repeated failures.

if failed_login_count > 10:
    alert("Multiple failed login attempts detected")

These thresholds must be tuned carefully based on real system behavior.

Too low and alerts become noisy. Too high and attacks may go unnoticed.

Behavioral and Anomaly Detection

Static thresholds are not always sufficient. Behavioral detection analyzes historical data to understand what normal activity looks like.

If a user typically downloads a few megabytes of data per day but suddenly exports gigabytes of data, the system may flag this behavior.

if current_download > (average_download * 10):
    alert("Unusual data access pattern detected")

Behavioral analysis allows systems to detect sophisticated attacks that evade simple rules.
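
A fixed multiplier is only one way to define "unusual". A slightly more statistical sketch compares the current value against the mean and standard deviation of the user's history. The function name, the sample data, and the default of three standard deviations are assumptions for illustration, not a recommended setting.

```python
from statistics import mean, stdev


def is_unusual(history, current, k=3.0):
    """Flag values more than k standard deviations above the historical mean."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    return current > baseline + k * spread


daily_megabytes = [4.0, 6.0, 5.0, 7.0, 5.5, 4.5, 6.5]
print(is_unusual(daily_megabytes, 6.8))     # False: within normal variation
print(is_unusual(daily_megabytes, 2048.0))  # True: far above the baseline
```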

Correlation of Multiple Signals

Sophisticated attacks often produce multiple weak signals rather than one obvious indicator.

Correlation combines these signals to detect complex threats.

For example, consider the following sequence:

Multiple failed login attempts
Successful login from new IP address
Administrative privilege change

Individually, each event might seem harmless. Together, they suggest account compromise.

A correlation rule might analyze events across multiple logs.

if failed_logins > 20 and new_ip_login and privilege_change:
    alert("Potential account takeover detected")

Correlation significantly improves detection accuracy.
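
The rule above assumes the three signals have already been computed. A sketch of how they might be derived from raw events follows. The event keys (user, event_type, timestamp in seconds), the event type names, and the fifteen-minute window are all hypothetical.

```python
def correlate_takeover(events, window_seconds=900, min_failures=20):
    """Return users whose privilege change was preceded, within the window,
    by many authentication failures and a login from a new IP address."""
    by_user = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        by_user.setdefault(event["user"], []).append(event)

    flagged = set()
    for user, user_events in by_user.items():
        for change in user_events:
            if change["event_type"] != "privilege_change":
                continue
            start = change["timestamp"] - window_seconds
            recent = [e for e in user_events
                      if start <= e["timestamp"] <= change["timestamp"]]
            failures = sum(e["event_type"] == "authentication_failure" for e in recent)
            new_ip = any(e["event_type"] == "login_new_ip" for e in recent)
            if failures > min_failures and new_ip:
                flagged.add(user)
    return flagged


events = [{"user": "alice", "event_type": "authentication_failure", "timestamp": t}
          for t in range(25)]
events += [
    {"user": "alice", "event_type": "login_new_ip", "timestamp": 30},
    {"user": "alice", "event_type": "privilege_change", "timestamp": 40},
    {"user": "bob", "event_type": "privilege_change", "timestamp": 40},
]
print(correlate_takeover(events))  # {'alice'}
```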

Real-Time vs Delayed Alerting

Certain threats require instant response.

A brute-force attack against authentication endpoints may require immediate blocking of the source IP.

if failed_login_attempts > 100:
    block_ip(source_ip)

Unauthorized administrative actions may also require real-time intervention.

if event["event_type"] == "admin_role_granted":
    alert("Administrative privilege assignment detected, verify authorization")

Immediate alerts are designed to stop attacks before damage occurs.

Investigative Alerts

Some alerts are better suited for investigation rather than immediate response.

For example, unusual access patterns might require analysis before action is taken.

if login_country != last_known_country:
    alert("User login from new geographic region")

Such alerts inform analysts who then evaluate the context.

Incident Response Integration

Alerting systems often integrate with incident response platforms.

When an alert triggers, it may automatically create a ticket.

def create_incident(alert):
    ticket = {
        "title": alert["alert_type"],
        "severity": alert["severity"],
        "timestamp": alert["timestamp"]
    }
    send_to_incident_system(ticket)

Security orchestration tools may also automate responses.

if alert["severity"] == "critical":
    disable_user_account(alert["user"])

These integrations accelerate response times.

Building an Effective Alerting Workflow

Alert routing ensures that notifications reach the appropriate teams.

Operational alerts may go to infrastructure teams, while security alerts go to security operations.

A routing rule might look like the following configuration.

alerts:
  - type: security
    route: security_team
  - type: infrastructure
    route: ops_team

Routing prevents unnecessary interruptions and ensures that specialists receive relevant alerts.
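
In code, the routing configuration above amounts to a lookup from alert type to team. The sketch below mirrors the two routes from the configuration. The fallback route for unknown types is an assumption added for the example.

```python
ROUTES = {"security": "security_team", "infrastructure": "ops_team"}
DEFAULT_ROUTE = "ops_team"  # assumption: unknown alert types fall back to operations


def route_alert(alert):
    """Return the team that should receive this alert."""
    return ROUTES.get(alert.get("type"), DEFAULT_ROUTE)


print(route_alert({"type": "security", "alert": "brute_force_attack"}))  # security_team
print(route_alert({"type": "infrastructure", "alert": "disk_full"}))     # ops_team
```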

Escalation Paths

Some alerts require escalation if not addressed quickly.

An alert may first notify an on-call engineer. If unresolved after a defined period, it escalates to a broader response team.

if alert_not_acknowledged(minutes=10):
    escalate_to_security_manager(alert)

Escalation policies ensure that serious incidents receive attention.
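
An escalation policy with more than one step can be expressed as an ordered list of delays and recipients. The tier names and delays below are hypothetical. The sketch only shows how the current recipient might be chosen from the time an alert has gone unacknowledged.

```python
# (minutes without acknowledgement, who is notified) in ascending order.
ESCALATION_POLICY = [
    (0, "on_call_engineer"),
    (10, "security_manager"),
    (30, "incident_response_team"),
]


def current_escalation_target(minutes_unacknowledged, policy=ESCALATION_POLICY):
    """Return the most senior tier whose delay has already elapsed."""
    target = policy[0][1]
    for delay_minutes, tier in policy:
        if minutes_unacknowledged >= delay_minutes:
            target = tier
    return target


print(current_escalation_target(5))   # on_call_engineer
print(current_escalation_target(10))  # security_manager
print(current_escalation_target(45))  # incident_response_team
```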

Incident Triage and Investigation

Once an alert is received, investigators must determine whether the alert represents a genuine threat.

Triage typically involves examining related logs.

def investigate_alert(alert_id):
    related_events = query_logs(alert_id)
    return analyze(related_events)

Investigators look for supporting evidence such as additional suspicious activity or known attack indicators.

If the alert is confirmed as a security incident, incident response procedures begin.

At this stage, logs become crucial again, allowing investigators to reconstruct the timeline of the attack.
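
Timeline reconstruction is, at its core, a chronological sort of the related events. A minimal sketch, assuming each event carries an ISO 8601 timestamp field like the examples in this article (the function name and sample events are illustrative):

```python
from datetime import datetime


def build_timeline(events):
    """Sort events chronologically by their ISO 8601 timestamp."""
    def parse(timestamp):
        # Older Python versions reject a trailing "Z" in fromisoformat.
        return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

    return sorted(events, key=lambda event: parse(event["timestamp"]))


events = [
    {"event_type": "role_assignment", "timestamp": "2026-05-12T16:20:11Z"},
    {"event_type": "authentication_failure", "timestamp": "2026-05-12T14:18:22Z"},
    {"event_type": "login_success", "timestamp": "2026-05-12T15:01:22Z"},
]
for event in build_timeline(events):
    print(event["timestamp"], event["event_type"])
```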

Alerting transforms passive logging systems into active defense mechanisms. When detection rules are carefully engineered, alerts become the early warning system that reveals threats before attackers achieve their objectives.