Security Ticketing and Incident Response

The worst time to figure out your incident response process is during an incident. I learned this the hard way when a credential leak hit production at 2 AM and we spent the first 45 minutes arguing about who should do what. Those 45 minutes could have been containment time.

This article covers building an incident response process that works when everything is on fire.

Why Incident Response Matters

Security incidents are inevitable. The question isn’t if, but when. What separates good teams from bad ones is how fast they detect, contain, and recover.

Key metrics:

MTTD (Mean Time to Detect) — industry average: 197 days
MTTC (Mean Time to Contain) — industry average: 69 days
MTTR (Mean Time to Remediate) — your target: hours, not days
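
To track these numbers for your own team, something like the following rough sketch could work. The `Incident` record shape, the sample data, and the choice of where each clock starts are all assumptions for illustration; teams define these intervals differently, so adapt them to your own definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape; adapt to what your ticketing system stores.
    occurred_at: datetime    # when the compromise began (often estimated later)
    detected_at: datetime    # first alert or report
    contained_at: datetime   # attacker access cut off
    remediated_at: datetime  # root cause fixed, systems restored


def _mean_delta(deltas):
    """Average an iterable of timedeltas."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def response_metrics(incidents):
    """Return MTTD, MTTC, and MTTR as timedeltas.

    Assumed definitions: detect = occurred -> detected,
    contain = detected -> contained, remediate = detected -> remediated.
    """
    return {
        "MTTD": _mean_delta(i.detected_at - i.occurred_at for i in incidents),
        "MTTC": _mean_delta(i.contained_at - i.detected_at for i in incidents),
        "MTTR": _mean_delta(i.remediated_at - i.detected_at for i in incidents),
    }


# Made-up sample incidents, purely for demonstration.
incidents = [
    Incident(datetime(2026, 1, 10, 0), datetime(2026, 1, 10, 2),
             datetime(2026, 1, 10, 3), datetime(2026, 1, 10, 8)),
    Incident(datetime(2026, 2, 1, 0), datetime(2026, 2, 1, 4),
             datetime(2026, 2, 1, 7), datetime(2026, 2, 1, 14)),
]

metrics = response_metrics(incidents)
for name, value in metrics.items():
    print(name, value)
# MTTD 3:00:00
# MTTC 2:00:00
# MTTR 8:00:00
```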

Incident Response Lifecycle

Incident Classification (P1-P4)

Every incident needs a severity level that drives the response urgency.

Incident Severity Matrix

Severity	Definition	Response Time	Examples
P1 — Critical	Active breach, data exfiltration, system compromise	15 min	Credential leak in production, ransomware, active attacker
P2 — High	Exploitable vulnerability, unauthorized access attempt	1 hour	Open security group on production DB, suspicious IAM activity
P3 — Medium	Potential vulnerability, policy violation	4 hours	Unpatched critical CVE, MFA not enabled on admin account
P4 — Low	Minor policy deviation, informational	24 hours	Expired SSL cert (non-prod), minor config drift

# incident_classification.yml
severity_matrix:
  P1_critical:
    impact: "Data breach, system compromise, active attacker"
    response_time: "15 minutes"
    war_room: true
    executive_notification: true
    responders: ["security-lead", "on-call-engineer", "engineering-manager"]

  P2_high:
    impact: "Exploitable vulnerability, unauthorized access"
    response_time: "1 hour"
    war_room: false
    executive_notification: false
    responders: ["security-team", "on-call-engineer"]

  P3_medium:
    impact: "Potential vulnerability, policy violation"
    response_time: "4 hours"
    war_room: false
    executive_notification: false
    responders: ["security-team"]

  P4_low:
    impact: "Minor deviation, informational"
    response_time: "24 hours"
    war_room: false
    executive_notification: false
    responders: ["security-team"]
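
One way to put the matrix to work is to turn a severity and a detection time into a concrete deadline and responder list. This is a minimal sketch: the `SEVERITY_MATRIX` dict is an illustrative copy of the YAML above (in practice you would probably load the file), and `response_plan` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Illustrative copy of incident_classification.yml, keyed by short severity.
SEVERITY_MATRIX = {
    "P1": {"response_time": timedelta(minutes=15), "war_room": True,
           "responders": ["security-lead", "on-call-engineer", "engineering-manager"]},
    "P2": {"response_time": timedelta(hours=1), "war_room": False,
           "responders": ["security-team", "on-call-engineer"]},
    "P3": {"response_time": timedelta(hours=4), "war_room": False,
           "responders": ["security-team"]},
    "P4": {"response_time": timedelta(hours=24), "war_room": False,
           "responders": ["security-team"]},
}


def response_plan(severity, detected_at):
    """Return who to notify and the time by which the response must start."""
    try:
        entry = SEVERITY_MATRIX[severity]
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None
    return {
        "respond_by": detected_at + entry["response_time"],
        "responders": entry["responders"],
        "open_war_room": entry["war_room"],
    }


detected = datetime(2026, 4, 4, 2, 15, tzinfo=timezone.utc)
plan = response_plan("P1", detected)
print(plan["respond_by"].isoformat())  # 2026-04-04T02:30:00+00:00
print(plan["responders"])
```

Rejecting unknown severities outright, rather than silently picking a default, is a deliberate choice here: a typo in a severity label should not quietly downgrade a P1.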

Building Runbooks

A runbook is a step-by-step guide for responding to a specific type of incident. It removes decision-making from crisis moments.

# runbooks/credential_leak.yml
name: "Credential Leak Response"
severity: P1
trigger: "API key, access key, or password found in public repo/logs"

steps:
  - name: "Immediate (0-5 min)"
    actions:
      - "Revoke the leaked credential immediately"
      - "Check CloudTrail for usage of the credential"
      - "Open war room Slack channel: #incident-YYYY-MM-DD"
      - "Page security lead and on-call engineer"

  - name: "Contain (5-30 min)"
    actions:
      - "Identify all services using the credential"
      - "Rotate the credential on all affected services"
      - "Check for lateral movement (unusual API calls, new resources)"
      - "Block source IP if identified"

  - name: "Investigate (30-120 min)"
    actions:
      - "Run Athena query: all API calls with leaked credential"
      - "Check for data access (S3 GetObject, DynamoDB scans)"
      - "Check for persistence (new IAM users, access keys, roles)"
      - "Document timeline in incident ticket"

  - name: "Recover"
    actions:
      - "Verify all credential rotations are complete"
      - "Remove any resources created by attacker"
      - "Enable additional monitoring for 72 hours"
      - "Schedule post-incident review within 48 hours"

  - name: "Prevention"
    actions:
      - "Add Gitleaks pre-commit hook to affected repo"
      - "Enable GitHub secret scanning"
      - "Review and tighten IAM permissions"
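
The "Run Athena query" step is worth preparing ahead of time so nobody writes SQL from memory at 2 AM. Below is a sketch of a query builder, assuming your CloudTrail logs are exposed in Athena under a table named `cloudtrail_logs` with the standard CloudTrail columns; the table name and the `leaked_key_query` helper are assumptions, so check them against your own setup.

```python
import re
from datetime import datetime

# Access key IDs are typically 20 characters: AKIA (long-term) or ASIA
# (temporary) followed by 16 uppercase alphanumerics.
ACCESS_KEY_ID = re.compile(r"(AKIA|ASIA)[A-Z0-9]{16}")
TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def leaked_key_query(access_key_id, since_utc, table="cloudtrail_logs"):
    """Build an Athena SQL query listing API calls made with one access key.

    Strict validation of both inputs keeps untrusted text out of the SQL.
    """
    if not ACCESS_KEY_ID.fullmatch(access_key_id):
        raise ValueError("not a well-formed AWS access key ID")
    if not TABLE_NAME.fullmatch(table):
        raise ValueError("unexpected table name")
    # CloudTrail stores eventtime as an ISO 8601 string, so a string
    # comparison is enough for the lower bound.
    since = since_utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "SELECT eventtime, eventsource, eventname, awsregion, "
        "sourceipaddress, errorcode\n"
        f"FROM {table}\n"
        f"WHERE useridentity.accesskeyid = '{access_key_id}'\n"
        f"  AND eventtime >= '{since}'\n"
        "ORDER BY eventtime"
    )


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
query = leaked_key_query("AKIAIOSFODNN7EXAMPLE", datetime(2026, 4, 4))
print(query)
```

One caveat for the "Check for data access" step: object-level S3 calls such as GetObject only show up if S3 data events are enabled on the trail, so an empty result does not by itself prove that no data was read.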

Ticketing Workflow

Every security incident should create a ticket automatically. Here’s a PagerDuty → Jira integration:

# webhook/incident_to_ticket.py
"""PagerDuty webhook → Jira ticket creation"""
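import requests
from datetime import datetime, timezone

JIRA_URL = "https://company.atlassian.net"
JIRA_EMAIL = "..."  # Assumption: the account that owns the API token
JIRA_TOKEN = "..."  # From Secrets Manager

def create_security_ticket(incident):
    """Create a Jira issue for the incident and return its key.

    `incident` is a dict with 'severity' (P1-P4), 'title', and an optional
    'description'. Map the raw PagerDuty payload to this shape first.
    """
    severity = incident['severity']
    title = incident['title']
    description = incident.get('description', '')

    # Jira priority IDs; unknown severities fall back to Medium below
    priority_map = {
        'P1': '1',  # Highest
        'P2': '2',  # High
        'P3': '3',  # Medium
        'P4': '4',  # Low
    }

    # Jira datetime fields are documented in the form 2011-10-19T10:29:29.000+0000
    detected_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000+0000")

    ticket = {
        "fields": {
            "project": {"key": "SEC"},
            "summary": f"[{severity}] {title}",
            # API v3 expects the description in Atlassian Document Format
            "description": {
                "type": "doc",
                "version": 1,
                "content": [{
                    "type": "paragraph",
                    "content": [{"type": "text", "text": description}]
                }]
            },
            "issuetype": {"name": "Security Incident"},
            "priority": {"id": priority_map.get(severity, '3')},
            "labels": ["security-incident", severity.lower()],
            "customfield_10100": detected_at,  # Detection time
        }
    }

    response = requests.post(
        f"{JIRA_URL}/rest/api/3/issue",
        json=ticket,  # requests sets Content-Type: application/json
        # Jira Cloud Basic auth is email + API token; requests base64-encodes the pair
        auth=(JIRA_EMAIL, JIRA_TOKEN),
        timeout=10,
    )
    response.raise_for_status()  # Surface API errors instead of a KeyError below
    return response.json()['key']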
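
The function above takes an already-simplified `incident` dict, so something has to translate the raw webhook body first. Here is a sketch of that mapping, assuming a PagerDuty v3-style payload with the incident under `event.data` and the priority label in `priority.summary`. The helper name `incident_from_pagerduty` and the sample payload are illustrative; verify the field names against the payloads your subscription actually delivers.

```python
VALID_SEVERITIES = {"P1", "P2", "P3", "P4"}


def incident_from_pagerduty(payload):
    """Map a webhook body to the dict create_security_ticket() expects.

    Returns None for events that should not open a ticket.
    """
    event = payload.get("event", {})
    if event.get("event_type") != "incident.triggered":
        return None  # ignore acknowledgements, resolutions, and so on
    data = event.get("data", {})
    # "priority" may be missing or null when no priority is assigned.
    label = (data.get("priority") or {}).get("summary", "")
    # Assumption: unlabeled incidents default to P3, matching the Medium
    # fallback used in the Jira priority map.
    severity = label if label in VALID_SEVERITIES else "P3"
    return {
        "severity": severity,
        "title": data.get("title", "Untitled incident"),
        "description": data.get("html_url", ""),
    }


# Trimmed, made-up payload for demonstration only.
sample = {
    "event": {
        "event_type": "incident.triggered",
        "data": {
            "title": "Unusual API calls from deploy-bot",
            "html_url": "https://example.pagerduty.com/incidents/ABC123",
            "priority": {"summary": "P1"},
        },
    }
}

incident = incident_from_pagerduty(sample)
print(incident["severity"], incident["title"])
```

Keeping this mapping in a separate pure function means it can be unit tested against captured payloads without touching Jira.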

War Room Protocol

For P1 incidents, you need a structured war room:

Roles:

Incident Commander — owns the timeline, makes decisions, keeps things moving
Technical Lead — hands on keyboard, investigating and remediating
Communications Lead — updates stakeholders, manages external comms if needed
Scribe — documents everything in the incident timeline

Rules of the war room:

Start a shared document for the timeline — every action gets timestamped
Update stakeholders every 30 minutes (even if the update is “still investigating”)
Don’t fix and investigate simultaneously — contain first, then investigate
Record every command you run — you’ll need this for the post-mortem
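
If the scribe would rather not hand-type timestamps, a tiny helper can enforce the "every action gets timestamped" rule. This is only a sketch: the `IncidentTimeline` class and its methods are hypothetical, and a shared document or chat bot would serve the same purpose.

```python
from datetime import datetime, timezone


class IncidentTimeline:
    """Append-only log of timestamped actions for one incident."""

    def __init__(self, incident_id, clock=None):
        self.incident_id = incident_id
        # The clock is injectable for testing; it defaults to current UTC time.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._entries = []

    def record(self, actor, action, command=None):
        """Store who did what, when, and the exact command if there was one."""
        entry = {"time": self._clock(), "actor": actor,
                 "action": action, "command": command}
        self._entries.append(entry)
        return entry

    def render(self):
        """Return the timeline as plain text for the incident ticket."""
        lines = [f"Timeline for {self.incident_id}"]
        for e in self._entries:
            lines.append(f"{e['time']:%H:%M} UTC  [{e['actor']}] {e['action']}")
            if e["command"]:
                lines.append(f"    $ {e['command']}")
        return "\n".join(lines)


timeline = IncidentTimeline("SEC-2026-042")
timeline.record("incident-commander", "War room opened")
timeline.record(
    "technical-lead",
    "Deactivated leaked access key",
    command="aws iam update-access-key --user-name deploy-bot "
            "--access-key-id AKIAIOSFODNN7EXAMPLE --status Inactive",
)
print(timeline.render())
```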

Communication Templates

Pre-written templates save precious minutes during incidents.

## Internal Update (every 30 min)

**Incident:** [Brief description]
**Severity:** P[X]
**Status:** [Investigating / Containing / Remediated / Resolved]
**Impact:** [What's affected, who's affected]
**Current Actions:** [What we're doing right now]
**Next Update:** [Time of next update]
**War Room:** #incident-YYYY-MM-DD

---

## Executive Summary (for P1/P2)

**What happened:** [1-2 sentences]
**Customer impact:** [Yes/No, scope]
**Current status:** [Contained/Investigating/Resolved]
**Root cause:** [If known, or "Under investigation"]
**ETA to resolution:** [Best estimate or "TBD"]
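
Templates like these are easy to fill in programmatically, which also means the "next update" time gets computed instead of guessed. A minimal sketch for the internal update follows; the function name and parameters are my own illustrative choices, not part of any tool.

```python
from datetime import datetime, timedelta, timezone

INTERNAL_UPDATE = """\
**Incident:** {description}
**Severity:** {severity}
**Status:** {status}
**Impact:** {impact}
**Current Actions:** {current_actions}
**Next Update:** {next_update:%H:%M} UTC
**War Room:** #incident-{opened:%Y-%m-%d}
"""

ALLOWED_STATUSES = {"Investigating", "Containing", "Remediated", "Resolved"}


def render_internal_update(description, severity, status, impact,
                           current_actions, opened, now, interval_minutes=30):
    """Fill in the internal update template.

    `opened` is when the incident started (used for the channel name) and
    `now` is the time of this update; both are UTC datetimes.
    """
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    return INTERNAL_UPDATE.format(
        description=description,
        severity=severity,
        status=status,
        impact=impact,
        current_actions=current_actions,
        next_update=now + timedelta(minutes=interval_minutes),
        opened=opened,
    )


update = render_internal_update(
    description="Leaked access key for deploy-bot",
    severity="P1",
    status="Containing",
    impact="Possible read access to customer data in S3",
    current_actions="Rotating credentials on affected services",
    opened=datetime(2026, 4, 4, 2, 15, tzinfo=timezone.utc),
    now=datetime(2026, 4, 4, 2, 50, tzinfo=timezone.utc),
)
print(update)
```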

Post-Incident Review

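Every P1 and P2 gets a post-incident review within 48 hours. The key is to keep it blameless: examine the systems and processes that allowed the incident, not the individuals involved.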

# post_incident_template.yml
incident_id: "SEC-2026-042"
date: "2026-04-04"
severity: "P1"
duration: "2 hours 15 minutes"

timeline:
  - time: "02:15 UTC"
    event: "GuardDuty alert: unusual API calls from IAM user 'deploy-bot'"
  - time: "02:20 UTC"
    event: "On-call paged, acknowledged"
  - time: "02:25 UTC"
    event: "War room opened, investigation started"
  - time: "02:35 UTC"
    event: "Identified: access key leaked in public GitHub repo"
  - time: "02:37 UTC"
    event: "Access key deactivated"
  - time: "02:50 UTC"
    event: "CloudTrail audit shows 23 S3 GetObject calls to customer data"
  - time: "03:30 UTC"
    event: "All affected credentials rotated"
  - time: "04:30 UTC"
    event: "Incident resolved, monitoring elevated"

root_cause: "Access key was hardcoded in a config file committed to a public repository"

what_went_well:
  - "GuardDuty detected unusual activity within 10 minutes"
  - "On-call responded in 5 minutes"
  - "Credential was deactivated 22 minutes after detection"

what_went_wrong:
  - "No pre-commit hook to catch secrets"
  - "Access key had broader permissions than needed"
  - "Took 45 minutes to identify all affected services"

action_items:
  - owner: "security-team"
    action: "Deploy Gitleaks pre-commit hooks to all repos"
    due: "2026-04-11"
  - owner: "platform-team"
    action: "Reduce deploy-bot permissions to least privilege"
    due: "2026-04-11"
  - owner: "security-team"
    action: "Add automated credential rotation for all service accounts"
    due: "2026-04-25"
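
Before the review meeting, it helps to derive the intervals from the timeline rather than from memory. The sketch below does that for a subset of the example timeline above. The short event labels are my own shorthand, and the helper assumes all entries fall on the same UTC day.

```python
from datetime import datetime

# Subset of the example timeline, as (HH:MM UTC, label) pairs.
TIMELINE = [
    ("02:15", "alert"),
    ("02:20", "acknowledged"),
    ("02:37", "key_deactivated"),
    ("03:30", "credentials_rotated"),
    ("04:30", "resolved"),
]


def minutes_since_detection(timeline):
    """Return minutes from the first entry (detection) to each later entry.

    Assumes entries are in order and do not cross midnight.
    """
    parsed = [(datetime.strptime(t, "%H:%M"), label) for t, label in timeline]
    start = parsed[0][0]
    return {
        label: int((t - start).total_seconds() // 60)
        for t, label in parsed[1:]
    }


intervals = minutes_since_detection(TIMELINE)
print(intervals)
# {'acknowledged': 5, 'key_deactivated': 22, 'credentials_rotated': 75, 'resolved': 135}
```

Numbers computed this way keep the "what went well" and "what went wrong" lists anchored to the recorded timeline.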

Key Takeaways

Classify by severity — P1-P4 drives response time and escalation
Write runbooks before incidents — remove decision-making from crisis moments
Automate ticket creation — alerts should create tickets without human intervention
War room protocol for P1s — Incident Commander, Tech Lead, Comms Lead, Scribe
Blameless post-mortems — focus on systems and processes, not individuals
Practice your response — tabletop exercises quarterly, game days annually