SOC/CSIRT management

This page deals with SOC / CSIRT management.

ToC

Must read

Challenges

Generic ones

As per the aforementioned article, here are some typical challenges for a SOC/CSIRT:

image

After pandemic

As per the aforementioned article, I recommend keeping the following common challenges in mind:

image

SOC organization

Tiering or not tiering?

  • No real need for tiering (L1/L2/L3)

    • this is an old model for service providers, not necessarily for a SOC!
    • as per MITRE paper (p65):

    In this book, the constructs of “tier 1” and “tier 2+” are sometimes used to describe analysts who are primarily responsible for front-line alert triage and in-depth investigation/analysis/ response, respectively. However, not all SOCs are arranged in this manner. In fact, some readers of this book are probably very turned off by the idea of tiering at all [38]. Some industry experts have outright called tier 1 as “dead” [39]. Once again, every SOC is different, and practitioners can sometimes be divided on the best way to structure operations. SOCs which do not organize in tiers may opt for an organizational structure more based on function. Many SOCs that have more than a dozen analysts find it necessary and appropriate to tier analysis in response to these goals and operational demands. Others do not and yet still succeed, both in terms of tradecraft maturity and repeatability in operations. Either arrangement can succeed if by observing the following tips that foreshadow a longer conversation about finding and nurturing staff in “Strategy 4: Hire AND Grow Quality Staff.”

    Highly effective SOCs enable their staff to reach outside their assigned duties on a routine basis, regardless of whether they use “tier” to describe their structure.

SOC teams

  • Instead of tiering, and based on experience, three different teams are needed:
    • security monitoring team (which actually does the "job" of detecting security incidents, and is fully autonomous)
    • security monitoring engineering team (which fixes/improves security monitoring such as SIEM rules and SOA playbooks, generates reports, and helps with handling uncommon use cases)
    • build / project management team (which handles tool integration, SIEM data ingestion, specific DevOps tasks, and project management).

RACI

  • Define a RACI, especially if you contract with an MSSP.

CSIRT organization

  • Designate among team analysts:
    • triage officer;
    • incident handler;
    • incident manager;
    • deputy CERT manager.
  • Generally speaking, follow the best practices described in ENISA's "Good practice for incident management" (see "Must read").

TTP (attack methods) knowledge base reference

  • Use MITRE ATT&CK
  • Document all detections (SIEM rules, etc.) using MITRE ATT&CK IDs, whenever possible.
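
As an illustration of documenting detections with ATT&CK IDs, here is a minimal sketch. The rule catalog, its field names (`name`, `attack_ids`) and the helper functions are hypothetical and only show one possible way to keep the mapping checkable; the only ATT&CK facts relied upon are the ID format (`T1059`, `T1059.001`) and the two technique IDs used as sample data.

```python
import re
from collections import defaultdict

# ATT&CK technique IDs look like T1059, or T1059.001 for sub-techniques.
ATTACK_ID = re.compile(r"^T\d{4}(\.\d{3})?$")

# Hypothetical detection catalog: rule names and fields are illustrative only.
RULES = [
    {"name": "Suspicious PowerShell command line", "attack_ids": ["T1059.001"]},
    {"name": "Multiple failed logons", "attack_ids": ["T1110"]},
    {"name": "Legacy rule without mapping", "attack_ids": []},
]


def invalid_ids(rule):
    """Return the IDs of a rule that do not match the ATT&CK ID format."""
    return [i for i in rule["attack_ids"] if not ATTACK_ID.match(i)]


def coverage(rules):
    """Map each ATT&CK ID to the names of the rules documented with it."""
    by_id = defaultdict(list)
    for rule in rules:
        for attack_id in rule["attack_ids"]:
            by_id[attack_id].append(rule["name"])
    return dict(by_id)


def unmapped(rules):
    """List the rules that still lack any ATT&CK ID."""
    return [r["name"] for r in rules if not r["attack_ids"]]


if __name__ == "__main__":
    print(coverage(RULES))
    print("Rules to document:", unmapped(RULES))
```

Listing the unmapped rules makes the "whenever possible" part explicit: a rule may legitimately have no ATT&CK ID, but the gap stays visible.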

Data quality and management

  • Implement an information model, like the Splunk CIM one:
    • do not hesitate to extend it, depending on your needs
    • make sure this data model is implemented in the SIEM, SIRP, SOA and even TIP.
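
To make the idea of a shared information model concrete, here is a minimal sketch of field normalization. The source types (`firewall`, `proxy`) and their vendor field names are hypothetical; the target names `src`, `dest`, `user` and `action` are used in the spirit of the Splunk CIM, and the mapping table is an assumption, not an extract of any product configuration.

```python
# Hypothetical per-source mappings from vendor field names to common field names.
FIELD_MAPS = {
    "firewall": {"srcip": "src", "dstip": "dest", "act": "action"},
    "proxy": {"client_ip": "src", "host": "dest", "username": "user", "result": "action"},
}


def normalize(source, event):
    """Rename vendor-specific fields to the common model.

    Fields without a mapping are kept under their original name, so the
    model can be extended later without losing data.
    """
    mapping = FIELD_MAPS[source]
    normalized = {"source_type": source}
    for key, value in event.items():
        normalized[mapping.get(key, key)] = value
    return normalized


if __name__ == "__main__":
    raw = {"srcip": "10.0.0.1", "dstip": "10.0.0.2", "act": "blocked"}
    print(normalize("firewall", raw))
```

Once every tool (SIEM, SIRP, SOA, TIP) speaks the same field names, a search or playbook written against `src` or `user` works regardless of the original log source.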

Key documents for a SOC

  • Document an audit policy that is tailored to the detection needs/expectations of the SOC.
  • Document a detection strategy, tailored to the needs and expectations regarding the SOC capabilities.
    • The document will aim to list the detection rules (SIEM searches, for instance), with key examples of results, and an overview of handling procedures.

Detection quality assessment

Detection capabilities representation

Standard for security technologies

  • Use Security Stack Mappings to picture detection capabilities for a given security solution/environment (like AWS, Azure, NDR, etc.):

SOC detection capabilities simplified view

Global self-assessment

SOC Self-assessment

CERT/CSIRT self-assessment

Reporting

Generate metrics, leveraging the SIRP traceability and logging capabilities to get relevant data, as well as a bit of scripting.

As per Gartner, MTTR:

image

And MTTC:

image

Below are my recommendations for KPI and SLA. Unless otherwise specified, the recommended timeframes for computing the KPI below are: 1 week, 1 month, and 6 months.

SOC/CSIRT KPI:

  • Number of alerts (SIEM).
  • Number of verified alerts (meaning, confirmed security incidents).
  • Top security incident types.
  • Top applications associated with alerts (detections).
  • Top detection rules triggering the most false positives.
  • Top detection rules whose corresponding alerts take the longest to be handled.
  • Top 10 SIEM searches (i.e. detection rules) triggering false positives.
  • Most frequently detected TTP.
  • Most common incident types.
  • Top 10 tickets that took the longest to close.
  • Percentage of SIEM data that is not associated with SIEM searches (i.e. detection rules).
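
As a sketch of the "bit of scripting" needed for some of the KPI above, the example below counts alerts, verified alerts, top incident types and the rules producing the most false positives. The ticket structure (`rule`, `verdict`, `type`) is a hypothetical SIRP export format, and the sample data is invented for illustration.

```python
from collections import Counter

# Hypothetical SIRP export: one dictionary per closed ticket.
TICKETS = [
    {"rule": "Multiple failed logons", "verdict": "false_positive", "type": None},
    {"rule": "Multiple failed logons", "verdict": "false_positive", "type": None},
    {"rule": "Suspicious PowerShell command line", "verdict": "incident", "type": "malware"},
    {"rule": "Impossible travel", "verdict": "incident", "type": "account compromise"},
    {"rule": "Impossible travel", "verdict": "false_positive", "type": None},
]


def kpis(tickets, top=10):
    """Compute a few SOC/CSIRT KPI from a list of tickets."""
    incidents = [t for t in tickets if t["verdict"] == "incident"]
    false_positives = Counter(
        t["rule"] for t in tickets if t["verdict"] == "false_positive"
    )
    return {
        "alerts": len(tickets),
        "verified_alerts": len(incidents),
        "top_incident_types": Counter(t["type"] for t in incidents).most_common(top),
        "top_false_positive_rules": false_positives.most_common(top),
    }


if __name__ == "__main__":
    for name, value in kpis(TICKETS).items():
        print(name, value)
```

Filtering `TICKETS` by closure date before calling `kpis` gives the same figures per timeframe (1 week, 1 month, 6 months).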

Compliance KPI:

  • Percentage of known endpoints with company-required security solutions.
  • Percentage of critical and high-risk applications that are protected by multifactor authentication.
  • Ratio of always-on personal privileged accounts to the number of individuals in roles who should have access to these accounts.
  • Percentage of employees and contractors that have completed mandatory security training.
  • Percentage of employees who report suspicious emails for the standard organization-wide phishing campaigns.
  • Percentage of click-throughs for the organization-wide phishing campaigns in the past 12 months.

SOC/CSIRT SLA:

  • Number of false positives.
  • Number of new detection use-cases (SIEM rules) being put in production.
  • Number of new detection automation use-cases (enrichment, etc.) being put in production.
  • Number of new response automation use-cases (containment, eradication) being put in production.
  • Number of detection rules whose detection capability and handling process have been confirmed through purple teaming sessions, so far.
  • MTTT: for critical incidents, mean time in hours to triage (assign) the alerts.
  • MTTT: for medium incidents, mean time in hours to triage (assign) the alerts.
  • MTTC: for critical and medium security incidents, mean time in hours to handle the alerts and start mitigation steps (from triage to initial response).
  • MTTR: for critical and medium security incidents, mean time in hours to handle the alerts and remediate them (from triage to remediation).
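
The time-based SLA above can be computed from ticket timestamps. The sketch below is one possible way to do it, under stated assumptions: the field names (`created`, `triaged`, `contained`, `remediated`) are hypothetical, MTTT is assumed to run from alert creation to triage, and MTTC and MTTR run from triage as described in the list above. The sample incidents are invented.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records with ISO 8601 timestamps.
INCIDENTS = [
    {"severity": "critical", "created": "2024-01-01T08:00:00",
     "triaged": "2024-01-01T09:00:00", "contained": "2024-01-01T11:00:00",
     "remediated": "2024-01-02T09:00:00"},
    {"severity": "critical", "created": "2024-01-03T10:00:00",
     "triaged": "2024-01-03T10:30:00", "contained": "2024-01-03T14:30:00",
     "remediated": "2024-01-03T22:30:00"},
    {"severity": "medium", "created": "2024-01-04T09:00:00",
     "triaged": "2024-01-04T13:00:00", "contained": "2024-01-04T19:00:00",
     "remediated": "2024-01-05T13:00:00"},
]


def hours_between(start, end):
    """Elapsed time in hours between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600


def mean_hours(incidents, start_field, end_field, severities):
    """Mean duration in hours for the given severities, or None if no match."""
    durations = [
        hours_between(i[start_field], i[end_field])
        for i in incidents
        if i["severity"] in severities
    ]
    return mean(durations) if durations else None


if __name__ == "__main__":
    print("MTTT critical:", mean_hours(INCIDENTS, "created", "triaged", {"critical"}))
    print("MTTT medium:", mean_hours(INCIDENTS, "created", "triaged", {"medium"}))
    print("MTTC:", mean_hours(INCIDENTS, "triaged", "contained", {"critical", "medium"}))
    print("MTTR:", mean_hours(INCIDENTS, "triaged", "remediated", {"critical", "medium"}))
```

Returning `None` when no incident matches avoids reporting a misleading zero for a period without critical or medium incidents.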

Compliance SLA:

  • Percentage of critical assets that have successfully completed a ransomware recovery assessment in the past 12 months.
  • Average number of hours from the request for termination of access to sensitive or high-risk systems or information, to deprovisioning of all access.

End

Go to main page