SOC/CSIRT management

This page deals with SOC / CSIRT management.

Must read

FIRST, Building a SOC
NCSC, Building a SOC
FIRST, CERT-in-a-box
FIRST, CSIRT Services Framework
ENISA, Good practice for incident management
CIS, 18 critical security controls
CMM, SOCTOM
LinkedIn Pulse, Evolution Security Operations Center
Gartner, Cybersecurity business value benchmark

Challenges

Generic ones

As per the aforementioned article, here are some typical challenges for a SOC/CSIRT:

After pandemic

As per the aforementioned article, I recommend keeping the following common challenges in mind:

SOC organization

Tiering or not tiering?

No real need for tiering (L1/L2/L3)
- this is an old model for service providers, not necessarily for a SOC!
- as per the MITRE paper (p65):
In this book, the constructs of “tier 1” and “tier 2+” are sometimes used to describe analysts who are primarily responsible for front-line alert triage and in-depth investigation/analysis/ response, respectively. However, not all SOCs are arranged in this manner. In fact, some readers of this book are probably very turned off by the idea of tiering at all [38]. Some industry experts have outright called tier 1 as “dead” [39]. Once again, every SOC is different, and practitioners can sometimes be divided on the best way to structure operations. SOCs which do not organize in tiers may opt for an organizational structure more based on function. Many SOCs that have more than a dozen analysts find it necessary and appropriate to tier analysis in response to these goals and operational demands. Others do not and yet still succeed, both in terms of tradecraft maturity and repeatability in operations. Either arrangement can succeed if by observing the following tips that foreshadow a longer conversation about finding and nurturing staff in “Strategy 4: Hire AND Grow Quality Staff.”

Highly effective SOCs enable their staff to reach outside their assigned duties on a routine basis, regardless of whether they use “tier” to describe their structure.

SOC teams

Instead of tiering, and based on experience, three different teams are needed:
- security monitoring team (which actually does the "job" of detecting security incidents, in a fully autonomous way)
- security monitoring engineering team (which fixes/improves security monitoring such as SIEM rules and SOA playbooks, generates reports, and helps handle uncommon use cases)
- build / project management team (which does tools integration, SIEM data ingestion, specific DevOps tasks, project management).

RACI

Define a RACI, especially if you contract with an MSSP.
- You may want to consider my own template
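A RACI matrix can also be kept as data and sanity-checked. The sketch below is only an illustration: the activities, the party names (`SOC`, `MSSP`, `CISO`, `IT Ops`) and the two rules it checks (exactly one Accountable and at least one Responsible per activity, a common RACI convention) are assumptions, not taken from the template mentioned above.

```python
# Hypothetical RACI matrix: activity -> {party: role letter}.
raci = {
    "Alert triage": {"MSSP": "R", "SOC": "A", "CISO": "I"},
    "Containment": {"SOC": "R", "IT Ops": "R", "CISO": "A"},
    "Log source onboarding": {"IT Ops": "R", "SOC": "C"},
}


def raci_issues(matrix: dict) -> list:
    """List activities that break the two assumed conventions."""
    issues = []
    for activity, roles in matrix.items():
        letters = list(roles.values())
        accountable = letters.count("A")
        if accountable != 1:
            issues.append(f"{activity}: expected exactly one A, found {accountable}")
        if "R" not in letters:
            issues.append(f"{activity}: no R assigned")
    return issues


for issue in raci_issues(raci):
    print(issue)
```

Running such a check whenever the matrix changes helps spot activities that nobody owns, which matters most when responsibilities are split with an MSSP.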

CSIRT organization

Designate among team analysts:
- triage officer;
- incident handler;
- incident manager;
- deputy CERT manager.
Generally speaking, follow the best practices described in ENISA's "Good practice for incident management" (see "Must read").

TTP (attack methods) knowledge base reference

Use MITRE ATT&CK
Document all detections (SIEM rules, etc.) with their MITRE ATT&CK IDs, whenever possible.
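As an illustration of this documentation rule, here is a minimal sketch that flags detections with a missing or malformed ATT&CK ID. The `DetectionRule` record and the rule names are hypothetical; the only fact relied upon is the ATT&CK technique ID format (`T1059`, or `T1059.001` for a sub-technique).

```python
import re
from dataclasses import dataclass, field

# ATT&CK technique IDs look like T1059 or, for sub-techniques, T1059.001.
ATTACK_ID = re.compile(r"^T\d{4}(\.\d{3})?$")


@dataclass
class DetectionRule:
    """Hypothetical record describing one detection (e.g. a SIEM rule)."""
    name: str
    attack_ids: list = field(default_factory=list)


def undocumented_rules(rules: list) -> list:
    """Return the names of rules with no ATT&CK ID or with a malformed one."""
    flagged = []
    for rule in rules:
        ids_ok = all(ATTACK_ID.match(i) for i in rule.attack_ids)
        if not rule.attack_ids or not ids_ok:
            flagged.append(rule.name)
    return flagged


rules = [
    DetectionRule("Suspicious PowerShell command line", ["T1059.001"]),
    DetectionRule("New local admin account", []),
    DetectionRule("LSASS memory access", ["T1003-001"]),  # malformed on purpose
]
print(undocumented_rules(rules))
```

The check only validates the ID format; it does not verify that the ID exists in the current ATT&CK release.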

Data quality and management

Implement an information model, such as the Splunk CIM:
- do not hesitate to extend it, depending on your needs
- make sure this data model is implemented in the SIEM, SIRP, SOA and even TIP.
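To make the idea concrete, the sketch below renames source-specific fields to a shared set of names before events reach the SIEM, SIRP, or SOA. The target names (`src`, `dest`, `user`, `action`) follow CIM-style naming; the raw field names and the `FIELD_MAPS` table are hypothetical and would come from your own log sources.

```python
# Hypothetical per-source mappings from raw field names to common-model names.
FIELD_MAPS = {
    "firewall": {"srcip": "src", "dstip": "dest", "act": "action"},
    "proxy": {"c_ip": "src", "cs_host": "dest", "cs_username": "user"},
}


def normalize(source: str, event: dict) -> dict:
    """Rename known raw fields to the common model; keep unknown fields as-is."""
    mapping = FIELD_MAPS.get(source, {})
    return {mapping.get(key, key): value for key, value in event.items()}


fw_event = {"srcip": "10.0.0.5", "dstip": "203.0.113.7", "act": "blocked"}
px_event = {"c_ip": "10.0.0.5", "cs_host": "example.org", "cs_username": "alice"}

print(normalize("firewall", fw_event))
print(normalize("proxy", px_event))
```

With both events exposing `src`, a single detection rule or playbook can pivot on the same field regardless of the originating product.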

Key documents for a SOC

Document an audit policy that is tailored to the detection needs/expectations of the SOC:
- The document aims to answer a generic question: what to audit/log, on which equipment/OSes/services/apps?
- Take the Yamato Security work as an example of the audit policy required for the Sigma community rules.
- Don't forget to read the Microsoft Windows 10 and Windows Server 2016 security auditing and monitoring reference.
Document a detection strategy, tailored to the needs and expectations regarding the SOC capabilities.
- The document aims to list the detection rules (SIEM searches, for instance), with key examples of results, and an overview of handling procedures.

Detection quality assessment

Run purple teaming sessions on a regular basis!
- e.g.: Intrinsec, FireEye
- To do it on your own, recommended tool: Atomic Red Team
Picture the detection capabilities currently confirmed through purple teaming, with tools based on ATT&CK:
- e.g.: Vectr

Detection capabilities representation

Standard for security technologies

Use Security Stack Mappings to picture detection capabilities for a given security solution/environment (like AWS, Azure, NDR, etc.):

SOC detection capabilities simplified view

Generate ATT&CK heatmaps, to picture the SOC detection capabilities
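One possible way to produce such a heatmap is to export a layer file for the ATT&CK Navigator, scoring each technique by the number of detection rules that cover it. This is a sketch under assumptions: the rule inventory is hypothetical, and the version numbers in `versions` are placeholders that need to match the Navigator instance you use.

```python
import json
from collections import Counter


def build_layer(name: str, rule_techniques: dict) -> dict:
    """Build a Navigator-style layer scoring each technique by rule count."""
    counts = Counter(t for ids in rule_techniques.values() for t in ids)
    return {
        "name": name,
        # Assumption: adjust these versions to your Navigator instance.
        "versions": {"layer": "4.5", "navigator": "4.9.1", "attack": "14"},
        "domain": "enterprise-attack",
        "techniques": [
            {"techniqueID": tid, "score": score}
            for tid, score in sorted(counts.items())
        ],
    }


# Hypothetical inventory: detection rule name -> ATT&CK technique IDs.
rule_techniques = {
    "Suspicious PowerShell": ["T1059.001"],
    "Encoded command line": ["T1059.001", "T1027"],
    "LSASS memory access": ["T1003.001"],
}

layer = build_layer("SOC detection coverage", rule_techniques)
print(json.dumps(layer, indent=2))
```

The printed JSON can be saved to a file and loaded in the Navigator, where a color gradient on `score` gives the simplified view of detection capabilities.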

Global self-assessment

SOC Self-assessment

Read the SOC Cyber maturity model from CMM
Run the SOC-CMM self-assessment tool

CERT/CSIRT self-assessment

Read the OpenCSIRT cybersecurity maturity framework from ENISA
- Run the OpenCSIRT, SIM3 self-assessment
Read the SOC-CMM 4CERT from CMM
- Run the SOC-CMM 4CERT self-assessment tool

Reporting

Generate metrics, leveraging the SIRP traceability and logging capabilities to get relevant data, as well as a bit of scripting.

As per Gartner, MTTR:

And MTTC:

Below are my recommendations for KPIs and SLAs. Unless specified otherwise, the recommended timeframes for computing the KPIs below are: 1 week, 1 month, and 6 months.

SOC/CSIRT KPI:

Number of alerts (SIEM).
Number of verified alerts (meaning, confirmed security incidents).
Top security incident types.
Top applications associated with alerts (detections).
Top detection rules triggering most false positives.
Top detection rules whose corresponding alerts take the longest to handle.
Top 10 SIEM searches (i.e. detection rules) triggering false positives.
Most frequently seen TTPs in detections.
Most common incident types.
Top 10 longest tickets before closure.
Percentage of SIEM data that is not associated with SIEM searches (i.e. detection rules).
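Several of these KPIs are simple aggregations over alert records exported from the SIRP. The following sketch assumes a hypothetical export format in which each alert carries a `rule` name and a `verdict` field set to either `false_positive` or `incident`; real field names will differ per tool.

```python
from collections import Counter

# Hypothetical SIRP export: one dict per closed alert.
alerts = [
    {"rule": "Brute force on VPN", "verdict": "false_positive"},
    {"rule": "Brute force on VPN", "verdict": "false_positive"},
    {"rule": "Suspicious PowerShell", "verdict": "incident"},
    {"rule": "Brute force on VPN", "verdict": "incident"},
    {"rule": "Rare parent process", "verdict": "false_positive"},
]


def top_false_positive_rules(alerts: list, n: int = 10) -> list:
    """Return the n rules with the most false positives, as (rule, count)."""
    counts = Counter(a["rule"] for a in alerts if a["verdict"] == "false_positive")
    return counts.most_common(n)


def verified_alert_count(alerts: list) -> int:
    """Count alerts confirmed as security incidents."""
    return sum(1 for a in alerts if a["verdict"] == "incident")


print("Number of alerts:", len(alerts))
print("Number of verified alerts:", verified_alert_count(alerts))
print("Top false-positive rules:", top_false_positive_rules(alerts))
```

Filtering `alerts` by closure date before calling these functions gives the 1 week, 1 month, and 6 months views.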

Compliance KPI:

Percentage of known endpoints with company-required security solutions.
Percentage of critical and high-risk applications that are protected by multifactor authentication.
Ratio of always-on personal privileged accounts to the number of individuals in roles who should have access to these accounts.
Percentage of employees and contractors that have completed mandatory security training.
Percentage of employees who report suspicious emails for the standard organization-wide phishing campaigns.
Percentage of click-throughs for the organization-wide phishing campaigns in the past 12 months.
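The percentage-based compliance KPIs share the same shape: a count of compliant items over a known population. As a sketch for the first one, assuming a hypothetical asset inventory that lists the security solutions installed on each endpoint and a hypothetical set of required solutions:

```python
# Hypothetical set of company-required security solutions.
REQUIRED = {"edr", "disk_encryption"}

# Hypothetical inventory: endpoint name -> installed security solutions.
endpoints = {
    "laptop-01": {"edr", "disk_encryption"},
    "laptop-02": {"edr"},
    "server-01": {"edr", "disk_encryption", "backup_agent"},
    "server-02": set(),
}


def compliant_percentage(endpoints: dict, required: set) -> float:
    """Percentage of known endpoints that have every required solution."""
    if not endpoints:
        return 0.0
    compliant = sum(1 for installed in endpoints.values() if required <= installed)
    return 100 * compliant / len(endpoints)


print(f"{compliant_percentage(endpoints, REQUIRED):.1f}% of known endpoints are compliant")
```

Note that the denominator is the number of known endpoints, so the KPI is only as reliable as the asset inventory behind it.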

SOC/CSIRT SLA:

Number of false positives.
Number of new detection use-cases (SIEM rules) being put in production.
Number of new detection automation use-cases (enrichment, etc.) being put in production.
Number of new response automation use-cases (containment, eradication) being put in production.
Number of detection rules whose detection capability and handling process have been confirmed in a purple teaming session, so far.
MTTT: for critical incidents, mean time in hours to triage (assign) the alerts.
MTTT: for medium incidents, mean time in hours to triage (assign) the alerts.
MTTC: for critical and medium security incidents, mean time in hours to handle the alerts and start mitigation steps (from triage to initial response).
MTTR: for critical and medium security incidents, mean time in hours to handle the alerts and remediate them (from triage to remediation).
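The three time-based SLAs above can be computed from ticket timestamps. The sketch below assumes hypothetical ticket fields (`created`, `triaged`, `mitigation_started`, `remediated`) and measures MTTT from ticket creation to triage, which is an assumption; MTTC and MTTR start at triage, as defined above.

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"


def hours_between(start: str, end: str) -> float:
    """Elapsed time in hours between two timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600


def mean_hours(tickets, start_field, end_field, severities):
    """Mean duration in hours for tickets of the given severities."""
    durations = [
        hours_between(t[start_field], t[end_field])
        for t in tickets
        if t["severity"] in severities
    ]
    return mean(durations) if durations else None


# Hypothetical tickets exported from the SIRP.
tickets = [
    {"severity": "critical", "created": "2024-03-01 08:00", "triaged": "2024-03-01 08:30",
     "mitigation_started": "2024-03-01 10:30", "remediated": "2024-03-02 08:30"},
    {"severity": "critical", "created": "2024-03-05 14:00", "triaged": "2024-03-05 15:30",
     "mitigation_started": "2024-03-05 19:30", "remediated": "2024-03-06 03:30"},
    {"severity": "medium", "created": "2024-03-07 09:00", "triaged": "2024-03-07 13:00",
     "mitigation_started": "2024-03-07 16:00", "remediated": "2024-03-08 13:00"},
]

both = {"critical", "medium"}
print("MTTT critical:", mean_hours(tickets, "created", "triaged", {"critical"}))
print("MTTT medium:", mean_hours(tickets, "created", "triaged", {"medium"}))
print("MTTC:", mean_hours(tickets, "triaged", "mitigation_started", both))
print("MTTR:", mean_hours(tickets, "triaged", "remediated", both))
```

A mean hides outliers, so it can be worth reporting the longest tickets alongside these values.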

Compliance SLA:

Percentage of critical assets that have successfully run a ransomware recovery assessment in the past 12 months.
Average number of hours from the request for termination of access to sensitive or high-risk systems or information, to deprovisioning of all access.

End

Go to main page
