This page deals with SOC / CSIRT management.
- Must read
- Challenges after pandemic
- SOC organization
- CSIRT organization
- TTP knowledge base reference
- Data quality and management
- Key documents for a SOC
- Detection assessment
- Global self assessment
- Reporting
- FIRST, Building a SOC
- NCSC, Building a SOC
- FIRST, CERT-in-a-box
- FIRST, CSIRT Services Framework
- ENISA, Good practice for incident management
- CIS, 8 critical security controls
- CMM, SOCTOM
- LinkedIn Pulse, Evolution of the Security Operations Center
- Gartner, Cybersecurity business value benchmark
As per the aforementioned article, here are some typical challenges to keep in mind for a SOC/CSIRT:
- No real need for tiering (L1/L2/L3):
- this is an old model for service providers, not necessarily for a SOC!
- as per the MITRE paper ("11 Strategies of a World-Class Cybersecurity Operations Center", p. 65):
In this book, the constructs of “tier 1” and “tier 2+” are sometimes used to describe analysts who are primarily responsible for front-line alert triage and in-depth investigation/analysis/response, respectively. However, not all SOCs are arranged in this manner. In fact, some readers of this book are probably very turned off by the idea of tiering at all [38]. Some industry experts have outright called tier 1 as “dead” [39]. Once again, every SOC is different, and practitioners can sometimes be divided on the best way to structure operations. SOCs which do not organize in tiers may opt for an organizational structure more based on function. Many SOCs that have more than a dozen analysts find it necessary and appropriate to tier analysis in response to these goals and operational demands. Others do not and yet still succeed, both in terms of tradecraft maturity and repeatability in operations. Either arrangement can succeed by observing the following tips that foreshadow a longer conversation about finding and nurturing staff in “Strategy 4: Hire AND Grow Quality Staff.”
Highly effective SOCs enable their staff to reach outside their assigned duties on a routine basis, regardless of whether they use “tier” to describe their structure.
- Instead of tiering, three different teams should be set up, based on experience:
- security monitoring team (which actually does the "job" of detecting security incidents, in a fully autonomous way);
- security monitoring engineering team (which fixes/improves security monitoring content like SIEM rules and SOA playbooks, generates reports, and helps with handling uncommon use cases);
- build / project management team (which takes care of tools integration, SIEM data ingestion, specific DevOps tasks, and project management).
- Define a RACI, especially if you contract with an MSSP.
- You may want to consider my own template
- Designate among team analysts:
- triage officer;
- incident handler;
- incident manager;
- deputy CERT manager.
- Generally speaking, follow the best practices described in ENISA's "Good practice for incident management" (see "Must read").
- Use MITRE ATT&CK
- Document all detections (SIEM rules, etc.) with their MITRE ATT&CK technique IDs, whenever possible.
- Implement an information model, like the Splunk CIM one (a minimal normalization sketch follows below):
- do not hesitate to extend it, depending on your needs;
- make sure this data model is implemented in the SIEM, SIRP, SOA and even the TIP.
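For illustration, here is a minimal sketch of such a normalization step, assuming Splunk CIM-style target field names (`src`, `dest`, `dest_port`, `user`, `action`); the vendor-specific input keys and the mapping are hypothetical, not an official CIM implementation.

```python
# Minimal sketch: normalizing a raw firewall event into CIM-style field names.
# The raw event layout and the mapping below are illustrative assumptions.

CIM_FIELD_MAP = {
    "source_ip": "src",
    "destination_ip": "dest",
    "destination_port": "dest_port",
    "username": "user",
    "verdict": "action",
}

def normalize(raw_event: dict) -> dict:
    """Map vendor-specific keys to common information model fields."""
    normalized = {}
    for vendor_key, cim_key in CIM_FIELD_MAP.items():
        if vendor_key in raw_event:
            normalized[cim_key] = raw_event[vendor_key]
    # Keep the original event so custom extensions of the model stay possible.
    normalized["raw"] = raw_event
    return normalized

if __name__ == "__main__":
    event = {"source_ip": "10.0.0.5", "destination_ip": "203.0.113.7",
             "destination_port": 443, "username": "jdoe", "verdict": "allowed"}
    print(normalize(event))
```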
- Document an audit policy, tailored to the detection needs/expectations of the SOC (an illustrative sketch follows below):
- The document aims to answer a generic question: what to audit/log, on which equipment/OSes/services/apps?
- Take the Yamato Security work as an example of the audit policy required for the Sigma community rules.
- Don't forget to read the Microsoft Windows 10 and Windows Server 2016 security auditing and monitoring reference.
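For illustration, a sketch of an audit policy expressed as data: which event sources and event IDs are expected per asset class. The asset classes and the handful of event IDs below (e.g. Windows 4688/4624, Sysmon 1/3) are common examples used as assumptions, not an exhaustive policy.

```python
# Illustrative audit policy as data: expected event sources and IDs per asset class.

AUDIT_POLICY = {
    "windows_workstation": {
        "Security": [4624, 4625, 4688, 4672],  # logons, failed logons, process creation, special privileges
        "Sysmon":   [1, 3, 11],                # process create, network connect, file create
    },
    "windows_server": {
        "Security": [4624, 4625, 4688, 4698, 4720],  # + scheduled task creation, user creation
    },
    "linux_server": {
        "auditd": ["execve", "connect"],
        "syslog": ["sshd", "sudo"],
    },
}

def missing_sources(asset_class: str, collected: dict) -> dict:
    """Compare what is actually collected against the documented audit policy."""
    expected = AUDIT_POLICY.get(asset_class, {})
    return {src: sorted(set(ids) - set(collected.get(src, [])))
            for src, ids in expected.items()}

if __name__ == "__main__":
    # Example: a workstation that only ships logon and process-creation events.
    print(missing_sources("windows_workstation", {"Security": [4624, 4688]}))
```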
- Document a detection strategy, tailored to the needs and expectations regarding the SOC capabilities (a sample catalog entry follows below).
- The document will list the detection rules (SIEM searches, for instance), with key examples of results, and an overview of the handling procedures.
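For illustration, one possible shape for a detection-strategy catalog entry, as structured data. The rule name, query syntax, sample result and playbook path are hypothetical placeholders; the point is the fields such a document should capture for each rule.

```python
# Sketch of one detection-strategy catalog entry (all values are made-up examples).

DETECTION_CATALOG = [
    {
        "name": "Suspicious encoded PowerShell command line",
        "attack_id": "T1059.001",                      # ATT&CK technique ID
        "data_sources": ["Windows Security 4688", "Sysmon 1"],
        "query": 'process_name="powershell.exe" AND command_line="*-enc*"',  # SIEM search, pseudo-syntax
        "example_result": "host=WKS042 user=jdoe command_line='powershell -enc SQBFAFgA...'",
        "handling_procedure": "playbooks/powershell-encoded-command.md",
        "confirmed_by_purpleteam": True,
    },
]

def coverage_by_technique(catalog: list) -> dict:
    """Count how many production rules map to each ATT&CK technique."""
    counts: dict = {}
    for rule in catalog:
        counts[rule["attack_id"]] = counts.get(rule["attack_id"], 0) + 1
    return counts

if __name__ == "__main__":
    print(coverage_by_technique(DETECTION_CATALOG))
```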
- Run regular purple teaming sessions over time!
- e.g.: Intrinsec, FireEye
- To do it on your own, recommended tool: Atomic Red Team
- Picture the currently confirmed detection capabilities thanks to purpleteaming, with tools based on ATT&CK:
- e.g.: Vectr
- Use Security Stack Mappings to picture detection capabilities for a given security solution/environment (like AWS, Azure, NDR, etc.):
- Generate ATT&CK heatmaps to picture the SOC detection capabilities (see the sketch below).
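For illustration, a minimal sketch that turns the list of techniques whose detection has been confirmed (e.g. through purple teaming) into an ATT&CK Navigator layer file, which Navigator then renders as a heatmap. The technique IDs and the version numbers in `versions` are placeholder assumptions to adjust to your environment and Navigator instance.

```python
import json

# Build an ATT&CK Navigator layer from confirmed technique IDs (placeholders below).
CONFIRMED_TECHNIQUES = ["T1059.001", "T1566.001", "T1021.002"]

def build_layer(technique_ids, name="SOC confirmed detections"):
    return {
        "name": name,
        "domain": "enterprise-attack",
        "versions": {"attack": "14", "navigator": "4.9.1", "layer": "4.5"},  # adjust to your instance
        "techniques": [
            {"techniqueID": tid, "score": 1, "comment": "confirmed by purple teaming"}
            for tid in technique_ids
        ],
        "gradient": {"colors": ["#ffffff", "#66b032"], "minValue": 0, "maxValue": 1},
    }

if __name__ == "__main__":
    with open("soc_detection_layer.json", "w") as fh:
        json.dump(build_layer(CONFIRMED_TECHNIQUES), fh, indent=2)
```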
- Read the SOC Cyber maturity model from CMM
- Run the SOC-CMM self-assessment tool
- Read the OpenCSIRT cybersecurity maturity framework from ENISA
- Run the OpenCSIRT, SIM3 self-assessment
- Read the SOC-CMM 4CERT from CMM
Generate metrics by leveraging the SIRP traceability and logging capabilities to get relevant data, plus a bit of scripting (see the sketch after the KPI list below).
As per Gartner, the reference response metrics are MTTR (mean time to respond/remediate) and MTTC (mean time to contain); both are used in the KPI below.
Below are my recommendations for KPI and SLA. Unless specified otherwise, compute these KPI over the following timeframes: 1 week, 1 month, and 6 months.
- Number of alerts (SIEM).
- Number of verified alerts (meaning, confirmed security incidents).
- Top security incident types.
- Top applications associated with alerts (detections).
- Top detection rules triggering the most false positives.
- Top detection rules whose corresponding alerts take the longest to handle.
- Top 10 SIEM searches (i.e. detection rules) triggering false positives.
- Most frequently seen TTPs in detections.
- Most common incident types.
- Top 10 longest tickets before closure.
- Percentage of SIEM data that is not associated with any SIEM search (i.e. detection rule).
- Percentage of known endpoints with company-required security solutions.
- Percentage of critical and high-risk applications that are protected by multifactor authentication.
- Ratio of always-on personal privileged accounts to the number of individuals in roles who should have access to these accounts.
- Percentage of employees and contractors that have completed mandatory security training.
- Percentage of employees who report suspicious emails for the standard organization-wide phishing campaigns.
- Percentage of click-throughs for the organization-wide phishing campaigns in the past 12 months.
- Number of false positives.
- Number of new detection use-cases (SIEM rules) being put in production.
- Number of new detection automation use-cases (enrichment, etc.) being put in production.
- Number of new response automation use-cases (containment, eradication) being put in production.
- Number of detection rules whose detection capability and handling process have been confirmed through purple teaming sessions, so far.
- MTTT: for critical incidents, mean time in hours to triage (assign) the alerts.
- MTTT: for medium incidents, mean time in hours to triage (assign) the alerts.
- MTTC: for critical and medium security incidents, mean time in hours to handle the alerts and start mitigation steps (from triage to initial response).
- MTTR: for critical and medium security incidents, mean time in hours to handle the alerts and remediate them (from triage to remediation).
- Percentage of critical assets that have successfully run ransomware recovery assessment, in the past 12 months.
- Average number of hours from the request for termination of access to sensitive or high-risk systems or information, to deprovisioning of all access.
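For illustration, here is a minimal scripting sketch that computes MTTT/MTTC/MTTR (per the definitions above) from a SIRP ticket export. The ticket field names (`created`, `triaged`, `contained`, `remediated`, `severity`) and the sample records are assumptions about your SIRP export format; adapt them to what your tool actually provides.

```python
from datetime import datetime
from statistics import mean

# Hypothetical SIRP ticket export (timestamps in ISO 8601).
TICKETS = [
    {"id": "INC-1", "severity": "critical",
     "created": "2024-05-01T08:00:00", "triaged": "2024-05-01T08:20:00",
     "contained": "2024-05-01T10:00:00", "remediated": "2024-05-02T08:00:00"},
    {"id": "INC-2", "severity": "medium",
     "created": "2024-05-03T09:00:00", "triaged": "2024-05-03T11:00:00",
     "contained": "2024-05-03T15:00:00", "remediated": "2024-05-05T09:00:00"},
]

def hours_between(start: str, end: str) -> float:
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 3600

def kpi(tickets, severity):
    """MTTT: creation to triage; MTTC: triage to containment; MTTR: triage to remediation."""
    selected = [t for t in tickets if t["severity"] == severity]
    if not selected:
        return {}
    return {
        "MTTT_hours": round(mean(hours_between(t["created"], t["triaged"]) for t in selected), 1),
        "MTTC_hours": round(mean(hours_between(t["triaged"], t["contained"]) for t in selected), 1),
        "MTTR_hours": round(mean(hours_between(t["triaged"], t["remediated"]) for t in selected), 1),
    }

if __name__ == "__main__":
    for sev in ("critical", "medium"):
        print(sev, kpi(TICKETS, sev))
```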
Go to main page