SrsRAN gnb Will freeze if trying to isolate cpus (commit 4ac5300) #776

Open
aibtw opened this issue Aug 19, 2024 · 8 comments

Comments

@aibtw

aibtw commented Aug 19, 2024

Issue Description

I am trying to run the gNB with isolated CPUs. As soon as the gNB starts processing, everything freezes, and I can see that two of the isolated cores are 100% loaded. The system then requires a hard reboot.

Setup Details

  • Commit 4ac5300
  • i9-13900k CPU (hyper-threading enabled, isolated cores 1-18).
  • 128GB RAM
  • RAN550 RU

Expected Behavior

The gNB should run normally and spread the load across multiple cores as configured in the config file.

Actual Behaviour

The system freezes, and the gNB loads two cores to the max.

Steps to reproduce the problem

My setup for isolating the cores is:

  • edit /etc/default/grub
  • Add GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt isolcpus=1-18 nohz_full=1-18 rcu_nocbs=1-18 kthread_cpus=0,19-31 rcu_nocb_poll mitigations=off skew_tick=1 selinux=0 enforcing=0 tsc=reliable nmi_watchdog=0 softlockup_panic=0 audit=0 intel_pstate=disable nosoftlockup hugepagesz=1G hugepages=8 hugepagesz=2M hugepages=0 default_hugepagesz=1G pcie_aspm=off"
  • update grub
  • reboot
  • run gNB normally like ./gnb -c conf_file
    The expert execution section:
expert_execution:
  cell_affinities:
    -
      l1_dl_cpus: 2,3
      l1_ul_cpus: 4,5
      l2_cell_cpus: 6,7
      ru_cpus: 10,11,12,13
      l1_dl_pinning: mask             # Optional TEXT. Sets the policy used for assigning CPU cores to L1 downlink tasks.
      l1_ul_pinning: mask             # Optional TEXT. Sets the policy used for assigning CPU cores to L1 uplink tasks.
      l2_cell_pinning: mask           # Optional TEXT. Sets the policy used for assigning CPU cores to L2 cell tasks.
      ru_pinning: mask                # Optional TEXT. Sets the policy used for assigning CPU cores to Radio Unit tasks.
  affinities:
    isolated_cpus: 1-18
    low_priority_cpus: 8,9
  threads:
    upper_phy:
      pdsch_processor_type: auto      # Optional TEXT (auto). Sets the PDSCH processor type. Supported: [auto, generic, concurrent, lite].
      nof_pusch_decoder_threads: 6    # Optional UINT (1). Sets the number of threads used to decode PUSCH.
      nof_ul_threads: 6               # Optional UINT (1). Sets the number of upper PHY threads to process uplink.
      nof_dl_threads: 6               # Optional UINT (1). Sets the number of upper PHY threads to process downlink.
    lower_phy:
      execution_profile: quad         # Optional TEXT. Sets the lower physical layer executor profile. Supported: [single, dual, quad].
    ofh:
      enable_dl_parallelization: 1    # Optional BOOLEAN. Sets the Open Fronthaul downlink parallelization flag. Supported: [0, 1].
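
After the reboot it can help to confirm that the kernel actually picked up the isolation parameters from the GRUB line in the steps above. The following is a minimal Python sketch, not part of srsRAN: the helper names `parse_cmdline` and `isolation_params` are hypothetical, and `SAMPLE` is a shortened copy of the GRUB line used only when `/proc/cmdline` cannot be read.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def isolation_params(cmdline: str) -> dict:
    """Return only the CPU-isolation parameters that appear in this issue."""
    params = parse_cmdline(cmdline)
    keys = ("isolcpus", "nohz_full", "rcu_nocbs", "kthread_cpus")
    return {k: params[k] for k in keys if k in params}


# Shortened copy of the GRUB line from the steps above (illustrative only).
SAMPLE = ("quiet splash isolcpus=1-18 nohz_full=1-18 rcu_nocbs=1-18 "
          "kthread_cpus=0,19-31 rcu_nocb_poll")

if __name__ == "__main__":
    # On the target machine, read the live command line instead of SAMPLE.
    try:
        with open("/proc/cmdline") as f:
            live = f.read()
    except OSError:
        live = SAMPLE
    print(isolation_params(live))
```

On Linux the kernel also reports the effective isolated set in /sys/devices/system/cpu/isolated, which is a quick cross-check against the values printed here.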

Additional Information

The release version (24.04) doesn't get stuck.

@aibtw aibtw changed the title SrsRAN gnb Will freeze if taskset is used to isolate cpus (commit 4ac5300) SrsRAN gnb Will freeze if trying to isolate cpus (commit 4ac5300) Aug 21, 2024
@aibtw
Author

aibtw commented Sep 29, 2024

The issue seems to persist in 51e44a6 as well.
I forgot to mention that we're using the 5.15.0-1069-realtime #77-Ubuntu SMP PREEMPT_RT kernel build for Ubuntu 22.04.4, if that helps.

@pgawlowicz
Collaborator

@aibtw any update on this issue?

@aibtw
Author

aibtw commented Oct 17, 2024

Unfortunately no, I wasn't able to pinpoint the issue.

It works fine for commit 40b17b4 but doesn't work on any commit after that one.

Moreover, on another machine (with the exact same specs and, I am pretty sure, the same configs) it requires adding "sudo taskset -c 0-31" in order to work on commit 40b17b4. I don't think it is correct to mix taskset with cpuset/cgroups, but that's the only way I could get it to work, unless I am missing some configuration.
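
One possible reason `taskset -c 0-31` makes a difference is that a process started on a system booted with isolcpus may inherit an affinity mask that excludes the isolated CPUs. That is an assumption about this setup, not something confirmed in the thread. A minimal Python sketch for inspecting the mask a process starts with follows; `format_cpu_set` is a hypothetical helper, while `os.sched_getaffinity` is the standard-library call (Linux only).

```python
import os


def format_cpu_set(cpus) -> str:
    """Render CPU ids as a compact range list, e.g. {0, 19, 20} -> '0,19-20'."""
    ids = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ids):
        start = end = ids[i]
        # Extend the current run while the next id is consecutive.
        while i + 1 < len(ids) and ids[i + 1] == end + 1:
            i += 1
            end = ids[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


if __name__ == "__main__":
    # pid 0 means the calling process; the call is only available on Linux.
    print("allowed CPUs:", format_cpu_set(os.sched_getaffinity(0)))
```

Running it once directly and once under `sudo taskset -c 0-31` would show whether the two launches really start from different masks.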

@pgawlowicz
Collaborator

@ninjab3s could you comment here?

@ninjab3s
Contributor

ninjab3s commented Nov 1, 2024

I think your grub args are faulty. Can you provide the following information when you get a chance?

  • output of lscpu -p
  • What are the cores for housekeeping and isolation?
  • How do you use cpuset to start the gnb?
  • Did you configure any cgroups (created your own or systemd slices)?

@aibtw
Author

aibtw commented Nov 6, 2024

Here is the output of lscpu -p:

~$ lscpu -p
# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID
# starting usually from zero.
# CPU,Core,Socket,Node,,L1d,L1i,L2,L3
0,0,0,0,,0,0,0,0
1,0,0,0,,0,0,0,0
2,1,0,0,,4,4,1,0
3,1,0,0,,4,4,1,0
4,2,0,0,,8,8,2,0
5,2,0,0,,8,8,2,0
6,3,0,0,,12,12,3,0
7,3,0,0,,12,12,3,0
8,4,0,0,,16,16,4,0
9,4,0,0,,16,16,4,0
10,5,0,0,,20,20,5,0
11,5,0,0,,20,20,5,0
12,6,0,0,,24,24,6,0
13,6,0,0,,24,24,6,0
14,7,0,0,,28,28,7,0
15,7,0,0,,28,28,7,0
16,8,0,0,,32,32,8,0
17,9,0,0,,33,33,8,0
18,10,0,0,,34,34,8,0
19,11,0,0,,35,35,8,0
20,12,0,0,,36,36,9,0
21,13,0,0,,37,37,9,0
22,14,0,0,,38,38,9,0
23,15,0,0,,39,39,9,0
24,16,0,0,,40,40,10,0
25,17,0,0,,41,41,10,0
26,18,0,0,,42,42,10,0
27,19,0,0,,43,43,10,0
28,20,0,0,,44,44,11,0
29,21,0,0,,45,45,11,0
30,22,0,0,,46,46,11,0
31,23,0,0,,47,47,11,0

I have hyperthreading enabled, if that matters.
Regarding the rest of the questions:

  • In the configs shared above, I use cores 1 to 18 for isolation and the rest for housekeeping.
  • "How do you use cpuset to start the gnb?": Sorry, but I don't get the question. I just use the expert_execution section in the srsRAN config. I assumed that this would handle the cgroups and cpuset, based on the code I have seen in ./srsRAN_Project/lib/support/sysinfo.cpp. However, I noticed on one system that I must run sudo sh -c 'echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control' or otherwise it won't work. I don't have this issue on the system I am currently using (described in the main issue).
  • No, I didn't configure my own cgroups.
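
The isolation/housekeeping split described above can be checked mechanically against the posted configuration. This is a minimal Python sketch under stated assumptions: `expand_cpu_list` and `pinned_outside` are hypothetical helpers, the 32 logical CPUs come from the lscpu -p output, and the check only asks whether every pinned CPU lies inside the isolated set. It says nothing about how the gNB interprets these options.

```python
def expand_cpu_list(spec: str) -> set:
    """Expand a CPU list such as '0,19-31' into a set of integers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


ALL_CPUS = set(range(32))            # 32 logical CPUs per the lscpu -p output
ISOLATED = expand_cpu_list("1-18")   # isolated_cpus / isolcpus from the issue
HOUSEKEEPING = ALL_CPUS - ISOLATED

# CPU lists copied from the expert_execution section of the issue.
PINNED = {
    "l1_dl_cpus": "2,3",
    "l1_ul_cpus": "4,5",
    "l2_cell_cpus": "6,7",
    "ru_cpus": "10,11,12,13",
    "low_priority_cpus": "8,9",
}


def pinned_outside(isolated: set, pinned: dict) -> dict:
    """Return, per option, the pinned CPUs that are not in the isolated set."""
    result = {}
    for name, spec in pinned.items():
        stray = expand_cpu_list(spec) - isolated
        if stray:
            result[name] = sorted(stray)
    return result


if __name__ == "__main__":
    print("housekeeping:", sorted(HOUSEKEEPING))
    print("pinned outside isolated set:", pinned_outside(ISOLATED, PINNED))
```

For the values in this issue the housekeeping set comes out as CPU 0 plus CPUs 19-31, which matches kthread_cpus=0,19-31 on the kernel command line, and no pinned CPU falls outside 1-18.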

@ninjab3s
Contributor

ninjab3s commented Nov 7, 2024

I am not an expert on using isolcpus. We don't use it, so I cannot verify your setup over here. In general, I would avoid using the isolated_cpus option in the gnb config. It creates two cgroups and moves all OS processes to one of them. It's less error-prone if you do this yourself. Here are some ideas I would address, looking at your config:

  1. Make sure that CPU 1 is not included in the isolated cores. It shares a physical core with CPU 0, and you want to avoid having the two in different CPU groups.
  2. Try running the gnb like this: sudo chrt -r 1 gnb -c config.yaml. You need to have a cgroup in place in order to deploy on all cores, or you can use taskset instead: sudo taskset -c <isolated cores> chrt -r 1 gnb -c config.yaml.

In general I would recommend creating a dedicated cgroup for your application or using systemd slices, but avoid using the isolated_cpus config parameter.
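
The first point, that CPU 0 and CPU 1 are hyper-threads of the same physical core, can be read off the lscpu -p output posted earlier. A minimal Python sketch that finds every physical core straddling the isolation boundary follows; the helper names are hypothetical and `SAMPLE` contains only a few rows of the posted output.

```python
from collections import defaultdict


def cores_from_lscpu(text: str) -> dict:
    """Map physical core id -> sorted logical CPU ids from `lscpu -p` output."""
    cores = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split(",")
        cpu, core = int(fields[0]), int(fields[1])
        cores[core].append(cpu)
    return {core: sorted(cpus) for core, cpus in cores.items()}


def split_cores(cores: dict, isolated: set) -> dict:
    """Return cores with logical CPUs on both sides of the isolation boundary."""
    return {core: cpus for core, cpus in cores.items()
            if 0 < len(isolated.intersection(cpus)) < len(cpus)}


# A few rows of the lscpu -p output from this thread (CPU,Core,...).
SAMPLE = """\
# CPU,Core,Socket,Node,,L1d,L1i,L2,L3
0,0,0,0,,0,0,0,0
1,0,0,0,,0,0,0,0
2,1,0,0,,4,4,1,0
3,1,0,0,,4,4,1,0
16,8,0,0,,32,32,8,0
17,9,0,0,,33,33,8,0
"""

if __name__ == "__main__":
    print(split_cores(cores_from_lscpu(SAMPLE), set(range(1, 19))))
```

With isolated CPUs 1-18 the only split core is core 0 (CPUs 0 and 1). Moving the boundary to 2-18 leaves no split core in the posted topology, since CPUs 2-15 come in sibling pairs and CPUs 16-18 each sit on their own core.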

@aibtw
Author

aibtw commented Nov 13, 2024

Thanks. I assumed that since the isolated_cpus option was available in the gNB config, it would be the best way of doing it, which is why I opened an issue. If I understand you correctly, you suggest creating and managing the cgroups manually. I will try those suggestions when I can and see how it goes.
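
For reference, managing a cpuset cgroup by hand on cgroup v2 comes down to a handful of writes under /sys/fs/cgroup. The Python sketch below only builds the list of steps as a dry run and writes nothing. The group name `gnb`, the CPU range `2-18`, and the PID are hypothetical placeholders, and this is not a description of what the isolated_cpus option does internally. On a systemd-managed host, a slice with an allowed-CPUs setting may be the safer route, as suggested above.

```python
from pathlib import PurePosixPath

CGROUP_ROOT = PurePosixPath("/sys/fs/cgroup")  # usual cgroup v2 mount point


def cpuset_plan(group: str, cpus: str, pid: int) -> list:
    """Return the ordered steps to confine `pid` to `cpus` in a cgroup v2 group.

    Nothing is written here; the caller decides whether to apply the plan.
    """
    base = CGROUP_ROOT / group
    return [
        # Enable the cpuset controller for child groups of the root.
        ("write", str(CGROUP_ROOT / "cgroup.subtree_control"), "+cpuset"),
        # Create the dedicated group directory.
        ("mkdir", str(base), ""),
        # Restrict the group to the chosen CPUs.
        ("write", str(base / "cpuset.cpus"), cpus),
        # Move the process into the group.
        ("write", str(base / "cgroup.procs"), str(pid)),
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for op, path, value in cpuset_plan("gnb", "2-18", 12345):
        print(op, path, value)
```

The first step corresponds to the `echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control` command mentioned earlier in the thread.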
