
Inexplicable OOM in pika #2947

Open
banlilin opened this issue Nov 11, 2024 · 6 comments

Comments

@banlilin

Version: 3.3.6
Server configuration: 8 cores, 16 GB RAM, 1 TB SSD
Monitoring shows that memory usage is not very high, and the instance holds only about 100 GB of data.
[screenshot]
There are also about 100 clients.
[screenshot]
Server memory usage got close to 100% and pika was OOM-killed.
[screenshot]
[screenshot]
Monitoring shows the memory usage is not high, but the process's memory kept growing, which eventually led to the OOM.
I haven't found any clue so far.

@Mixficsol
Collaborator

Please confirm whether you are using tcmalloc; if so, periodic memory-release work is needed. Also check whether the Table Cache configuration item is set.
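
For reference, a minimal sketch of what such periodic cleanup can look like for a process linked against gperftools' tcmalloc (the interval and the loop wrapper here are illustrative assumptions, not Pika's actual code path):

```cpp
// Periodically return free pages held in tcmalloc's page heap to the OS.
// Without such a call, RSS can stay high even after objects are freed,
// which looks like a leak in process-level memory monitoring.
#include <gperftools/malloc_extension.h>

#include <chrono>
#include <thread>

void ReleaseFreeMemoryLoop() {
  while (true) {
    std::this_thread::sleep_for(std::chrono::seconds(60));  // interval is illustrative
    MallocExtension::instance()->ReleaseFreeMemory();
  }
}
```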

@chenbt-hz
Collaborator

> Please confirm whether you are using tcmalloc; if so, periodic memory-release work is needed. Also check whether the Table Cache configuration item is set.

You can check whether it is the same problem as #2537 (comment).

@banlilin
Author

banlilin commented Nov 18, 2024

At the time of the OOM, the table reader memory usage was not high, only about 8 GB, QPS was under 100, and there were about 100 connections. The replica OOMed about 20 minutes earlier than the master, and the replica had no client connections in use.
max-cache-files uses the default value of 5000, and the whole instance has about 4000 SST files in total.
Most settings in the configuration file are the defaults:
[screenshot]

What I really want to understand is: apart from the 8+ GB used by the table readers, what else could be occupying memory and pushing the process above 15 GB until it OOMs? (A sketch of how to break that memory down follows the quoted replies below.)

> Please confirm whether you are using tcmalloc; if so, periodic memory-release work is needed. Also check whether the Table Cache configuration item is set.

> You can check whether it is the same problem as #2537 (comment).
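
As a rough way to attribute the remaining memory, here is a sketch that queries standard RocksDB memory properties directly from a DB handle (the property names are standard RocksDB ones; how Pika 3.3.6 surfaces them, if at all, is not assumed here):

```cpp
// Print the main RocksDB memory consumers: memtables, table readers
// (index/filter metadata), and block cache contents.
#include <rocksdb/db.h>

#include <iostream>
#include <string>

void DumpMemoryBreakdown(rocksdb::DB* db) {
  const char* props[] = {
      "rocksdb.cur-size-all-mem-tables",     // active + immutable memtables
      "rocksdb.estimate-table-readers-mem",  // index/filter blocks held by table readers
      "rocksdb.block-cache-usage",           // blocks currently in the block cache
      "rocksdb.block-cache-pinned-usage",    // entries pinned in the block cache
  };
  for (const char* p : props) {
    std::string value;
    if (db->GetProperty(p, &value)) {
      std::cout << p << " = " << value << " bytes\n";
    }
  }
}
```

Comparing the sum of these against the process RSS shows how much is left for allocator overhead, fragmentation, or other buffers.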

@cheniujh
Collaborator

cheniujh commented Nov 22, 2024

Hi!
Please run through the following checks to narrow this down:

  1. Consider whether the OOM is caused by the table cache; we have seen OOMs caused by the table cache before:
    1. Check whether the business keys themselves are large (the key size, not the value) and whether the keys share long common prefixes. If so, the index blocks are usually large, and the table cache (total table reader overhead) will also be large.
    2. Try adjusting the configuration: set cache-index-and-filter-blocks to yes so the table cache contents go into the block cache. The table cache overhead then shares the block cache quota and becomes more controllable; of course, the block cache here needs to be enlarged.
  2. In version 3.3.6 the default upper bound on total memtable size is 10 GB; please try lowering it.

In short: since both the master and the replica OOMed, I suspect background compaction keeps opening more files (stuffing more index and bloom filter blocks into the table cache). I suggest first putting a hard cap on the table cache (per point 1.2 above, by moving it into the block cache); a sketch of the corresponding RocksDB options is shown below.
In fact, with your 16 GB of memory, an 8 GB table cache overhead is already quite significant, and by default the table cache has no hard upper bound (only the max open files limit), so it can easily become the cause of an OOM.
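
To make the two suggestions concrete, here is a sketch of the equivalent settings expressed through the plain RocksDB C++ API (the cache size, write buffer size, and buffer count are illustrative numbers; the mapping onto Pika's pika.conf keys such as cache-index-and-filter-blocks should be checked against the 3.3.6 configuration):

```cpp
#include <rocksdb/cache.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::Options BuildBoundedMemoryOptions() {
  rocksdb::Options options;

  // Point 1.2: put index and filter blocks into the block cache so the
  // table-reader overhead is bounded by the block cache capacity instead
  // of growing with the number of open SST files.
  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_cache = rocksdb::NewLRUCache(4LL << 30);  // 4 GB, illustrative
  table_options.cache_index_and_filter_blocks = true;
  table_options.pin_l0_filter_and_index_blocks_in_cache = true;  // optional: keep L0 metadata resident
  options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_options));

  // Point 2: bound total memtable memory
  // (per-memtable size * number of memtables per column family).
  options.write_buffer_size = 256 << 20;   // 256 MB per memtable, illustrative
  options.max_write_buffer_number = 4;

  return options;
}
```

Capping index and filter blocks inside the block cache trades some extra block-cache churn for a predictable memory ceiling, which matters on a 16 GB host.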
