
MongoDB keeps exiting abnormally with errno:24 Too many open files, help wanted

  •   comwrg · 129 days ago · 6972 views
    This topic was created 129 days ago; the information in it may have developed or changed since then.

    Partial log

    2019-08-01T23:59:02.301+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:02.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:03.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:03.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:04.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:04.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:05.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:05.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:06.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:06.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:07.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:07.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:08.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:08.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:09.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:09.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:10.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:10.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:11.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:11.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:12.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:12.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:13.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:13.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:14.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:14.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:15.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:15.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:16.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:16.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:17.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:17.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:18.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:18.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:19.295+0800 W NETWORK  [HostnameCanonicalizationWorker] Failed to obtain address information for hostname iZuf61zao4uxbprumx45dlZ: System error
    2019-08-01T23:59:19.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:19.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:20.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:20.305+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:21.305+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:21.305+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:22.305+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:22.305+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:23.305+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
    2019-08-01T23:59:23.305+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
    2019-08-01T23:59:23.631+0800 E STORAGE  [thread2] WiredTiger (24) [1564675163:631372][9783:0x7f4e30730700], file:WiredTiger.wt, WT_SESSION.checkpoint: /var/lib/mongodb/WiredTiger.turtle: handle-open: open: Too many open files
    2019-08-01T23:59:23.632+0800 E STORAGE  [thread2] WiredTiger (24) [1564675163:632761][9783:0x7f4e30730700], checkpoint-server: checkpoint server error: Too many open files
    2019-08-01T23:59:23.632+0800 E STORAGE  [thread2] WiredTiger (-31804) [1564675163:632802][9783:0x7f4e30730700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
    2019-08-01T23:59:23.632+0800 I -        [thread2] Fatal Assertion 28558
    2019-08-01T23:59:23.632+0800 I -        [thread2] 
    
    ***aborting after fassert() failure
    
    
    2019-08-01T23:59:23.638+0800 F -        [thread2] Got signal: 6 (Aborted).
    
    ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 31862
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 65535
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 31862
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    

    I have set fs.file-max = 2097152 in sysctl.conf.

    It crashes every day, and I really cannot figure out the root cause.

    Addendum 1  ·  129 days ago
    mongod --version
    db version v3.2.11
    git version: 009580ad490190ba33d1c6253ebd8d91808923e4
    OpenSSL version: OpenSSL 1.0.2s  28 May 2019
    allocator: tcmalloc
    modules: none
    build environment:
        distarch: x86_64
        target_arch: x86_64
    
    20 replies  |  until 2019-08-03 13:09:09 +08:00
        1
    KYLINZZ   129 days ago   ♥ 1
        2
    auser   129 days ago   ♥ 1
    I suggest looking at the /proc/PID/limits file to see how many FDs the process can actually open.
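
A minimal sketch of that check, assuming a Linux host where `/proc/<pid>/limits` is readable. The helper names `parse_max_open_files` and `read_limits` are made up for illustration; they are not part of any MongoDB tooling.

```python
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) from the 'Max open files' row, or None if absent."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # The rest of the row is: soft limit, hard limit, units.
            fields = line[len(prefix):].split()
            return fields[0], fields[1]
    return None


def read_limits(pid: int) -> str:
    """Read the limits file of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return f.read()
```

Something like `parse_max_open_files(read_limits(pid))` with the mongod PID would then show the limit the running process really has, which can differ from what `ulimit -a` prints in a login shell.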
        3
    comwrg   129 days ago
    @auser


    ```
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            8388608              unlimited            bytes
    Max core file size        0                    unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             64000                64000                processes
    Max open files            64000                64000                files
    Max locked memory         unlimited            unlimited            bytes
    Max address space         unlimited            unlimited            bytes
    Max file locks            unlimited            unlimited            locks
    Max pending signals       31862                31862                signals
    Max msgqueue size         819200               819200               bytes
    Max nice priority         0                    0
    Max realtime priority     0                    0
    Max realtime timeout      unlimited            unlimited            us
    ```

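    I had a look and it seems fine.
        4
    auser   129 days ago   ♥ 1
    @comwrg Check the number of TCP connections with ss or netstat, and see whether the mongodb process has too many of them. If it does, use the TCP state of those connections to narrow down where the problem is and what exactly is exhausting the file descriptors. One example would be a denial-of-service attack that opens a large number of empty TCP connections.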
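
One rough way to do that tally, assuming the text comes from `ss -tan` and that mongod listens on its default port 27017 (the thread does not say which port is in use). `count_tcp_states` is a hypothetical helper name.

```python
from collections import Counter


def count_tcp_states(ss_output: str, port: int = 27017) -> Counter:
    """Count connections per TCP state whose local port equals `port`."""
    counts: Counter = Counter()
    for line in ss_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        # Assumed columns: State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port
        state, local = fields[0], fields[3]
        if local.rsplit(":", 1)[-1] == str(port):
            counts[state] += 1
    return counts
```

A large share of CLOSE-WAIT entries would usually point at one side not closing its sockets, while a flood of connections from unknown peers would point at outside traffic.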

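    Each network connection takes up one file descriptor (fd), and opening a file for reading or writing takes one as well. Judging from the error log, the first error is the file descriptors running out, so new network connections cannot get an fd and accept (the system call that accepts new network connections) fails. That case is tolerable. But for a database, when files cannot be written to disk and data cannot be persisted, crashing on purpose is the right thing to do.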

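    As for the OP's problem, I think it is quite likely that some frequently called code path opens files and does not close them after use, so the fds are never released and the limit is eventually reached. The OP should now track down the cause from two directions: the network (as described in the first paragraph) and the /proc/PID/fd/ directory.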
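
A small sketch of the /proc/PID/fd side of that check, assuming Linux and enough privileges to read the target process's fd directory. Both function names are invented for this example.

```python
import os
from collections import Counter
from typing import Iterable, List


def classify_fd_targets(targets: Iterable[str]) -> Counter:
    """Group fd symlink targets into socket / pipe / anon_inode / file."""
    counts: Counter = Counter()
    for target in targets:
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        elif target.startswith("anon_inode:"):
            counts["anon_inode"] += 1
        else:
            counts["file"] += 1  # regular paths such as data or index files
    return counts


def read_fd_targets(pid: int) -> List[str]:
    """Resolve every symlink under /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # the fd was closed between listdir and readlink
    return targets
```

If sockets dominate, the network side is the place to look; if regular files dominate, the paths themselves show which files the process keeps open.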
        5
    est   129 days ago
    You have run out of inodes.
        6
    comwrg   129 days ago
        7
    comwrg   129 days ago
    @KYLINZZ
    The version mentioned there is 2.6.7, which does not match mine, and that bug is rather old too.
        8
    aaa5838769   129 days ago
    This usually means the disk is out of space, or else the inodes are used up.
        9
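    julyclyde   129 days ago   ♥ 1
    Using ulimit or /etc/security/limits.conf to inspect and change the limit is a classic mistake.

    The rlimit of a background service has to be set at the place where the service is started.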
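
To illustrate the point, here is a hedged sketch of a launcher that sets the limit itself, so the value does not depend on any login shell. `run_with_nofile` is a hypothetical wrapper. On a systemd host the equivalent would more likely be the `LimitNOFILE=` directive in the service unit, and how mongod is actually started in this thread is not stated.

```python
import resource
import subprocess
import sys
from typing import List


def run_with_nofile(cmd: List[str], soft: int) -> str:
    """Run cmd with RLIMIT_NOFILE's soft limit set in the child; return stdout."""

    def set_limit() -> None:
        # Runs in the child after fork and before exec, so only the
        # launched process (not the caller) gets the new limit.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        new_soft = soft if hard == resource.RLIM_INFINITY else min(soft, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    result = subprocess.run(
        cmd, preexec_fn=set_limit, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Probe child that just prints the soft limit it inherited.
    probe = [
        sys.executable,
        "-c",
        "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])",
    ]
    print(run_with_nofile(probe, 256))
```

The sketch only adjusts the soft limit up to the existing hard limit; raising the hard limit needs privileges and is left out.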
        10
    bigpigB   129 days ago via Android
    Raise ulimit a bit.
        11
    neverfall   129 days ago
    Opening without ever closing?
    Remember to close.
        12
    comwrg   129 days ago
    @est @aaa5838769 Neither is used up.
        13
    comwrg   129 days ago
    @est @aaa5838769 Neither is the case.
        14
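    comwrg   129 days ago
    @auser Thank you very much 🙏, I have investigated along the lines you suggested.

    I found that mongodb is holding a lot of fds (24135/38839), more than half of the total.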

    ![image]( https://user-images.githubusercontent.com/19854253/62348661-efa26b00-b52f-11e9-80be-b1eef07c061b.png)

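    Could it really be that the project is not closing its connections? The project has been running for several months, though, and only in the last few days has mongo started crashing frequently because the fds run out.
        15
    comwrg   129 days ago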
        16
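    auser   128 days ago via iPhone
    docs.mongodb.com/v3.2/core/index-text/

    I have a vague feeling the problem lies here; my guess is a design issue (abusing the database). I do not know this database, so this is as far as I can help.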
        17
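    comwrg   128 days ago
    @auser OK, thank you very much for the suggestions. I will keep digging into it on my own :)
        18
    ilucio   128 days ago via Android
    Set ulimit to 64000; the official documentation covers this.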
        19
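    auser   128 days ago via iPhone
    @comwrg

    If system load and disk IO are not high,
    just raise the file descriptor limit first.
    Please share the final outcome once you have one.
    The main question is why so many index files get opened.
        20
    comwrg   128 days ago via Android
    @auser Yep, I have already set it to 200000.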