码农行者 码农行者
首页
  • Python

    • 语言特性
    • Django相关
    • Tornado
    • Celery
  • Golang

    • golang学习笔记
    • 对比python学习go
    • 模块学习
  • JavaScript

    • Javascript
  • 数据结构预算法笔记
  • ATS
  • Mongodb
  • Git
云原生
运维
垃圾佬的快乐
  • 数据库
  • 机器学习
  • 杂谈
  • 面试
关于
收藏
  • 分类
  • 标签
  • 归档
GitHub (opens new window)

DeanWu

软件工程师
首页
  • Python

    • 语言特性
    • Django相关
    • Tornado
    • Celery
  • Golang

    • golang学习笔记
    • 对比python学习go
    • 模块学习
  • JavaScript

    • Javascript
  • 数据结构预算法笔记
  • ATS
  • Mongodb
  • Git
云原生
运维
垃圾佬的快乐
  • 数据库
  • 机器学习
  • 杂谈
  • 面试
关于
收藏
  • 分类
  • 标签
  • 归档
GitHub (opens new window)
  • 记一次 Linux OOM-killer 分析过程

    • 问题现象
      • 分析
        • 优化方案
          • 参考
          • 运维
          • 常见问题记录
          DeanWu
          2018-11-28
          目录

          记一次 Linux OOM-killer 分析过程

          # 问题现象

          最近发现线上服务的服务节点经常性的挂掉,我们业务的线上服务使用docker跑的Tomcat服务。服务业务现象收集如下:

          • tomcat的相关日志,并没有明显的报错信息,但是所有日志均停在了同一时刻。
          • docker容器是活着的,但是Java进程挂了。
          • 重新启动服务后,内存占用比较高。服务器内存配置16G,jvm Xmx8g Xmn4g。
          • 磁盘IO比较高。

          作为运维,服务有没有内存溢出等问题暂且放一边,先从服务器这边排查下问题。首先怀疑是资源占用比较高,系统把服务干掉了。查看系统日志 message 发现如下日志,证实了猜想:

          Nov 27 21:02:21 localhost kernel: java invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
          Nov 27 21:02:21 localhost kernel: java cpuset=aa5075722ccf791119d698c46ae14d51d2d2d45d479eee43fda347b6df0f5773 mems_allowed=0-1
          Nov 27 21:02:21 localhost kernel: CPU: 15 PID: 42005 Comm: java Tainted: G            -------------- T 3.10.0-229.el7.x86_64 #1
          Nov 27 21:02:21 localhost kernel: Hardware name: Huawei RH1288A V2/BC11SRSK0, BIOS RMIBV512 08/27/2015
          Nov 27 21:02:21 localhost kernel: ffff880287fee660 000000005270c3fe ffff880471377a58 ffffffff81604b0a
          Nov 27 21:02:21 localhost kernel: ffff880471377ae8 ffffffff815ffaaf ffff8803428abf30 ffff8803428abf48
          Nov 27 21:02:21 localhost kernel: 0000000000000206 ffff880287fee660 ffff880471377ad0 ffffffff81117aef
          Nov 27 21:02:21 localhost kernel: Call Trace:
          Nov 27 21:02:21 localhost kernel: [<ffffffff81604b0a>] dump_stack+0x19/0x1b
          Nov 27 21:02:21 localhost kernel: [<ffffffff815ffaaf>] dump_header+0x8e/0x214
          Nov 27 21:02:21 localhost kernel: [<ffffffff81117aef>] ? delayacct_end+0x8f/0xb0
          Nov 27 21:02:21 localhost kernel: [<ffffffff8115a44e>] oom_kill_process+0x24e/0x3b0
          Nov 27 21:02:21 localhost kernel: [<ffffffff81159fb6>] ? find_lock_task_mm+0x56/0xc0
          Nov 27 21:02:21 localhost kernel: [<ffffffff8115ac76>] out_of_memory+0x4b6/0x4f0
          Nov 27 21:02:21 localhost kernel: [<ffffffff81160e55>] __alloc_pages_nodemask+0xa95/0xb90
          Nov 27 21:02:21 localhost kernel: [<ffffffff811a29da>] alloc_pages_vma+0x9a/0x140
          Nov 27 21:02:21 localhost kernel: [<ffffffff81182fb7>] handle_mm_fault+0x9f7/0xd70
          Nov 27 21:02:21 localhost kernel: [<ffffffff810a94f6>] ? try_to_wake_up+0x1b6/0x280
          Nov 27 21:02:21 localhost kernel: [<ffffffff810a95e3>] ? wake_up_process+0x23/0x40
          Nov 27 21:02:21 localhost kernel: [<ffffffff8160fe06>] __do_page_fault+0x156/0x540
          Nov 27 21:02:21 localhost kernel: [<ffffffff812e3247>] ? call_rwsem_wake+0x17/0x30
          Nov 27 21:02:21 localhost kernel: [<ffffffff8109c42d>] ? up_write+0x1d/0x20
          Nov 27 21:02:21 localhost kernel: [<ffffffff810faec6>] ? __audit_syscall_exit+0x1f6/0x2a0
          Nov 27 21:02:21 localhost kernel: [<ffffffff8161020a>] do_page_fault+0x1a/0x70
          Nov 27 21:02:21 localhost kernel: [<ffffffff8160c408>] page_fault+0x28/0x30
          Nov 27 21:02:21 localhost kernel: Mem-Info:
          Nov 27 21:02:21 localhost kernel: Node 0 DMA per-cpu:
          Nov 27 21:02:21 localhost kernel: CPU    0: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    1: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    2: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    3: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    4: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    5: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    6: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    7: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    8: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    9: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   10: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   11: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   12: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   13: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   14: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   15: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   16: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   17: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   18: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   19: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   20: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   21: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   22: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   23: hi:    0, btch:   1 usd:   0
          Nov 27 21:02:21 localhost kernel: Node 0 DMA32 per-cpu:
          Nov 27 21:02:21 localhost kernel: CPU    0: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    1: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    2: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    3: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    4: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    5: hi:  186, btch:  31 usd:   2
          Nov 27 21:02:21 localhost kernel: CPU    6: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    7: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    8: hi:  186, btch:  31 usd:   3
          Nov 27 21:02:21 localhost kernel: CPU    9: hi:  186, btch:  31 usd:   1
          Nov 27 21:02:21 localhost kernel: CPU   10: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   11: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   12: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   13: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   14: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   15: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   16: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   17: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   18: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   19: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   20: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   21: hi:  186, btch:  31 usd:  12
          Nov 27 21:02:21 localhost kernel: CPU   22: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   23: hi:  186, btch:  31 usd:  15
          Nov 27 21:02:21 localhost kernel: Node 0 Normal per-cpu:
          Nov 27 21:02:21 localhost kernel: CPU    0: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    1: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    2: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    3: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    4: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    5: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    6: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    7: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    8: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    9: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   10: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   11: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   12: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   13: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   14: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   15: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   16: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   17: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   18: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   19: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   20: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   21: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   22: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   23: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: Node 1 Normal per-cpu:
          Nov 27 21:02:21 localhost kernel: CPU    0: hi:  186, btch:  31 usd:  46
          Nov 27 21:02:21 localhost kernel: CPU    1: hi:  186, btch:  31 usd:   2
          Nov 27 21:02:21 localhost kernel: CPU    2: hi:  186, btch:  31 usd:  42
          Nov 27 21:02:21 localhost kernel: CPU    3: hi:  186, btch:  31 usd:   2
          Nov 27 21:02:21 localhost kernel: CPU    4: hi:  186, btch:  31 usd:  31
          Nov 27 21:02:21 localhost kernel: CPU    5: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU    6: hi:  186, btch:  31 usd:  16
          Nov 27 21:02:21 localhost kernel: CPU    7: hi:  186, btch:  31 usd: 164
          Nov 27 21:02:21 localhost kernel: CPU    8: hi:  186, btch:  31 usd:  19
          Nov 27 21:02:21 localhost kernel: CPU    9: hi:  186, btch:  31 usd:  18
          Nov 27 21:02:21 localhost kernel: CPU   10: hi:  186, btch:  31 usd:   8
          Nov 27 21:02:21 localhost kernel: CPU   11: hi:  186, btch:  31 usd:  20
          Nov 27 21:02:21 localhost kernel: CPU   12: hi:  186, btch:  31 usd:  31
          Nov 27 21:02:21 localhost kernel: CPU   13: hi:  186, btch:  31 usd:  27
          Nov 27 21:02:21 localhost kernel: CPU   14: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   15: hi:  186, btch:  31 usd:   4
          Nov 27 21:02:21 localhost kernel: CPU   16: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   17: hi:  186, btch:  31 usd:  30
          Nov 27 21:02:21 localhost kernel: CPU   18: hi:  186, btch:  31 usd:  30
          Nov 27 21:02:21 localhost kernel: CPU   19: hi:  186, btch:  31 usd:   0
          Nov 27 21:02:21 localhost kernel: CPU   20: hi:  186, btch:  31 usd:   3
          Nov 27 21:02:21 localhost kernel: CPU   21: hi:  186, btch:  31 usd:  24
          Nov 27 21:02:21 localhost kernel: CPU   22: hi:  186, btch:  31 usd:  12
          Nov 27 21:02:21 localhost kernel: CPU   23: hi:  186, btch:  31 usd:  18
          Nov 27 21:02:21 localhost kernel: active_anon:2724567 inactive_anon:177005 isolated_anon:0
          Nov 27 21:02:21 localhost kernel: Node 0 DMA free:15900kB min:88kB low:108kB high:132kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
          Nov 27 21:02:21 localhost kernel: lowmem_reserve[]: 0 1764 7766 7766
          Nov 27 21:02:21 localhost kernel: Node 0 DMA32 free:34556kB min:10040kB low:12548kB high:15060kB active_anon:1612600kB inactive_anon:65356kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052096kB managed:1808016kB mlocked:0kB dirty:0kB writeback:0kB mapped:2344kB shmem:72780kB slab_reclaimable:39568kB slab_unreclaimable:23940kB kernel_stack:8096kB pagetables:5680kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
          Nov 27 21:02:21 localhost kernel: lowmem_reserve[]: 0 0 6002 6002
          Nov 27 21:02:21 localhost kernel: Node 0 Normal free:33824kB min:34160kB low:42700kB high:51240kB active_anon:5423868kB inactive_anon:20808kB active_file:20kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:6291456kB managed:6146484kB mlocked:0kB dirty:0kB writeback:0kB mapped:5364kB shmem:18288kB slab_reclaimable:108572kB slab_unreclaimable:85084kB kernel_stack:26192kB pagetables:14532kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:114 all_unreclaimable? yes
          Nov 27 21:02:21 localhost kernel: lowmem_reserve[]: 0 0 0 0
          Nov 27 21:02:21 localhost kernel: Node 1 Normal free:46612kB min:45816kB low:57268kB high:68724kB active_anon:3861800kB inactive_anon:621856kB active_file:340kB inactive_file:40kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:8388608kB managed:8243396kB mlocked:0kB dirty:0kB writeback:132kB mapped:3756kB shmem:993724kB slab_reclaimable:120776kB slab_unreclaimable:139408kB kernel_stack:26976kB pagetables:18120kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2643 all_unreclaimable? yes
          Nov 27 21:02:21 localhost kernel: lowmem_reserve[]: 0 0 0 0
          Nov 27 21:02:21 localhost kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15900kB
          Nov 27 21:02:21 localhost kernel: Node 0 DMA32: 742*4kB (UEMR) 396*8kB (UEM) 344*16kB (UEM) 407*32kB (UEMR) 151*64kB (UEMR) 5*128kB (UMR) 0*256kB 1*512kB (R) 0*1024kB 0*2048kB 0*4096kB = 35480kB
          Nov 27 21:02:21 localhost kernel: Node 0 Normal: 1015*4kB (UEMR) 599*8kB (UEM) 586*16kB (UEM) 376*32kB (UEM) 99*64kB (UEM) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 36724kB
          Nov 27 21:02:21 localhost kernel: Node 1 Normal: 2725*4kB (UEMR) 1187*8kB (UEM) 460*16kB (UEM) 339*32kB (UEM) 106*64kB (UEM) 7*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46284kB
          Nov 27 21:02:21 localhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
          Nov 27 21:02:21 localhost kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
          Nov 27 21:02:21 localhost kernel: 272043 total pagecache pages
          Nov 27 21:02:21 localhost kernel: 0 pages in swap cache
          Nov 27 21:02:21 localhost kernel: Swap cache stats: add 17671445, delete 17671445, find 29162461/30280216
          Nov 27 21:02:21 localhost kernel: Free swap  = 0kB
          Nov 27 21:02:21 localhost kernel: Total swap = 0kB
          Nov 27 21:02:21 localhost kernel: 4187036 pages RAM
          Nov 27 21:02:21 localhost kernel: 0 pages HighMem/MovableOnly
          Nov 27 21:02:21 localhost kernel: 133587 pages reserved
          Nov 27 21:02:21 localhost kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
          Nov 27 21:02:21 localhost kernel: [ 1183]     0  1183    14046     2536      29        0             0 systemd-journal
          Nov 27 21:02:21 localhost kernel: [ 1194]     0  1194    28172       64      21        0             0 lvmetad
          Nov 27 21:02:21 localhost kernel: [ 1249]     0  1249    10682      189      22        0         -1000 systemd-udevd
          Nov 27 21:02:21 localhost kernel: [ 2094]     0  2094    29177      123      25        0         -1000 auditd
          Nov 27 21:02:21 localhost kernel: [ 2121]   997  2121     1084       30       7        0             0 lsmd
          Nov 27 21:02:21 localhost kernel: [ 2124]     0  2124    53614      445      59        0             0 abrtd
          Nov 27 21:02:21 localhost kernel: [ 2126]     0  2126    32512      129      21        0             0 smartd
          Nov 27 21:02:21 localhost kernel: [ 2127]     0  2127   185139     2193     132        0             0 rsyslogd
          Nov 27 21:02:21 localhost kernel: [ 2129]     0  2129    52996      340      56        0             0 abrt-watch-log
          Nov 27 21:02:21 localhost kernel: [ 2130]     0  2130     4817       90      14        0             0 irqbalance
          Nov 27 21:02:21 localhost kernel: [ 2131]     0  2131     1093       40       8        0             0 rngd
          Nov 27 21:02:21 localhost kernel: [ 2132]     0  2132     8737      163      21        0             0 systemd-logind
          Nov 27 21:02:21 localhost kernel: [ 2133]    81  2133     7224      180      20        0          -900 dbus-daemon
          Nov 27 21:02:21 localhost kernel: [ 2534]     0  2534     6483       51      16        0             0 atd
          Nov 27 21:02:21 localhost kernel: [ 2536]     0  2536     6794       59      18        0             0 xinetd
          Nov 27 21:02:21 localhost kernel: [ 3458]     0  3458     9840      126      22        0         -1000 sshd
          Nov 27 21:02:21 localhost kernel: [18475]     0 18475    27501       32      10        0             0 agetty
          Nov 27 21:02:21 localhost kernel: [226241]   999 226241   129017     2131      48        0             0 polkitd
          Nov 27 21:02:21 localhost kernel: [226841]    38 226841     8371      130      20        0             0 ntpd
          Nov 27 21:02:21 localhost kernel: [17993]     0 17993     2838       50      11        0             0 init.sh
          Nov 27 21:02:21 localhost kernel: [18033]     0 18033   271311     4160      94        0             0 python2.6
          Nov 27 21:02:21 localhost kernel: [18034]     0 18034    11302     6761      28        0             0 python2.6
          Nov 27 21:02:21 localhost kernel: [18042]     0 18042     1038       23       8        0             0 tail
          Nov 27 21:02:21 localhost kernel: [133984]     0 133984  1556394     7033     190        0             0 scribed
          Nov 27 21:02:21 localhost kernel: [41077]    99 41077     3880       48      12        0             0 dnsmasq
          Nov 27 21:02:21 localhost kernel: [136214]     0 136214    27931      275      53        0             0 lldpd
          Nov 27 21:02:21 localhost kernel: [136217]   990 136217    27931      260      52        0             0 lldpd
          Nov 27 21:02:21 localhost kernel: [179346] 60422 179346    81770     2557      38        0             0 wagent
          Nov 27 21:02:21 localhost kernel: [184527]     0 184527    52460     2085      57        0             0 python
          Nov 27 21:02:21 localhost kernel: [184648]     0 184648    49745     1409      53        0             0 python
          Nov 27 21:02:21 localhost kernel: [184670]     0 184670    47515     1281      49        0             0 python
          Nov 27 21:02:21 localhost kernel: [184696]     0 184696    47514     1278      51        0             0 python
          Nov 27 21:02:21 localhost kernel: [184718]     0 184718    47514     1283      48        0             0 python
          Nov 27 21:02:21 localhost kernel: [184740]     0 184740    47514     1314      50        0             0 python
          Nov 27 21:02:21 localhost kernel: [184762]     0 184762    47514     1302      51        0             0 python
          Nov 27 21:02:21 localhost kernel: [56906]     0 56906   505000     6523     101        0             0 xxxAgent
          Nov 27 21:02:21 localhost kernel: [38493]     0 38493   347897     6702      90        0          -500 dockerd
          Nov 27 21:02:21 localhost kernel: [38776]     0 38776    31576      157      18        0             0 crond
          Nov 27 21:02:21 localhost kernel: [39783]     0 39783   118293      662      24        0          -500 docker-containe
          Nov 27 21:02:21 localhost kernel: [39805]     0 39805     3761       55      12        0             0 docker.sh
          Nov 27 21:02:21 localhost kernel: [39855]     0 39855  6841020  2555504    7788        0             0 java
          Nov 27 21:02:21 localhost kernel: [48691]     0 48691    48225     1480      52        0             0 python
          Nov 27 21:02:21 localhost kernel: [90430]     0 90430   312213     1081      61        0          -500 docker-containe
          Nov 27 21:02:21 localhost kernel: [68926]     0 68926     5624       62      16        0             0 xxagent
          Nov 27 21:02:21 localhost kernel: [68932]     0 68932    12834     7251      30        0             0 xxagent
          Nov 27 21:02:21 localhost kernel: [70100]     0 70100    42611      220      39        0             0 crond
          Nov 27 21:02:21 localhost kernel: [70103]     0 70103    42611      220      39        0             0 crond
          Nov 27 21:02:21 localhost kernel: [70176]     0 70176    28279       43      15        0             0 sh
          Nov 27 21:02:21 localhost kernel: [70177]     0 70177    28279       42      13        0             0 sh
          Nov 27 21:02:21 localhost kernel: [70182]     0 70182    28279       64      13        0             0 bash
          Nov 27 21:02:21 localhost kernel: [70186]     0 70186    28279       43      12        0             0 bash
          Nov 27 21:02:21 localhost kernel: [70205]     0 70205    32413       93      21        0             0 perl
          Nov 27 21:02:21 localhost kernel: [70357]     0 70357    28279       63      11        0             0 bash
          Nov 27 21:02:21 localhost kernel: [70718]     0 70718     1143       81       7        0             0 gzip
          Nov 27 21:02:21 localhost kernel: Out of memory: Kill process 39855 (java) score 632 or sacrifice child
          Nov 27 21:02:21 localhost kernel: Killed process 39855 (java) total-vm:27364080kB, anon-rss:10222016kB, file-rss:0kB
          
          1
          2
          3
          4
          5
          6
          7
          8
          9
          10
          11
          12
          13
          14
          15
          16
          17
          18
          19
          20
          21
          22
          23
          24
          25
          26
          27
          28
          29
          30
          31
          32
          33
          34
          35
          36
          37
          38
          39
          40
          41
          42
          43
          44
          45
          46
          47
          48
          49
          50
          51
          52
          53
          54
          55
          56
          57
          58
          59
          60
          61
          62
          63
          64
          65
          66
          67
          68
          69
          70
          71
          72
          73
          74
          75
          76
          77
          78
          79
          80
          81
          82
          83
          84
          85
          86
          87
          88
          89
          90
          91
          92
          93
          94
          95
          96
          97
          98
          99
          100
          101
          102
          103
          104
          105
          106
          107
          108
          109
          110
          111
          112
          113
          114
          115
          116
          117
          118
          119
          120
          121
          122
          123
          124
          125
          126
          127
          128
          129
          130
          131
          132
          133
          134
          135
          136
          137
          138
          139
          140
          141
          142
          143
          144
          145
          146
          147
          148
          149
          150
          151
          152
          153
          154
          155
          156
          157
          158
          159
          160
          161
          162
          163
          164
          165
          166
          167
          168
          169
          170
          171
          172
          173
          174
          175
          176
          177
          178
          179
          180
          181
          182
          183
          184
          185
          186
          187
          188
          189
          190
          191
          192
          193
          194
          195
          196
          197
          198
          199
          200
          201
          202
          203
          204
          205
          206

          该日志可以说明 Java进程是被系统kill的了。下面我们来具体看下是怎么回事?

          # 分析

          Linux 系统有一种自我保护的机制叫做 OOM-killer,即当物理内存被进程使用完后,当再有进程来请求内存时,它就会把当前进程中最占用内存最多,回收内存收益最大的进程给kill掉来回收内存,保证系统的正常运行。

          从上边的日志可以看到如下几行:

          Nov 27 21:02:21 localhost kernel: Out of memory: Kill process 39855 (java) score 632 or sacrifice child
          Nov 27 21:02:21 localhost kernel: Killed process 39855 (java) total-vm:27364080kB, anon-rss:10222016kB, file-rss:0kB
          
          1
          2

          从这个日志中,我们可以看到 java进程 39855 因为Out of memory 被系统kill了。下边一行展示了当时他的内存占用情况,虚拟内存 27G左右,anon-rss(虚拟内存页,大家可理解为内存单元,每个大小为4k,一般为进程占用)10G左右,file-rss(文件内存也,当打开大文件时占用)0。

          来看上边各进程内存占用情况,找到java 进程 rss 25555044k=10G 左右,此进程占用内存最大。其次有5个在6000+, 56000*4k=1.2G, 剩余进程大约在3~4g,正好为系统的16G。

          当没有空余的内存时,触发OOM-killer机制,java进程占用内存最大,就被系统kill了。由于Java进程非正常退出,docker容器没有退出,但是已经不再提供服务了。

          # 优化方案

          方案一,降低Java虚拟机配置的最大内存或增加机器物理内存

          在不影响服务的情况下,降低Java虚拟机配置的最大内存,从而降低整体内存的占用情况,虽然内存占用可能依然较高,但是整体会有所下降,触发OOM-killer机制的几率会减少。

          方案二,修改进程得分或关闭OOM-killer机制

          系统在选择进程来kill的时候,会根据某种算法计算出一个介于[-17,15]之间的数值,保存在 /proc/[pid]/oom_adj中,得分越高表示选中的可能性越大。-17表示禁止kill。我们可以修改这个得分来设置我们的进程不被kill。

          $ echo -17 > /proc/$(pidof java)/oom_adj
          
          1

          也可采用禁止OOM-killer机制的方法。达到禁止kill的目的。如下:

          # sysctl -w vm.overcommit_memory=2
          
          1

          此方案,可能会因进程占用大量内存而没有被释放,导致系统卡死,其他操作无法处理,生产环境不建议使用。

          总之,预估好服务峰值时消耗的内存和机器的物理内存之间的关系是关键。

          # 参考

          • 与Linux OOM-killer的第一次亲密接触 (opens new window)
          • https://my.oschina.net/u/1998527/blog/903622 (opens new window)
          • https://blog.csdn.net/gugemichael/article/details/24017515 (opens new window)
          • http://www.cnblogs.com/GoodGoodWorkDayDayUp/p/3473348.html (opens new window)
          #Linux#运维知识库
          上次更新: 2023/03/19, 15:09:33
          最近更新
          01
          chromebox/chromebook 刷bios步骤
          03-01
          02
          redis 集群介绍
          11-28
          03
          go语法题二
          10-09
          更多文章>
          Theme by Vdoing | Copyright © 2015-2024 DeanWu | 遵循CC 4.0 BY-SA版权协议
          • 跟随系统
          • 浅色模式
          • 深色模式
          • 阅读模式