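While I was at work, one of our developers messaged me on WeChat: one of our Alibaba Cloud servers was slow to log in to, and laggy once you were in.
I logged in right away and checked the processes (ps -aux). This box runs nginx (as a reverse proxy), mysql and, damn, a tomcat too. The spec is low, and it is still running all of these services.
Sure, it's only a test box, but it can't take this kind of abuse. I ran top and, for crying out loud, it took several seconds to show up. Never mind that for now; I wanted to see which service was eating CPU, so I looked at the load (load average: 1.19, 1.39, 1.37).
That's normal for two cores. The spec is: CPU: 2 cores, memory: 4 GB, bandwidth: 20 Mbit/s. The bandwidth is on the generous side because nginx on this box serves the static H5 pages and images for our app.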
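As a rough sanity check on why that load looked normal, here is a minimal Python sketch that divides the load average by the core count. The function names and the 1.0-per-core threshold are my own illustrative choices, not a rule from any tool.

```python
import os


def load_per_core(load_avg, cores):
    """Return the load average divided by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def describe_load(load_avg, cores, busy_threshold=1.0):
    """Classify the load; busy_threshold is an assumed rule of thumb."""
    per_core = load_per_core(load_avg, cores)
    state = "saturated" if per_core >= busy_threshold else "ok"
    return per_core, state


if __name__ == "__main__":
    # Values from this post: 1-minute load 1.19 on a 2-core box.
    print(describe_load(1.19, 2))
    # The same check against the machine this script runs on (Unix only).
    one_minute, _, _ = os.getloadavg()
    print(describe_load(one_minute, os.cpu_count() or 1))
```

With the numbers from this box the per-core load is below 1, which is why CPU did not look like the bottleneck. Load average says nothing about memory pressure, though.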
Then I looked at the nginx error log and found a problem: lots of lines saying 2048 worker_connections are not enough.
Next I checked nginx.conf:
#user nobody;
worker_processes 3;
#error_log logs/error.log;
#error_log logs/error.log notice;
#error_log logs/error.log info;
#pid logs/nginx.pid;
events {
use epoll;
worker_connections 2048;
}
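To put that config in numbers, here is a small Python sketch of the usual capacity estimate: worker_processes times worker_connections, halved when nginx is proxying, because the connection to the upstream counts against the same limit. The function name is hypothetical and the halving is only an approximation.

```python
def max_clients(worker_processes, worker_connections, reverse_proxy=False):
    """Estimate how many clients nginx can serve at once.

    worker_connections is a per-worker limit that covers every connection
    the worker holds, including connections to proxied upstreams.
    """
    total = worker_processes * worker_connections
    # Assumption: each proxied client uses about two connections
    # (client side plus upstream side).
    return total // 2 if reverse_proxy else total


if __name__ == "__main__":
    print(max_clients(3, 2048))                      # static files only
    print(max_clients(3, 2048, reverse_proxy=True))  # everything proxied
```

This box serves static files and also proxies, so the real ceiling sits somewhere between the two figures.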
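Then I checked the monitoring in the Alibaba Cloud console, which showed the CPU load as slightly high. Why doesn't that match what top showed me over SSH? Confusing. In the end I went with the Alibaba Cloud monitoring.
I raised the worker_connections value in the config above from 2048 to 4096, doubling it.
Looking at the log again, 2048 worker_connections are not enough no longer appeared.
This showed up instead: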
2018/01/03 17:22:57 [alert] 9795#0: worker process 5924 exited on signal 9
2018/01/03 17:23:20 [alert] 9795#0: worker process 5929 exited on signal 9
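Signal 9 is SIGKILL, so something outside nginx was killing the workers. To see how often each symptom occurs, a minimal Python sketch like the following can tally both kinds of message. The regular expressions are written only against the fragments quoted in this post, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns based only on the message fragments quoted in this post.
NOT_ENOUGH = re.compile(r"(\d+) worker_connections are not enough")
EXITED = re.compile(r"worker process (\d+) exited on signal (\d+)")


def summarize_error_log(lines):
    """Count connection-limit hits and worker exits grouped by signal."""
    limit_hits = 0
    exits_by_signal = Counter()
    for line in lines:
        if NOT_ENOUGH.search(line):
            limit_hits += 1
        match = EXITED.search(line)
        if match:
            exits_by_signal[int(match.group(2))] += 1
    return limit_hits, exits_by_signal


SAMPLE = [
    "2048 worker_connections are not enough",
    "2018/01/03 17:22:57 [alert] 9795#0: worker process 5924 exited on signal 9",
    "2018/01/03 17:23:20 [alert] 9795#0: worker process 5929 exited on signal 9",
]

if __name__ == "__main__":
    hits, exits = summarize_error_log(SAMPLE)
    print("connection limit hits:", hits)
    print("worker exits by signal:", dict(exits))
```

To run it against a real log, pass an open file object instead of SAMPLE.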
Checking the kernel messages (/var/log/messages):
Jan 3 17:37:31 debug010000002015 kernel: lowmem_reserve[]: 0 0 0 0
Jan 3 17:37:31 debug010000002015 kernel: Node 0 DMA: 2*4kB 2*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15752kB
Jan 3 17:37:31 debug010000002015 kernel: Node 0 DMA32: 545*4kB 439*8kB 243*16kB 218*32kB 14*64kB 58*128kB 84*256kB 6*512kB 3*1024kB 1*2048kB 0*4096kB = 54572kB
Jan 3 17:37:31 debug010000002015 kernel: Node 0 Normal: 2952*4kB 8*8kB 12*16kB 84*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 16864kB
Jan 3 17:37:31 debug010000002015 kernel: 723 total pagecache pages
Jan 3 17:37:31 debug010000002015 kernel: 0 pages in swap cache
Jan 3 17:37:31 debug010000002015 kernel: Swap cache stats: add 0, delete 0, find 0/0
Jan 3 17:37:31 debug010000002015 kernel: Free swap = 0kB
Jan 3 17:37:31 debug010000002015 kernel: Total swap = 0kB
Jan 3 17:37:31 debug010000002015 kernel: 1048575 pages RAM
Jan 3 17:37:31 debug010000002015 kernel: 67471 pages reserved
Jan 3 17:37:31 debug010000002015 kernel: 63109 pages shared
Jan 3 17:37:31 debug010000002015 kernel: 916409 pages non-shared
Jan 3 17:37:31 debug010000002015 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Jan 3 17:37:31 debug010000002015 kernel: [ 477] 0 477 2705 147 0 -17 -1000 udevd
Jan 3 17:37:31 debug010000002015 kernel: [ 1109] 0 1109 23283 76 0 -17 -1000 auditd
Jan 3 17:37:31 debug010000002015 kernel: [ 1131] 0 1131 62369 1538 0 0 0 rsyslogd
Jan 3 17:37:31 debug010000002015 kernel: [ 1448] 38 1448 7653 135 1 0 0 ntpd
Jan 3 17:37:31 debug010000002015 kernel: [ 1539] 0 1539 20226 228 1 0 0 master
Jan 3 17:37:31 debug010000002015 kernel: [ 1554] 89 1554 20289 224 0 0 0 qmgr
Jan 3 17:37:31 debug010000002015 kernel: [ 1559] 0 1559 29218 154 0 0 0 crond
Jan 3 17:37:31 debug010000002015 kernel: [ 1576] 0 1576 5277 46 0 0 0 atd
Jan 3 17:37:31 debug010000002015 kernel: [ 1601] 0 1601 396 68 1 0 0 aliyun-service
Jan 3 17:37:31 debug010000002015 kernel: [ 1614] 0 1614 17457 154 0 0 0 login
Jan 3 17:37:31 debug010000002015 kernel: [ 1616] 0 1616 1016 22 1 0 0 mingetty
Jan 3 17:37:31 debug010000002015 kernel: [ 1618] 0 1618 2661 103 1 -17 -1000 udevd
Jan 3 17:37:31 debug010000002015 kernel: [ 1619] 0 1619 1016 22 1 0 0 mingetty
Jan 3 17:37:31 debug010000002015 kernel: [ 1621] 0 1621 1016 22 0 0 0 mingetty
Jan 3 17:37:31 debug010000002015 kernel: [ 1623] 0 1623 1016 21 0 0 0 mingetty
Jan 3 17:37:31 debug010000002015 kernel: [ 1625] 0 1625 1016 22 0 0 0 mingetty
Jan 3 17:37:31 debug010000002015 kernel: [10105] 0 10105 16559 182 0 -17 -1000 sshd
Jan 3 17:37:31 debug010000002015 kernel: [ 9795] 500 9795 7158 335 0 0 0 nginx
Jan 3 17:37:31 debug010000002015 kernel: [19524] 0 19524 2280 123 0 0 0 dhclient
Jan 3 17:37:31 debug010000002015 kernel: [31947] 0 31947 16624 164 1 0 0 saslauthd
Jan 3 17:37:31 debug010000002015 kernel: [31948] 0 31948 16624 164 1 0 0 saslauthd
Jan 3 17:37:31 debug010000002015 kernel: [31949] 0 31949 16624 164 0 0 0 saslauthd
Jan 3 17:37:31 debug010000002015 kernel: [31950] 0 31950 16624 164 1 0 0 saslauthd
Jan 3 17:37:31 debug010000002015 kernel: [31951] 0 31951 16624 164 0 0 0 saslauthd
Jan 3 17:37:31 debug010000002015 kernel: [ 1175] 0 1175 2661 103 1 -17 -1000 udevd
Jan 3 17:37:31 debug010000002015 kernel: [25593] 500 25593 27111 116 0 0 0 bash
Jan 3 17:37:31 debug010000002015 kernel: [15156] 0 15156 29165 160 0 0 0 wrapper
Jan 3 17:37:31 debug010000002015 kernel: [11631] 0 11631 27077 79 0 0 0 mysqld_safe
Jan 3 17:37:31 debug010000002015 kernel: [11876] 27 11876 421620 120678 1 0 0 mysqld
Jan 3 17:37:31 debug010000002015 kernel: [ 4190] 0 4190 909659 118199 0 0 0 java
Jan 3 17:37:31 debug010000002015 kernel: [21183] 0 21183 7684 173 1 0 0 AliYunDunUpdate
Jan 3 17:37:31 debug010000002015 kernel: [21235] 0 21235 32183 1526 1 0 0 AliYunDun
Jan 3 17:37:31 debug010000002015 kernel: [ 3026] 0 3026 626630 14437 0 0 0 java
Jan 3 17:37:31 debug010000002015 kernel: [ 4553] 0 4553 25640 258 1 0 0 sshd
Jan 3 17:37:31 debug010000002015 kernel: [ 4555] 500 4555 25640 262 1 0 0 sshd
Jan 3 17:37:31 debug010000002015 kernel: [ 4556] 500 4556 27111 117 0 0 0 bash
Jan 3 17:37:31 debug010000002015 kernel: [ 4578] 500 4578 25238 26 0 0 0 tail
Jan 3 17:37:31 debug010000002015 kernel: [ 5161] 0 5161 25640 257 0 0 0 sshd
Jan 3 17:37:31 debug010000002015 kernel: [ 5163] 500 5163 25640 257 0 0 0 sshd
Jan 3 17:37:31 debug010000002015 kernel: [ 5164] 500 5164 27111 126 0 0 0 bash
Jan 3 17:37:31 debug010000002015 kernel: [ 5421] 0 5421 25640 258 0 0 0 sshd
Jan 3 17:37:31 debug010000002015 kernel: [ 5423] 500 5423 25640 258 1 0 0 sshd
Jan 3 17:37:31 debug010000002015 kernel: [ 5424] 500 5424 27111 125 0 0 0 bash
Jan 3 17:37:31 debug010000002015 kernel: [ 6188] 500 6188 36018 966 0 0 0 python
Jan 3 17:37:31 debug010000002015 kernel: [ 6190] 500 6190 283830 277086 0 0 0 nginx
Jan 3 17:37:31 debug010000002015 kernel: [ 6194] 500 6194 219105 212345 0 0 0 nginx
Jan 3 17:37:31 debug010000002015 kernel: [ 6199] 500 6199 117088 110346 0 0 0 nginx
Jan 3 17:37:31 debug010000002015 kernel: [ 6200] 0 6200 25640 263 1 0 0 sshd
Jan 3 17:37:31 debug010000002015 kernel: [ 6202] 500 6202 25640 264 0 0 0 sshd
Jan 3 17:37:31 debug010000002015 kernel: [ 6203] 500 6203 27111 119 0 0 0 bash
Jan 3 17:37:31 debug010000002015 kernel: [ 6222] 500 6222 36887 108 0 0 0 su
Jan 3 17:37:31 debug010000002015 kernel: [ 6223] 0 6223 27111 120 1 0 0 bash
Jan 3 17:37:31 debug010000002015 kernel: [ 6262] 0 6262 25238 30 0 0 0 tail
Jan 3 17:37:31 debug010000002015 kernel: Out of memory: Kill process 6190 (nginx) score 282 or sacrifice child
Jan 3 17:37:31 debug010000002015 kernel: Killed process 6190, UID 500, (nginx) total-vm:1135320kB, anon-rss:1108148kB, file-rss:196kB
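The cause: the system ran out of memory, so the kernel's OOM killer killed the nginx worker. In the process table above, the three nginx workers (PIDs 6190, 6194, 6199) hold 599,777 resident pages between them. At 4 kB per page, which matches the anon-rss figure in the kill line, that is about 2.3 GB on a 4 GB machine with no swap.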
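Reading that table by eye is tedious, so here is a minimal Python sketch that adds up the rss column per process name. The regular expression assumes the nine-column layout shown in this particular dump; other kernel versions print different columns.

```python
import re
from collections import defaultdict

PAGE_KB = 4  # assumption: 4 kB pages, consistent with the kill line above

# Columns: pid, uid, tgid, total_vm, rss, cpu, oom_adj, oom_score_adj, name
ROW = re.compile(
    r"kernel: \[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\d+)\s+(-?\d+)\s+(-?\d+)\s+(\S+)"
)


def rss_kb_by_name(lines):
    """Sum resident memory (in kB) per process name from an OOM dump."""
    totals = defaultdict(int)
    for line in lines:
        match = ROW.search(line)
        if match:
            totals[match.group(9)] += int(match.group(5)) * PAGE_KB
    return dict(totals)


PREFIX = "Jan 3 17:37:31 debug010000002015 kernel: "
SAMPLE = [
    PREFIX + "[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name",
    PREFIX + "[11876] 27 11876 421620 120678 1 0 0 mysqld",
    PREFIX + "[ 4190] 0 4190 909659 118199 0 0 0 java",
    PREFIX + "[ 6190] 500 6190 283830 277086 0 0 0 nginx",
    PREFIX + "[ 6194] 500 6194 219105 212345 0 0 0 nginx",
    PREFIX + "[ 6199] 500 6199 117088 110346 0 0 0 nginx",
]

if __name__ == "__main__":
    totals = rss_kb_by_name(SAMPLE)
    for name, kb in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{name:8s} {kb / 1024:8.1f} MiB")
```

On the rows quoted here nginx comes out far ahead of mysqld and java, which suggests the nginx workers themselves were the ones ballooning.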
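With memory running short, the only option I saw was to upgrade the Alibaba Cloud instance. I reported it to my boss and waited for the upgrade. To be continued...
The day after I reported it, the instance was upgraded, which cost a bit over 2,000 yuan. See the spec:
After the upgrade, the connection count still shot up as soon as nginx was started, and typing commands over SSH was still laggy. I kept at it until midnight without finding anything wrong. Infuriating. I gave up and went to bed.
When I came in the next morning it had magically fixed itself, damn it. Is this Alibaba Cloud's trick for pushing customers into upgrading? I can't figure it out.
The current nginx config: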
#user payworth;
worker_processes 8;
#error_log logs/error.log;
#error_log logs/error.log notice;
#error_log logs/error.log info;
#pid logs/nginx.pid;
worker_rlimit_nofile 65535;
events {
use epoll;
worker_connections 102400;
accept_mutex on;
}
nginx's logs/error.log no longer shows any errors either.
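One thing worth double-checking in the current config: every connection needs at least one file descriptor, so as far as I understand a worker cannot actually use more connections than worker_rlimit_nofile allows. The Python sketch below compares the two values. The helper names are hypothetical, and the parsing is deliberately naive (no include files, no nested contexts).

```python
import re


def strip_comments(conf_text):
    """Drop everything after '#' on each line (ignores quoted '#')."""
    return "\n".join(line.split("#", 1)[0] for line in conf_text.splitlines())


def read_int_directive(conf_text, name):
    """Return the integer value of a simple `name N;` directive, or None."""
    pattern = rf"\b{re.escape(name)}\s+(\d+)\s*;"
    match = re.search(pattern, strip_comments(conf_text))
    return int(match.group(1)) if match else None


def fd_headroom(conf_text):
    """worker_rlimit_nofile minus worker_connections; negative means trouble."""
    connections = read_int_directive(conf_text, "worker_connections")
    nofile = read_int_directive(conf_text, "worker_rlimit_nofile")
    if connections is None or nofile is None:
        return None
    return nofile - connections


SAMPLE_CONF = """
#user payworth;
worker_processes 8;
worker_rlimit_nofile 65535;
events {
use epoll;
worker_connections 102400;
accept_mutex on;
}
"""

if __name__ == "__main__":
    print("fd headroom per worker:", fd_headroom(SAMPLE_CONF))
```

A negative result means worker_connections is set higher than the descriptor limit, so the effective ceiling is the smaller of the two numbers.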
Screenshot of the anomaly: