nginx port-in-use error: I can't tell which process is writing the error log

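I looked around online and there are piles of posts about this "still could not bind()" error. I tried the fixes from various articles, and none of them solved it.

The error_log path I configured inside nginx's server { ... } block is clearly: error_log /var/log/nginx/error.log; yet there is an error.log file in /usr/local/nginx/logs that keeps recording bind() port-in-use errors: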

2017/10/06 19:46:14 [emerg] 10093#0: bind() to 0.0.0.0:80 failed (98: Address already in use)

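I compiled and installed only one nginx, and the server runs normally. Because the port can't be bound, lsof -i:80 doesn't show which nginx keeps trying to bind, and ps -ef | grep nginx shows no unusual processes either: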

root      1431     1  0 19:51 ?        00:00:00 nginx: master process /usr/local/nginx/sbin/nginx
nobody    1432  1431  0 19:51 ?        00:00:00 nginx: worker process      
root      1973  1773  0 20:04 pts/0    00:00:00 grep --color=auto nginx

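This error doesn't affect the running service, but knowing that some hidden nginx keeps trying to bind a port already in use, tirelessly writing to the error log, is a bit depressing.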

Nginx

port

bind

log

24 replies • 2017-10-07 12:25:05 +08:00

cocalrush

2017-10-06 20:48:15 +08:00 via Android

On a Linux server, run netstat -anp | grep 80 as root and you can see which process is holding port 80.

miniyao

2017-10-06 20:55:00 +08:00

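@cocalrush You may not have understood my problem. My normal nginx is running on port 80, and some hidden process (presumably also an nginx process) keeps trying to listen on port 80 as well. Since a normal nginx is already running, it obviously can't bind again, so this hidden process keeps writing to error.log. What bothers me is this error.log: it can grow by more than 1 GB of error messages in a month. So I want to track down which hidden process is doing this.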

whoops

2017-10-06 20:55:39 +08:00

Ports below 1024 require root privileges; could that be the cause?

hcymk2

2017-10-06 20:57:50 +08:00

lsof

oaix

2017-10-06 21:01:20 +08:00

You could check these places:
1. Whether crontab has any scheduled jobs.
2. Whether you use a process manager such as monit, and whether monit keeps respawning the process.

miniyao

2017-10-06 21:01:35 +08:00

@whoops My default nginx config (root user) starts just one worker: worker_processes 1;
But the error log path I specified isn't under /usr/local/nginx/logs, and yet an error.log file in /usr/local/nginx/logs keeps growing.

miniyao

2017-10-06 21:07:43 +08:00

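@oaix supervisor just starts nginx, gunicorn, and celery at boot. celery only runs ordinary async tasks, no scheduled ones. None of these should call (depend on) nginx, right? If they depended on nginx, a failed bind would stop them from working (in fact, they all work fine).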

morethansean

2017-10-06 21:09:03 +08:00

Stop nginx, let that process bind, and then check who is holding the port...

miniyao

2017-10-06 21:17:57 +08:00

@morethansean sudo pkill -9 nginx won't kill it?

loopback

2017-10-06 21:21:46 +08:00

lsof /usr/local/nginx/logs/error.log

miniyao

2017-10-06 21:38:41 +08:00

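@loopback Nothing looks abnormal. What surprises me a little is that the root and nobody processes should be the ones I'm normally using, and the error log I specified for them should be /var/log/nginx/error.log, not /usr/local/nginx/logs/error.log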

root@iY13xetfofdZ:~# lsof /usr/local/nginx/logs/error.log
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nginx 1412 root 2w REG 202,1 152632 2362264 /usr/local/nginx/logs/error.log
nginx 1412 root 8w REG 202,1 152632 2362264 /usr/local/nginx/logs/error.log
nginx 1413 nobody 2w REG 202,1 152632 2362264 /usr/local/nginx/logs/error.log
nginx 1413 nobody 8w REG 202,1 152632 2362264 /usr/local/nginx/logs/error.log
nginx 1805 root 3w REG 202,1 152632 2362264 /usr/local/nginx/logs/error.log
nginx 1805 root 8w REG 202,1 152632 2362264 /usr/local/nginx/logs/error.log

loopback

2017-10-06 21:42:13 +08:00

Three processes have this file open, right?
Run lsof -i:80 to see which process has bound port 80; then the other two processes are the ones to blame.

miniyao

2017-10-06 21:50:58 +08:00

@loopback That only shows the processes already bound to port 80; the error log is about a process that fails to bind.

root@iY13xetfofdZ:~# lsof -i:80
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nginx 1412 root 9u IPv4 10786 0t0 TCP *:http (LISTEN)
nginx 1412 root 10u IPv6 10787 0t0 TCP *:http (LISTEN)
nginx 1413 nobody 9u IPv4 10786 0t0 TCP *:http (LISTEN)
nginx 1413 nobody 10u IPv6 10787 0t0 TCP *:http (LISTEN)
AliYunDun 1525 root 19u IPv4 11413 0t0 TCP 60.205.45.120:35292->140.205.140.205:http (ESTABLISHED)

ghostheaven

2017-10-06 21:52:20 +08:00 via Android

0.0.0.0 means all IPs. Check whether some other process is holding port 80 on a specific IP, such as 127.0.0.1 or an internal IP, causing part of the bind to fail.

loopback

2017-10-06 21:52:37 +08:00

@miniyao #13 OK, then 1805 is the one writing the bind-failure log entries.

miniyao

2017-10-06 21:55:11 +08:00

@loopback OK. Is there any way to find out what started this nginx process, 1805?
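
One way to answer the question above is to walk the parent chain of the suspect PID. The sketch below is a minimal, Linux-only example that reads /proc/<pid>/status; the function name `ancestry` is made up for illustration and is not from the thread.

```shell
#!/bin/sh
# Print a process and each of its ancestors, one "PID NAME" pair per line.
# Assumption: a Linux /proc filesystem is mounted and readable.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=
        ppid=
        # Each line of the status file looks like "Key:<TAB>value".
        while read -r key value; do
            case $key in
                Name:) name=$value ;;
                PPid:) ppid=$value ;;
            esac
        done < "/proc/$pid/status"
        printf '%s %s\n' "$pid" "$name"
        # Continue with the parent; the loop stops once the parent PID is 0.
        pid=$ppid
    done
}
```

Calling `ancestry 1805` while that process is still alive would list 1805 first and then whatever launched it. A short-lived process may exit before it can be inspected, in which case nothing is printed.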

orzfly

2017-10-06 22:16:33 +08:00

It just occurred to me...

Could you post the nginx part of your supervisor config file?

buran

2017-10-06 22:20:23 +08:00

@loopback Looking at it this way, nginx seems to be auto-started at boot in two places, and I haven't found where the second nginx gets started

root@iY13xetfofdZ:~# pstree -n
init─┬─upstart-udev-br
     ├─...
     ├─...
     ├─supervisord─┬─python───python
     │             └─nginx
     ├─nginx───nginx

buran

2017-10-06 22:24:01 +08:00

@orzfly This is how supervisor starts nginx (judging from reply #18, supervisor has already started nginx, and the |-nginx--nginx below it also tries to bind() port 80):

/etc/supervisor/conf.d/nginx.conf

[program:nginx]
command=/usr/local/nginx/sbin/nginx
directory=/root
user=root
stdout_logfile=/var/log/nginx.log
autostart=true
autorestart=true
redirect_stderr=true
stopsignal=QUIT

Terenc3

2017-10-07 00:22:59 +08:00 via iPhone

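Use netstat -anp | grep 80

Some providers' images come with apache enabled, so watch out for that.

If you have configured multiple server blocks, pay attention to their port and domain settings.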

veelog

2017-10-07 08:36:32 +08:00 via iPhone

Find the process's path from its PID; then you'd know the process name, wouldn't you?

Beebird

2017-10-07 10:17:26 +08:00

Another nginx instance has been started, and it isn't reading your config file. Check how the nginx startup script is written.

oott123

2017-10-07 12:21:25 +08:00 via Android

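Judging from the config in reply #19, you need to put daemon off; in your nginx config file,
or pass -g 'daemon off;' on the command line.

There is no other nginx process. Your supervisor thinks your nginx has exited and keeps trying to restart it, but nginx hasn't died; it has simply finished starting up.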

oott123

2017-10-07 12:25:05 +08:00 via Android

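Maybe this is easier to follow: by default, the nginx binary forks an nginx process that runs in the background, and then exits itself.
supervisord, for its part, treats the exit of the foreground process as a crash and restarts it.
So the first time, your nginx starts normally and the foreground process exits; supervisor considers that abnormal and restarts it; the restart fails because the port is already in use, the foreground process exits again, and supervisor restarts it again...

There are many ways to solve this. The simplest is to have nginx run in the foreground instead of in the background.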
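
As a rough sketch of the fix described above, the shell function below rewrites the `command=` line of the supervisor program file quoted earlier in the thread so that nginx is started with `-g 'daemon off;'`. The function name `nginx_run_in_foreground` and the temp-file rewrite are illustrative assumptions, not something taken from the thread; back up the file before trying it.

```shell
#!/bin/sh
# Append -g 'daemon off;' to the nginx command line in a supervisor program file.
# Assumption: the file has a line of the form "command=<path>/sbin/nginx".
nginx_run_in_foreground() {
    conf=$1
    tmp=$conf.tmp
    # Leave the file alone if the directive is already present.
    if grep -q "daemon off" "$conf"; then
        return 0
    fi
    # Rewrite only the command= line; every other line passes through unchanged.
    sed "s|^command=\(.*/sbin/nginx\)[[:space:]]*\$|command=\1 -g 'daemon off;'|" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf" &&
        rm -f "$tmp"
}

# Example call, using the path from the thread (left commented out on purpose):
# nginx_run_in_foreground /etc/supervisor/conf.d/nginx.conf
```

After the edit, supervisor would still need to pick up the changed program definition (for example with `supervisorctl reread` followed by `supervisorctl update`), and the nginx instance that is already running in the background has to be stopped once so that the supervised one can bind port 80.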