nginx+keepalived主辅切换(监控脚本在keepalived.conf中执行)

以前写过一篇,nginx+keepalived 双机互备的文章,写那篇文章的时候没有想过如果apache或者nginx 挂了,而 keepalived 或者 机器没有死,那么主辅是不会切换的,今天就研究了一下该如何监控 nginx进程呢,看官方站看到了。vrrp_script 功能,但是用他的方法实在形不通,可能是我的方法不对,或者是个BUG。所以后来我自己写了个小脚本来完成工作。
环境
Server 1 : ubuntu-server 8.04.4 192.168.6.162
Server 2 : userver-server 8.04.4 192.168.6.188
软件
Keepalived 1.1.15
nginx-0.8.35
pcre-8.02

1.分别在两台服务器上安装nginx

1
2
3
4
5
6
7
8
9
10
tar jxvf pcre-8.02.tar.bz2
cd pcre-8.02
./configure --prefix=/usr --enable-utf8 --enable-pcregrep-libbz2 --enable-pcregrep-libz
make
make install
tar zxvf nginx-0.8.35.tar.gz
cd nginx-0.8.35
--prefix=/usr/local/nginx --with-pcre --user=www --group=www --with-file-aio --with-http_ssl_module --with-http_flv_module --with-http_gzip_static_module --with-http_stub_status_module --with-cc-opt='-O3'
make
make install

2.分别在两台服务器编写配置文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
vim /usr/local/nginx/conf/nginx.conf

user www www;
worker_processes 1;
error_log logs/error.log notice;
pid logs/nginx.pid;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
sendfile on;
tcp_nopush on;
keepalive_timeout 65;
gzip on;
server {
listen 80;
server_name localhost;
index index.html index.htm;
root /var/www;
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}
}

3.分别在两台机器创建测试文件

echo "192.168.6.162" > /var/www/index.html
echo "192.168.6.188" > /var/www/index.html

4.安装 keepalived

apt-get install keepalived

5.在server 1服务器编写配置文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
vrrp_script chk_http_port {   
script "/opt/nginx_pid.sh" ###监控脚本
interval 2 ###监控时间
weight 2 ###目前搞不清楚
}
vrrp_instance VI_1 {
state MASTER ### 设置为 主
interface eth0 ### 监控网卡
virtual_router_id 51 ### 这个两台服务器必须一样
priority 101 ### 权重值 MASTRE 一定要高于 BAUCKUP
authentication {
auth_type PASS ### 加密
auth_pass eric ### 加密的密码,两台服务器一定要一样,不然会出错
}
track_script {
chk_http_port ### 执行监控的服务
}
virtual_ipaddress {
192.168.6.7 ### VIP 地址
}
}

6.在 server 2 服务器 keepalived 配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
vrrp_script chk_http_port {   
script "/opt/nginx_pid.sh"
interval 2
weight 2
}
vrrp_instance VI_1 {
state BACKUP ### 设置为 辅机
interface eth0
virtual_router_id 51 ### 与 MASTRE 设置 值一样
priority 100 ### 比 MASTRE权重值 低
authentication {
auth_type PASS
auth_pass eric ### 密码 与 MASTRE 一样
}
track_script {
chk_http_port
}
virtual_ipaddress {
192.168.6.7
}
}

7.编写监控nginx监控脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
vim /opt/nginx_pid.sh 
#!/bin/bash
# varsion 0.0.2
# 根据一网友说这样做不科学,如果nginx服务起来了,但是我把keepalived 杀掉了,我的理由是,如果nginx死掉了,我觉得就很难在起来,再有就是nagios 当然要给你报警了啊。不过这位同学说的有道理,所以就稍加改了一下脚本

A=`ps -C nginx --no-header |wc -l` ## 查看是否有 nginx进程 把值赋给变量A
if [ $A -eq 0 ];then ## 如果没有进程值得为 零
/usr/local/nginx/sbin/nginx
sleep 3
if [ `ps -C nginx --no-header |wc -l` -eq 0 ];then
killall keepalived ## 则结束 keepalived 进程
fi
fi

8、测试,分别在两个服务器启动 nginx 和 keepalived

/usr/local/nginx/sbin/nginx
/etc/init.d/keepalived start
监控 server 1 的日志

1
2
3
4
5
6
7
8
9
10
Apr 20 18:37:39 nginx Keepalived_vrrp: Registering Kernel netlink command channel 
Apr 20 18:37:39 nginx Keepalived_vrrp: Registering gratutious ARP shared channel
Apr 20 18:37:39 nginx Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Apr 20 18:37:39 nginx Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Apr 20 18:37:39 nginx Keepalived_healthcheckers: Configuration is using : 3401 Bytes
Apr 20 18:37:39 nginx Keepalived_vrrp: Configuration is using : 35476 Bytes
Apr 20 18:37:40 nginx Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Apr 20 18:37:41 nginx Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Apr 20 18:37:41 nginx Keepalived_vrrp: Netlink: skipping nl_cmd msg...
Apr 20 18:37:41 nginx Keepalived_vrrp: VRRP_Script(chk_http_port) succeeded

监控 server 2的日志

1
2
3
4
5
6
7
8
9
10
Apr2018:38:23 varnish Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'. 
Apr 20 18:38:23 varnish Keepalived_healthcheckers: Configuration is using : 3405 Bytes
Apr 20 18:38:23 varnish Keepalived_vrrp: Using MII-BMSR NIC polling thread...
Apr 20 18:38:23 varnish Keepalived_vrrp: Registering Kernel netlink reflector
Apr 20 18:38:23 varnish Keepalived_vrrp: Registering Kernel netlink command channel
Apr 20 18:38:23 varnish Keepalived_vrrp: Registering gratutious ARP shared channel
Apr 20 18:38:23 varnish Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Apr 20 18:38:23 varnish Keepalived_vrrp: Configuration is using : 35486 Bytes
Apr 20 18:38:23 varnish Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Apr 20 18:38:25 varnish Keepalived_vrrp: VRRP_Script(chk_http_port) succeeded

看日志可以看出,两台服务器的 MASTRE 和 BACUKUP 已经都正常了
现在我们在 server 1 把 nginx 服务器停到
Server 1 $> killall nginx
这时候看server 1的日志

1
2
Apr 20 18:41:26 nginx Keepalived_healthcheckers: Terminating Healthchecker child process on signal 
Apr 20 18:41:26 nginx Keepalived_vrrp: Terminating VRRP child process on signal

可以看出keepalived 的进程已经停到
这时候看server 2的日志,看是否已经接管

1
2
3
Apr 20 18:41:23 varnish Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE 
Apr 20 18:41:24 varnish Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Apr 20 18:41:24 varnish Keepalived_vrrp: Netlink: skipping nl_cmd msg...

很明显的看出 server 2 已经接管了,已经变为 MASTER 了

转自:http://deidara.blog.51cto.com/400447/302402