[Linux-ha-jp] STONITH errors during split-brain


renay****@ybb*****
Wed Mar 4 12:08:22 JST 2015

----- Original Message -----
>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>To: 山内英生 <renay****@ybb*****> 
>Cc: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>; "linux****@lists*****" <linux****@lists*****>
>Date: 2015/3/4, Wed 11:09
>Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>On March 4, 2015 at 10:41, <renay****@ybb*****> wrote:
>>># stonith -t external/xen0 hostlist="lbv2.beta.com:/etc/xen/lbv2.cfg" dom0="dom0.xxxx.com" reset_method="reboot" -T reset lbv2.beta.com
>>>2015 Mar  4 09:56:56 lbv2 [3387]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv1.beta.com failed with rc 1
>>>2015 Mar  4 09:57:11 lbv2 [3508]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv1.beta.com failed with rc 1
>>>2015 Mar  4 09:57:26 lbv2 [3629]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv1.beta.com failed with rc 1
>>----- Original Message -----
>>>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>To: 山内英生 <renay****@ybb*****>
>>>Cc: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>; "linux****@lists*****" <linux****@lists*****>
>>>Date: 2015/3/4, Wed 10:16
>>>Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>># stonith -t external/xen0 hostlist="lbv2.beta.com:/etc/xen/lbv2.cfg" dom0="dom0.xxxx.com" reset_method="reboot" -T reset lbv2.beta.com
>>>2015 Mar  4 09:56:56 lbv2 [3387]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv1.beta.com failed with rc 1
>>>2015 Mar  4 09:57:11 lbv2 [3508]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv1.beta.com failed with rc 1
>>>2015 Mar  4 09:57:26 lbv2 [3629]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv1.beta.com failed with rc 1
>>>Mar 04 09:54:40 lbv1.beta.com info: Node lbv2.beta.com is member.
>>>Mar 04 09:55:56 lbv1.beta.com info: Set DC node to lbv1.beta.com.
>>>Mar 04 09:58:15 lbv1.beta.com info: Start "pengine" process.
>>>Mar 04 09:58:19 lbv1.beta.com info: Set DC node to lbv1.beta.com.
>>>Mar 04 09:54:32 lbv2.beta.com info: Starting Heartbeat 3.0.5.
>>>Mar 04 09:54:32 lbv2.beta.com info: Link lbv1.beta.com:eth1 is up.
>>>Mar 04 09:54:39 lbv2.beta.com info: Start "crmd" process. (pid=2938)
>>>Mar 04 09:54:39 lbv2.beta.com info: Start "cib" process. (pid=2934)
>>>Mar 04 09:54:39 lbv2.beta.com info: Start "lrmd" process. (pid=2935)
>>>Mar 04 09:54:39 lbv2.beta.com info: Start "attrd" process. (pid=2937)
>>>Mar 04 09:54:39 lbv2.beta.com info: Start "ccm" process. (pid=2933)
>>>Mar 04 09:54:39 lbv2.beta.com info: Start "stonithd" process. (pid=2936)
>>>Mar 04 09:54:39 lbv2.beta.com info: Start "ipfail" process. (pid=2932)
>>>Mar 04 09:54:39 lbv2.beta.com WARN: Managed "ipfail" process exited. (pid=2932, rc=100)
>>>Mar 04 09:56:15 lbv2.beta.com info: Start "pengine" process.
>>>Mar 04 09:56:19 lbv2.beta.com info: Set DC node to lbv2.beta.com.
>>>Mar 04 09:56:23 lbv2.beta.com ERROR: Start to fail-over.
>>>Mar 04 09:56:25 lbv2.beta.com info: Resource Stonith1-1 started. (rc=0)
>>>Mar 04 09:56:26 lbv2.beta.com info: Resource Stonith1-2 started. (rc=0)
>>>Mar 04 09:56:26 lbv2.beta.com info: Resource Stonith1-3 started. (rc=0)
>>>Node lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624): pending
>>>Online: [ lbv1.beta.com ]
>>>Node lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624): pending
>>>Online: [ lbv1.beta.com ]
>>>On March 4, 2015 at 9:05, <renay****@ybb*****> wrote:
>>>>stonith -t external/libvirt hostlist="xx01" hypervisor_uri="xxxxx" reset_method="reboot" -T reset ap01
>>>> stonith -t <stonith plugin to run> <parameter 1> ... <parameter N> -T <action to perform> <host to fence>
>>>>----- Original Message -----
>>>>>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>To: 山内英生 <renay****@ybb*****>
>>>>>Cc: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>Date: 2015/3/3, Tue 10:43
>>>>>Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>On March 3, 2015 at 9:27, <renay****@ybb*****> wrote:
>>>>>>----- Original Message -----
>>>>>>>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>To: 山内英生 <renay****@ybb*****>
>>>>>>>Cc: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>; "linux****@lists*****" <linux****@lists*****>
>>>>>>>Date: 2015/3/2, Mon 12:10
>>>>>>>Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>### Cluster Option ###
>>>>>>>property \
>>>>>>>    no-quorum-policy="ignore" \
>>>>>>>    stonith-enabled="true" \
>>>>>>>    startup-fencing="false" \
>>>>>>>    stonith-timeout="710s" \
>>>>>>>    crmd-transition-delay="2s"
>>>>>>>### Resource Default ###
>>>>>>>rsc_defaults \
>>>>>>>    resource-stickiness="INFINITY" \
>>>>>>>    migration-threshold="1"
>>>>>>>### Group Configuration ###
>>>>>>>group HAvarnish \
>>>>>>>    vip_208 \
>>>>>>>    varnishd
>>>>>>>group grpStonith1 \
>>>>>>>    Stonith1-1 \
>>>>>>>    Stonith1-2 \
>>>>>>>    Stonith1-3
>>>>>>>group grpStonith2 \
>>>>>>>    Stonith2-1 \
>>>>>>>    Stonith2-2 \
>>>>>>>    Stonith2-3
>>>>>>>### Clone Configuration ###
>>>>>>>clone clone_ping \
>>>>>>>    ping
>>>>>>>### Primitive Configuration ###
>>>>>>>primitive vip_208 ocf:heartbeat:IPaddr2 \
>>>>>>>    params \
>>>>>>>        ip="" \
>>>>>>>        nic="eth0" \
>>>>>>>        cidr_netmask="24" \
>>>>>>>    op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>>>    op monitor interval="5s" timeout="60s" on-fail="restart" \
>>>>>>>    op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>>>primitive varnishd lsb:varnish \
>>>>>>>    op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>>>    op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>>>    op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>>>primitive ping ocf:pacemaker:ping \
>>>>>>>    params \
>>>>>>>        name="default_ping_set" \
>>>>>>>        host_list="" \
>>>>>>>        multiplier="100" \
>>>>>>>        dampen="1" \
>>>>>>>    op start interval="0s" timeout="90s" on-fail="restart" \
>>>>>>>    op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>>>>>    op stop interval="0s" timeout="100s" on-fail="fence"
>>>>>>>primitive Stonith1-1 stonith:external/stonith-helper \
>>>>>>>    params \
>>>>>>>        priority="1" \
>>>>>>>        stonith-timeout="40" \
>>>>>>>        hostlist="lbv1.beta.com" \
>>>>>>>        dead_check_target="" \
>>>>>>>        standby_wait_time="10" \
>>>>>>>        standby_check_command="/usr/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
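>>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>primitive Stonith1-2 stonith:external/xen0 \
>>>>>>>    params \
>>>>>>>        priority="2" \
>>>>>>>        stonith-timeout="300" \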
>>>>>>>        hostlist="lbv1.beta.com:/etc/xen/lbv1.cfg" \
>>>>>>>        dom0="dom0.xxxx.com" \
>>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>primitive Stonith1-3 stonith:meatware \
>>>>>>>    params \
>>>>>>>        priority="3" \
>>>>>>>        stonith-timeout="600" \
>>>>>>>        hostlist="lbv1.beta.com" \
>>>>>>>    op start interval="0s" timeout="60s" \
>>>>>>>    op monitor interval="3600s" timeout="60s" \
>>>>>>>    op stop interval="0s" timeout="60s"
>>>>>>>primitive Stonith2-1 stonith:external/stonith-helper \
>>>>>>>    params \
>>>>>>>        priority="1" \
>>>>>>>        stonith-timeout="40" \
>>>>>>>        hostlist="lbv2.beta.com" \
>>>>>>>        dead_check_target="" \
>>>>>>>        standby_wait_time="10" \
>>>>>>>        standby_check_command="/usr/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
>>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>primitive Stonith2-2 stonith:external/xen0 \
>>>>>>>    params \
>>>>>>>        priority="2" \
>>>>>>>        stonith-timeout="300" \
>>>>>>>        hostlist="lbv2.beta.com:/etc/xen/lbv2.cfg" \
>>>>>>>        dom0="dom0.xxxx.com" \
>>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>primitive Stonith2-3 stonith:meatware \
>>>>>>>    params \
>>>>>>>        priority="3" \
>>>>>>>        stonith-timeout="600" \
>>>>>>>        hostlist="lbv2.beta.com" \
>>>>>>>    op start interval="0s" timeout="60s" \
>>>>>>>    op monitor interval="3600s" timeout="60s" \
>>>>>>>    op stop interval="0s" timeout="60s"
>>>>>>>### Resource Location ###
>>>>>>>location HA_location-1 HAvarnish \
>>>>>>>    rule 200: #uname eq lbv1.beta.com \
>>>>>>>    rule 100: #uname eq lbv2.beta.com
>>>>>>>location HA_location-2 HAvarnish \
>>>>>>>    rule -INFINITY: not_defined default_ping_set or default_ping_set lt 100
>>>>>>>location HA_location-3 grpStonith1 \
>>>>>>>    rule -INFINITY: #uname eq lbv1.beta.com
>>>>>>>location HA_location-4 grpStonith2 \
>>>>>>>    rule -INFINITY: #uname eq lbv2.beta.com
>>>>>>>On March 1, 2015 at 16:54, <renay****@ybb*****> wrote:
>>>>>>>>----- Original Message -----
>>>>>>>>>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>>>To: renay****@ybb*****; linux****@lists*****
>>>>>>>>>Date: 2015/3/1, Sun 12:09
>>>>>>>>>Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>>>> I think you should review the fencing_topology settings in your crm configuration file.
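For readers unfamiliar with the setting, a fencing_topology section in crm shell syntax might look roughly like the sketch below. It is only an illustration, not a configuration taken from this thread: it assumes the stonith resource names used elsewhere in the thread (Stonith1-1 through Stonith2-3), assumes each device should be tried in turn as its own fencing level, and assumes the installed Pacemaker and crmsh versions support fencing_topology at all.

```
# Hypothetical sketch, not from the thread: one fencing level per device.
# Devices separated by spaces are tried in order until one succeeds.
fencing_topology \
    lbv1.beta.com: Stonith1-1 Stonith1-2 Stonith1-3 \
    lbv2.beta.com: Stonith2-1 Stonith2-2 Stonith2-3
```

With a layout like this, the stonith-helper device would be tried first for each node, then the xen0 device, and meatware only as the last resort.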
>>>>>>>>>On February 28, 2015 at 7:41, <renay****@ybb*****> wrote:
>>>>>>>>>>----- Original Message -----
>>>>>>>>>>>From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
>>>>>>>>>>>To: linux****@lists*****
>>>>>>>>>>>Date: 2015/2/27, Fri 21:04
>>>>>>>>>>>Subject: [Linux-ha-jp] STONITH errors during split-brain
>>>>>>>>>>>I am building and testing a two-node cluster on Debian with Xen.
>>>>>>>>>>>Dom0: Debian 7.7, Xen 4.1.4-3+deb7u3
>>>>>>>>>>>DomU: Debian 7.8, Pacemaker 1.1.7-1, Heartbeat 1:3.0.5-3
>>>>>>>>>>>Node lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624): UNCLEAN (offl
>>>>>>>>>>>Online: [ lbv1.beta.com ]
>>>>>>>>>>>Node lbv1.beta.com (38b0f200-83ea-8633-6f37-047d36cd39c6): UNCLEAN (offl
>>>>>>>>>>>Online: [ lbv2.beta.com ]
>>>>>>>>>>>lbv1 [12657]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv2.beta.com failed with rc 1
>>>>>>>>>>>lbv2 [22225]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv1.beta.com failed with rc 1
>>>>>>>>>>>primitive Stonith1-1 stonith:external/stonith-helper \
>>>>>>>>>>>    params \
>>>>>>>>>>>        priority="1" \
>>>>>>>>>>>        stonith-timeout="40" \
>>>>>>>>>>>        hostlist="lbv1.beta.com" \
>>>>>>>>>>>        dead_check_target="" \
>>>>>>>>>>>        standby_wait_time="10" \
>>>>>>>>>>>        standby_check_command="/usr/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
>>>>>>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>primitive Stonith2-1 stonith:external/stonith-helper \
>>>>>>>>>>>    params \
>>>>>>>>>>>        priority="1" \
>>>>>>>>>>>        stonith-timeout="40" \
>>>>>>>>>>>        hostlist="lbv2.beta.com" \
>>>>>>>>>>>        dead_check_target="" \
>>>>>>>>>>>        standby_wait_time="10" \
>>>>>>>>>>>        standby_check_command="/usr/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
>>>>>>>>>>>    op start interval="0s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>    op monitor interval="3600s" timeout="60s" on-fail="restart" \
>>>>>>>>>>>    op stop interval="0s" timeout="60s" on-fail="ignore"
>>>>>>>>>>>Feb 27 19:29:04 lbv1.beta.com stonith: [18566]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv2.beta.com failed with rc 1
>>>>>>>>>>>Feb 27 19:29:04 lbv1.beta.com stonith-ng: [2815]: ERROR: log_operation: Operation 'reboot' [18565] (call 0 from d2acf6a5-ef8d-4249-aaab-25a8686d6647) for host 'lbv2.beta.com' with device 'Stonith2-1' returned: -2
>>>>>>>>>>>Feb 27 19:29:04 lbv1.beta.com stonith-ng: [2815]: ERROR: log_operation: Stonith2-1: Performing: stonith -t external/stonith-helper -T reset lbv2.
>>>>>>>>>>>Feb 27 19:29:04 lbv1.beta.com stonith-ng: [2815]: ERROR: log_operation: Stonith2-1: failed: lbv2.beta.com 5
>>>>>>>>>>>Feb 27 19:29:05 lbv1.beta.com stonith-ng: [2815]: info: call_remote_stonith: Requesting that lbv1.beta.com perform op reboot lbv2.beta.c
>>>>>>>>>>>Feb 27 19:29:05 lbv1.beta.com stonith-ng: [2815]: info: can_fence_host_with_device: Stonith2-1 can fence lbv2.beta.com: dynamic-list
>>>>>>>>>>>Feb 27 19:29:05 lbv1.beta.com stonith-ng: [2815]: info: can_fence_host_with_device: Stonith2-2 can fence lbv2.beta.com: dynamic-list
>>>>>>>>>>>Feb 27 19:29:05 lbv1.beta.com stonith-ng: [2815]: info: can_fence_host_with_device: Stonith2-3 can fence lbv2.beta.com: dynamic-list
>>>>>>>>>>>Feb 27 19:29:05 lbv1.beta.com stonith-ng: [2815]: info: stonith_fence: Found 3 matching devices for 'lbv2.beta.com'
>>>>>>>>>>>Feb 27 19:29:05 lbv1.beta.com stonith-ng: [2815]: info: stonith_command: Processed st_fence from lbv1.beta.com: rc=-1
>>>>>>>>>>>Feb 27 19:29:08 lbv1.beta.com crm_resource: [18790]: info: Invoked: /usr/sbin/crm_resource -r varnishd -W
>>>>>>>>>>>Feb 27 19:29:09 lbv1.beta.com stonith: [18706]: CRIT: external_reset_req: 'stonith-helper reset' for host lbv2.beta.com failed with rc 1
>>>>>>>>>>>Feb 27 19:29:09 lbv1.beta.com stonith-ng: [2815]: ERROR: log_operation: Operation 'reboot' [18705] (call 0 from d2acf6a5-ef8d-4249-aaab-25a8686d6647) for host 'lbv2.beta.com' with device 'Stonith2-1' returned: -2
>>>>>>>>>>>Feb 27 19:29:09 lbv1.beta.com stonith-ng: [2815]: ERROR: log_operation: Stonith2-1: Performing: stonith -t external/stonith-helper -T reset lbv2.
>>>>>>>>>>>Feb 27 19:29:09 lbv1.beta.com stonith-ng: [2815]: ERROR: log_operation: Stonith2-1: failed: lbv2.beta.com 5
>>>>>>>>>>>Feb 27 19:29:10 lbv1.beta.com stonith-ng: [2815]: info: call_remote_stonith: Requesting that lbv1.beta.com perform op reboot lbv2.beta.c
>>>>>>>>>>>Feb 27 19:29:10 lbv1.beta.com stonith-ng: [2815]: info: can_fence_host_with_device: Stonith2-1 can fence lbv2.beta.com: dynamic-list
>>>>>>>>>>>Feb 27 19:29:10 lbv1.beta.com stonith-ng: [2815]: info: can_fence_host_with_device: Stonith2-2 can fence lbv2.beta.com: dynamic-list
>>>>>>>>>>>Feb 27 19:29:10 lbv1.beta.com stonith-ng: [2815]: info: can_fence_host_with_device: Stonith2-3 can fence lbv2.beta.com: dynamic-list
>>>>>>>>>>>Feb 27 19:29:10 lbv1.beta.com stonith-ng: [2815]: info: stonith_fence: Found 3 matching devices for 'lbv2.beta.com'
>>>>>>>>>>>Feb 27 19:29:10 lbv1.beta.com stonith-ng: [2815]: info: stonith_command: Processed st_fence from lbv1.beta.com: rc=-1
>>>>>>>>>>>Feb 27 19:29:13 lbv1.beta.com crm_resource: [18953]: info: Invoked: /usr/sbin/crm_resource -r varnishd -W
>>>>>>>>>>>ELF Systems
>>>>>>>>>>>Masamichi Fukuda
>>>>>>>>>>>mail to: masamichi_fukud****@elf-s*****
>>>>>>>>>>>Linux-ha-japan mailing list
