[Linux-ha-jp] About STONITH errors during split-brain

Masamichi Fukuda - elf-systems masamichi_fukud****@elf-s*****
Tue Mar 17 23:46:55 JST 2015


Yamauchi-san,

Good evening, this is Fukuda.

Could it be that I am specifying -x for stonith-helper the wrong way?

I removed stonith-helper and started the cluster with only xen0.

# crm_mon -rfA
Last updated: Tue Mar 17 23:38:53 2015
Last change: Tue Mar 17 23:30:34 2015
Stack: heartbeat
Current DC: lbv1.beta.com (38b0f200-83ea-8633-6f37-047d36cd39c6) - partition with quorum
Version: 1.1.12-e32080b
2 Nodes configured
6 Resources configured


Online: [ lbv1.beta.com lbv2.beta.com ]

Full list of resources:

Stonith1-2      (stonith:external/xen0):        Stopped
Stonith2-2      (stonith:external/xen0):        Stopped
 Resource Group: HAvarnish
     vip_208    (ocf::heartbeat:IPaddr2):       Started lbv1.beta.com
     varnishd   (lsb:varnish):  Started lbv1.beta.com
 Clone Set: clone_ping [ping]
     Started: [ lbv1.beta.com lbv2.beta.com ]

Node Attributes:
* Node lbv1.beta.com:
    + default_ping_set                  : 100
* Node lbv2.beta.com:
    + default_ping_set                  : 100

Migration summary:
* Node lbv1.beta.com:
   Stonith2-2: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 23:38:34 2015'
* Node lbv2.beta.com:
   Stonith1-2: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 23:38:27 2015'

Failed actions:
    Stonith2-2_start_0 on lbv1.beta.com 'unknown error' (1): call=23, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 23:38:32 2015', queued=0ms, exec=1061ms
    Stonith1-2_start_0 on lbv2.beta.com 'unknown error' (1): call=23, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 23:38:25 2015', queued=0ms, exec=1342ms


The same failed actions seem to appear as when stonith-helper was included.
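crm_mon only reports these start failures as 'unknown error' (1); the more specific reason shows up in the fencing daemon's log_operation lines, as in the syslog excerpts quoted further down this thread. A minimal sketch for pulling those lines out for one device, assuming a syslog-format file at /var/log/syslog (the file grepped in this thread); the function name stonith_device_log is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the fencing daemon's log_operation lines for one
# STONITH device. Assumes syslog-format lines such as
#   ... stonith-ng[4247]:  warning: log_operation: Stonith2-1:4377 [ failed to exec "stonith" ]
#   ... log_operation: Operation 'monitor' [4377] for device 'Stonith2-1' returned: -201 ...
# The daemon name differs between builds (stonithd vs stonith-ng), so it is not matched.
stonith_device_log() {
    dev=$1
    log=${2:-/var/log/syslog}
    grep -E "log_operation: (${dev}:[0-9]+ |Operation .* for device '${dev}')" "$log"
}

# Usage: stonith_device_log Stonith2-2 /var/log/syslog
```

Running it for Stonith1-2 and Stonith2-2 would show whether the xen0-only failure carries the same underlying message as the stonith-helper one.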

Thank you in advance.

That's all.


On Tue, Mar 17, 2015 at 22:38, <renay****@ybb*****> wrote:

> Fukuda-san,
>
> Good evening, this is Yamauchi.
>
> By the way, if possible, checking what happens when you remove external/stonith-helper
> and use only external/xen0 may help isolate the problem.
>
> That's all.
>
>
>
> ----- Original Message -----
> > From: "renay****@ybb*****" <renay****@ybb*****>
> > To: "linux****@lists*****" <linux****@lists*****>
> > Cc:
> > Date: 2015/3/17, Tue 22:28
> > Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >
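> > Fukuda-san,
> >
> > Good evening, this is Yamauchi.
> >
> > It seems nothing has changed...
> >
> > For now, sometime tomorrow, I will check on RHEL whether stonith-helper works with the combination of
> >
> > Heartbeat 3.0.6
> > the latest Pacemaker
> >
> > and a similar configuration (the resources will be Dummy, and external/ssh will take the place of external/xen0).
> >
> > #If I could see the output of stonith-helper with -x specified, it would be a little easier to narrow down the problem...
> >
> >
> > That's all.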
> >
> >
> >
> > ----- Original Message -----
> >> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >> Date: 2015/3/17, Tue 21:24
> >> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>
> >>
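> >> Yamauchi-san,
> >>
> >> Good evening, this is Fukuda.
> >> Thank you for the information about the latest version.
> >>
> >> I installed it right away.
> >>
> >> Here is the status after startup.
> >>
> >> The failed actions appear unchanged.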
> >>
> >>
> >>
> >> # crm_mon -rfA
> >> Last updated: Tue Mar 17 21:03:49 2015
> >> Last change: Tue Mar 17 20:30:58 2015
> >> Stack: heartbeat
> >> Current DC: lbv1.beta.com (38b0f200-83ea-8633-6f37-047d36cd39c6) - partition with quorum
> >> Version: 1.1.12-e32080b
> >> 2 Nodes configured
> >> 8 Resources configured
> >>
> >>
> >> Online: [ lbv1.beta.com lbv2.beta.com ]
> >>
> >> Full list of resources:
> >>
> >>  Resource Group: HAvarnish
> >>      vip_208    (ocf::heartbeat:IPaddr2):       Started lbv1.beta.com
> >>      varnishd   (lsb:varnish):  Started lbv1.beta.com
> >>  Resource Group: grpStonith1
> >>      Stonith1-1 (stonith:external/stonith-helper):      Stopped
> >>      Stonith1-2 (stonith:external/xen0):        Stopped
> >>  Resource Group: grpStonith2
> >>      Stonith2-1 (stonith:external/stonith-helper):      Stopped
> >>      Stonith2-2 (stonith:external/xen0):        Stopped
> >>  Clone Set: clone_ping [ping]
> >>      Started: [ lbv1.beta.com lbv2.beta.com ]
> >>
> >> Node Attributes:
> >> * Node lbv1.beta.com:
> >>     + default_ping_set                  : 100
> >> * Node lbv2.beta.com:
> >>     + default_ping_set                  : 100
> >>
> >> Migration summary:
> >> * Node lbv1.beta.com:
> >>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 21:03:39 2015'
> >> * Node lbv2.beta.com:
> >>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 21:03:32 2015'
> >>
> >> Failed actions:
> >>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 21:03:37 2015', queued=0ms, exec=1085ms
> >>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=18, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 21:03:30 2015', queued=0ms, exec=1061ms
> >>
> >>
> >>
> >>
> >> Here are the logs.
> >>
> >>
> >> # less /var/log/ha-debug
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: Pacemaker support: yes
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: File /etc/ha.d//haresources exists.
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: This file is not used because pacemaker is enabled
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Core dumps could be lost if multiple dumps occur.
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: **************************
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: Configuration validated. Starting heartbeat 3.0.6
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: heartbeat: version 3.0.6
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: Heartbeat generation: 1423534116
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: seed is -1702799346
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: bound send socket to device: eth1
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: set SO_REUSEADDR
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: bound receive socket to device: eth1
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.133
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: Local status now set to: 'up'
> >> Mar 17 21:02:46 lbv1.beta.com heartbeat: [4236]: info: Link lbv2.beta.com:eth1 up.
> >> Mar 17 21:02:46 lbv1.beta.com heartbeat: [4236]: info: Status update for node lbv2.beta.com: status up
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Comm_now_up(): updating status to active
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Local status now set to: 'active'
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: debug: get_delnodelist: delnodelist=
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4250]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109  gid 113 (pid 4250)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4246]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109  gid 113 (pid 4246)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4249]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109  gid 113 (pid 4249)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4245]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109  gid 113 (pid 4245)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4248]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0  gid 0 (pid 4248)
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4247]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0  gid 0 (pid 4247)
> >> Mar 17 21:02:47 lbv1.beta.com ccm: [4245]: info: Hostname: lbv1.beta.com
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client ccm is set to 1024
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client attrd is set to 1024
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client stonith-ng is set to 1024
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Status update for node lbv2.beta.com: status active
> >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client cib is set to 1024
> >> Mar 17 21:02:51 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [15:17]
> >> Mar 17 21:02:51 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
> >> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [19:21]
> >> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
> >> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: info: the send queue length from heartbeat to client crmd is set to 1024
> >> Mar 17 21:02:53 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [24:26]
> >> Mar 17 21:02:53 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
> >> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [26:28]
> >> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
> >> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [30:32]
> >> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
> >>
> >>
> >>
> >> # less /var/log/error
> >>
> >> Mar 17 21:02:47 lbv1 attrd[4249]:    error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
> >> Mar 17 21:02:48 lbv1 attrd[4249]:    error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
> >> Mar 17 21:02:53 lbv1 stonith-ng[4247]:    error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
> >> Mar 17 21:02:53 lbv1 stonith-ng[4247]:    error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
> >> Mar 17 21:03:39 lbv1 crmd[4250]:    error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=31, status=4, cib-update=42, confirmed=true) Error
> >>
> >> # cat syslog|egrep 'Mar 17 21:03|Mar 17 21:02' |egrep 'heartbeat|stonith|pacemaker|error'
> >> Mar 17 21:03:24 lbv1 pengine[4253]:   notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-115.bz2
> >> Mar 17 21:03:27 lbv1 crmd[4250]:   notice: run_graph: Transition 0 (Complete=15, Pending=0, Fired=0, Skipped=16, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-115.bz2): Stopped
> >> Mar 17 21:03:29 lbv1 pengine[4253]:   notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-116.bz2
> >> Mar 17 21:03:34 lbv1 crmd[4250]:   notice: run_graph: Transition 1 (Complete=8, Pending=0, Fired=0, Skipped=12, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-116.bz2): Stopped
> >> Mar 17 21:03:37 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
> >> Mar 17 21:03:37 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
> >> Mar 17 21:03:37 lbv1 pengine[4253]:   notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-117.bz2
> >> Mar 17 21:03:39 lbv1 stonith-ng[4247]:   notice: log_operation: Operation 'monitor' [4377] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
> >> Mar 17 21:03:39 lbv1 stonith-ng[4247]:  warning: log_operation: Stonith2-1:4377 [ Performing: stonith -t external/stonith-helper -S ]
> >> Mar 17 21:03:39 lbv1 stonith-ng[4247]:  warning: log_operation: Stonith2-1:4377 [ failed to exec "stonith" ]
> >> Mar 17 21:03:39 lbv1 stonith-ng[4247]:  warning: log_operation: Stonith2-1:4377 [ failed:  2 ]
> >> Mar 17 21:03:39 lbv1 crmd[4250]:    error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=31, status=4, cib-update=42, confirmed=true) Error
> >> Mar 17 21:03:40 lbv1 crmd[4250]:   notice: run_graph: Transition 2 (Complete=12, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-117.bz2): Stopped
> >> Mar 17 21:03:42 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith2-1 on lbv1.beta.com: unknown error (1)
> >> Mar 17 21:03:42 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith2-1 on lbv1.beta.com: unknown error (1)
> >> Mar 17 21:03:42 lbv1 pengine[4253]:  warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
> >> Mar 17 21:03:42 lbv1 pengine[4253]:   notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-118.bz2
> >> Mar 17 21:03:42 lbv1 IPaddr2(vip_208)[4448]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
> >> Mar 17 21:03:47 lbv1 crmd[4250]:   notice: run_graph: Transition 3 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-118.bz2): Complete
> >>
> >> Thank you in advance.
> >>
> >> That's all.
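The stonith-ng lines in the syslog excerpt above report `failed to exec "stonith"`, which suggests the `stonith` command itself could not be started, in which case stonith-helper may never have run at all. One hedged way to check is to look for `stonith` on the PATH the fencing daemon uses. A minimal sketch; the function find_on_path is hypothetical, and the /usr/local/heartbeat/sbin directory in the usage lines is only a guess based on the /usr/local/heartbeat prefix visible in these logs:

```shell
#!/bin/sh
# Hypothetical helper: print the first executable named $1 found in the
# colon-separated directory list $2 (default: the current $PATH).
# Returns 1 if no such executable exists.
find_on_path() {
    cmd=$1
    search=${2:-$PATH}
    saved_ifs=$IFS
    IFS=:
    for d in $search; do
        if [ -f "$d/$cmd" ] && [ -x "$d/$cmd" ]; then
            IFS=$saved_ifs
            printf '%s\n' "$d/$cmd"
            return 0
        fi
    done
    IFS=$saved_ifs
    return 1
}

# Usage (the second directory list is an assumption about this install):
#   find_on_path stonith /usr/sbin:/usr/bin:/sbin:/bin || echo "stonith not found"
#   find_on_path stonith /usr/local/heartbeat/sbin:/usr/sbin:/usr/bin
```

Comparing a minimal daemon-style PATH with one that includes the build prefix shows whether the exec failure could simply be a lookup problem; other causes (permissions, a missing interpreter) would need separate checks.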
> >>
> >>
> >>
> >> On Tue, Mar 17, 2015 at 18:31, <renay****@ybb*****> wrote:
> >>
> >>> Fukuda-san,
> >>>
> >>> Good evening, this is Yamauchi.
> >>>
> >>> It has not been tagged, so today's latest version is the following commit:
> >>>
> >>>  * https://github.com/ClusterLabs/pacemaker/tree/e32080b460f81486b85d08ec958582b3e72d858c
> >>>
> >>> You can download it with the [Download ZIP] button on the right-hand side.
> >>>
> >>> That's all.
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>> To: "renay****@ybb*****" <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>> Date: 2015/3/17, Tue 18:07
> >>>> Subject: About STONITH errors during split-brain
> >>>>
> >>>>
> >>>> Yamauchi-san,
> >>>>
> >>>> Hello, this is Fukuda.
> >>>>
> >>>> I looked at this page:
> >>>> https://github.com/ClusterLabs/pacemaker/tags
> >>>>
> >>>> and pacemaker 1.1.12 561c4cf seems to be the latest there.
> >>>> Sorry to bother you, but could you tell me where to find anything newer than that?
> >>>>
> >>>> Thank you in advance.
> >>>>
> >>>> That's all.
> >>>>
> >>>>
> >>>>
> >>>> On Tuesday, March 17, 2015, <renay****@ybb*****> wrote:
> >>>>
> >>>>> Fukuda-san,
> >>>>>
> >>>>> Hello, this is Yamauchi.
> >>>>>
> >>>>> Yes. It is old.
> >>>>>
> >>>>> Pacemaker gained support for Heartbeat 3.0.6 surprisingly recently.
> >>>>> Please install something newer. (You will again need to build it from source, though...)
> >>>>>
> >>>>> It is available from the upstream GitHub repository:
> >>>>>  * https://github.com/ClusterLabs/pacemaker
> >>>>>
> >>>>> The latest master can sometimes produce errors; if that happens, I suggest stepping back
> >>>>> through older versions until one works.
> >>>>>
> >>>>> That's all.
> >>>>>
> >>>>>
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>>>> Date: 2015/3/17, Tue 16:06
> >>>>>> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>>>>>
> >>>>>>
> >>>>>> Yamauchi-san,
> >>>>>>
> >>>>>> Hello, this is Fukuda.
> >>>>>>
> >>>>>> In an earlier email you advised me to install the latest versions of heartbeat and pacemaker,
> >>>>>> so this time I installed heartbeat 3.0.6 and pacemaker 1.1.12:
> >>>>>>
> >>>>>> heartbeat configuration: Version = "3.0.6"
> >>>>>> pacemaker configuration: Version = 1.1.12 (Build: 561c4cf)
> >>>>>>
> >>>>>> Does this mean pacemaker is still too old?
> >>>>>>
> >>>>>> Sorry for the trouble, and thank you in advance.
> >>>>>>
> >>>>>> That's all.
> >>>>>>
> >>>>>>
> >>>>>>
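> >>>>>> On Tue, Mar 17, 2015 at 14:59, <renay****@ybb*****> wrote:
> >>>>>>
> >>>>>>> Fukuda-san,
> >>>>>>>
> >>>>>>> Hello, this is Yamauchi.
> >>>>>>>
> >>>>>>> Something just occurred to me: earlier in this exchange I gave the answer below. Has that been taken care of?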
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>>>>  2) Heartbeat 3.0.6 + latest Pacemaker : OK
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It appears that Heartbeat also needs to be the latest version, 3.0.6, in this combination.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  * http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/cceeb47a7d8f
> >>>>>>>
> >>>>>>> Judging from the crm_mon output below, your Pacemaker is 1.1.12.
> >>>>>>> Combining it with Heartbeat 3.0.6 requires a fairly recent Pacemaker.
> >>>>>>>
> >>>>>>>> # crm_mon -rfA
> >>>>>>>>
> >>>>>>>> Last updated: Tue Mar 17 14:14:39 2015
> >>>>>>>> Last change: Tue Mar 17 14:01:43 2015
> >>>>>>>> Stack: heartbeat
> >>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
> >>>>>>>> Version: 1.1.12-561c4cf
> >>>>>>>
> >>>>>>> I think you probably need at least the changes from the following commit onward:
> >>>>>>>
> >>>>>>>
> >>>>>>> https://github.com/ClusterLabs/pacemaker/commit/f2302da063d08719d28367d8e362b8bfb0f85bf3
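To check whether a particular build already contains that commit, one option is to take the short hash crm_mon prints after the version number (561c4cf, and later e32080b, in this thread) and ask git whether f2302da is one of its ancestors. A minimal sketch, assuming a local clone of https://github.com/ClusterLabs/pacemaker; the function names pcmk_build_hash and has_commit are hypothetical, while `git merge-base --is-ancestor` is a standard git command:

```shell
#!/bin/sh
# Hypothetical helper: read crm_mon output on stdin and print the build hash
# from its "Version:" line, e.g. "561c4cf" for "Version: 1.1.12-561c4cf".
pcmk_build_hash() {
    sed -n 's/^Version: [0-9.]*-\([0-9a-f]*\).*/\1/p'
}

# Hypothetical helper: succeed if commit $1 is an ancestor of (or equal to)
# commit $2. Must be run inside a clone of the pacemaker repository.
has_commit() {
    git merge-base --is-ancestor "$1" "$2"
}

# Usage:
#   build=$(crm_mon -1 | pcmk_build_hash)
#   cd pacemaker && has_commit f2302da063d08719d28367d8e362b8bfb0f85bf3 "$build" \
#       && echo "build includes f2302da" || echo "build does not include f2302da"
```

Checking ancestry rather than comparing version strings matters here, because both builds discussed in this thread report the same version number, 1.1.12.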
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> That's all.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>>>>>> Date: 2015/3/17, Tue 14:38
> >>>>>>>> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Yamauchi-san,
> >>>>>>>>
> >>>>>>>> Hello, this is Fukuda.
> >>>>>>>>
> >>>>>>>> Do you mean I should add -x to the shebang line of stonith-helper?
> >>>>>>>> I changed the first line of stonith-helper to #!/bin/bash -x and started the cluster.
> >>>>>>>>
> >>>>>>>> crm_mon shows no change from before.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> # crm_mon -rfA
> >>>>>>>>
> >>>>>>>> Last updated: Tue Mar 17 14:14:39 2015
> >>>>>>>> Last change: Tue Mar 17 14:01:43 2015
> >>>>>>>> Stack: heartbeat
> >>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
> >>>>>>>> Version: 1.1.12-561c4cf
> >>>>>>>> 2 Nodes configured
> >>>>>>>> 8 Resources configured
> >>>>>>>>
> >>>>>>>> Online: [ lbv1.beta.com lbv2.beta.com ]
> >>>>>>>>
> >>>>>>>> Full list of resources:
> >>>>>>>>
> >>>>>>>>  Resource Group: HAvarnish
> >>>>>>>>      vip_208    (ocf::heartbeat:IPaddr2):       Started lbv1.beta.com
> >>>>>>>>      varnishd   (lsb:varnish):  Started lbv1.beta.com
> >>>>>>>>  Resource Group: grpStonith1
> >>>>>>>>      Stonith1-1 (stonith:external/stonith-helper):      Stopped
> >>>>>>>>      Stonith1-2 (stonith:external/xen0):        Stopped
> >>>>>>>>  Resource Group: grpStonith2
> >>>>>>>>      Stonith2-1 (stonith:external/stonith-helper):      Stopped
> >>>>>>>>      Stonith2-2 (stonith:external/xen0):        Stopped
> >>>>>>>>  Clone Set: clone_ping [ping]
> >>>>>>>>      Started: [ lbv1.beta.com lbv2.beta.com ]
> >>>>>>>>
> >>>>>>>> Node Attributes:
> >>>>>>>> * Node lbv1.beta.com:
> >>>>>>>>     + default_ping_set                  : 100
> >>>>>>>> * Node lbv2.beta.com:
> >>>>>>>>     + default_ping_set                  : 100
> >>>>>>>>
> >>>>>>>> Migration summary:
> >>>>>>>> * Node lbv2.beta.com:
> >>>>>>>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:16 2015'
> >>>>>>>> * Node lbv1.beta.com:
> >>>>>>>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:21 2015'
> >>>>>>>>
> >>>>>>>> Failed actions:
> >>>>>>>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 14:12:14 2015', queued=0ms, exec=1065ms
> >>>>>>>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=26, status=Error, last-rc-change='Tue Mar 17 14:12:19 2015', queued=0ms, exec=1081ms
> >>>>>>>>
> >>>>>>>> I looked for other logs.
> >>>>>>>>
> >>>>>>>> This is from heartbeat startup:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> # less /var/log/pm_logconv.out
> >>>>>>>> Mar 17 14:11:28 lbv1.beta.com info: Starting Heartbeat 3.0.6.
> >>>>>>>> Mar 17 14:11:33 lbv1.beta.com info: Link lbv2.beta.com:eth1 is up.
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "ccm" process. (pid=13264)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "lrmd" process. (pid=13267)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "attrd" process. (pid=13268)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "stonithd" process. (pid=13266)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "cib" process. (pid=13265)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "crmd" process. (pid=13269)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> # less /var/log/error
> >>>>>>>> Mar 17 14:12:20 lbv1 crmd[13269]:    error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=26, status=4, cib-update=19, confirmed=true) Error
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Here is the result of grepping syslog for stonith:
> >>>>>>>>
> >>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
> >>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13266]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 13266)
> >>>>>>>> Mar 17 14:11:34 lbv1 stonithd[13266]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> >>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: the send queue length from heartbeat to client stonithd is set to 1024
> >>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]:   notice: setup_cib: Watching for stonith topology changes
> >>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]:   notice: unpack_config: On loss of CCM Quorum: Ignore
> >>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]:  warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
> >>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]:  warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
> >>>>>>>> Mar 17 14:11:41 lbv1 stonithd[13266]:   notice: stonith_device_register: Added 'Stonith2-1' to the device list (1 active devices)
> >>>>>>>> Mar 17 14:11:41 lbv1 stonithd[13266]:   notice: stonith_device_register: Added 'Stonith2-2' to the device list (2 active devices)
> >>>>>>>> Mar 17 14:12:04 lbv1 stonithd[13266]:   notice: xml_patch_version_check: Versions did not change in patch 0.5.0
> >>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]:   notice: log_operation: Operation 'monitor' [13386] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
> >>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]:  warning: log_operation: Stonith2-1:13386 [ Performing: stonith -t external/stonith-helper -S ]
> >>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]:  warning: log_operation: Stonith2-1:13386 [ failed to exec "stonith" ]
> >>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]:  warning: log_operation: Stonith2-1:13386 [ failed:  2 ]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thank you in advance.
> >>>>>>>>
> >>>>>>>> That's all.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Mar 17, 2015 at 13:32, <renay****@ybb*****> wrote:
> >>>>>>>>
> >>>>>>>>> Fukuda-san,
> >>>>>>>>>
> >>>>>>>>> Hello, this is Yamauchi.
> >>>>>>>>>
> >>>>>>>>> So it seems the problem lies in the start operation of stonith-helper.
> >>>>>>>>>
> >>>>>>>>> If you put
> >>>>>>>>>
> >>>>>>>>> #!/bin/bash -x
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> at the top of stonith-helper and start the cluster, we may learn something.
> >>>>>>>>>
> >>>>>>>>> By the way, I think stonith-helper's own log output should also be written somewhere...
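With `#!/bin/bash -x` the trace is written to stderr, which does not necessarily end up in any log the cluster keeps (another message in this thread reports no stdout or stderr output at all). A minimal sketch of an alternative, assuming bash 4.1 or later for `BASH_XTRACEFD`: lines placed right after the shebang of a bash script such as stonith-helper send the trace to a separate file. TRACE_FILE and its /tmp path are hypothetical. Note that if the `stonith` command itself fails to start, as the stonith-ng log lines elsewhere in this thread suggest, the helper never runs and no trace will appear.

```shell
#!/bin/bash
# Sketch: put these lines right after the shebang of a bash script to send its
# 'set -x' trace to a dedicated file. TRACE_FILE is a hypothetical path;
# BASH_XTRACEFD requires bash 4.1 or later.
TRACE_FILE=${TRACE_FILE:-/tmp/stonith-helper.trace}
exec 9>>"$TRACE_FILE"        # open file descriptor 9 for appending
BASH_XTRACEFD=9              # bash writes xtrace output to fd 9 instead of stderr
PS4='+ ${LINENO}: '          # prefix each traced command with its line number
set -x

# ... the rest of the script follows; this stand-in command shows up in the trace:
echo "helper body runs here" >/dev/null
```

An empty trace file after a failed start would itself be informative: it would point at the step before the helper runs rather than at the helper's own logic.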
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> That's all.
> >>>>>>>>>
> >>>>>>>>> ----- Original Message -----
> >>>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>>>>>>>> Date: 2015/3/17, Tue 12:31
> >>>>>>>>>> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yamauchi-san,
> >>>>>>>>>> cc: Matsushima-san
> >>>>>>>>>>
> >>>>>>>>>> Hello, this is Fukuda.
> >>>>>>>>>>
> >>>>>>>>>> xen0 was in the same directory.
> >>>>>>>>>>
> >>>>>>>>>> # pwd
> >>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external
> >>>>>>>>>>
> >>>>>>>>>> # ls
> >>>>>>>>>> drac5           ibmrsa          kdumpcheck      riloe           vmware
> >>>>>>>>>> dracmc-telnet   ibmrsa-telnet   libvirt         ssh             xen0
> >>>>>>>>>> hetzner         ipmi            nut             stonith-helper  xen0-ha
> >>>>>>>>>> hmchttp         ippower9258     rackpdu         vcenter
> >>>>>>>>>>
> >>>>>>>>>> Thank you in advance.
> >>>>>>>>>>
> >>>>>>>>>> That's all.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 2015-03-17 10:53 GMT+09:00, <renay****@ybb*****> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Fukuda-san,
> >>>>>>>>>>> cc: Matsushima-san
> >>>>>>>>>>>
> >>>>>>>>>>> Hello, this is Yamauchi.
> >>>>>>>>>>>
> >>>>>>>>>>>> There was no output on either standard output or standard error.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is something wrong with stonith-helper?
> >>>>>>>>>>>> Since stonith-helper is a shell script, I did not pay much attention to how it was installed.
> >>>>>>>>>>>> stonith-helper is located here:
> >>>>>>>>>>>>
> >>>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
> >>>>>>>>>>>
> >>>>>>>>>>> Is xen0 also in this directory?
> >>>>>>>>>>>
> >>>>>>>>>>> If it is not, that is a problem; please try copying the stonith-helper file into the same
> >>>>>>>>>>> directory as xen0, keeping its attributes and so on unchanged.
> >>>>>>>>>>>
> >>>>>>>>>>> If it works after that, the problem lies in the pm_extras installation.
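The directory check suggested here can be scripted. A minimal sketch that reports whether xen0 and stonith-helper both exist as executable files in the plugin directory named in this thread; the function name check_plugins and the /path/to placeholder are hypothetical. `cp -p` keeps the mode, timestamps and (when run as root) ownership, which matches the advice to copy the file with its attributes unchanged.

```shell
#!/bin/sh
# Hypothetical helper: report whether the xen0 and stonith-helper plugins are
# both present and executable in one directory. Returns 1 if either is missing.
check_plugins() {
    dir=${1:-/usr/local/heartbeat/lib/stonith/plugins/external}
    rc=0
    for p in xen0 stonith-helper; do
        if [ -f "$dir/$p" ] && [ -x "$dir/$p" ]; then
            echo "ok: $dir/$p"
        else
            echo "missing or not executable: $dir/$p" >&2
            rc=1
        fi
    done
    return "$rc"
}

# If stonith-helper turns out to live elsewhere, copy it next to xen0 while
# preserving its attributes (/path/to is a placeholder):
#   cp -p /path/to/stonith-helper /usr/local/heartbeat/lib/stonith/plugins/external/
```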
> >>>>>>>>>>>
> >>>>>>>>>>> That's all.
> >>>>>>>>>>>
> >>>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>>>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>>>>>>>>>> Date: 2015/3/17, Tue 10:31
> >>>>>>>>>>>> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yamauchi-san,
> >>>>>>>>>>>> cc: Matsushima-san
> >>>>>>>>>>>>
> >>>>>>>>>>>> Good morning, this is Fukuda.
> >>>>>>>>>>>> Thank you for the crm example.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I adapted it to our environment right away.
> >>>>>>>>>>>>
> >>>>>>>>>>>> $ cat test.crm
> >>>>>>>>>>>> ### Cluster Option ###
> >>>>>>>>>>>> property \
> >>>>>>>>>>>>     no-quorum-policy="ignore" \
> >>>>>>>>>>>>     stonith-enabled="true" \
> >>>>>>>>>>>>     startup-fencing="false" \
> >>>>>>>>>>>>     stonith-timeout="710s" \
> >>>>>>>>>>>>     crmd-transition-delay="2s"
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Resource Default ###
> >>>>>>>>>>>> rsc_defaults \
> >>>>>>>>>>>>     resource-stickiness="INFINITY" \
> >>>>>>>>>>>>     migration-threshold="1"
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Group Configuration ###
> >>>>>>>>>>>> group HAvarnish \
> >>>>>>>>>>>>     vip_208 \
> >>>>>>>>>>>>     varnishd
> >>>>>>>>>>>>
> >>>>>>>>>>>> group grpStonith1 \
> >>>>>>>>>>>>     Stonith1-1 \
> >>>>>>>>>>>>     Stonith1-2
> >>>>>>>>>>>>
> >>>>>>>>>>>> group grpStonith2 \
> >>>>>>>>>>>>     Stonith2-1 \
> >>>>>>>>>>>>     Stonith2-2
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Clone Configuration ###
> >>>>>>>>>>>> clone clone_ping \
> >>>>>>>>>>>>     ping
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Fencing Topology ###
> >>>>>>>>>>>> fencing_topology \
> >>>>>>>>>>>>     lbv1.beta.com: Stonith1-1 Stonith1-2 \
> >>>>>>>>>>>>     lbv2.beta.com: Stonith2-1 Stonith2-2
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Primitive Configuration ###
> >>>>>>>>>>>> primitive vip_208 ocf:heartbeat:IPaddr2 \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         ip="192.168.17.208" \
> >>>>>>>>>>>>         nic="eth0" \
> >>>>>>>>>>>>         cidr_netmask="24" \
> >>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor interval="5s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive varnishd lsb:varnish \
> >>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive ping ocf:pacemaker:ping \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         name="default_ping_set" \
> >>>>>>>>>>>>         host_list="192.168.17.254" \
> >>>>>>>>>>>>         multiplier="100" \
> >>>>>>>>>>>>         dampen="1" \
> >>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive Stonith1-1
> > stonith:external/stonith-helper \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>
> > pcmk_reboot_retries="1" \
> >>>>>>>>>>>>
> > pcmk_reboot_timeout="40s" \
> >>>>>>>>>>>>
> > hostlist="lbv1.beta.com" \
> >>>>>>>>>>>>
> > dead_check_target="192.168.17.132 10.0.17.132" \
> >>>>>>>>>>>>
> > standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep
> > -q `hostname`" \
> >>>>>>>>>>>>
> > run_online_check="yes" \
> >>>>>>>>>>>>     op start interval="0s"
> > timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s"
> > timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive Stonith1-2
> > stonith:external/xen0 \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>
> > pcmk_reboot_timeout="60s" \
> >>>>>>>>>>>>
> > hostlist="lbv1.beta.com:/etc/xen/lbv1.cfg" \
> >>>>>>>>>>>>
> > dom0="xen0.beta.com" \
> >>>>>>>>>>>>     op start interval="0s"
> > timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor
> > interval="3600s" timeout="60s" on-fail="restart"
> > \
> >>>>>>>>>>>>     op stop interval="0s"
> > timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive Stonith2-1 stonith:external/stonith-helper \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         pcmk_reboot_retries="1" \
> >>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
> >>>>>>>>>>>>         hostlist="lbv2.beta.com" \
> >>>>>>>>>>>>         dead_check_target="192.168.17.133 10.0.17.133" \
> >>>>>>>>>>>>         standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
> >>>>>>>>>>>>         run_online_check="yes" \
> >>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive Stonith2-2 stonith:external/xen0 \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
> >>>>>>>>>>>>         hostlist="lbv2.beta.com:/etc/xen/lbv2.cfg" \
> >>>>>>>>>>>>         dom0="xen0.beta.com" \
> >>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Resource Location ###
> >>>>>>>>>>>> location HA_location-1 HAvarnish \
> >>>>>>>>>>>>     rule 200: #uname eq lbv1.beta.com \
> >>>>>>>>>>>>     rule 100: #uname eq lbv2.beta.com
> >>>>>>>>>>>>
> >>>>>>>>>>>> location HA_location-2 HAvarnish \
> >>>>>>>>>>>>     rule -INFINITY: not_defined default_ping_set or default_ping_set lt 100
> >>>>>>>>>>>>
> >>>>>>>>>>>> location HA_location-3 grpStonith1 \
> >>>>>>>>>>>>     rule -INFINITY: #uname eq lbv1.beta.com
> >>>>>>>>>>>>
> >>>>>>>>>>>> location HA_location-4 grpStonith2 \
> >>>>>>>>>>>>     rule -INFINITY: #uname eq lbv2.beta.com
> >>>>>>>>>>>>
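A simplified model of what the four location constraints above do, assuming Pacemaker's score arithmetic where INFINITY is 1000000 and any sum involving -INFINITY stays -INFINITY. It ignores stickiness, colocation and everything else the policy engine weighs, and the function names are hypothetical. It shows that HAvarnish prefers lbv1.beta.com (200 over 100) unless `default_ping_set` is missing or below 100, and that grpStonith1, whose devices fence lbv1.beta.com, may only run on lbv2.beta.com (and vice versa for grpStonith2).

```python
INFINITY = 1_000_000  # Pacemaker's numeric value for INFINITY


def add_scores(a: int, b: int) -> int:
    """Pacemaker-style score addition: -INFINITY dominates, then +INFINITY."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def havarnish_score(node: str, attrs: dict[str, int]) -> int:
    score = 0
    # HA_location-1: rule 200 on lbv1.beta.com, rule 100 on lbv2.beta.com
    score = add_scores(score, {"lbv1.beta.com": 200, "lbv2.beta.com": 100}.get(node, 0))
    # HA_location-2: -INFINITY if default_ping_set is undefined or lt 100
    ping = attrs.get("default_ping_set")
    if ping is None or ping < 100:
        score = add_scores(score, -INFINITY)
    return score


def stonith_group_score(group: str, node: str) -> int:
    # HA_location-3/-4: a node must not run the group whose devices fence it
    fenced_node = {"grpStonith1": "lbv1.beta.com", "grpStonith2": "lbv2.beta.com"}[group]
    return -INFINITY if node == fenced_node else 0


if __name__ == "__main__":
    for node in ("lbv1.beta.com", "lbv2.beta.com"):
        print(node, havarnish_score(node, {"default_ping_set": 100}),
              stonith_group_score("grpStonith1", node))
```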
> >>>>>>>>>>>>
> >>>>>>>>>>>> After loading this configuration, the messages are different from yesterday's.
> >>>>>>>>>>>> The ping messages are gone.
> >>>>>>>>>>>>
> >>>>>>>>>>>> # crm_mon -rfA
> >>>>>>>>>>>> Last updated: Tue Mar 17 10:21:28 2015
> >>>>>>>>>>>> Last change: Tue Mar 17 10:21:09 2015
> >>>>>>>>>>>> Stack: heartbeat
> >>>>>>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
> >>>>>>>>>>>> Version: 1.1.12-561c4cf
> >>>>>>>>>>>> 2 Nodes configured
> >>>>>>>>>>>> 8 Resources configured
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Online: [ lbv1.beta.com lbv2.beta.com ]
> >>>>>>>>>>>>
> >>>>>>>>>>>> Full list of resources:
> >>>>>>>>>>>>
> >>>>>>>>>>>>  Resource Group: HAvarnish
> >>>>>>>>>>>>      vip_208    (ocf::heartbeat:IPaddr2):       Started lbv1.beta.com
> >>>>>>>>>>>>      varnishd   (lsb:varnish):  Started lbv1.beta.com
> >>>>>>>>>>>>  Resource Group: grpStonith1
> >>>>>>>>>>>>      Stonith1-1 (stonith:external/stonith-helper):      Stopped
> >>>>>>>>>>>>      Stonith1-2 (stonith:external/xen0):        Stopped
> >>>>>>>>>>>>  Resource Group: grpStonith2
> >>>>>>>>>>>>      Stonith2-1 (stonith:external/stonith-helper):      Stopped
> >>>>>>>>>>>>      Stonith2-2 (stonith:external/xen0):        Stopped
> >>>>>>>>>>>>  Clone Set: clone_ping [ping]
> >>>>>>>>>>>>      Started: [ lbv1.beta.com lbv2.beta.com ]
> >>>>>>>>>>>>
> >>>>>>>>>>>> Node Attributes:
> >>>>>>>>>>>> * Node lbv1.beta.com:
> >>>>>>>>>>>>     + default_ping_set                  : 100
> >>>>>>>>>>>> * Node lbv2.beta.com:
> >>>>>>>>>>>>     + default_ping_set                  : 100
> >>>>>>>>>>>>
> >>>>>>>>>>>> Migration summary:
> >>>>>>>>>>>> * Node lbv2.beta.com:
> >>>>>>>>>>>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
> >>>>>>>>>>>> * Node lbv1.beta.com:
> >>>>>>>>>>>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
> >>>>>>>>>>>>
> >>>>>>>>>>>> Failed actions:
> >>>>>>>>>>>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:15 2015', queued=0ms, exec=1082ms
> >>>>>>>>>>>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:16 2015', queued=0ms, exec=1079ms
> >>>>>>>>>>>>
> >>>>>>>>>>>>
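In the Migration summary, `fail-count=1000000` is Pacemaker's INFINITY: a failed start normally sets the fail count to INFINITY, and because that is at least `migration-threshold=1`, the resource may no longer run on that node until the failure is cleaned up. Since HA_location-3/-4 already forbid each group on its own node, each stonith-helper device is then left with nowhere to run, which would explain why both groups show as Stopped. A small sketch that extracts such bans from pasted `crm_mon -rfA` text; the regular expressions assume the line layout shown above, and `banned_resources` is a hypothetical helper.

```python
import re

NODE_LINE = re.compile(r"^\* Node (\S+):$")
MIGRATION_LINE = re.compile(
    r"^\s*(?P<rsc>\S+): migration-threshold=(?P<threshold>\d+) fail-count=(?P<failcount>\d+)"
)


def banned_resources(summary: str) -> dict[str, list[str]]:
    """Map node -> resources whose fail-count has reached migration-threshold."""
    banned: dict[str, list[str]] = {}
    node = None
    for line in summary.splitlines():
        node_match = NODE_LINE.match(line.strip())
        if node_match:
            node = node_match.group(1)
            continue
        m = MIGRATION_LINE.match(line)
        if m and node is not None:
            threshold, failcount = int(m["threshold"]), int(m["failcount"])
            # migration-threshold=0 would mean "never migrate", so it bans nothing
            if threshold > 0 and failcount >= threshold:
                banned.setdefault(node, []).append(m["rsc"])
    return banned


if __name__ == "__main__":
    summary = """Migration summary:
* Node lbv2.beta.com:
   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
* Node lbv1.beta.com:
   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
"""
    print(banned_resources(summary))
    # {'lbv2.beta.com': ['Stonith1-1'], 'lbv1.beta.com': ['Stonith2-1']}
```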
> >>>>>>>>>>>> Here is the log from /var/log/ha-debug:
> >>>>>>>>>>>>
> >>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Adding inet address 192.168.17.208/24 with broadcast address 192.168.17.255 to device eth0
> >>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Bringing device eth0 up
> >>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
> >>>>>>>>>>>>
> >>>>>>>>>>>> There was no output on stdout or stderr.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Could something be wrong with stonith-helper?
> >>>>>>>>>>>> Since stonith-helper is a shell script, I didn't pay much attention to how it was installed.
> >>>>>>>>>>>> stonith-helper is located here:
> >>>>>>>>>>>>
> >>>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
> >>>>>>>>>>>>
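Even for a shell script, a few installation details matter when it runs as an external STONITH plugin: the file must be executable, and its `#!` line must name an interpreter that exists (a `#!` line ending in CRLF, for instance after editing on Windows, makes the kernel look for an interpreter named `/bin/sh\r`). A minimal sketch of those file-level checks, using only the path given above. `check_plugin` is hypothetical, and it cannot tell whether this directory is the one your cluster-glue build actually searches for external plugins.

```python
import os


def check_plugin(path: str) -> list[str]:
    """Return problems found with an external STONITH plugin script (empty list if none).

    Checks only file-level basics: regular file, execute permission for the
    user running this check, and a usable #! line."""
    if not os.path.isfile(path):
        return [f"{path}: not a regular file"]
    problems = []
    if not os.access(path, os.X_OK):
        problems.append(f"{path}: not executable")
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        problems.append(f"{path}: no #! interpreter line")
    elif first_line.endswith(b"\r\n"):
        problems.append(f"{path}: #! line ends with CRLF")
    else:
        parts = first_line[2:].split()
        interpreter = parts[0].decode(errors="replace") if parts else ""
        if not os.path.exists(interpreter):
            problems.append(f"{path}: interpreter {interpreter!r} not found")
    return problems


if __name__ == "__main__":
    import sys
    # Path from the thread; pass another path as the first argument if needed.
    target = sys.argv[1] if len(sys.argv) > 1 else \
        "/usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper"
    for problem in check_plugin(target) or ["no problems found"]:
        print(problem)
```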
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks in advance.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That's all.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2015-03-17 9:45 GMT+09:00 <renay****@ybb*****>:
> >>>>>>>>>>>>
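> >>>>>>>>>>>> Fukuda-san,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Good morning, this is Yamauchi.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Just in case, here is an excerpt from an example I have on hand that uses multiple STONITH devices.
> >>>>>>>>>>>>> (When you actually use it, please watch out for line breaks.)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The example below is a configuration for the Pacemaker 1.1 series:
> >>>>>>>>>>>>> for nodea, STONITH is executed with prmStonith1-1 first, then prmStonith1-2;
> >>>>>>>>>>>>> for nodeb, STONITH is executed with prmStonith2-1 first, then prmStonith2-2.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The STONITH plugins themselves are stonith-helper and ssh.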
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (snip)
> >>>>>>>>>>>>> ### Group Configuration ###
> >>>>>>>>>>>>> group grpStonith1 \
> >>>>>>>>>>>>>     prmStonith1-1 \
> >>>>>>>>>>>>>     prmStonith1-2
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> group grpStonith2 \
> >>>>>>>>>>>>>     prmStonith2-1 \
> >>>>>>>>>>>>>     prmStonith2-2
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ### Fencing Topology ###
> >>>>>>>>>>>>> fencing_topology \
> >>>>>>>>>>>>>     nodea: prmStonith1-1 prmStonith1-2 \
> >>>>>>>>>>>>>     nodeb: prmStonith2-1 prmStonith2-2
> >>>>>>>>>>>>> (snip)
> >>>>>>>>>>>>> primitive prmStonith1-1 stonith:external/stonith-helper \
> >>>>>>>>>>>>>     params \
> >>>>>>>>>>>>>         pcmk_reboot_retries="1" \
> >>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
> >>>>>>>>>>>>>         hostlist="nodea" \
> >>>>>>>>>>>>>         dead_check_target="192.168.28.60 192.168.28.70" \
> >>>>>>>>>>>>>         standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
> >>>>>>>>>>>>>         run_online_check="yes" \
> >>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> primitive prmStonith1-2 stonith:external/ssh \
> >>>>>>>>>>>>>     params \
> >>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
> >>>>>>>>>>>>>         hostlist="nodea" \
> >>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> primitive prmStonith2-1 stonith:external/stonith-helper \
> >>>>>>>>>>>>>     params \
> >>>>>>>>>>>>>         pcmk_reboot_retries="1" \
> >>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
> >>>>>>>>>>>>>         hostlist="nodeb" \
> >>>>>>>>>>>>>         dead_check_target="192.168.28.61 192.168.28.71" \
> >>>>>>>>>>>>>         standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
> >>>>>>>>>>>>>         run_online_check="yes" \
> >>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> primitive prmStonith2-2 stonith:external/ssh \
> >>>>>>>>>>>>>     params \
> >>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
> >>>>>>>>>>>>>         hostlist="nodeb" \
> >>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>> (snip)
> >>>>>>>>>>>>> location rsc_location-grpStonith1-2 grpStonith1 \
> >>>>>>>>>>>>>     rule -INFINITY: #uname eq nodea
> >>>>>>>>>>>>> location rsc_location-grpStonith2-3 grpStonith2 \
> >>>>>>>>>>>>>     rule -INFINITY: #uname eq nodeb
> >>>>>>>>>>>>>
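The ordering Yamauchi describes comes from `fencing_topology`. Assuming the crm shell syntax in which each space-separated entry after `nodea:` is a separate fencing level tried in order, and devices joined by commas would form one level that must all succeed, the sketch below models that order. It is a simplified model, not stonithd's actual implementation (timeouts and `pcmk_reboot_retries` are ignored); `parse_topology_entry`, `fence` and the `run_device` callback are hypothetical.

```python
from typing import Callable, Optional


def parse_topology_entry(entry: str) -> tuple[str, list[list[str]]]:
    """Parse one crm-shell fencing_topology entry, e.g. 'nodea: prmStonith1-1 prmStonith1-2'.

    Assumes spaces separate levels and commas join devices within one level."""
    node, _, rest = entry.partition(":")
    levels = [level.split(",") for level in rest.split()]
    return node.strip(), levels


def fence(levels: list[list[str]], run_device: Callable[[str], bool]) -> Optional[str]:
    """Try the levels in order; a level succeeds only if every device in it succeeds.

    Returns the first successful level (comma-joined) or None if all levels failed."""
    for level in levels:
        if all(run_device(device) for device in level):
            return ",".join(level)
    return None


if __name__ == "__main__":
    node, levels = parse_topology_entry("nodea: prmStonith1-1 prmStonith1-2")
    print(node, levels)  # nodea [['prmStonith1-1'], ['prmStonith1-2']]
    # Pretend the first level reports failure, so the second one is used.
    print(fence(levels, lambda device: device == "prmStonith1-2"))  # prmStonith1-2
```

In terms of the example above: prmStonith1-1 (stonith-helper) is tried first, and prmStonith1-2 (ssh) only if the first level does not succeed.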
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> That's all.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>>
> >>>>>>>>>>>> ELF Systems
> >>>>>>>>>>>> Masamichi Fukuda
> >>>>>>>>>>>> mail to: masamichi_fukud****@elf-s*****
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> Linux-ha-japan mailing list
> >>>>>>>>>>> Linux****@lists*****
> >>>>>>>>>>> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
> >>>>>>>>>>>



-- 
ELF Systems
Masamichi Fukuda
mail to: masamichi_fukud****@elf-s***** <elfsy****@gmail*****>