[Linux-ha-jp] STONITH errors during split-brain


Masamichi Fukuda - elf-systems masamichi_fukud****@elf-s*****
Tue Mar 17 09:30:50 JST 2015


Yamauchi-san,

Good morning, this is Fukuda.

Thank you for the reference URLs, including the sample.

Best regards.

That's all.

2015-03-16 21:48 GMT+09:00 <renay****@ybb*****>:

> Fukuda-san,
>
> Good evening, this is Yamauchi.
>
> There appears to be a fencing_topology sample from last year's OSC Tokyo here:
>
>  * http://linux-ha.sourceforge.jp/wp/wp-content/uploads/osc2014_crm.txt
>
> With fencing_topology you can control which stonith agent is run against which target node.
>
> -----------------
> fencing_topology \
>     server01: prmStonith1 \
>     server02: prmStonith2
> -----------------
>
> In this format, each line gives a target node followed by the stonith agent(s)
> to run against it (more than one agent can be listed).
> There is also information on the upstream site:
> * http://clusterlabs.org/wiki/Fencing_topology
> That's all.
>
>
>
>
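As a rough illustration of how a fencing_topology statement maps target nodes to stonith agents, the sketch below parses the crmsh-style line into a per-node list of fencing levels. It assumes the crmsh syntax `<node>: <level> [<level> ...]`, where space-separated entries are successive levels (tried in order) and comma-joined resources form one level whose devices must all succeed. `parse_fencing_topology` is a hypothetical helper for illustration, not part of crmsh or Pacemaker.

```python
from typing import Dict, List, Optional


def parse_fencing_topology(text: str) -> Dict[str, List[List[str]]]:
    """Hypothetical helper: map each target node to its ordered fencing levels."""
    # Join backslash-continued lines into one logical line.
    joined = " ".join(part.rstrip("\\").strip() for part in text.splitlines())
    tokens = joined.split()
    if not tokens or tokens[0] != "fencing_topology":
        raise ValueError("expected a fencing_topology statement")

    topology: Dict[str, List[List[str]]] = {}
    current: Optional[str] = None
    for tok in tokens[1:]:
        if tok.endswith(":"):
            # "server01:" starts the entry for a new target node.
            current = tok[:-1]
            topology[current] = []
        elif current is None:
            raise ValueError(f"stonith resource {tok!r} has no target node")
        else:
            # Assumed semantics: each token is one level; commas group
            # devices that must all succeed within that level.
            topology[current].append(tok.split(","))
    return topology


config = """fencing_topology \\
    server01: prmStonith1 \\
    server02: prmStonith2"""

print(parse_fencing_topology(config))
# {'server01': [['prmStonith1']], 'server02': [['prmStonith2']]}
```

With more than one agent per node, for example `server01: prmStonith1 prmStonith2`, the sketch yields two levels for `server01`, which matches the "multiple agents may be listed" remark above under the stated syntax assumption.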
> ----- Original Message -----
> >From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >To: "linux****@lists*****" <linux****@lists*****>
> >Date: 2015/3/16, Mon 19:24
> >Subject: Re: [Linux-ha-jp] STONITH errors during split-brain
> >
> >
> >Matsushima-san,
> >
> >Good evening, this is Fukuda.
> >Thank you for your quick reply.
> >
> >Here is the output of crm_mon -rfA:
> >
> >Last updated: Mon Mar 16 18:26:37 2015
> >Last change: Mon Mar 16 18:04:31 2015
> >Stack: heartbeat
> >Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
> >Version: 1.1.12-561c4cf
> >2 Nodes configured
> >10 Resources configured
> >
> >
> >Online: [ lbv1.beta.com lbv2.beta.com ]
> >
> >Full list of resources:
> >
> > Resource Group: HAvarnish
> >     vip_208    (ocf::heartbeat:IPaddr2):       Stopped
> >     varnishd   (lsb:varnish):  Stopped
> > Resource Group: grpStonith1
> >     Stonith1-1 (stonith:external/stonith-helper):      Stopped
> >     Stonith1-2 (stonith:external/xen0):        Stopped
> >     Stonith1-3 (stonith:meatware):     Stopped
> > Resource Group: grpStonith2
> >     Stonith2-1 (stonith:external/stonith-helper):      Stopped
> >     Stonith2-2 (stonith:external/xen0):        Stopped
> >     Stonith2-3 (stonith:meatware):     Stopped
> > Clone Set: clone_ping [ping]
> >     Stopped: [ lbv1.beta.com lbv2.beta.com ]
> >
> >Node Attributes:
> >* Node lbv1.beta.com:
> >* Node lbv2.beta.com:
> >
> >Migration summary:
> >* Node lbv2.beta.com:
> >   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
> >   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
> >* Node lbv1.beta.com:
> >   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:48 2015'
> >   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:55 2015'
> >
> >Failed actions:
> >    Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=39, status=Error, last-rc-change='Mon Mar 16 18:23:44 2015', queued=0ms, exec=2014ms
> >    ping_start_0 on lbv2.beta.com 'unknown error' (1): call=40, status=complete, last-rc-change='Mon Mar 16 18:23:45 2015', queued=0ms, exec=995ms
> >    Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=39, status=Error, last-rc-change='Mon Mar 16 18:23:45 2015', queued=0ms, exec=2009ms
> >    ping_start_0 on lbv1.beta.com 'unknown error' (1): call=41, status=complete, last-rc-change='Mon Mar 16 18:23:54 2015', queued=0ms, exec=182ms
> >
> >
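The Migration summary above can be read mechanically. In Pacemaker a resource is no longer allowed on a node once its fail-count reaches migration-threshold, and 1000000 is Pacemaker's value for INFINITY (by default a failed start sets the fail-count to INFINITY). The sketch below assumes exactly the text layout printed by crm_mon -rfA in this thread and lists the resource/node pairs that are blocked; `blocked_resources` is a hypothetical helper, not a Pacemaker tool.

```python
import re
from typing import List, Optional, Tuple

INFINITY = 1000000  # Pacemaker's numeric value for INFINITY

# Assumed layout: "   <rsc>: migration-threshold=<n> fail-count=<n> ..."
LINE_RE = re.compile(
    r"^\s*(?P<rsc>\S+): migration-threshold=(?P<thr>\d+) fail-count=(?P<fc>\d+)"
)


def blocked_resources(summary: str) -> List[Tuple[str, str, int]]:
    """Return (node, resource, fail_count) for every resource whose
    fail-count has reached its migration-threshold on that node."""
    node: Optional[str] = None
    blocked: List[Tuple[str, str, int]] = []
    for line in summary.splitlines():
        if line.startswith("* Node "):
            node = line[len("* Node "):].rstrip(":")
            continue
        m = LINE_RE.match(line)
        if m and node is not None:
            threshold, count = int(m["thr"]), int(m["fc"])
            # fail-count >= migration-threshold: the resource may not run
            # on this node again until the failure is cleaned up.
            if count >= threshold:
                blocked.append((node, m["rsc"], count))
    return blocked


summary = """* Node lbv2.beta.com:
   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
* Node lbv1.beta.com:
   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:48 2015'
   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:55 2015'"""

for node, rsc, count in blocked_resources(summary):
    label = "INFINITY" if count >= INFINITY else str(count)
    print(f"{rsc} cannot run on {node} (fail-count={label})")
```

Applied to this output, all four entries are reported, which is consistent with every resource showing as Stopped in the resource list.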
> >Nothing was written to standard output or standard error; the following is from the log (/var/log/ha-debug).
> >
> >Node 1 (lbv1)
> >
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: info: Pacemaker support: yes
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: WARN: File /etc/ha.d//haresources exists.
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: WARN: This file is not used because pacemaker is enabled
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
> >Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Core dumps could be lost if multiple dumps occur.
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: info: **************************
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: info: Configuration validated. Starting heartbeat 3.0.6
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: heartbeat: version 3.0.6
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: Heartbeat generation: 1423534103
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: seed is -1702799346
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: bound send socket to device: eth1
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: set SO_REUSEADDR
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: bound receive socket to device: eth1
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.133
> >Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: Local status now set to: 'up'
> >Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: info: Link lbv2.beta.com:eth1 up.
> >Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: info: Status update for node lbv2.beta.com: status up
> >Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: debug: get_delnodelist: delnodelist=
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Comm_now_up(): updating status to active
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Local status now set to: 'active'
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Status update for node lbv2.beta.com: status active
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2868]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0  gid 0 (pid 2868)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2866]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109  gid 113 (pid 2866)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2871]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109  gid 113 (pid 2871)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2869]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0  gid 0 (pid 2869)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2867]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109  gid 113 (pid 2867)
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [2870]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109  gid 113 (pid 2870)
> >Mar 16 18:22:54 lbv1.beta.com ccm: [2866]: info: Hostname: lbv1.beta.com
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client ccm is set to 1024
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client attrd is set to 1024
> >Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client stonithd is set to 1024
> >Mar 16 18:22:55 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client cib is set to 1024
> >Mar 16 18:22:58 lbv1.beta.com heartbeat: [1957]: WARN: 1 lost packet(s) for [lbv2.beta.com] [33:35]
> >Mar 16 18:22:58 lbv1.beta.com heartbeat: [1957]: info: No pkts missing from lbv2.beta.com!
> >Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client crmd is set to 1024
> >Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: WARN: 1 lost packet(s) for [lbv2.beta.com] [40:42]
> >Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: info: No pkts missing from lbv2.beta.com!
> >ping(ping)[3164]:    2015/03/16_18:23:54 WARNING: Could not update default_ping_set = 100: rc=127
> >
> >Node 2 (lbv2)
> >
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: Pacemaker support: yes
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: File /etc/ha.d//haresources exists.
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: This file is not used because pacemaker is enabled
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Core dumps could be lost if multiple dumps occur.
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: **************************
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: Configuration validated. Starting heartbeat 3.0.6
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: heartbeat: version 3.0.6
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: Heartbeat generation: 1423534179
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: seed is 2086609325
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: bound send socket to device: eth1
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: set SO_REUSEADDR
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: bound receive socket to device: eth1
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.132
> >Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: Local status now set to: 'up'
> >Mar 16 18:22:48 lbv2.beta.com heartbeat: [1977]: info: Link lbv1.beta.com:eth1 up.
> >Mar 16 18:22:48 lbv2.beta.com heartbeat: [1977]: info: Status update for node lbv1.beta.com: status up
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: debug: get_delnodelist: delnodelist=
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Comm_now_up(): updating status to active
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Local status now set to: 'active'
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3026]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109  gid 113 (pid 3026)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3023]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109  gid 113 (pid 3023)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3025]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0  gid 0 (pid 3025)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3024]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0  gid 0 (pid 3024)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3022]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109  gid 113 (pid 3022)
> >Mar 16 18:22:53 lbv2.beta.com heartbeat: [3027]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109  gid 113 (pid 3027)
> >Mar 16 18:22:54 lbv2.beta.com ccm: [3022]: info: Hostname: lbv2.beta.com
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client ccm is set to 1024
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client attrd is set to 1024
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: Status update for node lbv1.beta.com: status active
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client stonithd is set to 1024
> >Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client cib is set to 1024
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: quorum plugin: majority
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: quorum plugin: twonodes
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
> >Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: info: Break tie for 2 nodes cluster
> >Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: WARN: 1 lost packet(s) for [lbv1.beta.com] [30:32]
> >Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: info: No pkts missing from lbv1.beta.com!
> >Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client crmd is set to 1024
> >Mar 16 18:22:59 lbv2.beta.com heartbeat: [1977]: WARN: 1 lost packet(s) for [lbv1.beta.com] [35:37]
> >Mar 16 18:22:59 lbv2.beta.com heartbeat: [1977]: info: No pkts missing from lbv1.beta.com!
> >Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: quorum plugin: majority
> >Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
> >Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
> >ping(ping)[3144]:    2015/03/16_18:23:46 WARNING: Could not update default_ping_set = 100: rc=127
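The ccm debug lines on lbv2 show the quorum arithmetic for this two-node cluster: at 18:22:58 lbv2 alone holds 100 of 200 votes, the majority plugin is followed by the twonodes plugin, and ccm logs "Break tie for 2 nodes cluster"; at 18:22:59 both nodes are members with 200 of 200 votes. The sketch below is a simplified model of that reading, not the ccm source. It assumes the majority plugin requires strictly more than half of the total votes and that a two-node cluster split one-and-one is treated as a tie; both function names are hypothetical.

```python
def majority_has_quorum(member_votes: int, total_votes: int) -> bool:
    # Assumed rule: strict majority, so exactly half (100 of 200) is not enough.
    return member_votes * 2 > total_votes


def needs_two_node_tiebreak(member_count: int, total_nodes: int) -> bool:
    # Assumed rule: a 2-node cluster in which a node sees only itself is a tie.
    return total_nodes == 2 and member_count == 1


# 18:22:58 on lbv2: member_count=1, member_quorum_votes=100 of 200
print(majority_has_quorum(100, 200), needs_two_node_tiebreak(1, 2))  # False True
# 18:22:59 on lbv2: member_count=2, member_quorum_votes=200 of 200
print(majority_has_quorum(200, 200), needs_two_node_tiebreak(2, 2))  # True False
```

Under this model a one-and-one split never yields a majority on its own, which is why a two-node cluster relies on the tie-break and on working STONITH during split-brain.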
> >
> >
> >
> >Best regards.
> >
> >That's all.
> >
> >
> >
> >
> >On March 16, 2015 at 18:53, Takehiro Matsushima <takeh****@gmail*****> wrote:
> >
> >Fukuda-san,
> >>
> >>Good evening, this is Matsushima.
> >>First, may I quickly check one thing?
> >>
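> >>I am also concerned that the ping RA fails on start with 'unknown error', so
> >>I would appreciate it if you could quote the relevant logs for ping and Stonith Helper,
> >>including anything each RA wrote to standard output or standard error.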
> >>
> >>----
> >>Takehiro Matsushima
> >>
> >>_______________________________________________
> >>Linux-ha-japan mailing list
> >>Linux****@lists*****
> >>http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
> >>
> >>
> >
> >
> >--
> >
> >ELF Systems
> >Masamichi Fukuda
> >mail to: masamichi_fukud****@elf-s*****
> >
> >
> >
>
>



-- 
ELF Systems
Masamichi Fukuda
mail to: masamichi_fukud****@elf-s***** <elfsy****@gmail*****>


