[Linux-ha-jp] STONITH errors during split-brain


Masamichi Fukuda - elf-systems masamichi_fukud****@elf-s*****
Mon Mar 16 19:24:54 JST 2015


Matsushima-san,

Good evening, this is Fukuda.
Thank you for your prompt reply.

Here is the output of crm_mon -rfA:

Last updated: Mon Mar 16 18:26:37 2015
Last change: Mon Mar 16 18:04:31 2015
Stack: heartbeat
Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
Version: 1.1.12-561c4cf
2 Nodes configured
10 Resources configured


Online: [ lbv1.beta.com lbv2.beta.com ]

Full list of resources:

 Resource Group: HAvarnish
     vip_208    (ocf::heartbeat:IPaddr2):       Stopped
     varnishd   (lsb:varnish):  Stopped
 Resource Group: grpStonith1
     Stonith1-1 (stonith:external/stonith-helper):      Stopped
     Stonith1-2 (stonith:external/xen0):        Stopped
     Stonith1-3 (stonith:meatware):     Stopped
 Resource Group: grpStonith2
     Stonith2-1 (stonith:external/stonith-helper):      Stopped
     Stonith2-2 (stonith:external/xen0):        Stopped
     Stonith2-3 (stonith:meatware):     Stopped
 Clone Set: clone_ping [ping]
     Stopped: [ lbv1.beta.com lbv2.beta.com ]

Node Attributes:
* Node lbv1.beta.com:
* Node lbv2.beta.com:

Migration summary:
* Node lbv2.beta.com:
   Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:47 2015'
* Node lbv1.beta.com:
   Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:48 2015'
   ping: migration-threshold=1 fail-count=1000000 last-failure='Mon Mar 16 18:23:55 2015'

Failed actions:
    Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=39, status=Error, last-rc-change='Mon Mar 16 18:23:44 2015', queued=0ms, exec=2014ms
    ping_start_0 on lbv2.beta.com 'unknown error' (1): call=40, status=complete, last-rc-change='Mon Mar 16 18:23:45 2015', queued=0ms, exec=995ms
    Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=39, status=Error, last-rc-change='Mon Mar 16 18:23:45 2015', queued=0ms, exec=2009ms
    ping_start_0 on lbv1.beta.com 'unknown error' (1): call=41, status=complete, last-rc-change='Mon Mar 16 18:23:54 2015', queued=0ms, exec=182ms


Nothing was written to stdout or stderr; the following is from the log (/var/log/ha-debug).

Node 1 (lbv1):

Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: info: Pacemaker support: yes
Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: WARN: File /etc/ha.d//haresources exists.
Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: WARN: This file is not used because pacemaker is enabled
Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
Mar 16 18:22:47 lbv1.beta.com heartbeat: [1914]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Core dumps could be lost if multiple dumps occur.
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: info: **************************
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1914]: info: Configuration validated. Starting heartbeat 3.0.6
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: heartbeat: version 3.0.6
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: Heartbeat generation: 1423534103
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: seed is -1702799346
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: bound send socket to device: eth1
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: set SO_REUSEADDR
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: bound receive socket to device: eth1
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.133
Mar 16 18:22:48 lbv1.beta.com heartbeat: [1957]: info: Local status now set to: 'up'
Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: info: Link lbv2.beta.com:eth1 up.
Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: info: Status update for node lbv2.beta.com: status up
Mar 16 18:22:53 lbv1.beta.com heartbeat: [1957]: debug: get_delnodelist: delnodelist=
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Comm_now_up(): updating status to active
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Local status now set to: 'active'
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: Status update for node lbv2.beta.com: status active
Mar 16 18:22:54 lbv1.beta.com heartbeat: [2868]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0  gid 0 (pid 2868)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [2866]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109  gid 113 (pid 2866)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [2871]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109  gid 113 (pid 2871)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [2869]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0  gid 0 (pid 2869)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [2867]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109  gid 113 (pid 2867)
Mar 16 18:22:54 lbv1.beta.com heartbeat: [2870]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109  gid 113 (pid 2870)
Mar 16 18:22:54 lbv1.beta.com ccm: [2866]: info: Hostname: lbv1.beta.com
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client ccm is set to 1024
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client attrd is set to 1024
Mar 16 18:22:54 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client stonithd is set to 1024
Mar 16 18:22:55 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client cib is set to 1024
Mar 16 18:22:58 lbv1.beta.com heartbeat: [1957]: WARN: 1 lost packet(s) for [lbv2.beta.com] [33:35]
Mar 16 18:22:58 lbv1.beta.com heartbeat: [1957]: info: No pkts missing from lbv2.beta.com!
Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: info: the send queue length from heartbeat to client crmd is set to 1024
Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: WARN: 1 lost packet(s) for [lbv2.beta.com] [40:42]
Mar 16 18:22:59 lbv1.beta.com heartbeat: [1957]: info: No pkts missing from lbv2.beta.com!
ping(ping)[3164]:    2015/03/16_18:23:54 WARNING: Could not update default_ping_set = 100: rc=127

Node 2 (lbv2):

Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: Pacemaker support: yes
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: File /etc/ha.d//haresources exists.
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: This file is not used because pacemaker is enabled
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Core dumps could be lost if multiple dumps occur.
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: **************************
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1925]: info: Configuration validated. Starting heartbeat 3.0.6
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: heartbeat: version 3.0.6
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: Heartbeat generation: 1423534179
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: seed is 2086609325
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: bound send socket to device: eth1
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: set SO_REUSEADDR
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: bound receive socket to device: eth1
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: glib: ucast: started on port 694 interface eth1 to 10.0.17.132
Mar 16 18:22:47 lbv2.beta.com heartbeat: [1977]: info: Local status now set to: 'up'
Mar 16 18:22:48 lbv2.beta.com heartbeat: [1977]: info: Link lbv1.beta.com:eth1 up.
Mar 16 18:22:48 lbv2.beta.com heartbeat: [1977]: info: Status update for node lbv1.beta.com: status up
Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: debug: get_delnodelist: delnodelist=
Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Comm_now_up(): updating status to active
Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Local status now set to: 'active'
Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [1977]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [3026]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109  gid 113 (pid 3026)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [3023]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109  gid 113 (pid 3023)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [3025]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0  gid 0 (pid 3025)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [3024]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0  gid 0 (pid 3024)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [3022]: info: Starting "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109  gid 113 (pid 3022)
Mar 16 18:22:53 lbv2.beta.com heartbeat: [3027]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109  gid 113 (pid 3027)
Mar 16 18:22:54 lbv2.beta.com ccm: [3022]: info: Hostname: lbv2.beta.com
Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client ccm is set to 1024
Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client attrd is set to 1024
Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: Status update for node lbv1.beta.com: status active
Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client stonithd is set to 1024
Mar 16 18:22:54 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client cib is set to 1024
Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: quorum plugin: majority
Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: quorum plugin: twonodes
Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
Mar 16 18:22:58 lbv2.beta.com ccm: [3022]: info: Break tie for 2 nodes cluster
Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: WARN: 1 lost packet(s) for [lbv1.beta.com] [30:32]
Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: info: No pkts missing from lbv1.beta.com!
Mar 16 18:22:58 lbv2.beta.com heartbeat: [1977]: info: the send queue length from heartbeat to client crmd is set to 1024
Mar 16 18:22:59 lbv2.beta.com heartbeat: [1977]: WARN: 1 lost packet(s) for [lbv1.beta.com] [35:37]
Mar 16 18:22:59 lbv2.beta.com heartbeat: [1977]: info: No pkts missing from lbv1.beta.com!
Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: quorum plugin: majority
Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
Mar 16 18:22:59 lbv2.beta.com ccm: [3022]: debug: total_node_count=2, total_quorum_votes=200
ping(ping)[3144]:    2015/03/16_18:23:46 WARNING: Could not update default_ping_set = 100: rc=127


Thank you in advance for your help.

That's all.


On Mon, Mar 16, 2015 at 18:53, Takehiro Matsushima <takeh****@gmail*****> wrote:

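> Fukuda-san,
>
> Good evening, this is Matsushima.
> Could I quickly check one thing with you?
>
> I am also concerned that the ping RA's start is failing with 'unknown error',
> so could you quote any relevant log entries for ping and Stonith Helper,
> including anything each RA wrote to stdout or stderr?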
>
> ----
> Takehiro Matsushima
> _______________________________________________
> Linux-ha-japan mailing list
> Linux****@lists*****
> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
>
>


-- 
ELF Systems
Masamichi Fukuda
mail to: masamichi_fukud****@elf-s***** <elfsy****@gmail*****>


