[Linux-ha-jp] VM上でシリアルケーブル接続をした際の挙動

Back to archive index

OKADA Satoshi okada****@oss*****
2008年 11月 18日 (火) 12:27:07 JST


島田さん

岡田です。こんにちは。

ログからV2モードで動作していると推定します。
#VM環境は詳しくないので、はずしてたらすみません。

V2モードではノード間でやり取りするデータ量が大きくなっており、
シリアルケーブルでの通信では処理しきれないことがあるようです。

通常のLAN接続の設定で動作させてみてはいかがでしょうか。

watchdogとの関連はちょっと分かりません。


ご質問の内容からそれますが、ログから気がついた点です。
もしかしたら、ipfailを使ってらっしゃるかもしれませんが、
V2モードではpingdがipfailの代わりとなっていますので、
そちらの利用も検討してもらえれば良いかと思います。
http://www.linux-ha.org/ja/pingd_ja

設定ファイル(ha.cf, cib.xml)を可能な範囲で教えていただける
ともう少し、原因に近づけるかもしれません。

また、バージョンによって動作が違う可能性がありますので、
Heartbeatのバージョンも教えていただければと思います。





島田雄太 さんは書きました:
> 島田と申します
> 
> linux-haの挙動で質問があります。
> 
> 現在、VMwareのホストOS2台を用いて、その上にそれぞれRedhatES4をゲストOSとして
> 立ち上げています。
> 
> heartbeatとDRBDを用いて、その上にpostgresを使って構築をしております。
> VM上でシリアルケーブルを認識させているのですが、通常運用時に以下のようなエラーが出ており
> WatchDog上で障害だと判断し、切り替わりが発生してしまったように見受けられます
> 
> なぜ、そうなったのか、調査したいと考えております。
> 
> 
> そもそも設定がおかしいというのも考えられるのでしょうか。
> どういった情報をお伝えすればよいか分からない為、不適切なご相談になってしまうようでしたら
> 必要な情報をご教授いただけますでしょうか。
> 
> ご迷惑をお掛け致しますが、宜しくお願い致します
> 
> 1.コンソール上において以下のようなメッセージが出ていました
> 	「1xxxdb01 kernel: ttyS0: 1 input overrun(s)」
> 	というメッセージが表示されており、最後に
> 	「SoftDog: Initiaing system reboot.」
> 	というメッセージが表示されていた
> 
> 2./var/log/messageについて
> 
> Nov 17 05:37:03 1mctdb01 kernel: drbd0: resync bitmap: bits=12799789
> words=399994
> Nov 17 05:37:03 1mctdb01 kernel: klogd 1.4.1, ---------- state change ----------
> Nov 17 05:37:03 1mctdb01 kernel: drbd0: size = 48 GB (51199155 KB)
> Nov 17 05:37:03 1mctdb01 kernel: drbd0: 0 KB marked out-of-sync by on
> disk bit-map.
> Nov 17 05:37:03 1mctdb01 kernel: drbd0: Found 4 transactions (192
> active extents) in activity log.
> Nov 17 05:37:03 1mctdb01 kernel: drbd0: Marked additional 507 MB as
> out-of-sync based on AL.
> Nov 17 05:37:03 1mctdb01 kernel: drbd0: drbdsetup [4700]: cstate
> Unconfigured --> StandAlone
> Nov 17 05:37:03 1mctdb01 drbd: s0
> Nov 17 05:37:03 1mctdb01 drbd: n0
> Nov 17 05:37:03 1mctdb01 kernel: drbd0: drbdsetup [4713]: cstate
> StandAlone --> Unconnected
> Nov 17 05:37:03 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate
> Unconnected --> WFConnection
> Nov 17 05:37:03 1mctdb01 drbd: ]
> Nov 17 05:37:03 1mctdb01 drbd: .
> Nov 17 05:37:04 1mctdb01 drbd: WARN: stdin/stdout is not a TTY; using
> /dev/console
> Nov 17 05:37:05 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate
> WFConnection --> WFReportParams
> Nov 17 05:37:05 1mctdb01 kernel: drbd0: Handshake successful: DRBD
> Network Protocol version 74
> Nov 17 05:37:05 1mctdb01 kernel: drbd0: Connection established.
> Nov 17 05:37:05 1mctdb01 kernel: drbd0: I am(S):
> 1:00000002:00000001:0000001e:0000000e:11
> Nov 17 05:37:05 1mctdb01 kernel: drbd0: Peer(P):
> 1:00000002:00000001:0000001e:0000000f:10
> Nov 17 05:37:05 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate
> WFReportParams --> WFBitMapT
> Nov 17 05:37:05 1mctdb01 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
> Nov 17 05:37:05 1mctdb01 rc: Starting drbd:  succeeded
> Nov 17 05:37:05 1mctdb01 kernel: drbd0: drbd0_receiver [4714]: cstate
> WFBitMapT --> SyncTarget
> Nov 17 05:37:05 1mctdb01 kernel: drbd0: Resync started as SyncTarget
> (need to sync 519348 KB [129837 bits set]).
> Nov 17 05:37:06 1mctdb01 logd: [4739]: info: logd started with default
> configuration.
> Nov 17 05:37:06 1mctdb01 logd: [4739]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Nov 17 05:37:06 1mctdb01 logd: [4748]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Nov 17 05:37:06 1mctdb01 kernel: Software Watchdog Timer: 0.07
> initialized. soft_noboot=0 soft_margin=60 sec (nowayout= 0)
> Nov 17 05:37:09 1mctdb01 sendmail[4833]: sql_select option missing
> Nov 17 05:37:09 1mctdb01 sendmail[4833]: auxpropfunc error no
> mechanism available
> Nov 17 05:37:09 1mctdb01 sendmail: sendmail startup succeeded
> Nov 17 05:37:09 1mctdb01 sendmail: sm-client startup succeeded
> Nov 17 05:37:09 1mctdb01 gpm[4855]: *** info [startup.c(95)]:
> Nov 17 05:37:09 1mctdb01 gpm[4855]: Started gpm successfully. Entered
> daemon mode.
> Nov 17 05:37:09 1mctdb01 gpm[4855]: *** info [mice.c(1766)]:
> Nov 17 05:37:09 1mctdb01 gpm[4855]: imps2: Auto-detected intellimouse PS/2
> Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Nov 17 05:37:10 1mctdb01 attrd: [4860]: info:
> G_main_add_SignalHandler: Added signal handler for signal 15
> Nov 17 05:37:10 1mctdb01 ccm: [4856]: info: Hostname: 1mctdb01.tdc-asp.jp
> Nov 17 05:37:10 1mctdb01 attrd: [4860]: info: register_with_ha:
> Hostname: 1mctdb01.tdc-asp.jp
> Nov 17 05:37:10 1mctdb01 gpm: gpm startup succeeded
> Nov 17 05:37:10 1mctdb01 cib: [4857]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Nov 17 05:37:10 1mctdb01 cib: [4857]: info: G_main_add_TriggerHandler:
> Added signal manual handler
> Nov 17 05:37:10 1mctdb01 cib: [4857]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> Nov 17 05:37:10 1mctdb01 cib: [4857]: info: main: Retrieval of a
> per-action CIB: disabled
> Nov 17 05:37:10 1mctdb01 ipfail: [4862]: ERROR: No managed resources
> Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> Nov 17 05:37:10 1mctdb01 ccm: [4856]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler:
> Added signal handler for signal 10
> Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: G_main_add_SignalHandler:
> Added signal handler for signal 12
> Nov 17 05:37:10 1mctdb01 cib: [4857]: ERROR: crm_is_writable:
> /var/lib/heartbeat/crm/cib.xml must be owned and r/w by user hacluster
> Nov 17 05:37:10 1mctdb01 cib: [4857]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> Nov 17 05:37:10 1mctdb01 attrd: [4860]: info: register_with_ha: UUID:
> c73aec4b-6947-40a2-b1e2-853d10c4d22d
> Nov 17 05:37:10 1mctdb01 stonithd: [4859]: info:
> G_main_add_SignalHandler: Added signal handler for signal 10
> Nov 17 05:37:10 1mctdb01 stonithd: [4859]: info:
> G_main_add_SignalHandler: Added signal handler for signal 12
> Nov 17 05:37:10 1mctdb01 lrmd: [4858]: info: Started.
> Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: main: CRM Hg Version:
> node: 552305612591183b1628baa5bc6e903e0f1e26a3
> Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: crmd_init: Starting crmd
> Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Nov 17 05:37:10 1mctdb01 crmd: [4861]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Nov 17 05:37:10 1mctdb01 crmd: [4861]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> 
> 3./var/ha-log
> heartbeat[4806]: 2008/11/17_05:37:07 WARN: Logging daemon is disabled
> --enabling logging daemon is recommended
> heartbeat[4806]: 2008/11/17_05:37:07 info: **************************
> heartbeat[4806]: 2008/11/17_05:37:07 info: Configuration validated.
> Starting heartbeat 2.1.3
> heartbeat[4807]: 2008/11/17_05:37:07 info: heartbeat: version 2.1.3
> heartbeat[4807]: 2008/11/17_05:37:07 info: Heartbeat generation: 1211968635
> heartbeat[4807]: 2008/11/17_05:37:08 info: glib: Starting serial
> heartbeat on tty /dev/ttyS0 (19200 baud)
> heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: write socket
> priority set to IPTOS_LOWDELAY on eth1
> heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: bound send
> socket to device: eth1
> heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: bound receive
> socket to device: eth1
> heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ucast: started on
> port 694 interface eth1 to 192.168.201.2
> heartbeat[4807]: 2008/11/17_05:37:08 info: glib: ping heartbeat started.
> heartbeat[4807]: 2008/11/17_05:37:08 info: G_main_add_TriggerHandler:
> Added signal manual handler
> heartbeat[4807]: 2008/11/17_05:37:08 info: G_main_add_TriggerHandler:
> Added signal manual handler
> heartbeat[4807]: 2008/11/17_05:37:08 notice: Using watchdog device:
> /dev/watchdog
> heartbeat[4807]: 2008/11/17_05:37:08 info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> heartbeat[4807]: 2008/11/17_05:37:08 info: Local status now set to: 'up'
> heartbeat[4807]: 2008/11/17_05:37:09 info: Link
> 192.168.70.254:192.168.70.254 up.
> heartbeat[4807]: 2008/11/17_05:37:09 info: Status update for node
> 192.168.70.254: status ping
> heartbeat[4807]: 2008/11/17_05:37:09 info: Link
> 1mctdb02.xxx-asp.jp:/dev/ttyS0 up.
> heartbeat[4807]: 2008/11/17_05:37:09 info: Status update for node
> 1mctdb02.xxx-asp.jp: status active
> heartbeat[4807]: 2008/11/17_05:37:09 info: Link 1mctdb02.xxx-asp.jp:eth1 up.
> heartbeat[4807]: 2008/11/17_05:37:09 info: Comm_now_up(): updating
> status to active
> heartbeat[4807]: 2008/11/17_05:37:09 info: Local status now set to: 'active'
> heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
> "/usr/lib/heartbeat/ccm" (400,400)
> heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
> "/usr/lib/heartbeat/cib" (400,400)
> heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
> "/usr/lib/heartbeat/lrmd -r" (0,0)
> heartbeat[4857]: 2008/11/17_05:37:09 info: Starting
> "/usr/lib/heartbeat/cib" as uid 400  gid 400 (pid 4857)
> heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
> "/usr/lib/heartbeat/stonithd" (0,0)
> heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
> "/usr/lib/heartbeat/attrd" (400,400)
> heartbeat[4859]: 2008/11/17_05:37:09 info: Starting
> "/usr/lib/heartbeat/stonithd" as uid 0  gid 0 (pid 4859)
> heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
> "/usr/lib/heartbeat/crmd" (400,400)
> heartbeat[4856]: 2008/11/17_05:37:09 info: Starting
> "/usr/lib/heartbeat/ccm" as uid 400  gid 400 (pid 4856)
> heartbeat[4807]: 2008/11/17_05:37:09 info: Starting child client
> "/usr/lib/heartbeat/ipfail" (400,400)
> heartbeat[4858]: 2008/11/17_05:37:09 info: Starting
> "/usr/lib/heartbeat/lrmd -r" as uid 0  gid 0 (pid 4858)
> heartbeat[4860]: 2008/11/17_05:37:09 info: Starting
> "/usr/lib/heartbeat/attrd" as uid 400  gid 400 (pid 4860)
> heartbeat[4807]: 2008/11/17_05:37:09 WARN: G_CH_dispatch_int: Dispatch
> function for read child took too long to execute: 80 ms (> 50 ms)
> (GSource: 0x9148158)
> heartbeat[4862]: 2008/11/17_05:37:09 info: Starting
> "/usr/lib/heartbeat/ipfail" as uid 400  gid 400 (pid 4862)
> heartbeat[4861]: 2008/11/17_05:37:10 info: Starting
> "/usr/lib/heartbeat/crmd" as uid 400  gid 400 (pid 4861)
> lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 15
> attrd[4860]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 15
> ccm[4856]: 2008/11/17_05:37:10 info: Hostname: 1mctdb01.xxx-asp.jp
> attrd[4860]: 2008/11/17_05:37:10 info: register_with_ha: Hostname:
> 1mctdb01.xxx-asp.jp
> cib[4857]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 15
> cib[4857]: 2008/11/17_05:37:10 info: G_main_add_TriggerHandler: Added
> signal manual handler
> cib[4857]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> heartbeat[4807]: 2008/11/17_05:37:10 ERROR: api_process_request: bad
> request [getrsc]
> cib[4857]: 2008/11/17_05:37:10 info: main: Retrieval of a per-action
> CIB: disabled
> heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG: Dumping message with 5 fields
> heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[0] : [t=hbapi-req]
> heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[1] : [reqtype=getrsc]
> heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[2] : [dest=1mctdb01.xxx-asp.jp]
> heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[3] : [pid=4862]
> heartbeat[4807]: 2008/11/17_05:37:10 ERROR: MSG[4] : [from_id=ipfail]
> ipfail[4862]: 2008/11/17_05:37:10 ERROR: No managed resources
> heartbeat[4807]: 2008/11/17_05:37:10 WARN: Managed
> /usr/lib/heartbeat/ipfail process 4862 exited with return code 100.
> lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 10
> ccm[4856]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 15
> lrmd[4858]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 12
> cib[4857]: 2008/11/17_05:37:10 ERROR: crm_is_writable:
> /var/lib/heartbeat/crm/cib.xml must be owned and r/w by user hacluster
> cib[4857]: 2008/11/17_05:37:10 info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)
> stonithd[4859]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler:
> Added signal handler for signal 10
> stonithd[4859]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler:
> Added signal handler for signal 12
> attrd[4860]: 2008/11/17_05:37:10 info: register_with_ha: UUID:
> c73aec4b-6947-40a2-b1e2-853d10c4d22d
> lrmd[4858]: 2008/11/17_05:37:10 info: Started.
> crmd[4861]: 2008/11/17_05:37:10 info: main: CRM Hg Version: node:
> 552305612591183b1628baa5bc6e903e0f1e26a3
> 
> crmd[4861]: 2008/11/17_05:37:10 info: crmd_init: Starting crmd
> crmd[4861]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 15
> crmd[4861]: 2008/11/17_05:37:10 info: G_main_add_TriggerHandler: Added
> signal manual handler
> crmd[4861]: 2008/11/17_05:37:10 info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> cib[4857]: 2008/11/17_05:37:10 WARN: validate_cib_digest: No on-disk
> digest present
> 
> 
> あとこういったエラーログも出ていました
> 
> heartbeat[4807]: 2008/11/17_05:37:59 ERROR: ha_msg_addraw_ll: illegal field
> heartbeat[4807]: 2008/11/17_05:37:59 ERROR: ha_msg_addraw():
> ha_msg_addraw_ll failed
> heartbeat[4807]: 2008/11/17_05:37:59 ERROR: NV failure (string2msg_ll):
> heartbeat[4807]: 2008/11/17_05:37:59 ERROR: Input string: [>>>
> t=cib
> cib_clientid=5977
> cib_callopt=1052672
> cib_callid=4
> cib_op=cib_apply_diff
> cib_section=status
> cib_delegated_from=1mctdb01.xxx-asp.jp
> cib_clientname=5977
> (4)cib_calldata=eJyV0EEOwiAQBdC912ia2AQSmE5bbcJZCC2zIFE0QD2/QrqnfuDb6XWhS2CyKsKQWaCB5vJwESWo/aZYlArFILlqOJVfLG3/qUJJS3o8/1FmEhFgj8cXAPxpoiYfzC4234a2
> hg=483d2dbb
> ts=49208487
> ld=0.42 0.13 0.03<<
> t=cib
> cib_clientid=ab2880a6-3d82-472f-8275-2c9846356fcb
> cited_from=1mctdb01.xxx-asp.jp
> cib_clientname=4861
> (4)cib_calldata=eJyVUtlu4yAUfe9vRJFaCSJW28mIb0Es1xk6trEAV8rfF9vpVG1nHvKCdLez3MvhcHhi7PJMXrSezAhaqyl60LmYAkhsleCVa7kBJyxuzqLFghiGLQWGO8k9JU54xjxi3dq9rDC
> Kjq54S+ipeIdNnk+vM+Jb3aURe7DLFccUrmFSPupl9pVOJ8hxSQ5QS/jlmb0MaVSNRvMKlYMKU6/NScnY7zzs1+cm9VxMVXZj3HXPQUS+hv2sMY64KrnPPaFWdIpoQ4qb3+wGkasvlJZsphRdB/4KZk
> e6m+HO+sBS8wZ+6MhaMcd9AYLKnvJXBOO5Coab7Nj+YanKqCfz2Asq/ZmWHQ1TGlaN9LNaD7urwl1R8MRbHTxwGS067+a1VjsW9g++JLXjP7zaYC6c0MNSHkvcWHK+SiHAFCO9vyvkpgzDbOey4lUCZ
> NTyWg4/H49K/nHeAYBD8=
> cib_isreplyto=1mctdb01.xxx-asp.jp
> cib_update=true
> (4)cib_update_diff=eJydU+1uozAQ/N/XiCI1kjl5/YXhxLNYxl5yqAEi41Tq258NtGma6KTLHwt2ZtezO+vdbvcCqn6lB2NGO6Axje+7joCq6ld2yN9FwGF6R9/sEpeJO+4nTmSqIw+ubxcmyBtm
> ChPQOTReBnM5extxbiSlZL/fv5BVg/VDPxo8T+5PQwlAjq1/QP+VXVXll1rr/adWdq91QYnW5RNSgWi2tDhHGy/zmqpvUleElBoyb5w8mhzCR3quKFGaZf4pDI/0pDCRVbkxTMB5ugSHqwB2x70SiOT
> 6Z9YjF7/jRCjxlTM7M50fab+ihC+1et/40HpaU3Oe5mjGKfbdh/FpNSKaZF6VWdMZg439NDYrTvgyPheGwmN7ORZT6I/92PhpG/tVlqKZGYMd5z5XMG/40ciyBlY7rtsWvSg4c1UhHPBCo7KFBN9J5B
> w0SqLUj/zBHnvXJMG//6MK8EWvPZ1M6hiArHNJDZgOk/UhOYqxYb9SxwsSnHHJ57zNYp2A2baHbsX6MWJ4t6cUEHKj+P6Ic2wcRQq6LXmXJDDWKuc9lxKBSduBRMI3V/L9eb1TU+ENQzJo2fM6Jn/W5
> 8VvTVqjyyHYBrmSW3SiLVQlykJQy4oWkBVacg/UCc+Yfzrlejz50uH71dt8RMudsJ1T1noG0tOSMc4sp9RqqNSa8hfq6W3F
> oseq=62
> _compression_algorithm=zlib
> from_id=cib
> to_id=cib
> client_gen=3
> src=1mctdb02.xxx-asp.jp
> (1)srcuuid=PhGaPNvdTVSt7wPintDUbg==
> seq=4234a3
> hg=483d2dbb
> ts=49208488
> ld=0.42 0.13 0.03 2/89 24799
> ttl=3
> auth=1 2a06b826
> <<<
> ]
> 
> _______________________________________________
> Linux-ha-japan mailing list
> Linux****@lists*****
> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
> 


-- 

NTTオープンソースソフトウェアセンタ
岡田 敏 <OKADA Satoshi>





Linux-ha-japan メーリングリストの案内
Back to archive index