Akiyama Jiro
mayam****@gmail*****
Tue Feb 17 18:44:48 JST 2015
Yamauchi-san,

This is Akiyama. Thank you for your answer.

> As I recall, the recommended value for corosync+PM is the default of 1000 ms.
> If you do change it, the appropriate value depends on your configuration (load conditions and so on), so how about starting by raising it to roughly double the default?
>
> The drawback of lengthening the token is that failure detection (noticing that the peer is down) takes longer.

With the above in mind, I would like to look into changing the token value as well.
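For illustration only: the setting discussed here lives in the totem section of /etc/corosync/corosync.conf on both nodes. The sketch below uses 2000 ms purely as the "roughly double the default" starting point suggested above, not as a verified recommendation:

totem {
        version: 2
        # Token timeout in milliseconds (corosync 1.x default: 1000).
        # A larger value tolerates longer scheduling stalls, such as the
        # 3144.5435 ms delay logged below, at the cost of taking longer
        # to detect a peer that has genuinely gone down.
        token: 2000
        # ... remaining totem settings unchanged ...
}

corosync would then need to be restarted on both nodes for the new timeout to take effect.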
2015-02-17 14:35 GMT+09:00 <renay****@ybb*****>:

> Akiyama-san,
>
> Hello, this is Yamauchi.
>
> >The token value was still at the default (1000 ms).
> >If we change it, is there a recommended value?
> >Or, if there is a method for calculating one, I would appreciate your guidance.
>
> As I recall, the recommended value for corosync+PM is the default of 1000 ms.
> If you do change it, the appropriate value depends on your configuration (load conditions and so on), so how about starting by raising it to roughly double the default?
>
> The drawback of lengthening the token is that failure detection (noticing that the peer is down) takes longer.
>
> Best regards.
>
> ----- Original Message -----
> >From: Akiyama Jiro <mayam****@gmail*****>
> >To: linux****@lists*****
> >Date: 2015/2/17, Tue 14:08
> >Subject: Re: [Linux-ha-jp] Regarding the cause of a DRBD split brain
> >
> >Yamauchi-san,
> >
> >This is Akiyama.
> >
> >Thank you for your reply.
> >
> >> Incidentally, it is also possible that the load on the primary or the secondary was high,
> >> so it may be worth checking the CPU state with sar or vmstat as well.
> >> # The load may have spiked at some point due to a cron job or the like.
> >During the window in question, other application processing was under heavier load than in other time slots,
> >so I will check the system state with that possibility in mind as well.
> >
> >> What happens if you lengthen the token parameter in corosync.conf?
> >The token value was still at the default (1000 ms).
> >If we change it, is there a recommended value?
> >Or, if there is a method for calculating one, I would appreciate your guidance.
> >
> >2015-02-17 12:58 GMT+09:00 <renay****@ybb*****>:
> >
> >>Akiyama-san,
> >>
> >>This is Yamauchi.
> >>
> >>Incidentally, it is also possible that the load on the primary or the secondary was high,
> >>so it may be worth checking the CPU state with sar or vmstat as well.
> >>
> >># The load may have spiked at some point due to a cron job or the like.
> >>
> >>That is all.
> >>
> >>----- Original Message -----
> >>> From: "renay****@ybb*****" <renay****@ybb*****>
> >>> To: "linux****@lists*****" <linux****@lists*****>
> >>> Cc:
> >>> Date: 2015/2/17, Tue 12:31
> >>> Subject: Re: [Linux-ha-jp] Regarding the cause of a DRBD split brain
> >>>
> >>> Akiyama-san,
> >>>
> >>> Hello, this is Yamauchi.
> >>>
> >>> As far as the primary's corosync log shows, corosync-layer communication with the secondary was lost and the membership was re-formed once.
> >>>
> >>> What happens if you lengthen the token parameter in corosync.conf?
> >>>
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [MAIN ] main.c:1341 Corosync main process was not scheduled for 3144.5435 ms (threshold is 800.0000 ms). Consider token timeout increase.
> >>>
> >>> Since the log above is being emitted, it is reporting that corosync's normal scheduling was delayed by a little over two seconds.
> >>>
> >>> Best regards.
> >>>
> >>> ----- Original Message -----
> >>>> From: Akiyama Jiro <mayam****@gmail*****>
> >>>> To: linux****@lists*****
> >>>> Date: 2015/2/17, Tue 11:38
> >>>> Subject: [Linux-ha-jp] Regarding the cause of a DRBD split brain
> >>>>
> >>>> Hello, my name is Akiyama.
> >>>>
> >>>> We are operating the following environment.
> >>>>
> >>>> Environment:
> >>>> Red Hat Enterprise Linux Server 6.1 (for x86)
> >>>> kernel-2.6.32-431.23.3.el6.i686
> >>>> corosync-1.4.1-17.el6_5.1.i686
> >>>> pacemaker-1.1.10-14.el6_5.3.i686
> >>>> drbd-8.3.16-1.el6.i386
> >>>>
> >>>> Question:
> >>>> During operation, a DRBD split brain suddenly occurred. We recovered from it manually,
> >>>> but the cause is unknown and we are at a loss.
> >>>> If the cause and a way to avoid it can be determined, could you please advise us?
> >>>>
> >>>> What we have confirmed so far:
> >>>> - ping over the DRBD network responds without problems
> >>>> - no link-down output in the messages log
> >>>>
> >>>> [Primary messages log]
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: sock_recvmsg returned -110
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: short read expecting header on sock: r=-110
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: new current UUID 626E92ABB7AC7487:1CC1ED4094EA9EB7:9A0088783FE79A75:99FF88783FE79A75
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: asender terminated
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: Terminating drbd0_asender
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: Connection closed
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver terminated
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: Restarting drbd0_receiver
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver (re)started
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( Unconnected -> WFConnection )
> >>>> Feb 16 18:00:06 server1 xinetd[1940]: START: shell pid=1056 from=::ffff:192.171.1.102
> >>>> Feb 16 18:00:06 server1 rshd[1057]: root @ mdn-cluster2 as root: cmd='echo $(date +%Y/%m/%d\ %H:%M:%S)'
> >>>> Feb 16 18:00:06 server1 xinetd[1940]: EXIT: shell status=0 pid=1056 duration=0(sec)
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: Handshake successful: Agreed network protocol version 97
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFConnection -> WFReportParams )
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: Starting asender thread (from drbd0_receiver [3117])
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: data-integrity-alg: <not-used>
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: meta connection shut down by peer.
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: asender terminated
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: Terminating drbd0_asender
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: Connection closed
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver terminated
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: Restarting drbd0_receiver
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver (re)started
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( Unconnected -> WFConnection )
> >>>>
> >>>> [Primary corosync log]
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [MAIN ] main.c:1341 Corosync main process was not scheduled for 3144.5435 ms (threshold is 800.0000 ms). Consider token timeout increase.
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:760 notice: pcmk_peer_update: Transitional membership event on ring 16: memb=1, new=0, lost=1
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:773 info: pcmk_peer_update: memb: server1 1694608064
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:779 info: pcmk_peer_update: lost: server2 1711385280
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:760 notice: pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1, lost=0
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:791 info: pcmk_peer_update: NEW: server2 1711385280
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:808 info: pcmk_peer_update: MEMB: server1 1694608064
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [pcmk ] plugin.c:808 info: pcmk_peer_update: MEMB: server2 1711385280
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [TOTEM ] totemsrp.c:1892 A processor joined or left the membership and a new membership was formed.
> >>>> Feb 16 18:00:27 server1 corosync[1756]: [CPG ] cpg.c:783 chosen downlist: sender r(0) ip(192.170.1.101) ; members(old:2 left:1)
> >>>> Feb 16 18:00:28 server1 corosync[1756]: [MAIN ] main.c:1341 Corosync main process was not scheduled for 826.4143 ms (threshold is 800.0000 ms). Consider token timeout increase.
> >>>> Feb 16 18:00:28 server1 corosync[1756]: [MAIN ] main.c:278 Completed service synchronization, ready to provide service.
> >>>> Feb 16 18:00:28 server1 crmd[1766]: warning: crmd_ha_msg_filter: Another DC detected: server2 (op=noop)
> >>>> Feb 16 18:00:28 server1 crmd[1766]: notice: do_state_transition: State transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
> >>>> Feb 16 18:00:30 server1 cib[1761]: warning: cib_server_process_diff: Not requesting full refresh in R/W mode
> >>>> Feb 16 18:00:30 server1 cib[1761]: warning: cib_server_process_diff: Not requesting full refresh in R/W mode
> >>>> Feb 16 18:00:30 server1 cib[1761]: warning: cib_server_process_diff: Not requesting full refresh in R/W mode
> >>>> Feb 16 18:00:30 server1 corosync[1756]: [MAIN ] main.c:1341 Corosync main process was not scheduled for 2248.0801 ms (threshold is 800.0000 ms). Consider token timeout increase.
> >>>> Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:760 > notice: > >>> pcmk_peer_update: Transitional membership event on ring 24: memb=1, > new=0, > >>> lost=1 > >>>> Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:773 info: > >>> pcmk_peer_update: memb: server1 1694608064 > >>>> Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:779 info: > >>> pcmk_peer_update: lost: server2 1711385280 > >>>> Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:760 > notice: > >>> pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, > lost=0 > >>>> Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:791 info: > >>> pcmk_peer_update: NEW: server2 1711385280 > >>>> Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:808 info: > >>> pcmk_peer_update: MEMB: server1 1694608064 > >>>> Feb 16 18:00:31 server1 corosync[1756]: [pcmk ] plugin.c:808 info: > >>> pcmk_peer_update: MEMB: server2 1711385280 > >>>> Feb 16 18:00:31 server1 corosync[1756]: [TOTEM ] totemsrp.c:1892 A > >>> processor joined or left the membership and a new membership was > formed. > >>>> Feb 16 18:00:32 server1 corosync[1756]: [CPG ] cpg.c:783 chosen > >>> downlist: sender r(0) ip(192.170.1.101) ; members(old:2 left:1) > >>>> Feb 16 18:00:32 server1 corosync[1756]: [MAIN ] main.c:278 > Completed > >>> service synchronization, ready to provide service. > >>>> Feb 16 18:00:32 server1 cib[1761]: warning: cib_server_process_diff: > Not > >>> requesting full refresh in R/W mode > >>>> Feb 16 18:00:32 server1 cib[1761]: warning: cib_server_process_diff: > Not > >>> requesting full refresh in R/W mode > >>>> Feb 16 18:00:32 server1 cib[1761]: warning: cib_server_process_diff: > Not > >>> requesting full refresh in R/W mode > >>>> Feb 16 18:02:28 server1 crmd[1766]: error: crm_timer_popped: > Election > >>> Timeout (I_ELECTION_DC) just popped in state S_ELECTION! 
(120000ms) > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: do_state_transition: > State > >>> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC > >>> cause=C_TIMER_POPPED origin=crm_timer_popped ] > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: do_state_transition: > State > >>> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC > >>> cause=C_FSA_INTERNAL origin=do_election_check ] > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: do_dc_join_finalize: > join-4: > >>> Syncing the CIB from server2 to the rest of the cluster > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: do_dc_join_finalize: > Requested > >>> version <generation_tuple epoch="7" num_updates="87" > >>> admin_epoch="0" validate-with="pacemaker-1.2" > >>> cib-last-written="Wed Jan 21 00:17:56 2015" > >>> update-origin="server1" update-client="crmd" > >>> crm_feature_set="3.0.7" have-quorum="1" > >>> dc-uuid="server2"/> > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: Diff: --- > 0.7.69 > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: Diff: +++ > 0.7.87 > >>> 1afd9af32d3a574358543e1d4751aaf1 > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: -- > <node_state > >>> crm-debug-origin="do_state_transition" join="member" > >>> id="server1"/> > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: -- > >>> <lrm_rsc_op operation_key="res_drbd_r0_start_0" > >>> operation="start" > >>> transition-key="46:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df" > >>> transition-magic="0:0;46:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df" > >>> call-id="44" last-run="1421767134" > >>> last-rc-change="1421767134" exec-time="430" > >>> id="res_drbd_r0_last_0"/> > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: -- > >>> <lrm_rsc_op id="res_drbd_r0_monitor_30000" > >>> operation_key="res_drbd_r0_monitor_30000" > >>> operation="monitor" crm-debug-origin="do_update_resource" > >>> crm_feature_set="3.0.7" > >>> transition-key="47:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df" > >>> transition-magic="0:0;47:5:0:6fbace33-40d1-44cc-b09e-e66ef7ce03df" > >>> call-id="50" rc-code="0" op-status="0" > >>> interval="30000" last-rc-change="1421767135" > >>> exec-time="60" queue-time="0" > >>> op-digest="6f5adcd7f1211cdfc17850827b8582c5"/> > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: -- > >>> <nvpair value="1000" > >>> id="status-server2-master-res_drbd_r0"/> > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: ++ > <node_state > >>> id="server1" uname="server1" in_ccm="true" > >>> crmd="online" crm-debug-origin="post_cache_update" > >>> join="down" expected="member"/> > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: ++ > >>> <nvpair id="status-server2-master-res_drbd_r0" > >>> name="master-res_drbd_r0" value="10000"/> > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: ++ > >>> <lrm_rsc_op id="res_drbd_r0_last_0" > >>> operation_key="res_drbd_r0_promote_0" operation="promote" > >>> crm-debug-origin="do_update_resource" > >>> crm_feature_set="3.0.7" > >>> transition-key="14:1:0:2172d7d8-fe4a-427d-9311-27a13078a0fb" > >>> transition-magic="0:0;14:1:0:2172d7d8-fe4a-427d-9311-27a13078a0fb" > >>> call-id="62" rc-code="0" op-status="0" > >>> interval="0" last-run="1424077226" > >>> last-rc-change="1424077226" exec-time="120" > >>> queue-time="0" > >>> op-digest="c0e018b73fdf522b6cdd355e125af15e"/> > >>>> Feb 16 18:02:28 server1 cib[1761]: notice: cib:diff: ++ > >>> <lrm_rsc_op id="res_drbd_r0_monitor_10000" > >>> operation_key="res_drbd_r0_monitor_10000" > >>> 
operation="monitor" crm-debug-origin="do_update_resource" > >>> crm_feature_set="3.0.7" > >>> transition-key="15:1:8:2172d7d8-fe4a-427d-9311-27a13078a0fb" > >>> transition-magic="0:8;15:1:8:2172d7d8-fe4a-427d-9311-27a13078a0fb" > >>> call-id="68" rc-code="8" op-status="0" > >>> interval="10000" last-rc-change="1424077226" > >>> exec-time="53" queue-time="0" > >>> op-digest="6f5adcd7f1211cdfc17850827b8582c5"/> > >>>> Feb 16 18:02:28 server1 stonith-ng[1762]: warning: cib_process_diff: > Diff > >>> 0.7.69 -> 0.7.87 from local not applied to 0.7.69: Failed application > of an > >>> update diff > >>>> Feb 16 18:02:28 server1 stonith-ng[1762]: notice: > update_cib_cache_cb: > >>> [cib_diff_notify] Patch aborted: Application of an update diff failed > (-206) > >>>> Feb 16 18:02:28 server1 attrd[1764]: notice: attrd_local_callback: > Sending > >>> full refresh (origin=crmd) > >>>> Feb 16 18:02:28 server1 attrd[1764]: notice: attrd_trigger_update: > Sending > >>> flush op to all hosts for: master-res_drbd_r0 (10000) > >>>> Feb 16 18:02:28 server1 attrd[1764]: notice: attrd_trigger_update: > Sending > >>> flush op to all hosts for: probe_complete (true) > >>>> Feb 16 18:02:28 server1 pengine[1765]: notice: unpack_config: On > loss of > >>> CCM Quorum: Ignore > >>>> Feb 16 18:02:28 server1 pengine[1765]: notice: LogActions: Demote > >>> res_drbd_r0:1#011(Master -> Slave server2) > >>>> Feb 16 18:02:28 server1 pengine[1765]: notice: process_pe_message: > >>> Calculated Transition 2582: /var/lib/pacemaker/pengine/pe-input-24.bz2 > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: > Initiating > >>> action 8: cancel res_drbd_r0_cancel_10000 on server2 > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: > Initiating > >>> action 81: notify res_drbd_r0_pre_notify_demote_0 on server1 (local) > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: > Initiating > >>> action 83: notify res_drbd_r0_pre_notify_demote_0 on server2 > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: process_lrm_event: LRM > >>> operation res_drbd_r0_notify_0 (call=113, rc=0, cib-update=0, > confirmed=true) ok > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: run_graph: Transition > 2582 > >>> (Complete=4, Pending=0, Fired=0, Skipped=4, Incomplete=5, > >>> Source=/var/lib/pacemaker/pengine/pe-input-24.bz2): Stopped > >>>> Feb 16 18:02:28 server1 pengine[1765]: notice: unpack_config: On > loss of > >>> CCM Quorum: Ignore > >>>> Feb 16 18:02:28 server1 pengine[1765]: notice: LogActions: Demote > >>> res_drbd_r0:1#011(Master -> Slave server2) > >>>> Feb 16 18:02:28 server1 pengine[1765]: notice: process_pe_message: > >>> Calculated Transition 2583: /var/lib/pacemaker/pengine/pe-input-25.bz2 > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: > Initiating > >>> action 80: notify res_drbd_r0_pre_notify_demote_0 on server1 (local) > >>>> Feb 16 18:02:28 server1 crmd[1766]: notice: te_rsc_command: > Initiating > >>> action 82: notify res_drbd_r0_pre_notify_demote_0 on server2 > >>>> Feb 16 18:02:29 server1 crmd[1766]: notice: process_lrm_event: LRM > >>> operation res_drbd_r0_notify_0 (call=116, rc=0, cib-update=0, > confirmed=true) ok > >>>> Feb 16 18:02:29 server1 crmd[1766]: notice: te_rsc_command: > Initiating > >>> action 39: demote res_drbd_r0_demote_0 on server2 > >>>> Feb 16 18:02:29 server1 crmd[1766]: notice: te_rsc_command: > Initiating > >>> action 81: notify res_drbd_r0_post_notify_demote_0 on server1 (local) > >>>> Feb 16 18:02:29 server1 crmd[1766]: notice: 
te_rsc_command: Initiating action 83: notify res_drbd_r0_post_notify_demote_0 on server2
> >>>> Feb 16 18:02:29 server1 crmd[1766]: notice: process_lrm_event: LRM operation res_drbd_r0_notify_0 (call=119, rc=0, cib-update=0, confirmed=true) ok
> >>>> Feb 16 18:02:29 server1 crmd[1766]: notice: run_graph: Transition 2583 (Complete=11, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-25.bz2): Stopped
> >>>> Feb 16 18:02:29 server1 pengine[1765]: notice: unpack_config: On loss of CCM Quorum: Ignore
> >>>> Feb 16 18:02:29 server1 pengine[1765]: notice: process_pe_message: Calculated Transition 2584: /var/lib/pacemaker/pengine/pe-input-26.bz2
> >>>> Feb 16 18:02:29 server1 crmd[1766]: notice: te_rsc_command: Initiating action 40: monitor res_drbd_r0_monitor_30000 on server2
> >>>> Feb 16 18:02:29 server1 crmd[1766]: notice: run_graph: Transition 2584 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-26.bz2): Complete
> >>>> Feb 16 18:02:29 server1 crmd[1766]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>> Feb 16 18:17:29 server1 crmd[1766]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> >>>> Feb 16 18:17:29 server1 pengine[1765]: notice: unpack_config: On loss of CCM Quorum: Ignore
> >>>> Feb 16 18:17:29 server1 pengine[1765]: notice: process_pe_message: Calculated Transition 2585: /var/lib/pacemaker/pengine/pe-input-27.bz2
> >>>> Feb 16 18:17:29 server1 crmd[1766]: notice: run_graph: Transition 2585 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-27.bz2): Complete
> >>>> Feb 16 18:17:29 server1 crmd[1766]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>>
> >>>> [Secondary messages log]
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: sock_recvmsg returned -110
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: short read expecting header on sock: r=-110
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: new current UUID 626E92ABB7AC7487:1CC1ED4094EA9EB7:9A0088783FE79A75:99FF88783FE79A75
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: asender terminated
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: Terminating drbd0_asender
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: Connection closed
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver terminated
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: Restarting drbd0_receiver
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: receiver (re)started
> >>>> Feb 16 18:00:05 server1 kernel: block drbd0: conn( Unconnected -> WFConnection )
> >>>> Feb 16 18:00:06 server1 xinetd[1940]: START: shell pid=1056 from=::ffff:192.171.1.102
> >>>> Feb 16 18:00:06 server1 rshd[1057]: root @ mdn-cluster2 as root: cmd='echo $(date +%Y/%m/%d\ %H:%M:%S)'
> >>>> Feb 16 18:00:06 server1 xinetd[1940]: EXIT: shell status=0 pid=1056 duration=0(sec)
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: Handshake successful: Agreed network protocol version 97
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFConnection -> WFReportParams )
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: Starting asender thread (from drbd0_receiver [3117])
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: data-integrity-alg: <not-used>
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: meta connection shut down by peer.
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: asender terminated
> >>>> Feb 16 18:00:28 server1 kernel: block drbd0: Terminating drbd0_asender
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: Connection closed
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver terminated
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: Restarting drbd0_receiver
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: receiver (re)started
> >>>> Feb 16 18:00:30 server1 kernel: block drbd0: conn( Unconnected -> WFConnection )
> >>>>
> >>>> [Secondary corosync log]
> >>>> Feb 16 18:00:23 server2 attrd[1557]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-res_drbd_r0 (1000)
> >>>> Feb 16 18:00:23 server2 attrd[1557]: notice: attrd_perform_update: Sent update 45: master-res_drbd_r0=1000
> >>>> Feb 16 18:00:24 server2 corosync[1549]: [TOTEM ] totemsrp.c:1595 A processor failed, forming new configuration.
> >>>> Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:760 notice: pcmk_peer_update: Transitional membership event on ring 12: memb=1, new=0, lost=1
> >>>> Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:773 info: pcmk_peer_update: memb: server2 1711385280
> >>>> Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:779 info: pcmk_peer_update: lost: server1 1694608064
> >>>> Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:760 notice: pcmk_peer_update: Stable membership event on ring 12: memb=1, new=0, lost=0
> >>>> Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:808 info: pcmk_peer_update: MEMB: server2 1711385280
> >>>> Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:731 info: ais_mark_unseen_peer_dead: Node server1 was not seen in the previous transition
> >>>> Feb 16 18:00:25 server2 cib[1554]: notice: plugin_handle_membership: Membership 12: quorum lost
> >>>> Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] utils.c:358 info: update_member: Node 1694608064/server1 is now: lost
> >>>> Feb 16 18:00:25 server2 crmd[1559]: notice: plugin_handle_membership: Membership 12: quorum lost
> >>>> Feb 16 18:00:25 server2 corosync[1549]: [pcmk ] plugin.c:1368 info: send_member_notification: Sending membership update 12 to 2 children
> >>>> Feb 16 18:00:25 server2 cib[1554]: notice: crm_update_peer_state: plugin_handle_membership: Node server1[1694608064] - state is now lost (was member)
> >>>> Feb 16 18:00:25 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A processor joined or left the membership and a new membership was formed.
> >>>> Feb 16 18:00:25 server2 crmd[1559]: notice: crm_update_peer_state: > >>> plugin_handle_membership: Node server1[1694608064] - state is now lost > (was > >>> member) > >>>> Feb 16 18:00:25 server2 crmd[1559]: warning: reap_dead_nodes: Our DC > node > >>> (server1) left the cluster > >>>> Feb 16 18:00:25 server2 crmd[1559]: notice: do_state_transition: > State > >>> transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION > cause=C_FSA_INTERNAL > >>> origin=reap_dead_nodes ] > >>>> Feb 16 18:00:25 server2 corosync[1549]: [CPG ] cpg.c:783 chosen > >>> downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1) > >>>> Feb 16 18:00:25 server2 corosync[1549]: [MAIN ] main.c:278 > Completed > >>> service synchronization, ready to provide service. > >>>> Feb 16 18:00:25 server2 crmd[1559]: notice: do_state_transition: > State > >>> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC > >>> cause=C_FSA_INTERNAL origin=do_election_check ] > >>>> Feb 16 18:00:25 server2 attrd[1557]: notice: attrd_local_callback: > Sending > >>> full refresh (origin=crmd) > >>>> Feb 16 18:00:25 server2 attrd[1557]: notice: attrd_trigger_update: > Sending > >>> flush op to all hosts for: master-res_drbd_r0 (1000) > >>>> Feb 16 18:00:25 server2 attrd[1557]: notice: attrd_perform_update: > Sent > >>> update 47: master-res_drbd_r0=1000 > >>>> Feb 16 18:00:25 server2 attrd[1557]: notice: attrd_trigger_update: > Sending > >>> flush op to all hosts for: probe_complete (true) > >>>> Feb 16 18:00:26 server2 pengine[1558]: notice: unpack_config: On > loss of > >>> CCM Quorum: Ignore > >>>> Feb 16 18:00:26 server2 pengine[1558]: notice: LogActions: Promote > >>> res_drbd_r0:0#011(Slave -> Master server2) > >>>> Feb 16 18:00:26 server2 pengine[1558]: notice: process_pe_message: > >>> Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-0.bz2 > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: > Initiating > >>> action 1: cancel res_drbd_r0_cancel_30000 on server2 (local) > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: > Initiating > >>> action 56: notify res_drbd_r0_pre_notify_promote_0 on server2 (local) > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM > >>> operation res_drbd_r0_notify_0 (call=56, rc=0, cib-update=0, > confirmed=true) ok > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: run_graph: Transition 0 > >>> (Complete=3, Pending=0, Fired=0, Skipped=5, Incomplete=4, > >>> Source=/var/lib/pacemaker/pengine/pe-input-0.bz2): Stopped > >>>> Feb 16 18:00:26 server2 pengine[1558]: notice: unpack_config: On > loss of > >>> CCM Quorum: Ignore > >>>> Feb 16 18:00:26 server2 pengine[1558]: notice: LogActions: Promote > >>> res_drbd_r0:0#011(Slave -> Master server2) > >>>> Feb 16 18:00:26 server2 pengine[1558]: notice: process_pe_message: > >>> Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-1.bz2 > >>>> Feb 16 18:00:26 server2 crmd[1559]: warning: destroy_action: > Cancelling > >>> timer for action 1 (src=45) > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: > Initiating > >>> action 55: notify res_drbd_r0_pre_notify_promote_0 on server2 (local) > >>>> Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:760 > notice: > >>> pcmk_peer_update: Transitional membership event on ring 16: memb=1, > new=0, > >>> lost=0 > >>>> Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:773 info: > >>> pcmk_peer_update: memb: server2 1711385280 > >>>> Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:760 > notice: > >>> 
pcmk_peer_update: Stable membership event on ring 16: memb=2, new=1, > lost=0 > >>>> Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] utils.c:358 info: > >>> update_member: Node 1694608064/server1 is now: member > >>>> Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:791 info: > >>> pcmk_peer_update: NEW: server1 1694608064 > >>>> Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:808 info: > >>> pcmk_peer_update: MEMB: server1 1694608064 > >>>> Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:808 info: > >>> pcmk_peer_update: MEMB: server2 1711385280 > >>>> Feb 16 18:00:26 server2 corosync[1549]: [pcmk ] plugin.c:1368 info: > >>> send_member_notification: Sending membership update 16 to 2 children > >>>> Feb 16 18:00:26 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A > >>> processor joined or left the membership and a new membership was > formed. > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: > plugin_handle_membership: > >>> Membership 16: quorum acquired > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: crm_update_peer_state: > >>> plugin_handle_membership: Node server1[1694608064] - state is now > member (was > >>> lost) > >>>> Feb 16 18:00:26 server2 cib[1554]: notice: plugin_handle_membership: > >>> Membership 16: quorum acquired > >>>> Feb 16 18:00:26 server2 cib[1554]: notice: crm_update_peer_state: > >>> plugin_handle_membership: Node server1[1694608064] - state is now > member (was > >>> lost) > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM > >>> operation res_drbd_r0_notify_0 (call=59, rc=0, cib-update=0, > confirmed=true) ok > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: > Initiating > >>> action 14: promote res_drbd_r0_promote_0 on server2 (local) > >>>> Feb 16 18:00:26 server2 corosync[1549]: [CPG ] cpg.c:783 chosen > >>> downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0) > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM > >>> operation res_drbd_r0_promote_0 (call=62, rc=0, cib-update=49, > confirmed=true) > >>> ok > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: > Initiating > >>> action 56: notify res_drbd_r0_post_notify_promote_0 on server2 (local) > >>>> Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_trigger_update: > Sending > >>> flush op to all hosts for: master-res_drbd_r0 (10000) > >>>> Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_perform_update: > Sent > >>> update 51: master-res_drbd_r0=10000 > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM > >>> operation res_drbd_r0_notify_0 (call=65, rc=0, cib-update=0, > confirmed=true) ok > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: te_rsc_command: > Initiating > >>> action 15: monitor res_drbd_r0_monitor_10000 on server2 (local) > >>>> Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_trigger_update: > Sending > >>> flush op to all hosts for: master-res_drbd_r0 (10000) > >>>> Feb 16 18:00:26 server2 attrd[1557]: notice: attrd_perform_update: > Sent > >>> update 53: master-res_drbd_r0=10000 > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: LRM > >>> operation res_drbd_r0_monitor_10000 (call=68, rc=8, cib-update=50, > >>> confirmed=false) master > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: process_lrm_event: > >>> server2-res_drbd_r0_monitor_10000:68 [ \n ] > >>>> Feb 16 18:00:26 server2 crmd[1559]: notice: run_graph: Transition 1 > >>> (Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0, > >>> 
Source=/var/lib/pacemaker/pengine/pe-input-1.bz2): Complete > >>>> Feb 16 18:00:27 server2 corosync[1549]: [MAIN ] main.c:278 > Completed > >>> service synchronization, ready to provide service. > >>>> Feb 16 18:00:28 server2 corosync[1549]: [TOTEM ] totemsrp.c:1595 A > >>> processor failed, forming new configuration. > >>>> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:760 > notice: > >>> pcmk_peer_update: Transitional membership event on ring 20: memb=1, > new=0, > >>> lost=1 > >>>> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:773 info: > >>> pcmk_peer_update: memb: server2 1711385280 > >>>> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:779 info: > >>> pcmk_peer_update: lost: server1 1694608064 > >>>> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:760 > notice: > >>> pcmk_peer_update: Stable membership event on ring 20: memb=1, new=0, > lost=0 > >>>> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:808 info: > >>> pcmk_peer_update: MEMB: server2 1711385280 > >>>> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:731 info: > >>> ais_mark_unseen_peer_dead: Node server1 was not seen in the previous > transition > >>>> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] utils.c:358 info: > >>> update_member: Node 1694608064/server1 is now: lost > >>>> Feb 16 18:00:29 server2 corosync[1549]: [pcmk ] plugin.c:1368 info: > >>> send_member_notification: Sending membership update 20 to 2 children > >>>> Feb 16 18:00:29 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A > >>> processor joined or left the membership and a new membership was > formed. > >>>> Feb 16 18:00:29 server2 cib[1554]: notice: plugin_handle_membership: > >>> Membership 20: quorum lost > >>>> Feb 16 18:00:29 server2 crmd[1559]: notice: > plugin_handle_membership: > >>> Membership 20: quorum lost > >>>> Feb 16 18:00:29 server2 cib[1554]: notice: crm_update_peer_state: > >>> plugin_handle_membership: Node server1[1694608064] - state is now lost > (was > >>> member) > >>>> Feb 16 18:00:29 server2 crmd[1559]: warning: match_down_event: No > match for > >>> shutdown action on server1 > >>>> Feb 16 18:00:29 server2 crmd[1559]: notice: peer_update_callback: > >>> Stonith/shutdown of server1 not matched > >>>> Feb 16 18:00:29 server2 crmd[1559]: notice: crm_update_peer_state: > >>> plugin_handle_membership: Node server1[1694608064] - state is now lost > (was > >>> member) > >>>> Feb 16 18:00:29 server2 crmd[1559]: warning: match_down_event: No > match for > >>> shutdown action on server1 > >>>> Feb 16 18:00:29 server2 crmd[1559]: notice: peer_update_callback: > >>> Stonith/shutdown of server1 not matched > >>>> Feb 16 18:00:29 server2 corosync[1549]: [CPG ] cpg.c:783 chosen > >>> downlist: sender r(0) ip(192.170.1.102) ; members(old:2 left:1) > >>>> Feb 16 18:00:29 server2 corosync[1549]: [MAIN ] main.c:278 > Completed > >>> service synchronization, ready to provide service. 
> >>>> Feb 16 18:00:29 server2 crmd[1559]: notice: do_election_count_vote: > >>> Election 4 (current: 2, owner: server1): Processed vote from server1 > (Peer is > >>> not part of our cluster) > >>>> Feb 16 18:00:29 server2 attrd[1557]: notice: attrd_local_callback: > Sending > >>> full refresh (origin=crmd) > >>>> Feb 16 18:00:29 server2 attrd[1557]: notice: attrd_trigger_update: > Sending > >>> flush op to all hosts for: master-res_drbd_r0 (10000) > >>>> Feb 16 18:00:29 server2 attrd[1557]: notice: attrd_trigger_update: > Sending > >>> flush op to all hosts for: probe_complete (true) > >>>> Feb 16 18:00:29 server2 pengine[1558]: notice: unpack_config: On > loss of > >>> CCM Quorum: Ignore > >>>> Feb 16 18:00:29 server2 pengine[1558]: notice: process_pe_message: > >>> Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-2.bz2 > >>>> Feb 16 18:00:29 server2 crmd[1559]: notice: run_graph: Transition 2 > >>> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, > >>> Source=/var/lib/pacemaker/pengine/pe-input-2.bz2): Complete > >>>> Feb 16 18:00:29 server2 crmd[1559]: notice: do_state_transition: > State > >>> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS > >>> cause=C_FSA_INTERNAL origin=notify_crmd ] > >>>> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:760 > notice: > >>> pcmk_peer_update: Transitional membership event on ring 24: memb=1, > new=0, > >>> lost=0 > >>>> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:773 info: > >>> pcmk_peer_update: memb: server2 1711385280 > >>>> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:760 > notice: > >>> pcmk_peer_update: Stable membership event on ring 24: memb=2, new=1, > lost=0 > >>>> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] utils.c:358 info: > >>> update_member: Node 1694608064/server1 is now: member > >>>> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:791 info: > >>> pcmk_peer_update: NEW: server1 1694608064 > >>>> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:808 info: > >>> pcmk_peer_update: MEMB: server1 1694608064 > >>>> Feb 16 18:00:30 server2 crmd[1559]: notice: > plugin_handle_membership: > >>> Membership 24: quorum acquired > >>>> Feb 16 18:00:30 server2 cib[1554]: notice: plugin_handle_membership: > >>> Membership 24: quorum acquired > >>>> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:808 info: > >>> pcmk_peer_update: MEMB: server2 1711385280 > >>>> Feb 16 18:00:30 server2 cib[1554]: notice: crm_update_peer_state: > >>> plugin_handle_membership: Node server1[1694608064] - state is now > member (was > >>> lost) > >>>> Feb 16 18:00:30 server2 corosync[1549]: [pcmk ] plugin.c:1368 info: > >>> send_member_notification: Sending membership update 24 to 2 children > >>>> Feb 16 18:00:30 server2 corosync[1549]: [TOTEM ] totemsrp.c:1892 A > >>> processor joined or left the membership and a new membership was > formed. > >>>> Feb 16 18:00:30 server2 crmd[1559]: notice: crm_update_peer_state: > >>> plugin_handle_membership: Node server1[1694608064] - state is now > member (was > >>> lost) > >>>> Feb 16 18:00:30 server2 corosync[1549]: [CPG ] cpg.c:783 chosen > >>> downlist: sender r(0) ip(192.170.1.102) ; members(old:1 left:0) > >>>> Feb 16 18:00:31 server2 corosync[1549]: [MAIN ] main.c:278 > Completed > >>> service synchronization, ready to provide service. 
> >>>> Feb 16 18:00:31 server2 attrd[1557]: warning: attrd_cib_callback: Update 45 for master-res_drbd_r0=1000 failed: Application of an update diff failed
> >>>> Feb 16 18:00:56 server2 crmd[1559]: warning: cib_rsc_callback: Resource update 49 failed: (rc=-62) Timer expired
> >>>> Feb 16 18:00:56 server2 crmd[1559]: warning: cib_rsc_callback: Resource update 50 failed: (rc=-62) Timer expired
> >>>> Feb 16 18:02:26 server2 attrd[1557]: warning: attrd_cib_callback: Update 51 for master-res_drbd_r0=10000 failed: Timer expired
> >>>> Feb 16 18:02:27 server2 crmd[1559]: warning: crmd_ha_msg_filter: Another DC detected: server1 (op=join_offer)
> >>>> Feb 16 18:02:27 server2 crmd[1559]: notice: do_state_transition: State transition S_IDLE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
> >>>> Feb 16 18:02:27 server2 crmd[1559]: notice: do_state_transition: State transition S_ELECTION -> S_RELEASE_DC [ input=I_RELEASE_DC cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> >>>> Feb 16 18:02:27 server2 crmd[1559]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
> >>>> Feb 16 18:02:27 server2 attrd[1557]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> >>>> Feb 16 18:02:27 server2 attrd[1557]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-res_drbd_r0 (10000)
> >>>> Feb 16 18:02:27 server2 attrd[1557]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> >>>> Feb 16 18:02:27 server2 crmd[1559]: notice: process_lrm_event: server2-res_drbd_r0_monitor_10000:68 [ \n ]
> >>>> Feb 16 18:02:27 server2 crmd[1559]: notice: process_lrm_event: LRM operation res_drbd_r0_notify_0 (call=74, rc=0, cib-update=0, confirmed=true) ok
> >>>> Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM operation res_drbd_r0_notify_0 (call=77, rc=0, cib-update=0, confirmed=true) ok
> >>>> Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM operation res_drbd_r0_demote_0 (call=80, rc=0, cib-update=74, confirmed=true) ok
> >>>> Feb 16 18:02:28 server2 attrd[1557]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-res_drbd_r0 (1000)
> >>>> Feb 16 18:02:28 server2 attrd[1557]: notice: attrd_perform_update: Sent update 67: master-res_drbd_r0=1000
> >>>> Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM operation res_drbd_r0_notify_0 (call=83, rc=0, cib-update=0, confirmed=true) ok
> >>>> Feb 16 18:02:28 server2 crmd[1559]: notice: process_lrm_event: LRM operation res_drbd_r0_monitor_30000 (call=86, rc=0, cib-update=75, confirmed=false) ok
> >>>>
> >>>> [global_common.conf]
> >>>> global {
> >>>>     usage-count no;
> >>>> }
> >>>>
> >>>> common {
> >>>>     protocol C;
> >>>>     handlers {
> >>>>     }
> >>>>     startup {
> >>>>     }
> >>>>     disk {
> >>>>         no-disk-barrier;
> >>>>         no-disk-flushes;
> >>>>     }
> >>>>     net {
> >>>>         max-buffers 8000;
> >>>>         max-epoch-size 8000;
> >>>>         after-sb-0pri discard-younger-primary;
> >>>>         after-sb-1pri discard-secondary;
> >>>>         after-sb-2pri disconnect;
> >>>>     }
> >>>>     syncer {
> >>>>         rate 30M;
> >>>>         verify-alg sha1;
> >>>>     }
> >>>> }
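For the load check suggested earlier in the thread, the following is a minimal sketch of the kind of commands involved. It assumes the sysstat collector is running (the RHEL 6 default once the sysstat package is installed) and that /var/log/sa/sa16 is the history file for February 16:

# CPU utilization around the 18:00 incident window, from sysstat history
sar -u -f /var/log/sa/sa16 -s 17:55:00 -e 18:05:00

# Live sampling: ten 1-second samples; a long run queue ('r') combined
# with low idle ('id') would support the high-load theory
vmstat 1 10

A stretch during which corosync cannot get CPU time would be consistent with the "not scheduled for 3144.5435 ms" message in the primary corosync log above.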
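Since the original question mentions recovering from the split brain by hand, here is a sketch of the standard manual procedure from the DRBD 8.3 User's Guide, for reference. The resource name r0 is an assumption (the Pacemaker resource above is res_drbd_r0), and choosing which node's changes to discard depends entirely on your data:

# On the split-brain victim (the node whose changes will be discarded):
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# On the surviving node; only needed if it is StandAlone rather than
# already waiting in WFConnection:
drbdadm connect r0

The after-sb-0pri/after-sb-1pri/after-sb-2pri settings in global_common.conf above control when DRBD attempts this resolution automatically on reconnect; when none of them applies, DRBD disconnects and the manual steps are required.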
> _______________________________________________
> Linux-ha-japan mailing list
> Linux****@lists*****
> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan

-------------- next part --------------
An HTML attachment was scrubbed...