[Linux-ha-jp] PostgreSQL startup during heartbeat failover


takahasi hideo hideo_tk960****@hotma*****
Mon May 23 12:05:50 JST 2011


 
This is Takahashi.
 
I checked postgresql_log on xxx_ech_db02, but no error messages
appear to have been written to it.
 
The logs from xxx_ech_db01 and xxx_ech_db02 follow.
Logs
xxx_ech_db01 log: begin
May 18 05:04:50 XXX-ECH-DB01 postgres[14875]: [1-1] FATAL:  terminating connection due to administrator command
May 18 05:04:50 XXX-ECH-DB01 postgres[10848]: [1-1] FATAL:  terminating connection due to administrator command
May 18 05:04:50 XXX-ECH-DB01 postgres[10848]: [1-2] STATEMENT:  delete from tbTargetMemb where ditargid=527 and diusrid<=81310016
May 18 05:04:50 XXX-ECH-DB01 postgres[10863]: [1-1] FATAL:  terminating connection due to administrator command
May 18 05:04:50 XXX-ECH-DB01 postgres[10863]: [1-2] STATEMENT:  update tbecanalysisall set dihistno=0 where dihistno=1
May 18 05:04:50 XXX-ECH-DB01 postgres[15123]: [5-1] LOG:  shutting down
May 18 05:04:53 XXX-ECH-DB01 postgres[15123]: [6-1] LOG:  database system is shut down
xxx_ech_db01 log: end
 
xxx_ech_db02 log: begin
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [1-1] LOG:  database system was shut down at 2011-05-18 05:04:53 JST
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [2-1] LOG:  checkpoint record is at B8/BC0AADC0
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [3-1] LOG:  redo record is at B8/BC0AADC0; undo record is at 0/0; shutdown TRUE
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [4-1] LOG:  next transaction ID: 0/2723015574; next OID: 2286526
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [5-1] LOG:  next MultiXactId: 1; next MultiXactOffset: 0
May 18 08:06:27 XX-ECH-DB02 postgres[12634]: [6-1] LOG:  database system is ready ← The postgres service was started manually here.
xxx_ech_db02 log: end
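
As a rough aid for reading the two PostgreSQL logs above, the following sketch computes how long the database was down between the clean shutdown on DB01 and the manual start on DB02. It is only an illustration: the helper name parse_syslog_time is hypothetical, and the year 2011 is an assumption taken from the "database system was shut down at 2011-05-18" line, since syslog timestamps carry no year.

```python
from datetime import datetime

# Assumption: syslog stamps omit the year; 2011 comes from the
# "database system was shut down at 2011-05-18 05:04:53 JST" line.
YEAR = 2011


def parse_syslog_time(line: str, year: int = YEAR) -> datetime:
    """Parse the leading 'May 18 05:04:53' stamp of a syslog line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# Two lines copied from the logs above.
shutdown_line = (
    "May 18 05:04:53 XXX-ECH-DB01 postgres[15123]: [6-1] LOG:  "
    "database system is shut down"
)
ready_line = (
    "May 18 08:06:27 XX-ECH-DB02 postgres[12634]: [6-1] LOG:  "
    "database system is ready"
)

gap = parse_syslog_time(ready_line) - parse_syslog_time(shutdown_line)
print(gap)  # 3:01:34
```

The roughly three-hour gap matches the period during which PostgreSQL was not running on either node until the manual start.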

The message log is also included below.
xxx_ech_db01 log: begin
May 18 05:06:01 xxx-ECH-DB01 check_primary: Failed to postgres service
May 18 05:06:01 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/local/cluster/db/check_primary process group 12297 with signal 15
May 18 05:06:01 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/mgmtd -v process group 12296 with signal 15
May 18 05:06:01 xxx-ECH-DB01 mgmtd: [12296]: info: mgmtd is shutting down
May 18 05:06:01 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/crmd process group 12295 with signal 15
May 18 05:06:01 xxx-ECH-DB01 crmd: [12295]: info: crm_shutdown: Requesting shutdown
May 18 05:06:01 xxx-ECH-DB01 crmd: [12295]: info: do_shutdown_req: Sending shutdown request to DC: xxx-ech-db02
May 18 05:06:03 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_notify_0 key=50:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:03 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: notify
May 18 05:06:03 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=16, rc=0) complete 
May 18 05:06:04 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=fs0_stop_0 key=38:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:04 xxx-ECH-DB01 lrmd: [12292]: info: rsc:fs0: stop
May 18 05:06:04 xxx-ECH-DB01 Filesystem[16469]: INFO: Running stop for /dev/drbd0 on /data
May 18 05:06:04 xxx-ECH-DB01 Filesystem[16469]: INFO: Trying to unmount /data
May 18 05:06:05 xxx-ECH-DB01 Filesystem[16469]: INFO: unmounted /data successfully
May 18 05:06:05 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation fs0_stop_0 (call=17, rc=0) complete 
May 18 05:06:06 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=iPaddr_stop_0 key=36:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:06 xxx-ECH-DB01 lrmd: [12292]: info: rsc:iPaddr: stop
May 18 05:06:06 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (iPaddr:stop:stdout) In IP Stop 
May 18 05:06:06 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (iPaddr:stop:stderr) SIOCDELRT: No such process 
May 18 05:06:06 xxx-ECH-DB01 IPaddr[16529]: INFO: ifconfig eth0:0 down
May 18 05:06:06 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation iPaddr_stop_0 (call=18, rc=0) complete 
May 18 05:06:08 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_demote_0 key=5:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:08 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: demote
May 18 05:06:08 xxx-ECH-DB01 kernel: drbd0: Primary/Secondary --> Secondary/Secondary
May 18 05:06:09 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (drbd0:0:demote:stdout)  
May 18 05:06:09 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_demote_0 (call=19, rc=0) complete 
May 18 05:06:10 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_notify_0 key=51:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:10 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: notify
May 18 05:06:11 xxx-ECH-DB01 crm_master: [16661]: info: Invoked: /usr/sbin/crm_master -l reboot -v 75 
May 18 05:06:12 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (drbd0:0:notify:stdout) No set matching id=master-6cd1d0b5-ff8a-429a-81c2-db36ebb522e7 in status 
May 18 05:06:12 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=20, rc=0) complete 
May 18 05:06:12 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_notify_0 key=49:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:12 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: notify
May 18 05:06:13 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=21, rc=0) complete 
May 18 05:06:13 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_stop_0 key=6:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:13 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: stop
May 18 05:06:14 xxx-ECH-DB01 crm_master: [17125]: info: Invoked: /usr/sbin/crm_master -l reboot -D 
May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: info: apply_xml_diff: Digest mis-match: expected 3555f6a93ffadf12e00efd1b47e3d030, calculated 0ce2f9b0a66eba20f9bf6b7b9840cbd9
May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: info: cib_process_diff: Diff 0.106.49 -> 0.106.50 not applied to 0.106.49: Failed application of a global update.  Requesting full refresh.
May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: info: cib_process_diff: Requesting re-sync from peer: Failed application of a global update.  Requesting full refresh.
May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh
May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh
May 18 05:06:15 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (drbd0:0:stop:stdout)  
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbdsetup [17137]: cstate Connected --> Unconnected
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbd0_receiver [14187]: cstate Unconnected --> BrokenPipe
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: short read expecting header on sock: r=-512
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: worker terminated
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: asender terminated
May 18 05:06:15 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (drbd0:0:stop:stdout)  
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbd0_receiver [14187]: cstate BrokenPipe --> StandAlone
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: Connection lost.
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: receiver terminated
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbdsetup [17137]: cstate StandAlone --> StandAlone
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbdsetup [17137]: cstate StandAlone --> Unconfigured
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: worker terminated
May 18 05:06:15 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_stop_0 (call=22, rc=0) complete 
May 18 05:06:16 xxx-ECH-DB01 cib: [12291]: info: cib_replace_notify: Replaced: 0.106.49 -> 0.106.50 from <null>
May 18 05:06:16 xxx-ECH-DB01 crmd: [12295]: info: populate_cib_nodes: Requesting the list of configured nodes
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: notice: populate_cib_nodes: Node: xxx-ech-db02 (uuid: 9c26a919-fb58-4b77-8755-aee23da6a63d)
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: notice: populate_cib_nodes: Node: xxx-ech-db01 (uuid: 6cd1d0b5-ff8a-429a-81c2-db36ebb522e7)
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_shutdown: All subsystems stopped, continuing
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_control: Disconnected from the LRM
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_ha_control: Disconnected from Heartbeat
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_cib_control: Disconnecting CIB
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_exit: [crmd] stopped (0)
May 18 05:06:17 xxx-ECH-DB01 ccm: [12290]: info: client (pid=12295) removed from ccm
May 18 05:06:17 xxx-ECH-DB01 cib: [12291]: WARN: send_via_callback_channel: Client 60b845dc-611b-4d18-b768-e55c3d34ce58 has disconnected
May 18 05:06:17 xxx-ECH-DB01 cib: [12291]: WARN: do_local_notify: A-Sync reply to 12295 failed: client left before we could send reply
May 18 05:06:17 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/attrd process group 12294 with signal 15
May 18 05:06:17 xxx-ECH-DB01 cib: [12291]: WARN: send_via_callback_channel: Client 60b845dc-611b-4d18-b768-e55c3d34ce58 has disconnected
May 18 05:06:17 xxx-ECH-DB01 heartbeat: [11868]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x747688)
May 18 05:06:17 xxx-ECH-DB01 attrd: [12294]: info: attrd_shutdown: Exiting
May 18 05:06:17 xxx-ECH-DB01 cib: [12291]: WARN: do_local_notify: A-Sync reply to 12295 failed: client left before we could send reply
May 18 05:06:17 xxx-ECH-DB01 attrd: [12294]: info: main: Exiting...
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: send_via_callback_channel: Client 60b845dc-611b-4d18-b768-e55c3d34ce58 has disconnected
May 18 05:06:18 xxx-ECH-DB01 attrd: [12294]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: do_local_notify: A-Sync reply to 12295 failed: client left before we could send reply
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/stonithd process group 12293 with signal 15
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x747688)
May 18 05:06:18 xxx-ECH-DB01 stonithd: [12293]: notice: /usr/lib64/heartbeat/stonithd normally quit.
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/lrmd -r process group 12292 with signal 15
May 18 05:06:18 xxx-ECH-DB01 lrmd: [12292]: info: lrmd is shutting down
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/cib process group 12291 with signal 15
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: cib_shutdown: Disconnected 0 clients
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x747688)
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: cib_process_disconnect: All clients disconnected...
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: initiate_exit: Sending disconnect notification to 2 peers...
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: apply_xml_diff: Digest mis-match: expected 0368e591221554460353f9a6766d975d, calculated c17ea9017490404e5d36b4fdaa9e7419
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: cib_process_diff: Diff 0.106.55 -> 0.106.56 not applied to 0.106.55: Failed application of a global update.  Requesting full refresh.
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: cib_process_diff: Requesting re-sync from peer: Failed application of a global update.  Requesting full refresh.
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_diff: Not applying diff 0.106.56 -> 0.106.57 (sync in progress)
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh
May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_diff: Not applying diff 0.106.57 -> 0.106.58 (sync in progress)
May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh
May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh
May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: info: cib_process_shutdown_req: Shutdown ACK from xxx-ech-db02
May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: info: terminate_ha_connection: cib_process_shutdown_req: Disconnecting heartbeat
May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: info: cib_ha_connection_destroy: Heartbeat disconnection complete... exiting
May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: info: main: Done
May 18 05:06:19 xxx-ECH-DB01 ccm: [12290]: info: client (pid=12291) removed from ccm
May 18 05:06:19 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/ccm process group 12290 with signal 15
May 18 05:06:19 xxx-ECH-DB01 ccm: [12290]: info: received SIGTERM, going to shut down
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBFIFO process 11871 with signal 15
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBWRITE process 11872 with signal 15
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBREAD process 11873 with signal 15
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBWRITE process 11874 with signal 15
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBREAD process 11875 with signal 15
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11871 exited. 5 remaining
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11872 exited. 4 remaining
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11873 exited. 3 remaining
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11874 exited. 2 remaining
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11875 exited. 1 remaining
May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: xxx-ech-db01 Heartbeat shutdown complete.
xxx_ech_db01 log: end
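
To make the order of resource operations in the DB01 log above easier to follow, here is a minimal sketch that extracts the "LRM operation ... complete" lines written by crmd. The regular expression and the function name lrm_operations are illustrative assumptions based only on the line format seen in this excerpt, not on any heartbeat tool.

```python
import re

# Matches crmd completion lines such as:
#   ... crmd: [12295]: info: process_lrm_event: LRM operation fs0_stop_0 (call=17, rc=0) complete
LRM_DONE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+crmd: \[\d+\]: info: "
    r"process_lrm_event: LRM operation (?P<op>\S+) "
    r"\(call=(?P<call>\d+), rc=(?P<rc>\d+)\) complete"
)


def lrm_operations(lines):
    """Return (time, operation, call id, return code) per completed LRM operation."""
    ops = []
    for line in lines:
        m = LRM_DONE.match(line)
        if m:
            ops.append((m["time"], m["op"], int(m["call"]), int(m["rc"])))
    return ops


# A few lines copied from the DB01 log above.
sample = [
    "May 18 05:06:05 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation fs0_stop_0 (call=17, rc=0) complete ",
    "May 18 05:06:06 xxx-ECH-DB01 lrmd: [12292]: info: rsc:iPaddr: stop",
    "May 18 05:06:06 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation iPaddr_stop_0 (call=18, rc=0) complete ",
    "May 18 05:06:09 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_demote_0 (call=19, rc=0) complete ",
    "May 18 05:06:15 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_stop_0 (call=22, rc=0) complete ",
]

ops = lrm_operations(sample)
for time, op, call, rc in ops:
    print(time, op, f"call={call}", f"rc={rc}")
```

Run against the full DB01 excerpt, a filter like this would list only drbd0:0, fs0, and iPaddr operations, all with rc=0. No pgsql0 operation appears among the completed LRM operations in this excerpt.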
 
xxx_ech_db02 log: begin
May 18 05:04:54 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:05:40 xxx-ECH-DB02 cib: [18086]: info: cib_stats: Processed 5 operations (2000.00us average, 0% utilization) in the last 10min
May 18 05:06:02 xxx-ECH-DB02 crmd: [18090]: info: handle_shutdown_request: Creating shutdown request for xxx-ech-db01
May 18 05:06:02 xxx-ECH-DB02 tengine: [20587]: info: extract_event: Aborting on shutdown attribute for 6cd1d0b5-ff8a-429a-81c2-db36ebb522e7
May 18 05:06:02 xxx-ECH-DB02 tengine: [20587]: info: update_abort_priority: Abort priority upgraded to 1000000
May 18 05:06:02 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:06:02 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: WARN: cluster_option: Using deprecated name 'default_resource_stickiness' for cluster option 'default-resource-stickiness'
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: determine_online_status: Node xxx-ech-db02 is online
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: determine_online_status: Node xxx-ech-db01 is shutting down
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: WARN: unpack_rsc_op: Processing failed op pgsql0_monitor_30000 on xxx-ech-db01: Timed Out
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_rsc_op: Making sure pgsql0 doesn't come up again
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: clone_print: Master/Slave Set: ms-drbd0
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     drbd0:0 (ocf::heartbeat:drbd): Master xxx-ech-db01
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     drbd0:1 (ocf::heartbeat:drbd): Started xxx-ech-db02
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: group_print: Resource Group: postDb
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     iPaddr (ocf::heartbeat:IPaddr): Started xxx-ech-db01
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     fs0 (ocf::heartbeat:Filesystem): Started xxx-ech-db01
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     pgsql0 (ocf::heartbeat:pgsql): Stopped 
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: WARN: native_color: Resource drbd0:0 cannot run anywhere
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: master_color: Promoting drbd0:1 (Slave xxx-ech-db02)
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: master_color: ms-drbd0: Promoted 1 instances of a possible 1 to master
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: master_color: ms-drbd0: Promoted 1 instances of a possible 1 to master
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: WARN: native_color: Resource pgsql0 cannot run anywhere
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Stop resource drbd0:0 (Master xxx-ech-db01)
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: DemoteRsc: xxx-ech-db01 Demote drbd0:0
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StopRsc:   xxx-ech-db01 Stop drbd0:0
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Promote drbd0:1 (Slave -> Master xxx-ech-db02)
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Stop resource drbd0:0 (Master xxx-ech-db01)
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: DemoteRsc: xxx-ech-db01 Demote drbd0:0
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StopRsc:   xxx-ech-db01 Stop drbd0:0
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Promote drbd0:1 (Slave -> Master xxx-ech-db02)
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Leave resource iPaddr (Started xxx-ech-db02)
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StopRsc:   xxx-ech-db01 Stop iPaddr
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StartRsc:  xxx-ech-db02 Start iPaddr
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Leave resource fs0 (Started xxx-ech-db02)
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StopRsc:   xxx-ech-db01 Stop fs0
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StartRsc:  xxx-ech-db02 Start fs0
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: stage6: Scheduling Node xxx-ech-db01 for shutdown
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-warn-24.bz2
May 18 05:06:03 xxx-ECH-DB02 pengine: [20588]: WARN: process_pe_message: Transition 6: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-24.bz2
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: unpack_graph: Unpacked transition 6: 40 actions in 40 synapses
May 18 05:06:03 xxx-ECH-DB02 pengine: [20588]: info: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 31 fired and confirmed
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 41 fired and confirmed
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 47 fired and confirmed
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 50: notify drbd0:0_pre_notify_demote_0 on xxx-ech-db01
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 56: notify drbd0:1_pre_notify_promote_0 on xxx-ech-db02
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 58: notify drbd0:1_pre_notify_demote_0 on xxx-ech-db02
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 38: stop fs0_stop_0 on xxx-ech-db01
May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=56:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:03 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=58:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:03 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=10, rc=0) complete 
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_pre_notify_promote_0 (56) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=11, rc=0) complete 
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 26 fired and confirmed
May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_pre_notify_demote_0 (58) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:04 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_pre_notify_demote_0 (50) confirmed on xxx-ech-db01 (rc=0)
May 18 05:06:04 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 32 fired and confirmed
May 18 05:06:06 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action fs0_stop_0 (38) confirmed on xxx-ech-db01 (rc=0)
May 18 05:06:06 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 36: stop iPaddr_stop_0 on xxx-ech-db01
May 18 05:06:07 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action iPaddr_stop_0 (36) confirmed on xxx-ech-db01 (rc=0)
May 18 05:06:07 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 42 fired and confirmed
May 18 05:06:07 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 29 fired and confirmed
May 18 05:06:07 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 5: demote drbd0:0_demote_0 on xxx-ech-db01
May 18 05:06:09 xxx-ECH-DB02 kernel: drbd0: Secondary/Primary --> Secondary/Secondary
May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_demote_0 (5) confirmed on xxx-ech-db01 (rc=0)
May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 30 fired and confirmed
May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 33 fired and confirmed
May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 51: notify drbd0:0_post_notify_demote_0 on xxx-ech-db01
May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 59: notify drbd0:1_post_notify_demote_0 on xxx-ech-db02
May 18 05:06:10 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=59:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:10 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:10 xxx-ECH-DB02 crm_master: [30986]: info: Invoked: /usr/sbin/crm_master -l reboot -v 75 
May 18 05:06:10 xxx-ECH-DB02 lrmd: [18087]: info: RA output: (drbd0:1:notify:stdout) No set matching id=master-9c26a919-fb58-4b77-8755-aee23da6a63d in status 
May 18 05:06:10 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=12, rc=0) complete 
May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_post_notify_demote_0 (59) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_post_notify_demote_0 (51) confirmed on xxx-ech-db01 (rc=0)
May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 34 fired and confirmed
May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 49: notify drbd0:0_pre_notify_stop_0 on xxx-ech-db01
May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 54: notify drbd0:1_pre_notify_stop_0 on xxx-ech-db02
May 18 05:06:12 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=54:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:12 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:12 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=13, rc=0) complete 
May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_pre_notify_stop_0 (54) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:13 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_pre_notify_stop_0 (49) confirmed on xxx-ech-db01 (rc=0)
May 18 05:06:13 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
May 18 05:06:13 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 17 fired and confirmed
May 18 05:06:13 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 6: stop drbd0:0_stop_0 on xxx-ech-db01
May 18 05:06:14 xxx-ECH-DB02 tengine: [20587]: info: te_update_diff: Aborting on transient_attributes deletions
May 18 05:06:14 xxx-ECH-DB02 tengine: [20587]: info: update_abort_priority: Abort priority upgraded to 1000000
May 18 05:06:14 xxx-ECH-DB02 tengine: [20587]: info: update_abort_priority: Abort action 0 superceeded by 2
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: sock was shut down by peer
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: drbd0_receiver [20796]: cstate Connected --> BrokenPipe
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: short read expecting header on sock: r=0
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: worker terminated
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: meta connection shut down by peer.
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: asender terminated
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: drbd0_receiver [20796]: cstate BrokenPipe --> Unconnected
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: Connection lost.
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: drbd0_receiver [20796]: cstate Unconnected --> WFConnection
May 18 05:06:15 xxx-ECH-DB02 cib: [18086]: info: sync_our_cib: Syncing CIB to xxx-ech-db01
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_stop_0 (6) confirmed on xxx-ech-db01 (rc=0)
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 21 fired and confirmed
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 55: notify drbd0:1_post_notify_stop_0 on xxx-ech-db02
May 18 05:06:15 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=55:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:15 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:15 xxx-ECH-DB02 crm_master: [31670]: info: Invoked: /usr/sbin/crm_master -l reboot -v 10 
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: extract_event: Aborting on transient_attributes changes for 9c26a919-fb58-4b77-8755-aee23da6a63d
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: te_update_diff: Aborting on transient_attributes deletions
May 18 05:06:15 xxx-ECH-DB02 lrmd: [18087]: info: RA output: (drbd0:1:notify:stdout) No set matching id=master-9c26a919-fb58-4b77-8755-aee23da6a63d in status 
May 18 05:06:15 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=14, rc=0) complete 
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_post_notify_stop_0 (55) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 22 fired and confirmed
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: run_graph: ====================================================
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: notice: run_graph: Transition 6: (Complete=29, Pending=0, Fired=0, Skipped=7, Incomplete=4)
May 18 05:06:15 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:06:15 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
May 18 05:06:15 xxx-ECH-DB02 pengine: [20588]: WARN: cluster_option: Using deprecated name 'default_resource_stickiness' for cluster option 'default-resource-stickiness'
May 18 05:06:15 xxx-ECH-DB02 pengine: [20588]: info: determine_online_status: Node xxx-ech-db02 is online
May 18 05:06:15 xxx-ECH-DB02 pengine: [20588]: info: determine_online_status: Node xxx-ech-db01 is shutting down
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: unpack_rsc_op: Processing failed op pgsql0_monitor_30000 on xxx-ech-db01: Timed Out
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_rsc_op: Making sure pgsql0 doesn't come up again
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: clone_print: Master/Slave Set: ms-drbd0
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     drbd0:0 (ocf::heartbeat:drbd): Stopped 
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     drbd0:1 (ocf::heartbeat:drbd): Started xxx-ech-db02
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: group_print: Resource Group: postDb
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     iPaddr (ocf::heartbeat:IPaddr): Stopped 
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     fs0 (ocf::heartbeat:Filesystem): Stopped 
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print:     pgsql0 (ocf::heartbeat:pgsql): Stopped 
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: native_color: Resource drbd0:0 cannot run anywhere
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: master_color: Promoting drbd0:1 (Slave xxx-ech-db02)
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: master_color: ms-drbd0: Promoted 1 instances of a possible 1 to master
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: master_color: ms-drbd0: Promoted 1 instances of a possible 1 to master
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: native_color: Resource pgsql0 cannot run anywhere
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Promote drbd0:1 (Slave -> Master xxx-ech-db02)
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Promote drbd0:1 (Slave -> Master xxx-ech-db02)
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: StartRsc:  xxx-ech-db02 Start iPaddr
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: StartRsc:  xxx-ech-db02 Start fs0
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: stage6: Scheduling Node xxx-ech-db01 for shutdown
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
May 18 05:06:16 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:06:16 xxx-ECH-DB02 tengine: [20587]: info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-warn-25.bz2
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: process_pe_message: Transition 7: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-25.bz2
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
May 18 05:06:16 xxx-ECH-DB02 tengine: [20587]: info: unpack_graph: Unpacked transition 7: 13 actions in 13 synapses
May 18 05:06:16 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 23 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_crm_command: Executing crm-event (40): do_shutdown on xxx-ech-db01
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 53: notify drbd0:1_pre_notify_promote_0 on xxx-ech-db02
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=53:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:17 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=15, rc=0) complete 
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_pre_notify_promote_0 (53) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 21 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 8: promote drbd0:1_promote_0 on xxx-ech-db02
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_promote_0 key=8:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:17 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: promote
May 18 05:06:17 xxx-ECH-DB02 kernel: drbd0: Secondary/Unknown --> Primary/Unknown
May 18 05:06:17 xxx-ECH-DB02 lrmd: [18087]: info: RA output: (drbd0:1:promote:stdout)  
May 18 05:06:17 xxx-ECH-DB02 drbd[31691]: INFO: drbd0 promote: primary succeeded
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_promote_0 (call=16, rc=0) complete 
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_promote_0 (8) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 22 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 54: notify drbd0:1_post_notify_promote_0 on xxx-ech-db02
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=54:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:17 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: notice: crmd_client_status_callback: Status update: Client xxx-ech-db01/crmd now has status [offline]
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: erase_node_from_join: Removed dead node xxx-ech-db01 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
May 18 05:06:18 xxx-ECH-DB02 crm_master: [31881]: info: Invoked: /usr/sbin/crm_master -l reboot -v 10 
May 18 05:06:18 xxx-ECH-DB02 lrmd: [18087]: info: RA output: (drbd0:1:notify:stdout) No set matching id=master-9c26a919-fb58-4b77-8755-aee23da6a63d in status 
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=17, rc=0) complete 
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_post_notify_promote_0 (54) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 26 fired and confirmed
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 35 fired and confirmed
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 33: start iPaddr_start_0 on xxx-ech-db02
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=iPaddr_start_0 key=33:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:18 xxx-ECH-DB02 lrmd: [18087]: info: rsc:iPaddr: start
May 18 05:06:18 xxx-ECH-DB02 IPaddr[31888]: INFO: Using calculated nic for 192.168.22.110: eth0
May 18 05:06:18 xxx-ECH-DB02 IPaddr[31888]: INFO: Using calculated netmask for 192.168.22.110: 255.255.255.0
May 18 05:06:18 xxx-ECH-DB02 IPaddr[31888]: INFO: eval ifconfig eth0:0 192.168.22.110 netmask 255.255.255.0 broadcast 192.168.22.255
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation iPaddr_start_0 (call=18, rc=0) complete 
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action iPaddr_start_0 (33) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 34: start fs0_start_0 on xxx-ech-db02
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=fs0_start_0 key=34:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:18 xxx-ECH-DB02 cib: [18086]: info: cib_process_shutdown_req: Shutdown REQ from xxx-ech-db01
May 18 05:06:18 xxx-ECH-DB02 lrmd: [18087]: info: rsc:fs0: start
May 18 05:06:18 xxx-ECH-DB02 Filesystem[31992]: INFO: Running start for /dev/drbd0 on /data
May 18 05:06:18 xxx-ECH-DB02 kernel: kjournald starting.  Commit interval 5 seconds
May 18 05:06:18 xxx-ECH-DB02 kernel: EXT3-fs warning: checktime reached, running e2fsck is recommended
May 18 05:06:18 xxx-ECH-DB02 kernel: EXT3 FS on drbd0, internal journal
May 18 05:06:18 xxx-ECH-DB02 kernel: EXT3-fs: mounted filesystem with ordered data mode.
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation fs0_start_0 (call=19, rc=0) complete 
May 18 05:06:19 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action fs0_start_0 (34) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: sync_our_cib: Syncing CIB to xxx-ech-db01
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: cib_client_status_callback: Status update: Client xxx-ech-db01/cib now has status [leave]
May 18 05:06:19 xxx-ECH-DB02 ccm: [18085]: info: Break tie for 2 nodes cluster
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: mem_handle_event: no mbr_track info
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: mem_handle_event: no mbr_track info
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: cib_ccm_msg_callback: LOST: xxx-ech-db01
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: cib_ccm_msg_callback: PEER: xxx-ech-db02
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: erase_node_from_join: Removed dead node xxx-ech-db01 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=0
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
May 18 05:06:19 xxx-ECH-DB02 tengine: [20587]: info: run_graph: Transition 7: (Complete=13, Pending=0, Fired=0, Skipped=0, Incomplete=0)
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: ccm_event_detail:  CURRENT: xxx-ech-db02 [nodeid=1, born=3]
May 18 05:06:19 xxx-ECH-DB02 tengine: [20587]: info: notify_crmd: Transition 7 status: te_complete - <null>
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: ccm_event_detail:  LOST:    xxx-ech-db01 [nodeid=0, born=2]
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:06:50 xxx-ECH-DB02 heartbeat: [17481]: WARN: node xxx-ech-db01: is dead
May 18 05:06:50 xxx-ECH-DB02 heartbeat: [17481]: info: Link xxx-ech-db01:eth2 dead.
May 18 05:06:50 xxx-ECH-DB02 crmd: [18090]: notice: crmd_ha_status_callback: Status update: Node xxx-ech-db01 now has status [dead]
End of xxx_ech_db02 log
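
As a reading aid for the messages log above, the following is a minimal sketch that keeps only the WARN and ERROR lines, such as "Resource pgsql0 cannot run anywhere". The regular expression and the function name `problem_lines` are assumptions made for illustration; the line format is inferred only from the lines quoted in this post, and the sample list is a small excerpt of them.

```python
import re

# Matches heartbeat-style syslog lines of the form seen in this post, e.g.
# "May 18 05:06:16 host pengine: [20588]: WARN: native_color: Resource pgsql0 cannot run anywhere"
# (assumed format; kernel lines without "[pid]: LEVEL:" are deliberately not matched)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s(?P<daemon>[\w-]+):\s"
    r"\[(?P<pid>\d+)\]:\s(?P<level>[A-Za-z]+):\s(?P<msg>.*)$"
)


def problem_lines(lines, levels=("WARN", "ERROR")):
    """Return (timestamp, daemon, level, message) tuples for lines at the given levels."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels:
            hits.append((m.group("ts"), m.group("daemon"), m.group("level"), m.group("msg")))
    return hits


# Excerpt copied from the xxx_ech_db02 log in this post.
SAMPLE = [
    "May 18 05:06:15 xxx-ECH-DB02 pengine: [20588]: info: determine_online_status: Node xxx-ech-db02 is online",
    "May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: unpack_rsc_op: Processing failed op pgsql0_monitor_30000 on xxx-ech-db01: Timed Out",
    "May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_rsc_op: Making sure pgsql0 doesn't come up again",
    "May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: native_color: Resource pgsql0 cannot run anywhere",
    "May 18 05:06:17 xxx-ECH-DB02 kernel: drbd0: Secondary/Unknown --> Primary/Unknown",
]

if __name__ == "__main__":
    for ts, daemon, level, msg in problem_lines(SAMPLE):
        print(f"{ts} {daemon} {level}: {msg}")
```

To run it against a full log, the lines of the file can be passed instead of `SAMPLE`, for example `problem_lines(open(path))` with `path` pointing at the messages log.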
 
 
That is all. Thank you in advance for your help.
 
> Date: Mon, 23 May 2011 09:32:12 +0900
> From: iwasa****@3ware*****
> To: linux****@lists*****
> Subject: Re: [Linux-ha-jp] Starting postgres on heartbeat failover
> 
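> This is Iwasaki from Thirdware.
> 
> > I am building a database cluster server using
> > heartbeat, DRBD, and postgresql.
> >
> > It is configured to fail over to the xxx_ech_db02 server (slave)
> > when a failure occurs on the xxx_ech_db01 server (master).
> >
> > When a failure occurred on the xxx_ech_db01 server (processing load on postgres caused the postgres service to terminate abnormally),
> > failover to the xxx_ech_db02 server did take place (it had become master and the filesystem was mounted),
> > but the postgres service had not been started.
> 
> In this case, Heartbeat appears to have carried out the failover correctly.
> If so, the most likely reason PostgreSQL does not start is a problem during PostgreSQL's own startup process.
> I suggest checking the PostgreSQL log to see whether any errors were reported or whether
> a long-running operation was in progress.
> 
> It is also possible that the abnormal termination corrupted the data and that PostgreSQL cannot start because of it.
> When logical corruption occurs, DRBD faithfully replicates the corrupted data as well, so PostgreSQL may fail
> to start after failover because of data errors.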
> 
> _______________________________________________
> Linux-ha-japan mailing list
> Linux****@lists*****
> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan
 		 	   		  