takahasi hideo
hideo_tk960****@hotma*****
Monday, May 23, 2011, 12:05:50 JST
This is Takahashi.

I checked postgresql_log on xxx_ech_db02, but no error messages appear to have been logged there.
The logs from xxx_ech_db01 and xxx_ech_db02 follow.

Logs

xxx_ech_db01 log starts here
May 18 05:04:50 XXX-ECH-DB01 postgres[14875]: [1-1] FATAL: terminating connection due to administrator command
May 18 05:04:50 XXX-ECH-DB01 postgres[10848]: [1-1] FATAL: terminating connection due to administrator command
May 18 05:04:50 XXX-ECH-DB01 postgres[10848]: [1-2] STATEMENT: delete from tbTargetMemb where ditargid=527 and diusrid<=81310016
May 18 05:04:50 XXX-ECH-DB01 postgres[10863]: [1-1] FATAL: terminating connection due to administrator command
May 18 05:04:50 XXX-ECH-DB01 postgres[10863]: [1-2] STATEMENT: update tbecanalysisall set dihistno=0 where dihistno=1
May 18 05:04:50 XXX-ECH-DB01 postgres[15123]: [5-1] LOG: shutting down
May 18 05:04:53 XXX-ECH-DB01 postgres[15123]: [6-1] LOG: database system is shut down
xxx_ech_db01 log ends here

xxx_ech_db02 log starts here
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [1-1] LOG: database system was shut down at 2011-05-18 05:04:53 JST
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [2-1] LOG: checkpoint record is at B8/BC0AADC0
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [3-1] LOG: redo record is at B8/BC0AADC0; undo record is at 0/0; shutdown TRUE
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [4-1] LOG: next transaction ID: 0/2723015574; next OID: 2286526
May 18 08:06:26 XXX-ECH-DB02 postgres[12634]: [5-1] LOG: next MultiXactId: 1; next MultiXactOffset: 0
May 18 08:06:27 XX-ECH-DB02 postgres[12634]: [6-1] LOG: database system is ready ← the postgres service was started manually here.
xxx_ech_db02 log ends here

The message.log output is also included below.

xxx_ech_db01 log starts here
May 18 05:06:01 xxx-ECH-DB01 check_primary: Failed to postgres service
May 18 05:06:01 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/local/cluster/db/check_primary process group 12297 with signal 15
May 18 05:06:01 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/mgmtd -v process group 12296 with signal 15
May 18 05:06:01 xxx-ECH-DB01 mgmtd: [12296]:
info: mgmtd is shutting down May 18 05:06:01 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/crmd process group 12295 with signal 15 May 18 05:06:01 xxx-ECH-DB01 crmd: [12295]: info: crm_shutdown: Requesting shutdown May 18 05:06:01 xxx-ECH-DB01 crmd: [12295]: info: do_shutdown_req: Sending shutdown request to DC: xxx-ech-db02 May 18 05:06:03 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_notify_0 key=50:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:03 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: notify May 18 05:06:03 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=16, rc=0) complete May 18 05:06:04 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=fs0_stop_0 key=38:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:04 xxx-ECH-DB01 lrmd: [12292]: info: rsc:fs0: stop May 18 05:06:04 xxx-ECH-DB01 Filesystem[16469]: INFO: Running stop for /dev/drbd0 on /data May 18 05:06:04 xxx-ECH-DB01 Filesystem[16469]: INFO: Trying to unmount /data May 18 05:06:05 xxx-ECH-DB01 Filesystem[16469]: INFO: unmounted /data successfully May 18 05:06:05 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation fs0_stop_0 (call=17, rc=0) complete May 18 05:06:06 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=iPaddr_stop_0 key=36:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:06 xxx-ECH-DB01 lrmd: [12292]: info: rsc:iPaddr: stop May 18 05:06:06 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (iPaddr:stop:stdout) In IP Stop May 18 05:06:06 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (iPaddr:stop:stderr) SIOCDELRT: No such process May 18 05:06:06 xxx-ECH-DB01 IPaddr[16529]: INFO: ifconfig eth0:0 down May 18 05:06:06 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation iPaddr_stop_0 (call=18, rc=0) complete May 18 05:06:08 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_demote_0 
key=5:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:08 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: demote May 18 05:06:08 xxx-ECH-DB01 kernel: drbd0: Primary/Secondary --> Secondary/Secondary May 18 05:06:09 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (drbd0:0:demote:stdout) May 18 05:06:09 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_demote_0 (call=19, rc=0) complete May 18 05:06:10 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_notify_0 key=51:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:10 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: notify May 18 05:06:11 xxx-ECH-DB01 crm_master: [16661]: info: Invoked: /usr/sbin/crm_master -l reboot -v 75 May 18 05:06:12 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (drbd0:0:notify:stdout) No set matching id=master-6cd1d0b5-ff8a-429a-81c2-db36ebb522e7 in status May 18 05:06:12 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=20, rc=0) complete May 18 05:06:12 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_notify_0 key=49:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:12 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: notify May 18 05:06:13 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=21, rc=0) complete May 18 05:06:13 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_rsc_op: Performing op=drbd0:0_stop_0 key=6:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:13 xxx-ECH-DB01 lrmd: [12292]: info: rsc:drbd0:0: stop May 18 05:06:14 xxx-ECH-DB01 crm_master: [17125]: info: Invoked: /usr/sbin/crm_master -l reboot -D May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: info: apply_xml_diff: Digest mis-match: expected 3555f6a93ffadf12e00efd1b47e3d030, calculated 0ce2f9b0a66eba20f9bf6b7b9840cbd9 May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: info: cib_process_diff: Diff 0.106.49 -> 0.106.50 not applied to 0.106.49: Failed application of a global update. 
Requesting full refresh. May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: info: cib_process_diff: Requesting re-sync from peer: Failed application of a global update. Requesting full refresh. May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh May 18 05:06:15 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh May 18 05:06:15 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (drbd0:0:stop:stdout) May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbdsetup [17137]: cstate Connected --> Unconnected May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbd0_receiver [14187]: cstate Unconnected --> BrokenPipe May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: short read expecting header on sock: r=-512 May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: worker terminated May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: asender terminated May 18 05:06:15 xxx-ECH-DB01 lrmd: [12292]: info: RA output: (drbd0:0:stop:stdout) May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbd0_receiver [14187]: cstate BrokenPipe --> StandAlone May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: Connection lost. 
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: receiver terminated
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbdsetup [17137]: cstate StandAlone --> StandAlone
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: drbdsetup [17137]: cstate StandAlone --> Unconfigured
May 18 05:06:15 xxx-ECH-DB01 kernel: drbd0: worker terminated
May 18 05:06:15 xxx-ECH-DB01 crmd: [12295]: info: process_lrm_event: LRM operation drbd0:0_stop_0 (call=22, rc=0) complete
May 18 05:06:16 xxx-ECH-DB01 cib: [12291]: info: cib_replace_notify: Replaced: 0.106.49 -> 0.106.50 from <null>
May 18 05:06:16 xxx-ECH-DB01 crmd: [12295]: info: populate_cib_nodes: Requesting the list of configured nodes
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: notice: populate_cib_nodes: Node: xxx-ech-db02 (uuid: 9c26a919-fb58-4b77-8755-aee23da6a63d)
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: notice: populate_cib_nodes: Node: xxx-ech-db01 (uuid: 6cd1d0b5-ff8a-429a-81c2-db36ebb522e7)
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_shutdown: All subsystems stopped, continuing
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_lrm_control: Disconnected from the LRM
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_ha_control: Disconnected from Heartbeat
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_cib_control: Disconnecting CIB
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
May 18 05:06:17 xxx-ECH-DB01 crmd: [12295]: info: do_exit: [crmd] stopped (0)
May 18 05:06:17 xxx-ECH-DB01 ccm: [12290]: info: client (pid=12295) removed from ccm
May 18 05:06:17 xxx-ECH-DB01 cib: [12291]: WARN: send_via_callback_channel: Client 60b845dc-611b-4d18-b768-e55c3d34ce58 has disconnected
May 18 05:06:17 xxx-ECH-DB01 cib: [12291]: WARN: do_local_notify: A-Sync reply to 12295 failed: client left before we could send reply
May 18 05:06:17 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/attrd process group 12294 with signal 15
May 18 05:06:17 xxx-ECH-DB01 cib: [12291]: WARN: send_via_callback_channel: Client 60b845dc-611b-4d18-b768-e55c3d34ce58 has disconnected
May 18 05:06:17 xxx-ECH-DB01 heartbeat: [11868]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x747688)
May 18 05:06:17 xxx-ECH-DB01 attrd: [12294]: info: attrd_shutdown: Exiting
May 18 05:06:17 xxx-ECH-DB01 cib: [12291]: WARN: do_local_notify: A-Sync reply to 12295 failed: client left before we could send reply
May 18 05:06:17 xxx-ECH-DB01 attrd: [12294]: info: main: Exiting...
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: send_via_callback_channel: Client 60b845dc-611b-4d18-b768-e55c3d34ce58 has disconnected
May 18 05:06:18 xxx-ECH-DB01 attrd: [12294]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: do_local_notify: A-Sync reply to 12295 failed: client left before we could send reply
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/stonithd process group 12293 with signal 15
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x747688)
May 18 05:06:18 xxx-ECH-DB01 stonithd: [12293]: notice: /usr/lib64/heartbeat/stonithd normally quit.
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/lrmd -r process group 12292 with signal 15
May 18 05:06:18 xxx-ECH-DB01 lrmd: [12292]: info: lrmd is shutting down
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/cib process group 12291 with signal 15
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: cib_shutdown: Disconnected 0 clients
May 18 05:06:18 xxx-ECH-DB01 heartbeat: [11868]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x747688)
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: cib_process_disconnect: All clients disconnected...
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: initiate_exit: Sending disconnect notification to 2 peers...
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: apply_xml_diff: Digest mis-match: expected 0368e591221554460353f9a6766d975d, calculated c17ea9017490404e5d36b4fdaa9e7419
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: cib_process_diff: Diff 0.106.55 -> 0.106.56 not applied to 0.106.55: Failed application of a global update. Requesting full refresh.
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: info: cib_process_diff: Requesting re-sync from peer: Failed application of a global update. Requesting full refresh.
May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_diff: Not applying diff 0.106.56 -> 0.106.57 (sync in progress) May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh May 18 05:06:18 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_diff: Not applying diff 0.106.57 -> 0.106.58 (sync in progress) May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: WARN: do_cib_notify: cib_apply_diff of <diff > FAILED: Application of an update diff failed, requesting a full refresh May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed, requesting a full refresh May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: info: cib_process_shutdown_req: Shutdown ACK from xxx-ech-db02 May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: info: terminate_ha_connection: cib_process_shutdown_req: Disconnecting heartbeat May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: info: cib_ha_connection_destroy: Heartbeat disconnection complete... 
exiting May 18 05:06:19 xxx-ECH-DB01 cib: [12291]: info: main: Done May 18 05:06:19 xxx-ECH-DB01 ccm: [12290]: info: client (pid=12291) removed from ccm May 18 05:06:19 xxx-ECH-DB01 heartbeat: [11868]: info: killing /usr/lib64/heartbeat/ccm process group 12290 with signal 15 May 18 05:06:19 xxx-ECH-DB01 ccm: [12290]: info: received SIGTERM, going to shut down May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBFIFO process 11871 with signal 15 May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBWRITE process 11872 with signal 15 May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBREAD process 11873 with signal 15 May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBWRITE process 11874 with signal 15 May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: killing HBREAD process 11875 with signal 15 May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11871 exited. 5 remaining May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11872 exited. 4 remaining May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11873 exited. 3 remaining May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11874 exited. 2 remaining May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: Core process 11875 exited. 1 remaining May 18 05:06:20 xxx-ECH-DB01 heartbeat: [11868]: info: xxx-ech-db01 Heartbeat shutdown complete. 
xxx_ech_db01 log ends here

xxx_ech_db02 log starts here
May 18 05:04:54 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:05:40 xxx-ECH-DB02 cib: [18086]: info: cib_stats: Processed 5 operations (2000.00us average, 0% utilization) in the last 10min
May 18 05:06:02 xxx-ECH-DB02 crmd: [18090]: info: handle_shutdown_request: Creating shutdown request for xxx-ech-db01
May 18 05:06:02 xxx-ECH-DB02 tengine: [20587]: info: extract_event: Aborting on shutdown attribute for 6cd1d0b5-ff8a-429a-81c2-db36ebb522e7
May 18 05:06:02 xxx-ECH-DB02 tengine: [20587]: info: update_abort_priority: Abort priority upgraded to 1000000
May 18 05:06:02 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:06:02 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: WARN: cluster_option: Using deprecated name 'default_resource_stickiness' for cluster option 'default-resource-stickiness' May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: determine_online_status: Node xxx-ech-db02 is online May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: determine_online_status: Node xxx-ech-db01 is shutting down May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: WARN: unpack_rsc_op: Processing failed op pgsql0_monitor_30000 on xxx-ech-db01: Timed Out May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_rsc_op: Making sure pgsql0 doesn't come up again May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: clone_print: Master/Slave Set: ms-drbd0 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print: drbd0:0 (ocf::heartbeat:drbd): Master xxx-ech-db01 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print: drbd0:1 (ocf::heartbeat:drbd): Started xxx-ech-db02 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: group_print: Resource Group: postDb May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print: iPaddr (ocf::heartbeat:IPaddr): Started xxx-ech-db01 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print: fs0 (ocf::heartbeat:Filesystem): Started xxx-ech-db01 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: native_print: pgsql0 (ocf::heartbeat:pgsql): Stopped May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: WARN: native_color: Resource drbd0:0 cannot run anywhere May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: master_color: Promoting drbd0:1 (Slave xxx-ech-db02) May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: master_color: ms-drbd0: Promoted 1 instances of a possible 1 to master May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: master_color: ms-drbd0: Promoted 1 instances of a possible 
1 to master May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: WARN: native_color: Resource pgsql0 cannot run anywhere May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Stop resource drbd0:0 (Master xxx-ech-db01) May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: DemoteRsc: xxx-ech-db01 Demote drbd0:0 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StopRsc: xxx-ech-db01 Stop drbd0:0 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Promote drbd0:1 (Slave -> Master xxx-ech-db02) May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Stop resource drbd0:0 (Master xxx-ech-db01) May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: DemoteRsc: xxx-ech-db01 Demote drbd0:0 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StopRsc: xxx-ech-db01 Stop drbd0:0 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Promote drbd0:1 (Slave -> Master xxx-ech-db02) May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Leave resource iPaddr (Started xxx-ech-db02) May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StopRsc: xxx-ech-db01 Stop iPaddr May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StartRsc: xxx-ech-db02 Start iPaddr May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Leave resource fs0 (Started xxx-ech-db02) May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StopRsc: xxx-ech-db01 Stop fs0 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: notice: StartRsc: xxx-ech-db02 Start fs0 May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: info: stage6: Scheduling Node xxx-ech-db01 for shutdown May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense May 18 05:06:02 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition 
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ] May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-warn-24.bz2 May 18 05:06:03 xxx-ECH-DB02 pengine: [20588]: WARN: process_pe_message: Transition 6: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-24.bz2 May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: unpack_graph: Unpacked transition 6: 40 actions in 40 synapses May 18 05:06:03 xxx-ECH-DB02 pengine: [20588]: info: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues. May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 25 fired and confirmed May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 31 fired and confirmed May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 41 fired and confirmed May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 47 fired and confirmed May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 50: notify drbd0:0_pre_notify_demote_0 on xxx-ech-db01 May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 56: notify drbd0:1_pre_notify_promote_0 on xxx-ech-db02 May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 58: notify drbd0:1_pre_notify_demote_0 on xxx-ech-db02 May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 38: stop fs0_stop_0 on xxx-ech-db01 May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=56:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:03 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: 
Performing op=drbd0:1_notify_0 key=58:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:03 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=10, rc=0) complete May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_pre_notify_promote_0 (56) confirmed on xxx-ech-db02 (rc=0) May 18 05:06:03 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=11, rc=0) complete May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 26 fired and confirmed May 18 05:06:03 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_pre_notify_demote_0 (58) confirmed on xxx-ech-db02 (rc=0) May 18 05:06:04 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_pre_notify_demote_0 (50) confirmed on xxx-ech-db01 (rc=0) May 18 05:06:04 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 32 fired and confirmed May 18 05:06:06 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action fs0_stop_0 (38) confirmed on xxx-ech-db01 (rc=0) May 18 05:06:06 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 36: stop iPaddr_stop_0 on xxx-ech-db01 May 18 05:06:07 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action iPaddr_stop_0 (36) confirmed on xxx-ech-db01 (rc=0) May 18 05:06:07 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 42 fired and confirmed May 18 05:06:07 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 29 fired and confirmed May 18 05:06:07 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 5: demote drbd0:0_demote_0 on xxx-ech-db01 May 18 05:06:09 xxx-ECH-DB02 kernel: drbd0: Secondary/Primary --> Secondary/Secondary May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_demote_0 (5) confirmed on xxx-ech-db01 
(rc=0) May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 30 fired and confirmed May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 33 fired and confirmed May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 51: notify drbd0:0_post_notify_demote_0 on xxx-ech-db01 May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 59: notify drbd0:1_post_notify_demote_0 on xxx-ech-db02 May 18 05:06:10 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=59:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:10 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify May 18 05:06:10 xxx-ECH-DB02 crm_master: [30986]: info: Invoked: /usr/sbin/crm_master -l reboot -v 75 May 18 05:06:10 xxx-ECH-DB02 lrmd: [18087]: info: RA output: (drbd0:1:notify:stdout) No set matching id=master-9c26a919-fb58-4b77-8755-aee23da6a63d in status May 18 05:06:10 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=12, rc=0) complete May 18 05:06:10 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_post_notify_demote_0 (59) confirmed on xxx-ech-db02 (rc=0) May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_post_notify_demote_0 (51) confirmed on xxx-ech-db01 (rc=0) May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 34 fired and confirmed May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 19 fired and confirmed May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 49: notify drbd0:0_pre_notify_stop_0 on xxx-ech-db01 May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 54: notify drbd0:1_pre_notify_stop_0 on xxx-ech-db02 May 18 05:06:12 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing 
op=drbd0:1_notify_0 key=54:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c) May 18 05:06:12 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify May 18 05:06:12 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=13, rc=0) complete May 18 05:06:12 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_pre_notify_stop_0 (54) confirmed on xxx-ech-db02 (rc=0) May 18 05:06:13 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_pre_notify_stop_0 (49) confirmed on xxx-ech-db01 (rc=0) May 18 05:06:13 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 20 fired and confirmed May 18 05:06:13 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 17 fired and confirmed May 18 05:06:13 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 6: stop drbd0:0_stop_0 on xxx-ech-db01 May 18 05:06:14 xxx-ECH-DB02 tengine: [20587]: info: te_update_diff: Aborting on transient_attributes deletions May 18 05:06:14 xxx-ECH-DB02 tengine: [20587]: info: update_abort_priority: Abort priority upgraded to 1000000 May 18 05:06:14 xxx-ECH-DB02 tengine: [20587]: info: update_abort_priority: Abort action 0 superceeded by 2 May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: sock was shut down by peer May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: drbd0_receiver [20796]: cstate Connected --> BrokenPipe May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: short read expecting header on sock: r=0 May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: worker terminated May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: meta connection shut down by peer. May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: asender terminated May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: drbd0_receiver [20796]: cstate BrokenPipe --> Unconnected May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: Connection lost. 
May 18 05:06:15 xxx-ECH-DB02 kernel: drbd0: drbd0_receiver [20796]: cstate Unconnected --> WFConnection
May 18 05:06:15 xxx-ECH-DB02 cib: [18086]: info: sync_our_cib: Syncing CIB to xxx-ech-db01
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:0_stop_0 (6) confirmed on xxx-ech-db01 (rc=0)
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 18 fired and confirmed
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 21 fired and confirmed
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 55: notify drbd0:1_post_notify_stop_0 on xxx-ech-db02
May 18 05:06:15 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=55:6:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:15 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:15 xxx-ECH-DB02 crm_master: [31670]: info: Invoked: /usr/sbin/crm_master -l reboot -v 10
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: extract_event: Aborting on transient_attributes changes for 9c26a919-fb58-4b77-8755-aee23da6a63d
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: te_update_diff: Aborting on transient_attributes deletions
May 18 05:06:15 xxx-ECH-DB02 lrmd: [18087]: info: RA output: (drbd0:1:notify:stdout) No set matching id=master-9c26a919-fb58-4b77-8755-aee23da6a63d in status
May 18 05:06:15 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=14, rc=0) complete
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_post_notify_stop_0 (55) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 22 fired and confirmed
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: info: run_graph: ====================================================
May 18 05:06:15 xxx-ECH-DB02 tengine: [20587]: notice: run_graph: Transition 6: (Complete=29, Pending=0, Fired=0, Skipped=7, Incomplete=4)
May 18 05:06:15 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:06:15 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
May 18 05:06:15 xxx-ECH-DB02 pengine: [20588]: WARN: cluster_option: Using deprecated name 'default_resource_stickiness' for cluster option 'default-resource-stickiness'
May 18 05:06:15 xxx-ECH-DB02 pengine: [20588]: info: determine_online_status: Node xxx-ech-db02 is online
May 18 05:06:15 xxx-ECH-DB02 pengine: [20588]: info: determine_online_status: Node xxx-ech-db01 is shutting down
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: unpack_rsc_op: Processing failed op pgsql0_monitor_30000 on xxx-ech-db01: Timed Out
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_rsc_op: Making sure pgsql0 doesn't come up again
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: clone_print: Master/Slave Set: ms-drbd0
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print: drbd0:0 (ocf::heartbeat:drbd): Stopped
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print: drbd0:1 (ocf::heartbeat:drbd): Started xxx-ech-db02
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: group_print: Resource Group: postDb
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print: iPaddr (ocf::heartbeat:IPaddr): Stopped
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print: fs0 (ocf::heartbeat:Filesystem): Stopped
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: native_print: pgsql0 (ocf::heartbeat:pgsql): Stopped
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: native_color: Resource drbd0:0 cannot run anywhere
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: master_color: Promoting drbd0:1 (Slave xxx-ech-db02)
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: master_color: ms-drbd0: Promoted 1 instances of a possible 1 to master
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: master_color: ms-drbd0: Promoted 1 instances of a possible 1 to master
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: native_color: Resource pgsql0 cannot run anywhere
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Promote drbd0:1 (Slave -> Master xxx-ech-db02)
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: NoRoleChange: Promote drbd0:1 (Slave -> Master xxx-ech-db02)
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: StartRsc: xxx-ech-db02 Start iPaddr
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: notice: StartRsc: xxx-ech-db02 Start fs0
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: stage6: Scheduling Node xxx-ech-db01 for shutdown
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: ERROR: unpack_operation: Specifying on_fail=fence and stonith-enabled=false makes no sense
May 18 05:06:16 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:06:16 xxx-ECH-DB02 tengine: [20587]: info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-warn-25.bz2
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: WARN: process_pe_message: Transition 7: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-warn-25.bz2
May 18 05:06:16 xxx-ECH-DB02 pengine: [20588]: info: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
May 18 05:06:16 xxx-ECH-DB02 tengine: [20587]: info: unpack_graph: Unpacked transition 7: 13 actions in 13 synapses
May 18 05:06:16 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 23 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_crm_command: Executing crm-event (40): do_shutdown on xxx-ech-db01
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 53: notify drbd0:1_pre_notify_promote_0 on xxx-ech-db02
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=53:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:17 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=15, rc=0) complete
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_pre_notify_promote_0 (53) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 21 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 8: promote drbd0:1_promote_0 on xxx-ech-db02
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_promote_0 key=8:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:17 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: promote
May 18 05:06:17 xxx-ECH-DB02 kernel: drbd0: Secondary/Unknown --> Primary/Unknown
May 18 05:06:17 xxx-ECH-DB02 lrmd: [18087]: info: RA output: (drbd0:1:promote:stdout)
May 18 05:06:17 xxx-ECH-DB02 drbd[31691]: INFO: drbd0 promote: primary succeeded
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_promote_0 (call=16, rc=0) complete
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_promote_0 (8) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 22 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
May 18 05:06:17 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 54: notify drbd0:1_post_notify_promote_0 on xxx-ech-db02
May 18 05:06:17 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=drbd0:1_notify_0 key=54:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:17 xxx-ECH-DB02 lrmd: [18087]: info: rsc:drbd0:1: notify
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: notice: crmd_client_status_callback: Status update: Client xxx-ech-db01/crmd now has status [offline]
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: erase_node_from_join: Removed dead node xxx-ech-db01 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
May 18 05:06:18 xxx-ECH-DB02 crm_master: [31881]: info: Invoked: /usr/sbin/crm_master -l reboot -v 10
May 18 05:06:18 xxx-ECH-DB02 lrmd: [18087]: info: RA output: (drbd0:1:notify:stdout) No set matching id=master-9c26a919-fb58-4b77-8755-aee23da6a63d in status
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation drbd0:1_notify_0 (call=17, rc=0) complete
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action drbd0:1_post_notify_promote_0 (54) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 26 fired and confirmed
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: te_pseudo_action: Pseudo action 35 fired and confirmed
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 33: start iPaddr_start_0 on xxx-ech-db02
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=iPaddr_start_0 key=33:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:18 xxx-ECH-DB02 lrmd: [18087]: info: rsc:iPaddr: start
May 18 05:06:18 xxx-ECH-DB02 IPaddr[31888]: INFO: Using calculated nic for 192.168.22.110: eth0
May 18 05:06:18 xxx-ECH-DB02 IPaddr[31888]: INFO: Using calculated netmask for 192.168.22.110: 255.255.255.0
May 18 05:06:18 xxx-ECH-DB02 IPaddr[31888]: INFO: eval ifconfig eth0:0 192.168.22.110 netmask 255.255.255.0 broadcast 192.168.22.255
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation iPaddr_start_0 (call=18, rc=0) complete
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action iPaddr_start_0 (33) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:18 xxx-ECH-DB02 tengine: [20587]: info: send_rsc_command: Initiating action 34: start fs0_start_0 on xxx-ech-db02
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: do_lrm_rsc_op: Performing op=fs0_start_0 key=34:7:0:69c1e75a-51c8-4a00-a8c4-26ad8b6a447c)
May 18 05:06:18 xxx-ECH-DB02 cib: [18086]: info: cib_process_shutdown_req: Shutdown REQ from xxx-ech-db01
May 18 05:06:18 xxx-ECH-DB02 lrmd: [18087]: info: rsc:fs0: start
May 18 05:06:18 xxx-ECH-DB02 Filesystem[31992]: INFO: Running start for /dev/drbd0 on /data
May 18 05:06:18 xxx-ECH-DB02 kernel: kjournald starting. Commit interval 5 seconds
May 18 05:06:18 xxx-ECH-DB02 kernel: EXT3-fs warning: checktime reached, running e2fsck is recommended
May 18 05:06:18 xxx-ECH-DB02 kernel: EXT3 FS on drbd0, internal journal
May 18 05:06:18 xxx-ECH-DB02 kernel: EXT3-fs: mounted filesystem with ordered data mode.
May 18 05:06:18 xxx-ECH-DB02 crmd: [18090]: info: process_lrm_event: LRM operation fs0_start_0 (call=19, rc=0) complete
May 18 05:06:19 xxx-ECH-DB02 tengine: [20587]: info: match_graph_event: Action fs0_start_0 (34) confirmed on xxx-ech-db02 (rc=0)
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: sync_our_cib: Syncing CIB to xxx-ech-db01
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: cib_client_status_callback: Status update: Client xxx-ech-db01/cib now has status [leave]
May 18 05:06:19 xxx-ECH-DB02 ccm: [18085]: info: Break tie for 2 nodes cluster
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: mem_handle_event: no mbr_track info
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: mem_handle_event: no mbr_track info
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: cib_ccm_msg_callback: LOST: xxx-ech-db01
May 18 05:06:19 xxx-ECH-DB02 cib: [18086]: info: cib_ccm_msg_callback: PEER: xxx-ech-db02
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: erase_node_from_join: Removed dead node xxx-ech-db01 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=0
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
May 18 05:06:19 xxx-ECH-DB02 tengine: [20587]: info: run_graph: Transition 7: (Complete=13, Pending=0, Fired=0, Skipped=0, Incomplete=0)
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: ccm_event_detail: CURRENT: xxx-ech-db02 [nodeid=1, born=3]
May 18 05:06:19 xxx-ECH-DB02 tengine: [20587]: info: notify_crmd: Transition 7 status: te_complete - <null>
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: ccm_event_detail: LOST: xxx-ech-db01 [nodeid=0, born=2]
May 18 05:06:19 xxx-ECH-DB02 crmd: [18090]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
May 18 05:06:50 xxx-ECH-DB02 heartbeat: [17481]: WARN: node xxx-ech-db01: is dead
May 18 05:06:50 xxx-ECH-DB02 heartbeat: [17481]: info: Link xxx-ech-db01:eth2 dead.
May 18 05:06:50 xxx-ECH-DB02 crmd: [18090]: notice: crmd_ha_status_callback: Status update: Node xxx-ech-db01 now has status [dead]
End of xxx_ech_db02 log

That is all. Thank you in advance.

> Date: Mon, 23 May 2011 09:32:12 +0900
> From: iwasa****@3ware*****
> To: linux****@lists*****
> Subject: Re: [Linux-ha-jp] About postgres startup at heartbeat failover
>
> This is Iwasaki from Thirdware.
>
> > We are building a database cluster server using heartbeat, DRBD,
> > and postgresql.
> >
> > It is configured so that when a failure occurs on the xxx_ech_db01
> > server (master), it fails over to the xxx_ech_db02 server (slave).
> >
> > When a failure occurred on the xxx_ech_db01 server (processing load
> > built up in postgres and the postgres service terminated abnormally),
> > failover to the xxx_ech_db02 server did take place (it had become
> > master and the filesystem was mounted), but the postgres service had
> > not been started.
>
> In this case, Heartbeat appears to have carried out the failover correctly.
> That being so, the reason PostgreSQL does not start is most likely a
> problem occurring during PostgreSQL's own startup process. How about
> checking the PostgreSQL log to see whether any errors were reported or
> whether some long-running operation was in progress?
>
> It is also not inconceivable that the data was corrupted by the abnormal
> termination and PostgreSQL cannot start because of that. When logical
> corruption occurs, DRBD faithfully replicates the corrupted data as well,
> so PostgreSQL may fail to start after failover because of data errors.
>
> _______________________________________________
> Linux-ha-japan mailing list
> Linux****@lists*****
> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan

-------------- next part --------------
An HTML attachment was saved...
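For anyone reading the messages log pasted in this post, a small filter can make the WARN and ERROR entries easier to find. The following is only a rough sketch, not part of the original exchange: it assumes the log has been pasted as one flattened string, that every entry starts with a syslog timestamp such as "May 18 05:06:16" followed by a hostname, and that the severity appears right after the "[pid]:" field. The function names `split_entries` and `problems` are made up for this illustration, and Python 3.7 or later is needed for splitting on a zero-width match.

```python
import re

# Zero-width match at the start of each syslog entry, e.g.
# "May 18 05:06:16 xxx-ECH-DB02 ". Assumes "Mon DD HH:MM:SS host " framing.
ENTRY_START = re.compile(r"(?=[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ )")


def split_entries(text):
    """Split a flattened syslog dump into one string per entry."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def problems(text, levels=("WARN", "ERROR")):
    """Return the entries whose severity field is one of `levels`.

    Only the field directly after "[pid]:" is checked, so an info line
    that merely mentions "ERRORs" in its message is not selected.
    """
    severity = re.compile(r"\]: (?:%s): " % "|".join(levels))
    return [entry for entry in split_entries(text) if severity.search(entry)]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python filter_log.py messages.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        for entry in problems(handle.read()):
            print(entry)
```

Run against the xxx_ech_db02 excerpt, a filter like this would pick out the pengine and heartbeat entries logged at WARN or ERROR level, which may be a useful starting point alongside the PostgreSQL log.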