Peter Luciak
2010-02-12 13:41:10 UTC
Hi,
I decided to switch the cluster stack from Heartbeat to Corosync on a
running cluster -- apart from some small problems, this went smoothly.
However, now in the MC I see one node online and the second node as
offline ("Waiting for cluster status"). Apparently it cannot detect that
Pacemaker is running there, as it says so in the tooltip ;) (even after
reboots, etc.). The cluster itself is running fine and I'm able to
migrate/stop/start all resources correctly.
This is the exception it throws:
APPERROR: AppError.Text
release: 0.5.2
getHeartbeatLibPath() called to soon: unknown arch
java.lang.Throwable
at drbd.utilities.Tools.appError(Tools.java:603)
at drbd.utilities.Tools.appError(Tools.java:545)
at drbd.data.Host.getHeartbeatLibPath(Host.java:902)
at drbd.data.DrbdXML.parseSection(DrbdXML.java:575)
at drbd.data.DrbdXML.<init>(DrbdXML.java:196)
at drbd.gui.ClusterBrowser$8.output(ClusterBrowser.java:847)
at drbd.utilities.SSH$ExecCommandThread.execOneCommand(SSH.java:537)
at drbd.utilities.SSH$ExecCommandThread.exec(SSH.java:670)
at drbd.utilities.SSH$ExecCommandThread.run(SSH.java:634)
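My guess, just from reading the trace (I don't know the MC internals),
is that DrbdXML.parseSection() asks the Host object for the Heartbeat
lib path before the host's architecture has been detected over SSH, so
the call lands in the "unknown arch" branch. Roughly this kind of
pattern -- the names below are made up for illustration, this is not
the actual drbd-mc source:

    class HostSketch {
        // filled in asynchronously once the SSH arch detection has run
        private volatile String arch;

        String getHeartbeatLibPath() {
            if (arch == null) {
                // what 0.5.2 seems to do: report an application error
                throw new IllegalStateException(
                    "getHeartbeatLibPath() called too soon: unknown arch");
            }
            return "x86_64".equals(arch) ? "/usr/lib64/heartbeat"
                                         : "/usr/lib/heartbeat";
        }
    }

So it looks more like a startup-ordering issue on that node than
anything in the DRBD configuration itself.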
I compared the output of `drbdadm dump-xml` and it is identical on both nodes:
<config file="/etc/drbd.conf">
<common protocol="C">
<section name="net">
<option name="cram-hmac-alg" value="sha1"/>
<option name="shared-secret" value="iblcls"/>
<option name="after-sb-0pri" value="discard-younger-primary"/>
<option name="after-sb-1pri" value="consensus"/>
<option name="after-sb-2pri" value="disconnect"/>
<option name="rr-conflict" value="disconnect"/>
</section>
<section name="disk">
<option name="on-io-error" value="detach"/>
</section>
<section name="syncer">
<option name="rate" value="80M"/>
</section>
<section name="startup">
<option name="wfc-timeout" value="0"/>
<option name="degr-wfc-timeout" value="90"/>
<option name="outdated-wfc-timeout" value="90"/>
</section>
<section name="handlers">
<option name="pri-on-incon-degr"
value="/usr/lib/drbd/notify-pri-on-incon-degr.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b >
/proc/sysrq-trigger ; reboot -f"/>
<option name="pri-lost-after-sb"
value="/usr/lib/drbd/notify-pri-lost-after-sb.sh;
/usr/lib/drbd/notify-emergency-reboot.sh; echo b >
/proc/sysrq-trigger ; reboot -f"/>
<option name="local-io-error"
value="/usr/lib/drbd/notify-io-error.sh;
/usr/lib/drbd/notify-emergency-shutdown.sh; echo o >
/proc/sysrq-trigger ; halt -f"/>
</section>
</common>
<resource name="app" protocol="C">
<host name="iblcls1">
<device minor="0">/dev/drbd0</device>
<disk>/dev/md2</disk>
<address family="ipv4" port="7788">10.10.10.1</address>
<meta-disk>internal</meta-disk>
</host>
<host name="iblcls2">
<device minor="0">/dev/drbd0</device>
<disk>/dev/md2</disk>
<address family="ipv4" port="7788">10.10.10.2</address>
<meta-disk>internal</meta-disk>
</host>
</resource>
<resource name="data" protocol="C">
<host name="iblcls1">
<device minor="1">/dev/drbd1</device>
<disk>/dev/md3</disk>
<address family="ipv4" port="7789">10.10.10.1</address>
<meta-disk>internal</meta-disk>
</host>
<host name="iblcls2">
<device minor="1">/dev/drbd1</device>
<disk>/dev/md3</disk>
<address family="ipv4" port="7789">10.10.10.2</address>
<meta-disk>internal</meta-disk>
</host>
</resource>
</config>
I'm also attaching the output of `cibadmin -Ql` from the offline node.
--
Peter LUCIAK (Peter.Luciak at iblsoft.com)
IBL Software Engineering, http://www.iblsoft.com/
Mierová 103, 82105 Bratislava, Slovakia
Phone: +421-2-32662111, Fax: +421-2-32662110
Direct: +421-2-32662175
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cib2.xml
Type: text/xml
Size: 48158 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-mc/attachments/20100212/b93c918b/attachment-0001.bin>