Discussion:
[drbd-mc] Some VMs ignored by drbd-mc
Whit Blauvelt
2011-09-01 15:08:33 UTC
Hi,

I like drbd-mc - don't use all its capabilities, but for setting up DRBD
between KVM VMs, each on its own DRBD-on-LVM resource, it's handy. So I've
got about 18 of these VMs over two servers, split between primary and
secondary roles.

There are two that drbd-mc simply fails to show, so it is no help in
setting up and maintaining the DRBD aspect for them. The names are
"geos3" and "geos4", and both are running:

# virsh list --all
 Id Name                 State
----------------------------------
  5 geos4                running
 24 geos2                running
 25 geos3                running
...

There's also a "geos2" VM, which drbd-mc does recognize. Could it be so
simple as drbd-mc not liking a name that's a duplicate except for a digit
change? I suspect that because there's also a "geos3" which it's not
finding, although that one's currently not running.

The xml files are all in /etc/libvirt/qemu, and there's no significant
difference in the definitions:

# diff geos4.xml geos2.xml
2,3c2,3
< <name>geos4</name>
< <uuid>7b62c4b9-82ed-6d06-e303-e2eff79b24d5</uuid>
---
> <name>geos2</name>
> <uuid>e4a496b3-5f7d-b27a-6f86-113ec34327b1</uuid>
24c24
< <source dev='/dev/womb/geos4'/>
---
> <source dev='/dev/womb/geos2'/>
38c38
< <mac address='00:16:36:10:a5:ab'/>
---
> <mac address='00:16:36:22:24:a0'/>

So what's drbd-mc's blind spot from here? Any suggestions on how to get it
to recognize the reality?

Thanks,
Whit
Whit Blauvelt
2011-09-01 15:29:01 UTC
Note of clarification: drbd-mc does recognize the storage for the "missing"
VMs, so the DRBD aspect can be set up. It's just under the VMs menu that a
couple of the running VMs go missing - perhaps because it can't see more
than one VM with a name that matches except for a different digit on the end
... or something strange.

- Whit
Whit Blauvelt
2011-09-01 15:47:36 UTC
After confirming that 0.9.4 and 0.9.5 had the same problem with showing all
the VMs as 0.9.7, I was logged into a different VM by ssh (called "geos_x")
- not on a DRBD resource in this case, but drbd-mc had recognized it as
existing. In the ssh session I did a shutdown. I also had a console opened
into it via drbd-mc at the time, and the VMs list open in drbd-mc.

On shutting down geos_x, drbd-mc blinked a number of times, and finally
stabilized with the "missing" VMs showing in the VMs list.

Okay, what's that about?

Whit
Rasto Levrinc
2011-09-05 11:46:54 UTC
Post by Whit Blauvelt
After confirming that 0.9.4 and 0.9.5 had the same problem with showing all
the VMs as 0.9.7, I was logged into a different VM by ssh (called "geos_x")
- not on a DRBD resource in this case, but drbd-mc had recognized it as
existing. In the ssh session I did a shutdown. I also had a console opened
into it via drbd-mc at the time, and the VMs list open in drbd-mc.
On shutting down geos_x, drbd-mc blinked a number of times, and finally
stabilized with the "missing" VMs showing in the VMs list.
Okay, what's that about?
Hi,

First, there can be some mix-up with UUIDs, such as the same UUID being
used for different VMs across the cluster.
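
One way to rule out such a UUID mix-up is sketched below. This is only an illustration, not part of DRBD MC: the function names and the node labels are made up, and it assumes the contents of the /etc/libvirt/qemu/*.xml files from each node have already been read into strings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def name_and_uuid(xml_text):
    """Return (name, uuid) from one libvirt domain XML definition."""
    root = ET.fromstring(xml_text)
    return root.findtext("name"), root.findtext("uuid")


def shared_uuids(definitions):
    """Find UUIDs that are used by more than one distinct VM name.

    definitions: iterable of (node_label, xml_text) pairs, where
    node_label is any string you choose to identify a cluster node.
    Returns {uuid: sorted list of (node_label, vm_name)}.
    """
    seen = defaultdict(set)
    for node, xml_text in definitions:
        name, uuid = name_and_uuid(xml_text)
        if uuid:
            seen[uuid.strip().lower()].add((node, name))
    # The same VM defined on both nodes is fine; only flag a UUID
    # when it belongs to different VM names.
    return {
        uuid: sorted(entries)
        for uuid, entries in seen.items()
        if len({name for _, name in entries}) > 1
    }
```

An empty result means no UUID is shared between differently named VMs in the definitions that were passed in.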

DRBD MC looks for the /etc/libvirt/qemu/*.xml files to get a list of VMs.
If you use sudo, make sure the user can read this directory.

Then it calls e.g. "virsh dominfo geos3" and "virsh dumpxml geos3" to
see if they are defined.
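
The same two sources can be compared by hand. The sketch below is not how DRBD MC is implemented; it only cross-checks the names found in the XML definitions against captured "virsh list --all" output and against whatever names the VMs menu shows. All function names are hypothetical, and the parsing assumes the usual three-column Id/Name/State layout with no whitespace in VM names.

```python
import xml.etree.ElementTree as ET


def names_from_definitions(xml_texts):
    """Collect VM names from libvirt domain XML strings."""
    return {ET.fromstring(text).findtext("name") for text in xml_texts}


def names_from_virsh_list(output):
    """Collect VM names from captured `virsh list --all` output."""
    names = set()
    for line in output.splitlines():
        parts = line.split()
        # Skip the header, the dashed separator, and anything that
        # does not have at least Id, Name and State columns.
        if len(parts) < 3 or parts[0] == "Id":
            continue
        names.add(parts[1])
    return names


def not_shown(known_names, gui_names):
    """Names known to libvirt but absent from the GUI's VM list."""
    return sorted(set(known_names) - set(gui_names))
```

For example, passing the virsh names together with the names typed in from the VMs menu to not_shown() lists the VMs the GUI is missing.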

You can also try executing
"/usr/local/bin/drbd-gui-helper-0.9.7 get-vm-info" on all cluster nodes
to see what is going on.

Rasto Levrinc
Whit Blauvelt
2011-09-05 12:47:01 UTC
Post by Rasto Levrinc
first there can be some mix-up with uuids, like having the same uuid
for different VMs across the cluster.
Just checked. Not the case.
Post by Rasto Levrinc
DRBD MC looks for the /etc/libvirt/qemu/*.xml to get a list of VMs. If
you use sudo make sure the user can read this directory.
I don't use sudo. I do everything as root.
Post by Rasto Levrinc
Then it calls e.g. "virsh dominfo geos3" and "virsh dumpxml geos3" to
see if they are defined.
Of course they're defined. They're running! At this point drbd-mc has, as I
mentioned above, rediscovered them. But they were running the whole time,
and fully available to normal operations through virsh - as they still are.
Post by Rasto Levrinc
you can also try to execute "/usr/local/bin/drbd-gui-helper-0.9.7
get-vm-info" on all cluster nodes to see what is going on.
Nice script. But it's just calling virsh, isn't it? So since virsh has had
no problem, where drbd-mc has, how will that diagnose drbd-mc's problem?

I do like drbd-mc, a lot. Just wanted you to know about the bug.

Best,
Whit
Rasto Levrinc
2011-09-05 14:31:16 UTC
Post by Whit Blauvelt
Nice script. But it's just calling virsh, isn't it? So since virsh has had
no problem, where drbd-mc has, how will that diagnose drbd-mc's problem?
The bug could be there, but actually the bug was in some misguided
optimization and should be fixed now. Try

dmctest-0.9.8.dev.2.jar

http://oss.linbit.com/drbd-mc/

Rasto Levrinc
