Xen shutdown fails to release DRBD device

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Xen shutdown fails to release DRBD device

Valentin Vidic
Hi,

Any idea why would Xen occasionally fail to release the block
device on shutdown?

Aug 21 22:56:00 node1 root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
Aug 21 22:56:00 node1 kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
Aug 21 22:56:00 node1 kernel: [ 2222.278304] block drbd6:   state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
Aug 21 22:56:00 node1 kernel: [ 2222.278340] block drbd6:  wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
Aug 21 22:56:00 node1 root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
Aug 21 22:56:00 node1 root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.

It seems like /etc/xen/scripts/block-drbd is called a bit too early just
sometimes. This is Xen 4.8.4 with DRBD 8.9.10 from Debian stretch.

--
Valentin

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users
Reply | Threaded
Open this post in threaded view
|

Re: Xen shutdown fails to release DRBD device

Roger Pau Monné-3
On Wed, Aug 22, 2018 at 03:24:24PM +0200, Valentin Vidic wrote:

> Hi,
>
> Any idea why would Xen occasionally fail to release the block
> device on shutdown?
>
> Aug 21 22:56:00 node1 root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> Aug 21 22:56:00 node1 kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> Aug 21 22:56:00 node1 kernel: [ 2222.278304] block drbd6:   state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> Aug 21 22:56:00 node1 kernel: [ 2222.278340] block drbd6:  wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> Aug 21 22:56:00 node1 root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> Aug 21 22:56:00 node1 root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
>
> It seems like /etc/xen/scripts/block-drbd is called a bit too early just
> sometimes. This is Xen 4.8.4 with DRBD 8.9.10 from Debian stretch.

The hotplug scripts are called when the backend releases the devices
(ie: switches to state 6).

This seems to be an issue with the drbd state itself, and some kind of
primary/secondary change? (according to the log above).

The drdb hotplug script is not maintained by the Xen project, so I
would suggest you to contact the drbd project [0] in order to try to
figure out what's wrong.

Thanks, Roger.

[0]  http://lists.linbit.com/

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users
Reply | Threaded
Open this post in threaded view
|

Re: Xen shutdown fails to release DRBD device

Valentin Vidic
On Wed, Aug 22, 2018 at 03:34:20PM +0200, Roger Pau Monné wrote:

> On Wed, Aug 22, 2018 at 03:24:24PM +0200, Valentin Vidic wrote:
> > Hi,
> >
> > Any idea why would Xen occasionally fail to release the block
> > device on shutdown?
> >
> > Aug 21 22:56:00 node1 root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > Aug 21 22:56:00 node1 kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > Aug 21 22:56:00 node1 kernel: [ 2222.278304] block drbd6:   state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > Aug 21 22:56:00 node1 kernel: [ 2222.278340] block drbd6:  wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > Aug 21 22:56:00 node1 root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > Aug 21 22:56:00 node1 root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
> >
> > It seems like /etc/xen/scripts/block-drbd is called a bit too early just
> > sometimes. This is Xen 4.8.4 with DRBD 8.9.10 from Debian stretch.
>
> The hotplug scripts are called when the backend releases the devices
> (ie: switches to state 6).
>
> This seems to be an issue with the drbd state itself, and some kind of
> primary/secondary change? (according to the log above).
>
> The drdb hotplug script is not maintained by the Xen project, so I
> would suggest you to contact the drbd project [0] in order to try to
> figure out what's wrong.

Right, block-drbd tries to move the device to Secondary state on
shutdown but this fails because something in Xen is still holding the
device open. Not sure if anything can be done here on the DRBD side
except retrying. This only happens when shutting down several VMs at
once, so it seems like some kind of timing problem in the backend
release?

--
Valentin

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users
Reply | Threaded
Open this post in threaded view
|

Re: Xen shutdown fails to release DRBD device

Valentin Vidic
On Wed, Aug 22, 2018 at 03:42:51PM +0200, Valentin Vidic wrote:
> Right, block-drbd tries to move the device to Secondary state on
> shutdown but this fails because something in Xen is still holding the
> device open. Not sure if anything can be done here on the DRBD side
> except retrying. This only happens when shutting down several VMs at
> once, so it seems like some kind of timing problem in the backend
> release?

DRBD end for this seems rather simple, it only checks if the
device->open_cnt is zero. So it would seem like drbd_release
was not called yet when the block-drbd script is run?


static enum drbd_state_rv
is_valid_state(struct drbd_device *device, union drbd_state ns)
{
...
        else if (ns.role == R_SECONDARY && device->open_cnt)
                rv = SS_DEVICE_IN_USE;
...
}

static void drbd_release(struct gendisk *gd, fmode_t mode)
{
        struct drbd_device *device = gd->private_data;
        mutex_lock(&drbd_main_mutex);
        device->open_cnt--;
        mutex_unlock(&drbd_main_mutex);
}

--
Valentin

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users
Reply | Threaded
Open this post in threaded view
|

Re: Xen shutdown fails to release DRBD device

Valentin Vidic
On Wed, Aug 22, 2018 at 03:55:46PM +0200, Valentin Vidic wrote:

> DRBD end for this seems rather simple, it only checks if the
> device->open_cnt is zero. So it would seem like drbd_release
> was not called yet when the block-drbd script is run?
>
>
> static enum drbd_state_rv
> is_valid_state(struct drbd_device *device, union drbd_state ns)
> {
> ...
>         else if (ns.role == R_SECONDARY && device->open_cnt)
>                 rv = SS_DEVICE_IN_USE;
> ...
> }
>
> static void drbd_release(struct gendisk *gd, fmode_t mode)
> {
>         struct drbd_device *device = gd->private_data;
>         mutex_lock(&drbd_main_mutex);
>         device->open_cnt--;
>         mutex_unlock(&drbd_main_mutex);
> }

On the Xen side it seems that XenbusStateClosed event is sent
to xenbus to run the block-drbd script.  However the call to
xen_blkif_disconnect in the line before that can fail with -EBUSY
if there is still some in-flight IO for the device.  Could it be
that a lot of IO during shutdown is holding the DRBD device open
while the block-drbd script has already started running?

/*
 * Callback received when the frontend's state changes.
 */
static void frontend_changed(struct xenbus_device *dev,
                             enum xenbus_state frontend_state)
{
...
        case XenbusStateClosed:
                xen_blkif_disconnect(be->blkif);
                xenbus_switch_state(dev, XenbusStateClosed);
                if (xenbus_dev_is_online(dev))
                        break;
                /* fall through if not online */
...
}

Maybe the XenbusStateClosed event should only be send when the
device is closed in xen_blkif_free or some other place?

--
Valentin

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users
Reply | Threaded
Open this post in threaded view
|

Re: Xen shutdown fails to release DRBD device

Roger Pau Monné-3
On Wed, Aug 22, 2018 at 05:39:28PM +0200, Valentin Vidic wrote:

> On Wed, Aug 22, 2018 at 03:55:46PM +0200, Valentin Vidic wrote:
> > DRBD end for this seems rather simple, it only checks if the
> > device->open_cnt is zero. So it would seem like drbd_release
> > was not called yet when the block-drbd script is run?
> >
> >
> > static enum drbd_state_rv
> > is_valid_state(struct drbd_device *device, union drbd_state ns)
> > {
> > ...
> >         else if (ns.role == R_SECONDARY && device->open_cnt)
> >                 rv = SS_DEVICE_IN_USE;
> > ...
> > }
> >
> > static void drbd_release(struct gendisk *gd, fmode_t mode)
> > {
> >         struct drbd_device *device = gd->private_data;
> >         mutex_lock(&drbd_main_mutex);
> >         device->open_cnt--;
> >         mutex_unlock(&drbd_main_mutex);
> > }
>
> On the Xen side it seems that XenbusStateClosed event is sent
> to xenbus to run the block-drbd script.  However the call to
> xen_blkif_disconnect in the line before that can fail with -EBUSY
> if there is still some in-flight IO for the device.  Could it be
> that a lot of IO during shutdown is holding the DRBD device open
> while the block-drbd script has already started running?
>
> /*
>  * Callback received when the frontend's state changes.
>  */
> static void frontend_changed(struct xenbus_device *dev,
>                              enum xenbus_state frontend_state)
> {
> ...
>         case XenbusStateClosed:
>                 xen_blkif_disconnect(be->blkif);
>                 xenbus_switch_state(dev, XenbusStateClosed);
>                 if (xenbus_dev_is_online(dev))
>                         break;
>                 /* fall through if not online */
> ...
> }
>
> Maybe the XenbusStateClosed event should only be send when the
> device is closed in xen_blkif_free or some other place?

Can you add some debug prints to check if xen_blkif_disconnect is
indeed returning EBUSY (or some error) and that's preventing the
device from closing correctly?

Thanks, Roger.

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users
Reply | Threaded
Open this post in threaded view
|

Re: Xen shutdown fails to release DRBD device

Valentin Vidic
On Wed, Aug 22, 2018 at 05:51:54PM +0200, Roger Pau Monné wrote:
> Can you add some debug prints to check if xen_blkif_disconnect is
> indeed returning EBUSY (or some error) and that's preventing the
> device from closing correctly?

These are production nodes, but I'll try that on some test machines...

--
Valentin

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users
Reply | Threaded
Open this post in threaded view
|

Re: Xen shutdown fails to release DRBD device

Valentin Vidic
On Wed, Aug 22, 2018 at 06:23:01PM +0200, Valentin Vidic wrote:
> On Wed, Aug 22, 2018 at 05:51:54PM +0200, Roger Pau Monné wrote:
> > Can you add some debug prints to check if xen_blkif_disconnect is
> > indeed returning EBUSY (or some error) and that's preventing the
> > device from closing correctly?
>
> These are production nodes, but I'll try that on some test machines...

Managed to reproduce this and xen_blkif_disconnect is always returning 0
like you expected.  So this is some other issue, and from what I can tell
blkdev_put of the underlying drbd device gets called some time after
xenbus_switch_state(dev, XenbusStateClosed).  Any idea how to make sure
it happens in the opposite order: blkdev_put before XenbusStateClosed?

--
Valentin

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users
Reply | Threaded
Open this post in threaded view
|

Re: Xen shutdown fails to release DRBD device

Valentin Vidic
On Fri, Aug 24, 2018 at 06:22:32PM +0200, Valentin Vidic wrote:
> Managed to reproduce this and xen_blkif_disconnect is always returning 0
> like you expected.  So this is some other issue, and from what I can tell
> blkdev_put of the underlying drbd device gets called some time after
> xenbus_switch_state(dev, XenbusStateClosed).  Any idea how to make sure
> it happens in the opposite order: blkdev_put before XenbusStateClosed?

Moving the XenbusStateClosed call to xen_blkif_free seems to help,
let me know if you think there is better solution for this?

static void xen_blkif_free(struct xen_blkif *blkif)
{

        WARN_ON(xen_blkif_disconnect(blkif));
        xen_vbd_free(&blkif->vbd);
        xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
        kfree(blkif->be->mode);
        kfree(blkif->be);

        /* Make sure everything is drained before shutting down */
        kmem_cache_free(xen_blkif_cachep, blkif);
}

--
Valentin

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users