Data corruption after migration

Data corruption after migration

Bugzilla from pgadmin@pse-consulting.de
Recently, I encountered file system data corruption on several systems
after a migration (ext4, xfs, zfs). So far, I haven't been able to nail
down the cause for it.

All VMs affected run Debian Jessie with 3.16 kernel. Possibly newer 3.16
kernels are more likely to suffer corruption.

This happened on a Xen 4.1 cluster (yes, really old) using a SAN storage
system, but also on a Xen 4.8 cluster with DRBD mirroring. All systems
have been running for well over a year; only recently did these filesystem
corruptions start to happen. Data blocks seem to get randomly garbled.

Migrating Debian Stretch or Windows VMs hasn't shown any anomalies so far.

Can anyone shed some light on this?

Regards,

Andreas



_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users
Data corruption with xl migrate

Bugzilla from pgadmin@pse-consulting.de
Still experiencing data corruption when migrating 3.16 VMs from one host
to another...
Seems independent of Xen version or storage backend.

Regards
Andreas

On 13.12.17 at 09:12, Andreas Pflug wrote:

> Recently, I encountered file system data corruption on several systems
> after a migration (ext4, xfs, zfs). So far, I haven't been able to nail
> down the cause for it.
>
> All VMs affected run Debian Jessie with 3.16 kernel. Possibly newer 3.16
> kernels are more likely to suffer corruption.
>
> This happened on a Xen 4.1 cluster (yes, really old) using a SAN storage
> system, but also on a Xen 4.8 cluster with DRBD mirroring. All systems
> are working for >>1 year now, only recently those filesystem corruption
> started to happen. Data blocks seem to get randomly garbled.
>
> Migration of Debian Stretch or Windows VMs didn't show any anomalies so far.
>
> Can anyone shed some light on this?
>
> Regards,
>
> Andreas

Re: Data corruption with xl migrate

Valentin Vidic
On Sun, Dec 17, 2017 at 04:53:24PM +0100, Andreas Pflug wrote:
> Still experiencing data corruption when migrating 3.16 VMs from one host
> to another...
> Seems independent of Xen version or storage backend.

Is it possible to reproduce this?  I had a few random ext3 corruptions
that fsck could not fix at all. This is a Xen storage cluster with live
migration enabled, and I would like to figure out when and why this is
happening.
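
One way to check for garbled data around a migration is to checksum a set
of files inside the guest before and after the move and compare the
results. The following is only a minimal sketch under that assumption; the
function names build_manifest and changed_files are made up for
illustration and nothing here comes from Xen or Debian tooling.

```python
import hashlib
import os


def build_manifest(root):
    """Map each regular file under root (relative path) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and similar special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def changed_files(before, after):
    """Return sorted paths whose digest differs or that exist in only one manifest."""
    return sorted(
        path
        for path in set(before) | set(after)
        if before.get(path) != after.get(path)
    )
```

The idea would be to run build_manifest on a directory of static test data
before the migration, run it again afterwards, and pass both results to
changed_files. Reads served from the guest's page cache may hide on-disk
damage, so the second pass is probably more meaningful after dropping
caches or remounting the filesystem.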

--
Valentin


Re: Data corruption with xl migrate

George Dunlap
In reply to this post by Bugzilla from pgadmin@pse-consulting.de
On Sun, Dec 17, 2017 at 3:53 PM, Andreas Pflug
<[hidden email]> wrote:

> Still experiencing data corruption when migrating 3.16 VMs from one host
> to another...
> Seems independent of Xen version or storage backend.
>
> Regards
> Andreas
>
> On 13.12.17 at 09:12, Andreas Pflug wrote:
>> Recently, I encountered file system data corruption on several systems
>> after a migration (ext4, xfs, zfs). So far, I haven't been able to nail
>> down the cause for it.
>>
>> All VMs affected run Debian Jessie with 3.16 kernel. Possibly newer 3.16
>> kernels are more likely to suffer corruption.
>>
>> This happened on a Xen 4.1 cluster (yes, really old) using a SAN storage
>> system, but also on a Xen 4.8 cluster with DRBD mirroring. All systems
>> are working for >>1 year now, only recently those filesystem corruption
>> started to happen. Data blocks seem to get randomly garbled.
>>
>> Migration of Debian Stretch or Windows VMs didn't show any anomalies so far.
>>
>> Can anyone shed some light on this?

Thanks for the report -- cc'ing a few random people who know more
about the block layer / Debian kernels.

It seems the Debian Jessie 3.16 kernel is experiencing corruption on
migration, but not the Debian Stretch kernel or Windows VMs.  That
sounds like a bug in the Debian kernel; any ideas about which patch may
be worth backporting, or any steps to help further investigate the
source of the problems?

Thanks,
 -George


Re: Data corruption with xl migrate

Olivier LAMBERT
Hi!

Xen Orchestra team here. Our virtual appliance (XOA) runs on Jessie, and a few customers have experienced this problem on XenServer too. I'm trying to investigate further to find out whether the affected customers did a migration before this happened (XOA runs on-premises, so we can't monitor it directly).

Keep us posted for any bug report created on Debian side.

Thanks!

Olivier.



Re: Data corruption with xl migrate

Bugzilla from pgadmin@pse-consulting.de
In reply to this post by George Dunlap
On 19.12.17 at 11:33, George Dunlap wrote:

> On Sun, Dec 17, 2017 at 3:53 PM, Andreas Pflug
> <[hidden email]> wrote:
>> Still experiencing data corruption when migrating 3.16 VMs from one host
>> to another...
>> Seems independent of Xen version or storage backend.
>>
>> Regards
>> Andreas
>>
>> On 13.12.17 at 09:12, Andreas Pflug wrote:
>>> Recently, I encountered file system data corruption on several systems
>>> after a migration (ext4, xfs, zfs). So far, I haven't been able to nail
>>> down the cause for it.
>>>
>>> All VMs affected run Debian Jessie with 3.16 kernel. Possibly newer 3.16
>>> kernels are more likely to suffer corruption.
>>>
>>> This happened on a Xen 4.1 cluster (yes, really old) using a SAN storage
>>> system, but also on a Xen 4.8 cluster with DRBD mirroring. All systems
>>> are working for >>1 year now, only recently those filesystem corruption
>>> started to happen. Data blocks seem to get randomly garbled.
>>>
>>> Migration of Debian Stretch or Windows VMs didn't show any anomalies so far.
>>>
>>> Can anyone shed some light on this?
> Thanks for the report -- cc'ing a few random people who know more
> about the block layer / Debian kernels.
>
> It seems Debian Jessie 3.16 kernel was experiencing corruption on
> migration, but not the Debian Stretch kernel or Windows VMs.  That
> sounds like a bug in the Debian kernel; any ideas about what patch may
> be worth backporting / any steps to help further investigate the
> source of the problems?

The latest kernel that hasn't shown these problems so far seems to be
3.16.36.
Actually, yesterday I encountered a filesystem inconsistency (remount-ro)
after shutting down a 3.16.39 VM and starting it again on the other machine
(DRBD synced, no anomalies seen on the DRBD layer). So apparently it's not
only migration-related.
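
To sort VMs by that observation, a small helper could flag 3.16 kernels
newer than 3.16.36. This is only a sketch: the names parse_version,
LAST_GOOD and is_suspect are hypothetical, the input is assumed to be the
upstream kernel version (for example "3.16.39") rather than the Debian ABI
string, and only 3.16.36 (apparently fine) and 3.16.39 (affected) have
actually been observed, so anything in between is a guess.

```python
def parse_version(text):
    """Turn an upstream version such as '3.16.39' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Assumption: 3.16.36 is the last 3.16 release that has not shown corruption so far.
LAST_GOOD = parse_version("3.16.36")


def is_suspect(upstream_version):
    """Return True for 3.16.x kernels newer than LAST_GOOD."""
    version = parse_version(upstream_version)
    return version[:2] == (3, 16) and version > LAST_GOOD
```

With this, is_suspect("3.16.39") is True, while "3.16.36" and kernels
outside the 3.16 series are not flagged.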

All kinds of FS are affected: ext4, xfs, zfs.

I filed the bug at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884622

Regards
Andreas
