2 recently updated Centos6 Xen 4.6.3-5 server crashes

Brandon Shoemaker
Hello list,

I upgraded our CentOS 6 Xen servers to 4.6.3-5 about three weeks ago, and this week two of them crashed with similar symptoms.  Nothing gets logged to /var/log/messages, apparently because the failure is disk-related and the disk becomes unwritable as the problem develops.  I took screenshots of the console on each server.

https://postimg.org/image/3xmuun3gv/
https://postimg.org/image/mw6ukvd93/

The sequence appears to be: I/O errors, then the journal aborts on the root logical volume, then EXT4-fs errors, and finally swap errors.

Our monitoring managed to capture the following alert remotely before each crash.  It was the same for both servers.

CRIT - filesystem has switched to read-only and is probably corrupted CRIT, missing: rw, exceeding: ro
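
For anyone wanting to run a similar check by hand, here is a minimal sketch (not the monitoring plugin that produced the alert above) of detecting a filesystem that has been remounted read-only. It assumes the Linux /proc/mounts format; the function name read_only_mounts, the sample device names, and the default list of filesystem types are illustrative assumptions.

```python
def read_only_mounts(mounts_text, fs_types=("ext3", "ext4", "xfs")):
    """Return the mount points of the given filesystem types mounted 'ro'.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields: device, mount point, type, options, ...
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Compare whole options so "errors=remount-ro" is not mistaken for "ro".
        if fs_type in fs_types and "ro" in options.split(","):
            hits.append(mount_point)
    return hits


if __name__ == "__main__":
    # Hypothetical sample data; on a real host, read /proc/mounts instead.
    sample = (
        "/dev/mapper/vg0-root / ext4 ro,relatime,data=ordered 0 0\n"
        "/dev/sda1 /boot ext4 rw,relatime,errors=remount-ro 0 0\n"
        "proc /proc proc rw,relatime 0 0\n"
    )
    for mount_point in read_only_mounts(sample):
        print("CRIT - %s is mounted read-only" % mount_point)
```

On a live system the same function can be fed open("/proc/mounts").read(). A filesystem that is intentionally mounted read-only would also be reported, so a real check needs an exclusion list.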

CentOS 6.8, kernel 3.18.44-20.el6.x86_64
Xen version 4.6.3-5.el6
48 GB RAM
Samsung 850 Pro 1 TB SSD (yes, I know this is a consumer-class drive)
X10SRi-F motherboard, BIOS 1.0b and 2.0a (two different servers running two different BIOS versions)

In terms of VPS disk allocation, one server was about half full and the other nearly full, but both had plenty of CPU and RAM available.  Neither server has a heavy I/O workload.

The servers have been in use without issue for months prior to this.

I find it unlikely that both servers would have an SSD fail in exactly the same manner within a few days of each other, so I suspect a software bug.

I'm wondering whether this could have been caused by the recent Xen and/or operating system updates, since the problems started only after updating and the servers were stable before that.

Has anyone else seen similar crashes recently?


_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xen.org/xen-users

Re: 2 recently updated Centos6 Xen 4.6.3-5 server crashes

George Dunlap
On Wed, Feb 8, 2017 at 4:12 PM, Brandon Shoemaker
<[hidden email]> wrote:

> Centos 6.8 3.18.44-20.el6.x86_64
> xen version 4.6.3-5.el6

When did you update your kernel?

We've had a lot of complaints about the 3.18.44 kernel.  That came out
in November, but if you hadn't updated since then that may be the
issue.  We're in the process of updating it to a 4.x kernel, but
Johnny Hughes has hit some snags.


 -George

Re: 2 recently updated Centos6 Xen 4.6.3-5 server crashes

Brandon Shoemaker
Hey George,

We upgraded all servers to kernel-3.18.44-20.el6.x86_64 on Friday, December 2, 2016.

We updated all servers to Xen 4.6.3-5.el6.x86_64 on Friday, January 13, 2017.

Between December 2 and February 6 we did not have any issues.  Two servers crashed last week, but none have since.  We will see how this week goes.

Thanks for the inquiry.  Any suggestions are welcome.

Re: 2 recently updated Centos6 Xen 4.6.3-5 server crashes

George Dunlap
On Mon, Feb 13, 2017 at 4:04 PM, Brandon Shoemaker
<[hidden email]> wrote:
> Hey George,
>
> We upgraded to that kernel-3.18.44-20.el6.x86_64 for all servers on Friday December 2, 2016.
>
> We updated to Xen 4.6.3-5.el6.x86_64 for all servers on Friday, January 13, 2017.

What version were you running before that?  If the delta is small
enough, it may give us a hint as to which patches may be involved in
the problem.

 -George

Re: 2 recently updated Centos6 Xen 4.6.3-5 server crashes

Brandon Shoemaker
Hey George,

I usually update all servers at the same time so that they stay identical.

On December 2 all servers were updated from kernel-3.18.41-20.el6.x86_64 to kernel-3.18.44-20.el6.x86_64.

On January 13 all servers were updated from xen-4.6.3-4.el6.x86_64 to xen-4.6.3-5.el6.x86_64.

Let me know if you have any other questions.  Thanks for your input.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of George Dunlap
Sent: Tuesday, February 14, 2017 4:00 AM
To: Brandon Shoemaker <[hidden email]>
Cc: [hidden email]
Subject: Re: [Xen-users] 2 recently updated Centos6 Xen 4.6.3-5 server crashes

What version were you running before that?  If the delta is small enough, it may give us a hint as to which patches may be involved in the problem.

 -George

