RE: copying large files over NFS locks up machine on-testing from Thursday

RE: copying large files over NFS locks up machine on-testing from Thursday

Ian Pratt
 
> I just tried copying the 3GB file over NFS from within a domU
> on -testing in hopes of getting some debug info. dom0 became
> unresponsive for a few seconds after close to a minute. It
> had successfully copied 2.3GB when I hit ^C and then started
> a copy from NFS to the domU's / which itself is a loopback
> device mounted over NFS in dom0 - shortly thereafter the
> machine locked up, the only output being complaints from
> megaraid about aborted SCSI commands. It seems possible that
> this is a dom0 issue.

It sounds like the megaraid driver is unhappy. Can you reproduce this
copying the file to /dev/null?
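
A minimal sketch of the /dev/null test, for anyone following along. Reading the NFS-hosted file and discarding the data takes the local disk (and so the megaraid write path) out of the picture, which is presumably the point of the suggestion. The function name and chunk size are illustrative assumptions, not anything from this thread; the source path is whatever large file you have on the NFS mount.

```python
import os
import time


def read_to_null(src_path, chunk_size=1024 * 1024):
    """Read src_path in chunks and discard the data via os.devnull.

    Returns (bytes_copied, elapsed_seconds). Nothing is written to
    local disk, so only the read side (e.g. NFS) is exercised.
    """
    copied = 0
    start = time.monotonic()
    with open(src_path, "rb") as src, open(os.devnull, "wb") as sink:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            sink.write(chunk)
            copied += len(chunk)
    return copied, time.monotonic() - start


if __name__ == "__main__":
    import sys

    # Usage: python read_to_null.py <path to a large file on the NFS mount>
    total, secs = read_to_null(sys.argv[1])
    print(f"read {total} bytes in {secs:.1f}s")
```

If the lockup still occurs with this, the local disk write path is unlikely to be the trigger.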

It's worth checking the Dell site to make sure you have the latest
megaraid firmware and driver.

Ian

_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel

Re: copying large files over NFS locks up machine on-testing from Thursday

Kip Macy
> It sounds like the megaraid driver is unhappy. Can you reproduce this
> copying the file to /dev/null?

I think it was unhappy because its interrupts weren't being serviced.
Copying /home/kmacy/suseroot.0 to /home/kmacy/suseroot.1 (NFS -> NFS)
locks the machine up just fine. The machine will also become
unresponsive transiently when running fsck in domU on a filesystem
that is a loopback device mounted over NFS.
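
One way to put numbers on "unresponsive transiently" is a heartbeat that records a timestamp at a fixed interval and then looks for gaps. This is only a sketch under assumptions: the names, the 0.5 s interval, and the 1 s tolerance are all made up for illustration, and a hard lockup will of course leave no record at all.

```python
import time


def find_stalls(timestamps, interval, tolerance=1.0):
    """Return (tick_before_gap, gap_seconds) for every pair of consecutive
    ticks that are further apart than interval + tolerance."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval + tolerance:
            stalls.append((prev, gap))
    return stalls


def heartbeat(duration, interval=0.5):
    """Record a monotonic timestamp roughly every `interval` seconds
    for `duration` seconds and return the list of timestamps."""
    ticks = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        ticks.append(time.monotonic())
        time.sleep(interval)
    return ticks


if __name__ == "__main__":
    # Run this in dom0 while the transfer or fsck is in progress.
    ticks = heartbeat(duration=120, interval=0.5)
    for when, gap in find_stalls(ticks, interval=0.5):
        print(f"stall of {gap:.1f}s after tick at {when:.1f}")
```

Gaps much longer than the interval suggest the process was not being scheduled during that window.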

> It's worth checking the Dell site to make sure you have the latest
> megaraid firmware and driver.

Running native mainline 2.6.11.10, NFS transfers don't cause any
problems. To reduce the possibility of it being a benchmark-like SUE,
I'll do a clean build from scratch of the dom0 kernel.


        -Kip

Re: copying large files over NFS locks up machine on-testing from Thursday

Kip Macy
I just updated to 2.0.6. Large NFS transfers work fine with the
default configuration. If xend has been started (i.e. the bridge has
been configured) the machine will become unresponsive when copying a
large file over NFS.

I'll just skip migration testing for now.

        -Kip



RE: copying large files over NFS locks up machine on-testing from Thursday

Ian Pratt
In reply to this post by Ian Pratt
 
> I just updated to 2.0.6. Large NFS transfers work fine with
> the default configuration. If xend has been started (i.e. the
> bridge has been configured) the machine will become
> unresponsive when copying a large file over NFS.

It would be good to see if you can reproduce this on native with a
bridge running.

Thanks,
Ian

Re: copying large files over NFS locks up machine on-testing from Thursday

Kip Macy
In reply to this post by Kip Macy
No bridge helps with NFS, but I just tried scp on a freshly rebooted
machine and it locked up instantly. After booting into CentOS 4 SMP, the
scp works fine - if this is a SUE it is a particularly inventive one.

I guess it is time to move to the server room and cross my fingers
that this is a 100Mbit issue.

         -Kip


Re: copying large files over NFS locks up machine on-testing from Thursday

Kip Macy
In reply to this post by Ian Pratt
It turns out it isn't bridging - the NFS lockup only happens if xend
is running. If I start then stop xend and do an NFS transfer I don't
hit any problems. Any idea what xend could be doing that would be
making the system so unhappy?



  -Kip


> It would be good to see if you can reproduce this on native with a
> bridge running.
>
> Thanks,
> Ian

RE: copying large files over NFS locks up machine on-testing from Thursday

Ian Pratt
In reply to this post by Ian Pratt

> It turns out it isn't bridging - the NFS lockup only happens
> if xend is running. If I start then stop xend and do an NFS
> transfer I don't hit any problems. Any idea what xend could
> be doing that would be making the system so unhappy?

It's really unlikely to be xend that's causing this. Are you sure it's
not just xend running the network script to start the bridge?

2.0 vintage xend is the cause of many troubles, but I think it's likely
to be innocent in this case :-)

Ian

Re: copying large files over NFS locks up machine on-testing from Thursday

Kip Macy
> It's really unlikely to be xend that's causing this. Are you sure it's
> not just xend running the network script to start the bridge?

That was my initial assumption, but running /etc/xen/scripts/network
start to start the bridge before doing transfers didn't cause any
problem. Additionally, the bridge is still up after xend is shut down.
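
To confirm the bridge state before and after starting or stopping xend, something like the following could be run in dom0. It is a sketch that assumes the kernel exposes bridge devices through sysfs (a `bridge` directory and a `brif` directory of member ports under each bridge in /sys/class/net); the function name is made up for illustration and is not part of Xen or its scripts.

```python
import os


def list_bridges(sysfs_net="/sys/class/net"):
    """Return {bridge_name: [member interfaces]} for each bridge device
    found under sysfs_net. Assumes the sysfs bridge layout described above."""
    bridges = {}
    if not os.path.isdir(sysfs_net):
        return bridges
    for name in sorted(os.listdir(sysfs_net)):
        dev = os.path.join(sysfs_net, name)
        # A bridge device has a "bridge" subdirectory; plain NICs do not.
        if os.path.isdir(os.path.join(dev, "bridge")):
            brif = os.path.join(dev, "brif")
            members = sorted(os.listdir(brif)) if os.path.isdir(brif) else []
            bridges[name] = members
    return bridges


if __name__ == "__main__":
    for bridge, ports in list_bridges().items():
        print(bridge, "->", ", ".join(ports) or "(no ports)")
```

Comparing the output across the three cases (network script only, xend running, xend stopped) would show whether the bridge configuration actually differs between them.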

I've just moved it into the server room where the switch is GigE so
we'll find out shortly if it is some weird interaction with the
specific rev of the network card.
 
> 2.0 vintage xend is the cause of many troubles, but I think it's likely
> to be innocent in this case :-)  

That would certainly be my thinking - but at this point it is the only
common item and I'm grasping at straws.

Re: copying large files over NFS locks up machine on-testing from Thursday

Kip Macy
On 5/22/05, Kip Macy <[hidden email]> wrote:

> > It's really unlikely to be xend that's causing this. Are you sure it's
> > not just xend running the network script to start the bridge?
>
> That was my initial assumption, but running /etc/xen/scripts/network
> start to start the bridge before doing transfers didn't cause any
> problem. Additionally, the bridge is still up after xend is shut down.
>
> I've just moved it into the server room where the switch is GigE so
> we'll find out shortly if it is some weird interaction with the
> specific rev of the network card.

Never mind. The two switches in the server room are 100 Mbit.
 
       -Kip

Re: Re: copying large files over NFS locks up machine on-testing from Thursday

Keir Fraser
In reply to this post by Kip Macy

On 22 May 2005, at 20:54, Kip Macy wrote:

>> It's really unlikely to be xend that's causing this. Are you sure it's
>> not just xend running the network script to start the bridge?
>
> That was my initial assumption, but running /etc/xen/scripts/network
> start to start the bridge before doing transfers didn't cause any
> problem. Additionally, the bridge is still up after xend is shut down.

Didn't you say that scp immediately after boot (so presumably no xend
and no bridge) also caused a lockup? Sounds like some sort of race that
may be affected by xend having started, but I doubt xend is to blame.

  -- Keir

