Kernel aio bug in Debian 2.6.32-5-xen kernel?

George Dunlap-4
Recently I pulled in new changesets from xen- and qemu-unstable, and
when creating a PV guest I'm getting errors like the stack trace
below.  Is this likely to be caused by QEMU using AIO?  Is this a bug
in Xen or in the Debian kernel?  Is there an easy way to turn off aio
using a config file so I can see if it is qemu's aio?

The config file is attached, for reference.

 -George

[  408.127439] BUG: unable to handle kernel paging request at af00003e
[  408.133612] IP: [<c10941f8>] set_page_dirty+0x1e/0x4a
[  408.138726] *pdpt = 0000000033232027 *pde = 0000000000000000
[  408.144532] Oops: 0000 [#1] SMP
[  408.147825] last sysfs file: /sys/devices/vif-1-0/uevent
[  408.153200] Modules linked in: xt_physdev iptable_filter ip_tables
x_tables xen_evtchn xenfs bridge stp loop snd_p]
[  408.194797]
[  408.196359] Pid: 1942, comm: qemu-system-i38 Not tainted
(2.6.32-5-xen-686 #1) PowerEdge R710
[  408.204938] EIP: 0061:[<c10941f8>] EFLAGS: 00010286 CPU: 0
[  408.210485] EIP is at set_page_dirty+0x1e/0x4a
[  408.214991] EAX: af000006 EBX: 00000000 ECX: c4ad7680 EDX: 41000001
[  408.221317] ESI: c4ad7680 EDI: f4f0c54c EBP: f3353200 ESP: f33bfdb8
[  408.227644]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[  408.233104] Process qemu-system-i38 (pid: 1942, ti=f33be000
task=f3de50c0 task.ti=f33be000)
[  408.241508] Stack:
[  408.243588]  c10944f0 00000000 f4f0c500 c10d991a f33533c8 f3353200
f4f0c500 c10dc048
[  408.251128] <0> 00000001 00001000 00000000 c10dcc44 00000001
c1006767 00000000 00000000
[  408.259189] <0> d4646070 00000000 f2f0869c f3ff4900 00000000
0000000c 00001000 00000000
[  408.267509] Call Trace:
[  408.270025]  [<c10944f0>] ? set_page_dirty_lock+0x22/0x30
[  408.275486]  [<c10d991a>] ? bio_set_pages_dirty+0x22/0x2f
[  408.280944]  [<c10dc048>] ? dio_bio_submit+0x3c/0x57
[  408.285970]  [<c10dcc44>] ? __blockdev_direct_IO+0x903/0xaed
[  408.291691]  [<c1006767>] ? xen_restore_fl_direct_end+0x0/0x1
[  408.297500]  [<f62a2494>] ? ext3_direct_IO+0xed/0x18d [ext3]
[  408.303219]  [<f62a2e2b>] ? ext3_get_block+0x0/0xd1 [ext3]
[  408.308764]  [<c1090687>] ? generic_file_aio_read+0xf9/0x57b
[  408.314483]  [<c1006040>] ? xen_force_evtchn_callback+0xc/0x10
[  408.320376]  [<c1006770>] ? check_events+0x8/0xc
[  408.325056]  [<c1006040>] ? xen_force_evtchn_callback+0xc/0x10
[  408.330949]  [<c109058e>] ? generic_file_aio_read+0x0/0x57b
[  408.336584]  [<c10e3725>] ? aio_rw_vect_retry+0x61/0x122
[  408.341955]  [<c10e45fa>] ? aio_run_iocb+0x61/0xef
[  408.346809]  [<c10e4ec9>] ? sys_io_submit+0x409/0x49c
[  408.351923]  [<c1008f9c>] ? syscall_call+0x7/0xb
[  408.356600] Code: c3 f6 00 10 75 04 f0 80 08 10 31 c0 c3 89 c1 8b
40 10 8b 11 f7 c2 00 00 01 00 74 07 b8 ec 71 3d
[  408.375492] EIP: [<c10941f8>] set_page_dirty+0x1e/0x4a SS:ESP 0069:f33bfdb8
[  408.382512] CR2: 00000000af00003e
[  408.385894] ---[ end trace 9ce48eb2f06897bf ]---

_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xen.org/xen-devel

pv.cfg (1K) Download Attachment
Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Ian Campbell-10
On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
> Recently I pulled in new changesets from xen- and qemu-unstable, and
> when creating a PV guest I'm getting errors like the stack trace
> below.  Is this likely to be caused by QEMU using AIO?  Is this a bug
> in Xen or in the Debian kernel?  Is there an easy way to turn off aio
> using a config file so I can see if it is qemu's aio?
>
> The config file is attached, for reference.

Which revision of the Debian kernel is this?

It looks like Squeeze, which was a fairly old snapshot of Jeremy's
Xen.git -- it's certainly not impossible that there were latent AIO bugs
in there and Stefano has been fixing these sort of things in recent
kernels too. So it's very possible we need to backport some fix.

Ian.


Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

George Dunlap-4
On Thu, Apr 26, 2012 at 11:59 AM, Ian Campbell <[hidden email]> wrote:

> On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
>> Recently I pulled in new changesets from xen- and qemu-unstable, and
>> when creating a PV guest I'm getting errors like the stack trace
>> below.  Is this likely to be caused by QEMU using AIO?  Is this a bug
>> in Xen or in the Debian kernel?  Is there an easy way to turn off aio
>> using a config file so I can see if it is qemu's aio?
>>
>> The config file is attached, for reference.
>
> Which revision of the Debian kernel is this?
>
> It looks like Squeeze, which was a fairly old snapshot of Jeremy's
> Xen.git -- it's certainly not impossible that there were latent AIO bugs
> in there and Stefano has been fixing these sort of things in recent
> kernels too. So it's very possible we need to backport some fix.

The package info is below.  It is from squeeze, since (AFAIK) that's
the latest "stable" release (and thus what people are likely to be
using)

 -George

# dpkg -s linux-image-2.6.32-5-xen-686
Package: linux-image-2.6.32-5-xen-686
Status: install ok installed
Priority: optional
Section: kernel
Installed-Size: 78524
Maintainer: Debian Kernel Team <[hidden email]>
Architecture: i386
Source: linux-2.6
Version: 2.6.32-41
Provides: linux-image, linux-image-2.6, linux-modules-2.6.32-5-xen-686
Depends: module-init-tools, linux-base (>= 2.6.32-41), initramfs-tools (>= 0.55)
Pre-Depends: debconf | debconf-2.0
Recommends: firmware-linux-free (>= 2.6.32), libc6-xen
Suggests: linux-doc-2.6.32, grub
Breaks: initramfs-tools (<< 0.55), lilo (<< 22.8-8.2~)
Description: Linux 2.6.32 for modern PCs, Xen dom0 support
 The Linux kernel 2.6.32 and modules for use on PCs with Intel Pentium
 Pro/II/III/4/4M/D/M, Xeon, Celeron, Core or Atom; AMD Geode NX, Athlon
 (K7), Duron, Opteron, Sempron, Turion or Phenom; Transmeta Efficeon; or
 VIA C7 processors.
 .
 This kernel also runs on a Xen hypervisor.  It supports both privileged
 (dom0) and unprivileged (domU) operation.


Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Ian Campbell-10
On Thu, 2012-04-26 at 12:08 +0100, George Dunlap wrote:

> On Thu, Apr 26, 2012 at 11:59 AM, Ian Campbell <[hidden email]> wrote:
> > On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
> >> Recently I pulled in new changesets from xen- and qemu-unstable, and
> >> when creating a PV guest I'm getting errors like the stack trace
> >> below.  Is this likely to be caused by QEMU using AIO?  Is this a bug
> >> in Xen or in the Debian kernel?  Is there an easy way to turn off aio
> >> using a config file so I can see if it is qemu's aio?
> >>
> >> The config file is attached, for reference.
> >
> > Which revision of the Debian kernel is this?
> >
> > It looks like Squeeze, which was a fairly old snapshot of Jeremy's
> > Xen.git -- it's certainly not impossible that there were latent AIO bugs
> > in there and Stefano has been fixing these sort of things in recent
> > kernels too. So it's very possible we need to backport some fix.
>
> The package info is below.  It is from squeeze, since (AFAIK) that's
> the latest "stable" release (and thus what people are likely to be
> using)

Right, it's also the latest available kernel package for Squeeze, which
is what I wanted to check. Not that I've been aware of any AIO fixes
recently anyway.


Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Stefano Stabellini-3
In reply to this post by Ian Campbell-10
On Thu, 26 Apr 2012, Ian Campbell wrote:

> On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
> > Recently I pulled in new changesets from xen- and qemu-unstable, and
> > when creating a PV guest I'm getting errors like the stack trace
> > below.  Is this likely to be caused by QEMU using AIO?  Is this a bug
> > in Xen or in the Debian kernel?  Is there an easy way to turn off aio
> > using a config file so I can see if it is qemu's aio?
> >
> > The config file is attached, for reference.
>
> Which revision of the Debian kernel is this?
>
> It looks like Squeeze, which was a fairly old snapshot of Jeremy's
> Xen.git -- it's certainly not impossible that there were latent AIO bugs
> in there and Stefano has been fixing these sort of things in recent
> kernels too. So it's very possible we need to backport some fix.

Right.


> > [  408.127439] BUG: unable to handle kernel paging request at af00003e
> > [  408.133612] IP: [<c10941f8>] set_page_dirty+0x1e/0x4a
> > [  408.138726] *pdpt = 0000000033232027 *pde = 0000000000000000
> > [  408.144532] Oops: 0000 [#1] SMP
> > [  408.147825] last sysfs file: /sys/devices/vif-1-0/uevent
> > [  408.153200] Modules linked in: xt_physdev iptable_filter ip_tables
> > x_tables xen_evtchn xenfs bridge stp loop snd_p]
> > [  408.194797]
> > [  408.196359] Pid: 1942, comm: qemu-system-i38 Not tainted
> > (2.6.32-5-xen-686 #1) PowerEdge R710
> > [  408.204938] EIP: 0061:[<c10941f8>] EFLAGS: 00010286 CPU: 0
> > [  408.210485] EIP is at set_page_dirty+0x1e/0x4a
> > [  408.214991] EAX: af000006 EBX: 00000000 ECX: c4ad7680 EDX: 41000001
> > [  408.221317] ESI: c4ad7680 EDI: f4f0c54c EBP: f3353200 ESP: f33bfdb8
> > [  408.227644]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
> > [  408.233104] Process qemu-system-i38 (pid: 1942, ti=f33be000
> > task=f3de50c0 task.ti=f33be000)
> > [  408.241508] Stack:
> > [  408.243588]  c10944f0 00000000 f4f0c500 c10d991a f33533c8 f3353200
> > f4f0c500 c10dc048
> > [  408.251128] <0> 00000001 00001000 00000000 c10dcc44 00000001
> > c1006767 00000000 00000000
> > [  408.259189] <0> d4646070 00000000 f2f0869c f3ff4900 00000000
> > 0000000c 00001000 00000000
> > [  408.267509] Call Trace:
> > [  408.270025]  [<c10944f0>] ? set_page_dirty_lock+0x22/0x30
> > [  408.275486]  [<c10d991a>] ? bio_set_pages_dirty+0x22/0x2f
> > [  408.280944]  [<c10dc048>] ? dio_bio_submit+0x3c/0x57
> > [  408.285970]  [<c10dcc44>] ? __blockdev_direct_IO+0x903/0xaed
> > [  408.291691]  [<c1006767>] ? xen_restore_fl_direct_end+0x0/0x1
> > [  408.297500]  [<f62a2494>] ? ext3_direct_IO+0xed/0x18d [ext3]
> > [  408.303219]  [<f62a2e2b>] ? ext3_get_block+0x0/0xd1 [ext3]
> > [  408.308764]  [<c1090687>] ? generic_file_aio_read+0xf9/0x57b
> > [  408.314483]  [<c1006040>] ? xen_force_evtchn_callback+0xc/0x10
> > [  408.320376]  [<c1006770>] ? check_events+0x8/0xc
> > [  408.325056]  [<c1006040>] ? xen_force_evtchn_callback+0xc/0x10
> > [  408.330949]  [<c109058e>] ? generic_file_aio_read+0x0/0x57b
> > [  408.336584]  [<c10e3725>] ? aio_rw_vect_retry+0x61/0x122
> > [  408.341955]  [<c10e45fa>] ? aio_run_iocb+0x61/0xef
> > [  408.346809]  [<c10e4ec9>] ? sys_io_submit+0x409/0x49c
> > [  408.351923]  [<c1008f9c>] ? syscall_call+0x7/0xb
> > [  408.356600] Code: c3 f6 00 10 75 04 f0 80 08 10 31 c0 c3 89 c1 8b
> > 40 10 8b 11 f7 c2 00 00 01 00 74 07 b8 ec 71 3d
> > [  408.375492] EIP: [<c10941f8>] set_page_dirty+0x1e/0x4a SS:ESP 0069:f33bfdb8
> > [  408.382512] CR2: 00000000af00003e
> > [  408.385894] ---[ end trace 9ce48eb2f06897bf ]---
 
This looks like a classic direct_IO/AIO not working bug: it could be
because the m2p_override is not working correctly or it might not even
be present at all in this kernel (it went upstream in 2.6.38).
It only started showing now because qemu-xen-traditional switched to
O_DIRECT.


Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Ian Campbell-10
On Thu, 2012-04-26 at 12:23 +0100, Stefano Stabellini wrote:

> On Thu, 26 Apr 2012, Ian Campbell wrote:
> > On Thu, 2012-04-26 at 11:52 +0100, George Dunlap wrote:
> > > Recently I pulled in new changesets from xen- and qemu-unstable, and
> > > when creating a PV guest I'm getting errors like the stack trace
> > > below.  Is this likely to be caused by QEMU using AIO?  Is this a bug
> > > in Xen or in the Debian kernel?  Is there an easy way to turn off aio
> > > using a config file so I can see if it is qemu's aio?
> > >
> > > The config file is attached, for reference.
> >
> > Which revision of the Debian kernel is this?
> >
> > It looks like Squeeze, which was a fairly old snapshot of Jeremy's
> > Xen.git -- it's certainly not impossible that there were latent AIO bugs
> > in there and Stefano has been fixing these sort of things in recent
> > kernels too. So it's very possible we need to backport some fix.
>
> Right.
>
>
> This looks like a classic direct_IO/AIO not working bug: it could be
> because the m2p_override is not working correctly or it might not even
> be present at all in this kernel (it went upstream in 2.6.38).
> It only started showing now because qemu-xen-traditional switched to
> O_DIRECT.

This kernel had VM_FOREIGN and PageForeign etc rather than the
m2p_override. Could be that we need to extend VM_FOREIGN to cover grant
mapped pages?

That's actually a fair chunk of dev work, not just a simple backport.

However this kernel does have blktap so why is qemu based AIO being used
at all?

Ian.



Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Stefano Stabellini-3
On Thu, 26 Apr 2012, Ian Campbell wrote:

> > This looks like a classic direct_IO/AIO not working bug: it could be
> > because the m2p_override is not working correctly or it might not even
> > be present at all in this kernel (it went upstream in 2.6.38).
> > It only started showing now because qemu-xen-traditional switched to
> > O_DIRECT.
>
> This kernel had VM_FOREIGN and PageForeign etc rather than the
> m2p_override. Could be that we need to extend VM_FOREIGN to cover grant
> mapped pages?
>
> That's actually a fair chunk of dev work, not just a simple backport.
>
> However this kernel does have blktap so why is qemu based AIO being used
> at all?
 
If blktap is present and working then libxl only uses QEMU for
qcow/qcow2 disk images.


Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

George Dunlap-4
On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
<[hidden email]> wrote:
>> However this kernel does have blktap so why is qemu based AIO being used
>> at all?
>
> If blktap is present and working then libxl only uses QEMU for
> qcow/qcow2 disk images.

Hmm -- except that the process that's dying is clearly QEMU, and the
disk images are definitely not qcow*, and Ian seems to think this
kernel has blktap (how could I tell?), so something's not right.

Is there a command-line way to disable aio?

 -George


Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Ian Campbell-10
On Thu, 2012-04-26 at 14:14 +0100, George Dunlap wrote:

> On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
> <[hidden email]> wrote:
> >> However this kernel does have blktap so why is qemu based AIO being used
> >> at all?
> >
> > If blktap is present and working then libxl only uses QEMU for
> > qcow/qcow2 disk images.
>
> Hmm -- except that the process that's dying is clearly QEMU, and the
> disk images are definitely not qcow*, and Ian seems to think this
> kernel has blktap (how could I tell?), so something's not right.

It looks like it is a module -- lsmod should confirm, maybe it's as
simple as loading it?

(if so let me know and I'll be sure to include that when I write up
"installing a Debian Dom0")
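
For reference, a minimal sketch of that check, reading the same list that lsmod prints from. The function name module_loaded and its optional second argument (an alternative list file, handy for trying it out) are made up for illustration; only /proc/modules and modprobe are real.

```shell
#!/bin/sh
# Return 0 if the named module appears in the module list, non-zero otherwise.
# $1: module name, $2: list file (assumption: defaults to /proc/modules,
#     whose first field on each line is the module name).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if module_loaded blktap; then
    echo "blktap is loaded"
else
    echo "blktap is not loaded; try: modprobe blktap"
fi
```

This only says whether the module is currently loaded, not whether it was built for the running kernel.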

>
> Is there a command-line way to disable aio?
>
>  -George




Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

George Dunlap-4
On Thu, Apr 26, 2012 at 2:24 PM, Ian Campbell <[hidden email]> wrote:

> On Thu, 2012-04-26 at 14:14 +0100, George Dunlap wrote:
>> On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
>> <[hidden email]> wrote:
>> >> However this kernel does have blktap so why is qemu based AIO being used
>> >> at all?
>> >
>> > If blktap is present and working then libxl only uses QEMU for
>> > qcow/qcow2 disk images.
>>
>> Hmm -- except that the process that's dying is clearly QEMU, and the
>> disk images are definitely not qcow*, and Ian seems to think this
>> kernel has blktap (how could I tell?), so something's not right.
>
> It looks like it is a module -- lsmod should confirm, maybe it's as
> simple as loading it?
>
> (if so let me know and I'll be sure to include that when I write up
> "installing a Debian Dom0")

Indeed, blktap was *not* loaded, and "modprobe blktap" seems to make things work.

Should this be done in one of the initscripts?  Or perhaps by xl?

It would still be good to get the AIO stuff fixed in some way, as I'm
sure I'm not the only one who's going to run into this problem.

 -George


Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Ian Campbell-10
On Thu, 2012-04-26 at 14:43 +0100, George Dunlap wrote:

> On Thu, Apr 26, 2012 at 2:24 PM, Ian Campbell <[hidden email]> wrote:
> > On Thu, 2012-04-26 at 14:14 +0100, George Dunlap wrote:
> >> On Thu, Apr 26, 2012 at 1:07 PM, Stefano Stabellini
> >> <[hidden email]> wrote:
> >> >> However this kernel does have blktap so why is qemu based AIO being used
> >> >> at all?
> >> >
> >> > If blktap is present and working then libxl only uses QEMU for
> >> > qcow/qcow2 disk images.
> >>
> >> Hmm -- except that the process that's dying is clearly QEMU, and the
> >> disk images are definitely not qcow*, and Ian seems to think this
> >> kernel has blktap (how could I tell?), so something's not right.
> >
> > It looks like it is a module -- lsmod should confirm, maybe it's as
> > simple as loading it?
> >
> > (if so let me know and I'll be sure to include that when I write up
> > "installing a Debian Dom0")
>
> Indeed, blktap was *not* loaded, and "modprobe blktap" seems to make things work.
>
> Should this be done in one of the initscripts?  Or perhaps by xl?

xencommons should do it, IMHO.
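
Roughly what such an initscript step might look like, as a sketch only. This is not the actual xencommons code: load_dom0_modules and the MODPROBE override are hypothetical names, and the only real command used is "modprobe -q". A missing module produces a warning rather than a failure, so boot would not be blocked on kernels that lack blktap.

```shell
#!/bin/sh
# Hypothetical init-script helper: try to load each named module and
# carry on with a warning if one cannot be loaded.
# MODPROBE may be overridden (e.g. MODPROBE="echo modprobe") for a dry run.
load_dom0_modules() {
    for mod in "$@"; do
        ${MODPROBE:-modprobe} -q "$mod" || echo "warning: could not load $mod" >&2
    done
    return 0
}

# Example call, left commented out so that reading the file has no side effects:
# load_dom0_modules blktap
```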

> It would still be good to get the AIO stuff fixed in some way, as I'm
> sure I'm not the only one who's going to run into this problem.

Stefano has fixed it in the upstream kernel. I'm afraid there is no
realistic chance of it being fixed in the squeeze kernel at this stage.

Ian.




Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

George Dunlap-4
On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <[hidden email]> wrote:
>> It would still be good to get the AIO stuff fixed in some way, as I'm
>> sure I'm not the only one who's going to run into this problem.
>
> Stefano has fixed it in the upstream kernel. I'm afraid there is no
> realistic chance of it being fixed in the squeeze kernel at this stage.

Any chance we could get AIO disabled in the squeeze kernel then?

 -George


Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Ian Campbell-10
On Thu, 2012-04-26 at 14:55 +0100, George Dunlap wrote:
> On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <[hidden email]> wrote:
> >> It would still be good to get the AIO stuff fixed in some way, as I'm
> >> sure I'm not the only one who's going to run into this problem.
> >
> > Stefano has fixed it in the upstream kernel. I'm afraid there is no
> > realistic chance of it being fixed in the squeeze kernel at this stage.
>
> Any chance we could get AIO disabled in the squeeze kernel then?

I don't think so, that would break legitimate uses of AIO.

We could potentially ensure that Xen/qemu doesn't try to use AIO on
older kernels but AIUI you were running a newer version than provided in
Squeeze so that isn't something we can fix in Debian?

Ian.



Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Stefano Stabellini-3
On Thu, 26 Apr 2012, Ian Campbell wrote:

> On Thu, 2012-04-26 at 14:55 +0100, George Dunlap wrote:
> > On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <[hidden email]> wrote:
> > >> It would still be good to get the AIO stuff fixed in some way, as I'm
> > >> sure I'm not the only one who's going to run into this problem.
> > >
> > > Stefano has fixed it in the upstream kernel. I'm afraid there is no
> > > realistic chance of it being fixed in the squeeze kernel at this stage.
> >
> > Any chance we could get AIO disabled in the squeeze kernel then?
>
> I don't think so, that would break legitimate uses of AIO.
>
> We could potentially ensure that Xen/qemu doesn't try to use AIO on
> older kernels but AIUI you were running a newer version than provided in
> Squeeze so that isn't something we can fix in Debian?

We could add a patch in Debian to disable AIO and O_DIRECT in QEMU.

Otherwise I don't really know what we could do upstream to detect
whether a kernel has a buggy O_DIRECT/AIO implementation or not.


Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Ian Campbell-10
On Thu, 2012-04-26 at 15:10 +0100, Stefano Stabellini wrote:

> On Thu, 26 Apr 2012, Ian Campbell wrote:
> > On Thu, 2012-04-26 at 14:55 +0100, George Dunlap wrote:
> > > On Thu, Apr 26, 2012 at 2:46 PM, Ian Campbell <[hidden email]> wrote:
> > > >> It would still be good to get the AIO stuff fixed in some way, as I'm
> > > >> sure I'm not the only one who's going to run into this problem.
> > > >
> > > > Stefano has fixed it in the upstream kernel. I'm afraid there is no
> > > > realistic chance of it being fixed in the squeeze kernel at this stage.
> > >
> > > Any chance we could get AIO disabled in the squeeze kernel then?
> >
> > I don't think so, that would break legitimate uses of AIO.
> >
> > We could potentially ensure that Xen/qemu doesn't try to use AIO on
> > older kernels but AIUI you were running a newer version than provided in
> > Squeeze so that isn't something we can fix in Debian?
>
> We could add a patch in Debian to disable AIO and O_DIRECT in QEMU.

AIUI Qemu in this case is not the qemu in Debian, it's the one from
xen-unstable.

> Otherwise I don't really know what we could do upstream to detect
> whether a kernel has a buggy O_DIRECT/AIO implementation or not.

Me neither.

Ian.




Re: Kernel aio bug in Debian 2.6.32-5-xen kernel?

Olaf Hering-2
In reply to this post by George Dunlap-4
On Thu, Apr 26, George Dunlap wrote:

> Recently I pulled in new changesets from xen- and qemu-unstable, and
> when creating a PV guest I'm getting errors like the stack trace
> below.  Is this likely to be caused by QEMU using AIO?  Is this a bug
> in Xen or in the Debian kernel?  Is there an easy way to turn off aio
> using a config file so I can see if it is qemu's aio?

I also hit bugs in the nfs code paths with the SuSE kernels.
Try 'device_model_version="qemu-xen-traditional"' in your .cfg file.
See changeset 25222:a095e157f280.
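
To see what a given .cfg currently requests before changing it, something like the sketch below may help. The helper name cfg_device_model is invented for illustration, and it assumes the simple one-line key="value" form with double quotes, as in the line above; anything more elaborate in the config will not be picked up.

```shell
#!/bin/sh
# Print the device_model_version set in an xl config file, or "unset".
# Assumption: the setting is written on one line as  device_model_version="..."
# If it appears more than once, the last occurrence is reported.
cfg_device_model() {
    value=$(sed -n 's/^[[:space:]]*device_model_version[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | tail -n 1)
    echo "${value:-unset}"
}

# Example: cfg_device_model pv.cfg
```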

Olaf
