dom0 crashing on extreme I/O

dom0 crashing on extreme I/O

Diwaker (diwaker.lists@gmail.com)
I have three VMs: two running web servers and the third running
netperf/iperf. This is a multi-CPU setup, with dom0 on CPU 0 and all
the remaining VMs on a separate CPU.

Currently my dom0 has 528M of memory, while each VM has around 160M.
Under high load the system crashes; here is a representative crash:

(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2f016001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2f017001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 18fca001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 18fcb001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2270c001
(XEN) (file=grant_table.c, line=729) gnttab_transfer: out-of-range or xen frame 2270d001
------------[ cut here ]------------
kernel BUG at drivers/xen/netback/netback.c:335!
invalid operand: 0000 [#1]
Modules linked in: ipt_physdev iptable_filter ip_tables video thermal
processor fan button battery ac md sworks_agp agpgart dm_snapshot
dm_zero dm_mirror ext3 jbd dm_mod mptscsih mptbase sd_mod scsi_mod
CPU:    0
EIP:    0061:[<c02c6782>]    Not tainted VLI
EFLAGS: 00010246   (2.6.12.6-xen0)
EIP is at net_rx_action+0x4c2/0x4f0
eax: 0000fff7   ebx: df26b620   ecx: 00000042   edx: c04b8920
esi: dc073480   edi: 00000000   ebp: c04b3900   esp: c0a23d28
ds: 007b   es: 007b   ss: 0069
Process ksoftirqd/0 (pid: 2, threadinfo=c0a22000 task=c0a16510)
Stack: c04b38e0 c0362d90 80000000 c0363b36 dbad7d80 db6fee80 df26b400 5cb34f36
       00000088 00000000 0003d700 db6ff012 c04b8920 00000106 00a23e2c c05e5000
       00000000 c0363a90 00000001 00000000 00000000 00000001 c0a16510 00000000
Call Trace:
 [<c0362d90>] br_forward_finish+0x0/0x80
 [<c0363b36>] br_handle_frame_finish+0xa6/0x160
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c01423a5>] kmem_getpages+0x65/0x90
 [<c013ece2>] __rmqueue+0xb2/0xf0
 [<c032302d>] nf_iterate+0x5d/0x90
 [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
 [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
 [<c032336e>] nf_hook_slow+0x6e/0x120
 [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c0368549>] br_nf_pre_routing+0x319/0x4a0
 [<c0367aa0>] br_nf_pre_routing_finish+0x0/0x420
 [<c032302d>] nf_iterate+0x5d/0x90
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c032336e>] nf_hook_slow+0x6e/0x120
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c0363db3>] br_handle_frame+0x1c3/0x260
 [<c0363a90>] br_handle_frame_finish+0x0/0x160
 [<c03188d3>] netif_receive_skb+0x113/0x230
 [<c02820bf>] tg3_rx+0x2cf/0x490
 [<c027e246>] tg3_restart_ints+0x26/0xa0
 [<c02823a6>] tg3_poll+0x126/0x1a0
 [<c0121660>] ksoftirqd+0x0/0xa0
 [<c0121660>] ksoftirqd+0x0/0xa0
 [<c01214ff>] tasklet_action+0x5f/0xa0
 [<c0121152>] __do_softirq+0x52/0xc0
 [<c0121207>] do_softirq+0x47/0x60
 [<c01216b9>] ksoftirqd+0x59/0xa0
 [<c013079d>] kthread+0xad/0xf0
 [<c01306f0>] kthread+0x0/0xf0
 [<c0106855>] kernel_thread_helper+0x5/0x10
Code: 0f 0b 44 01 38 19 3a c0 90 e9 5a fe ff ff b8 74 64 40 c0 e8 31
ac e5 ff eb 8f c7 04 24 9c 10 3b c0 e8 b3 60 e5 ff 8d 76 00 eb 92 <0f>
0b 4f 01 38 19 3a c0 e9 4c fe ff ff 0f 0b 2a 01 38 19 3a c0
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 (XEN) Domain 0 shutdown: rebooting machine.

NOTE: the line number in netback.c (335) might not be very useful for
reference. I have some additional instrumentation in netback, so the
line number might not match the files in xen-unstable.hg

Will increasing dom0 memory further help? Or increasing the size of the rings?
--
Web/Blog/Gallery: http://floatingsun.net

_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
RE: dom0 crashing on extreme I/O

Ian Pratt
 

> I have 3 VMs, two running webservers and the 3rd running
> netperf/iperf. This is a multi-cpu setup, with dom0 on CPU-0
> and all the remaining VMs on a separate CPU.
>
> Currently my dom0 has 528M of memory, while each VM has around 160M.
> Under high loads, the system crashes. I'm pasting a
> representative crash here:
>
> file=grant_table.c, line=729) gnttab_transfer: out-of-range
> or xen frame 2f016001
> (XEN) (file=grant_table.c, line=729) gnttab_transfer:
> out-of-range or xen frame 2f017001

Interesting. We've seen this very occasionally before, but this is the
first time on a 32-bit kernel.

The clue is that the errant frame numbers always end in 001, and they
are actually valid if you shift them right by 12 (>>12).
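
To illustrate the arithmetic (this is just a standalone sketch using the
frame numbers from the log above, not code from the Xen tree):

    #include <stdio.h>

    /* Standalone sketch: the reported values look like page addresses whose
     * low 12 bits are 0x001 rather than machine frame numbers, so shifting
     * right by 12 recovers a plausible frame. */
    int main(void)
    {
        unsigned long errant[] = { 0x2f016001UL, 0x18fca001UL, 0x2270d001UL };
        unsigned int i;

        for (i = 0; i < sizeof(errant) / sizeof(errant[0]); i++)
            if ((errant[i] & 0xfff) == 0x001)   /* the suspicious low bits */
                printf("%#lx -> frame %#lx\n", errant[i], errant[i] >> 12);
        return 0;
    }

This prints 0x2f016001 -> frame 0x2f016, and so on -- the same
check-and-shift workaround that the debugging patch further down the
thread applies.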

It would be very helpful if you could work on a minimal repro case,
ideally with only one domU.

Chris: any extra debugging that might be helpful?

Thanks,
Ian


> [remainder of the quoted crash log and original message snipped; see above]

Re: dom0 crashing on extreme I/O

Christopher Clark
Hi Diwaker

Could you add this patch to your build of the domain 0 kernel and try to exercise the fault again please?

thanks

Christopher

diff -r 821368442403 linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c        Thu Jan 12 11:45:49 2006
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c        Thu Jan 12 14:36:56 2006
@@ -212,6 +212,14 @@
                vdata   = (unsigned long)skb->data;
                old_mfn = virt_to_mfn(vdata);

+        if ( ((old_mfn & 0xfff) == 0x001) && (old_mfn > 0x10000000UL) )
+        {
+            printk("XXX: nasty mfn from p2m: v:%#lx p:%#lx m:%#lx\n",
+                    vdata, __pa(vdata), old_mfn );
+            /* HACK: let's try shifting it until it looks sane... */
+            old_mfn >>= 12;
+        }
+
                /* Memory squeeze? Back off for an arbitrary while. */
                if ((new_mfn = alloc_mfn()) == 0) {
                        if ( net_ratelimit() )
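
(For reference: this is an hg diff against the xen-unstable.hg tree at
changeset 821368442403, so it should apply with patch -p1 from the top of
that tree before rebuilding the dom0 kernel.)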


On 1/12/06, Ian Pratt <[hidden email]> wrote:


> [quoted message snipped; see Ian's reply above]

RE: dom0 crashing on extreme I/O

Ian Pratt
In reply to this post by Diwaker (diwaker.lists@gmail.com)
 
> I have 3 VMs, two running webservers and the 3rd running
> netperf/iperf. This is a multi-cpu setup, with dom0 on CPU-0
> and all the remaining VMs on a separate CPU.
>
> Currently my dom0 has 528M of memory, while each VM has around 160M.
> Under high loads, the system crashes. I'm pasting a
> representative crash here:

Is this PAE or not? How much memory has the system got?

Thanks,
Ian

> [quoted crash log snipped; see the original post above]

Re: dom0 crashing on extreme I/O

Diwaker (diwaker.lists@gmail.com)
In reply to this post by Christopher Clark
Hi Christopher,

Curiously, I'm unable to reproduce the bug today, even without your patch
(I just recompiled my sources -- I didn't pull in any new changes, so I
don't think anything changed).

I'll report back if I see the crash again.

Thanks,
Diwaker

On 1/12/06, Christopher Clark <[hidden email]> wrote:

> [quoted message snipped; see Christopher's patch above]

--
Web/Blog/Gallery: http://floatingsun.net
