[xen-unstable test] 11946: regressions - FAIL

Ian Jackson-2
flight 11946 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-credit2    7 debian-install            fail REGR. vs. 11944

Regressions which are regarded as allowable (not blocking):
 test-i386-i386-win           14 guest-start.2                fail   like 11944

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-winxpsp3  7 windows-install          fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64  7 windows-install        fail never pass
 test-amd64-i386-qemuu-rhel6hvm-intel  7 redhat-install         fail never pass
 test-i386-i386-xl-qemuu-winxpsp3  7 windows-install            fail never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start                 fail never pass
 test-amd64-i386-rhel6hvm-amd 11 leak-check/check             fail   never pass
 test-amd64-i386-rhel6hvm-intel 11 leak-check/check             fail never pass
 test-amd64-i386-qemuu-rhel6hvm-amd  7 redhat-install           fail never pass
 test-amd64-amd64-win         16 leak-check/check             fail   never pass
 test-amd64-i386-win-vcpus1   16 leak-check/check             fail   never pass
 test-amd64-amd64-xl-winxpsp3 13 guest-stop                   fail   never pass
 test-i386-i386-xl-winxpsp3   13 guest-stop                   fail   never pass
 test-amd64-i386-win          16 leak-check/check             fail   never pass
 test-amd64-amd64-xl-win7-amd64 13 guest-stop                   fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop               fail never pass
 test-amd64-amd64-xl-win      13 guest-stop                   fail   never pass
 test-amd64-i386-xend-winxpsp3 16 leak-check/check             fail  never pass
 test-i386-i386-xl-win        13 guest-stop                   fail   never pass
 test-amd64-i386-xl-win7-amd64 13 guest-stop                   fail  never pass
 test-amd64-i386-xl-win-vcpus1 13 guest-stop                   fail  never pass

version targeted for testing:
 xen                  9207cc3a0862
baseline version:
 xen                  9ad1e42c341b

------------------------------------------------------------
People who touched revisions under test:
  David Vrabel <[hidden email]>
  Ian Campbell <[hidden email]>
  Ian Jackson <[hidden email]>
  Jan Beulich <[hidden email]>
  Julian Pidancet <[hidden email]>
  Keir Fraser <[hidden email]>
  Stefano Stabellini <[hidden email]>
  Tim Deegan <[hidden email]>
  Yongjie Ren <[hidden email]>
------------------------------------------------------------

jobs:
 build-amd64                                                  pass    
 build-i386                                                   pass    
 build-amd64-oldkern                                          pass    
 build-i386-oldkern                                           pass    
 build-amd64-pvops                                            pass    
 build-i386-pvops                                             pass    
 test-amd64-amd64-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-i386-i386-xl                                            pass    
 test-amd64-i386-rhel6hvm-amd                                 fail    
 test-amd64-i386-qemuu-rhel6hvm-amd                           fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-amd64-xl-win7-amd64                               fail    
 test-amd64-i386-xl-win7-amd64                                fail    
 test-amd64-i386-xl-credit2                                   fail    
 test-amd64-amd64-xl-pcipt-intel                              fail    
 test-amd64-i386-rhel6hvm-intel                               fail    
 test-amd64-i386-qemuu-rhel6hvm-intel                         fail    
 test-amd64-i386-xl-multivcpu                                 pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-i386-i386-pair                                          pass    
 test-amd64-amd64-xl-sedf-pin                                 pass    
 test-amd64-amd64-pv                                          pass    
 test-amd64-i386-pv                                           pass    
 test-i386-i386-pv                                            pass    
 test-amd64-amd64-xl-sedf                                     pass    
 test-amd64-i386-win-vcpus1                                   fail    
 test-amd64-i386-xl-win-vcpus1                                fail    
 test-amd64-i386-xl-winxpsp3-vcpus1                           fail    
 test-amd64-amd64-win                                         fail    
 test-amd64-i386-win                                          fail    
 test-i386-i386-win                                           fail    
 test-amd64-amd64-xl-win                                      fail    
 test-i386-i386-xl-win                                        fail    
 test-amd64-amd64-xl-qemuu-winxpsp3                           fail    
 test-i386-i386-xl-qemuu-winxpsp3                             fail    
 test-amd64-i386-xend-winxpsp3                                fail    
 test-amd64-amd64-xl-winxpsp3                                 fail    
 test-i386-i386-xl-winxpsp3                                   fail    


------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
    http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
    http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
changeset:   24790:9207cc3a0862
tag:         tip
user:        David Vrabel <[hidden email]>
date:        Mon Feb 13 13:34:47 2012 +0000
   
    libfdt: add to build
   
    Signed-off-by: David Vrabel <[hidden email]>
    Acked-by: Tim Deegan <[hidden email]>
    Committed-by: Keir Fraser <[hidden email]>
   
   
changeset:   24789:e060d1bd7b60
user:        David Vrabel <[hidden email]>
date:        Mon Feb 13 13:34:08 2012 +0000
   
    libfdt: fixup libfdt_env.h for xen
   
    Signed-off-by: David Vrabel <[hidden email]>
    Acked-by: Tim Deegan <[hidden email]>
    Committed-by: Keir Fraser <[hidden email]>
   
   
changeset:   24788:fcc188f21e47
user:        David Vrabel <[hidden email]>
date:        Mon Feb 13 13:33:26 2012 +0000
   
    libfdt: add version 1.3.0
   
    Add libfdt 1.3.0 from http://git.jdl.com/gitweb/?p=dtc.git
   
    This will be used by Xen to parse the DTBs provided by bootloaders on
    ARM platforms.
   
    Signed-off-by: David Vrabel <[hidden email]>
    Acked-by: Tim Deegan <[hidden email]>
    Committed-by: Keir Fraser <[hidden email]>
   
   
changeset:   24787:bd0a11ed1a67
user:        Ian Campbell <[hidden email]>
date:        Mon Feb 13 12:53:28 2012 +0000
   
    MAINTAINERS: Add entry for ARM w/ virt extensions port
   
    Signed-off-by: Ian Campbell <[hidden email]>
    Committed-by: Keir Fraser <[hidden email]>
   
   
changeset:   24786:79fe73117c12
user:        Julian Pidancet <[hidden email]>
date:        Mon Feb 13 12:50:46 2012 +0000
   
    firmware: Introduce CONFIG_ROMBIOS and CONFIG_SEABIOS options
   
    This patch introduces configuration options that allow building either
    a rombios-only or a seabios-only hvmloader.
   
    Building option ROMs like vgabios or etherboot is only enabled for a
    rombios hvmloader, since SeaBIOS takes care of extracting option ROMs
    itself from the PCI devices (these option ROMs are provided by the
    device model and do not need to be built in hvmloader).
   
    The Makefile in tools/firmware/ now only checks for bcc if rombios is
    enabled.
   
    These two configuration options are left on by default to remain
    compatible.
   
    Signed-off-by: Julian Pidancet <[hidden email]>
    Acked-by: Ian Campbell <[hidden email]>
   
   
changeset:   24785:e4d8d2524407
user:        Julian Pidancet <[hidden email]>
date:        Mon Feb 13 12:50:04 2012 +0000
   
    hvmloader: Move option ROM loading into a separate optional file
   
    Make the load_rom field in struct bios_config an optional callback
    rather than a boolean value. This allows BIOS-specific code to
    implement its own option ROM loading methods.
   
    Facilities to scan PCI devices, extract and deploy ROMs are moved into
    a separate file that can be compiled optionally.
   
    Signed-off-by: Julian Pidancet <[hidden email]>
    Acked-by: Ian Campbell <[hidden email]>
   
   
changeset:   24784:ab47cfef2b0a
user:        Julian Pidancet <[hidden email]>
date:        Mon Feb 13 12:49:06 2012 +0000
   
    firmware: Use mkhex from hvmloader directory for etherboot ROMs
   
    To remain consistent with how other ROMs are built into hvmloader,
    call mkhex on etherboot ROMs from the hvmloader directory, instead of
    the etherboot directory. In other words, eb-roms.h is not used any
    more.
   
    Introduce ETHERBOOT_NICS config option to choose which ROMs should be
    built (kept rtl8139 and 8086100e per default as before).
   
    Signed-off-by: Julian Pidancet <[hidden email]>
    Acked-by: Ian Campbell <[hidden email]>
   
   
changeset:   24783:0fe9e2556e20
user:        Julian Pidancet <[hidden email]>
date:        Mon Feb 13 12:48:20 2012 +0000
   
    hvmloader: Allow the mkhex command to take several file arguments
    Signed-off-by: Julian Pidancet <[hidden email]>
    Acked-by: Ian Campbell <[hidden email]>
   
   
changeset:   24782:e1f10d12b9fe
user:        Julian Pidancet <[hidden email]>
date:        Mon Feb 13 12:47:46 2012 +0000
   
    hvmloader: Only compile 32bitbios_support.c when rombios is enabled
   
    32bitbios_support.c only contains code specific to rombios, and should
    not be built-in when building hvmloader for SeaBIOS only (as for
    rombios.c).
   
    Signed-off-by: Julian Pidancet <[hidden email]>
    Acked-by: Ian Campbell <[hidden email]>
   
   
changeset:   24781:6ae5506e49ab
user:        Jan Beulich <[hidden email]>
date:        Mon Feb 13 13:12:30 2012 +0100
   
    x86/vMCE: MC{G,i}_CTL handling adjustments
   
    - g_mcg_cap was read to determine whether MCG_CTL exists before it got
      initialized
    - h_mci_ctrl[] and dom_vmce()->mci_ctl[] both got initialized via
      memset() with an inappropriate size (hence causing a [minor?]
      information leak)
   
    Signed-off-by: Jan Beulich <[hidden email]>
    Acked-by: Keir Fraser <[hidden email]>
   
   
changeset:   24780:e953d536d3c6
user:        Jan Beulich <[hidden email]>
date:        Mon Feb 13 13:09:02 2012 +0100
   
    x86/paging: use clear_guest() for zero-filling guest buffers
   
    While static arrays of all zeros may be tolerable (but are simply
    inefficient now that we have the necessary infrastructure), using on-
    stack arrays for this purpose (particularly when their size doesn't
    have an upper limit enforced) is calling for eventual problems (even
    if the code can be reached via administrative interfaces only).
   
    Signed-off-by: Jan Beulich <[hidden email]>
    Acked-by: Tim Deegan <[hidden email]>
   
   
changeset:   24779:9ad1e42c341b
user:        Ian Campbell <[hidden email]>
date:        Fri Feb 10 17:24:50 2012 +0000
   
    xend: populate HVM guest grant table on boot
   
    Signed-off-by: Ian Campbell <[hidden email]>
    Committed-by: Ian Jackson <[hidden email]>
   
   
========================================
commit 8cc8a3651c9c5bc2d0086d12f4b870fc525b9387
Author: Jan Beulich <[hidden email]>
Date:   Tue Feb 7 18:42:56 2012 +0000

    qemu-dm: fix unregister_iomem()
   
    This function (introduced quite a long time ago in
    e7911109f4321e9ba0cc56a253b653600aa46bea - "disable qemu PCI
    devices in HVM domains") appears to be completely broken, causing
    the regression reported in
    http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1805 (due to
    the newly added caller of it in
    56d7747a3cf811910c4cf865e1ebcb8b82502005 - "qemu: clean up
    MSI-X table handling"). It's unclear how the function can ever have
    fulfilled its purpose: the value returned by iomem_index() is *not* an
    index into mmio[].
   
    Additionally, fix two problems:
    - unregister_iomem() must not clear mmio[].start, otherwise
      cpu_register_physical_memory() won't be able to re-use the previous
      slot, thus causing a leak
    - cpu_unregister_io_memory() must not check mmio[].size, otherwise it
      won't properly clean up entries (temporarily) squashed through
      unregister_iomem()
   
    Signed-off-by: Jan Beulich <[hidden email]>
    Tested-by: Stefano Stabellini <[hidden email]>
    Tested-by: Yongjie Ren <[hidden email]>

_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel

Re: [xen-unstable test] 11946: regressions - FAIL

Ian Campbell-10
On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
> flight 11946 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-xl-credit2    7 debian-install            fail REGR. vs. 11944

Host crash:
http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log

This is the debug Andrew Cooper added recently to track down the IRQ
assertion we've been seeing, sadly it looks like the debug code tries to
call xfree from interrupt context and therefore doesn't produce full
output :-(

Or is 24675:d82a1e3d3c65 ("xsm: Add security label to IRQ debug output")
at fault for adding the xfree in what may be an IRQ context? (are
keyhandlers run in IRQ context?)

A skanky quick "fix" follows.

        Feb 13 17:17:29.777522 (XEN) *** IRQ BUG found ***
        Feb 13 17:19:32.594539 (XEN) CPU0 -Testing vector 229 from bitmap 34,48,57,64,72,75,80,83,88,97,104-105,113,120-121,129,136,144,152,160,168,176,184,192,202
        Feb 13 17:19:32.617515 (XEN) Guest interrupt information:
        Feb 13 17:19:32.617536 (XEN)    IRQ:   0 affinity:001 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
        Feb 13 17:19:32.617567 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
        Feb 13 17:19:32.626489 (XEN) ----[ Xen-4.2-unstable  x86_64  debug=y  Not tainted ]----
        Feb 13 17:19:32.626512 (XEN) CPU:    0
        Feb 13 17:19:32.626525 (XEN) RIP:    e008:[<ffff82c48012c842>] xfree+0x33/0x121
        Feb 13 17:19:32.641496 (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
        Feb 13 17:19:32.641519 (XEN) rax: ffff82c4802d0800   rbx: ffff8301a7e00080   rcx: 0000000000000000
        Feb 13 17:19:32.650560 (XEN) rdx: 0000000000000000   rsi: 0000000000000083   rdi: 0000000000000000
        Feb 13 17:19:32.665510 (XEN) rbp: ffff82c4802afd18   rsp: ffff82c4802afcf8   r8:  0000000000000004
        Feb 13 17:19:32.665550 (XEN) r9:  0000000000000000   r10: 0000000000000006   r11: ffff82c480224aa0
        Feb 13 17:19:32.673509 (XEN) r12: ffff8301a7e00580   r13: 0000000000000005   r14: ffff82c4802aff18
        Feb 13 17:19:32.685503 (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
        Feb 13 17:19:32.685537 (XEN) cr3: 00000001a7f54000   cr2: 00000000c4b4ee84
        Feb 13 17:19:32.697505 (XEN) ds: 007b   es: 007b   fs: 00d8   gs: 0000   ss: 0000   cs: e008
        Feb 13 17:19:32.697540 (XEN) Xen stack trace from rsp=ffff82c4802afcf8:
        Feb 13 17:19:32.706513 (XEN)    ffff8301a7e00080 ffff8301a7e00580 0000000000000005 ffff82c4802aff18
        Feb 13 17:19:32.721495 (XEN)    ffff82c4802afd88 ffff82c4801658ee ffff82c4802afd38 ffff82c48010098a
        Feb 13 17:19:32.721531 (XEN)    00000400802afd68 0000000000000083 ffff8301a7e000a8 0000000000000000
        Feb 13 17:19:32.729495 (XEN)    00000000fffffffa 00000000000000e5 ffff8301a7e00580 0000000000000005
        Feb 13 17:19:32.738490 (XEN)    ffff82c4802aff18 ffff8301a7e005a8 ffff82c4802afe28 ffff82c480167781
        Feb 13 17:19:32.738515 (XEN)    ffff8301a7ece000 ffff82c4802afde8 0000000000000000 ffff82c4802aff18
        Feb 13 17:19:32.750497 (XEN)    ffff82c4802aff18 0000000000000002 ffff82c4802aff18 ffff82c4802fa060
        Feb 13 17:19:32.762568 (XEN)    000000e500000000 ffff82c4802fa060 ffff82c4802afe08 ffff82c48017bd51
        Feb 13 17:19:32.762596 (XEN)    ffff82c4802aff18 ffff82c4802aff18 ffff82c48025e380 ffff82c4802aff18
        Feb 13 17:19:32.773513 (XEN)    00000000ffffffff 0000000000000002 00007d3b7fd501a7 ffff82c4801525d0
        Feb 13 17:19:32.785503 (XEN)    0000000000000002 00000000ffffffff ffff82c4802aff18 ffff82c48025e380
        Feb 13 17:19:32.785539 (XEN)    ffff82c4802afee0 ffff82c4802aff18 0000001863058413 00000000000c0000
        Feb 13 17:19:32.794514 (XEN)    000000000e1ff99c 000000000000c701 ffff82c4802f9a90 0000000000000000
        Feb 13 17:19:32.809503 (XEN)    0000000000000000 ffff8301a7f5dc80 0000000000000000 0000002000000000
        Feb 13 17:19:32.809529 (XEN)    ffff82c4801581a9 000000000000e008 0000000000000246 ffff82c4802afee0
        Feb 13 17:19:32.814513 (XEN)    0000000000000000 ffff82c4802aff10 ffff82c48015a647 0000000000000000
        Feb 13 17:19:32.829506 (XEN)    ffff8300d7cfb000 ffff8300d7af9000 0000000000000000 ffff82c4802afd88
        Feb 13 17:19:32.829549 (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
        Feb 13 17:19:32.841510 (XEN)    00000000dfc91f90 00000000deadbeef 0000000000000000 0000000000000000
        Feb 13 17:19:32.853508 (XEN)    0000000000000000 0000000000000000 0000000000000000 00000000deadbeef
        Feb 13 17:19:32.858496 (XEN) Xen call trace:
        Feb 13 17:19:32.858518 (XEN)    [<ffff82c48012c842>] xfree+0x33/0x121
        Feb 13 17:19:32.858547 (XEN)    [<ffff82c4801658ee>] dump_irqs+0x2a3/0x2ca
        Feb 13 17:19:32.870500 (XEN)    [<ffff82c480167781>] smp_irq_move_cleanup_interrupt+0x303/0x37b
        Feb 13 17:19:32.870554 (XEN)    [<ffff82c4801525d0>] irq_move_cleanup_interrupt+0x30/0x40
        Feb 13 17:19:32.885510 (XEN)    [<ffff82c4801581a9>] default_idle+0x99/0x9e
        Feb 13 17:19:32.885541 (XEN)    [<ffff82c48015a647>] idle_loop+0x6c/0x7c
        Feb 13 17:19:32.897496 (XEN)    
        Feb 13 17:19:32.897510 (XEN)
        Feb 13 17:19:32.897520 (XEN) ****************************************
        Feb 13 17:19:32.897537 (XEN) Panic on CPU 0:
        Feb 13 17:19:32.905499 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
        Feb 13 17:19:32.905522 (XEN) ****************************************
        Feb 13 17:19:32.913488 (XEN)
        Feb 13 17:19:32.913506 (XEN) Reboot in five seconds...

# HG changeset patch
# User Ian Campbell <[hidden email]>
# Date 1329216241 0
# Node ID 738424a5e5a5053c75cfbe64f6675b5d756daf1b
# Parent  0ba87b95e80bae059fe70b4b117dcc409f2471ef
xen: don't try to print IRQ SSID in IRQ debug from irq context.

It is not possible to call xfree() in that context.

Signed-off-by: Ian Campbell <[hidden email]>

diff -r 0ba87b95e80b -r 738424a5e5a5 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c Mon Feb 13 17:26:08 2012 +0000
+++ b/xen/arch/x86/irq.c Tue Feb 14 10:44:01 2012 +0000
@@ -2026,7 +2026,7 @@ static void dump_irqs(unsigned char key)
         if ( !irq_desc_initialized(desc) || desc->handler == &no_irq_type )
             continue;
 
-        ssid = xsm_show_irq_sid(irq);
+        ssid = in_irq() ? NULL : xsm_show_irq_sid(irq);
 
         spin_lock_irqsave(&desc->lock, flags);
 
@@ -2073,7 +2073,8 @@ static void dump_irqs(unsigned char key)
 
         spin_unlock_irqrestore(&desc->lock, flags);
 
-        xfree(ssid);
+        if ( ssid )
+                xfree(ssid);
     }
 
     dump_ioapic_irq_info();
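
For readers following the patch above, the guard it adds can be sketched
outside the hypervisor. This is a minimal standalone C sketch, not Xen code:
irq_depth, sketch_xmalloc(), sketch_xfree(), sketch_show_irq_sid() and
sketch_dump_irq() are hypothetical stand-ins invented for illustration, and
the stand-in allocator is assumed to reject every call made from IRQ context,
including a free of a NULL pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for in_irq(); the real one reads per-CPU state. */
static int irq_depth;
static int in_irq(void) { return irq_depth > 0; }

/*
 * Stand-ins for the allocator entry points. Assumption: both refuse to
 * run in IRQ context, and the check fires even for a NULL pointer.
 */
static void *sketch_xmalloc(size_t n) { assert(!in_irq()); return malloc(n); }
static void sketch_xfree(void *p)     { assert(!in_irq()); free(p); }

/* Stand-in for the label lookup: returns a heap-allocated string. */
static char *sketch_show_irq_sid(int irq)
{
    char *s = sketch_xmalloc(32);

    if (s)
        snprintf(s, 32, "label-%d", irq);
    return s;
}

/*
 * Formats one line of IRQ debug output. In IRQ context the label is
 * skipped entirely, so neither allocator entry point is reached.
 * Returns 1 if a label was printed, 0 otherwise.
 */
static int sketch_dump_irq(int irq, char *out, size_t len)
{
    char *ssid = in_irq() ? NULL : sketch_show_irq_sid(irq);
    int n = snprintf(out, len, "IRQ:%4d", irq);

    if (ssid && n > 0 && (size_t)n < len)
        snprintf(out + n, len - n, " Z=%s", ssid);

    /* Mirror the patch: only free what was actually allocated. */
    if (ssid)
        sketch_xfree(ssid);

    return ssid != NULL;
}
```

The trade-off is the one the patch accepts: output produced from IRQ context
loses the label, but the rest of the dump gets printed instead of dying on
the assertion.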





Re: [xen-unstable test] 11946: regressions - FAIL

Daniel De Graaf-4
On 02/14/2012 05:44 AM, Ian Campbell wrote:

> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
>> flight 11946 xen-unstable real [real]
>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-i386-xl-credit2    7 debian-install            fail REGR. vs. 11944
>
> Host crash:
> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
>
> This is the debug Andrew Cooper added recently to track down the IRQ
> assertion we've been seeing, sadly it looks like the debug code tries to
> call xfree from interrupt context and therefore doesn't produce full
> output :-(
>
> Or is 24675:d82a1e3d3c65 ("xsm: Add security label to IRQ debug output")
> at fault for adding the xfree in what may be an IRQ context? (are
> keyhandlers run in IRQ context?)

Keyhandlers are not run in IRQ context (or at least, the primary methods of
invoking them don't run there - serial keypress, xl debug-key). The placement
of the xsm call and xfree was to avoid a similar backtrace from attempting
allocation while holding the irq's spinlock.
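
The ordering described here (look up the label before taking the lock, free
it only after dropping the lock) can be illustrated with a small standalone C
sketch. Nothing below is Xen code: struct sketch_lock, locks_held,
sketch_alloc(), sketch_free() and sketch_dump_desc() are hypothetical names,
and the rule that the allocator must not be entered while the lock is held is
modelled as a plain assertion.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an IRQ-safe spinlock: it only tracks state. */
struct sketch_lock { int held; };
static int locks_held;

static void sketch_lock_irqsave(struct sketch_lock *l)
{
    assert(!l->held);
    l->held = 1;
    locks_held++;
}

static void sketch_unlock_irqrestore(struct sketch_lock *l)
{
    assert(l->held);
    l->held = 0;
    locks_held--;
}

/* Allocator stand-ins: assumed to be off limits while such a lock is held. */
static void *sketch_alloc(size_t n) { assert(locks_held == 0); return malloc(n); }
static void sketch_free(void *p)    { assert(locks_held == 0); free(p); }

struct sketch_desc { struct sketch_lock lock; int status; };

/* Dumps one descriptor; returns the snprintf() result. */
static int sketch_dump_desc(struct sketch_desc *desc, int irq,
                            char *out, size_t len)
{
    int n;

    /* 1. Allocate and fill the label before taking the lock. */
    char *label = sketch_alloc(16);

    if (label)
        snprintf(label, 16, "label-%d", irq);

    /* 2. Under the lock, only read descriptor state and format output. */
    sketch_lock_irqsave(&desc->lock);
    n = snprintf(out, len, "IRQ:%4d status=%08x Z=%s",
                 irq, (unsigned)desc->status, label ? label : "-");
    sketch_unlock_irqrestore(&desc->lock);

    /* 3. Free only after the lock has been released. */
    sketch_free(label);
    return n;
}
```

Moving either allocator call between the lock and unlock calls trips the
assertion in this sketch, which is the situation the placement in dump_irqs()
was chosen to avoid.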

> A skanky quick "fix" follows.
>
>         Feb 13 17:17:29.777522 (XEN) *** IRQ BUG found ***
>         Feb 13 17:19:32.594539 (XEN) CPU0 -Testing vector 229 from bitmap 34,48,57,64,72,75,80,83,88,97,104-105,113,120-121,129,136,144,152,160,168,176,184,192,202
>         Feb 13 17:19:32.617515 (XEN) Guest interrupt information:
>         Feb 13 17:19:32.617536 (XEN)    IRQ:   0 affinity:001 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
>         Feb 13 17:19:32.617567 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
>         Feb 13 17:19:32.626489 (XEN) ----[ Xen-4.2-unstable  x86_64  debug=y  Not tainted ]----
>         Feb 13 17:19:32.626512 (XEN) CPU:    0
>         Feb 13 17:19:32.626525 (XEN) RIP:    e008:[<ffff82c48012c842>] xfree+0x33/0x121
>         Feb 13 17:19:32.641496 (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
>         Feb 13 17:19:32.641519 (XEN) rax: ffff82c4802d0800   rbx: ffff8301a7e00080   rcx: 0000000000000000
>         Feb 13 17:19:32.650560 (XEN) rdx: 0000000000000000   rsi: 0000000000000083   rdi: 0000000000000000
>         Feb 13 17:19:32.665510 (XEN) rbp: ffff82c4802afd18   rsp: ffff82c4802afcf8   r8:  0000000000000004
>         Feb 13 17:19:32.665550 (XEN) r9:  0000000000000000   r10: 0000000000000006   r11: ffff82c480224aa0
>         Feb 13 17:19:32.673509 (XEN) r12: ffff8301a7e00580   r13: 0000000000000005   r14: ffff82c4802aff18
>         Feb 13 17:19:32.685503 (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
>         Feb 13 17:19:32.685537 (XEN) cr3: 00000001a7f54000   cr2: 00000000c4b4ee84
>         Feb 13 17:19:32.697505 (XEN) ds: 007b   es: 007b   fs: 00d8   gs: 0000   ss: 0000   cs: e008
>         Feb 13 17:19:32.697540 (XEN) Xen stack trace from rsp=ffff82c4802afcf8:
>         Feb 13 17:19:32.706513 (XEN)    ffff8301a7e00080 ffff8301a7e00580 0000000000000005 ffff82c4802aff18
>         Feb 13 17:19:32.721495 (XEN)    ffff82c4802afd88 ffff82c4801658ee ffff82c4802afd38 ffff82c48010098a
>         Feb 13 17:19:32.721531 (XEN)    00000400802afd68 0000000000000083 ffff8301a7e000a8 0000000000000000
>         Feb 13 17:19:32.729495 (XEN)    00000000fffffffa 00000000000000e5 ffff8301a7e00580 0000000000000005
>         Feb 13 17:19:32.738490 (XEN)    ffff82c4802aff18 ffff8301a7e005a8 ffff82c4802afe28 ffff82c480167781
>         Feb 13 17:19:32.738515 (XEN)    ffff8301a7ece000 ffff82c4802afde8 0000000000000000 ffff82c4802aff18
>         Feb 13 17:19:32.750497 (XEN)    ffff82c4802aff18 0000000000000002 ffff82c4802aff18 ffff82c4802fa060
>         Feb 13 17:19:32.762568 (XEN)    000000e500000000 ffff82c4802fa060 ffff82c4802afe08 ffff82c48017bd51
>         Feb 13 17:19:32.762596 (XEN)    ffff82c4802aff18 ffff82c4802aff18 ffff82c48025e380 ffff82c4802aff18
>         Feb 13 17:19:32.773513 (XEN)    00000000ffffffff 0000000000000002 00007d3b7fd501a7 ffff82c4801525d0
>         Feb 13 17:19:32.785503 (XEN)    0000000000000002 00000000ffffffff ffff82c4802aff18 ffff82c48025e380
>         Feb 13 17:19:32.785539 (XEN)    ffff82c4802afee0 ffff82c4802aff18 0000001863058413 00000000000c0000
>         Feb 13 17:19:32.794514 (XEN)    000000000e1ff99c 000000000000c701 ffff82c4802f9a90 0000000000000000
>         Feb 13 17:19:32.809503 (XEN)    0000000000000000 ffff8301a7f5dc80 0000000000000000 0000002000000000
>         Feb 13 17:19:32.809529 (XEN)    ffff82c4801581a9 000000000000e008 0000000000000246 ffff82c4802afee0
>         Feb 13 17:19:32.814513 (XEN)    0000000000000000 ffff82c4802aff10 ffff82c48015a647 0000000000000000
>         Feb 13 17:19:32.829506 (XEN)    ffff8300d7cfb000 ffff8300d7af9000 0000000000000000 ffff82c4802afd88
>         Feb 13 17:19:32.829549 (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>         Feb 13 17:19:32.841510 (XEN)    00000000dfc91f90 00000000deadbeef 0000000000000000 0000000000000000
>         Feb 13 17:19:32.853508 (XEN)    0000000000000000 0000000000000000 0000000000000000 00000000deadbeef
>         Feb 13 17:19:32.858496 (XEN) Xen call trace:
>         Feb 13 17:19:32.858518 (XEN)    [<ffff82c48012c842>] xfree+0x33/0x121
>         Feb 13 17:19:32.858547 (XEN)    [<ffff82c4801658ee>] dump_irqs+0x2a3/0x2ca
>         Feb 13 17:19:32.870500 (XEN)    [<ffff82c480167781>] smp_irq_move_cleanup_interrupt+0x303/0x37b
>         Feb 13 17:19:32.870554 (XEN)    [<ffff82c4801525d0>] irq_move_cleanup_interrupt+0x30/0x40
>         Feb 13 17:19:32.885510 (XEN)    [<ffff82c4801581a9>] default_idle+0x99/0x9e
>         Feb 13 17:19:32.885541 (XEN)    [<ffff82c48015a647>] idle_loop+0x6c/0x7c
>         Feb 13 17:19:32.897496 (XEN)    
>         Feb 13 17:19:32.897510 (XEN)
>         Feb 13 17:19:32.897520 (XEN) ****************************************
>         Feb 13 17:19:32.897537 (XEN) Panic on CPU 0:
>         Feb 13 17:19:32.905499 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
>         Feb 13 17:19:32.905522 (XEN) ****************************************
>         Feb 13 17:19:32.913488 (XEN)
>         Feb 13 17:19:32.913506 (XEN) Reboot in five seconds...
>
> # HG changeset patch
> # User Ian Campbell <[hidden email]>
> # Date 1329216241 0
> # Node ID 738424a5e5a5053c75cfbe64f6675b5d756daf1b
> # Parent  0ba87b95e80bae059fe70b4b117dcc409f2471ef
> xen: don't try to print IRQ SSID in IRQ debug from irq context.
>
> It is not possible to call xfree() in that context.
>
> Signed-off-by: Ian Campbell <[hidden email]>
>
> diff -r 0ba87b95e80b -r 738424a5e5a5 xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c Mon Feb 13 17:26:08 2012 +0000
> +++ b/xen/arch/x86/irq.c Tue Feb 14 10:44:01 2012 +0000
> @@ -2026,7 +2026,7 @@ static void dump_irqs(unsigned char key)
>          if ( !irq_desc_initialized(desc) || desc->handler == &no_irq_type )
>              continue;
>  
> -        ssid = xsm_show_irq_sid(irq);
> +        ssid = in_irq() ? NULL : xsm_show_irq_sid(irq);
>  
>          spin_lock_irqsave(&desc->lock, flags);
>  
> @@ -2073,7 +2073,8 @@ static void dump_irqs(unsigned char key)
>  
>          spin_unlock_irqrestore(&desc->lock, flags);
>  
> -        xfree(ssid);
> +        if ( ssid )
> +                xfree(ssid);
>      }
>  
>      dump_ioapic_irq_info();
>
>
>



Re: [xen-unstable test] 11946: regressions - FAIL

Ian Campbell-10
On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:

> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
> > flight 11946 xen-unstable real [real]
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-i386-xl-credit2    7 debian-install            fail REGR. vs. 11944
>
> Host crash:
> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
>
> This is the debug Andrew Cooper added recently to track down the IRQ
> assertion we've been seeing, sadly it looks like the debug code tries to
> call xfree from interrupt context and therefore doesn't produce full
> output :-(

Are we still seeing the issue this debugging was intended to address? We
don't seem to be seeing the host crashes any more. Should the debug code
be patched up as in the following patch? Otherwise, when we do see it, it
doesn't end up printing any useful info.

Someone recently reported bugs.debian.org/665433 to Debian. Is this the
same underlying issue? That report is with Xen 4.0, FWIW.

> Or is 24675:d82a1e3d3c65 ("xsm: Add security label to IRQ debug output")
> at fault for adding the xfree in what may be an IRQ context? (are
> keyhandlers run in IRQ context?)
>
> A skanky quick "fix" follows.
>
>         Feb 13 17:17:29.777522 (XEN) *** IRQ BUG found ***
>         Feb 13 17:19:32.594539 (XEN) CPU0 -Testing vector 229 from bitmap 34,48,57,64,72,75,80,83,88,97,104-105,113,120-121,129,136,144,152,160,168,176,184,192,202
>         Feb 13 17:19:32.617515 (XEN) Guest interrupt information:
>         Feb 13 17:19:32.617536 (XEN)    IRQ:   0 affinity:001 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
>         Feb 13 17:19:32.617567 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
>         Feb 13 17:19:32.626489 (XEN) ----[ Xen-4.2-unstable  x86_64  debug=y  Not tainted ]----
>         Feb 13 17:19:32.626512 (XEN) CPU:    0
>         Feb 13 17:19:32.626525 (XEN) RIP:    e008:[<ffff82c48012c842>] xfree+0x33/0x121
>         Feb 13 17:19:32.641496 (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
>         Feb 13 17:19:32.641519 (XEN) rax: ffff82c4802d0800   rbx: ffff8301a7e00080   rcx: 0000000000000000
>         Feb 13 17:19:32.650560 (XEN) rdx: 0000000000000000   rsi: 0000000000000083   rdi: 0000000000000000
>         Feb 13 17:19:32.665510 (XEN) rbp: ffff82c4802afd18   rsp: ffff82c4802afcf8   r8:  0000000000000004
>         Feb 13 17:19:32.665550 (XEN) r9:  0000000000000000   r10: 0000000000000006   r11: ffff82c480224aa0
>         Feb 13 17:19:32.673509 (XEN) r12: ffff8301a7e00580   r13: 0000000000000005   r14: ffff82c4802aff18
>         Feb 13 17:19:32.685503 (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
>         Feb 13 17:19:32.685537 (XEN) cr3: 00000001a7f54000   cr2: 00000000c4b4ee84
>         Feb 13 17:19:32.697505 (XEN) ds: 007b   es: 007b   fs: 00d8   gs: 0000   ss: 0000   cs: e008
>         Feb 13 17:19:32.697540 (XEN) Xen stack trace from rsp=ffff82c4802afcf8:
>         Feb 13 17:19:32.706513 (XEN)    ffff8301a7e00080 ffff8301a7e00580 0000000000000005 ffff82c4802aff18
>         Feb 13 17:19:32.721495 (XEN)    ffff82c4802afd88 ffff82c4801658ee ffff82c4802afd38 ffff82c48010098a
>         Feb 13 17:19:32.721531 (XEN)    00000400802afd68 0000000000000083 ffff8301a7e000a8 0000000000000000
>         Feb 13 17:19:32.729495 (XEN)    00000000fffffffa 00000000000000e5 ffff8301a7e00580 0000000000000005
>         Feb 13 17:19:32.738490 (XEN)    ffff82c4802aff18 ffff8301a7e005a8 ffff82c4802afe28 ffff82c480167781
>         Feb 13 17:19:32.738515 (XEN)    ffff8301a7ece000 ffff82c4802afde8 0000000000000000 ffff82c4802aff18
>         Feb 13 17:19:32.750497 (XEN)    ffff82c4802aff18 0000000000000002 ffff82c4802aff18 ffff82c4802fa060
>         Feb 13 17:19:32.762568 (XEN)    000000e500000000 ffff82c4802fa060 ffff82c4802afe08 ffff82c48017bd51
>         Feb 13 17:19:32.762596 (XEN)    ffff82c4802aff18 ffff82c4802aff18 ffff82c48025e380 ffff82c4802aff18
>         Feb 13 17:19:32.773513 (XEN)    00000000ffffffff 0000000000000002 00007d3b7fd501a7 ffff82c4801525d0
>         Feb 13 17:19:32.785503 (XEN)    0000000000000002 00000000ffffffff ffff82c4802aff18 ffff82c48025e380
>         Feb 13 17:19:32.785539 (XEN)    ffff82c4802afee0 ffff82c4802aff18 0000001863058413 00000000000c0000
>         Feb 13 17:19:32.794514 (XEN)    000000000e1ff99c 000000000000c701 ffff82c4802f9a90 0000000000000000
>         Feb 13 17:19:32.809503 (XEN)    0000000000000000 ffff8301a7f5dc80 0000000000000000 0000002000000000
>         Feb 13 17:19:32.809529 (XEN)    ffff82c4801581a9 000000000000e008 0000000000000246 ffff82c4802afee0
>         Feb 13 17:19:32.814513 (XEN)    0000000000000000 ffff82c4802aff10 ffff82c48015a647 0000000000000000
>         Feb 13 17:19:32.829506 (XEN)    ffff8300d7cfb000 ffff8300d7af9000 0000000000000000 ffff82c4802afd88
>         Feb 13 17:19:32.829549 (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>         Feb 13 17:19:32.841510 (XEN)    00000000dfc91f90 00000000deadbeef 0000000000000000 0000000000000000
>         Feb 13 17:19:32.853508 (XEN)    0000000000000000 0000000000000000 0000000000000000 00000000deadbeef
>         Feb 13 17:19:32.858496 (XEN) Xen call trace:
>         Feb 13 17:19:32.858518 (XEN)    [<ffff82c48012c842>] xfree+0x33/0x121
>         Feb 13 17:19:32.858547 (XEN)    [<ffff82c4801658ee>] dump_irqs+0x2a3/0x2ca
>         Feb 13 17:19:32.870500 (XEN)    [<ffff82c480167781>] smp_irq_move_cleanup_interrupt+0x303/0x37b
>         Feb 13 17:19:32.870554 (XEN)    [<ffff82c4801525d0>] irq_move_cleanup_interrupt+0x30/0x40
>         Feb 13 17:19:32.885510 (XEN)    [<ffff82c4801581a9>] default_idle+0x99/0x9e
>         Feb 13 17:19:32.885541 (XEN)    [<ffff82c48015a647>] idle_loop+0x6c/0x7c
>         Feb 13 17:19:32.897496 (XEN)    
>         Feb 13 17:19:32.897510 (XEN)
>         Feb 13 17:19:32.897520 (XEN) ****************************************
>         Feb 13 17:19:32.897537 (XEN) Panic on CPU 0:
>         Feb 13 17:19:32.905499 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
>         Feb 13 17:19:32.905522 (XEN) ****************************************
>         Feb 13 17:19:32.913488 (XEN)
>         Feb 13 17:19:32.913506 (XEN) Reboot in five seconds...
>
> # HG changeset patch
> # User Ian Campbell <[hidden email]>
> # Date 1329216241 0
> # Node ID 738424a5e5a5053c75cfbe64f6675b5d756daf1b
> # Parent  0ba87b95e80bae059fe70b4b117dcc409f2471ef
> xen: don't try to print IRQ SSID in IRQ debug from irq context.
>
> It is not possible to call xfree() in that context.
>
> Signed-off-by: Ian Campbell <[hidden email]>
>
> diff -r 0ba87b95e80b -r 738424a5e5a5 xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c Mon Feb 13 17:26:08 2012 +0000
> +++ b/xen/arch/x86/irq.c Tue Feb 14 10:44:01 2012 +0000
> @@ -2026,7 +2026,7 @@ static void dump_irqs(unsigned char key)
>          if ( !irq_desc_initialized(desc) || desc->handler == &no_irq_type )
>              continue;
>  
> -        ssid = xsm_show_irq_sid(irq);
> +        ssid = in_irq() ? NULL : xsm_show_irq_sid(irq);
>  
>          spin_lock_irqsave(&desc->lock, flags);
>  
> @@ -2073,7 +2073,8 @@ static void dump_irqs(unsigned char key)
>  
>          spin_unlock_irqrestore(&desc->lock, flags);
>  
> -        xfree(ssid);
> +        if ( ssid )
> +                xfree(ssid);
>      }
>  
>      dump_ioapic_irq_info();
>
>
>
>
> _______________________________________________
> Xen-devel mailing list
> [hidden email]
> http://lists.xensource.com/xen-devel



_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xen.org/xen-devel

Re: [xen-unstable test] 11946: regressions - FAIL

Jan Beulich-2
>>> On 27.03.12 at 12:36, Ian Campbell <[hidden email]> wrote:
>> # HG changeset patch
>> # User Ian Campbell <[hidden email]>
>> # Date 1329216241 0
>> # Node ID 738424a5e5a5053c75cfbe64f6675b5d756daf1b
>> # Parent  0ba87b95e80bae059fe70b4b117dcc409f2471ef
>> xen: don't try to print IRQ SSID in IRQ debug from irq context.
>>
>> It is not possible to call xfree() in that context.
>>
>> Signed-off-by: Ian Campbell <[hidden email]>
>>
>> diff -r 0ba87b95e80b -r 738424a5e5a5 xen/arch/x86/irq.c
>> --- a/xen/arch/x86/irq.c Mon Feb 13 17:26:08 2012 +0000
>> +++ b/xen/arch/x86/irq.c Tue Feb 14 10:44:01 2012 +0000
>> @@ -2026,7 +2026,7 @@ static void dump_irqs(unsigned char key)
>>          if ( !irq_desc_initialized(desc) || desc->handler == &no_irq_type )
>>              continue;
>>  
>> -        ssid = xsm_show_irq_sid(irq);
>> +        ssid = in_irq() ? NULL : xsm_show_irq_sid(irq);
>>  
>>          spin_lock_irqsave(&desc->lock, flags);
>>  
>> @@ -2073,7 +2073,8 @@ static void dump_irqs(unsigned char key)
>>  
>>          spin_unlock_irqrestore(&desc->lock, flags);
>>  
>> -        xfree(ssid);
>> +        if ( ssid )
>> +                xfree(ssid);

But perhaps xfree(NULL) should be made usable in any context (i.e.
the assertion in there moved down)? Otherwise the construct above
is likely to get collapsed again at some point with "xfree(NULL) is
perfectly valid" in mind.

Jan

>>      }
>>  
>>      dump_ioapic_irq_info();
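
To illustrate the suggestion above, here is a minimal sketch of an xfree()-like function in which the NULL check comes before the context assertion. It is not the code from xmalloc_tlsf.c: `xfree_sketch`, `mock_in_irq` and `nr_freed` are hypothetical names, and the C library's free() stands in for the real allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for Xen's in_irq(): a plain flag, so that the
 * sketch can be compiled and exercised outside the hypervisor. */
static bool mock_in_irq = false;
static bool in_irq(void) { return mock_in_irq; }

/* Counts real frees; only here so the behaviour can be checked. */
static unsigned int nr_freed = 0;

/* Sketch of an xfree()-like function with the assertion "moved down":
 * the NULL check runs first, so the context assertion can only fire
 * when there is actually something to free. */
static void xfree_sketch(void *p)
{
    if ( p == NULL )
        return;             /* nothing to do: harmless in any context */

    assert(!in_irq());      /* a real free must not run in IRQ context */
    free(p);                /* stand-in for the allocator's free path */
    nr_freed++;
}
```

With this ordering a caller could pass NULL from interrupt context without tripping the assertion, so the `if ( ssid )` guard in dump_irqs() would no longer be the only thing standing between the debug code and a panic.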




Re: [xen-unstable test] 11946: regressions - FAIL

AP Xen
In reply to this post by Ian Campbell-10
On Tue, Mar 27, 2012 at 3:36 AM, Ian Campbell <[hidden email]> wrote:

> On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:
>> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
>> > flight 11946 xen-unstable real [real]
>> > http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
>> >
>> > Regressions :-(
>> >
>> > Tests which did not succeed and are blocking,
>> > including tests which could not be run:
>> >  test-amd64-i386-xl-credit2    7 debian-install            fail REGR. vs. 11944
>>
>> Host crash:
>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
>>
>> This is the debug Andrew Cooper added recently to track down the IRQ
>> assertion we've been seeing, sadly it looks like the debug code tries to
>> call xfree from interrupt context and therefore doesn't produce full
>> output :-(
>
> Are we still seeing the issue this debugging was intended to address? We
> don't seem to be seeing the host crashes any more. Should the debug code
> be patched up as in the following patch, otherwise when we do see it it
> doesn't end up printing any useful info.
>
> Someone recently reported bugs.debian.org/665433 to Debian, is this the
> same underlying issue? That report is with Xen 4.0 FWIW.

I hit the assertion that the debugging code introduced (xen-unstable
25256:9dda0efd8ce1). Can the fix to the debugging code be checked in,
at least until the original issue has been fixed?

Thanks,
AP

(XEN) *** IRQ BUG found ***
(XEN) CPU0 -Testing vector 236 from bitmap 41,47,49,57,64,72,80,88,96,100,104,120,136,152,160-161,168,171,192,200-201,208
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:01 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
(XEN) ----[ Xen-4.2-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48012cefb>] xfree+0x33/0x118
(XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830214ac0080   rcx: 0000000000000000
(XEN) rdx: ffff82c4802d8880   rsi: 0000000000000083   rdi: 0000000000000000
(XEN) rbp: ffff82c4802b7c78   rsp: ffff82c4802b7c58   r8:  0000000000000004
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000010
(XEN) r12: ffff830214ac0c80   r13: 000000000000000c   r14: ffff830214ac0ca8
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000426f0
(XEN) cr3: 0000000168971000   cr2: 0000000001095e00
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802b7c58:
(XEN)    ffff830214ac0080 ffff830214ac0c80 000000000000000c ffff830214ac0ca8
(XEN)    ffff82c4802b7ce8 ffff82c4801664d4 ffff82c4802e214a ffff82c400000020
(XEN)    ffff82c4802b7cf8 0000000000000083 ffff830214ac00a8 0000000000000000
(XEN)    00000000000000ec 00000000000000ec ffff830214ac0c80 000000000000000c
(XEN)    ffff830214ac0ca8 ffff82c480302760 ffff82c4802b7d58 ffff82c480168000
(XEN)    ffff82c4802b7f18 ffff82c4802b7f18 000000ec00000000 ffff82c4802b7f18
(XEN)    0000000000000000 0000000000000000 ffff82c480302324 0000000000000020
(XEN)    ffff82c4802b7dd8 0000000000000003 0000000000000000 0000000000000000
(XEN)    ffff82c4802b7dc8 ffff82c4801683d3 ffff8300da991000 ffff8300da996000
(XEN)    0000000000000000 ffffffff802b7d90 ffff82c480159160 ffff82c4802b7e20
(XEN)    ffff82c48015d7db ffff82c4802b7f18 ffff8300da991000 0000000000000003
(XEN)    0000000000000000 0000000000000000 00007d3b7fd48207 ffff82c480160426
(XEN)    0000000000000000 0000000000000000 0000000000000003 ffff8300da991000
(XEN)    ffff82c4802b7ef8 ffff82c4802b7f18 0000000000000282 ffff82c4802319a0
(XEN)    00000000deadbeef 0000000000000000 ffff83021c0b8081 0000000000000000
(XEN)    0000000000000048 ffff8801d7227ec0 ffff8300da991000 0000002000000000
(XEN)    ffff82c4801865c1 000000000000e008 0000000000000202 ffff82c4802b7e88
(XEN)    000000000000e010 0000000000000003 ffff82c4802b7ef8 ffff82c4802230d8
(XEN)    ffff82c4802b7f18 0000000000000000 0000000000000246 ffffffff810013aa
(XEN)    0000000000000000 ffffffff810013aa 000000000000e030 0000000000000246
(XEN) Xen call trace:
(XEN)    [<ffff82c48012cefb>] xfree+0x33/0x118
(XEN)    [<ffff82c4801664d4>] dump_irqs+0x2a4/0x2e8
(XEN)    [<ffff82c480168000>] irq_move_cleanup_interrupt+0x29f/0x2db
(XEN)    [<ffff82c4801683d3>] do_IRQ+0x9e/0x5a4
(XEN)    [<ffff82c480160426>] common_interrupt+0x26/0x30
(XEN)    [<ffff82c4801865c1>] async_exception_cleanup+0x1/0x35a
(XEN)    [<ffff82c480228438>] syscall_enter+0xc8/0x122
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...


Re: [xen-unstable test] 11946: regressions - FAIL

Andrew Cooper
On 04/05/12 20:48, AP wrote:

> On Tue, Mar 27, 2012 at 3:36 AM, Ian Campbell <[hidden email]> wrote:
>> On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:
>>> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
>>>> flight 11946 xen-unstable real [real]
>>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>  test-amd64-i386-xl-credit2    7 debian-install            fail REGR. vs. 11944
>>> Host crash:
>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
>>>
>>> This is the debug Andrew Cooper added recently to track down the IRQ
>>> assertion we've been seeing, sadly it looks like the debug code tries to
>>> call xfree from interrupt context and therefore doesn't produce full
>>> output :-(
>> Are we still seeing the issue this debugging was intended to address? We
>> don't seem to be seeing the host crashes any more. Should the debug code
>> be patched up as in the following patch, otherwise when we do see it it
>> doesn't end up printing any useful info.
>>
>> Someone recently reported bugs.debian.org/665433 to Debian, is this the
>> same underlying issue? That report is with Xen 4.0 FWIW.
> I saw the issue (xen-unstable 25256:9dda0efd8ce1) that the debugging
> code added. Can the fix to the debugging code be checked in until the
> original issue has been fixed?
>
> Thanks,
> AP
>
The attached patch should prevent this panic, allowing for all the debug
information to be printed to the console.

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


Attachment: irq-fix-dump_irqs.patch (1K)

Re: [xen-unstable test] 11946: regressions - FAIL

AP Xen
On Fri, May 4, 2012 at 8:11 PM, Andrew Cooper <[hidden email]> wrote:
>
> On 04/05/12 20:48, AP wrote:
> > On Tue, Mar 27, 2012 at 3:36 AM, Ian Campbell <[hidden email]> wrote:
> >> On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:
> >>> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
> >>>> flight 11946 xen-unstable real [real]
> >>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
> >>>>
> >>>> Regressions :-(
> >>>>
> >>>> Tests which did not succeed and are blocking,
> >>>> including tests which could not be run:
> >>>>  test-amd64-i386-xl-credit2    7 debian-install            fail REGR. vs. 11944
> >>> Host crash:
> >>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
> >>>
> >>> This is the debug Andrew Cooper added recently to track down the IRQ
> >>> assertion we've been seeing, sadly it looks like the debug code tries to
> >>> call xfree from interrupt context and therefore doesn't produce full
> >>> output :-(
> >> Are we still seeing the issue this debugging was intended to address? We
> >> don't seem to be seeing the host crashes any more. Should the debug code
> >> be patched up as in the following patch, otherwise when we do see it it
> >> doesn't end up printing any useful info.
> >>
> >> Someone recently reported bugs.debian.org/665433 to Debian, is this the
> >> same underlying issue? That report is with Xen 4.0 FWIW.
> > I saw the issue (xen-unstable 25256:9dda0efd8ce1) that the debugging
> > code added. Can the fix to the debugging code be checked in until the
> > original issue has been fixed?
> >
> > Thanks,
> > AP
> >
> The attached patch should prevent this panic, allowing for all the debug
> information to be printed to the console.

Thanks, that fixed it. Here is what I see now:

(XEN) *** IRQ BUG found ***
(XEN) CPU0 -Testing vector 236 from bitmap 37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:01 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:01 vec:d3 type=IO-APIC-edge    status=00000030 in-flight=0 domain-list=0:  1(-S--),
(XEN)    IRQ:   2 affinity:ff vec:e2 type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:01 vec:40 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:01 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   5 affinity:01 vec:50 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   6 affinity:01 vec:58 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:01 vec:60 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   8 affinity:08 vec:29 type=IO-APIC-edge    status=00000030 in-flight=0 domain-list=0:  8(-S--),
(XEN)    IRQ:   9 affinity:02 vec:25 type=IO-APIC-level   status=00000030 in-flight=0 domain-list=0:  9(-S--),
(XEN)    IRQ:  10 affinity:01 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:01 vec:88 type=IO-APIC-edge    status=00000002 mapped, unbound
[ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?

Let me know if you need any more info.
Thanks,
AP



Re: [xen-unstable test] 11946: regressions - FAIL

Ian Campbell-10
In reply to this post by Andrew Cooper
On Fri, 2012-05-04 at 21:11 +0100, Andrew Cooper wrote:

> On 04/05/12 20:48, AP wrote:
> > On Tue, Mar 27, 2012 at 3:36 AM, Ian Campbell <[hidden email]> wrote:
> >> On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:
> >>> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
> >>>> flight 11946 xen-unstable real [real]
> >>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
> >>>>
> >>>> Regressions :-(
> >>>>
> >>>> Tests which did not succeed and are blocking,
> >>>> including tests which could not be run:
> >>>>  test-amd64-i386-xl-credit2    7 debian-install            fail REGR. vs. 11944
> >>> Host crash:
> >>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
> >>>
> >>> This is the debug Andrew Cooper added recently to track down the IRQ
> >>> assertion we've been seeing, sadly it looks like the debug code tries to
> >>> call xfree from interrupt context and therefore doesn't produce full
> >>> output :-(
> >> Are we still seeing the issue this debugging was intended to address? We
> >> don't seem to be seeing the host crashes any more. Should the debug code
> >> be patched up as in the following patch, otherwise when we do see it it
> >> doesn't end up printing any useful info.
> >>
> >> Someone recently reported bugs.debian.org/665433 to Debian, is this the
> >> same underlying issue? That report is with Xen 4.0 FWIW.
> > I saw the issue (xen-unstable 25256:9dda0efd8ce1) that the debugging
> > code added. Can the fix to the debugging code be checked in until the
> > original issue has been fixed?
> >
> > Thanks,
> > AP
> >
> The attached patch should prevent this panic

This is effectively the same as my patch from
<[hidden email]>. I think "if (ssid)
xfree(...)" is preferable to "if (in_irq()) xfree(...)" but not enough
to prevent me:

Acked-by: Ian Campbell <[hidden email]>

If the debug code is going to stay for 4.2 then IMHO we should also take
this patch to make it actually useful. Otherwise we should just revert
the original debug patch before the release.
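
For reference, the caller-side pattern being discussed (skip the allocation in IRQ context, then free only what was actually allocated) can be sketched as below. This is an illustration under assumptions, not the dump_irqs() code itself: `show_irq_sid_sketch` is a hypothetical stand-in for xsm_show_irq_sid(), `mock_in_irq` replaces the real in_irq(), and malloc()/free() replace the Xen allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for Xen's in_irq(). */
static bool mock_in_irq = false;
static bool in_irq(void) { return mock_in_irq; }

/* Counters, only here so the behaviour can be checked. */
static unsigned int nr_allocs = 0, nr_frees = 0;

/* Hypothetical stand-in for xsm_show_irq_sid(): returns a heap string
 * that the caller must free, or NULL on allocation failure. */
static char *show_irq_sid_sketch(int irq)
{
    char *s = malloc(32);

    if ( s != NULL )
    {
        snprintf(s, 32, "sid-for-irq-%d", irq);
        nr_allocs++;
    }
    return s;
}

/* One iteration of a dump loop: the label is optional, and the free is
 * keyed on the pointer rather than on the calling context. */
static void dump_one_irq_sketch(int irq)
{
    char *ssid = in_irq() ? NULL : show_irq_sid_sketch(irq);

    printf("IRQ:%4d", irq);
    if ( ssid != NULL )
        printf(" ssid=%s", ssid);
    printf("\n");

    if ( ssid != NULL )
    {
        free(ssid);
        nr_frees++;
    }
}
```

Keying the free on `ssid` means the release path follows whatever the allocation path did, including an allocation failure, without re-checking the context a second time.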




Re: [xen-unstable test] 11946: regressions - FAIL

Andrew Cooper
In reply to this post by AP Xen

> Thanks, that fixed it. Here is what I see now:
>
> (XEN) *** IRQ BUG found ***
> (XEN) CPU0 -Testing vector 236 from bitmap 37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
> (XEN) Guest interrupt information:
> (XEN)    IRQ:   0 affinity:01 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
> (XEN)    IRQ:   1 affinity:01 vec:d3 type=IO-APIC-edge    status=00000030 in-flight=0 domain-list=0:  1(-S--),
> (XEN)    IRQ:   2 affinity:ff vec:e2 type=XT-PIC          status=00000000 mapped, unbound
> (XEN)    IRQ:   3 affinity:01 vec:40 type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   4 affinity:01 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   5 affinity:01 vec:50 type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   6 affinity:01 vec:58 type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   7 affinity:01 vec:60 type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   8 affinity:08 vec:29 type=IO-APIC-edge    status=00000030 in-flight=0 domain-list=0:  8(-S--),
> (XEN)    IRQ:   9 affinity:02 vec:25 type=IO-APIC-level   status=00000030 in-flight=0 domain-list=0:  9(-S--),
> (XEN)    IRQ:  10 affinity:01 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  11 affinity:01 vec:88 type=IO-APIC-edge    status=00000002 mapped, unbound
> [ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?
>
> Let me know if you need any more info.
> Thanks,
> AP
>

There should be quite a lot more IRQ information dumped than just that.
Was there any more on the console, or had it given up by that point? It
might be worth enabling the synchronous console to get all of that
debug information.

How easy is this error to reproduce for you? I never managed to
reproduce it reliably enough to be able to debug it.

If you could provide your Xen boot console log, that would be very useful.

~Andrew


Re: [xen-unstable test] 11946: regressions - FAIL

Andrew Cooper
In reply to this post by Ian Campbell-10
>> The attached patch should prevent this panic
> This is effectively the same as my patch from
> <[hidden email]>. I think "if (ssid)
> xfree(...)" is preferable to "if (in_irq()) xfree(...)" but not enough
> to prevent me:
>
> Acked-by: Ian Campbell <[hidden email]>
>
> If the debug code is going to stay for 4.2 then IMHO we should also take
> this patch to make it actually useful. Otherwise we should just revert
> the original debug patch before the release.
>
>

Yes - I was thinking the same.  I suggest that when xen-4.2-testing.hg
gets branched off unstable, this debugging gets put back to just being
an assert as before.  However, I am quite unsure as to what would happen
with interrupts following that failed assert.

I shall re-do the patch.  I think it is a fairly sensible patch to have
in even after the main debugging has been removed, especially if similar
debugging needs to be done in the future.

~Andrew


Re: [xen-unstable test] 11946: regressions - FAIL

AP Xen
In reply to this post by Andrew Cooper
On Sat, May 5, 2012 at 4:04 AM, Andrew Cooper <[hidden email]> wrote:
>
>
> > Thanks, that fixed it. Here is what I see now:
> >
> > (XEN) *** IRQ BUG found ***
> > (XEN) CPU0 -Testing vector 236 from bitmap
> > 37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
> > (XEN) Guest interrupt information:
> > (XEN)    IRQ:   0 affinity:01 vec:f0 type=IO-APIC-edge
> > status=00000000 mapped, unbound
> > (XEN)    IRQ:   1 affinity:01 vec:d3 type=IO-APIC-edge
> > status=00000030 in-flight=0 domain-list=0:  1(-S--),
> > (XEN)    IRQ:   2 affinity:ff vec:e2 type=XT-PIC
> > status=00000000 mapped, unbound
> > (XEN)    IRQ:   3 affinity:01 vec:40 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN)    IRQ:   4 affinity:01 vec:48 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN)    IRQ:   5 affinity:01 vec:50 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN)    IRQ:   6 affinity:01 vec:58 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN)    IRQ:   7 affinity:01 vec:60 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN)    IRQ:   8 affinity:08 vec:29 type=IO-APIC-edge
> > status=00000030 in-flight=0 domain-list=0:  8(-S--),
> > (XEN)    IRQ:   9 affinity:02 vec:25 type=IO-APIC-level
> > status=00000030 in-flight=0 domain-list=0:  9(-S--),
> > (XEN)    IRQ:  10 affinity:01 vec:78 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN)    IRQ:  11 affinity:01 vec:88 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > [ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer
> > elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?
> >
> > Let me know if you need any more info.
> > Thanks,
> > AP
> >
>
> There should be quite a lot more irq information dumped than just that.
> Was there any more on the console or had it given up by that point? It

There was nothing more on the console. The system was hung.

> might be worth trying to set synchronous console to get all of that
> debug information?

I was running with sync_console and console_to_ring options.

> How easy is this error to reproduce for you? I never managed to
> reproduce it reliably enough to be able to debug?

I cannot reproduce it easily either. 

> If you could provide your Xen boot console log, that would be very useful

I will send full logs the next time I see the problem.

Thanks,
AP


Re: [xen-unstable test] 11946: regressions - FAIL

AP Xen
On Sat, May 5, 2012 at 11:41 AM, AP <[hidden email]> wrote:
>
> On Sat, May 5, 2012 at 4:04 AM, Andrew Cooper <[hidden email]> wrote:
> >
> >
> > > Thanks, that fixed it. Here is what I see now:
> > >
> > > (XEN) *** IRQ BUG found ***
> > > (XEN) CPU0 -Testing vector 236 from bitmap
> > > 37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
> > > (XEN) Guest interrupt information:
> > > (XEN)    IRQ:   0 affinity:01 vec:f0 type=IO-APIC-edge
> > > status=00000000 mapped, unbound
> > > (XEN)    IRQ:   1 affinity:01 vec:d3 type=IO-APIC-edge
> > > status=00000030 in-flight=0 domain-list=0:  1(-S--),
> > > (XEN)    IRQ:   2 affinity:ff vec:e2 type=XT-PIC
> > > status=00000000 mapped, unbound
> > > (XEN)    IRQ:   3 affinity:01 vec:40 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN)    IRQ:   4 affinity:01 vec:48 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN)    IRQ:   5 affinity:01 vec:50 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN)    IRQ:   6 affinity:01 vec:58 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN)    IRQ:   7 affinity:01 vec:60 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN)    IRQ:   8 affinity:08 vec:29 type=IO-APIC-edge
> > > status=00000030 in-flight=0 domain-list=0:  8(-S--),
> > > (XEN)    IRQ:   9 affinity:02 vec:25 type=IO-APIC-level
> > > status=00000030 in-flight=0 domain-list=0:  9(-S--),
> > > (XEN)    IRQ:  10 affinity:01 vec:78 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN)    IRQ:  11 affinity:01 vec:88 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > [ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer
> > > elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?
> > >
> > > Let me know if you need any more info.
> > > Thanks,
> > > AP
> > >
> >
> > There should be quite a lot more irq information dumped than just that.
> > Was there any more on the console or had it given up by that point? It
>
> There was nothing more on the console. The system was hung.
>
> > might be worth trying to set synchronous console to get all of that
> > debug information?
>
> I was running with sync_console and console_to_ring options.
>
>
> > How easy is this error to reproduce for you? I never managed to
> > reproduce it reliably enough to be able to debug?
>
> I cannot reproduce it easily either. 
>
>
> > If you could provide your Xen boot console log, that would be very useful
>
> I will send full logs the next time I see the problem.

I have attached the full logs. I had a CentOS 5.6 and a Windows 7 HVM domain running.

Thanks,
AP



irq_bug.log (96K) Download Attachment

Re: [xen-unstable test] 11946: regressions - FAIL

Jan Beulich-2
In reply to this post by AP Xen
>>> On 05.05.12 at 02:21, AP <[hidden email]> wrote:
> (XEN) *** IRQ BUG found ***
> (XEN) CPU0 -Testing vector 236 from bitmap

236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
in through the 8259A. Something fundamentally fishy must be going
on here, and I would suppose the code in question shouldn't even be
reached for legacy vectors.
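
The arithmetic above can be sketched in C. This is only an illustration: the value 0xe0 for FIRST_LEGACY_VECTOR is inferred from "0xec = FIRST_LEGACY_VECTOR + 0x0c", the value 0xef for LAST_LEGACY_VECTOR is an assumption (16 legacy IRQs), and both helper functions are hypothetical names rather than Xen functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: 0xe0 follows from 0xec = FIRST_LEGACY_VECTOR + 0x0c;
 * 0xef assumes one vector for each of the 16 legacy 8259A IRQs. */
#define FIRST_LEGACY_VECTOR 0xe0
#define LAST_LEGACY_VECTOR  0xef

/* True if the vector lies in the range reserved for legacy 8259A IRQs. */
static bool is_legacy_vector(unsigned int vector)
{
    return vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR;
}

/* Map a legacy vector back to its 8259A IRQ number (0-15). */
static unsigned int legacy_vector_to_irq(unsigned int vector)
{
    assert(is_legacy_vector(vector));
    return vector - FIRST_LEGACY_VECTOR;
}
```

With these assumed values, legacy_vector_to_irq(236) gives 12, which matches the IRQ12 reading of the log line.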

Furthermore, calling dump_irqs() from the debugging code with
desc->lock still held makes it impossible to get full output, as that
function wants to lock all initialized IRQ descriptors.
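
The locking problem can be modelled in a few lines of C. This is a simplified sketch, not the Xen code: the struct, the helpers and NR_DESCS are hypothetical, and a try-lock stands in for spin_lock() so that the model stops where a real non-recursive spinlock would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_DESCS 4  /* hypothetical, kept small for illustration */

/* Stand-in for an IRQ descriptor with a non-recursive lock. */
struct irq_desc_model {
    atomic_flag lock;  /* no owner tracking, so re-locking cannot succeed */
};

static void model_init(struct irq_desc_model descs[], int n)
{
    for (int i = 0; i < n; i++)
        atomic_flag_clear(&descs[i].lock);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(struct irq_desc_model *d)
{
    return !atomic_flag_test_and_set(&d->lock);
}

static void model_unlock(struct irq_desc_model *d)
{
    atomic_flag_clear(&d->lock);
}

/* Stand-in for a dump routine that locks each descriptor in turn.
 * Returns how many descriptors it managed to dump; it stops at the
 * first lock it cannot take, where a real spin_lock() would hang. */
static int model_dump_all(struct irq_desc_model descs[], int n)
{
    int dumped = 0;

    for (int i = 0; i < n; i++) {
        if (!model_trylock(&descs[i]))
            break;
        dumped++;  /* the descriptor's state would be printed here */
        model_unlock(&descs[i]);
    }
    return dumped;
}
```

If the caller already holds descs[2].lock, model_dump_all() reports only descriptors 0 and 1. Dropping the lock before dumping lets it cover all of them.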

Jan

> 37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
> (XEN) Guest interrupt information:
> (XEN)    IRQ:   0 affinity:01 vec:f0 type=IO-APIC-edge    status=00000000
> mapped, unbound
> (XEN)    IRQ:   1 affinity:01 vec:d3 type=IO-APIC-edge    status=00000030
> in-flight=0 domain-list=0:  1(-S--),
> (XEN)    IRQ:   2 affinity:ff vec:e2 type=XT-PIC          status=00000000
> mapped, unbound
> (XEN)    IRQ:   3 affinity:01 vec:40 type=IO-APIC-edge    status=00000002
> mapped, unbound
> (XEN)    IRQ:   4 affinity:01 vec:48 type=IO-APIC-edge    status=00000002
> mapped, unbound
> (XEN)    IRQ:   5 affinity:01 vec:50 type=IO-APIC-edge    status=00000002
> mapped, unbound
> (XEN)    IRQ:   6 affinity:01 vec:58 type=IO-APIC-edge    status=00000002
> mapped, unbound
> (XEN)    IRQ:   7 affinity:01 vec:60 type=IO-APIC-edge    status=00000002
> mapped, unbound
> (XEN)    IRQ:   8 affinity:08 vec:29 type=IO-APIC-edge    status=00000030
> in-flight=0 domain-list=0:  8(-S--),
> (XEN)    IRQ:   9 affinity:02 vec:25 type=IO-APIC-level   status=00000030
> in-flight=0 domain-list=0:  9(-S--),
> (XEN)    IRQ:  10 affinity:01 vec:78 type=IO-APIC-edge    status=00000002
> mapped, unbound
> (XEN)    IRQ:  11 affinity:01 vec:88 type=IO-APIC-edge    status=00000002
> mapped, unbound
> [ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer
> elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?
>
> Let me know if you need any more info.
> Thanks,
> AP





Re: [xen-unstable test] 11946: regressions - FAIL

Andrew Cooper
On 07/05/2012 09:10, Jan Beulich wrote:

>>>> On 05.05.12 at 02:21, AP <[hidden email]> wrote:
>> (XEN) *** IRQ BUG found ***
>> (XEN) CPU0 -Testing vector 236 from bitmap
> 236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
> in through the 8259A. Something fundamentally fishy must be going
> on here, and I would suppose the code in question shouldn't even be
> reached for legacy vectors.
>
> Furthermore, calling dump_irqs() from the debugging code with
> desc->lock still held makes it impossible to get full output, as that
> function wants to lock all initialized IRQ descriptors.
>
> Jan

Yes - it has been vector 236 on each of the 3 reported failures from AP,
and I believe it was also vector 236 in the one case I managed to
reproduce the issue.

However, once we have set up the IO-APIC, the 8259A should not be used
any more.  The boot dmesg shows that io_ack_method is indeed "old" (which
was going to be my first suggestion), and that EOI Broadcast Suppression
is enabled, which I have already identified as a source of problems for
some customers.  As a 'fix', I provided the ability for
"io_ack_method=new" to prevent EOI Broadcast Suppression being enabled.
This was upstreamed in c/s 24870:9bf3ec036bef, but apparently has not
completely fixed the customer problems - just made it substantially more
rare.

AP: Can you manually invoke the 'i' debug key and provide that - it will
help to see how Xen is setting up the IO-APIC(s) on your system.

~Andrew


Re: [xen-unstable test] 11946: regressions - FAIL

Jan Beulich-2
>>> On 07.05.12 at 13:50, Andrew Cooper <[hidden email]> wrote:
> On 07/05/2012 09:10, Jan Beulich wrote:
>>>>> On 05.05.12 at 02:21, AP <[hidden email]> wrote:
>>> (XEN) *** IRQ BUG found ***
>>> (XEN) CPU0 -Testing vector 236 from bitmap
>> 236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
>> in through the 8259A. Something fundamentally fishy must be going
>> on here, and I would suppose the code in question shouldn't even be
>> reached for legacy vectors.
>>
>> Furthermore, calling dump_irqs() from the debugging code with
>> desc->lock still held makes it impossible to get full output, as that
>> function wants to lock all initialized IRQ descriptors.
>
> Yes - it has been vector 236 on each of the 3 reported failures from AP,
> and I believe it was also vector 236 in the one case I managed to
> reproduce the issue.
>
> However, once we have set up the IO-APIC, the 8259A should not be used
> any more.  The boot dmesg shows that io_ack_method is indeed "old" (which
> was going to be my first suggestion), and that EOI Broadcast Suppression
> is enabled, which I have already identified as a source of problems for
> some customers.  As a 'fix', I provided the ability for
> "io_ack_method=new" to prevent EOI Broadcast Suppression being enabled.
> This was upstreamed in c/s 24870:9bf3ec036bef, but apparently has not
> completely fixed the customer problems - just made it substantially more
> rare.
>
> AP: Can you manually invoke the 'i' debug key and provide that - it will
> help to see how Xen is setting up the IO-APIC(s) on your system.

Seeing the 'z' output might also be helpful, especially to see whether
any of the IO-APICs' RTEs is an ExtINT one.

Further, checking that no 8259A IRQ got (or was left) enabled for
some reason might be useful as well (cached_irq_mask plus the raw
port 0x21 and 0xA1 values).
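
As a sketch of how the raw port values could be interpreted: port 0x21 holds the master 8259A interrupt mask register (IRQ0-7) and port 0xA1 the slave's (IRQ8-15), with a set bit meaning the IRQ is masked. The helper names below are hypothetical, and the values are passed in as arguments rather than read from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the two 8259A interrupt mask registers into one 16-bit mask.
 * Bit n set means IRQ n is masked (disabled) at the PIC. */
static uint16_t pic_combined_mask(uint8_t port_21, uint8_t port_a1)
{
    return (uint16_t)(port_21 | ((uint16_t)port_a1 << 8));
}

/* Bitmap of IRQs that are enabled (unmasked) at the 8259A pair. */
static uint16_t pic_enabled_irqs(uint8_t port_21, uint8_t port_a1)
{
    return (uint16_t)~pic_combined_mask(port_21, port_a1);
}

/* Returns 1 if the given legacy IRQ (0-15) is unmasked, 0 otherwise. */
static int pic_irq_enabled(uint8_t port_21, uint8_t port_a1, unsigned int irq)
{
    assert(irq < 16);
    return (pic_enabled_irqs(port_21, port_a1) >> irq) & 1;
}
```

For example, a slave mask of 0xef has bit 4 clear, so IRQ12 would be enabled at the PIC, while 0xff on both ports means every legacy IRQ is masked. Comparing the combined value against cached_irq_mask would show whether the hardware state has drifted from the cached one.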

In any case the debugging code's locking should be fixed.

Jan



Re: [xen-unstable test] 11946: regressions - FAIL

Andrew Cooper
On 07/05/2012 14:34, Jan Beulich wrote:

>>>> On 07.05.12 at 13:50, Andrew Cooper <[hidden email]> wrote:
>> On 07/05/2012 09:10, Jan Beulich wrote:
>>>>>> On 05.05.12 at 02:21, AP <[hidden email]> wrote:
>>>> (XEN) *** IRQ BUG found ***
>>>> (XEN) CPU0 -Testing vector 236 from bitmap
>>> 236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
>>> in through the 8259A. Something fundamentally fishy must be going
>>> on here, and I would suppose the code in question shouldn't even be
>>> reached for legacy vectors.
>>>
>>> Furthermore, calling dump_irqs() from the debugging code with
>>> desc->lock still held makes it impossible to get full output, as that
>>> function wants to lock all initialized IRQ descriptors.
>> Yes - it has been vector 236 on each of the 3 reported failures from AP,
>> and I believe it was also vector 236 in the one case I managed to
>> reproduce the issue.
>>
>> However, once we have set up the IO-APIC, the 8259A should not be used
>> any more.  The boot dmesg shows that io_ack_method is indeed "old" (which
>> was going to be my first suggestion), and that EOI Broadcast Suppression
>> is enabled, which I have already identified as a source of problems for
>> some customers.  As a 'fix', I provided the ability for
>> "io_ack_method=new" to prevent EOI Broadcast Suppression being enabled.
>> This was upstreamed in c/s 24870:9bf3ec036bef, but apparently has not
>> completely fixed the customer problems - just made it substantially more
>> rare.
>>
>> AP: Can you manually invoke the 'i' debug key and provide that - it will
>> help to see how Xen is setting up the IO-APIC(s) on your system.
> Seeing the 'z' output might also be helpful, especially to see whether
> any of the IO-APICs' RTEs is an ExtINT one.
>
> Further, checking that no 8259A IRQ got (or was left) enabled for
> some reason might be useful as well (cached_irq_mask plus the raw
> port 0x21 and 0xA1 values).
>
> In any case the debugging code's locking should be fixed.
>
> Jan
>

It appears we have two functions to dump the IO-APIC state:
__print_IO_APIC() which gets called on boot and from 'z', and
dump_ioapic_irq_info() which gets called from the end of 'i'.  These
should probably be consolidated somehow.

As for the debugging, perhaps replace the call to dump_irqs() with a call
to dump_ioapic_irq_info() instead.

Given that the legacy vectors can't migrate, is it wise to include them in
the loop in irq_move_cleanup_interrupt()?  In fact, is it wise to include
any vector above LAST_DYNAMIC_VECTOR?

~Andrew


Re: [xen-unstable test] 11946: regressions - FAIL

Jan Beulich-2
>>> On 07.05.12 at 16:41, Andrew Cooper <[hidden email]> wrote:
> Given that the legacy vectors can't migrate, is it wise including them in
> the loop in irq_move_cleanup_interrupt()?  In fact, is it wise including
> any vector above LAST_DYNAMIC_VECTOR?

Likely not, but then again this is the final piece of moving an interrupt,
so there must have been something earlier that incorrectly initiated a
move. In other words, rather than fixing the loop here, we should
make sure execution can't even make it there for legacy vectors.

And of course this is irrespective of the fact that no legacy interrupt
should occur in the first place, unless this is a very strange system.

Jan



Re: [xen-unstable test] 11946: regressions - FAIL

Jan Beulich-2
In reply to this post by Andrew Cooper
>>> On 07.05.12 at 16:41, Andrew Cooper <[hidden email]> wrote:
> It appears we have two functions to dump the IO-APIC state:
> __print_IO_APIC() which gets called on boot and from 'z', and
> dump_ioapic_irq_info() which gets called from the end of 'i'.  These
> should probably be consolidated somehow.

Rather not - 'z' provides information on the IO-APIC that isn't
directly related to specific interrupts, while 'i' (when it comes to
the IO-APIC) is exclusively interested in the RTEs. Unless
dump_ioapic_irq_info() is _fully_ redundant with 'z' (didn't check
in detail yet), in which case I'd vote for removing this function.

Jan



Re: [xen-unstable test] 11946: regressions - FAIL

Andrew Cooper
In reply to this post by Jan Beulich-2
On 07/05/2012 15:50, Jan Beulich wrote:

>>>> On 07.05.12 at 16:41, Andrew Cooper <[hidden email]> wrote:
>> Given that the legacy vectors can't migrate, is it wise including them in
>> the loop in irq_move_cleanup_interrupt()?  In fact, is it wise including
>> any vector above LAST_DYNAMIC_VECTOR?
> Likely not, but then again this is the final piece of moving an interrupt,
> so there must have been something earlier that incorrectly initiated a
> move. In other words, rather than fixing the loop here, we should
> make sure execution can't even make it there for legacy vectors.
>
> And of course this is irrespective of the fact that no legacy interrupt
> should occur in the first place, unless this is a very strange system.
>
> Jan
>

The only way to get to this point is if desc->arch.move_cleanup_count is
nonzero, in which case one of these functions:

hpet_msi_ack (hpet.c)
ack_edge_ioapic_irq (io_apic.c)
mask_and_ack_level_ioapic_irq (io_apic.c)
ack_nonmaskable_msi_irq (msi.c)
iommu_msi_mask (iommu_init.c)
dma_msi_mask (iommu.c)

has called irq_complete_move, after something has called
__assign_irq_vector() to move the irq to another CPU.

I would say something very fishy is going on - no desc used by any of
those functions should have a vector from the legacy region.

As for the loop, it is probably quite sensible to reduce that down to
LAST_DYNAMIC_VECTOR.  Leaving it at NR_VECTORS is just 32 wasted
iterations of the loop in interrupt context.
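
The "32 wasted iterations" figure can be checked with a small sketch. The constant values here are assumptions chosen to be consistent with that figure and with 0xec = FIRST_LEGACY_VECTOR + 0x0c earlier in the thread, and vectors_scanned() is a hypothetical helper, not part of irq_move_cleanup_interrupt().

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define FIRST_DYNAMIC_VECTOR 0x20
#define LAST_DYNAMIC_VECTOR  0xdf
#define NR_VECTORS           256

/* Count how many vectors a cleanup-style loop visits when it starts at
 * FIRST_DYNAMIC_VECTOR and stops before end_exclusive. */
static unsigned int vectors_scanned(unsigned int end_exclusive)
{
    unsigned int n = 0;

    assert(end_exclusive <= NR_VECTORS);
    for (unsigned int vector = FIRST_DYNAMIC_VECTOR;
         vector < end_exclusive; vector++)
        n++;
    return n;
}
```

Under these assumptions, scanning up to NR_VECTORS visits 224 vectors and stopping after LAST_DYNAMIC_VECTOR visits 192, a difference of 32.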
