Network and SATA Instability on Xen 4.6/4.8

Network and SATA Instability on Xen 4.6/4.8

Kevin Stange
Hi,

I've been running Xen 4.4 stably for some time under kernel 4.9 in dom0
on CentOS 6 and have been trying to finally move my environment up to
Xen 4.6 or 4.8 using CentOS 7.  Since I've built out my test server with
Xen 4.6, I've been having issues where the Intel NICs begin flapping
repeatedly and the SATA disk interfaces go down and will not come back
up until I reboot the server.  Even sending the bus rescan command
doesn't bring the drives back.  The issue seems to be triggered by
activity, so something like an mdraid resync is more likely to cause
it, but the time to failure is not consistent, which makes it hard to
tell whether a particular change has definitely fixed it.

This is reminiscent of a problem I had been experiencing while running
kernel 3.18 and Xen 4.4 on CentOS 6, but the problem resolved itself
upon upgrading to kernel 4.4 and later 4.9, so I chalked that up to
something bad with PCIe management in kernel 3.18 and thought nothing
more of it until now.

The initial test environment where the issue occurred was kernel 4.9.58
and Xen 4.6.6-7 (with security patches from CentOS).  I then tried
upgrading to kernel 4.9.63 and Xen 4.8.2-5, which didn't result in any
improvements.

I tried pcie_aspm=off on the kernel command line, which has helped in
the past with similar issues, but that didn't help here.

I tried booting without Xen (just kernel 4.9.63) and it seems like that
made the issue go away, which led me to believe the issue only happens
with hardware accessed from dom0.  I dug through Xen command line
options and tried booting with msi=off and that now seems to have
resulted in the problem going away, or at least, the system hasn't
exhibited the issue since last week.  Previously, the issue would tend
to manifest after less than 24 hours.
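One way to confirm that msi=off actually took effect is to look at the MSI and MSI-X capability lines that `lspci -v` prints for each device. The following is a minimal sketch in Python, assuming the usual `lspci -v` text layout (`Capabilities: [..] MSI: Enable-`); the function names `msi_state` and `devices_with_msi_enabled` are made up for this illustration and are not part of any tool.

```python
import re

# Device header, e.g. "04:00.1 Ethernet controller: ..." (PCI domain prefix optional).
HEADER_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
# Capability line, e.g. "Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+".
CAP_RE = re.compile(r"Capabilities: \[[0-9a-f]+\] (MSI(?:-X)?): Enable([+-])")


def msi_state(lspci_text):
    """Map each PCI address to {"MSI": bool, "MSI-X": bool} for the
    capabilities that the device advertises (True means Enable+)."""
    state = {}
    device = None
    for line in lspci_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            device = header.group(1)
            continue
        cap = CAP_RE.search(line)
        if cap and device is not None:
            state.setdefault(device, {})[cap.group(1)] = cap.group(2) == "+"
    return state


def devices_with_msi_enabled(state):
    """Return the addresses of devices with MSI or MSI-X currently enabled."""
    return sorted(dev for dev, caps in state.items() if any(caps.values()))


# Two devices in the format printed by `lspci -v`, with MSI disabled.
SAMPLE = """\
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
        Capabilities: [80] MSI: Enable- Count=1/16 Maskable- 64bit-
04:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [60] MSI-X: Enable- Count=10 Masked-
"""

if __name__ == "__main__":
    state = msi_state(SAMPLE)
    print(state)
    print(devices_with_msi_enabled(state))
```

On a live system the same functions could be fed the captured output of `lspci -v` instead of the sample string. An empty list from `devices_with_msi_enabled` would be consistent with msi=off being in effect.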

My hardware is Supermicro X8DT3-F with Dual Intel Xeon E5620 CPUs.

Disk issues begin with a kernel message like this followed by continuous
ATA command failures:

ata2.00: exception Emask 0x0 SAct 0x7c01ffff SErr 0x50000 action 0x6 frozen

NIC issues begin with a message like:

igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly

NICs do recover almost immediately but continue to flap periodically
until reboot.
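Because the time to failure varies, it may help to count these two signature messages in the kernel log after each configuration change rather than watching for them by hand. The sketch below is one possible approach in Python; the regular expressions are modeled on the two messages quoted above, and the names `count_events` and `SAMPLE_LOG` are hypothetical. The sample log simply repeats the quoted messages for illustration.

```python
import re
from collections import Counter

# libata link exception, e.g. "ata2.00: exception Emask 0x0 ... frozen".
# Matching is case-insensitive so differently cased copies of the line also match.
ATA_RE = re.compile(r"(ata\d+(?:\.\d+)?): exception Emask .* frozen", re.IGNORECASE)
# igb reset, e.g. "igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly".
IGB_RE = re.compile(r"igb [0-9a-f:.]+: (\S+): Reset adapter unexpectedly")


def count_events(log_text):
    """Count ATA freeze and igb reset events per port or interface."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ATA_RE.search(line)
        if match:
            counts[("ata", match.group(1))] += 1
            continue
        match = IGB_RE.search(line)
        if match:
            counts[("igb", match.group(1))] += 1
    return counts


# Illustrative input only: the messages from this post, one of them repeated.
SAMPLE_LOG = """\
ata2.00: exception Emask 0x0 SAct 0x7c01ffff SErr 0x50000 action 0x6 frozen
igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly
igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly
"""

if __name__ == "__main__":
    for (kind, name), n in sorted(count_events(SAMPLE_LOG).items()):
        print(kind, name, n)
```

Running this against a saved copy of the kernel log after each boot would give a per-device event count that can be compared between, for example, a boot with msi=off and one without.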

I don't know if this is a bug in Xen or something else at play, but I
could really use some help figuring out what's going on, why msi=off
seems to fix it, and if there are any better ways to resolve this.

Thanks.

--
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
[hidden email] | www.steadfast.net

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users

Re: Network and SATA Instability on Xen 4.6/4.8

Nathan March
I've seen this same Intel behavior on my systems and have had no luck identifying a cause. It happens on my bonded, tagged X540 NICs, but not on my similarly configured 1G Intel NICs. I'm currently testing 4.8 in the hope that it doesn't exhibit this behavior.

I'm on a mix of Supermicro and Dell hardware and both have the issue. This started happening after a major dom0 kernel upgrade, but I don't have the version details handy.

I don't see the same SATA instability, but I'm not using local storage (NFS via those 10G links).

Cheers,
Nathan


Re: Network and SATA Instability on Xen 4.6/4.8

Kevin Stange
On 12/08/2017 04:50 PM, Sarah Newman wrote:

> On 12/08/2017 01:17 PM, Kevin Stange wrote:
>>
>> I don't know if this is a bug in Xen or something else at play, but I
>> could really use some help figuring out what's going on, why msi=off
>> seems to fix it, and if there are any better ways to resolve this.
>>
>> Thanks.
>>
>
> Do you mind sharing your exact lspci output and xen and linux command lines?

Not at all.

# xl info | grep command
xen_commandline        : placeholder dom0_mem=1535M cpuinfo
com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all
dom0_max_vcpus=2 msi=off

# cat /proc/cmdline
placeholder root=UUID=58788945-d0e9-4a76-87ce-4308e86b8fd6 ro nomodeset
crashkernel=auto rd.md.uuid=88fbb495:49586ca8:0d13f6b4:86500d08
rd.md.uuid=c00da35e:8bf4e8f6:b90591c9:dc98206d
rd.md.uuid=c996bde2:6b217770:e8822aa5:42722e67 rhgb quiet pcie_aspm=off
console=hvc0 earlyprintk=xen nomodeset
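When comparing boots with different options, it can be handy to turn command lines like the two above into a structure that is easy to query. This is a small Python sketch, not an existing tool: `parse_cmdline` and `repeated_keys` are names invented here, and the parser only splits on whitespace and the first `=`, which is an assumption that holds for the lines shown but ignores quoting.

```python
def parse_cmdline(cmdline):
    """Split a boot command line into {key: [values]}.

    Bare flags such as "ro" get the value None. Keys that appear more
    than once keep every value in order of appearance."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options.setdefault(key, []).append(value if sep else None)
    return options


def repeated_keys(options):
    """Return the keys that were given more than once, in first-seen order."""
    return [key for key, values in options.items() if len(values) > 1]


# The command lines quoted above, rejoined onto single lines.
XEN_CMDLINE = (
    "placeholder dom0_mem=1535M cpuinfo com1=115200,8n1 console=com1,tty "
    "loglvl=all guest_loglvl=all dom0_max_vcpus=2 msi=off"
)
LINUX_CMDLINE = (
    "placeholder root=UUID=58788945-d0e9-4a76-87ce-4308e86b8fd6 ro nomodeset "
    "crashkernel=auto rd.md.uuid=88fbb495:49586ca8:0d13f6b4:86500d08 "
    "rd.md.uuid=c00da35e:8bf4e8f6:b90591c9:dc98206d "
    "rd.md.uuid=c996bde2:6b217770:e8822aa5:42722e67 rhgb quiet pcie_aspm=off "
    "console=hvc0 earlyprintk=xen nomodeset"
)

if __name__ == "__main__":
    xen = parse_cmdline(XEN_CMDLINE)
    linux = parse_cmdline(LINUX_CMDLINE)
    print("msi:", xen.get("msi"))
    print("pcie_aspm:", linux.get("pcie_aspm"))
    print("repeated on kernel line:", repeated_keys(linux))
```

On a running dom0 the kernel line could be read from /proc/cmdline instead of being pasted in; the Xen line would still have to come from `xl info` as shown above.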

# lspci -v
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: fast devsel
        Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
        Capabilities: [90] Express Root Port (Slot-), MSI 00
        Capabilities: [e0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Access Control Services
        Capabilities: [160] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>

00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express
Root Port 1 (rev 22) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: c0000000-c01fffff
        Capabilities: [40] Subsystem: Super Micro Computer Inc Device 0001
        Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
        Capabilities: [90] Express Root Port (Slot+), MSI 00
        Capabilities: [e0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Access Control Services
        Capabilities: [160] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express
Root Port 3 (rev 22) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: fa300000-fadfffff
        Capabilities: [40] Subsystem: Super Micro Computer Inc Device 0001
        Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
        Capabilities: [90] Express Root Port (Slot+), MSI 00
        Capabilities: [e0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Access Control Services
        Capabilities: [160] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root
Port 5 (rev 22) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        Capabilities: [40] Subsystem: Super Micro Computer Inc Device 0001
        Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
        Capabilities: [90] Express Root Port (Slot+), MSI 00
        Capabilities: [e0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Access Control Services
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express
Root Port 7 (rev 22) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Capabilities: [40] Subsystem: Super Micro Computer Inc Device 0001
        Capabilities: [60] MSI: Enable- Count=1/2 Maskable+ 64bit-
        Capabilities: [90] Express Root Port (Slot+), MSI 00
        Capabilities: [e0] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Access Control Services
        Capabilities: [160] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:0d.0 Host bridge: Intel Corporation Device 343a (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?>
        Capabilities: [800] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?>

00:0d.1 Host bridge: Intel Corporation Device 343b (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?>
        Capabilities: [800] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?>

00:0d.2 Host bridge: Intel Corporation Device 343c (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]

00:0d.3 Host bridge: Intel Corporation Device 343d (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?>

00:0d.4 Host bridge: Intel Corporation 7500/5520/5500/X58 Physical Layer
Port 0 (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]

00:0d.5 Host bridge: Intel Corporation 7500/5520/5500 Physical Layer
Port 1 (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]

00:0d.6 Host bridge: Intel Corporation Device 341a (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]

00:0e.0 Host bridge: Intel Corporation Device 341c (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?>

00:0e.1 Host bridge: Intel Corporation Device 341d (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]

00:0e.2 Host bridge: Intel Corporation Device 341e (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [60] #00 [0000]

00:0e.4 Host bridge: Intel Corporation Device 3439 (rev 22)
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:13.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub I/OxAPIC
Interrupt Controller (rev 22) (prog-if 20 [IO(X)-APIC])
        Flags: bus master, fast devsel, latency 0
        Memory at fec8a000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [6c] Power Management version 3

00:14.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub System
Management Registers (rev 22) (prog-if 00 [8259])
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
        Kernel driver in use: i7core_edac
        Kernel modules: i7core_edac

00:14.1 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and
Scratch Pad Registers (rev 22) (prog-if 00 [8259])
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:14.2 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status
and RAS Registers (rev 22) (prog-if 00 [8259])
        Flags: fast devsel
        Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00

00:14.3 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Throttle
Registers (rev 22) (prog-if 00 [8259])
        Flags: fast devsel
        Kernel modules: i5500_temp

00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 43
        Memory at faef4000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable- Count=1 Masked-
        Capabilities: [90] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [e0] Power Management version 3
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma

00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 44
        Memory at faef0000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable- Count=1 Masked-
        Capabilities: [90] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [e0] Power Management version 3
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma

00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 45
        Memory at faeec000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable- Count=1 Masked-
        Capabilities: [90] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [e0] Power Management version 3
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma

00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 46
        Memory at faee8000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable- Count=1 Masked-
        Capabilities: [90] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [e0] Power Management version 3
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma

00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 43
        Memory at faee4000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable- Count=1 Masked-
        Capabilities: [90] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [e0] Power Management version 3
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma

00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 44
        Memory at faee0000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable- Count=1 Masked-
        Capabilities: [90] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [e0] Power Management version 3
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma

00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 45
        Memory at faedc000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable- Count=1 Masked-
        Capabilities: [90] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [e0] Power Management version 3
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma

00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset
QuickData Technology Device (rev 22)
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 46
        Memory at faed8000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] MSI-X: Enable- Count=1 Masked-
        Capabilities: [90] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [e0] Power Management version 3
        Kernel driver in use: ioatdma
        Kernel modules: ioatdma

00:1a.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #4 (prog-if 00 [UHCI])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 0, IRQ 16
        I/O ports at cea0 [size=32]
        Capabilities: [50] PCI Advanced Features
        Kernel driver in use: uhci_hcd

00:1a.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #5 (prog-if 00 [UHCI])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 0, IRQ 21
        I/O ports at ce80 [size=32]
        Capabilities: [50] PCI Advanced Features
        Kernel driver in use: uhci_hcd

00:1a.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #6 (prog-if 00 [UHCI])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 0, IRQ 19
        I/O ports at ce20 [size=32]
        Capabilities: [50] PCI Advanced Features
        Kernel driver in use: uhci_hcd

00:1a.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2
EHCI Controller #2 (prog-if 20 [EHCI])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 0, IRQ 18
        Memory at faef8000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Capabilities: [98] PCI Advanced Features
        Kernel driver in use: ehci-pci

00:1d.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #1 (prog-if 00 [UHCI])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 0, IRQ 23
        I/O ports at cf20 [size=32]
        Capabilities: [50] PCI Advanced Features
        Kernel driver in use: uhci_hcd

00:1d.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #2 (prog-if 00 [UHCI])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 0, IRQ 19
        I/O ports at cf00 [size=32]
        Capabilities: [50] PCI Advanced Features
        Kernel driver in use: uhci_hcd

00:1d.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB
UHCI Controller #3 (prog-if 00 [UHCI])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 0, IRQ 18
        I/O ports at cec0 [size=32]
        Capabilities: [50] PCI Advanced Features
        Kernel driver in use: uhci_hcd

00:1d.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2
EHCI Controller #1 (prog-if 20 [EHCI])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 0, IRQ 23
        Memory at faefa000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Capabilities: [98] PCI Advanced Features
        Kernel driver in use: ehci-pci

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90) (prog-if
01 [Subtractive decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
        Memory behind bridge: f9700000-f9ffffff
        Prefetchable memory behind bridge: 00000000f8000000-00000000f8ffffff
        Capabilities: [50] Subsystem: Super Micro Computer Inc Device 0001

00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface
Controller
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 0
        Capabilities: [e0] Vendor Specific Information: Len=0c <?>
        Kernel driver in use: lpc_ich
        Kernel modules: lpc_ich

00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA
AHCI Controller (prog-if 01 [AHCI 1.0])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
        I/O ports at cff0 [size=8]
        I/O ports at cfac [size=4]
        I/O ports at cfe0 [size=8]
        I/O ports at cfa8 [size=4]
        I/O ports at cf80 [size=32]
        Memory at faefe000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [80] MSI: Enable- Count=1/16 Maskable- 64bit-
        Capabilities: [70] Power Management version 3
        Capabilities: [a8] SATA HBA v1.0
        Capabilities: [b0] PCI Advanced Features
        Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: medium devsel, IRQ 18
        Memory at faefc000 (64-bit, non-prefetchable) [size=256]
        I/O ports at 0400 [size=32]
        Kernel driver in use: i801_smbus
        Kernel modules: i2c_i801

01:03.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA
G200eW WPCM450 (rev 0a) (prog-if 00 [VGA controller])
        Subsystem: Super Micro Computer Inc Device 0001
        Flags: bus master, medium devsel, latency 64, IRQ 10
        Memory at f8000000 (32-bit, prefetchable) [size=16M]
        Memory at f97fc000 (32-bit, non-prefetchable) [size=16K]
        Memory at f9800000 (32-bit, non-prefetchable) [size=8M]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 1
        Kernel modules: mgag200

04:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network
Connection (rev 02)
        Subsystem: Super Micro Computer Inc Device 10a7
        Flags: bus master, fast devsel, latency 0, IRQ 24
        Memory at fa9e0000 (32-bit, non-prefetchable) [size=128K]
        Memory at fac00000 (32-bit, non-prefetchable) [size=2M]
        I/O ports at df80 [size=32]
        Memory at fa9dc000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at faa00000 [disabled] [size=2M]
        Capabilities: [40] Power Management version 2
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [60] MSI-X: Enable- Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-25-90-ff-ff-35-30-8c
        Kernel driver in use: igb
        Kernel modules: igb

04:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network
Connection (rev 02)
        Subsystem: Super Micro Computer Inc Device 10a7
        Flags: bus master, fast devsel, latency 0, IRQ 34
        Memory at fa3e0000 (32-bit, non-prefetchable) [size=128K]
        Memory at fa600000 (32-bit, non-prefetchable) [size=2M]
        I/O ports at df40 [size=32]
        Memory at fa3dc000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at fa400000 [disabled] [size=2M]
        Capabilities: [40] Power Management version 2
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [60] MSI-X: Enable- Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-25-90-ff-ff-35-30-8c
        Kernel driver in use: igb
        Kernel modules: igb

05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
        Subsystem: Super Micro Computer Inc Device 10c9
        Flags: bus master, fast devsel, latency 0, IRQ 28
        Memory at c0000000 (32-bit, non-prefetchable) [size=128K]
        Memory at c0020000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at ef80 [size=32]
        Memory at c00c0000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at c0040000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable- Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-25-90-ff-ff-2b-d9-2c
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: igb
        Kernel modules: igb

05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
Connection (rev 01)
        Subsystem: Super Micro Computer Inc Device 10c9
        Flags: bus master, fast devsel, latency 0, IRQ 40
        Memory at c0060000 (32-bit, non-prefetchable) [size=128K]
        Memory at c0080000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at ef40 [size=32]
        Memory at c0104000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at c00a0000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable- Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-25-90-ff-ff-2b-d9-2c
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: igb
        Kernel modules: igb

fe:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath
Architecture Generic Non-core Registers (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath
Architecture System Address Decoder (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0
(rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link
0 (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link
1 (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1
(rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Registers (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Target Address Decoder (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller RAS Registers (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Test Registers (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Address (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Rank (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Thermal Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Address (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Rank (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Thermal Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Address (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Rank (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

fe:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Thermal Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath
Architecture Generic Non-core Registers (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath
Architecture System Address Decoder (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0
(rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link
0 (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link
1 (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1
(rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Registers (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Target Address Decoder (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller RAS Registers (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Test Registers (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Address (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Rank (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 0 Thermal Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Address (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Rank (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 1 Thermal Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Address (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Rank (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

ff:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated
Memory Controller Channel 2 Thermal Control (rev 02)
        Subsystem: Intel Corporation Device 8086
        Flags: bus master, fast devsel, latency 0

--
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
[hidden email] | www.steadfast.net

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users

Re: Network and SATA Instability on Xen 4.6/4.8

Sarah Newman
On 12/08/2017 03:28 PM, Kevin Stange wrote:

> On 12/08/2017 04:50 PM, Sarah Newman wrote:
>> On 12/08/2017 01:17 PM, Kevin Stange wrote:
>>>
>>> I don't know if this is a bug in Xen or something else at play, but I
>>> could really use some help figuring out what's going on, why msi=off
>>> seems to fix it, and if there are any better ways to resolve this.
>>>
>>> Thanks.
>>>
>>
>> Do you mind sharing your exact lspci output and xen and linux command lines?
>
> Not at all.
>
> # xl info | grep command
> xen_commandline        : placeholder dom0_mem=1535M cpuinfo
> com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all
> dom0_max_vcpus=2 msi=off

1535M is kind of an unusual amount of RAM, don't you think? It shouldn't matter, but have you tried something a bit more round, maybe at least on a
4MiB boundary?

You might want to check whether there were any changes to the typical memory map between 4.4 and 4.6, and review your BIOS settings. I vaguely recall
BIOS settings related to limiting memory ranges and PCI. Maybe the old version of Xen was accidentally enforcing that.

>
> # cat /proc/cmdline
> placeholder root=UUID=58788945-d0e9-4a76-87ce-4308e86b8fd6 ro nomodeset
> crashkernel=auto rd.md.uuid=88fbb495:49586ca8:0d13f6b4:86500d08
> rd.md.uuid=c00da35e:8bf4e8f6:b90591c9:dc98206d
> rd.md.uuid=c996bde2:6b217770:e8822aa5:42722e67 rhgb quiet pcie_aspm=off
> console=hvc0 earlyprintk=xen nomodeset

We run with edd=off but no modifiers to pci.

--Sarah


Re: Network and SATA Instability on Xen 4.6/4.8

Kevin Stange
On 12/08/2017 05:57 PM, Sarah Newman wrote:

> On 12/08/2017 03:28 PM, Kevin Stange wrote:
>> On 12/08/2017 04:50 PM, Sarah Newman wrote:
>>> On 12/08/2017 01:17 PM, Kevin Stange wrote:
>>>>
>>>> I don't know if this is a bug in Xen or something else at play, but I
>>>> could really use some help figuring out what's going on, why msi=off
>>>> seems to fix it, and if there are any better ways to resolve this.
>>>>
>>>> Thanks.
>>>>
>>>
>>> Do you mind sharing your exact lspci output and xen and linux command lines?
>>
>> Not at all.
>>
>> # xl info | grep command
>> xen_commandline        : placeholder dom0_mem=1535M cpuinfo
>> com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all
>> dom0_max_vcpus=2 msi=off
>
> 1535M is kind of an unusual amount of ram, don't you think? It shouldn't matter but have you tried something a bit more round, maybe at least on a
> 4MiB boundary?

I'll try that.  This specific number comes from some math that the
management software install script does at setup.  Sometimes the
reported RAM is not quite 100% of what is installed.  In this case it
should have come out to 1536 MB (1.5 GB), but the reported total memory
is 73718 MB (10 MB short of 72 GB) and the script basically just
truncated the result to an int.
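
As a rough illustration of that arithmetic, here is a minimal sketch.
The 1/48 ratio is only an assumption inferred from the two figures above
(73728 MB / 1536 MB = 48); the install script's real formula is not shown
in this thread, and the helper name is hypothetical.  The optional
alignment argument shows what rounding down to a 4 MiB boundary would give.

```python
# Sketch of the dom0_mem arithmetic described above.
# ASSUMPTION: dom0 gets 1/48 of total RAM. The ratio is inferred from the
# quoted figures (73728 MB / 1536 MB), not taken from the install script.
DOM0_FRACTION = 48


def dom0_mem_mb(total_mb, align_mb=1):
    """Return total_mb / 48 truncated to an int, rounded down to align_mb."""
    raw = int(total_mb / DOM0_FRACTION)  # truncation, as described
    return raw - (raw % align_mb)


print(dom0_mem_mb(72 * 1024))          # nominal 72 GB of RAM
print(dom0_mem_mb(73718))              # reported total, 10 MB short
print(dom0_mem_mb(73718, align_mb=4))  # same, on a 4 MiB boundary
```

Under that assumed ratio, the full 72 GB yields 1536, the reported
73718 MB yields 1535, and aligning down to 4 MiB yields 1532.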

> You might want to check if there were any changes to the typical memory map in between 4.4 and 4.6 and review at your BIOS settings. I vaguely recall
> BIOS settings related to limiting memory ranges and PCI. Maybe the old version of Xen was accidentally enforcing that.

I don't really know where I'd need to look for the aforementioned
changes or how to measure them against my BIOS options.  Can you point
me at any more detailed resources for this?

>> # cat /proc/cmdline
>> placeholder root=UUID=58788945-d0e9-4a76-87ce-4308e86b8fd6 ro nomodeset
>> crashkernel=auto rd.md.uuid=88fbb495:49586ca8:0d13f6b4:86500d08
>> rd.md.uuid=c00da35e:8bf4e8f6:b90591c9:dc98206d
>> rd.md.uuid=c996bde2:6b217770:e8822aa5:42722e67 rhgb quiet pcie_aspm=off
>> console=hvc0 earlyprintk=xen nomodeset
>
> We run with edd=off but no modifiers to pci.

pcie_aspm=off was a shot in the dark, but didn't do anything to help.
We had some X8SIE-F motherboards which had a problem under the CentOS 6
kernel with NIC behavior and this option fixed that, so it's just one of
the things I throw at the wall when I'm having PCI problems, especially
if NIC related.  I don't think it's doing anything on this server.  I
wouldn't mind if ASPM was just not a thing on servers, though.

pci=nomsi was something I thought I might try next as well, but msi=off
on the Xen line seems to mitigate the problem so far.
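
To keep track of which MSI/ASPM switches are set where, the two command
lines quoted above can be checked with a small sketch like the one below.
The function names are hypothetical; the strings are copied from the
`xl info` and /proc/cmdline output in this message.  Note that msi=off is
a Xen option while pci=nomsi and pcie_aspm=off are Linux options, so they
appear on different lines.

```python
# Sketch: pick out MSI/ASPM-related options from the boot command lines
# quoted in this thread. Function names are illustrative only.
XEN_CMDLINE = (
    "placeholder dom0_mem=1535M cpuinfo com1=115200,8n1 "
    "console=com1,tty loglvl=all guest_loglvl=all dom0_max_vcpus=2 msi=off"
)
LINUX_CMDLINE = (
    "placeholder root=UUID=58788945-d0e9-4a76-87ce-4308e86b8fd6 ro nomodeset "
    "crashkernel=auto rd.md.uuid=88fbb495:49586ca8:0d13f6b4:86500d08 "
    "rd.md.uuid=c00da35e:8bf4e8f6:b90591c9:dc98206d "
    "rd.md.uuid=c996bde2:6b217770:e8822aa5:42722e67 rhgb quiet pcie_aspm=off "
    "console=hvc0 earlyprintk=xen nomodeset"
)


def parse_cmdline(cmdline):
    """Split a command line into (key, value) pairs; bare flags get None."""
    pairs = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        pairs.append((key, value if sep else None))
    return pairs


def options_of_interest(cmdline, keys=("msi", "pci", "pcie_aspm")):
    """Return only the pairs whose key is one of the MSI/ASPM switches."""
    return [(k, v) for k, v in parse_cmdline(cmdline) if k in keys]


print("xen:  ", options_of_interest(XEN_CMDLINE))
print("linux:", options_of_interest(LINUX_CMDLINE))
```

With the lines above, only msi=off shows up on the Xen side and only
pcie_aspm=off on the Linux side; pci=nomsi is not set.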

--
Kevin Stange


Re: Network and SATA Instability on Xen 4.6/4.8

J. Roeleveld
In reply to this post by Kevin Stange
On Friday, December 8, 2017 10:17:30 PM CET Kevin Stange wrote:

> Hi,
>
> I've been running Xen 4.4 stably for some time under kernel 4.9 in dom0
> on CentOS 6 and have been trying to finally move my environment up to
> Xen 4.6 or 4.8 using CentOS 7.  Since I've built out my test server with
> Xen 4.6, I've been having issues where the Intel NICs begin flapping
> repeatedly and the SATA disk interfaces go down and will not come back
> up until I reboot the server.  Even sending the bus rescan command
> doesn't bring the drives back.  The issue seems to trigger based on
> activity, so during something like an mdraid resync is more likely to
> cause the issue, but it's not reproducible in a consistent amount of
> time, which makes it hard to tell if a particular change has definitely
> fixed it.
>
> This is reminiscent of a problem I had been experiencing while running
> kernel 3.18 and Xen 4.4 on CentOS 6, but the problem resolved itself
> upon upgrading to kernel 4.4 and later 4.9, so I chalked that up to
> something bad with PCIe management in kernel 3.18 and thought nothing
> more of it until now.
>
> The initial test environment where the issue occurred was kernel 4.9.58
> and Xen 4.6.6-7 (with security patches from CentOS).  I then tried
> upgrading to kernel 4.9.63 and Xen 4.8.2-5, which didn't result in any
> improvements.
>
> I tried pcie_aspm=off on the kernel line, which has helped in the past
> with similar issues, but that didn't help here.
>
> I tried booting without Xen (just kernel 4.9.63) and it seems like that
> made the issue go away, which lead me to believe the issue only happens
> with hardware accessed from dom0.  I dug through Xen command line
> options and tried booting with msi=off and that now seems to have
> resulted in the problem going away, or at least, the system hasn't
> exhibited the issue since last week.  Previously, the issue would tend
> to manifest after less than 24 hours.
>
> My hardware is Supermicro X8DT3-F with Dual Intel Xeon E5620 CPUs.
>
> Disk issues begin with a kernel message like this followed by continuous
> ATA command failures:
>
> ata2.00: exception emask 0x0 sact 0x7c01ffff serr 0x50000 action 0x6 frozen
>
> NIC issues begin with a message like:
>
> igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly
>
> NICs do recover almost immediately but continue to flap periodically
> until reboot.
>
> I don't know if this is a bug in Xen or something else at play, but I
> could really use some help figuring out what's going on, why msi=off
> seems to fix it, and if there are any better ways to resolve this.
>
> Thanks.

I have not seen anything like this on any server I am currently using, and
they are a mix of Tyan and Supermicro boards. (I am switching away from Tyan
for unrelated reasons.)

# xl info | grep command
xen_commandline        : dom0_mem=24GB,max:24GB console=vga dom0_max_vcpus=4
dom0_vcpus_pin gnttab_max_frames=256

# cat /proc/cmdline
root=zhost/host/root by=id elevator=noop logo.nologo triggers=zfs quiet
refresh softlevel=prexen

FYI: I use ZFS and some of the VMs are using 2 SSDs that are maintained by the
host.
The majority of the storage is handled by a storage domain which has the HBA
assigned to it directly.

I have four 10GbE ports that are bonded and VLAN tagged to provide
connectivity to other hosts.

Mainboard:
Supermicro X10DRI-T4i

The hardware is occasionally stressed both on the SSDs (connected via SATA)
and the network.

I am running a 4.9.49 kernel with Xen 4.8.2 and ZoL 0.7.3.

--
Joost



Re: Network and SATA Instability on Xen 4.6/4.8

George Dunlap
In reply to this post by Kevin Stange
On Fri, Dec 8, 2017 at 9:17 PM, Kevin Stange <[hidden email]> wrote:

> Hi,
>
> I've been running Xen 4.4 stably for some time under kernel 4.9 in dom0
> on CentOS 6 and have been trying to finally move my environment up to
> Xen 4.6 or 4.8 using CentOS 7.  Since I've built out my test server with
> Xen 4.6, I've been having issues where the Intel NICs begin flapping
> repeatedly and the SATA disk interfaces go down and will not come back
> up until I reboot the server.  Even sending the bus rescan command
> doesn't bring the drives back.  The issue seems to trigger based on
> activity, so during something like an mdraid resync is more likely to
> cause the issue, but it's not reproducible in a consistent amount of
> time, which makes it hard to tell if a particular change has definitely
> fixed it.
>
> This is reminiscent of a problem I had been experiencing while running
> kernel 3.18 and Xen 4.4 on CentOS 6, but the problem resolved itself
> upon upgrading to kernel 4.4 and later 4.9, so I chalked that up to
> something bad with PCIe management in kernel 3.18 and thought nothing
> more of it until now.
>
> The initial test environment where the issue occurred was kernel 4.9.58
> and Xen 4.6.6-7 (with security patches from CentOS).  I then tried
> upgrading to kernel 4.9.63 and Xen 4.8.2-5, which didn't result in any
> improvements.
>
> I tried pcie_aspm=off on the kernel line, which has helped in the past
> with similar issues, but that didn't help here.
>
> I tried booting without Xen (just kernel 4.9.63) and it seems like that
> made the issue go away, which lead me to believe the issue only happens
> with hardware accessed from dom0.  I dug through Xen command line
> options and tried booting with msi=off and that now seems to have
> resulted in the problem going away, or at least, the system hasn't
> exhibited the issue since last week.  Previously, the issue would tend
> to manifest after less than 24 hours.
>
> My hardware is Supermicro X8DT3-F with Dual Intel Xeon E5620 CPUs.
>
> Disk issues begin with a kernel message like this followed by continuous
> ATA command failures:
>
> ata2.00: exception emask 0x0 sact 0x7c01ffff serr 0x50000 action 0x6 frozen
>
> NIC issues begin with a message like:
>
> igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly
>
> NICs do recover almost immediately but continue to flap periodically
> until reboot.
>
> I don't know if this is a bug in Xen or something else at play, but I
> could really use some help figuring out what's going on, why msi=off
> seems to fix it, and if there are any better ways to resolve this.

Jan / Andy,

Any idea why Kevin might be seeing stability issues under 4.6 / 4.8
that are solved by adding 'msi=off'?

 -George


Re: Network and SATA Instability on Xen 4.6/4.8

Håkon Alstadheim
In reply to this post by Kevin Stange


Den 08. des. 2017 22:17, skrev Kevin Stange:
> Hi,
>
> I've been running Xen 4.4 stably for some time under kernel 4.9 in dom0
> on CentOS 6 and have been trying to finally move my environment up to
> Xen 4.6 or 4.8 using CentOS 7.

I started having a hell of a time with a Debian stretch domU a couple of
weeks back: hangs in various processes during drive activity (not
terribly heavy), with the processes unkillable even with kill -9. Looking
at the stack under /proc/<PID> made me no wiser. The domU storage is
backed by LVM over dm-raid. Xen 4.10, linux-image-4.9.0-4-amd64.

I updated to linux 4.13 from backports, and the problems are gone.


How do You Set-up a Wireless connection on Xen?

rayj
In reply to this post by J. Roeleveld
I would like to understand how others are configuring Xen for wireless connections so I may better understand how to set it up.

Ray


Re: How do You Set-up a Wireless connection on Xen?

Michel D'HOOGE
Hi Ray,

2017-12-21 2:56 GMT+01:00 rayj <[hidden email]>:
> I would like to understand how others are configuring Xen for wireless
> connections so I may better understand how to set it up.

After digging and reading a couple of pages, I gave up trying to do that! ;-)
But mostly because I can live with a wired solution that fits my needs...

The problem is not directly related to virtualization but really to
sharing wifi.
https://wiki.linuxfoundation.org/networking/bridge#it-doesn-t-work-with-my-wireless-card

First solution would be to use NAT.

The Debian wiki shows another solution that uses ebtables.
https://wiki.debian.org/BridgeNetworkConnections#Bridging_with_a_wireless_NIC

In all cases, the remote computer or the domU sees/uses a wired connection.
AFAIK qemu doesn't emulate a wireless card, so you can't make your
domU authenticate to the AP with its own MAC address.
So the easy wired approach, where several different MAC addresses go
through your single link, is not an option.

But I'd also be interested in some feedback from others :-D

HTH
Michel


Re: Network and SATA Instability on Xen 4.6/4.8

David Vincze
In reply to this post by George Dunlap
Hi,

We've been experiencing the same errors on very similar hardware.
Just as Kevin described: all SATA goes down and the NICs start to flap in Dom0; the only fix is a reboot.

Unlike Kevin, I was unable to observe any pattern in system activity that triggers these; it seems completely random.
Sometimes it happens under high load, sometimes when load is really low (both I/O and CPU), sometimes twice a week, sometimes no errors for months...

We have three identical machines (Supermicro X8DTT-HIBQF+ boards with X5670 CPUs), and all three behave like this.
I think they have the same chipset as Kevin's board.

xen_version            : 4.6.1
xen_commandline        : dom0_mem=1024M loglvl=all guest_loglvl=all
cc_compiler            : gcc (Debian 4.7.2-5) 4.7.2

Dom0 kernel version is 3.14.61.

I also tried Xen 4.8 and a newer Dom0 kernel (4.4.2); neither helped.

I've tried modifying power-management-related settings in the BIOS setup, but these had no effect on the issue.
ASPM was implicitly disabled by the kernel from the beginning:
[    8.601606] acpi PNP0A08:00: _OSC failed (AE_NOT_FOUND); disabling ASPM

Now I've disabled MSI in the Dom0 kernel with pci=nomsi, and also explicitly disabled ASPM with pcie_aspm=off.
Based on /proc/interrupts, lspci and dmesg, MSI/MSI-X is not being used anymore.
We will see whether that cures it or not.
But as the errors emerge randomly, it doesn't really prove anything if I don't see them again with MSI disabled...?
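
The /proc/interrupts part of that check could be scripted roughly as follows. This is only a sketch under assumptions: it treats any line containing "msi" as an MSI/MSI-X vector, which matches the "PCI-MSI" label used on bare-metal Linux, but the exact label varies by kernel version and looks different under Xen. The sample rows are made up for illustration and are not output from any machine in this thread; the function name is hypothetical.

```python
# Sketch: list the lines of a /proc/interrupts dump that mention MSI.
# ASSUMPTION: MSI/MSI-X vectors have "msi" somewhere in the line, as with
# the "PCI-MSI" label on bare-metal Linux. Labels differ by kernel and Xen.


def msi_lines(interrupts_text):
    """Return the lines of a /proc/interrupts dump that mention MSI."""
    return [
        line for line in interrupts_text.splitlines()
        if "msi" in line.lower()
    ]


# Made-up sample rows, for illustration only.
SAMPLE = """\
           CPU0       CPU1
 16:       1200          0   IO-APIC-fasteoi   ahci
 72:       3400          0   PCI-MSI-edge      eth0-rx-0
 73:       2100          0   PCI-MSI-edge      eth0-tx-0
"""

for line in msi_lines(SAMPLE):
    print(line)
```

On a real host the input would be the contents of /proc/interrupts; an empty result after booting with pci=nomsi would agree with what lspci and dmesg report.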

Any suggestions?

Thank you!

-David

On Wed, Dec 20, 2017 at 05:40:16PM +0000, George Dunlap wrote:

> On Fri, Dec 8, 2017 at 9:17 PM, Kevin Stange <[hidden email]> wrote:
> > Hi,
> >
> > I've been running Xen 4.4 stably for some time under kernel 4.9 in dom0
> > on CentOS 6 and have been trying to finally move my environment up to
> > Xen 4.6 or 4.8 using CentOS 7.  Since I've built out my test server with
> > Xen 4.6, I've been having issues where the Intel NICs begin flapping
> > repeatedly and the SATA disk interfaces go down and will not come back
> > up until I reboot the server.  Even sending the bus rescan command
> > doesn't bring the drives back.  The issue seems to trigger based on
> > activity, so during something like an mdraid resync is more likely to
> > cause the issue, but it's not reproducible in a consistent amount of
> > time, which makes it hard to tell if a particular change has definitely
> > fixed it.
> >
> > This is reminiscent of a problem I had been experiencing while running
> > kernel 3.18 and Xen 4.4 on CentOS 6, but the problem resolved itself
> > upon upgrading to kernel 4.4 and later 4.9, so I chalked that up to
> > something bad with PCIe management in kernel 3.18 and thought nothing
> > more of it until now.
> >
> > The initial test environment where the issue occurred was kernel 4.9.58
> > and Xen 4.6.6-7 (with security patches from CentOS).  I then tried
> > upgrading to kernel 4.9.63 and Xen 4.8.2-5, which didn't result in any
> > improvements.
> >
> > I tried pcie_aspm=off on the kernel line, which has helped in the past
> > with similar issues, but that didn't help here.
> >
> > I tried booting without Xen (just kernel 4.9.63) and it seems like that
> > made the issue go away, which lead me to believe the issue only happens
> > with hardware accessed from dom0.  I dug through Xen command line
> > options and tried booting with msi=off and that now seems to have
> > resulted in the problem going away, or at least, the system hasn't
> > exhibited the issue since last week.  Previously, the issue would tend
> > to manifest after less than 24 hours.
> >
> > My hardware is Supermicro X8DT3-F with Dual Intel Xeon E5620 CPUs.
> >
> > Disk issues begin with a kernel message like this followed by continuous
> > ATA command failures:
> >
> > ata2.00: exception emask 0x0 sact 0x7c01ffff serr 0x50000 action 0x6 frozen
> >
> > NIC issues begin with a message like:
> >
> > igb 0000:04:00.1: enp4s0f1: Reset adapter unexpectedly
> >
> > NICs do recover almost immediately but continue to flap periodically
> > until reboot.
> >
> > I don't know if this is a bug in Xen or something else at play, but I
> > could really use some help figuring out what's going on, why msi=off
> > seems to fix it, and if there are any better ways to resolve this.
>
> Jan / Andy,
>
> Any idea why Kevin might be seeing stability issues under 4.6 / 4.8
> that is solved by adding 'msi=off'?
>
>  -George
