Lockup/High ksoftirqd when rate-limiting is enabled


Jean-Louis Dupond
Hi,

We are using Xen 4.4.4-23.el6 with kernel 3.18.44-20.el6.x86_64.
Recently we've been having issues with rate-limiting enabled.

When we enable rate limiting in Xen and then generate a lot of outbound
traffic on the domU, we notice high ksoftirqd load.
In some cases the system locks up completely.
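For context, rate limiting is enabled per-vif in the domU configuration; the values below are illustrative, not our exact settings:

```
# domU config: cap this vif's transmit rate at 100 Mb/s,
# with the credit replenished every 20 ms (the @interval part
# is optional; xl defaults to a 50 ms replenishment interval)
vif = [ 'mac=00:16:3e:xx:xx:xx,bridge=xenbr0,rate=100Mb/s@20ms' ]
```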

The lockup gives the following stack trace:
Jun 4 11:07:56 xensrv1 kernel: NMI watchdog: BUG: soft lockup - CPU#0
stuck for 22s! [swapper/0:0]
Jun 4 11:07:56 xensrv1 kernel: Modules linked in: fuse tun cls_fw
sch_htb iptable_mangle ip6table_mangle sch_tbf nf_conntrack_ipv4
nf_defrag_ipv4 xt_state xt_multiport 8021q garp xt_mark ip6_tables
xt_physdev br_netfilter dm_zero xfs ipt_REJECT nf_reject_ipv4
dm_cache_mq dm_cache dm_bio_prison
Jun 4 11:07:56 xensrv1 kernel: NMI watchdog: BUG: soft lockup - CPU#1
stuck for 22s! [swapper/1:0]
Jun 4 11:07:56 xensrv1 kernel: Modules linked in: fuse tun cls_fw
sch_htb iptable_mangle ip6table_mangle sch_tbf nf_conntrack_ipv4
nf_defrag_ipv4 xt_state xt_multiport 8021q garp xt_mark ip6_tables
xt_physdev br_netfilter dm_zero xfs ipt_REJECT nf_reject_ipv4
dm_cache_mq dm_cache dm_bio_prison dm_persistent_data libcrc32c ext2
mbcache arptable_filter arp_tables xt_CT nf_conntrack iptable_raw
iptable_filter ip_tables nbd(O) xen_gntalloc rdma_ucm(O) ib_ucm(O)
rdma_cm(O) iw_cm(O) configfs ib_ipoib(O) ib_cm(O) ib_uverbs(O)
ib_umad(O) mlx5_ib(O) mlx5_core(O) mlx4_en(O) vxlan udp_tunnel
ip6_udp_tunnel mlx4_ib(O) ib_sa(O) ib_mad(O) ib_core(O) ib_addr(O)
ib_netlink(O) mlx4_core(O) mlx_compat(O) xen_acpi_processor blktap
xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd
dm_snapshot dm_bufio dm_mirror_sync(O) dm_mirror dm_region_hash dm_log
nfsv3 nfs_acl nfs fscache lockd sunrpc grace bridge ipv6 stp llc sg
iTCO_wdt iTCO_vendor_support sd_mod mxm_wmi dcdbas pcspkr dm_mod ixgbe
mdio sb_edac edac_core mgag200
Jun 4 11:07:56 xensrv1 kernel: ttm drm_kms_helper shpchp lpc_ich
8250_fintek ipmi_devintf ipmi_si ipmi_msghandler mei_me mei ahci libahci
igb dca ptp pps_core megaraid_sas wmi acpi_power_meter hwmon xen_pciback
cramfs
Jun 4 11:07:56 xensrv1 kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G
O 3.18.44-20.el6.x86_64 #1
Jun 4 11:07:56 xensrv1 kernel: Hardware name: Dell Inc. PowerEdge
R730xd/xxxx, BIOS 2.1.6 05/19/2016
Jun 4 11:07:56 xensrv1 kernel: task: ffff880275f6e010 ti:
ffff880275fd0000 task.ti: ffff880275fd0000
Jun 4 11:07:56 xensrv1 kernel: RIP: e030:[<ffffffff8100bf38>]
[<ffffffff8100bf38>] xen_restore_fl_direct+0x18/0x1b
Jun 4 11:07:56 xensrv1 kernel: RSP: e02b:ffff88027aa23e30 EFLAGS:
00000297
Jun 4 11:07:56 xensrv1 kernel: RAX: 0000000000000008 RBX:
0000000000000200 RCX: 0000000000000003
Jun 4 11:07:56 xensrv1 kernel: RDX: ffff88027aa33f50 RSI:
ffffc90013f88000 RDI: 0000000000000200
Jun 4 11:07:56 xensrv1 kernel: RBP: ffff88027aa23e48 R08:
ffff88027aa33340 R09: ffff8802758d8a00
Jun 4 11:07:56 xensrv1 kernel: R10: ffff880283400c48 R11:
0000000000000000 R12: 0000000000000040
Jun 4 11:07:56 xensrv1 kernel: R13: ffffc90013f50000 R14:
0000000000000040 R15: 000000000000012b
Jun 4 11:07:56 xensrv1 kernel: FS: 0000000000000000(0000)
GS:ffff88027aa20000(0000) knlGS:ffff88027aa20000
Jun 4 11:07:56 xensrv1 kernel: CS: e033 DS: 002b ES: 002b CR0:
0000000080050033
Jun 4 11:07:56 xensrv1 kernel: CR2: 00007fad4acc6b08 CR3:
000000024e0a1000 CR4: 0000000000042660
Jun 4 11:07:56 xensrv1 kernel: Stack:
Jun 4 11:07:56 xensrv1 kernel: ffffffff815a1139 ffff88027aa23e58
ffffc90013f50028 ffff88027aa23e58
Jun 4 11:07:56 xensrv1 kernel: ffffffffa036fc81 ffff88027aa23e98
ffffffffa03733cd ffff88027aa23e98
Jun 4 11:07:56 xensrv1 kernel: ffffffff00000000 ffff880251e25050
ffffc90013f50028 0000000000000000
Jun 4 11:07:56 xensrv1 kernel: Call Trace:
Jun 4 11:07:56 xensrv1 kernel: <IRQ> [<ffffffff815a1139>] ?
__napi_schedule+0x59/0x60
Jun 4 11:07:56 xensrv1 kernel: [<ffffffffa036fc81>]
xenvif_napi_schedule_or_enable_events+0x81/0x90 [xen_netback]
Jun 4 11:07:56 xensrv1 kernel: [<ffffffffa03733cd>]
xenvif_poll+0x4d/0x68 [xen_netback]
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff815a8b32>]
net_rx_action+0x112/0x2c0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff81077d4c>]
__do_softirq+0xfc/0x2f0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8107804d>] irq_exit+0xbd/0xd0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff813b668c>]
xen_evtchn_do_upcall+0x3c/0x50
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8167c49e>]
xen_do_hypervisor_callback+0x1e/0x40
Jun 4 11:07:56 xensrv1 kernel: <EOI> [<ffffffff810013aa>] ?
xen_hypercall_sched_op+0xa/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810013aa>] ?
xen_hypercall_sched_op+0xa/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8100b700>] ?
xen_safe_halt+0x10/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8101fd44>] ?
default_idle+0x24/0xf0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8101f34f>] ?
arch_cpu_idle+0xf/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b37f6>] ?
cpuidle_idle_call+0xd6/0x1d0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810926c2>] ?
__atomic_notifier_call_chain+0x12/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b3a25>] ?
cpu_idle_loop+0x135/0x200
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b3b0b>] ?
cpu_startup_entry+0x1b/0x70
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b3b50>] ?
cpu_startup_entry+0x60/0x70
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8101261a>] ?
cpu_bringup_and_idle+0x2a/0x40
Jun 4 11:07:56 xensrv1 kernel: Code: 44 00 00 65 f6 04 25 c1 a0 00 00 ff
0f 94 c4 00 e4 c3 90 66 f7 c7 00 02 65 0f 94 04 25 c1 a0 00 00 65 66 83
3c 25 c0 a0 00 00 01 <75> 05 e8 01 00 00 00 c3 50 51 52 56 57 41 50 41
51 41 52 41 53

Sometimes these lockups last for minutes, and then the system recovers.

Either way, it's clear we need to find a solution for this :)
And it seems we're not the only ones:
https://lists.centos.org/pipermail/centos-virt/2016-March/005014.html

There was also another thread with a proposed patch
(https://www.spinics.net/lists/netdev/msg282765.html), but I don't see
any follow-up on it.

Any advice?

Thanks!
Jean-Louis Dupond

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xen.org/xen-users