PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode


PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

MasterPrenium
Hello Guys,

I'm having some trouble on a new system I'm setting up. I'm getting a kernel BUG message that seems to be related to the use of Xen (when I boot the system _without_ Xen, I don't get any crash).
Here is the configuration (a rough sketch of how the stack is assembled follows the list):
- 3x hard drives in a RAID 5 software array created by mdadm
- On top of it, DRBD for replication to another node (active/passive cluster)
- On top of it, a BTRFS filesystem with a few subvolumes
- On top of it, the XEN VMs.
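
For reference, the stack is assembled roughly as follows; the device names,
hostnames and addresses here are placeholders, and the DRBD resource is only a
minimal sketch, not my exact configuration:

# 3-disk software RAID 5
mdadm --create /dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 /dev/sde1

# minimal DRBD resource on top of the array (e.g. /etc/drbd.d/drbd0.res)
resource drbd0 {
    device    /dev/drbd0;
    disk      /dev/md10;
    meta-disk internal;
    on node1 { address 192.168.0.1:7789; }
    on node2 { address 192.168.0.2:7789; }
}

# initialise and promote the resource, then put btrfs on the replicated device
drbdadm create-md drbd0
drbdadm up drbd0
drbdadm primary --force drbd0
mkfs.btrfs /dev/drbd0
mount /dev/drbd0 /mnt/data
btrfs subvolume create /mnt/data/XenVM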

The BUG happens when I'm generating "huge" I/O (20 MB/s, with an rsync for example) on the RAID5 stack.
I have to reset the system to make it work again.
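For example, a plain recursive copy onto the stack is enough to trigger it
(the source path here is just an example):
rsync -a /some/large/tree/ /mnt/XenVM/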

Reproducible: ALWAYS (under that I/O load, it crashes within 2-5 minutes). Also reproducible on another system with the same hardware.

Kernel versions impacted (at least): kernel-4.4.26, kernel-4.8.15, kernel-4.9.0

Here are the dmesg errors:
[  937.123220] ------------[ cut here ]------------
[  937.127549] kernel BUG at drivers/md/raid5.c:527!
[  937.131891] invalid opcode: 0000 [#1] SMP
[  937.136216] Modules linked in: x86_pkg_temp_thermal coretemp crc32c_intel aesni_intel aes_x86_64 ablk_helper mei_me mei mpt3sas
[  937.145665] CPU: 2 PID: 9704 Comm: kworker/u16:8 Not tainted 4.9.0-gentoo #2
[  937.150293] Hardware name: Supermicro Super Server/X10SDV-4C-7TP4F, BIOS 1.0b 11/21/2016
[  937.155531] Workqueue: drbd0_submit do_submit
[  937.160506] task: ffff88026b0b2940 task.stack: ffffc9000a66c000
[  937.164115] RIP: e030:[<ffffffff819e1fc1>]  [<ffffffff819e1fc1>] raid5_get_active_stripe+0x5e1/0x670
[  937.169584] RSP: e02b:ffffc9000a66fa58  EFLAGS: 00010086
[  937.175070] RAX: 0000000000000000 RBX: ffff880249d50000 RCX: ffff8802648bb5d0
[  937.180640] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880249d50000
[  937.185505] RBP: ffffc9000a66faf0 R08: ffff8801f4813288 R09: 0000000000000000
[  937.190631] R10: 0000000000000288 R11: 0000000000000000 R12: 0000000000000000
[  937.196030] R13: 000000001e773e88 R14: ffff880249d50000 R15: ffff8802648bb400
[  937.202011] FS:  0000000000000000(0000) GS:ffff880270c80000(0000) knlGS:ffff880270c80000
[  937.206628] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  937.212372] CR2: 00007f68a101b520 CR3: 0000000257875000 CR4: 0000000000042660
[  937.217538] Stack:
[  937.223361]  ffff8802648bb400 ffff880269550b40 0000000000000000 0000000166cf3800
[  937.229103]  000000001e773e88 ffff8802648bb5d0 0000000000000001 0000000000000000
[  937.233707]  ffff8802648bb40c 0000000000000001 ffffc9000a66faf0 ffff880047cba958
[  937.239736] Call Trace:
[  937.244406]  [<ffffffff819e21cd>] raid5_make_request+0x17d/0xdf0
[  937.250345]  [<ffffffff810bcfb0>] ? wake_up_atomic_t+0x30/0x30
[  937.256173]  [<ffffffff81a09c03>] md_make_request+0xe3/0x220
[  937.261031]  [<ffffffff81483e9b>] generic_make_request+0xcb/0x1a0
[  937.265615]  [<ffffffff81732537>] drbd_send_and_submit+0x497/0x1310
[  937.271605]  [<ffffffff810bcfb0>] ? wake_up_atomic_t+0x30/0x30
[  937.276726]  [<ffffffff817339ba>] send_and_submit_pending+0x6a/0x90
[  937.282292]  [<ffffffff81733e43>] do_submit+0x463/0x550
[  937.288333]  [<ffffffff810bcfb0>] ? wake_up_atomic_t+0x30/0x30
[  937.293205]  [<ffffffff81095400>] process_one_work+0x170/0x420
[  937.298982]  [<ffffffff810957d3>] worker_thread+0x123/0x500
[  937.304154]  [<ffffffff810956b0>] ? process_one_work+0x420/0x420
[  937.310314]  [<ffffffff810956b0>] ? process_one_work+0x420/0x420
[  937.316013]  [<ffffffff8109b135>] kthread+0xc5/0xe0
[  937.320918]  [<ffffffff8102c815>] ? __switch_to+0x355/0x7a0
[  937.327029]  [<ffffffff8109b070>] ? kthread_park+0x60/0x60
[  937.331994]  [<ffffffff81ccbbc5>] ret_from_fork+0x25/0x30
[  937.338068] Code: 85 d0 fb ff ff f0 41 80 8f 98 02 00 00 04 e9 c2 fb ff ff f3 90 41 8b 47 70 a8 01 75 f6 89 45 a4 e9 e2 fd ff ff 0f 0b 0f 0b 0f 0b <0f> 0b 49 89 d6 e9 e1 fa ff ff 49 8b 82 e8 01 00 00 4d 8b 8a e0
[  937.349579] RIP  [<ffffffff819e1fc1>] raid5_get_active_stripe+0x5e1/0x670
[  937.355290]  RSP <ffffc9000a66fa58>
[  937.386587] ---[ end trace b870be01f61065a5 ]---
[  941.931453] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  941.937139] IP: [<ffffffff810bcaa6>] __wake_up_common+0x26/0x80
[  941.943106] PGD 252dde067
[  941.943219] PUD 252ee7067
[  941.950107] PMD 0

[  941.956080] Oops: 0000 [#2] SMP
[  941.961919] Modules linked in: x86_pkg_temp_thermal coretemp crc32c_intel aesni_intel aes_x86_64 ablk_helper mei_me mei mpt3sas
[  941.974933] CPU: 2 PID: 9704 Comm: kworker/u16:8 Tainted: G      D         4.9.0-gentoo #2
[  941.982080] Hardware name: Supermicro Super Server/X10SDV-4C-7TP4F, BIOS 1.0b 11/21/2016
[  941.989296] task: ffff88026b0b2940 task.stack: ffffc9000a66c000
[  941.996831] RIP: e030:[<ffffffff810bcaa6>]  [<ffffffff810bcaa6>] __wake_up_common+0x26/0x80
[  942.004391] RSP: e02b:ffffc9000a66fe50  EFLAGS: 00010086
[  942.011818] RAX: 0000000000000200 RBX: ffffc9000a66ff18 RCX: 0000000000000000
[  942.019290] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffc9000a66ff18
[  942.026779] RBP: ffffc9000a66fe88 R08: 0000000000000000 R09: 0000000000000000
[  942.034246] R10: 0000000000000008 R11: 0000000000000001 R12: ffffc9000a66ff20
[  942.041693] R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000003
[  942.049166] FS:  0000000000000000(0000) GS:ffff880270c80000(0000) knlGS:ffff880270c80000
[  942.056599] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  942.063953] CR2: 0000000000000028 CR3: 0000000257875000 CR4: 0000000000042660
[  942.070841] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[  942.074862] BUG: unable to handle kernel paging request at ffffc9000234f8f8
[  942.078910] IP: [<ffffc9000234f8f8>] 0xffffc9000234f8f8
[  942.082961] PGD 1e9840067
[  942.083010] PUD 1e983f067
[  942.086963] PMD 26b42c067
[  942.086978] PTE 801000026b66c067

[  942.094822] Oops: 0011 [#3] SMP
[  942.098734] Modules linked in: x86_pkg_temp_thermal coretemp crc32c_intel aesni_intel aes_x86_64 ablk_helper mei_me mei mpt3sas
[  942.107222] CPU: 2 PID: 9704 Comm: kworker/u16:8 Tainted: G      D         4.9.0-gentoo #2
[  942.111581] Hardware name: Supermicro Super Server/X10SDV-4C-7TP4F, BIOS 1.0b 11/21/2016
[  942.116050] task: ffff88026b0b2940 task.stack: ffffc9000a66c000
[  942.120530] RIP: e030:[<ffffc9000234f8f8>]  [<ffffc9000234f8f8>] 0xffffc9000234f8f8
[  942.125019] RSP: e02b:ffffc9000a66fb80  EFLAGS: 00010082
[  942.129534] RAX: 0000000000000041 RBX: 0000000000042660 RCX: 0000000000000006
[  942.134355] RDX: 0000000000000041 RSI: ffffffff824e00a0 RDI: ffff880270c8dd80
[  942.138921] RBP: ffffc9000a66fbe0 R08: 0000000000000000 R09: 0000000000000000
[  942.143564] R10: 0000000000000008 R11: 0000000000000001 R12: 0000000080050033
[  942.148172] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  942.152837] FS:  0000000000000000(0000) GS:ffff880270c80000(0000) knlGS:ffff880270c80000
[  942.157525] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  942.162213] CR2: 0000000000000028 CR3: 0000000257875000 CR4: 0000000000042660
[  942.166954] Stack:
[  942.171576]  0000000257875000 0000000000000028 ffff880270c80000 ffff880270c80000
[  942.176267]  0000000000000000 0000e0330a66c000 0000000000000000 ffffc9000a66fda8
[  942.180918]  0000000000000000 ffffc9000a66fda8 0000000000000000 0000000000000000
[  942.185521] Call Trace:
[  942.190043]  [<ffffffff810302ad>] show_regs+0x2d/0x180
[  942.194605]  [<ffffffff81030725>] __die+0xa5/0xf0
[  942.199050]  [<ffffffff8106041e>] no_context+0x14e/0x3d0
[  942.203562]  [<ffffffff81060798>] __bad_area_nosemaphore+0xf8/0x1c0
[  942.208002]  [<ffffffff8106086f>] bad_area_nosemaphore+0xf/0x20
[  942.212478]  [<ffffffff81061034>] __do_page_fault+0x84/0x4b0
[  942.216797]  [<ffffffff8106148c>] do_page_fault+0x2c/0x40
[  942.221021]  [<ffffffff81ccd808>] page_fault+0x28/0x30
[  942.225184]  [<ffffffff810bcaa6>] ? __wake_up_common+0x26/0x80
[  942.229287]  [<ffffffff810bcb0e>] __wake_up_locked+0xe/0x10
[  942.233366]  [<ffffffff810bd4d2>] complete+0x32/0x50
[  942.237330]  [<ffffffff8107a500>] mm_release+0xc0/0x160
[  942.241216]  [<ffffffff81080206>] do_exit+0x136/0xb50
[  942.245056]  [<ffffffff81ccdc07>] rewind_stack_do_exit+0x17/0x20
[  942.248933] Code: c9 ff ff c0 cf 74 b7 01 88 ff ff 00 30 cf 66 02 88 ff ff 00 00 00 00 00 00 00 00 40 29 57 6b 02 88 ff ff b0 cf 0b 81 ff ff ff ff <70> fb 66 0a 00 c9 ff ff 88 b6 8b 64 02 88 ff ff 00 00 00 00 01
[  942.257683] RIP  [<ffffc9000234f8f8>] 0xffffc9000234f8f8
[  942.261814]  RSP <ffffc9000a66fb80>
[  942.265860] CR2: ffffc9000234f8f8
[  942.269830] ---[ end trace b870be01f61065a6 ]---
[  942.293603] Fixing recursive fault but reboot is needed!
[  962.926746] INFO: rcu_sched detected stalls on CPUs/tasks:
[  962.930582]  4-...: (1 GPs behind) idle=deb/140000000000000/0 softirq=51234/51234 fqs=5195
[  962.934400]  (detected by 1, t=21002 jiffies, g=26732, c=26731, q=173)
[  962.938231] Task dump for CPU 4:
[  962.942054] md10_raid5      R  running task    13232  2654      2 0x00000008
[  962.945939]  ffff880270d0dcc0 ffff880270ed8ec0 000000000306bc88 0000000000000000
[  962.949899]  0000000000000220 ffff8802648bb40c 0000000000000002 ffff8802648bb708
[  962.953781]  0000000000000001 ffffc9000306bcc8 ffffffff81ccb884 ffff8802648bb400
[  962.957570] Call Trace:
[  962.961272]  [<ffffffff81ccb884>] ? _raw_spin_lock_irqsave+0x54/0x60
[  962.964943]  [<ffffffff819d87f4>] ? release_inactive_stripe_list+0x44/0x180
[  962.968604]  [<ffffffff819e5469>] ? handle_active_stripes.isra.56+0x169/0x440
[  962.972253]  [<ffffffff819e5ae1>] ? raid5d+0x3a1/0x730
[  962.975825]  [<ffffffff81a094d3>] ? md_thread+0xf3/0x100
[  962.979360]  [<ffffffff810bcfb0>] ? wake_up_atomic_t+0x30/0x30
[  962.982900]  [<ffffffff81a093e0>] ? find_pers+0x70/0x70
[  962.986392]  [<ffffffff8109b135>] ? kthread+0xc5/0xe0
[  962.989881]  [<ffffffff8102c815>] ? __switch_to+0x355/0x7a0
[  962.993382]  [<ffffffff8109b070>] ? kthread_park+0x60/0x60
[  962.996858]  [<ffffffff81ccbbc5>] ? ret_from_fork+0x25/0x30
[ 1025.932534] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1025.936027]  4-...: (1 GPs behind) idle=deb/140000000000000/0 softirq=51234/51234 fqs=20780
[ 1025.939486]  (detected by 0, t=84014 jiffies, g=26732, c=26731, q=742)
[ 1025.942969] Task dump for CPU 4:
[ 1025.946373] md10_raid5      R  running task    13232  2654      2 0x00000008
[ 1025.949909]  ffff880270d0dcc0 ffff880270ed8ec0 000000000306bc88 0000000000000000
[ 1025.953451]  0000000000000220 ffff8802648bb40c 0000000000000002 ffff8802648bb708
[ 1025.957015]  0000000000000001 ffffc9000306bcc8 ffffffff81ccb884 ffff8802648bb400
[ 1025.960601] Call Trace:
[ 1025.964139]  [<ffffffff81ccb884>] ? _raw_spin_lock_irqsave+0x54/0x60
[ 1025.967724]  [<ffffffff819d87f4>] ? release_inactive_stripe_list+0x44/0x180
[ 1025.971351]  [<ffffffff819e5469>] ? handle_active_stripes.isra.56+0x169/0x440
[ 1025.975001]  [<ffffffff819e5ae1>] ? raid5d+0x3a1/0x730
[ 1025.978598]  [<ffffffff81a094d3>] ? md_thread+0xf3/0x100
[ 1025.982255]  [<ffffffff810bcfb0>] ? wake_up_atomic_t+0x30/0x30
[ 1025.985875]  [<ffffffff81a093e0>] ? find_pers+0x70/0x70
[ 1025.989496]  [<ffffffff8109b135>] ? kthread+0xc5/0xe0
[ 1025.993117]  [<ffffffff8102c815>] ? __switch_to+0x355/0x7a0
[ 1025.996707]  [<ffffffff8109b070>] ? kthread_park+0x60/0x60
[ 1026.000354]  [<ffffffff81ccbbc5>] ? ret_from_fork+0x25/0x30
[ 1088.937436] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1088.941109]  4-...: (1 GPs behind) idle=deb/140000000000000/0 softirq=51234/51234 fqs=36280
[ 1088.944649]  (detected by 0, t=147019 jiffies, g=26732, c=26731, q=1328)
[ 1088.948180] Task dump for CPU 4:
[ 1088.951671] md10_raid5      R  running task    13232  2654      2 0x00000008
[ 1088.955296]  ffff880270d0dcc0 ffff880270ed8ec0 000000000306bc88 0000000000000000
[ 1088.958963]  0000000000000220 ffff8802648bb40c 0000000000000002 ffff8802648bb708
[ 1088.962665]  0000000000000001 ffffc9000306bcc8 ffffffff81ccb884 ffff8802648bb400
[ 1088.966301] Call Trace:
[ 1088.969868]  [<ffffffff81ccb884>] ? _raw_spin_lock_irqsave+0x54/0x60
[ 1088.973451]  [<ffffffff819d87f4>] ? release_inactive_stripe_list+0x44/0x180
[ 1088.977020]  [<ffffffff819e5469>] ? handle_active_stripes.isra.56+0x169/0x440
[ 1088.980572]  [<ffffffff819e5ae1>] ? raid5d+0x3a1/0x730
[ 1088.984066]  [<ffffffff81a094d3>] ? md_thread+0xf3/0x100
[ 1088.987549]  [<ffffffff810bcfb0>] ? wake_up_atomic_t+0x30/0x30
[ 1088.991011]  [<ffffffff81a093e0>] ? find_pers+0x70/0x70
[ 1088.994444]  [<ffffffff8109b135>] ? kthread+0xc5/0xe0
[ 1088.997815]  [<ffffffff8102c815>] ? __switch_to+0x355/0x7a0
[ 1089.001181]  [<ffffffff8109b070>] ? kthread_park+0x60/0x60
[ 1089.004534]  [<ffffffff81ccbbc5>] ? ret_from_fork+0x25/0x30

(Another log here: http://pastebin.com/maxGFc1z)

Xen versions affected (at least): xen-4.6, xen-4.7, xen-4.8
xen-tools: same versions

Userland is a Gentoo Linux box.

Kernel .config: http://pastebin.com/p0EcHjbu

All built with: gcc (Gentoo 4.9.3 p1.5, pie-0.6.4) 4.9.3

-> scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux Node_1 4.9.0-gentoo #2 SMP Fri Dec 23 16:37:48 CET 2016 x86_64 Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz GenuineIntel GNU/Linux

GNU C                   4.9.3
GNU Make                4.1
Binutils                2.25.1
Util-linux              2.26.2
Mount                   2.26.2
Module-init-tools       22
E2fsprogs               1.43.3
Linux C Library         2.22
Dynamic linker (ldd)    2.22
Linux C++ Library       6.0.20
Procps                  3.3.12
Net-tools               1.60
Kbd                     2.0.3
Console-tools           2.0.3
Sh-utils                8.25
Udev                    220
Modules Loaded          ablk_helper aesni_intel aes_x86_64 coretemp crc32c_intel mei mei_me mpt3sas x86_pkg_temp_thermal

-> System is a Supermicro X10SDV-4C-7TP4F motherboard with 8 GB of DDR4 ECC registered memory


Any help would be greatly appreciated!

Thanks,



Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

MasterPrenium
Hi guys,

I've tested the same set-up except with a RAID 1 software array; in that
case I get no issue at all.
It's definitely a RAID 5 problem.
As requested, I've tested re-creating the RAID 5 array (just to be
sure); the issue remains the same, with either metadata 0.90 or metadata 1.2.
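
Re-creating with an explicit metadata version looks like this (a sketch;
the device names are the example ones from my tests, only --metadata varies):
mdadm --create /dev/md10 --raid-devices=3 --level=5 --metadata=0.90 /dev/sdc1 /dev/sdd1 /dev/sde1
mdadm --create /dev/md10 --raid-devices=3 --level=5 --metadata=1.2 /dev/sdc1 /dev/sdd1 /dev/sde1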

Thanks,

On 23/12/2016 19:25, MasterPrenium wrote:

> [...]



Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

Jes.Sorensen
In reply to this post by MasterPrenium
MasterPrenium <[hidden email]> writes:

> Hello Guys,
>
> I've having some trouble on a new system I'm setting up. I'm getting a
> kernel BUG message, seems to be related with the use of Xen (when I
> boot the system _without_ Xen, I don't get any crash).
> Here is configuration :
> - 3x Hard Drives running on RAID 5 Software raid created by mdadm
> - On top of it, DRBD for replication over another node (Active/passive cluster)
> - On top of it, a BTRFS FileSystem with a few subvolumes
> - On top of it, XEN VMs running.
>
> The BUG is happening when I'm making "huge" I/O (20MB/s with a rsync
> for example) on the RAID5 stack.
> I've to reset system to make it work again.
>
> Reproducible : ALWAYS (making the i/o, it crash in 2-5mins). Also
> reproducible on another system with the same hardware.
>
> Kernel versions impacted (at least): kernel-4.4.26, kernel-4.8.15, kernel-4.9.0

Well, you have one foreign object in there that is not part of the
kernel and which shows up in the OOPS: DRBD.

What happens when you remove that from the equation?

Jes


Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

MasterPrenium
Hello,

Thanks for your reply. Isn't DRBD part of the kernel? I thought it had
been included since 2.6.3x.

I've just tested without DRBD, and the issue seems to remain. I can't
see the "BUG" line, but the kernel also crashed (a little bit later).
I don't have a full dump, since I lost my network connection and my
serial connection.
Here is a picture of what I got:
http://img15.hostingpics.net/pics/113882KernelError6.png
Another one: http://img11.hostingpics.net/pics/164702KernelError7.png

It also seems to me that having the "glances" monitoring software
running in dom0 makes the kernel crash sooner; I don't know whether that
helps, but just in case...

Any idea / test I can run? This is really a blocking issue with
potential data loss...

Best regards,
MasterPrenium


On 30/12/2016 21:54, Jes Sorensen wrote:

> [...]



Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

Shaohua Li
In reply to this post by MasterPrenium
On Fri, Dec 23, 2016 at 07:25:56PM +0100, MasterPrenium wrote:

> Hello Guys,
>
> I've having some trouble on a new system I'm setting up. I'm getting a kernel BUG message, seems to be related with the use of Xen (when I boot the system _without_ Xen, I don't get any crash).
> Here is configuration :
> - 3x Hard Drives running on RAID 5 Software raid created by mdadm
> - On top of it, DRBD for replication over another node (Active/passive cluster)
> - On top of it, a BTRFS FileSystem with a few subvolumes
> - On top of it, XEN VMs running.
>
> The BUG is happening when I'm making "huge" I/O (20MB/s with a rsync for example) on the RAID5 stack.
> I've to reset system to make it work again.

What did you mean by 'huge' I/O (20 MB/s)? Is it possible for you to reproduce
the issue on a raw raid5 array? It would be even better if you can give me a fio
job file that triggers the issue, so I can easily debug it.

Also, please check whether the upstream patch (e8d7c33 "md/raid5: limit request
size according to implementation limits") helps.

Thanks,
Shaohua


Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

MasterPrenium
Hi Shaohua,

Thanks for your reply.

Let me explain my "huge". For example, with a low-rate random I/O stream
(<1 MB written/sec) I don't get a crash, but with random I/O of about
20 MB/sec the kernel crashes in a few minutes (for example, running an
rsync, or even synchronising my DRBD stack, causes the crash).
I don't know if this helps, but in most cases, when the kernel crashes,
my RAID 5 array is re-synchronizing after the reboot.

I'm not able to reproduce the crash with a raw RAID5 stack (with dd/fio
...).

It seems I need to stack filesystems to help reproduce it:

Here is a test configuration, with the command lines, to show the way I'm
able to reproduce the crash. Everything is done in dom0:
- mdadm --create /dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 /dev/sde1
- mkfs.btrfs /dev/md10
- mkdir /tmp/btrfs /mnt/XenVM /tmp/ext4
- mount /dev/md10 /tmp/btrfs
- btrfs subvolume create /tmp/btrfs/XenVM
- umount /tmp/btrfs
- mount /dev/md10 /mnt/XenVM -osubvol=XenVM
- truncate /mnt/XenVM/VMTestFile.dat -s 800G
- mkfs.ext4 /mnt/XenVM/VMTestFile.dat
- mount /mnt/XenVM/VMTestFile.dat /tmp/ext4

-> Doing this doesn't seem to crash the kernel:
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --rwmixwrite=95 --bs=1M --direct=1 --size=80G --numjobs=8 --runtime=600 --group_reporting --filename=/mnt/XenVM/Fio.dat

-> Doing this crashes the kernel in a few minutes:
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --rwmixwrite=95 --bs=1M --direct=1 --size=80G --numjobs=8 --runtime=600 --group_reporting --filename=/tmp/ext4/ext4.dat

Note: --direct=1 or --direct=0 doesn't seem to change the behaviour.
Whether the RAID 5 array is re-synchronizing or already synchronized
doesn't change the behaviour either.
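
In case a job file is easier to work with, here is the crashing run above
written as a fio job file; these are just the same options as the command
line, I have not run it in this form:

[global]
ioengine=libaio
iodepth=1
rw=randwrite
rwmixwrite=95
bs=1M
direct=1
size=80G
numjobs=8
runtime=600
group_reporting

[randwrite]
filename=/tmp/ext4/ext4.dat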

Here is another "crash": http://pastebin.com/uqLzL4fn

Regarding your patch, I can't find it. Is it the one sent by Konstantin
Khlebnikov?

Do you want the "ext4.dat" fio file? It would be really difficult for me
to provide it to you, as I only have a slow ADSL connection.

Thanks for your help,

MasterPrenium

On 04/01/2017 at 23:30, Shaohua Li wrote:

> [...]



Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

Shaohua Li
On Thu, Jan 05, 2017 at 03:16:53PM +0100, MasterPrenium wrote:

> [...]

I'm trying to reproduce this, but with no success. So:
ext4->btrfs->raid5: crash
btrfs->raid5: no crash
Right? Does the subvolume matter? When you create the raid5 array, does adding
the '--assume-clean' option change the behavior? I'd like to narrow down the
issue. If you can capture a blktrace of the raid5 array, it would be a great
hint as to what kind of I/O it is.
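
For reference, capturing it could be as simple as something like this while
the fio job runs (output file names are arbitrary):

blktrace -d /dev/md10 -o md10trace
# after stopping the trace, post-process it with:
blkparse -i md10trace > md10trace.txt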
 
> Regarding your patch, I can't find it. Is it the one sent by Konstantin
> Khlebnikov ?

Right.

> Do you want the "ext4.dat" fio file ? It will be really difficult for me to
> provide it to you as I've only a poor ADSL network connection.

Not necessary.

Thanks,
Shaohua



Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

MasterPrenium
Hello,

Replies below, plus:
- I don't know if this can help, but after the crash, when the system
reboots, the RAID 5 array is re-synchronizing:
[   37.028239] md10: Warning: Device sdc1 is misaligned
[   37.028541] created bitmap (15 pages) for device md10
[   37.030433] md10: bitmap initialized from disk: read 1 pages, set 59 of 29807 bits
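
After the reboot the resync progress is visible with, e.g.:
cat /proc/mdstat
mdadm --detail /dev/md10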

- Sometimes the kernel crashes completely (I lose the serial and network
connections); sometimes I only get the "BUG" dump and still have network
access (but a reboot is impossible, I need to reset the system).

- You can find the blktrace capture (taken while running fio) here; I hope
it's complete, since the end of the file is when the kernel crashed:
https://goo.gl/X9jZ50

Thanks,
MasterPrenium

On 05/01/2017 at 20:37, Shaohua Li wrote:

> On Thu, Jan 05, 2017 at 03:16:53PM +0100, MasterPrenium wrote:
>> [...]
> I'm trying to reproduce, but no success. So
> ext4->btrfs->raid5, crash
> btrfs->raid5, no crash
> right? does subvolume matter? When you create the raid5 array, does adding
> '--assume-clean' option change the behavior? I'd like to narrow down the issue.
> If you can capture the blktrace to the raid5 array, it would be great to hint
> us what kind of IO it is.
>  
Yes, correct.
The subvolume doesn't matter.
--assume-clean doesn't change the behaviour.
Don't forget that the system needs to be running on Xen to crash;
without it (on a native kernel) it doesn't crash (or at least, I was not
able to make it crash).
>> Regarding your patch, I can't find it. Is it the one sent by Konstantin
>> Khlebnikov ?
> Right.
It doesn't help :(. Maybe the crash is happening a little bit later.




Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

Shaohua Li
On Sun, Jan 08, 2017 at 02:31:15PM +0100, MasterPrenium wrote:

> Hello,
>
> Replies below + :
> - I don't know if this can help but after the crash, when the system
> reboots, the Raid 5 stack is re-synchronizing
> [   37.028239] md10: Warning: Device sdc1 is misaligned
> [   37.028541] created bitmap (15 pages) for device md10
> [   37.030433] md10: bitmap initialized from disk: read 1 pages, set 59 of
> 29807 bits
>
> - Sometimes the kernel completely crash (lost serial + network connection),
> sometimes only got the "BUG" dump, but still have network access (but a
> reboot is impossible, need to reset the system).
>
> - You can find blktrace here (while running fio), I hope it's complete since
> the end of the file is when the kernel crashed : https://goo.gl/X9jZ50

It looks like most of them are normal full-stripe writes.
 

> > I'm trying to reproduce, but no success. So
> > ext4->btrfs->raid5, crash
> > btrfs->raid5, no crash
> > right? does subvolume matter? When you create the raid5 array, does adding
> > '--assume-clean' option change the behavior? I'd like to narrow down the issue.
> > If you can capture the blktrace to the raid5 array, it would be great to hint
> > us what kind of IO it is.
> Yes Correct.
> The subvolume doesn't matter.
> -- assume-clean doesn't change the behaviour.

So it's not a resync issue.

> Don't forget that the system needs to be running on xen to crash, without
> (on native kernel) it doesn't crash (or at least, I was not able to make it
> crash).
> > > Regarding your patch, I can't find it. Is it the one sent by Konstantin
> > > Khlebnikov ?
> > Right.
> It doesn't help :(. Maybe the crash is happening a little bit later.

OK, the patch is unlikely to help, since the I/O size isn't very big.

I don't have a good idea yet. My best guess so far is that the virtual machine
introduces extra delay, which might trigger race conditions that aren't seen
natively. I'll check whether I can find something locally.

Thanks,
Shaohua


Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

MasterPrenium
Hi Shaohua,

I've run some new little tests; maybe they can help.

- I tried creating the RAID 5 array with only 2 drives (mdadm --create /dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 missing).
The same issue happens.
- But one time (still with 2/3 drives), I was not able to crash the
kernel with exactly the same procedure as before, even after re-creating
the filesystems etc.
In order to reproduce the BUG I had to re-create the array.

Can this be linked to this message?
[  155.667456] md10: Warning: Device sdc1 is misaligned

I don't know how to "align" a drive in a RAID array... The partition is
correctly aligned (according to "parted").
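
For reference, the alignment check was along these lines (partition 1 being
the RAID member):
parted /dev/sdc align-check optimal 1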

- In another test (still 2/3 drives in the array), I didn't get a
kernel crash, but I had 100% I/O wait on the CPU. Trying to reboot
finally gave me these printk messages: http://pastebin.com/uzVHUUrC

If you have any patch for me to try (maybe something to be more verbose
about the issue), please tell me; I'll test it, as this is a really
blocking issue...

Best regards,

MasterPrenium


On 09/01/2017 at 23:44, Shaohua Li wrote:

> [...]



Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

MasterPrenium
In reply to this post by Shaohua Li
Hi guys,

My issue still remains with newer kernels, at least the latest revision
of the 4.10.x branch.

But I found something that may be interesting for the investigation: I
have attached another kernel .config file, and with that configuration I
am not able to reproduce the kernel panic at all, with exactly the same
procedure.

Tested on kernels 4.9.16 and 4.10.13:
- Config_Crash.txt: results in a crash within less than 2 minutes of
running fio
- Config_NoCrash.txt: even after hours of fio, rebuilding arrays, etc.,
no crash at all, and no warning or anything in dmesg.

Note: Config_NoCrash.txt comes from another server on which I had set up
a similar system and which was not crashing. I tested a kernel built with
it on my crashing system, and there is no crash anymore...
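
A condensed diff of the two configs can be generated with the helper script
that ships in the kernel tree, e.g.:
scripts/diffconfig Config_Crash.txt Config_NoCrash.txt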

I can't believe how a different config can solve a kernel BUG...

If someone has any idea...

Bests,


On 09/01/2017 at 23:44, Shaohua Li wrote:

> [...]


Attachments: Config_Crash.txt (112K), Config_NoCrash.txt (123K)

Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

MasterPrenium

Hi guys,

In the end the problem is still present, just harder to reproduce; I couldn't reproduce it with fio... But syncing the DRBD stack finally made the kernel crash again:

May 13 05:33:49 Node_2 kernel: [ 7040.167706] ------------[ cut here ]------------
May 13 05:33:49 Node_2 kernel: [ 7040.170426] kernel BUG at drivers/md/raid5.c:527!
May 13 05:33:49 Node_2 kernel: [ 7040.173136] invalid opcode: 0000 [#1] SMP
May 13 05:33:49 Node_2 kernel: [ 7040.175820] Modules linked in: drbd lru_cache xen_acpi_processor xen_pciback xen_gntalloc xen_gntdev joydev iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac edac_core x86_pkg_temp_thermal coretemp ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw igb ixgbe gf128mul ablk_helper cryptd pcspkr mpt3sas mdio i2c_i801 ptp i2c_smbus lpc_ich xhci_pci scsi_transport_sas pps_core ioatdma dca mfd_core xhci_hcd shpchp wmi tpm_tis tpm_tis_core tpm
May 13 05:33:49 Node_2 kernel: [ 7040.188405] CPU: 0 PID: 2944 Comm: drbd_r_drbd0 Not tainted 4.9.16-gentoo #8
May 13 05:33:49 Node_2 kernel: [ 7040.191672] Hardware name: Supermicro Super Server/X10SDV-4C-7TP4F, BIOS 1.0b 11/21/2016
May 13 05:33:49 Node_2 kernel: [ 7040.195033] task: ffff880268e40440 task.stack: ffffc90005f64000
May 13 05:33:49 Node_2 kernel: [ 7040.198493] RIP: e030:[<ffffffff8176c4a6>]  [<ffffffff8176c4a6>] raid5_get_active_stripe+0x566/0x670
May 13 05:33:49 Node_2 kernel: [ 7040.202157] RSP: e02b:ffffc90005f67b70  EFLAGS: 00010086
May 13 05:33:49 Node_2 kernel: [ 7040.205861] RAX: 0000000000000000 RBX: ffff880269ad9c00 RCX: dead000000000200
May 13 05:33:49 Node_2 kernel: [ 7040.209646] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff8802581fca90
May 13 05:33:49 Node_2 kernel: [ 7040.213409] RBP: ffffc90005f67c10 R08: ffff8802581fcaa0 R09: 0000000034bfc400
May 13 05:33:49 Node_2 kernel: [ 7040.217207] R10: ffff8802581fca90 R11: 0000000000000001 R12: ffff880269ad9c10
May 13 05:33:49 Node_2 kernel: [ 7040.221111] R13: ffff8802581fca90 R14: ffff880268ee6f00 R15: 0000000034bfc510
May 13 05:33:49 Node_2 kernel: [ 7040.225004] FS:  0000000000000000(0000) GS:ffff880270c00000(0000) knlGS:ffff880270c00000
May 13 05:33:49 Node_2 kernel: [ 7040.229000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May 13 05:33:49 Node_2 kernel: [ 7040.233005] CR2: 0000000000c7d2e0 CR3: 0000000264d39000 CR4: 0000000000042660
May 13 05:33:49 Node_2 kernel: [ 7040.237056] Stack:
May 13 05:33:49 Node_2 kernel: [ 7040.241073]  0000000000003af8 ffff880269ad9c00 0000000000000000 ffff880269ad9c08
May 13 05:33:49 Node_2 kernel: [ 7040.245172]  ffff880269ad9de0 ffff880200000002 0000000000000000 0000000034bfc510
May 13 05:33:49 Node_2 kernel: [ 7040.249344]  ffff8802581fca90 ffffffff81760000 ffffffff819a93b0 ffffc90005f67c10
May 13 05:33:49 Node_2 kernel: [ 7040.253395] Call Trace:
May 13 05:33:49 Node_2 kernel: [ 7040.257327]  [<ffffffff81760000>] ? raid10d+0xa00/0x12e0
May 13 05:33:49 Node_2 kernel: [ 7040.261327]  [<ffffffff819a93b0>] ? _raw_spin_lock_irq+0x10/0x30
May 13 05:33:49 Node_2 kernel: [ 7040.265336]  [<ffffffff8176c75b>] raid5_make_request+0x1ab/0xda0
May 13 05:33:49 Node_2 kernel: [ 7040.269297]  [<ffffffff811c0100>] ? kmem_cache_alloc+0x70/0x1a0
May 13 05:33:49 Node_2 kernel: [ 7040.273264]  [<ffffffff81166df5>] ? mempool_alloc_slab+0x15/0x20
May 13 05:33:49 Node_2 kernel: [ 7040.277145]  [<ffffffff810b5050>] ? wake_up_atomic_t+0x30/0x30
May 13 05:33:49 Node_2 kernel: [ 7040.281080]  [<ffffffff81776b68>] md_make_request+0xe8/0x220
May 13 05:33:49 Node_2 kernel: [ 7040.285000]  [<ffffffff813b82e0>] generic_make_request+0xd0/0x1b0
May 13 05:33:49 Node_2 kernel: [ 7040.289002]  [<ffffffffa004e75b>] drbd_submit_peer_request+0x1fb/0x4b0 [drbd]
May 13 05:33:49 Node_2 kernel: [ 7040.293018]  [<ffffffffa004ef0e>] receive_RSDataReply+0x1ce/0x3b0 [drbd]
May 13 05:33:49 Node_2 kernel: [ 7040.297102]  [<ffffffffa004ed40>] ? receive_rs_deallocated+0x330/0x330 [drbd]
May 13 05:33:49 Node_2 kernel: [ 7040.301235]  [<ffffffffa004ed40>] ? receive_rs_deallocated+0x330/0x330 [drbd]
May 13 05:33:49 Node_2 kernel: [ 7040.305331]  [<ffffffffa0050cca>] drbd_receiver+0x18a/0x2f0 [drbd]
May 13 05:33:49 Node_2 kernel: [ 7040.309425]  [<ffffffffa0058de0>] ? drbd_destroy_connection+0xe0/0xe0 [drbd]
May 13 05:33:49 Node_2 kernel: [ 7040.313600]  [<ffffffffa0058e2b>] drbd_thread_setup+0x4b/0x120 [drbd]
May 13 05:33:49 Node_2 kernel: [ 7040.317820]  [<ffffffffa0058de0>] ? drbd_destroy_connection+0xe0/0xe0 [drbd]
May 13 05:33:49 Node_2 kernel: [ 7040.322006]  [<ffffffff81092a4a>] kthread+0xca/0xe0
May 13 05:33:49 Node_2 kernel: [ 7040.326100]  [<ffffffff81092980>] ? kthread_park+0x60/0x60
May 13 05:33:49 Node_2 kernel: [ 7040.330157]  [<ffffffff819a9945>] ret_from_fork+0x25/0x30
May 13 05:33:49 Node_2 kernel: [ 7040.334176] Code: 0f 85 b8 fc ff ff 0f 0b 0f 0b f3 90 8b 43 70 a8 01 75 f7 89 45 a0 e9 80 fd ff ff f0 ff 83 40 02 00 00 e9 d0 fc ff ff 0f 0b 0f 0b <0f> 0b 48 89 f2 48 c7 c7 88 a5 16 82 31 c0 48 c7 c6 7b de d1 81
May 13 05:33:49 Node_2 kernel: [ 7040.342995] RIP  [<ffffffff8176c4a6>] raid5_get_active_stripe+0x566/0x670
May 13 05:33:49 Node_2 kernel: [ 7040.347054]  RSP <ffffc90005f67b70>
May 13 05:33:49 Node_2 kernel: [ 7040.367142] ---[ end trace 47ae5e57e18c95c6 ]---
May 13 05:33:49 Node_2 kernel: [ 7040.391125] BUG: unable to handle kernel NULL pointer dereference at           (null)
May 13 05:33:49 Node_2 kernel: [ 7040.395306] IP: [<ffffffff810b4b0b>] __wake_up_common+0x2b/0x90
May 13 05:33:49 Node_2 kernel: [ 7040.399513] PGD 25b915067
May 13 05:33:49 Node_2 kernel: [ 7040.399562] PUD 26474b067
May 13 05:33:49 Node_2 kernel: [ 7040.403751] PMD 0
May 13 05:33:49 Node_2 kernel: [ 7040.403785]
May 13 05:33:49 Node_2 kernel: [ 7040.408059] Oops: 0000 [#2] SMP

I really need some help to fix this...

Best regards,


On 13/05/2017 at 02:06, MasterPrenium wrote:
> Hi guys,
>
> My issue still remains with newer kernels, at least the latest revision of the 4.10.x branch.
>
> But I found something that may be interesting for the investigation: attached is another .config file for building the kernel. With this configuration I'm not able to reproduce the kernel panic; I get no crash at all with exactly the same procedure.
>
> Tested on kernels 4.9.16 and 4.10.13:
> - config_Crash.txt: results in a crash within less than 2 minutes of running fio
> - config_NoCrash.txt: even after hours of fio, rebuilding arrays, etc., no crash at all, and no warning or anything in dmesg.
>
> Note: config_NoCrash comes from another server on which I had set up a similar system and which was not crashing. I tested a kernel built with this config on my crashing system, and it no longer crashes...
>
> I can't believe that a different kernel config can make a kernel BUG go away...
>
> If someone has any idea...
>
> Best regards,



_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xen.org/xen-users

Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

MasterPrenium
In reply to this post by Shaohua Li

Hi Shaohua,

It seems this patch fixed my issue! The issue was still present in 4.13.3; after applying the patch below, it seems to be gone. I can't reproduce it anymore.

Thanks anyway ;)

From: Shaohua Li [hidden email]

commit 3664847d95e60a9a943858b7800f8484669740fc upstream.

We have a race condition in the scenario below: say we have 3 contiguous stripes, sh1,
sh2 and sh3, where sh1 is the stripe_head of sh2 and sh3:

CPU1				CPU2				CPU3
handle_stripe(sh3)
				stripe_add_to_batch_list(sh3)
				-> lock(sh2, sh3)
				-> lock batch_lock(sh1)
				-> add sh3 to batch_list of sh1
				-> unlock batch_lock(sh1)
								clear_batch_ready(sh1)
								-> lock(sh1) and batch_lock(sh1)
								-> clear STRIPE_BATCH_READY for all stripes in batch_list
								-> unlock(sh1) and batch_lock(sh1)
->clear_batch_ready(sh3)
-->test_and_clear_bit(STRIPE_BATCH_READY, sh3)
--->return 0 as sh->batch == NULL
				-> sh3->batch_head = sh1
				-> unlock (sh2, sh3)

On CPU1, handle_stripe will continue handling sh3 even though it's in the batch stripe
list of sh1. By moving the sh3->batch_head assignment inside the batch_lock, we make it
impossible to clear STRIPE_BATCH_READY before batch_head is set.

Thanks Stephane for helping debug this tricky issue.

Reported-and-tested-by: Stephane Thiell [hidden email]
Signed-off-by: Shaohua Li [hidden email]
Signed-off-by: Greg Kroah-Hartman [hidden email]

---
 drivers/md/raid5.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -812,6 +812,14 @@ static void stripe_add_to_batch_list(str
 			spin_unlock(&head->batch_head->batch_lock);
 			goto unlock_out;
 		}
+		/*
+		 * We must assign batch_head of this stripe within the
+		 * batch_lock, otherwise clear_batch_ready of batch head
+		 * stripe could clear BATCH_READY bit of this stripe and
+		 * this stripe->batch_head doesn't get assigned, which
+		 * could confuse clear_batch_ready for this stripe
+		 */
+		sh->batch_head = head->batch_head;
 
 		/*
 		 * at this point, head's BATCH_READY could be cleared, but we
@@ -819,8 +827,6 @@ static void stripe_add_to_batch_list(str
 		 */
 		list_add(&sh->batch_list, &head->batch_list);
 		spin_unlock(&head->batch_head->batch_lock);
-
-		sh->batch_head = head->batch_head;
 	} else {
 		head->batch_head = head;
 		sh->batch_head = head->batch_head;
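
To make the ordering problem easier to see outside the md code, here is a small self-contained sketch in plain C with pthreads. It is only an analogy of the pattern the patch enforces, not the kernel implementation: the struct, field and function names below (member, batch_list, add_to_batch, head_clear_ready, member_may_handle) are invented for illustration, the reader-side locking and memory barriers of the real code are omitted, and the only point is the ordering of the two stores inside add_to_batch().

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BATCH 8

/* Loose stand-ins for the raid5 stripe_head fields involved in the race. */
struct member {
	pthread_mutex_t batch_lock;            /* meaningful on batch heads only */
	struct member *batch_head;             /* NULL = independent stripe */
	struct member *batch_list[MAX_BATCH];  /* members, kept on the head only */
	int nr;
	bool ready;                            /* stand-in for STRIPE_BATCH_READY */
};

/* CPU2 in the commit message: link m into head's batch. */
static void add_to_batch(struct member *head, struct member *m)
{
	pthread_mutex_lock(&head->batch_lock);
	m->batch_head = head;                  /* the store the patch moves inside the lock */
	head->batch_list[head->nr++] = m;      /* only then make m reachable from the head */
	pthread_mutex_unlock(&head->batch_lock);
}

/* CPU3: the head clears the ready flag of itself and of every linked member. */
static void head_clear_ready(struct member *head)
{
	pthread_mutex_lock(&head->batch_lock);
	head->ready = false;
	for (int i = 0; i < head->nr; i++)
		head->batch_list[i]->ready = false;
	pthread_mutex_unlock(&head->batch_lock);
}

/* CPU1: a member asking "may I be handled on my own?". */
static bool member_may_handle(struct member *m)
{
	if (m->ready) {
		m->ready = false;              /* flag still set: claim it ourselves */
		return true;
	}
	/*
	 * Flag already cleared by someone else.  Because add_to_batch() sets
	 * batch_head before linking m (both under the head's lock), any member
	 * whose flag was cleared by head_clear_ready() already has batch_head
	 * set, so reaching this point with batch_head == NULL can only mean a
	 * genuinely independent stripe.
	 */
	return m->batch_head == NULL;
}

With the pre-patch ordering (batch_head assigned after the unlock), head_clear_ready() could clear a member's flag during the window in which its batch_head was still NULL, and member_may_handle() would then wrongly treat a batched stripe as independent, which, per the commit message above, is how handle_stripe ends up processing a stripe that is already on another stripe's batch list.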



_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xen.org/xen-users