domU freeze on Debian stretch

Valentin Vidic
Hi,

Having a strange problem with a database server after upgrading
to Debian stretch.  Shortly after boot all disk IO on the system
hangs and the messages below appear although the load on the
machine is almost 0.  The only way to restart the system is
using xl destroy.  When this happens the system is booted with
all 56 VCPUs and reducing this to a lower number like 24 seems
to help (no hangs after that).  Could this be a problem with the
Xen scheduler since all threads listed hang in __schedule?

Here is the basic system info:

release                : 4.9.0-3-amd64
version                : #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
machine                : x86_64
nr_cpus                : 56
max_cpu_id             : 55
nr_nodes               : 2
cores_per_socket       : 14
threads_per_core       : 2
cpu_mhz                : 2400
hw_caps                : b7ebfbff:77fef3ff:2c100800:00000121:00000001:001cbfbb:00000000:00000100
virt_caps              : hvm hvm_directio
total_memory           : 262030
free_memory            : 154236
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 8
xen_extra              : .1
xen_version            : 4.8.1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=2048M com2=115200,8n1 console=com2,vga
cc_compiler            : gcc (Debian 6.3.0-16) 6.3.0 20170425
cc_compile_by          : ian.jackson
cc_compile_domain      : eu.citrix.com
cc_compile_date        : Tue May  2 14:06:04 UTC 2017
build_id               : 0b619fa14fca0e6ca76f2a8b52eba64d60aa37de
xend_config_format     : 4

[ 1330.140221] INFO: task postgres:980 blocked for more than 120 seconds.
[ 1330.140237]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1330.140241] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.140246] postgres        D    0   980    838 0x00000000
[ 1330.140254]  ffff8830e444d000 0000000000000000 ffff8830e504c100 ffff883104a98240
[ 1330.140261]  ffff8830ebff5040 ffffc9005aa47860 ffffffff816015d3 0000000000000001
[ 1330.140267]  00ff8830e02471c0 ffff883104a98240 ffffffff81307849 ffff8830e504c100
[ 1330.140274] Call Trace:
[ 1330.140288]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1330.140297]  [<ffffffff81307849>] ? blk_mq_flush_plug_list+0x139/0x160
[ 1330.140302]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1330.140306]  [<ffffffff81604e73>] ? schedule_timeout+0x243/0x310
[ 1330.140313]  [<ffffffff812fc9dd>] ? blk_flush_plug_list+0xbd/0x230
[ 1330.140324]  [<ffffffff8117f184>] ? mempool_alloc+0x64/0x190
[ 1330.140332]  [<ffffffff8101b5f1>] ? xen_clocksource_get_cycles+0x11/0x20
[ 1330.140338]  [<ffffffff8160133d>] ? io_schedule_timeout+0x9d/0x100
[ 1330.140349]  [<ffffffff81357534>] ? __sbitmap_queue_get+0x24/0x90
[ 1330.140356]  [<ffffffff81307dd0>] ? bt_get.isra.6+0x160/0x220
[ 1330.140364]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1330.140370]  [<ffffffff81308143>] ? blk_mq_get_tag+0x23/0x90
[ 1330.140375]  [<ffffffff81303a3a>] ? __blk_mq_alloc_request+0x1a/0x220
[ 1330.140382]  [<ffffffff813048e9>] ? blk_mq_map_request+0xd9/0x170
[ 1330.140387]  [<ffffffff81307118>] ? blk_mq_make_request+0xc8/0x570
[ 1330.140393]  [<ffffffff812fb080>] ? generic_make_request+0x110/0x2d0
[ 1330.140399]  [<ffffffff81605e46>] ? _raw_spin_unlock_irqrestore+0x16/0x20
[ 1330.140404]  [<ffffffff812fb2b6>] ? submit_bio+0x76/0x140
[ 1330.140434]  [<ffffffffc010a0a8>] ? ext4_io_submit+0x48/0x60 [ext4]
[ 1330.140447]  [<ffffffffc010a316>] ? ext4_bio_write_page+0x236/0x4d0 [ext4]
[ 1330.140458]  [<ffffffffc00ff985>] ? mpage_submit_page+0x55/0x70 [ext4]
[ 1330.140469]  [<ffffffffc00ffbd3>] ? mpage_map_and_submit_buffers+0x133/0x230 [ext4]
[ 1330.140482]  [<ffffffffc0105637>] ? ext4_writepages+0x7b7/0xd50 [ext4]
[ 1330.140489]  [<ffffffff8117d5b8>] ? __filemap_fdatawrite_range+0xc8/0x100
[ 1330.140495]  [<ffffffff8117d6f3>] ? filemap_write_and_wait_range+0x33/0x70
[ 1330.140506]  [<ffffffffc00fbbe1>] ? ext4_sync_file+0xe1/0x380 [ext4]
[ 1330.140515]  [<ffffffff81235bf8>] ? do_fsync+0x38/0x60
[ 1330.140521]  [<ffffffff81235e1c>] ? SyS_fsync+0xc/0x10
[ 1330.140529]  [<ffffffff8160627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 1450.972155] INFO: task kworker/u112:0:6 blocked for more than 120 seconds.
[ 1450.972180]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1450.972189] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.972201] kworker/u112:0  D    0     6      2 0x00000000
[ 1450.972227] Workqueue: writeback wb_workfn (flush-202:0)
[ 1450.972239]  ffff8830e3d72000 0000000000000000 ffff8830ebf1e000 ffff883104a58240
[ 1450.972255]  ffff8830ebfef000 ffffc900588ff8a0 ffffffff816015d3 0000000000000000
[ 1450.972270]  00ff8830e3f00000 ffff883104a58240 ffffc900588ff8c0 ffff8830ebf1e000
[ 1450.972287] Call Trace:
[ 1450.972304]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1450.972316]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1450.972341]  [<ffffffffc0057086>] ? wait_transaction_locked+0x86/0xc0 [jbd2]
[ 1450.972356]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1450.972369]  [<ffffffffc00572a8>] ? add_transaction_credits+0x1b8/0x290 [jbd2]
[ 1450.972385]  [<ffffffffc00574d5>] ? start_this_handle+0x105/0x400 [jbd2]
[ 1450.972398]  [<ffffffff811e00cc>] ? kmem_cache_alloc+0xbc/0x520
[ 1450.972425]  [<ffffffffc00579f9>] ? jbd2__journal_start+0xd9/0x1e0 [jbd2]
[ 1450.972455]  [<ffffffffc01052d5>] ? ext4_writepages+0x455/0xd50 [ext4]
[ 1450.972463]  [<ffffffffc01a1680>] ? blkif_queue_rq+0x800/0x800 [xen_blkfront]
[ 1450.972468]  [<ffffffffc01a1488>] ? blkif_queue_rq+0x608/0x800 [xen_blkfront]
[ 1450.972476]  [<ffffffff81328236>] ? cpumask_next_and+0x26/0x40
[ 1450.972482]  [<ffffffff812308ed>] ? __writeback_single_inode+0x3d/0x310
[ 1450.972489]  [<ffffffff810b23ce>] ? find_busiest_group+0x3e/0x4d0
[ 1450.972494]  [<ffffffff81231081>] ? writeback_sb_inodes+0x221/0x4f0
[ 1450.972499]  [<ffffffff812313d7>] ? __writeback_inodes_wb+0x87/0xb0
[ 1450.972505]  [<ffffffff81231748>] ? wb_writeback+0x278/0x310
[ 1450.972512]  [<ffffffff8123209e>] ? wb_workfn+0x2ae/0x3d0
[ 1450.972518]  [<ffffffff81090384>] ? process_one_work+0x184/0x410
[ 1450.972523]  [<ffffffff8109065d>] ? worker_thread+0x4d/0x480
[ 1450.972527]  [<ffffffff81090610>] ? process_one_work+0x410/0x410
[ 1450.972533]  [<ffffffff810965d7>] ? kthread+0xd7/0xf0
[ 1450.972538]  [<ffffffff81096500>] ? kthread_park+0x60/0x60
[ 1450.972544]  [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30
[ 1450.972630] INFO: task jbd2/xvda-8:604 blocked for more than 120 seconds.
[ 1450.972637]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1450.972641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.972647] jbd2/xvda-8     D    0   604      2 0x00000000
[ 1450.972653]  ffff8830e8048c00 0000000000000000 ffff8830ddf91040 ffff883104a18240
[ 1450.972662]  ffffffff81c0e500 ffffc9005a09bca0 ffffffff816015d3 ffff8830ddee3088
[ 1450.972670]  0000000000000201 ffff883104a18240 ffffc9005a09bd80 ffff8830ddf91040
[ 1450.972678] Call Trace:
[ 1450.972683]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1450.972688]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1450.972693]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1450.972700]  [<ffffffffc005a26f>] ? jbd2_journal_commit_transaction+0x25f/0x17a0 [jbd2]
[ 1450.972707]  [<ffffffff810abdc1>] ? update_curr+0xe1/0x160
[ 1450.972712]  [<ffffffff810aabd4>] ? account_entity_dequeue+0xa4/0xc0
[ 1450.972719]  [<ffffffff810151b4>] ? xen_load_sp0+0x84/0x160
[ 1450.972725]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1450.972731]  [<ffffffff8109d95d>] ? finish_task_switch+0x7d/0x1f0
[ 1450.972737]  [<ffffffff81605e46>] ? _raw_spin_unlock_irqrestore+0x16/0x20
[ 1450.972745]  [<ffffffffc005fbc2>] ? kjournald2+0xc2/0x260 [jbd2]
[ 1450.972750]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1450.972757]  [<ffffffffc005fb00>] ? commit_timeout+0x10/0x10 [jbd2]
[ 1450.972765]  [<ffffffff810965d7>] ? kthread+0xd7/0xf0
[ 1450.972770]  [<ffffffff81096500>] ? kthread_park+0x60/0x60
[ 1450.972775]  [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30
[ 1450.972786] INFO: task rs:main Q:Reg:777 blocked for more than 120 seconds.
[ 1450.972791]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1450.972794] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.972801] rs:main Q:Reg   D    0   777      1 0x00000000
[ 1450.972806]  ffff8830ddec6000 0000000000000000 ffff8830dde13080 ffff883104a18240
[ 1450.972814]  ffffffff81c0e500 ffffc9005a52bb48 ffffffff816015d3 ffff8830e49d7090
[ 1450.972822]  00ffc9005a52bb4c ffff883104a18240 ffffc9005a52bb68 ffff8830dde13080
[ 1450.972830] Call Trace:
[ 1450.972834]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1450.972839]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1450.972845]  [<ffffffffc0057086>] ? wait_transaction_locked+0x86/0xc0 [jbd2]
[ 1450.972850]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1450.972856]  [<ffffffffc00572a8>] ? add_transaction_credits+0x1b8/0x290 [jbd2]
[ 1450.972866]  [<ffffffff8107c66d>] ? __local_bh_enable_ip+0x1d/0x80
[ 1450.972872]  [<ffffffffc00574d5>] ? start_this_handle+0x105/0x400 [jbd2]
[ 1450.972882]  [<ffffffff8154667a>] ? ip_output+0x6a/0xf0
[ 1450.972888]  [<ffffffffc00579f9>] ? jbd2__journal_start+0xd9/0x1e0 [jbd2]
[ 1450.972904]  [<ffffffffc010907d>] ? ext4_dirty_inode+0x2d/0x60 [ext4]
[ 1450.972909]  [<ffffffff812306c5>] ? __mark_inode_dirty+0x165/0x350
[ 1450.972915]  [<ffffffff8121e029>] ? generic_update_time+0x79/0xd0
[ 1450.972920]  [<ffffffff8121e186>] ? current_time+0x36/0x70
[ 1450.972925]  [<ffffffff8121e27c>] ? file_update_time+0xbc/0x110
[ 1450.972932]  [<ffffffff8117d939>] ? __generic_file_write_iter+0x99/0x1b0
[ 1450.972943]  [<ffffffffc00fb180>] ? ext4_file_write_iter+0x90/0x370 [ext4]
[ 1450.972951]  [<ffffffff814e2880>] ? sock_sendmsg+0x30/0x40
[ 1450.972957]  [<ffffffff8120168a>] ? new_sync_write+0xda/0x130
[ 1450.972963]  [<ffffffff81201df0>] ? vfs_write+0xb0/0x190
[ 1450.972968]  [<ffffffff812031d2>] ? SyS_write+0x52/0xc0
[ 1450.972974]  [<ffffffff8160627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 1450.972983] INFO: task postgres:980 blocked for more than 120 seconds.
[ 1450.972987]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1450.972992] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.972999] postgres        D    0   980    838 0x00000000
[ 1450.973005]  ffff8830e444d000 0000000000000000 ffff8830e504c100 ffff883104a98240
[ 1450.973014]  ffff8830ebff5040 ffffc9005aa47860 ffffffff816015d3 0000000000000001
[ 1450.973022]  00ff8830e02471c0 ffff883104a98240 ffffffff81307849 ffff8830e504c100
[ 1451.173119] Call Trace:
[ 1451.173125]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1451.173131]  [<ffffffff81307849>] ? blk_mq_flush_plug_list+0x139/0x160
[ 1451.173136]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1451.173141]  [<ffffffff81604e73>] ? schedule_timeout+0x243/0x310
[ 1451.173148]  [<ffffffff812fc9dd>] ? blk_flush_plug_list+0xbd/0x230
[ 1451.173154]  [<ffffffff8117f184>] ? mempool_alloc+0x64/0x190
[ 1451.173165]  [<ffffffff8101b5f1>] ? xen_clocksource_get_cycles+0x11/0x20
[ 1451.173174]  [<ffffffff8160133d>] ? io_schedule_timeout+0x9d/0x100
[ 1451.173186]  [<ffffffff81357534>] ? __sbitmap_queue_get+0x24/0x90
[ 1451.173197]  [<ffffffff81307dd0>] ? bt_get.isra.6+0x160/0x220
[ 1451.173208]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1451.173218]  [<ffffffff81308143>] ? blk_mq_get_tag+0x23/0x90
[ 1451.173226]  [<ffffffff81303a3a>] ? __blk_mq_alloc_request+0x1a/0x220
[ 1451.173233]  [<ffffffff813048e9>] ? blk_mq_map_request+0xd9/0x170
[ 1451.173242]  [<ffffffff81307118>] ? blk_mq_make_request+0xc8/0x570
[ 1451.173251]  [<ffffffff812fb080>] ? generic_make_request+0x110/0x2d0
[ 1451.173259]  [<ffffffff81605e46>] ? _raw_spin_unlock_irqrestore+0x16/0x20
[ 1451.173264]  [<ffffffff812fb2b6>] ? submit_bio+0x76/0x140
[ 1451.173276]  [<ffffffffc010a0a8>] ? ext4_io_submit+0x48/0x60 [ext4]
[ 1451.173287]  [<ffffffffc010a316>] ? ext4_bio_write_page+0x236/0x4d0 [ext4]
[ 1451.173298]  [<ffffffffc00ff985>] ? mpage_submit_page+0x55/0x70 [ext4]
[ 1451.173309]  [<ffffffffc00ffbd3>] ? mpage_map_and_submit_buffers+0x133/0x230 [ext4]
[ 1451.173321]  [<ffffffffc0105637>] ? ext4_writepages+0x7b7/0xd50 [ext4]
[ 1451.173328]  [<ffffffff8117d5b8>] ? __filemap_fdatawrite_range+0xc8/0x100
[ 1451.173333]  [<ffffffff8117d6f3>] ? filemap_write_and_wait_range+0x33/0x70
[ 1451.173343]  [<ffffffffc00fbbe1>] ? ext4_sync_file+0xe1/0x380 [ext4]
[ 1451.173349]  [<ffffffff81235bf8>] ? do_fsync+0x38/0x60
[ 1451.173355]  [<ffffffff81235e1c>] ? SyS_fsync+0xc/0x10
[ 1451.173361]  [<ffffffff8160627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 1451.173366] INFO: task postgres:982 blocked for more than 120 seconds.
[ 1451.173370]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1451.173374] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.173381] postgres        D    0   982    838 0x00000004
[ 1451.173386]  ffff8830dddb2c00 0000000000000000 ffff8830eaa76080 ffff883104a58240
[ 1451.173394]  ffff8830ebfef000 ffffc9005aa4fb48 ffffffff816015d3 ffff8830ea7dd880
[ 1451.173403]  00ff8830ea7dde80 ffff883104a58240 ffffc9005aa4fb68 ffff8830eaa76080
[ 1451.173410] Call Trace:
[ 1451.173414]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1451.173419]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1451.173425]  [<ffffffffc0057086>] ? wait_transaction_locked+0x86/0xc0 [jbd2]
[ 1451.173434]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1451.173444]  [<ffffffffc00572a8>] ? add_transaction_credits+0x1b8/0x290 [jbd2]
[ 1451.173452]  [<ffffffff810151b4>] ? xen_load_sp0+0x84/0x160
[ 1451.173458]  [<ffffffffc00574d5>] ? start_this_handle+0x105/0x400 [jbd2]
[ 1451.173463]  [<ffffffff8109d95d>] ? finish_task_switch+0x7d/0x1f0
[ 1451.173469]  [<ffffffffc00579f9>] ? jbd2__journal_start+0xd9/0x1e0 [jbd2]
[ 1451.173480]  [<ffffffffc010907d>] ? ext4_dirty_inode+0x2d/0x60 [ext4]
[ 1451.173485]  [<ffffffff812306c5>] ? __mark_inode_dirty+0x165/0x350
[ 1451.173492]  [<ffffffff8121e029>] ? generic_update_time+0x79/0xd0
[ 1451.173497]  [<ffffffff8121e186>] ? current_time+0x36/0x70
[ 1451.173501]  [<ffffffff8121e27c>] ? file_update_time+0xbc/0x110
[ 1451.173507]  [<ffffffff8117d939>] ? __generic_file_write_iter+0x99/0x1b0
[ 1451.173517]  [<ffffffffc00fb180>] ? ext4_file_write_iter+0x90/0x370 [ext4]
[ 1451.173524]  [<ffffffff810de182>] ? __call_rcu.constprop.70+0xd2/0x290
[ 1451.173529]  [<ffffffff8120168a>] ? new_sync_write+0xda/0x130
[ 1451.173534]  [<ffffffff81201df0>] ? vfs_write+0xb0/0x190
[ 1451.173539]  [<ffffffff812031d2>] ? SyS_write+0x52/0xc0
[ 1451.173544]  [<ffffffff8160627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 1451.173559] INFO: task salt-minion:1442 blocked for more than 120 seconds.
[ 1451.173563]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1451.173567] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.173573] salt-minion     D    0  1442   1300 0x00000000
[ 1451.173578]  ffff8830df748800 0000000000000000 ffff8830e60b6100 ffff883104a58240
[ 1451.173586]  ffff8830ebfef000 ffffc9005a2f7b48 ffffffff816015d3 0000000000000000
[ 1451.173593]  0000011abebc79a8 ffff883104a58240 ffffc9005a2f7b68 ffff8830e60b6100
[ 1451.173600] Call Trace:
[ 1451.173604]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1451.173609]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1451.373714]  [<ffffffffc0057086>] ? wait_transaction_locked+0x86/0xc0 [jbd2]
[ 1451.373723]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1451.373730]  [<ffffffffc00572a8>] ? add_transaction_credits+0x1b8/0x290 [jbd2]
[ 1451.373737]  [<ffffffff810a0874>] ? ttwu_do_wakeup+0x14/0xd0
[ 1451.373743]  [<ffffffff81605e46>] ? _raw_spin_unlock_irqrestore+0x16/0x20
[ 1451.373755]  [<ffffffff810a1424>] ? try_to_wake_up+0x54/0x3a0
[ 1451.373766]  [<ffffffffc00574d5>] ? start_this_handle+0x105/0x400 [jbd2]
[ 1451.373775]  [<ffffffff8120d2f2>] ? lookup_fast+0x52/0x2e0
[ 1451.373781]  [<ffffffffc00579f9>] ? jbd2__journal_start+0xd9/0x1e0 [jbd2]
[ 1451.373793]  [<ffffffffc010907d>] ? ext4_dirty_inode+0x2d/0x60 [ext4]
[ 1451.373798]  [<ffffffff812306c5>] ? __mark_inode_dirty+0x165/0x350
[ 1451.373805]  [<ffffffff8121e029>] ? generic_update_time+0x79/0xd0
[ 1451.373810]  [<ffffffff8121e186>] ? current_time+0x36/0x70
[ 1451.373817]  [<ffffffff8121e27c>] ? file_update_time+0xbc/0x110
[ 1451.373823]  [<ffffffff8117d939>] ? __generic_file_write_iter+0x99/0x1b0
[ 1451.373835]  [<ffffffffc00fb180>] ? ext4_file_write_iter+0x90/0x370 [ext4]
[ 1451.373841]  [<ffffffff81356768>] ? strncpy_from_user+0x48/0x160
[ 1451.373846]  [<ffffffff8120726d>] ? cp_new_stat+0x14d/0x180
[ 1451.373851]  [<ffffffff8120168a>] ? new_sync_write+0xda/0x130
[ 1451.373856]  [<ffffffff81201df0>] ? vfs_write+0xb0/0x190
[ 1451.373860]  [<ffffffff812031d2>] ? SyS_write+0x52/0xc0
[ 1451.373866]  [<ffffffff8160627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 1451.373876] INFO: task postgres:2053 blocked for more than 120 seconds.
[ 1451.373881]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1451.373885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.373891] postgres        D    0  2053    838 0x00000000
[ 1451.373898]  ffff8830e8822c00 0000000000000000 ffff8830df502080 ffff883104a18240
[ 1451.373906]  ffffffff81c0e500 ffffc9005a1b3870 ffffffff816015d3 ffff8830e89ce4e8
[ 1451.373914]  00ff8830e578c85c ffff883104a18240 00000fade3faec00 ffff8830df502080
[ 1451.373923] Call Trace:
[ 1451.373928]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1451.373933]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1451.373938]  [<ffffffff81604e73>] ? schedule_timeout+0x243/0x310
[ 1451.373944]  [<ffffffff8101b5f1>] ? xen_clocksource_get_cycles+0x11/0x20
[ 1451.373949]  [<ffffffff8160133d>] ? io_schedule_timeout+0x9d/0x100
[ 1451.373956]  [<ffffffff81357534>] ? __sbitmap_queue_get+0x24/0x90
[ 1451.373962]  [<ffffffff81307dd0>] ? bt_get.isra.6+0x160/0x220
[ 1451.373967]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1451.373972]  [<ffffffff81308143>] ? blk_mq_get_tag+0x23/0x90
[ 1451.373976]  [<ffffffff81303a3a>] ? __blk_mq_alloc_request+0x1a/0x220
[ 1451.373981]  [<ffffffff813048e9>] ? blk_mq_map_request+0xd9/0x170
[ 1451.373986]  [<ffffffff81307118>] ? blk_mq_make_request+0xc8/0x570
[ 1451.373993]  [<ffffffff811f866f>] ? mem_cgroup_commit_charge+0x7f/0x4b0
[ 1451.374000]  [<ffffffff812fb080>] ? generic_make_request+0x110/0x2d0
[ 1451.374005]  [<ffffffff812fb2b6>] ? submit_bio+0x76/0x140
[ 1451.374010]  [<ffffffff8117c193>] ? add_to_page_cache_lru+0x73/0xe0
[ 1451.374024]  [<ffffffffc014d076>] ? ext4_mpage_readpages+0x3e6/0x8d0 [ext4]
[ 1451.374030]  [<ffffffff811d5ec1>] ? alloc_pages_current+0x91/0x140
[ 1451.374037]  [<ffffffff8118b667>] ? __do_page_cache_readahead+0x197/0x240
[ 1451.374043]  [<ffffffff8132e18e>] ? radix_tree_lookup_slot+0x1e/0x50
[ 1451.374048]  [<ffffffff8118b840>] ? ondemand_readahead+0x130/0x220
[ 1451.374053]  [<ffffffff8117e2fe>] ? generic_file_read_iter+0x63e/0x8a0
[ 1451.374058]  [<ffffffff81201537>] ? new_sync_read+0xd7/0x120
[ 1451.374063]  [<ffffffff81201ca1>] ? vfs_read+0x91/0x130
[ 1451.374068]  [<ffffffff81203112>] ? SyS_read+0x52/0xc0
[ 1451.374073]  [<ffffffff8160627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 1451.374088] INFO: task postgres:3879 blocked for more than 120 seconds.
[ 1451.374093]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1451.374096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.374102] postgres        D    0  3879    838 0x00000000
[ 1451.374106]  ffff8830e444dc00 0000000000000000 ffff8830e73ad100 ffff883104a18240
[ 1451.374114]  ffffffff81c0e500 ffffc9005a5ef870 ffffffff816015d3 28c57f35af96329a
[ 1451.374122]  00000000024200ca ffff883104a18240 ffffc9005a5ef8d8 ffff8830e73ad100
[ 1451.374130] Call Trace:
[ 1451.374134]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1451.374138]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1451.374143]  [<ffffffff81604e73>] ? schedule_timeout+0x243/0x310
[ 1451.374148]  [<ffffffff8101b5f1>] ? xen_clocksource_get_cycles+0x11/0x20
[ 1451.374153]  [<ffffffff8160133d>] ? io_schedule_timeout+0x9d/0x100
[ 1451.374158]  [<ffffffff81357534>] ? __sbitmap_queue_get+0x24/0x90
[ 1451.374164]  [<ffffffff81307dd0>] ? bt_get.isra.6+0x160/0x220
[ 1451.374168]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1451.374173]  [<ffffffff81308143>] ? blk_mq_get_tag+0x23/0x90
[ 1451.374178]  [<ffffffff81303a3a>] ? __blk_mq_alloc_request+0x1a/0x220
[ 1451.374182]  [<ffffffff813048e9>] ? blk_mq_map_request+0xd9/0x170
[ 1451.574307]  [<ffffffff81307118>] ? blk_mq_make_request+0xc8/0x570
[ 1451.574318]  [<ffffffff811f866f>] ? mem_cgroup_commit_charge+0x7f/0x4b0
[ 1451.574324]  [<ffffffff812fb080>] ? generic_make_request+0x110/0x2d0
[ 1451.574329]  [<ffffffff812fb2b6>] ? submit_bio+0x76/0x140
[ 1451.574334]  [<ffffffff8117c193>] ? add_to_page_cache_lru+0x73/0xe0
[ 1451.574353]  [<ffffffffc014d076>] ? ext4_mpage_readpages+0x3e6/0x8d0 [ext4]
[ 1451.574365]  [<ffffffff811d5ec1>] ? alloc_pages_current+0x91/0x140
[ 1451.574374]  [<ffffffff8118b667>] ? __do_page_cache_readahead+0x197/0x240
[ 1451.574386]  [<ffffffff8132e18e>] ? radix_tree_lookup_slot+0x1e/0x50
[ 1451.574396]  [<ffffffff8118b840>] ? ondemand_readahead+0x130/0x220
[ 1451.574406]  [<ffffffff8117e2fe>] ? generic_file_read_iter+0x63e/0x8a0
[ 1451.574416]  [<ffffffff81201537>] ? new_sync_read+0xd7/0x120
[ 1451.574424]  [<ffffffff81201ca1>] ? vfs_read+0x91/0x130
[ 1451.574431]  [<ffffffff81203112>] ? SyS_read+0x52/0xc0
[ 1451.574437]  [<ffffffff8160627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 1571.804126] INFO: task kworker/u112:0:6 blocked for more than 120 seconds.
[ 1571.804151]       Not tainted 4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u2
[ 1571.804159] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1571.804171] kworker/u112:0  D    0     6      2 0x00000000
[ 1571.804194] Workqueue: writeback wb_workfn (flush-202:0)
[ 1571.804206]  ffff8830e3d72000 0000000000000000 ffff8830ebf1e000 ffff883104a58240
[ 1571.804222]  ffff8830ebfef000 ffffc900588ff8a0 ffffffff816015d3 0000000000000000
[ 1571.804238]  00ff8830e3f00000 ffff883104a58240 ffffc900588ff8c0 ffff8830ebf1e000
[ 1571.804254] Call Trace:
[ 1571.804270]  [<ffffffff816015d3>] ? __schedule+0x233/0x6d0
[ 1571.804281]  [<ffffffff81601aa2>] ? schedule+0x32/0x80
[ 1571.804305]  [<ffffffffc0057086>] ? wait_transaction_locked+0x86/0xc0 [jbd2]
[ 1571.804322]  [<ffffffff810b8d50>] ? prepare_to_wait_event+0xf0/0xf0
[ 1571.804336]  [<ffffffffc00572a8>] ? add_transaction_credits+0x1b8/0x290 [jbd2]
[ 1571.804355]  [<ffffffffc00574d5>] ? start_this_handle+0x105/0x400 [jbd2]
[ 1571.804372]  [<ffffffff811e00cc>] ? kmem_cache_alloc+0xbc/0x520
[ 1571.804386]  [<ffffffffc00579f9>] ? jbd2__journal_start+0xd9/0x1e0 [jbd2]
[ 1571.804436]  [<ffffffffc01052d5>] ? ext4_writepages+0x455/0xd50 [ext4]
[ 1571.804458]  [<ffffffffc01a1680>] ? blkif_queue_rq+0x800/0x800 [xen_blkfront]
[ 1571.804471]  [<ffffffffc01a1488>] ? blkif_queue_rq+0x608/0x800 [xen_blkfront]
[ 1571.804488]  [<ffffffff81328236>] ? cpumask_next_and+0x26/0x40
[ 1571.804517]  [<ffffffff812308ed>] ? __writeback_single_inode+0x3d/0x310
[ 1571.804531]  [<ffffffff810b23ce>] ? find_busiest_group+0x3e/0x4d0
[ 1571.804539]  [<ffffffff81231081>] ? writeback_sb_inodes+0x221/0x4f0
[ 1571.804547]  [<ffffffff812313d7>] ? __writeback_inodes_wb+0x87/0xb0
[ 1571.804554]  [<ffffffff81231748>] ? wb_writeback+0x278/0x310
[ 1571.804561]  [<ffffffff8123209e>] ? wb_workfn+0x2ae/0x3d0
[ 1571.804567]  [<ffffffff81090384>] ? process_one_work+0x184/0x410
[ 1571.804574]  [<ffffffff8109065d>] ? worker_thread+0x4d/0x480
[ 1571.804578]  [<ffffffff81090610>] ? process_one_work+0x410/0x410
[ 1571.804585]  [<ffffffff810965d7>] ? kthread+0xd7/0xf0
[ 1571.804591]  [<ffffffff81096500>] ? kthread_park+0x60/0x60
[ 1571.804598]  [<ffffffff816064f5>] ? ret_from_fork+0x25/0x30
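
For anyone comparing a log like the one above across boots, a small script can tally which tasks the hung-task detector reported and how often. This is only a sketch: the function name summarize_hung_tasks and the regular expression are my own, written against the "INFO: task NAME:PID blocked for more than N seconds" lines shown above, and the sample input is a few of those lines copied verbatim.

```python
import re
from collections import Counter

# Matches hung-task header lines such as:
# [ 1330.140221] INFO: task postgres:980 blocked for more than 120 seconds.
# The task name may itself contain colons (e.g. "kworker/u112:0"), so the
# greedy name group is followed by the last ":<pid>" before " blocked".
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(log_text):
    """Return a Counter mapping (task name, pid) to the number of reports."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


# Sample lines copied from the log above.
sample = """\
[ 1330.140221] INFO: task postgres:980 blocked for more than 120 seconds.
[ 1450.972155] INFO: task kworker/u112:0:6 blocked for more than 120 seconds.
[ 1450.972786] INFO: task rs:main Q:Reg:777 blocked for more than 120 seconds.
[ 1450.972983] INFO: task postgres:980 blocked for more than 120 seconds.
"""

for (name, pid), reports in summarize_hung_tasks(sample).most_common():
    print(f"{name} (pid {pid}): {reports} report(s)")
```

Feeding it the saved output of dmesg instead of the sample string gives a quick view of whether the same writers (here postgres, the jbd2 thread and the writeback worker) keep reappearing.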

--
Valentin

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xen.org/xen-users
Re: domU freeze on Debian stretch

Valentin Vidic
On Wed, Jul 26, 2017 at 09:06:08AM +0200, Valentin Vidic wrote:
> Having a strange problem with a database server after upgrading
> to Debian stretch.  Shortly after boot all disk IO on the system
> hangs and the messages below appear although the load on the
> machine is almost 0.  The only way to restart the system is
> using xl destroy.  When this happens the system is booted with
> all 56 VCPUs and reducing this to a lower number like 24 seems
> to help (no hangs after that).  Could this be a problem with the
> Xen scheduler since all threads listed hang in __schedule?

Setting gnttab_max_frames=256 on the Xen command line seems to help
with this issue, as suggested by the Xen developers. I just found a
SUSE page with the same info:

  I/O to LUNs hang / stall under high load when using xen-blkfront
  https://www.novell.com/support/kb/doc.php?id=7018590
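
To see why the number of VCPUs and gnttab_max_frames could be related, here is a rough back-of-the-envelope calculation. It is a sketch under several assumptions that are not stated in this thread and should be checked against the Xen and kernel documentation: version 1 grant entries of 8 bytes each, a default of 32 grant frames in Xen 4.8, one xen-blkfront queue per VCPU, 32 request slots per ring, and up to 32 segments per request. The function names are made up for illustration, and the result is an upper bound for a single disk, not a prediction of when a hang occurs.

```python
PAGE_SIZE = 4096           # xen_pagesize from the system info above
GRANT_ENTRY_V1_BYTES = 8   # assumption: version 1 grant table entries


def grant_capacity(max_frames):
    """Grant references available with the given number of grant frames."""
    return max_frames * (PAGE_SIZE // GRANT_ENTRY_V1_BYTES)


def blkfront_worst_case(queues, ring_slots=32, segments_per_request=32):
    """Upper bound on data grants one disk could hold at once.

    Assumes one queue per VCPU with every ring slot in use and every
    request carrying the maximum number of segments (assumed defaults).
    """
    return queues * ring_slots * segments_per_request


for frames in (32, 256):   # 32 is the assumed default, 256 the new setting
    print(f"gnttab_max_frames={frames}: {grant_capacity(frames)} grants")

for vcpus in (56, 24):     # the two VCPU counts mentioned above
    print(f"{vcpus} queues: up to {blkfront_worst_case(vcpus)} grants per disk")
```

Under these assumptions the upper bound grows linearly with the number of queues, and 256 frames leave room even for 56 fully loaded queues on one disk, while 32 frames do not. Other grant users in the guest, such as network devices, are ignored here.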

--
Valentin
