XEN domU hangs with frozen I/O

Jamil Anwar Zaman
I installed Xen on CentOS 7.4.1708 (dom0) following the Xen4CentOS guide.

I then installed three domU guests (two CentOS 7 and one Windows Server)
with full virtualization. After initial testing I put the systems into
production. Since then only the Linux guests hang periodically; the
Windows Server guest runs fine.

The dom0 kernel is 4.9.63-29.el7.x86_64. The Linux domU guests run
CentOS 7 (3.10.0-693.5.2.el7.x86_64), and the Windows guest runs
Windows Server 2012 R2. The Linux domU kernels hang with the
following hung-task message:

> [ 3746.780097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3746.780223] INFO: task jbd2/xvdb6-8:8173 blocked for more than 120 seconds.

The task named in the message differs depending on what was running
at that moment.
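
A minimal sketch of how the hung-task reports could be tallied per task, assuming the guest's kernel log (for example from dmesg or the serial console) has been saved as text. The names `HUNG_TASK_RE` and `summarize_hung_tasks` are hypothetical, and the sample lines are the two quoted above.

```python
import re
from collections import Counter

# Matches the report line printed by the kernel's hung-task detector,
# e.g. "INFO: task jbd2/xvdb6-8:8173 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(log_lines):
    """Count hung-task reports per task name (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


sample = [
    '> [ 3746.780097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.',
    "> [ 3746.780223] INFO: task jbd2/xvdb6-8:8173 blocked for more than 120 seconds.",
]
print(summarize_hung_tasks(sample))
```

Because journal threads such as jbd2 carry the device name, a tally like this may help show whether the stalls cluster on one virtual block device or affect all of them.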

The logs stop at some point and resume only after the next reboot.
Sometimes it is still possible to log on to the system, but nothing
really works. It is as if all I/O to the virtual block devices is
suspended indefinitely. Until this happens, the systems seem to work
without issues.

Something like 'ls' on a directory that was listed before still returns
a result, but anything 'new', e.g. 'vim somefile', causes the shell to
stall. sar -u reveals high I/O wait.
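
For reference, the I/O wait percentage can also be derived from two samples of the aggregate `cpu` line in /proc/stat inside the guest. This is only a sketch: the function names are hypothetical and the two sample lines are made-up numbers, not measurements from the affected systems.

```python
def parse_cpu_line(line):
    """Return the numeric fields of the aggregate 'cpu' line from /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # Guest time is already accounted in user/nice, so only the first 8 are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


# Made-up samples for illustration only.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  110 0 60 820 210 0 0 0 0 0"
print(iowait_percent(before, after))
```

On a live guest the two lines would come from reading /proc/stat a few seconds apart, provided the read itself does not block.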

A similar problem has been reported for Xen with other kernels
(Debian/SUSE):
https://www.novell.com/support/kb/doc.php?id=7018590
Following their suggestion I raised gnttab_max_frames to 256. Things
were stable for one week, and then one of the domU guests hung again.

The following is the output of xl info:

release                : 4.9.63-29.el7.x86_64
version                : #1 SMP Mon Nov 20 14:39:22 UTC 2017
machine                : x86_64
nr_cpus                : 32
max_cpu_id             : 191
nr_nodes               : 2
cores_per_socket       : 8
threads_per_core       : 2
cpu_mhz                : 2100
hw_caps                : bfebfbff:2c100800:00000000:00007f00:77fefbff:00000000:00000121:021cbfbb
virt_caps              : hvm hvm_directio
total_memory           : 130978
free_memory            : 68109
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 6
xen_extra              : .6-6.el7
xen_version            : 4.6.6-6.el7
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : Fri Nov 17 18:32:23 2017 +0000 git:a559dc3-dirty
xen_commandline        : placeholder dom0_mem=2048M,max:2048M cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all gnttab_max_frames=256
cc_compiler            : gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
cc_compile_by          : mockbuild
cc_compile_domain      : centos.org
cc_compile_date        : Mon Nov 20 12:28:41 UTC 2017
xend_config_format     : 4
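
A small sketch, assuming the xl info output above has been captured as text, of how to check that gnttab_max_frames really reached the hypervisor command line. The helper names are hypothetical, and the sample contains only a few of the lines shown above.

```python
def parse_xl_info(text):
    """Turn 'key : value' lines from xl info into a dict."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def xen_boot_options(info):
    """Split xen_commandline into name/value pairs; bare flags map to True."""
    options = {}
    for token in info.get("xen_commandline", "").split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else True
    return options


sample = """\
xen_version            : 4.6.6-6.el7
xen_scheduler          : credit
xen_commandline        : placeholder dom0_mem=2048M,max:2048M cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all gnttab_max_frames=256
"""

info = parse_xl_info(sample)
options = xen_boot_options(info)
print(info["xen_version"], options.get("gnttab_max_frames"))
```

This only confirms that the option was passed at boot; it says nothing about whether the grant table limit is what actually causes the hangs.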

I am a new Xen user and did not find much help googling the issue. This
is my production system and the hangs are beginning to have an impact.
Any clues or debugging steps would be very much appreciated.


Xen-users mailing list