HVM guest crashes when running Drakvuf

Pierre-Philipp Braun
Hello.

I am trying to run Drakvuf, which allows introspecting guests in real time.  It requires the guest to be HVM, since it leverages Extended Page Tables (EPT), which are available on relatively modern VT-x capable Intel CPUs.  To run, Drakvuf also requires specific boot arguments to be passed to the Xen hypervisor:

hap_1gb=false hap_2mb=false altp2m=1 flask_enforcing=1

dom0_mem and CPU pinning do not seem to be mandatory.  It also requires specific guest parameters, as this example config shows:

arch = 'x86_64'
name = "xenial2"
maxmem = 512
vcpus = 1
maxvcpus = 1
builder = "hvm"
boot = "cd"
hap = 1
acpi = 1
sdl = 1
usb = 0
altp2m = 1
shadow_memory = 16
audio=0
disk = ['file:/data/guests/xenial2/xenial2.disk,hda,w',
       'file:/data/ISO-IMAGES/devuan.iso,hdc:cdrom,r']
vif = [ 'vifname=xenial2.0' ]

I set up Drakvuf (git/latest) without problems on top of Xen 4.9.1, Linux 4.9.82-1+deb9u3, LibVMI 0.13 (git/latest) and Rekall 1.7.2.rc1 (pip/latest).  The machine has 4 cores (Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz).  Monitor Trap Flag (MTF), which Drakvuf requires in addition to EPT, is available: `rdmsr --bitfield 59:59 $((0x00000482))` returns `1` on this platform.
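
For readers wondering why bit 59 is the one to read: MSR 0x482 is IA32_VMX_PROCBASED_CTLS, whose upper 32 bits report which primary processor-based VM-execution controls may be set to 1, and the monitor trap flag is control bit 27, hence 32 + 27 = 59.  A minimal Python sketch of that arithmetic follows; the function names are illustrative only, and the raw value would have to come from something like `rdmsr 0x482`.

```python
# Sketch: why `rdmsr --bitfield 59:59 0x482` answers "is MTF available?".
IA32_VMX_PROCBASED_CTLS = 0x482  # MSR index used in the post
MTF_CONTROL_BIT = 27             # "monitor trap flag" in the primary controls


def allowed_1(msr_value: int, control_bit: int) -> bool:
    """True if the control may be set to 1 (reported in MSR bits 63:32)."""
    return bool((msr_value >> (32 + control_bit)) & 1)


def mtf_supported(msr_value: int) -> bool:
    """True if the capability MSR value allows enabling the monitor trap flag."""
    return allowed_1(msr_value, MTF_CONTROL_BIT)


if __name__ == "__main__":
    # Made-up raw value with only bit 59 set, purely for illustration.
    print(mtf_supported(1 << 59))
```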

I start the HVM guest described above, and when I spawn Drakvuf against it, the guest crashes.  Sometimes I see a few kernel traces, as expected, for a second or two, and then the guest appears in the `xl list` output as:

(null) 11 0 1 --pscd 15.4

until I interrupt Drakvuf, at which point the (null)-named domain finally gets cleaned up.
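
As a reading aid for that `xl list` line: the columns are Name, ID, Mem, VCPUs, State and Time(s), and the State column uses one letter per flag (r, b, p, s, c, d).  The following Python sketch decodes such a line; the helper and type names are made up for this example.

```python
from typing import List, NamedTuple

# One letter per state flag, as documented for the `xl list` State column.
STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
    "d": "dying",
}


class XlDomain(NamedTuple):
    name: str
    domid: int
    mem_mib: int
    vcpus: int
    states: List[str]
    cpu_seconds: float


def parse_xl_list_line(line: str) -> XlDomain:
    """Split one data row of `xl list` into typed fields."""
    # Split from the right so that an unusual name cannot shift the columns.
    name, domid, mem, vcpus, state, seconds = line.rsplit(None, 5)
    states = [STATE_FLAGS[ch] for ch in state if ch != "-"]
    return XlDomain(name, int(domid), int(mem), int(vcpus), states, float(seconds))


if __name__ == "__main__":
    dom = parse_xl_list_line("(null) 11 0 1 --pscd 15.4")
    print(dom.states)  # ['paused', 'shutdown', 'crashed', 'dying']
```

So the leftover domain is flagged as paused, shut down, crashed and dying all at once, with no memory assigned.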

Here is the dmesg output corresponding to that HVM crash event:

(XEN) d15v0 vmentry failure (reason 0x80000021): Invalid guest state (0)
(XEN) ************* VMCS Area **************
(XEN) *** Guest State ***
(XEN) CR0: actual=0x000000008005003b, shadow=0x0000000080050033, gh_mask=ffffffffffffffff
(XEN) CR4: actual=0x0000000000362670, shadow=0x0000000000360670, gh_mask=ffffffffffffffff
(XEN) CR3 = 0x8000000017464000
(XEN) PDPTE0 = 0x0000000000000000  PDPTE1 = 0x0000000000000000
(XEN) PDPTE2 = 0x0000000000000000  PDPTE3 = 0x0000000000000000
(XEN) RSP = 0x00007f3d6b8ccc38 (0x00007f3d6b8ccc38)  RIP = 0xffffffff8184ef2d (0xffffffff8184ef2d)
(XEN) RFLAGS=0x00000006 (0x00000006)  DR7 = 0x0000000000000400
(XEN) Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff81851f60
(XEN)        sel  attr  limit   base
(XEN)   CS: 0010 0a09b ffffffff 0000000000000000
(XEN)   DS: 0000 1c000 ffffffff 0000000000000000
(XEN)   SS: 0018 0c093 ffffffff 0000000000000000
(XEN)   ES: 0000 1c000 ffffffff 0000000000000000
(XEN)   FS: 0000 1c000 ffffffff 00007f3d6b8cd700
(XEN)   GS: 0000 1c000 ffffffff ffff88001f400000
(XEN) GDTR:            0000007f ffff88001f40c000
(XEN) LDTR: 0000 1c000 ffffffff 0000000000000000
(XEN) IDTR:            00000fff ffffffffff574000
(XEN)   TR: 0040 0008b 00002087 ffff88001f4048c0
(XEN) EFER = 0x0000000000000000  PAT = 0x0407010600070106
(XEN) PreemptionTimer = 0x00000000  SM Base = 0x00000000
(XEN) DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
(XEN) PerfGlobCtl = 0x0000000000000000  BndCfgS = 0x0000000000000000
(XEN) Interruptibility = 00000000  ActivityState = 00000000
(XEN) *** Host State ***
(XEN) RIP = 0xffff82d08030a140 (vmx_asm_vmexit_handler)  RSP = 0xffff83050fd47f90
(XEN) CS=e008 SS=0000 DS=0000 ES=0000 FS=0000 GS=0000 TR=e040
(XEN) FSBase=0000000000000000 GSBase=0000000000000000 TRBase=ffff83050fd4ec80
(XEN) GDTBase=ffff83050fd3e000 IDTBase=ffff83050fd4a000
(XEN) CR0=000000008005003b CR3=0000000457286000 CR4=00000000003526e0
(XEN) Sysenter RSP=ffff83050fd47fc0 CS:RIP=e008:ffff82d080348ba0
(XEN) EFER = 0x0000000000000000  PAT = 0x0000050100070406
(XEN) *** Control State ***
(XEN) PinBased=0000003f CPUBased=b6a0e5fa SecondaryExec=001254eb
(XEN) EntryControls=000153ff ExitControls=008fefff
(XEN) ExceptionBitmap=0006008a PFECmask=00000000 PFECmatch=00000000
(XEN) VMEntry: intr_info=000000f3 errcode=00000000 ilen=00000000
(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000003
(XEN)         reason=80000021 qualification=0000000000000000
(XEN) IDTVectoring: info=00000000 errcode=00000000
(XEN) TSC Offset = 0xffffdf608612721d  TSC Multiplier = 0x0000000000000000
(XEN) TPR Threshold = 0x00  PostedIntrVec = 0x00
(XEN) EPT pointer = 0x000000041774f01e  EPTP index = 0x0000
(XEN) PLE Gap=00000080 Window=00001000
(XEN) Virtual processor ID = 0x08d6 VMfunc controls = 0000000000000000
(XEN) **************************************
(XEN) domain_crash called from vmx.c:3337
(XEN) Domain 15 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.9.1  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    0010:[<ffffffff8184ef2d>]
(XEN) RFLAGS: 0000000000000006   CONTEXT: hvm guest (d15v0)
(XEN) rax: 8000000017464000   rbx: 00000000000012da   rcx: 00007ffdd43e9b39
(XEN) rdx: 0000000000000000   rsi: 00007f3d6b8ccc90   rdi: 0000000000000001
(XEN) rbp: 00007f3d6b8ccc60   rsp: 00007f3d6b8ccc38   r8:  0000000000000007
(XEN) r9:  0000000000000001   r10: 00007f3d64001880   r11: 0000000000000246
(XEN) r12: 00007f3d6b8ccc44   r13: 00188de0c5800000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 0000000000360670
(XEN) cr3: 8000000017464000   cr2: 00007f1368670090
(XEN) fsb: 00007f3d6b8cd700   gsb: ffff88001f400000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0018   cs: 0010
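
To make a few fields of this dump easier to read, here is a small Python sketch that compares the "actual" and "shadow" control-register values and looks at the upper bits of the guest CR3 field.  The bit names (CR0 bit 3 = TS, CR4 bit 13 = VMXE) and the rule that guest CR3 bits 63:52 must be zero at VM entry are taken from the Intel SDM, not from this thread; the helper names are made up, and nothing here establishes that these bits are the cause of the failure.

```python
# Sketch: pick apart a few values copied from the VMCS dump above.
CR0_NAMES = {3: "TS"}
CR4_NAMES = {13: "VMXE"}


def differing_bits(actual: int, shadow: int) -> list:
    """Bit positions where the real register differs from the guest-visible shadow."""
    diff = actual ^ shadow
    return [bit for bit in range(64) if (diff >> bit) & 1]


def cr3_high_bits(cr3: int) -> list:
    """Bits 63:52 of the guest CR3 field that are set."""
    return [bit for bit in range(52, 64) if (cr3 >> bit) & 1]


if __name__ == "__main__":
    cr0 = differing_bits(0x000000008005003B, 0x0000000080050033)
    cr4 = differing_bits(0x0000000000362670, 0x0000000000360670)
    print([CR0_NAMES.get(b, b) for b in cr0])   # ['TS']
    print([CR4_NAMES.get(b, b) for b in cr4])   # ['VMXE']
    print(cr3_high_bits(0x8000000017464000))    # [63]
```

The CR0 and CR4 differences are bits the hypervisor typically manages itself.  The guest CR3 value, however, has bit 63 set, which may be worth mentioning when reporting the problem, since "Invalid guest state" is the error class that such a check would produce.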

Now my question is pretty obvious: how can I avoid that HVM guest crash?  I also tried with Xen 4.10.0.  Although `xl dmesg` shows,

        (XEN) parameter "flask_enforcing" unknown!,

the resulting behavior of Drakvuf is identical,

# ./src/drakvuf -v -r /root/xenial2.json -d xenial2 2> /var/tmp/drakvuf.debug.xial2.`date +%s`.stderr.txt
[SYSCALL] TIME:1524864936.625006 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864936.854254 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864937.058071 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864937.262060 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864937.466197 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864937.673136 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864937.874151 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864938.078214 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864938.282078 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864938.486182 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit
[SYSCALL] TIME:1524864938.690172 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit

and then the guest crashes.
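
For anyone who wants to post-process such a trace, here is a rough Python sketch that parses the `[SYSCALL]` lines and prints the gap between consecutive events.  The regular expression only reflects the line layout visible in the excerpt above; it is an assumption about the format, not a description of Drakvuf's documented output.

```python
import re
from typing import List, NamedTuple

# Pattern inferred from the sample lines above (assumed, not an official format).
SYSCALL_RE = re.compile(
    r'\[SYSCALL\] TIME:(?P<time>\d+\.\d+) VCPU:(?P<vcpu>\d+) '
    r'CR3:(?P<cr3>0x[0-9a-fA-F]+),"(?P<proc>[^"]*)" UID:(?P<uid>\d+) (?P<symbol>\S+)'
)


class SyscallEvent(NamedTuple):
    time: float
    vcpu: int
    cr3: int
    proc: str
    uid: int
    symbol: str


def parse_events(text: str) -> List[SyscallEvent]:
    """Return one SyscallEvent per matching line; other lines are ignored."""
    events = []
    for match in SYSCALL_RE.finditer(text):
        events.append(SyscallEvent(
            float(match["time"]), int(match["vcpu"]), int(match["cr3"], 16),
            match["proc"], int(match["uid"]), match["symbol"],
        ))
    return events


def gaps(events: List[SyscallEvent]) -> List[float]:
    """Seconds elapsed between consecutive events."""
    return [b.time - a.time for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    sample = (
        '[SYSCALL] TIME:1524864936.625006 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit\n'
        '[SYSCALL] TIME:1524864936.854254 VCPU:0 CR3:0x1c13e000,"kworker/0:1" UID:0 linux!sys_imageblit\n'
    )
    for gap in gaps(parse_events(sample)):
        print(f"{gap:.3f}")
```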

The Drakvuf owner stated that it looks like a `vmx_failed_vmentry` bug.
I provided some Drakvuf debug traces in case it helps: https://github.com/tklengyel/drakvuf/issues/388

Thanks for your help
P-Ph

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users

Re: HVM guest crashes when running Drakvuf

Kun Cheng
I'm also using Drakvuf with Xen 4.10 on my PC, but my HVM VMs are running fine.

I checked my kernel options; the only difference is that I disabled flask using:

flask=disabled

I don't know if it helps.

pierre-philipp braun <[hidden email]> wrote on Sat, Apr 28, 2018 at 5:41 AM:
