xen domU segfaults with xpti on intel based systems

xen domU segfaults with xpti on intel based systems

Tomas Mozes
Hello,
we are observing random PV domU segfaults on Intel-based systems with XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be reproduced on 4.9.3, 4.10.2 and 4.11.1. The problem can be mitigated by adding xpti=false to the Xen command line.
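
For readers who want to try the mitigation, here is a minimal sketch of adding an option such as xpti=false to an existing Xen command line string without duplicating a key that is already set. The helper name add_xen_opt is hypothetical, and the sample options are only an illustration; on a GRUB-based dom0 the resulting string would typically go into GRUB_CMDLINE_XEN in /etc/default/grub, followed by regenerating the GRUB configuration and rebooting.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xen or GRUB): append an option to a
# Xen command line string unless an option with the same key is present.
# An existing setting for that key is left untouched.
add_xen_opt() {
    cmdline=$1
    opt=$2
    key=${opt%%=*}                      # "xpti=false" -> "xpti"
    for word in $cmdline; do
        if [ "${word%%=*}" = "$key" ]; then
            printf '%s\n' "$cmdline"    # key already set, keep as is
            return 0
        fi
    done
    # Join with a space only when the command line is not empty.
    printf '%s\n' "${cmdline:+$cmdline }$opt"
}

# Example with an illustrative command line.
add_xen_opt "dom0_mem=4G ucode=scan" "xpti=false"
```

Because the helper never overrides an existing key, a command line that already contains xpti=true would have to be edited by hand.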

Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18 seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults on Debian, but on NetBSD it's almost instant.

The NetBSD 7.0 kernels can be downloaded from:
https://cdn.netbsd.org/pub/NetBSD/NetBSD-7.0/amd64/binary/kernel/netbsd-INSTALL_XEN3_DOMU.gz
https://cdn.netbsd.org/pub/NetBSD/NetBSD-7.0/amd64/binary/kernel/netbsd-XEN3_DOMU.gz

netbsd.conf:
kernel = "7.0/netbsd-INSTALL_XEN3_DOMU.gz"
memory = 512
vcpus = 1
name = "netbsd"
vif = [ '' ]
disk = ['phy:/dev/vg_data/netbsd,xvda,w']

The installation goes fine, but at the end it fails:

     Status: Command failed                                                    
    Command: /bin/sh MAKEDEV all                                               
     Hit enter to continue                                                     
--------------------------------------------------------------------------------
[1]   Done                    eval "${before}"... |
      Segmentation fault (core dumped) eval "${after}";...
[1]   Done                    eval "${before}"... |
      Segmentation fault (core dumped) eval "${after}";...

Now we boot the installed system by changing the kernel to:
kernel = "7.0/netbsd-XEN3_DOMU.gz"
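
Switching between the installer kernel and the regular kernel can be scripted. The following is a rough sketch that prints a copy of an xl config with its kernel line replaced; the function name with_kernel and the output file name in the usage comment are made up for illustration, and the sed expression assumes the image path contains no "|" or "&" characters.

```shell
#!/bin/sh
# Hypothetical helper: print the xl config file $1 with its "kernel ="
# line pointing at the image path $2. The original file is not modified.
with_kernel() {
    sed "s|^kernel *=.*|kernel = \"$2\"|" "$1"
}

# Possible usage (assumes netbsd.conf from above and the xl toolstack):
#   with_kernel netbsd.conf 7.0/netbsd-XEN3_DOMU.gz > netbsd-boot.conf
#   xl create -c netbsd-boot.conf
```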

...
Updating motd.
Starting powerd.
[1]   Segmentation fault (core dumped) sysctl -n hw.dis...
/usr/sbin/postconf: warning: valid_hostname: misplaced delimiter: .domain.tld
/usr/sbin/postconf: fatal: unable to use my own hostname
Jan 11 05:49:49 .sygic postfix[1550]: fatal: unable to use my own hostname
/etc/rc.d/postfix exited with code 1
Starting inetd.
...

Is this a known issue, or can something be done about it?

Thank you,
Tomas

_______________________________________________
Xen-users mailing list
[hidden email]
https://lists.xenproject.org/mailman/listinfo/xen-users

Re: xen domU segfaults with xpti on intel based systems

Juergen Gross-3
On 11/01/2019 07:05, Tomas Mozes wrote:
> Hello,
> we are observing random PV domU segfaults on Intel based systems with
> XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be
> reproduced on 4.9.3/4.10.2/4.11.1. The problem can be
> mitigated by adding xpti=false to xen command line options.
>
> Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
> seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
> on Debian, but on NetBSD it's almost instant.

Hmm, as we haven't received any similar reports, I suspect there is
something special on your side.

Can you please be more specific regarding:

- hardware (machine type(s), processor model(s), ...)
- other config options (hypervisor command line, hypervisor .config)

A hypervisor log (output of "xl dmesg") would help, too. Please add
"loglvl=all guest_loglvl=all" to the hypervisor command line for that
purpose. If possible use a debug hypervisor for this test, as that
will produce more diagnostic output.
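
A minimal sketch of how the requested log could be collected, assuming a dom0 with the xl toolstack: it checks whether the running hypervisor command line (the xen_commandline field of "xl info") already carries the two log-level options and then saves "xl dmesg" to a file. The function names and the output file name are illustrative only.

```shell
#!/bin/sh
# Succeed if the command line string $1 contains the exact option $2.
has_xen_opt() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print every option from the argument list that is missing in $1.
missing_xen_opts() {
    cmdline=$1
    shift
    for opt in "$@"; do
        has_xen_opt "$cmdline" "$opt" || printf '%s\n' "$opt"
    done
}

# Only touch the hypervisor when xl is actually available.
if command -v xl >/dev/null 2>&1; then
    running=$(xl info | sed -n 's/^xen_commandline *: //p')
    missing_xen_opts "$running" loglvl=all guest_loglvl=all
    xl dmesg > xl-dmesg.txt
fi
```

Any option printed by missing_xen_opts still has to be added to the hypervisor command line, followed by a reboot, before the log is captured.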


Juergen


Re: xen domU segfaults with xpti on intel based systems

Tomas Mozes


On Fri, Jan 11, 2019 at 9:21 AM Juergen Gross <[hidden email]> wrote:
> On 11/01/2019 07:05, Tomas Mozes wrote:
> > Hello,
> > we are observing random PV domU segfaults on Intel based systems with
> > XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be
> > reproduced on 4.9.3/4.10.2/4.11.1. The problem can be
> > mitigated by adding xpti=false to xen command line options.
> >
> > Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
> > seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
> > on Debian, but on NetBSD it's almost instant.
>
> Hmm, as we haven't received any similar reports, I suspect there is
> something special on your side.
>
> Can you please be more specific regarding:
>
> - hardware (machine type(s), processor model(s), ...)
> - other config options (hypervisor command line, hypervisor .config)
>
> A hypervisor log (output of "xl dmesg") would help, too. Please add
> "loglvl=all guest_loglvl=all" to the hypervisor command line for that
> purpose. If possible use a debug hypervisor for this test, as that
> will produce more diagnostic output.
>
>
> Juergen

These segfaults were first spotted by the GMP project maintainer and only later reproduced locally on another machine (also Intel).

A machine on which it can be reproduced: Intel DH87MC with Intel Core i7-4770 CPU @ 3.40GHz (Haswell), running Gentoo Linux.
However, I cannot reproduce it on my desktop machine: Intel DH77EB with Intel Core i5-3570 CPU @ 3.40GHz (Ivy Bridge).

GRUB options for the kernel and Xen:
GRUB_CMDLINE_LINUX="panic=30 net.ifnames=0 domdadm"
GRUB_CMDLINE_XEN="dom0_mem=4G gnttab_max_frames=256 ucode=scan loglvl=all guest_loglvl=all console_to_ring console_timestamps=date conring_size=1m smt=true"


Attachment: xl-dmesg.txt (14K)

Re: xen domU segfaults with xpti on intel based systems

Juergen Gross-3
On 11/01/2019 14:05, Tomas Mozes wrote:

>
>
> On Fri, Jan 11, 2019 at 9:21 AM Juergen Gross <[hidden email]> wrote:
>
>     On 11/01/2019 07:05, Tomas Mozes wrote:
>     > Hello,
>     > we are observing random PV domU segfaults on Intel based systems with
>     > XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be
>     > reproduced on 4.9.3/4.10.2/4.11.1. The problem can be
>     > mitigated by adding xpti=false to xen command line options.
>     >
>     > Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
>     > seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
>     > on Debian, but on NetBSD it's almost instant.
>
>     Hmm, as we haven't received any similar reports, I suspect there is
>     something special on your side.
>
>     Can you please be more specific regarding:
>
>     - hardware (machine type(s), processor model(s), ...)
>     - other config options (hypervisor command line, hypervisor .config)
>
>     A hypervisor log (output of "xl dmesg") would help, too. Please add
>     "loglvl=all guest_loglvl=all" to the hypervisor command line for that
>     purpose. If possible use a debug hypervisor for this test, as that
>     will produce more diagnostic output.
>
>
>     Juergen
>
>
> These segfaults were actually spotted by the gmp project maintainer and
> only later they were locally reproduced on other machine (intel too).
>
> A machine on which it can be reproduced: Intel DH87MC with Intel Core
> i7-4770 CPU @ 3.40GHz on Linux Gentoo (Haswell)
> But for example i cannot reproduce on my desktop machine: Intel DH77EB
> with Intel Core i5-3570 CPU @ 3.40GHz (Ivy Bridge)

Okay, those two CPUs differ in a critical feature: on Ivy Bridge, XPTI
can't make use of the processor's PCID feature due to the lack of the
INVPCID instruction.

Can you test whether adding "pcid=false" to the hypervisor command line
on the Haswell machine makes any difference?
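
To compare two hosts on this point, a quick look at the CPU flags can help. The sketch below assumes a Linux system where /proc/cpuinfo is readable and uses the flag names pcid and invpcid as Linux reports them; the helper name cpu_has_flag is hypothetical. Inside a Xen PV domain the flag list may be filtered by the hypervisor, so the result is only a hint.

```shell
#!/bin/sh
# Report whether the space-separated flag list $1 contains the flag $2.
cpu_has_flag() {
    case " $1 " in
        *" $2 "*) echo yes ;;
        *) echo no ;;
    esac
}

# Use the flags of the first CPU listed; an unreadable /proc/cpuinfo
# simply yields an empty list and therefore "no" for both flags.
flags=$(sed -n 's/^flags[[:space:]]*: //p' /proc/cpuinfo 2>/dev/null | head -n 1)
echo "pcid:    $(cpu_has_flag "$flags" pcid)"
echo "invpcid: $(cpu_has_flag "$flags" invpcid)"
```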

And one other question: could the problem have occurred at the same
time as

(XEN) [2019-01-11 12:41:06] d1 L1TF-vulnerable L4e 000000070cb93004 - Shadowing

was issued?
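
One way to line up such messages with guest segfaults is to pull the domain and timestamp out of the hypervisor log. This is a sketch only: the function name is made up, and the pattern assumes the log format quoted above, that is, timestamps enabled via console_timestamps=date and the message on a single line.

```shell
#!/bin/sh
# Read "xl dmesg"-style text on stdin and print "<domain> <timestamp>"
# for every L1TF shadowing message found.
l1tf_shadow_events() {
    sed -n 's/^(XEN) \[\([^]]*\)] \(d[0-9][0-9]*\) L1TF-vulnerable.*/\2 \1/p'
}

# Demonstration using the log line quoted in this thread.
printf '%s\n' '(XEN) [2019-01-11 12:41:06] d1 L1TF-vulnerable L4e 000000070cb93004 - Shadowing' |
    l1tf_shadow_events
```

On a live dom0 the same filter could be fed with "xl dmesg | l1tf_shadow_events".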


Juergen


Re: xen domU segfaults with xpti on intel based systems

andy smith-10
In reply to this post by Juergen Gross-3
Hi Juergen, Tomas,

On Fri, Jan 11, 2019 at 09:21:09AM +0100, Juergen Gross wrote:
> On 11/01/2019 07:05, Tomas Mozes wrote:
> > Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
> > seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
> > on Debian, but on NetBSD it's almost instant.
>
> Hmm, as we haven't received any similar reports, I suspect there is
> something special on your side.

I did report slightly similar problems to xen-devel:

    https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg02811.html

I currently work around it by ensuring the guests have updated their
kernels to have the L1TF mitigations (you can tell because
/sys/devices/system/cpu/vulnerabilities/l1tf appears).
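
A small sketch of that check, to be run inside a Linux guest. The function name l1tf_status is hypothetical, and the optional path argument exists only so the check can be pointed at a different file for testing.

```shell
#!/bin/sh
# Print the kernel's L1TF status, or a note if the sysfs file is absent,
# which suggests a kernel without the L1TF mitigations.
l1tf_status() {
    file=${1:-/sys/devices/system/cpu/vulnerabilities/l1tf}
    if [ -r "$file" ]; then
        printf 'l1tf: %s\n' "$(cat "$file")"
    else
        printf 'l1tf: not reported (kernel likely predates the L1TF mitigations)\n'
    fi
}

l1tf_status
```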

The other workaround was to set one of the Xen command line options
pv-l1tf=false or pcid=0.

For me this only affected 64-bit PV domains, but I only run Linux. I
didn't try xpti=false because the logs about shadowing made me try
the L1TF-related options first.

For me the above behaviour is experienced on Xeon D-1540 and Xeon
E5-1680v4 systems. I don't have any other types of system so don't
know how widespread it is.

Also please note that within weeks I also started experiencing much
worse problems: host crash, for which the only suggestion so far is
to try pcid=0. As that is hard for me to reproduce, with a time to
re-occurrence currently somewhere between 8 and 14 days, I am not
yet sure if pcid=0 helps. We're 9 days into a test of that.

    https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00938.html

Cheers,
Andy


Re: xen domU segfaults with xpti on intel based systems

Tomas Mozes
In reply to this post by Juergen Gross-3


On Fri, Jan 11, 2019 at 2:36 PM Juergen Gross <[hidden email]> wrote:
> On 11/01/2019 14:05, Tomas Mozes wrote:
> >
> >
> > On Fri, Jan 11, 2019 at 9:21 AM Juergen Gross <[hidden email]> wrote:
> >
> >     On 11/01/2019 07:05, Tomas Mozes wrote:
> >     > Hello,
> >     > we are observing random PV domU segfaults on Intel based systems with
> >     > XPTI enabled. These segfaults were not present in Xen 4.9.2 and can be
> >     > reproduced on 4.9.3/4.10.2/4.11.1. The problem can be
> >     > mitigated by adding xpti=false to xen command line options.
> >     >
> >     > Some of the affected systems are Debian 8/9 (Debian 10 with kernel 4.18
> >     > seems to work fine) and NetBSD 7. It's harder to reproduce the segfaults
> >     > on Debian, but on NetBSD it's almost instant.
> >
> >     Hmm, as we haven't received any similar reports, I suspect there is
> >     something special on your side.
> >
> >     Can you please be more specific regarding:
> >
> >     - hardware (machine type(s), processor model(s), ...)
> >     - other config options (hypervisor command line, hypervisor .config)
> >
> >     A hypervisor log (output of "xl dmesg") would help, too. Please add
> >     "loglvl=all guest_loglvl=all" to the hypervisor command line for that
> >     purpose. If possible use a debug hypervisor for this test, as that
> >     will produce more diagnostic output.
> >
> >
> >     Juergen
> >
> >
> > These segfaults were actually spotted by the gmp project maintainer and
> > only later they were locally reproduced on other machine (intel too).
> >
> > A machine on which it can be reproduced: Intel DH87MC with Intel Core
> > i7-4770 CPU @ 3.40GHz on Linux Gentoo (Haswell)
> > But for example i cannot reproduce on my desktop machine: Intel DH77EB
> > with Intel Core i5-3570 CPU @ 3.40GHz (Ivy Bridge)
>
> Okay, those two cpus differ in a critical feature: on Ivy Bridge XPTI
> can't make use of the processor's PCID feature due to a lack of the
> INVPCID instruction.
>
> Can you test whether adding "pcid=false" to the hypervisor command line
> on the Haswell machine makes any difference?

Setting "pcid=false" makes the segfault go away too.

> And one other question: could it be the problem occurred at the same
> time when
>
> (XEN) [2019-01-11 12:41:06] d1 L1TF-vulnerable L4e 000000070cb93004 - Shadowing
>
> was issued?

It is printed shortly after the domU starts, about 10 seconds before the segfault, and it appears in both cases (with and without pcid=false).

> Juergen
