Migrate/Suspend performance of PV and HVM

Frederico Cerveira
Hello xen-users,

Lately I have been experimenting with Remus to migrate a guest VM between
two different hypervisors, but I have run into an obstacle that
prevents me from using HVM guests without a significant loss of
performance.

I have tracked down the problem to the time that it takes for the
guest VM to suspend. While PV guests usually suspend quite quickly,
HVM guests are noticeably slower.

For example, see this extract taken from the "dmesg" output of a PV
guest after running "xl migrate" (Remus uses the same mechanism):

[42482.409957] Freezing user space processes ... (elapsed 0.004 seconds) done.
[42482.414895] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[42482.416827] PM: freeze of devices complete after 0.223 msecs
[42482.416833] suspending xenstore...
[42482.416901] PM: late freeze of devices complete after 0.059 msecs
[42482.417002] PM: noirq freeze of devices complete after 0.091 msecs
[42482.417102] xen:grant_table: Grant tables using version 1 layout
[42482.417102] PM: noirq restore of devices complete after 0.068 msecs
[42482.417102] PM: early restore of devices complete after 0.053 msecs
[42482.449704] PM: restore of devices complete after 27.866 msecs
[42482.449729] Restarting tasks ... done.

Now compare it with the same output when using an HVM guest:

[  149.705054] Freezing user space processes ... (elapsed 0.102 seconds) done.
[  149.808076] Freezing remaining freezable tasks ... (elapsed 0.013 seconds) done.
[  149.878555] PM: freeze of devices complete after 28.919 msecs
[  149.878586] suspending xenstore...
[  149.885652] PM: late freeze of devices complete after 6.989 msecs
[  149.919128] PM: noirq freeze of devices complete after 33.403 msecs
[  149.920594] xen:events: Xen HVM callback vector for event delivery is enabled
[  149.920594] Xen Platform PCI: I/O protocol version 1
[  149.920594] xen:grant_table: Grant tables using version 1 layout
[  149.920594] xen: --> irq=9, pirq=16
[  149.920594] xen: --> irq=8, pirq=17
[  149.920594] xen: --> irq=12, pirq=18
[  149.920594] xen: --> irq=1, pirq=19
[  149.920594] xen: --> irq=6, pirq=20
[  149.920594] xen: --> irq=4, pirq=21
[  149.920594] xen: --> irq=24, pirq=22
[  149.954270] PM: noirq restore of devices complete after 29.099 msecs
[  149.957334] PM: early restore of devices complete after 2.598 msecs
[  150.006117] rtc_cmos 00:02: System wakeup disabled by ACPI
[  150.013313] PM: restore of devices complete after 50.213 msecs
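To put a single number on each log, here is a minimal Python sketch (my own illustration, not part of any Xen or Remus tooling) that measures the window between the first and the last timestamped dmesg line of an excerpt. It assumes the printk timestamps behave as one continuous clock across the save/restore, which may not hold exactly for a migrated guest. For a like-for-like comparison it uses the "PM: restore of devices complete" line as the end point for both guests. The helper name `window_ms` is hypothetical.

```python
import re

# First line and "PM: restore of devices complete" line of each excerpt above.
PV_LOG = """\
[42482.409957] Freezing user space processes ... (elapsed 0.004 seconds) done.
[42482.449704] PM: restore of devices complete after 27.866 msecs
"""

HVM_LOG = """\
[  149.705054] Freezing user space processes ... (elapsed 0.102 seconds) done.
[  150.013313] PM: restore of devices complete after 50.213 msecs
"""

# dmesg prefixes each line with "[seconds.microseconds]" since boot, possibly space-padded.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def window_ms(log: str) -> float:
    """Milliseconds between the first and last timestamped line of a dmesg excerpt."""
    stamps = [float(m.group(1)) for line in log.splitlines() if (m := TS_RE.match(line))]
    if len(stamps) < 2:
        raise ValueError("need at least two timestamped lines")
    return (stamps[-1] - stamps[0]) * 1000.0


# Print the overall suspend/resume window for each guest.
for label, log in (("PV", PV_LOG), ("HVM", HVM_LOG)):
    print(f"{label}: {window_ms(log):.3f} ms")
```

Measured this way the overall gap is much smaller than the gap in the individual device-freeze steps, because the final "restore of devices" step costs tens of milliseconds on both guests.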

Comparing the time taken by the individual steps of the suspend
process, the HVM guest is slower in every step. The "freeze",
"late freeze", "noirq freeze" and "noirq restore" device steps are each
more than 100 times (two orders of magnitude) slower than on the PV
guest. This overhead is too high for my system and rules out HVM for my
use case (migrating while running a network-intensive workload with
strict latency requirements).
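To quantify the per-step gap, the following Python sketch parses the "Freezing ... (elapsed N seconds)" and "PM: ... complete after N msecs" lines from both excerpts and prints the HVM/PV ratio for each step along with its log10, i.e. the difference in orders of magnitude. The regular expressions assume exactly the message wording shown in these logs. Other kernel versions may phrase them differently. The function names are illustrative only.

```python
import math
import re

# Timing lines copied from the two dmesg outputs above (wrapped lines rejoined).
PV_LOG = """\
[42482.409957] Freezing user space processes ... (elapsed 0.004 seconds) done.
[42482.414895] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[42482.416827] PM: freeze of devices complete after 0.223 msecs
[42482.416901] PM: late freeze of devices complete after 0.059 msecs
[42482.417002] PM: noirq freeze of devices complete after 0.091 msecs
[42482.417102] PM: noirq restore of devices complete after 0.068 msecs
[42482.417102] PM: early restore of devices complete after 0.053 msecs
[42482.449704] PM: restore of devices complete after 27.866 msecs
"""

HVM_LOG = """\
[  149.705054] Freezing user space processes ... (elapsed 0.102 seconds) done.
[  149.808076] Freezing remaining freezable tasks ... (elapsed 0.013 seconds) done.
[  149.878555] PM: freeze of devices complete after 28.919 msecs
[  149.885652] PM: late freeze of devices complete after 6.989 msecs
[  149.919128] PM: noirq freeze of devices complete after 33.403 msecs
[  149.954270] PM: noirq restore of devices complete after 29.099 msecs
[  149.957334] PM: early restore of devices complete after 2.598 msecs
[  150.013313] PM: restore of devices complete after 50.213 msecs
"""

# "Freezing <what> ... (elapsed <s> seconds) done." reports seconds.
FREEZE_RE = re.compile(r"Freezing (.+?) \.\.\. \(elapsed ([\d.]+) seconds\)")
# "PM: <step> complete after <ms> msecs" reports milliseconds.
PM_RE = re.compile(r"PM: (.+?) complete after ([\d.]+) msecs")


def phase_durations_ms(log: str) -> dict[str, float]:
    """Map each suspend/resume step in a dmesg excerpt to its duration in ms."""
    phases: dict[str, float] = {}
    for line in log.splitlines():
        if m := FREEZE_RE.search(line):
            phases["freezing " + m.group(1)] = float(m.group(2)) * 1000.0
        elif m := PM_RE.search(line):
            phases[m.group(1)] = float(m.group(2))
    return phases


def compare(pv: dict[str, float], hvm: dict[str, float]) -> dict[str, float]:
    """HVM/PV slowdown factor for every step present in both logs."""
    return {name: hvm[name] / pv[name] for name in pv if name in hvm}


ratios = compare(phase_durations_ms(PV_LOG), phase_durations_ms(HVM_LOG))
for name, ratio in ratios.items():
    # log10 of the ratio is the difference in orders of magnitude.
    print(f"{name:40s} x{ratio:7.1f}  ({math.log10(ratio):.1f} orders of magnitude)")
```

Reporting log10 of the ratio alongside the raw factor makes it easy to check a phrase like "two orders of magnitude" against the measured data step by step.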

I am using Xen 4.10, Linux 3.5.1 (CentOS, compiled kernel) in the
dom0 and Linux 4.4.0-87-generic (Ubuntu) in the guest VM.
I have observed the same behaviour on different hardware (in fact,
on that system HVM was even slower than this).

My questions to you are:
   1) Is this normal/expected behaviour? Or could it be caused by a
configuration problem on my side?
   2) How can I improve the suspend performance of the HVM guest?

Thanks for your time,
Frederico Cerveira

_______________________________________________
Xen-users mailing list
https://lists.xenproject.org/mailman/listinfo/xen-users