Bad iSCSI I/O performance on Xen 4.6

Bad iSCSI I/O performance on Xen 4.6

Jean-Louis Dupond
Hi,

We are hitting an I/O limitation on some of our Xen hypervisors.
The hypervisors are running CentOS 6 with Xen 4.6.6-12.el6 and 4.9.105+ kernels.

The hypervisors are attached to the SAN network with 10G links, and there is no congestion at all.
Storage is exported via iSCSI and we use multipathd for failover.
We now see write speeds of ~200 MB/s, but only a poor 20-30 MB/s read speed on a LUN on the SAN.
This is while testing on dom0; we see the same speeds on domU.
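
(For reference, a simple sequential test along these lines is enough to show the gap; the multipath device name and sizes are just examples, not necessarily the exact commands we ran:)

    # sequential read from the LUN, bypassing the page cache
    dd if=/dev/mapper/mpatha of=/dev/null bs=1M count=4096 iflag=direct
    # sequential write, bypassing the page cache (scratch/test LUN only!)
    dd if=/dev/zero of=/dev/mapper/mpatha bs=1M count=4096 oflag=direct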

If I do the same test from a Xen 4.4.4-34.el6 hypervisor against the same LUN
(but attached with 1G), I max out the link (~100 MB/s read/write).

On the same storage network we also have a bare-metal machine, and there
we get 400-500 MB/s read/write performance on the same LUN!

So it really looks like the Xen 4.6 hypervisors are hitting some
bottleneck, but we couldn't locate it yet :)
Each hypervisor's dom0 has 8 vCPUs and 8 GB of RAM, which should be plenty!


Anything we could try to improve the speeds?

Thanks
Jean-Louis


Re: Bad iSCSI I/O performance on Xen 4.6

Jean-Louis Dupond
Maybe an important addition:

I did some iperf testing on the 10G machines (on dom0), and we almost
max out the 10G link.
So it seems the NICs are performing as they should.
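
(Something along these lines, with the SAN-side address just an example, is what I mean by maxing out the link:)

    # on a peer on the storage network
    iperf -s
    # on dom0
    iperf -c 192.168.100.10 -t 30 -P 4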


On 08-10-18 13:10, Jean-Louis Dupond wrote:

> [snip]



Re: Bad iSCSI I/O performance on Xen 4.6

Dario Faggioli-4
In reply to this post by Jean-Louis Dupond
[Adding Roger]

On Mon, 2018-10-08 at 13:10 +0200, Jean-Louis Dupond wrote:

> Hi,
>
> We are hitting some I/O limitation on some of our Xen hypervisors.
> The hypervisors are running CentOS 6 with Xen 4.6.6-12.el6 and
> 4.9.105+ kernels.
>
> The hypervisors are attached with 10G network to the SAN network. And
> there is no congestion at all.
> Storage is exported via iSCSI and we use multipathd for failover.
> Now we see a performance of +-200MB/sec write speed, but only a poor
> 20-30mb/sec read speed on a LUN on the SAN.
> This is while testing this on dom0. Same speeds on domU.
>
> If I do the same test on a Xen 4.4.4-34.el6 hypervisor to the same LUN
> (but attached with 1G), I max out the link (100MB read/write).
>
>
Right. But, if I've understood correctly, you're changing two things
between the two tests, i.e., the hypervisor and the NIC.
(BTW, is the dom0 kernel the same, or does that also change?)

This makes it harder to narrow things down to where the problem could
be.

What would be useful to see would be the results of running:
- Xen 4.4.4-34.el6, with 4.9.105+ dom0 kernel on the 10G NIC / host,
  and compare this with Xen 4.6.6-12.el6, with the same kernel on the
  same NIC / host;
- Xen 4.6.6-12.el6, with 4.9.105+ dom0 kernel on the 1G NIC / host,
  and compare this with Xen 4.4.4-34.el6, with the same kernel on the
  same NIC / host.

This will tell us whether there is a regression between Xen 4.4.x and Xen
4.6.x (as that is _the only_ thing that varies).

And this assumes the versions of the dom0 kernels, and of all the
other components involved, are the same. If they're not, we need to keep
checking, changing one component at a time.
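
(For instance, a quick way to confirm what is actually running on each host could be something like the following; just the obvious version checks, adjust package names to your setup:)

    rpm -q xen                # hypervisor package version
    uname -r                  # dom0 kernel actually booted
    iscsiadm --version        # open-iscsi initiator version
    multipath -ll             # same multipath topology on both hosts?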

> So it really looks like the Xen 4.6 hypervisors are reaching some
> bottleneck. But we couldn't locate it yet :)
>
There seem to be issues, but from the tests you've performed so far, I
don't think we can conclude that the problem is in Xen. And we need to know
at least where the problem most likely is in order to have any chance
of finding it! :-)

> The hypervisor's dom0 has 8 vCPU and 8GB RAM, which should be plenty!
>
Probably. But, just in case, have you tried increasing, e.g., the
number of dom0's vCPUs? Are things like vCPU pinning or similar features
being used? Is the host a NUMA box? (Or, more generally, what are the
characteristics of the host[s]?)
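
(To be concrete, the kind of information I mean can be gathered with, e.g.:)

    xl info           # nr_cpus, nr_nodes (NUMA or not), total_memory, ...
    xl vcpu-list 0    # dom0 vCPUs and any pinning
    xl list           # domains, their memory and vCPU counts

And increasing the number of dom0 vCPUs would typically be done via the Xen
boot command line, e.g. dom0_max_vcpus=12 (just an example value).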

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


Re: Bad iSCSI I/O performance on Xen 4.6

Jean-Louis Dupond

FYI,

After some additional debugging, I found out that on the same machine the speed is fine when running the stock CentOS 6 kernel (2.6.32).
When using a 4.9.x or 4.18.x kernel, the speed is degraded again.

Speed on 2.6.32: 320 MB/s
Speed on 4.9.x : 55.2 MB/s

But when I disable GRO on the storage NIC, it boosts to 157 MB/s.
That is already better, but still way below what we get on 2.6.32 ...
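
(For reference, this is just the standard ethtool toggle; eth2 is an example interface name:)

    ethtool -k eth2 | grep generic-receive-offload   # check the current setting
    ethtool -K eth2 gro off                          # disable GRO on the storage NIC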

I also ran tests on a plain machine without Xen, with the same results.
So it doesn't look like it's Xen related, but rather iSCSI/kernel related.

Thanks
Jean-Louis


On 11-10-18 11:18, Dario Faggioli wrote:
> [snip]

