Give dom0 2 pinned vcpus, but share one with domU


Give dom0 2 pinned vcpus, but share one with domU

jumperalex
Is it possible to give dom0 two vcpus, pinned, but also allow one of those to be shared with a domU?

So right now I'm set up like this:
dom0: dom0_max_vcpus=1 dom0_vcpus_pin
archVM: vcpu="1-7"

In most of my use cases this works fine, but in a few (rsync specifically) my dom0 is just getting hammered, and that is slowing down a local rsync between the domU data.img and the dom0 RAID array.

Someone else suggested turning off the MD5 check in rsync, but I'm not thrilled about turning off hashing for a backup, and I want to see if I can solve the problem with good provisioning first.

So what if I did this:
dom0: dom0_max_vcpus=2 dom0_vcpus_pin
archVM: vcpu = "1-7"

So I'm giving dom0 two vcpus, which will be pinned to cores 0 and 1, while still allowing archVM access to core 1 but not core 0.
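
For reference, in xl terms I think that works out to roughly this (just a sketch; I'm assuming the xl toolstack and a GRUB2-style dom0 here, and archVM's real config on unRAID may look a bit different):

    # Xen boot options for dom0 (e.g. on GRUB_CMDLINE_XEN_DEFAULT, or the
    # equivalent xen append line for other bootloaders), then reboot:
    dom0_max_vcpus=2 dom0_vcpus_pin

    # archVM guest config: 7 vcpus, allowed on pcpus 1-7 only
    vcpus = 7
    cpus  = "1-7"

    # after boot, confirm that only pcpu 1 is shared between the two
    xl vcpu-list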

Will this even work?

How will Xen deal with dividing up core1's cycles?

If there is contention for core 1, will that cause a lot of context switching between cores 0 and 1 within dom0?  Or will core 1 just be shared ~50/50 while core 0 does dom0's heavy lifting?

I know I could just give archVM 6 cores instead of 7, but I really want it to have access to as many cores as possible for heavy transcoding loads, and I also know I need to dedicate one to dom0, so this is my compromise solution.

Re: Give dom0 2 pinned vcpus, but share one with domU

powerhouse64
Not sure this is the correct answer, but how about NOT using the dom0_max_vcpus and dom0_vcpus_pin options? I've done some tests running heavy domU and dom0 loads, and without CPU pinning or max_vcpus the dom0 would take whatever CPU resources it could get. In my test case I used a Windows 7 domU running Lightroom raw photo conversion to JPEGs while using HandBrake to transcode videos under dom0. My CPU has 12 logical CPUs (6-core Intel), and I gave the domU 10 vcpus. Whenever the domU didn't use its allocated vcpus, the dom0 would be able to utilise them.
There is an option to adjust the credit scheduler - see http://wiki.xen.org/wiki/Credit_Scheduler. More on Xen tuning can be found at http://wiki.xenproject.org/wiki/Tuning_Xen_for_Performance. See also http://wiki.xen.org/wiki/Performance_of_Xen_VCPU_Scheduling.

Re: Give dom0 2 pinned vcpus, but share one with domU

Ian Campbell-10
In reply to this post by jumperalex
On Mon, 2014-05-19 at 09:56 -0700, jumperalex wrote:

> Is it possible to give dom0 two vcpus, pinned, but also allow one of those to
> be shared with a domU?
>
> So right now I'm setup like this:
> dom0: dom0_max_vcpus=1 dom0_vcpus_pin
> archVM: vcpu="1-7"
>
> In most of my use cases this works fine, but in a few (rsync specifically)
> my dom0 is just getting hammered and it is slowing a local rsync between
> domU data.img and dom0 raid array.
>
> Someone else suggested turning off the md5 check in rsync but I'm not
> thrilled about turning off hashing of a backup and I want to see if I can
> solve the problem with good provisioning first.
>
> So what if I did this:
> dom0: dom0_max_vcpus=*2* dom0_vcpus_pin
> archVM: vcpu = "1-7"
>
> So I'm giving dom0 2 vcpus which will be pinned to cores 0,1 but I'm still
> allowing archVM access to core 1 but not core 0
>
> Will this even work?

Yes, it is perfectly fine to have overlapping sets of pins.

Note though that, depending on your workload, pinning (especially dom0)
might be actively harmful. Is there some reason you want to pin rather
than letting dom0's vcpus float?
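
You can also change affinities at runtime with xl vcpu-pin, which makes
it easy to experiment without rebooting, e.g. (domain names as in your
setup):

    # pin dom0's two vcpus to pcpus 0 and 1
    xl vcpu-pin Domain-0 0 0
    xl vcpu-pin Domain-0 1 1
    # let archVM's vcpus run anywhere on pcpus 1-7
    xl vcpu-pin archVM all 1-7
    # or drop the pins entirely and let everything float
    xl vcpu-pin Domain-0 all all
    xl vcpu-pin archVM all all
    # then check the result
    xl vcpu-list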

> How will Xen deal with dividing up core1's cycles?

It will schedule the vcpus according to their affinity (pin), workload
etc.; it's pretty much the normal scheduling, just with an additional
constraint.

> If there is contention for core1 will that cause a lot of context switching
> between cores 0 and 1 within dom0?  Or will core 1 just be shared ~50/50
> while core 0 does dom0's heavy lifting?

If both vcpus are busy then they will get approximately a 50/50 fair
share. If one is busy and the other is mostly idle then the split will
reflect that.

>
> I know I could just give archVM 6 cores instead of 7, but I really want it
> to have access to as many cores as possible for heavy transcoding loads but
> I also know I need to dedicate one to dom0 so this is my compromise
> solution.

Ian.



Re: Give dom0 2 pinned vcpus, but share one with domU

jumperalex
In reply to this post by powerhouse64
How about NOT pinning / Why am I pinning?
In short because of this http://wiki.xen.org/wiki/Xen_Project_Best_Practices#Dedicating_a_CPU_core.28s.29_only_for_dom0 I'm just doing what I'm told :O  But I'm obviously open to suggestion.

Now I can't claim my dom0 is doing HEAVY I/O, but it is hosting my unRAID array, so any VM (one at this point, running Plex Media Server) will be pulling 1080p video streams from it to transcode (fulfilling the heavy domU workload bit) and then sending them back out to the clients on the network.  Soon I too plan on running some HandBrake jobs, which will probably have my server screaming for several days straight and then about twice a week after that.  Those could be scheduled during times of day when there won't likely be user interaction, but that will just prolong the overall job of converting my whole library. At the same time it is possible, though rare due to scheduling, that I could be hitting the array with two backup streams coming from PCs running Acronis.

That is not quite the worst-case scenario, but it is the most likely one. I could throw in a few other processes that do occur and are also pretty I/O heavy, but those are really unlikely to overlap, or they will happen when no one is around to see it.  And the two main culprits, my CPU-heavy rsync and Plex transcoding, literally couldn't have happened at the same time, because the VM gets paused to run the rsync copy of the VM image :)  As you'll see below, though, I've also solved the CPU-hogging rsync.

All that said, I'm fully willing to admit I'm probably spending 95% of my time chasing the last 5% of performance, but I like at least poking around to make sure I haven't left something huge ripe for the taking.

> There is an option to adjust the credit scheduler - see
> http://wiki.xen.org/wiki/Credit_Scheduler. More on Xen tuning can be found
> http://wiki.xenproject.org/wiki/Tuning_Xen_for_Performance. See also
> http://wiki.xen.org/wiki/Performance_of_Xen_VCPU_Scheduling.

Thanks. I will definitely take a look.  If done right that seems like an even more elegant solution.

> Note though that depending on your workload pinning (especially dom0)
> might be actively harmful. Is there some reason you want to pin rather
> than letting dom0's vcpus float?
Well, I know my dom0 workload is generally pretty light from a CPU perspective.  Even a single core from an FX-8320 would generally be considered overkill for just handling the day-to-day of an unRAID array.  What even brought this up was an rsync to back up my domU.img onto the dom0 array, which was just crushing my dom0 CPU and choking off the rsync.  BUT ... I found the main issue, which was the use of -z for compression in rsync between local folders.  Once I turned that off, CPU usage dropped and speed took off.  So I've solved my current problem via efficiency vs. brute force (my preferred way), but it still has me thinking it might not be a bad idea to let dom0 have the option of a little bit more.

I did try it out last night while watching xl vcpu-list, xl top, and htop in both doms.  I ran rsync with -z and noticed an improvement, which didn't surprise me. Then I ran a transcode.  It is hard to confirm performance improvements there when you're just going from 6 CPUs to 7, so I was mostly just looking to see that seven distinct pCPUs were being used. At first I wasn't sure I was really seeing pcpu 1 being shared like you said, but after I looked at the screenshots later in the evening I convinced myself it was maybe working as hoped.  Then I woke up to your post.  So I'll probably change it back again and observe some more while I also read up on credit scheduling.

Thank you for indulging me.  Cheers.

Re: Give dom0 2 pinned vcpus, but share one with domU

powerhouse64
rsync with compression will use up more CPU resources. If you do a backup between disks on the same PC, I would probably not use compression, as the "network" speed on a bridged network between domU and dom0 should be in the 10 Gbit/s range.

As an example, I do use compression for creating backup images of my domU. Inside dom0, after making an LVM snapshot, I use "pigz" like this:
dd if=/dev/$group/$guest-snap bs=1024k | pigz -c > "$mt"/$group-$guest.img.gz
to create a compressed image file of the VM.
pigz uses all available VCPUs for compression. Since I'm not limiting the number of vcpus on my dom0, and my domU isn't running during the backup, it works very well (~7min for a 70GB domU volume).
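
For completeness, the snapshot step around that dd looks roughly like this (just a sketch; the 5G snapshot size is an example, adjust to taste):

    # snapshot the guest's LV so the copy is taken from a stable block device
    lvcreate -s -n $guest-snap -L 5G /dev/$group/$guest
    dd if=/dev/$group/$guest-snap bs=1024k | pigz -c > "$mt"/$group-$guest.img.gz
    # drop the snapshot once the image has been written
    lvremove -f /dev/$group/$guest-snap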

By not pinning down or limiting dom0, Xen is able to allocate whatever CPU resources it needs between dom0 and domU. If you run into a situation where your domU CPU load is maxed out while at the same time you have I/O-intensive and perhaps CPU-intensive dom0 loads, you could try increasing the scheduler weight (priority) for dom0. But it would probably take some experimenting to see what difference it makes, if any.
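
With the default credit scheduler that would be something along these lines (a sketch; 512 is just an example, the default weight is 256):

    # show the current weight/cap for each domain
    xl sched-credit
    # give dom0 twice the default weight so it wins when there is contention
    xl sched-credit -d Domain-0 -w 512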

Re: Give dom0 2 pinned vcpus, but share one with domU

Ian Campbell-10
In reply to this post by jumperalex
On Tue, 2014-05-20 at 05:22 -0700, jumperalex wrote:
> > How about NOT pinning / Why am I pinning?
>
> In short because of this
> http://wiki.xen.org/wiki/Xen_Project_Best_Practices#Dedicating_a_CPU_core.28s.29_only_for_dom0
> I'm just doing what I'm told :O  But I'm obviously open to suggestion.

This is mentioned in http://wiki.xen.org/wiki/Tuning#Dom0_VCPUs and
http://wiki.xen.org/wiki/Tuning#Vcpu_Pinning too. It does say "might"
and "can"; perhaps even those are a bit strong. Pinning is one tool in
the performance tuning arsenal, but whether it helps or hurts is very
workload dependent (and it can be a lot in either direction).

I've made a note of this on
http://wiki.xen.org/wiki/Xen_Document_Days/TODO . Hopefully someone who
knows this tuning stuff better than I will improve things at some point.

Ian.


Re: Give dom0 2 pinned vcpus, but share one with domU

jumperalex
Wow, I made the TODO list in my first thread :O  I'm not sure if that is an honor or a scarlet letter?

Anyway, thanks for the tips.  So apparently we (aka unRAID users) have had some problems with CPU stalling when not pinned.  But we are also on kernel 3.10.24 (with an update to 3.14 coming soon) and Xen 4.3 (4.4 also coming soon), so I won't go asking for help with that specifically.  But it is another reason I had chosen to pin early on.

In any case, I'm probably going to just open all doms to all vcpus, do some testing, and see what happens.  If I start seeing CPU stalls I'll see if playing with the credit scheduler helps.

Re: rsync ... it looks like "-asP --inplace" is giving us the best performance, which probably isn't new.  Thanks also for addressing that.
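
For anyone who finds this later, that works out to something like this for the local image copy (the paths here are just made up for illustration):

    rsync -asP --inplace /mnt/cache/domains/archVM/data.img /mnt/user/backups/archVM/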
