Virtual mem map


Virtual mem map

Tristan Gingold
Hi,

I am currently thinking about the virtual mem map.
In Linux, the virtual mem map is (surprise) virtually mapped.
In Xen, we can use the same approach, or we can manually cut the mem map into
several pieces using a mechanism similar to paging.

I don't really like the first way: it uses the TLB, which may cause more
trouble, and currently Xen hardly uses the translation cache for itself.

So, I think I will use the second approach.

Am I missing an important point?
Am I making the wrong choice?
Please comment.

Tristan.



RE: Virtual mem map

Dan Magenheimer
> I am currently thinking about virtual mem map.
> In Linux, the virtual mem map is (surprise) virtually mapped.
> In Xen, we can use the same approach or we can manually cut
> the mem map into
> several pieces using a mechanism similar to paging.
>
> I don't really like the first way: it uses the TLB, which may
> cause more
> trouble, and currently Xen hardly uses the translation cache
> for itself.
>
> So, I think I will use the second approach.
>
> Am I missing an important point?
> Am I making the wrong choice?
> Please comment.
>
> Tristan.

I think you will need to explain a little bit more what you
mean by "a mechanism similar to paging" before it will
be possible to comment.  Paging, to me, means there is some
kind of swap drive or backing store to allow more "virtual"
pages than "physical" pages.

I spent a lot of time recently digging through the physical
memory management code of Linux/ia64.  It is very messy because
it has to support a wide variety of physical memory layouts.
And it makes surprising choices that throw away huge chunks of
physical memory (anything that doesn't fit conveniently in
a "granule").  Getting this all working on multiple machines
will probably be a big challenge.  It might be best to use
Linux code that is known to work on many machines.

I agree with your concern though that taking TLB misses when
looking up a page struct in Xen is likely to cause performance
problems and some difficult bugs.  It might be worthwhile to
put some counters in to see how frequently the memmap is
accessed and code some defensive bounds checks to ensure
wild accesses are immediately flagged with a BUG rather than
resulting in random memory accesses.

Dan
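
A rough illustration of the kind of instrumentation Dan suggests; every name here (struct page_info, frame_table, max_page, memmap_access_count, checked_pfn_to_page, BUG) is an assumption for the sketch, not a reference to the actual Xen/ia64 code:

struct page_info { unsigned long flags; };   /* assumed frame table entry type */

extern struct page_info *frame_table;        /* assumed: base of the memmap       */
extern unsigned long max_page;               /* assumed: number of covered frames */
extern void BUG(void);                       /* assumed: fail loudly              */

static unsigned long memmap_access_count;    /* counter: how often is the memmap hit? */

static struct page_info *checked_pfn_to_page(unsigned long pfn)
{
    memmap_access_count++;
    if (pfn >= max_page)                     /* defensive bounds check: flag wild   */
        BUG();                               /* accesses instead of reading random  */
    return frame_table + pfn;                /* memory                              */
}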


RE: Virtual mem map

Tian, Kevin
In reply to this post by Tristan Gingold
>From: Tristan Gingold
>Sent: January 6, 2006, 20:58
>Hi,
>
>I am currently thinking about virtual mem map.
>In Linux, the virtual mem map is (surprise) virtually mapped.

Surprising, but efficient and necessary. ;-)

>In Xen, we can use the same approach or we can manually cut the mem map into
>several pieces using a mechanism similar to paging.

I'm not quite catching your meaning here. In the case of a physical memmap, identity-mapped virtual addresses are used, so no special tracking structure (i.e. a page table, in Linux terms) is required. When the physical memmap is converted to a virtual memmap, you have to provide a virtually contiguous area that maps onto non-contiguous physical pages, and thus you need extra PTEs in a page table.

So in both cases you mentioned, an extra structure to track the mapping is necessary. Maybe the difference is that you propose to use another, simpler structure (like a simple array/list) instead of a multi-level page table, and then to modify all the memmap-related macros (like pfn_valid, page_to_pfn, etc.) so that they know about the holes within the memmap?

>
>I don't really like the first way: it uses the TLB, which may cause more
>trouble, and currently Xen hardly uses the translation cache for itself.
>
>So, I think I will use the second approach.

So you really need to elaborate on your second approach and spell out the exact difference.

Actually more questions come with this issue:

Do we need to add generic non-identity mapping to Xen? If yes, then the virtual memmap is just one special case that it would cover. If no, we have already seen some limitations from the lack of such a feature, which prevents managing/utilizing machine page frames efficiently.

If yes, we may need to add a multi-level page table to Xen and walk it in the page fault handler. Then, do we need a VHPT table for Xen itself, for performance? Currently all VHPT tables are only used for running guests...

When we're working on one specific issue, I hope the solution can be generic enough to cover similar future requirements.

Thanks,
Kevin


>
>Am I missing an important point?
>Am I making the wrong choice?
>Please comment.
>
>Tristan.


Re: Virtual mem map

Tristan Gingold
In reply to this post by Dan Magenheimer
On Friday, January 6, 2006 at 22:40, Magenheimer, Dan (HP Labs Fort Collins)
wrote:
> > I am currently thinking about virtual mem map.
> > In Linux, the virtual mem map is (surprise) virtually mapped.
> > In Xen, we can use the same approach or we can manually cut
> > the mem map into
> > several pieces using a mechanism similar to paging.
[...]
> I think you will need to explain a little bit more what you
> mean by "a mechanism similar to paging" before it will
> be possible to comment.  Paging, to me, means there is some
> kind of swap drive or backing store to allow more "virtual"
> pages than "physical" pages.
[I have just described it in my previous mail.]

> I spent a lot of time recently digging through the physical
> memory management code of Linux/ia64.  It is very messy because
> it has to support a wide variety of physical memory layouts.
> And it makes surprising choices that throw away huge chunks of
> physical memory (anything that doesn't fit conveniently in
> a "granule").
Yes, this is very surprising.  At least 16MB of memory is lost!  Maybe Xen
can use this memory for itself?

>  Getting this all working on multiple machines
> will probably be a big challenge.  It might be best to use
> Linux code that is known to work on many machines.
Sure.
On the other hand, Xen (almost) doesn't use ptc/itc for itself.

> I agree with your concern though that taking TLB misses when
> looking up a page struct in Xen is likely to cause performance
> problems and some difficult bugs.  It might be worthwhile to
> put some counters in to see how frequently the memmap is
> accessed and code some defensive bounds checks to ensure
> wild accesses are immediately flagged with a BUG rather than
> resulting in random memory accesses.
I agree.
Tristan.



Re: Virtual mem map

Tristan Gingold
In reply to this post by Tian, Kevin
On Monday, January 9, 2006 at 06:45, Tian, Kevin wrote:

> From: Tristan Gingold
>
> >Sent: January 6, 2006, 20:58
> >Hi,
> >
> >I am currently thinking about virtual mem map.
> >In Linux, the virtual mem map is (surprise) virtually mapped.
>
> Surprising, but efficient and necessary. ;-)
>
> >In Xen, we can use the same approach or we can manually cut the mem map
> > into several pieces using a mechanism similar to paging.
[...]
> So in both cases you mentioned, extra structure to track mapping is
> necessary. Maybe the difference is that you propose to use another simpler
> structure (like a simple array/list) instead of multi-level page table? And
> then modify all memmap related macros (like pfn_valid, page_to_pfn, etc) to
> let them know existence of holes within memmap?
Yes, here are more details on my original proposition:
The structure is a two-level lookup:
* The first access is into a table of offset/length pairs.
* The offset is an offset into the page frame table; the length is used only to
check validity.

I think this structure is simple enough to be fast.

For memory usage:
* Each entry of the first array describes 1GB of memory.  An entry is 32 bits.
  16KB for the first array can describe 2**12 * 2**30 = 2**42 B of memory.
  (Dan's machine's physical memory is below 2**40.)
* I think a 1GB granule is good enough, unless you have a machine with very
  small DIMMs.  In that case, we can use 512MB or 256MB instead of 1GB.
* 1GB is 2**16 to 2**18 pages.  Thus, the offset may be 18 bits and the length
  14 bits (to be multiplied by 4).
In conclusion, the memory footprint is *very* small, maybe too small?

The memmap-related macros must be rewritten.

Tristan.
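
To make the shape of Tristan's two-level lookup concrete, here is a rough sketch assuming 16KB pages and a static 1GB granule; the names (mem_regions, frame_table, maddr_to_page) and the flat fields are illustrative assumptions, and the exact 18/14-bit packing from the mail is ignored:

#include <stddef.h>

struct page_info { unsigned long flags; };      /* placeholder frame table entry */

struct mem_region {
    unsigned long offset;   /* index of this region's first entry in frame_table   */
    unsigned long length;   /* number of valid pages; used only to check validity  */
};

static struct mem_region mem_regions[1 << 12];  /* 4096 entries x 1GB covers 2^42 B
                                                   (the mail packs each into 32 bits) */
static struct page_info *frame_table;           /* one packed array of page structs   */

static struct page_info *maddr_to_page(unsigned long maddr)
{
    struct mem_region *r = &mem_regions[maddr >> 30];       /* which 1GB region   */
    unsigned long idx = (maddr & ((1UL << 30) - 1)) >> 14;  /* page within region */

    if (idx >= r->length)   /* hole, or address beyond the described memory */
        return NULL;        /* or BUG(), as Dan suggested */
    return frame_table + r->offset + idx;
}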



RE: Virtual mem map

Tian, Kevin
In reply to this post by Tristan Gingold
>From: Tristan Gingold [mailto:[hidden email]]
>Sent: January 9, 2006, 18:43
>
>On Monday, January 9, 2006 at 06:45, Tian, Kevin wrote:
>> From: Tristan Gingold
>>
>[...]
>> So in both cases you mentioned, extra structure to track mapping is
>> necessary. Maybe the difference is that you propose to use another simpler
>> structure (like a simple array/list) instead of multi-level page table? And
>> then modify all memmap related macros (like pfn_valid, page_to_pfn, etc) to
>> let them know existence of holes within memmap?
>Yes, here are more details on my original propositions:
>The structure is a 2-levels access:
>* The first access is an access to a table of offsets/length.
>* The offset is an offset to the page frame table, length is used only to
>check validity.
>
>I think this structure is simple enough to be fast.
>
>For memory usage:
>* Each entry of the first array describes 1GB of memory.  An entry is 32 bits.
>  16KB for the first array can describe 2**12 * 2**30 = 2**42 B of memory.
>  (Dan's machine's physical memory is below 2**40.)
>* I think 1GB of granule is good enough, unless you have a machine with very
>  small DIMM.  In this case, we can use 512MB or 256MB instead of 1GB.
>* 1GB is 2**16 to 2**18 pages.  Thus, the offset may be 18 bits and the length
>  14 bits (to be multiplied by 4).
>As a conclusion, the memory footprint is *very* small, maybe too small ?
>
>memmap related macros must be rewritten.
>
>Tristan.

Hi, Tristan,
        I think it's worth a try, to see whether there is any noticeable performance degradation, since this is a quick approach to get the virtual memmap working.

        Just a question: how do you define the granularity, statically or dynamically? I'm not sure whether all types of ia64 boxes have well-aligned memory chunks. Take a quick example from the layout posted by Dan:

(XEN) Init boot pages: 0x407df42a54 -> 0x407efe0008.
(XEN) Init boot pages: 0x407efe0068 -> 0x407efe3f82.
(XEN) Init boot pages: 0x407efe3fca -> 0x407effc008.
(XEN) Init boot pages: 0x407effc7e8 -> 0x407fd68000.
(XEN) Init boot pages: 0x407fda4000 -> 0x407fe10000.
(XEN) Init boot pages: 0x407fe80000 -> 0x407ffbc000.

        You can see that all six chunks above fall within a 1GB, and even a 256MB, range. How do you manage to include them in your offset table, then? I'm using this example just to emphasize the importance of generality. Since we are starting to tackle this issue, it is better to aim for a good solution that can live for a long time. Once you compile a Xen image, customers definitely want it to run on as many boxes as possible without recompilation.

        A simple array always performs better than more complex structures like multi-level page tables; however, the former runs into flexibility problems, which is why the latter exists.

        Another alternative is to use a multi-level structure but without virtual mapping, which looks like the current p2m implementation within Xen. However, yes, that will bring more overhead when walking the structure...

Thanks,
Kevin
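
As a side note on the addresses quoted above: with a static 1GB granule, the first-level index is just the machine address shifted right by 30 bits, and all six ranges land in the same slot, so holes inside the slot have to be handled by the length/validity check rather than by the table itself. A quick, purely illustrative check:

#include <stdio.h>

int main(void)
{
    /* Start addresses of the six "Init boot pages" ranges quoted above. */
    unsigned long long starts[] = {
        0x407df42a54ULL, 0x407efe0068ULL, 0x407efe3fcaULL,
        0x407effc7e8ULL, 0x407fda4000ULL, 0x407fe80000ULL,
    };

    for (int i = 0; i < 6; i++)
        printf("%#llx -> 1GB slot %llu\n", starts[i], starts[i] >> 30);

    /* All six print slot 257, i.e. they share a single first-level entry. */
    return 0;
}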
       


Re: Virtual mem map

Tristan Gingold
On Monday, January 9, 2006 at 12:15, Tian, Kevin wrote:

> From: Tristan Gingold [mailto:[hidden email]]
>
> >Sent: January 9, 2006, 18:43
> >
> >On Monday, January 9, 2006 at 06:45, Tian, Kevin wrote:
> >> From: Tristan Gingold
> >
> >[...]
> >
> >> So in both cases you mentioned, extra structure to track mapping is
> >> necessary. Maybe the difference is that you propose to use another
> >> simpler structure (like a simple array/list) instead of multi-level page
> >> table? And then modify all memmap related macros (like pfn_valid,
> >> page_to_pfn, etc) to let them know existence of holes within memmap?
> >
> >Yes, here are more details on my original propositions:
> >The structure is a 2-levels access:
> >* The first access is an access to a table of offsets/length.
> >* The offset is an offset to the page frame table, length is used only to
> >check validity.
> >
> >I think this structure is simple enough to be fast.
> >
> >For memory usage:
> >* Each entry of the first array describes 1GB of memory.  An entry is 32
> > bits. 16KB for the first array can describe 2**12 * 2**30 = 2**42 B of
> > memory. (Dan's machine's physical memory is below 2**40.)
> >* I think 1GB of granule is good enough, unless you have a machine with
> > very small DIMM.  In this case, we can use 512MB or 256MB instead of 1GB.
> > * 1GB is 2**16 to 2**18 pages.  Thus, the offset may be 18 bits and the
> > length 14 bits (to be multiplied by 4).
> >As a conclusion, the memory footprint is *very* small, maybe too small ?
> >
> >memmap related macros must be rewritten.
> >
> >Tristan.
>
> Hi, Tristan,
> I think it's worthy of a try to see any explicit performance degradation
> there, since this is a quick approach to make virtual memmap working.
>
> Just a question, how do you define the granularity, static or dynamic?
At first, static.  The only advantage of dynamic granularity is to save
memory.

> I'm
> not sure whether all types of ia64 boxes have a well aligned memory trunks.
> Take a quick example from layout posted by Dan:
>
> (XEN) Init boot pages: 0x407df42a54 -> 0x407efe0008.
> (XEN) Init boot pages: 0x407efe0068 -> 0x407efe3f82.
> (XEN) Init boot pages: 0x407efe3fca -> 0x407effc008.
> (XEN) Init boot pages: 0x407effc7e8 -> 0x407fd68000.
> (XEN) Init boot pages: 0x407fda4000 -> 0x407fe10000.
> (XEN) Init boot pages: 0x407fe80000 -> 0x407ffbc000.
>
> You could see all above 6 trunks within 1G and even 256Mb range. How do
> you manage to include them in your offset table then?
Such holes are due to EFI or to reserved areas such as bootparams.  These holes
may be essentially random.
By construction these holes are not used.  We could check this by adding a
bit to the page structure.

Remember we need to have:
* a page entry for every physical page,
* quick access to such an entry.
The non-virtual mem map is very good on both points, but it can waste a lot
of memory.  I think my approach is better for memory usage and only slightly
slower.

> I'm taking this
> example to just emphasize importance of the generality. Since we start to
> shoot this issue, better for a good solution which can live long time. Once
> you compile out a xen image, customer definitely want it to run on as many
> boxes as possible without re-compilation.
Of course, I agree with you.

> Simple array always has better performance than more complex structures
> like multi-level page tables, however the former then faces with
> flexibility issue and thus born the latter.
>
> Other alternative like to use multi-level structure but without virtual
> mapping, which looks like current p2m implementation within Xen. However
> yes, that will bring more overhead when walking the structure...
>
> Thanks,
> Kevin
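
A minimal illustration of the "bit in the page structure" idea Tristan mentions above; the flag name and helper are made up for the sketch:

struct page_info { unsigned long flags; };

#define PGF_HOLE  (1UL << 0)   /* assumed flag: page lies in an EFI/bootparam hole */

static int page_is_hole(const struct page_info *pg)
{
    return (pg->flags & PGF_HOLE) != 0;
}

/* At boot, pages that the offset table covers but that are not usable RAM
 * would get PGF_HOLE set, so later accesses can be sanity-checked cheaply. */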



Re: Virtual mem map

Tristan Gingold
[...]

> > >I think this structure is simple enough to be fast.
> > >
> > >For memory usage:
> > >* Each entry of the first array describes 1GB of memory.  An entry is 32
> > > bits. 16KB for the first array can describe 2**12 * 2**30 = 2**42 B of
> > > memory. (Dan's machine's physical memory is below 2**40.)
> > >* I think 1GB of granule is good enough, unless you have a machine with
> > > very small DIMM.  In this case, we can use 512MB or 256MB instead of
> > > 1GB. * 1GB is 2**16 to 2**18 pages.  Thus, the offset may be 18 bits
> > > and the length 14 bits (to be multiplied by 4).
> > >As a conclusion, the memory footprint is *very* small, maybe too small ?
> > >
> > >memmap related macros must be rewritten.
> > >
> > >Tristan.
Ugh, I forgot we also need reverse mapping.  I now understand the reason for
the virtual mem map!

Tristan.



RE: Virtual mem map

Tian, Kevin
In reply to this post by Tristan Gingold
>From: Tristan Gingold [mailto:[hidden email]]
>Sent: January 9, 2006, 22:18
>
>Remember we need to have:
>* a page entry for every physical page
>* a quick access to such entry.
>The non-virtual mem map is very good on both points, but it can waste a lot
>of memory.  I think my approach is better for memory usage and only slightly
>slower.
>

All of the above is about pfn_to_page. Another thing you need to think about is the reverse lookup: given a page struct, how do you derive its page frame number (page_to_pfn)? Both the virtually mapped memmap and the physical memmap preserve the nice property that the layout stays linear at either the virtual or the physical level, so a simple subtraction is enough. In your proposal, you would have to walk your offset table to find the index, which is inefficient; or at least you would need to steal some bits in the frame table entry to record the index into the offset table.

Thanks,
Kevin
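
A sketch of Kevin's second option: stealing a field in each frame table entry to record the first-level index, so page_to_pfn stays a constant-time calculation. The struct layout, the region field and all names are assumptions, reusing the mem_regions/frame_table idea from the earlier sketch:

struct mem_region {
    unsigned long offset;        /* first frame_table index of this 1GB region */
    unsigned long length;        /* number of valid pages in the region        */
};

struct page_info {
    unsigned long flags;
    unsigned int  region;        /* assumed field: which 1GB region this page is in */
};

extern struct mem_region mem_regions[];
extern struct page_info  frame_table[];

/* With a 1GB granule and 16KB pages there are 2^16 pages per region, so:
 *   pfn = region * 2^16 + (index of the page within its region).          */
static unsigned long page_to_pfn_sketch(const struct page_info *pg)
{
    unsigned long idx = (unsigned long)(pg - frame_table)
                        - mem_regions[pg->region].offset;
    return ((unsigned long)pg->region << 16) + idx;
}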


RE: Virtual mem map

Tian, Kevin
In reply to this post by Tristan Gingold
>From: Tristan Gingold [mailto:[hidden email]]
>Sent: January 10, 2006, 0:35
>
>[...]
>> > >I think this structure is simple enough to be fast.
>> > >
>> > >For memory usage:
>> > >* Each entry of the first array describes 1GB of memory.  An entry is 32
>> > > bits. 16KB for the first array can describe 2**12 * 2**30 = 2**42 B of
>> > > memory. (Dan's machine's physical memory is below 2**40.)
>> > >* I think 1GB of granule is good enough, unless you have a machine with
>> > > very small DIMM.  In this case, we can use 512MB or 256MB instead of
>> > > 1GB. * 1GB is 2**16 to 2**18 pages.  Thus, the offset may be 18 bits
>> > > and the length 14 bits (to be multiplied by 4).
>> > >As a conclusion, the memory footprint is *very* small, maybe too small ?
>> > >
>> > >memmap related macros must be rewritten.
>> > >
>> > >Tristan.
>Ugh, I forgot we also need reverse mapping.  I now understand the reason for
>the virtual mem map!
>
>Tristan.

Yes, exactly.

Thanks,
Kevin


Re: Virtual mem map

Alex Williamson
In reply to this post by Tristan Gingold
On Mon, 2006-01-09 at 11:49 +0100, Tristan Gingold wrote:

> On Friday, January 6, 2006 at 22:40, Magenheimer, Dan (HP Labs Fort Collins)
> wrote:
> > I spent a lot of time recently digging through the physical
> > memory management code of Linux/ia64.  It is very messy because
> > it has to support a wide variety of physical memory layouts.
> > And it makes surprising choices that throw away huge chunks of
> > physical memory (anything that doesn't fit conveniently in
> > a "granule").
> Yes, this is very surprising.  At least 16MB of memory is lost!  Maybe Xen
> can use this memory for itself?

   Careful, there's a reason for the madness.  We throw away those big
chunks of memory to avoid address aliasing.  We can't support granules
that contain both cacheable and uncacheable memory regions.  You'll get
into trouble if you maintain granule-size mappings but ignore their
access attributes.  The first granule is usually thrown away because VGA
has some uncacheable memory regions below 1MB.

        Alex

--
Alex Williamson                             HP Linux & Open Source Lab



RE: Virtual mem map

Dan Magenheimer
In reply to this post by Tristan Gingold
> When we're working around one specific issue, I hope the
> solution can be more generic to cover future similar requirement.

Sorry for the delayed response... I'm still catching up
on my email backlog.

An important question to consider if we are looking for a more
generic solution is whether we should go to a 4K pagesize.
Currently, Xen/ia64 cannot support a guest that uses 4K pages.

I see that the common/page_alloc.c code may need to be reworked to handle
non-contiguous physical memory.  There appear to be a number of places
that assume pg+i always points to a valid page as long as i < nr_pages.
We may be able to get around that by extending the concept of zones, but
that will require some rework in page_alloc.c too.

I can't resist pointing out that it might have been a good
idea to retain the flexibility that was in the original Xen/ia64
allocator, rather than move to the oversimplified Xen/x86 approach.
(There may be a lesson here that could be applied for the p2m issue?)
I know Kevin remembers this discussion, but for those of you
who might want to read some ancient history on this topic, see
the thread "[Xen-devel] A tale of three memory allocators" at:
http://lists.xensource.com/archives/html/xen-devel/2005-03/msg00815.html

Dan
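
To illustrate the pg+i pattern Dan refers to, here is a schematic before/after; page_is_valid() and init_page() are hypothetical stand-ins, not actual page_alloc.c code:

struct page_info { unsigned long flags; };

extern int  page_is_valid(const struct page_info *pg);  /* hypothetical hole check    */
extern void init_page(struct page_info *pg);            /* hypothetical per-page work */

static void walk_range(struct page_info *pg, unsigned long nr_pages)
{
    unsigned long i;

    /* Today's code effectively assumes pg+i is a valid page for any i < nr_pages.
     * With non-contiguous physical memory, every such loop needs a hole check
     * (or the range must be split at the holes before it gets here): */
    for (i = 0; i < nr_pages; i++) {
        if (!page_is_valid(pg + i))
            continue;
        init_page(pg + i);
    }
}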
