[PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).


Konrad Rzeszutek Wilk
Changelog v5 [since v4]:
 - used populate_physmap, fixed bugs.
[v2-v4: not posted]
 - reworked the code in setup.c to work properly.
[v1: https://lkml.org/lkml/2012/3/30/492]
 - initial patchset


The patches that did not change between the first posting and this one are:

      xen/p2m: Move code around to allow for better re-usage.
      xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
      xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
      xen/p2m: An early bootup variant of set_phys_to_machine

so if you have read them before, you can skip them.

The purpose of these patches is three-fold:
 1) Fix the case where dom0_mem=X is not used and the machine boots with less
    memory than it should. A subsequent 'xl mem-set 0 X' is then required to balloon
    up to the right amount. Here is an example of what it looks like with an older
    kernel and with a pvops kernel (with and without these patches):

        2.6.32 SLES:
        Memory: 7538688k/8079432k available (3971k kernel code, 8192k absent, 532300k reserved, 2491k data, 348k init)
        MemTotal:        8063140 kB
        MemFree:         7421504 kB
        DirectMap4k:     8071240 kB
        Domain-0                                     0  7873     4     r-----     20.3

        3.3:
        Memory: 6486452k/9208688k available (5825k kernel code, 1136060k absent, 1586176k reserved, 2890k data, 692k init)
        MemTotal:        6716156 kB
        MemFree:         6365696 kB
        DirectMap4k:     8078192 kB
        Domain-0                                     0  6774     4     r-----     26.0

        3.3+patches:
        Memory: 7621460k/9208688k available (5817k kernel code, 1136060k absent, 451168k reserved, 2899k data, 692k init)
        MemTotal:        7849924 kB
        MemFree:         7500748 kB
        DirectMap4k:     8078192 kB
        Domain-0                                     0  7883     4     r-----     11.9

    This is implemented in:
    [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries

 2) Fix the case where dom0_mem=XXXG is used (so the 'max' parameter is not
     involved). The bug is that we would ignore the X value:

        With dom0_mem=1G
        2.6.32 SLES:
        Memory: 650240k/1056768k available (3971k kernel code, 8192k absent, 397852k reserved, 2491k data, 348k init)
        3.3:
        Memory: 610884k/9435136k available (5817k kernel code, 1136060k absent, 7688192k reserved, 2899k data, 696k init)
        3.3+patches:
        Memory: 724184k/1053064k available (5817k kernel code, 4552k absent, 324328k reserved, 2899k data, 696k init)

   This is implemented in:
   [PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.


Note that the combination of 'dom0_mem=XXXG,max:YYYG' works correctly, so there is no
need to fix that case:

        With dom0_mem=1G,max:3G

        2.6.32 SLES:
        Memory: 650240k/1056768k available (3971k kernel code, 8192k absent, 397852k reserved, 2491k data, 348k init)
        3.3:
        Memory: 690324k/4281724k available (5826k kernel code, 1136060k absent, 2455340k reserved, 2891k data, 692k init)
        3.3+patches:
        Memory: 690960k/4281724k available (5824k kernel code, 1136060k absent, 2454704k reserved, 2893k data, 692k init)


Konrad Rzeszutek Wilk (8):
      xen/p2m: Move code around to allow for better re-usage.
      xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
      xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
      xen/p2m: An early bootup variant of set_phys_to_machine
      xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
      xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.
      xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
      xen/setup: Combine the two hypercall functions - since they are quite similar.

 arch/x86/include/asm/xen/page.h |    1 +
 arch/x86/xen/p2m.c              |  104 ++++++++++++++++-----------
 arch/x86/xen/setup.c            |  152 +++++++++++++++++++++++++++++++++------
 3 files changed, 193 insertions(+), 64 deletions(-)

_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xen.org/xen-devel

[PATCH 1/8] xen/p2m: Move code around to allow for better re-usage.

Konrad Rzeszutek Wilk
We are going to use the early_alloc_p2m (and
early_alloc_p2m_middle) code in follow-up patches which
are not related to setting identity pages.

Hence let's move the code out into its own functions and
rename them as appropriate.

Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
---
 arch/x86/xen/p2m.c |   62 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 1b267e7..3cc3afe 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -499,7 +499,7 @@ static bool alloc_p2m(unsigned long pfn)
  return true;
 }
 
-static bool __init __early_alloc_p2m(unsigned long pfn)
+static bool __init early_alloc_p2m_middle(unsigned long pfn)
 {
  unsigned topidx, mididx, idx;
 
@@ -541,6 +541,36 @@ static bool __init __early_alloc_p2m(unsigned long pfn)
  }
  return idx != 0;
 }
+
+static bool __init early_alloc_p2m(unsigned long pfn)
+{
+ unsigned topidx = p2m_top_index(pfn);
+ unsigned long *mid_mfn_p;
+ unsigned long **mid;
+
+ mid = p2m_top[topidx];
+ mid_mfn_p = p2m_top_mfn_p[topidx];
+ if (mid == p2m_mid_missing) {
+ mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
+
+ p2m_mid_init(mid);
+
+ p2m_top[topidx] = mid;
+
+ BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
+ }
+ /* And the save/restore P2M tables.. */
+ if (mid_mfn_p == p2m_mid_missing_mfn) {
+ mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
+ p2m_mid_mfn_init(mid_mfn_p);
+
+ p2m_top_mfn_p[topidx] = mid_mfn_p;
+ p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
+ /* Note: we don't set mid_mfn_p[midix] here,
+ * look in early_alloc_p2m_middle */
+ }
+ return true;
+}
 unsigned long __init set_phys_range_identity(unsigned long pfn_s,
       unsigned long pfn_e)
 {
@@ -559,35 +589,11 @@ unsigned long __init set_phys_range_identity(unsigned long pfn_s,
  pfn < ALIGN(pfn_e, (P2M_MID_PER_PAGE * P2M_PER_PAGE));
  pfn += P2M_MID_PER_PAGE * P2M_PER_PAGE)
  {
- unsigned topidx = p2m_top_index(pfn);
- unsigned long *mid_mfn_p;
- unsigned long **mid;
-
- mid = p2m_top[topidx];
- mid_mfn_p = p2m_top_mfn_p[topidx];
- if (mid == p2m_mid_missing) {
- mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
-
- p2m_mid_init(mid);
-
- p2m_top[topidx] = mid;
-
- BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
- }
- /* And the save/restore P2M tables.. */
- if (mid_mfn_p == p2m_mid_missing_mfn) {
- mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
- p2m_mid_mfn_init(mid_mfn_p);
-
- p2m_top_mfn_p[topidx] = mid_mfn_p;
- p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
- /* Note: we don't set mid_mfn_p[midix] here,
- * look in __early_alloc_p2m */
- }
+ WARN_ON(!early_alloc_p2m(pfn));
  }
 
- __early_alloc_p2m(pfn_s);
- __early_alloc_p2m(pfn_e);
+ early_alloc_p2m_middle(pfn_s);
+ early_alloc_p2m_middle(pfn_e);
 
  for (pfn = pfn_s; pfn < pfn_e; pfn++)
  if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
--
1.7.7.5



[PATCH 2/8] xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument

Konrad Rzeszutek Wilk
For identity cases we want to call reserve_brk only on the boundary
conditions of the middle P2M (so P2M[x][y][0] = extend_brk). This is
to work around identity regions (PCI spaces, gaps in E820) which are not
aligned on 2MB boundaries.

However, for the case where we want to allocate P2M middle leafs at the
early bootup stage, regardless of this alignment check, we need some
means of doing that. For that we provide a new argument.

Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
---
 arch/x86/xen/p2m.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 3cc3afe..8b3a395 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -499,7 +499,7 @@ static bool alloc_p2m(unsigned long pfn)
  return true;
 }
 
-static bool __init early_alloc_p2m_middle(unsigned long pfn)
+static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary)
 {
  unsigned topidx, mididx, idx;
 
@@ -508,7 +508,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
  idx = p2m_index(pfn);
 
  /* Pfff.. No boundary cross-over, lets get out. */
- if (!idx)
+ if (!idx && check_boundary)
  return false;
 
  WARN(p2m_top[topidx][mididx] == p2m_identity,
@@ -531,7 +531,7 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn)
  p2m_top[topidx][mididx] = p2m;
 
  /* For save/restore we need to MFN of the P2M saved */
-
+
  mid_mfn_p = p2m_top_mfn_p[topidx];
  WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
  "P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
@@ -592,8 +592,8 @@ unsigned long __init set_phys_range_identity(unsigned long pfn_s,
  WARN_ON(!early_alloc_p2m(pfn));
  }
 
- early_alloc_p2m_middle(pfn_s);
- early_alloc_p2m_middle(pfn_e);
+ early_alloc_p2m_middle(pfn_s, true);
+ early_alloc_p2m_middle(pfn_e, true);
 
  for (pfn = pfn_s; pfn < pfn_e; pfn++)
  if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
--
1.7.7.5



[PATCH 3/8] xen/p2m: Collapse early_alloc_p2m_middle redundant checks.

Konrad Rzeszutek Wilk
At the start of the function we were bailing out if idx == 0,
and later calling extend_brk only if idx != 0.

The second check is therefore redundant, so remove it.

Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
---
 arch/x86/xen/p2m.c |   25 ++++++++++++-------------
 1 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 8b3a395..952edef 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -502,6 +502,8 @@ static bool alloc_p2m(unsigned long pfn)
 static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary)
 {
  unsigned topidx, mididx, idx;
+ unsigned long *p2m;
+ unsigned long *mid_mfn_p;
 
  topidx = p2m_top_index(pfn);
  mididx = p2m_mid_index(pfn);
@@ -522,24 +524,21 @@ static bool __init early_alloc_p2m_middle(unsigned long pfn, bool check_boundary
  return false;
 
  /* Boundary cross-over for the edges: */
- if (idx) {
- unsigned long *p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
- unsigned long *mid_mfn_p;
+ p2m = extend_brk(PAGE_SIZE, PAGE_SIZE);
 
- p2m_init(p2m);
+ p2m_init(p2m);
 
- p2m_top[topidx][mididx] = p2m;
+ p2m_top[topidx][mididx] = p2m;
 
- /* For save/restore we need to MFN of the P2M saved */
+ /* For save/restore we need to MFN of the P2M saved */
 
- mid_mfn_p = p2m_top_mfn_p[topidx];
- WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
- "P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
- topidx, mididx);
- mid_mfn_p[mididx] = virt_to_mfn(p2m);
+ mid_mfn_p = p2m_top_mfn_p[topidx];
+ WARN(mid_mfn_p[mididx] != virt_to_mfn(p2m_missing),
+ "P2M_TOP_P[%d][%d] != MFN of p2m_missing!\n",
+ topidx, mididx);
+ mid_mfn_p[mididx] = virt_to_mfn(p2m);
 
- }
- return idx != 0;
+ return true;
 }
 
 static bool __init early_alloc_p2m(unsigned long pfn)
--
1.7.7.5



[PATCH 4/8] xen/p2m: An early bootup variant of set_phys_to_machine

Konrad Rzeszutek Wilk
During early bootup we can't use alloc_page, so to allocate
leaf pages in the P2M we need to use extend_brk. For that we
utilize the early_alloc_p2m and early_alloc_p2m_middle
functions to do the job for us. The new function follows the
same logic as set_phys_to_machine.

Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
---
 arch/x86/include/asm/xen/page.h |    1 +
 arch/x86/xen/p2m.c              |   15 +++++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index c34f96c..93971e8 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -44,6 +44,7 @@ extern unsigned long  machine_to_phys_nr;
 
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
+extern bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern unsigned long set_phys_range_identity(unsigned long pfn_s,
      unsigned long pfn_e);
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 952edef..ffd08c4 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -570,6 +570,21 @@ static bool __init early_alloc_p2m(unsigned long pfn)
  }
  return true;
 }
+bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+ if (unlikely(!__set_phys_to_machine(pfn, mfn)))  {
+ if (!early_alloc_p2m(pfn))
+ return false;
+
+ if (!early_alloc_p2m_middle(pfn, false /* boundary crossover OK!*/))
+ return false;
+
+ if (!__set_phys_to_machine(pfn, mfn))
+ return false;
+ }
+
+ return true;
+}
 unsigned long __init set_phys_range_identity(unsigned long pfn_s,
       unsigned long pfn_e)
 {
--
1.7.7.5



[PATCH 5/8] xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0

Konrad Rzeszutek Wilk
Otherwise we can get meaningless messages such as:
Freeing  bad80-badf4 pfn range: 0 pages freed

We can also do this for the summary messages - there is no point in
printing "Set 0 page(s) to 1-1 mapping"

[v1: Extended to the summary printks]
Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
---
 arch/x86/xen/setup.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 1ba8dff..7b0ab77 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -114,8 +114,9 @@ static unsigned long __init xen_release_chunk(unsigned long start,
  len++;
  }
  }
- printk(KERN_INFO "Freeing  %lx-%lx pfn range: %lu pages freed\n",
-       start, end, len);
+ if (len)
+ printk(KERN_INFO "Freeing  %lx-%lx pfn range: %lu pages freed\n",
+       start, end, len);
 
  return len;
 }
@@ -162,8 +163,10 @@ static unsigned long __init xen_set_identity_and_release(
  }
  }
 
- printk(KERN_INFO "Released %lu pages of unused memory\n", released);
- printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
+ if (released)
+ printk(KERN_INFO "Released %lu pages of unused memory\n", released);
+ if (identity)
+ printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
 
  return released;
 }
--
1.7.7.5



[PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.

Konrad Rzeszutek Wilk
We ignored the X value and ended up populating up to
max(MAX_DOMAIN_PAGES, last E820_RAM entry).

Fix this by figuring out how many RAM pages (nr_pages) the
hypervisor wanted to provide to us and capping the populate
hypercalls at that.

The end result is (on an 8GB box):

dom0_mem=1G
-Memory: 610884k/9435136k available (5817k kernel code, 1136060k absent, 7688192k reserved, 2899k data, 696k init)
+Memory: 724184k/1053064k available (5817k kernel code, 4552k absent, 324328k reserved, 2899k data, 696k init)

no dom0_mem
-Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)
+Memory: 7621460k/9208688k available (5817k kernel code, 1136060k absent, 451168k reserved, 2899k data, 696k init)

[v1: Details added]
Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
---
 arch/x86/xen/setup.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7b0ab77..0e82bf1 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -185,11 +185,29 @@ static unsigned long __init xen_get_max_pages(void)
  * the current maximum rather than the static maximum. In this
  * case the e820 map provided to us will cover the static
  * maximum region.
+ *
+ * The dom0_mem=min:X,max:Y tweaks options differently depending
+ * on the version, but in general this is what we get:
+ *                | XENMEM_maximum_reser  | nr_pages
+ * --------------++-----------------------+-------------------
+ *  no dom0_mem   | INT_MAX               | max_phys_pfn
+ *  =3G           | INT_MAX               | 786432
+ *  =max:3G       | 786432                | 786432
+ *  =min:1G,max:3G| INT_MAX               | max_phys_fn
+ *  =1G,max:3G    | INT_MAX               | 262144
+ *  =min:1G,max:3G,2G | INT_MAX           | max_phys_fn
+ *
+ * The =3G is often used and it lead to us initially setting
+ * 786432 and allowing dom0 to balloon up to the max_physical_pfn.
+ * This is at odd with the classic XenOClassic so lets emulate
+ * the classic behavior.
  */
  if (xen_initial_domain()) {
  ret = HYPERVISOR_memory_op(XENMEM_maximum_reservation, &domid);
  if (ret > 0)
  max_pages = ret;
+ if (ret == -1UL)
+ max_pages = xen_start_info->nr_pages;
  }
 
  return min(max_pages, MAX_DOMAIN_PAGES);
--
1.7.7.5



[PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM

Konrad Rzeszutek Wilk
When the Xen hypervisor boots a PV kernel it hands it two pieces
of information: nr_pages and a made-up E820 map.

The nr_pages value defines the range from zero to nr_pages of PFNs
which have a valid Machine Frame Number (MFN) underneath them. The
E820 mirrors that (with the VGA hole):
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 00000000000a0000 (usable)
 Xen: 00000000000a0000 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000080800000 (usable)

The fun starts when a PV guest is run with a machine E820 - that
can be either the initial domain or a PCI PV guest - where the E820
looks like the real thing:

BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 000000000009e000 (usable)
 Xen: 000000000009ec00 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000020000000 (usable)
 Xen: 0000000020000000 - 0000000020200000 (reserved)
 Xen: 0000000020200000 - 0000000040000000 (usable)
 Xen: 0000000040000000 - 0000000040200000 (reserved)
 Xen: 0000000040200000 - 00000000bad80000 (usable)
 Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
..
With that layout, overlaying nr_pages directly on the E820 does not
work, as there are gaps and non-RAM regions that won't be used
by the memory allocator. 'xen_release_chunk' helps with that
by punching holes in the P2M (the PFN-to-MFN lookup tree) for those
regions and tells us:

Freeing  20000-20200 pfn range: 512 pages freed
Freeing  40000-40200 pfn range: 512 pages freed
Freeing  bad80-badf4 pfn range: 116 pages freed
Freeing  badf6-bae7f pfn range: 137 pages freed
Freeing  bb000-100000 pfn range: 282624 pages freed
Released 283999 pages of unused memory

Those 283999 pages are subtracted from nr_pages and returned
to the hypervisor. The end result is that the initial domain
boots with 1GB less memory, as nr_pages has been reduced by
the number of pages residing within the PCI hole. It can balloon up
to that amount if desired using 'xl mem-set 0 8092', but the balloon
driver is not always compiled in for the initial domain.

This patch implements the populate hypercall (XENMEM_populate_physmap),
which grows the domain by the same number of pages that
were released.

The other solution (which did not work) was to transplant the MFNs in
the P2M tree - the ones that were going to be freed were put in
the E820_RAM regions past nr_pages. But the modifications to the
M2P array (the other side of creating PTEs) were not carried over,
as the hypervisor is the only one capable of modifying it, and the
only two hypercalls that could do so are update_va_mapping
(which won't work, as during initial bootup only PFNs up to nr_pages
are mapped in the guest) and the populate hypercall itself.

The end result is that the kernel can now boot with the full
nr_pages without having to subtract the 283999 pages.

On an 8GB machine, with various dom0_mem= parameters, this is what we get:

no dom0_mem
-Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init)
+Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init)

dom0_mem=3G
-Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init)
+Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init)

dom0_mem=max:3G
-Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init)
+Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init)

And the 'xm list' or 'xl list' now reflect what the dom0_mem=
argument is.

[v2: Use populate hypercall]
[v3: Remove debug printks]
[v4: Simplify code]
Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
---
 arch/x86/xen/setup.c |  116 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 0e82bf1..b44d409 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -26,7 +26,6 @@
 #include <xen/interface/memory.h>
 #include <xen/interface/physdev.h>
 #include <xen/features.h>
-
 #include "xen-ops.h"
 #include "vdso.h"
 
@@ -120,7 +119,105 @@ static unsigned long __init xen_release_chunk(unsigned long start,
 
  return len;
 }
+static unsigned long __init xen_populate_physmap(unsigned long start,
+ unsigned long end)
+{
+ struct xen_memory_reservation reservation = {
+ .address_bits = 0,
+ .extent_order = 0,
+ .domid        = DOMID_SELF
+ };
+ unsigned long len = 0;
+ int ret;
+
+ for (pfn = start; pfn < end; pfn++) {
+ unsigned long frame;
+
+ /* Make sure pfn does not exists to start with */
+ if (pfn_to_mfn(pfn) != INVALID_P2M_ENTRY)
+ continue;
 
+ frame = pfn;
+ set_xen_guest_handle(reservation.extent_start, &frame);
+ reservation.nr_extents = 1;
+
+ ret = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+ WARN(ret != 1, "Failed to populate pfn %lx err=%d\n", pfn, ret);
+ if (ret == 1) {
+ if (!early_set_phys_to_machine(pfn, frame)) {
+ set_xen_guest_handle(reservation.extent_start, &frame);
+ reservation.nr_extents = 1;
+ ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+ &reservation);
+ break;
+ }
+ len++;
+ } else
+ break;
+ }
+ if (len)
+ printk(KERN_INFO "Populated %lx-%lx pfn range: %lu pages added\n",
+       start, end, len);
+ return len;
+}
+static unsigned long __init xen_populate_chunk(
+ const struct e820entry *list, size_t map_size,
+ unsigned long max_pfn, unsigned long *last_pfn,
+ unsigned long credits_left)
+{
+ const struct e820entry *entry;
+ unsigned int i;
+ unsigned long done = 0;
+ unsigned long dest_pfn;
+
+ for (i = 0, entry = list; i < map_size; i++, entry++) {
+ unsigned long credits = credits_left;
+ unsigned long s_pfn;
+ unsigned long e_pfn;
+ unsigned long pfns;
+ long capacity;
+
+ if (credits <= 0)
+ break;
+
+ if (entry->type != E820_RAM)
+ continue;
+
+ e_pfn = PFN_UP(entry->addr + entry->size);
+
+ /* We only care about E820 after the xen_start_info->nr_pages */
+ if (e_pfn <= max_pfn)
+ continue;
+
+ s_pfn = PFN_DOWN(entry->addr);
+ /* If the E820 falls within the nr_pages, we want to start
+ * at the nr_pages PFN.
+ * If that would mean going past the E820 entry, skip it
+ */
+ if (s_pfn <= max_pfn) {
+ capacity = e_pfn - max_pfn;
+ dest_pfn = max_pfn;
+ } else {
+ /* last_pfn MUST be within E820_RAM regions */
+ if (*last_pfn && e_pfn >= *last_pfn)
+ s_pfn = *last_pfn;
+ capacity = e_pfn - s_pfn;
+ dest_pfn = s_pfn;
+ }
+ /* If we had filled this E820_RAM entry, go to the next one. */
+ if (capacity <= 0)
+ continue;
+
+ if (credits > capacity)
+ credits = capacity;
+
+ pfns = xen_populate_physmap(dest_pfn, dest_pfn + credits);
+ done += pfns;
+ credits_left -= pfns;
+ *last_pfn = (dest_pfn + pfns);
+ }
+ return done;
+}
 static unsigned long __init xen_set_identity_and_release(
  const struct e820entry *list, size_t map_size, unsigned long nr_pages)
 {
@@ -143,7 +240,6 @@ static unsigned long __init xen_set_identity_and_release(
  */
  for (i = 0, entry = list; i < map_size; i++, entry++) {
  phys_addr_t end = entry->addr + entry->size;
-
  if (entry->type == E820_RAM || i == map_size - 1) {
  unsigned long start_pfn = PFN_DOWN(start);
  unsigned long end_pfn = PFN_UP(end);
@@ -238,7 +334,9 @@ char * __init xen_memory_setup(void)
  int rc;
  struct xen_memory_map memmap;
  unsigned long max_pages;
+ unsigned long last_pfn = 0;
  unsigned long extra_pages = 0;
+ unsigned long populated;
  int i;
  int op;
 
@@ -278,9 +376,20 @@ char * __init xen_memory_setup(void)
  */
  xen_released_pages = xen_set_identity_and_release(
  map, memmap.nr_entries, max_pfn);
- extra_pages += xen_released_pages;
 
  /*
+ * Populate back the non-RAM pages and E820 gaps that had been
+ * released. */
+ populated = xen_populate_chunk(map, memmap.nr_entries,
+ max_pfn, &last_pfn, xen_released_pages);
+
+ extra_pages += (xen_released_pages - populated);
+
+ if (last_pfn > max_pfn) {
+ max_pfn = min(MAX_DOMAIN_PAGES, last_pfn);
+ mem_end = PFN_PHYS(max_pfn);
+ }
+ /*
  * Clamp the amount of extra memory to a EXTRA_MEM_RATIO
  * factor the base size.  On non-highmem systems, the base
  * size is the full initial memory allocation; on highmem it
@@ -293,7 +402,6 @@ char * __init xen_memory_setup(void)
  */
  extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
   extra_pages);
-
  i = 0;
  while (i < memmap.nr_entries) {
  u64 addr = map[i].addr;
--
1.7.7.5



[PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar.

Konrad Rzeszutek Wilk
They use the same set of arguments, so it is just a matter
of invoking the proper hypercall.

Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
---
 arch/x86/xen/setup.c |   81 ++++++++++++++++++-------------------------------
 1 files changed, 30 insertions(+), 51 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b44d409..506a3e6 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -83,8 +83,8 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
  __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
 }
 
-static unsigned long __init xen_release_chunk(unsigned long start,
-      unsigned long end)
+static unsigned long __init xen_do_chunk(unsigned long start,
+ unsigned long end, bool release)
 {
  struct xen_memory_reservation reservation = {
  .address_bits = 0,
@@ -95,60 +95,36 @@ static unsigned long __init xen_release_chunk(unsigned long start,
  unsigned long pfn;
  int ret;
 
- for(pfn = start; pfn < end; pfn++) {
- unsigned long mfn = pfn_to_mfn(pfn);
-
- /* Make sure pfn exists to start with */
- if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
- continue;
-
- set_xen_guest_handle(reservation.extent_start, &mfn);
- reservation.nr_extents = 1;
-
- ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
-   &reservation);
- WARN(ret != 1, "Failed to release pfn %lx err=%d\n", pfn, ret);
- if (ret == 1) {
- __set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
- len++;
- }
- }
- if (len)
- printk(KERN_INFO "Freeing  %lx-%lx pfn range: %lu pages freed\n",
-       start, end, len);
-
- return len;
-}
-static unsigned long __init xen_populate_physmap(unsigned long start,
- unsigned long end)
-{
- struct xen_memory_reservation reservation = {
- .address_bits = 0,
- .extent_order = 0,
- .domid        = DOMID_SELF
- };
- unsigned long len = 0;
- int ret;
-
  for (pfn = start; pfn < end; pfn++) {
  unsigned long frame;
+ unsigned long mfn = pfn_to_mfn(pfn);
 
- /* Make sure pfn does not exists to start with */
- if (pfn_to_mfn(pfn) != INVALID_P2M_ENTRY)
- continue;
-
- frame = pfn;
+ if (release) {
+ /* Make sure pfn exists to start with */
+ if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
+ continue;
+ frame = mfn;
+ } else {
+ if (mfn != INVALID_P2M_ENTRY)
+ continue;
+ frame = pfn;
+ }
  set_xen_guest_handle(reservation.extent_start, &frame);
  reservation.nr_extents = 1;
 
- ret = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
- WARN(ret != 1, "Failed to populate pfn %lx err=%d\n", pfn, ret);
+ ret = HYPERVISOR_memory_op(release ? XENMEM_decrease_reservation : XENMEM_populate_physmap,
+   &reservation);
+ WARN(ret != 1, "Failed to %s pfn %lx err=%d\n",
+     release ? "release" : "populate", pfn, ret);
+
  if (ret == 1) {
- if (!early_set_phys_to_machine(pfn, frame)) {
+ if (!early_set_phys_to_machine(pfn, release ? INVALID_P2M_ENTRY : frame)) {
+ if (release)
+ break;
  set_xen_guest_handle(reservation.extent_start, &frame);
  reservation.nr_extents = 1;
  ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
- &reservation);
+   &reservation);
  break;
  }
  len++;
@@ -156,8 +132,11 @@ static unsigned long __init xen_populate_physmap(unsigned long start,
  break;
  }
  if (len)
- printk(KERN_INFO "Populated %lx-%lx pfn range: %lu pages added\n",
-       start, end, len);
+ printk(KERN_INFO "%s %lx-%lx pfn range: %lu pages %s\n",
+       release ? "Freeing" : "Populating",
+       start, end, len,
+       release ? "freed" : "added");
+
  return len;
 }
 static unsigned long __init xen_populate_chunk(
@@ -211,7 +190,7 @@ static unsigned long __init xen_populate_chunk(
  if (credits > capacity)
  credits = capacity;
 
- pfns = xen_populate_physmap(dest_pfn, dest_pfn + credits);
+ pfns = xen_do_chunk(dest_pfn, dest_pfn + credits, false);
  done += pfns;
  credits_left -= pfns;
  *last_pfn = (dest_pfn + pfns);
@@ -249,8 +228,8 @@ static unsigned long __init xen_set_identity_and_release(
 
  if (start_pfn < end_pfn) {
  if (start_pfn < nr_pages)
- released += xen_release_chunk(
- start_pfn, min(end_pfn, nr_pages));
+ released += xen_do_chunk(
+ start_pfn, min(end_pfn, nr_pages), true);
 
  identity += set_phys_range_identity(
  start_pfn, end_pfn);
--
1.7.7.5



Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).

Konrad Rzeszutek Wilk
On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
> Changelog v5 [since v4]:
>  - used populate_physmap, fixed bugs.
> [v2-v4: not posted]
>  - reworked the code in setup.c to work properly.
> [v1: https://lkml.org/lkml/2012/3/30/492]
>  - initial patchset

One bug I found was that with 'dom0_mem=max:1G' (with and without these
patches) I would get a bunch of

(XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)

where the X in "(0 of X)" was sometimes 1, 2, 3, 4 or 17, depending on
the machine I ran it on. I figured out that the difference was in the
ACPI tables that are allocated - and that those regions, even though
they are returned back to the hypervisor, cannot be repopulated. I
can't find the exact piece of code in the hypervisor to pin-point and say "Aha".

What I did was use the same metric that the hypervisor uses to figure
out whether to deny the guest ballooning up - checking d->tot_pages
against d->max_pages. For that the XENMEM_current_reservation hypercall is used.


From e4568b678455f68d374277319fb5cc41f11b6c4f Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <[hidden email]>
Date: Thu, 26 Apr 2012 22:11:08 -0400
Subject: [PATCH] xen/setup: Cap amount to populate based on current tot_pages
 count.

Prior to this patch we would try to populate exactly up to the
xen_released_pages count (i.e. the pages that we released), but
that is incorrect as some pages that we thought were released
were in actuality shared. Depending on the platform the difference
could be small - 2 pages - but some machines had as many as 17.

As such, let's use XENMEM_current_reservation to get the exact
count of pages we are using, subtract it from nr_pages, and use the
lesser of this and xen_released_pages to populate back.

This fixes errors such as:

(XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)

Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
---
 arch/x86/xen/setup.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 506a3e6..8e7dcfd 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -287,7 +287,15 @@ static unsigned long __init xen_get_max_pages(void)
 
  return min(max_pages, MAX_DOMAIN_PAGES);
 }
-
+static unsigned long xen_get_current_pages(void)
+{
+ domid_t domid = DOMID_SELF;
+ int ret;
+ ret = HYPERVISOR_memory_op(XENMEM_current_reservation, &domid);
+ if (ret > 0)
+ return ret;
+ return 0;
+}
 static void xen_align_and_add_e820_region(u64 start, u64 size, int type)
 {
  u64 end = start + size;
@@ -358,7 +366,11 @@ char * __init xen_memory_setup(void)
 
  /*
  * Populate back the non-RAM pages and E820 gaps that had been
- * released. */
+ * released. But cap it as certain regions cannot be repopulated.
+ */
+ if (xen_get_current_pages())
+ xen_released_pages = min(max_pfn - xen_get_current_pages(),
+ xen_released_pages);
  populated = xen_populate_chunk(map, memmap.nr_entries,
  max_pfn, &last_pfn, xen_released_pages);
 
--
1.7.7.5


Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).

Jan Beulich-2
>>> On 01.05.12 at 18:37, Konrad Rzeszutek Wilk <[hidden email]> wrote:
> On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
>> Changelog v5 [since v4]:
>>  - used populate_physmap, fixed bugs.
>> [v2-v4: not posted]
>>  - reworked the code in setup.c to work properly.
>> [v1: https://lkml.org/lkml/2012/3/30/492]
>>  - initial patchset
>
> One bug I found was that with 'dom0_mem=max:1G' (with and without these
> patches) I would get a bunch of
>
> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0
> of 17)
>
> where the (0 of X), sometimes was 1, 2,3,4 or 17 -depending on the machine
> I ran on it. I figured it out that the difference was in the ACPI tables
> that are allocated - and that those regions - even though are returned
> back to the hypervisor, cannot be repopulated. I can't find the actual
> exact piece of code in the hypervisor to pin-point and say "Aha".

Are you sure about this? For one, given that you wrote that you saw this
with as little as a single page off, I doubt there's any halfway modern
system where the ACPI tables all fit into a single page (2 would even seem to
be a theoretical minimum, as there ought to be one NVS region along with
the "normal" tables).

Second, the ability to repopulate shouldn't really depend on the nature of
the corresponding machine pages.

Jan


Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).

David Vrabel
In reply to this post by Konrad Rzeszutek Wilk
On 01/05/12 17:37, Konrad Rzeszutek Wilk wrote:

> On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
>> Changelog v5 [since v4]:
>>  - used populate_physmap, fixed bugs.
>> [v2-v4: not posted]
>>  - reworked the code in setup.c to work properly.
>> [v1: https://lkml.org/lkml/2012/3/30/492]
>>  - initial patchset
>
> One bug I found was that with 'dom0_mem=max:1G' (with and without these
> patches) I would get a bunch of
>
> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
>
> where the (0 of X), sometimes was 1, 2,3,4 or 17 -depending on the machine
> I ran on it. I figured it out that the difference was in the ACPI tables
> that are allocated - and that those regions - even though are returned
> back to the hypervisor, cannot be repopulated. I can't find the actual
> exact piece of code in the hypervisor to pin-point and say "Aha".

It was tricky to track down what is going on here, but I think I see
what's happening.

The problem pages (on the system I looked at) were located just before
the ISA memory region (so PFN < a0) and so they are mapped in the
bootstrap page tables and have an additional ref so are not immediately
freed when the page is released.  They do get freed later on, presumably
when the page tables are swapped over.

I think the mapping needs to be removed with
HYPERVISOR_update_va_mapping() before releasing the page.  This is
already done for the ISA region in xen_ident_map_ISA().

It may be easier to avoid doing anything with the PFNs < 0x100 and take
the minimal loss of memory.

David

Re: [PATCH 6/8] xen/setup: Work properly with 'dom0_mem=X' or with not dom0_mem.

David Vrabel-2
In reply to this post by Konrad Rzeszutek Wilk
On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> We ignored the X value and ended up populating up to
> max(MAX_DOMAIN_PAGES, last E820_RAM entry).
>
> This fixes it by figuring out how many RAM nr_pages the
> hypervisor wanted to provide to us and cap the populate
> hypercalls up to that.

I don't think we should change the behavior again, particularly as the
dom0_mem documentation doesn't describe this proposed behaviour.

I would drop this patch.

David

Re: [PATCH 7/8] xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM

David Vrabel-2
In reply to this post by Konrad Rzeszutek Wilk
On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
>
>
> This patch implements the populate hypercall (XENMEM_populate_physmap),
> which increases the domain by the same number of pages that
> were released.

Provided the problem with the unreclaimable pages is resolved:

Acked-by: David Vrabel <[hidden email]>

David

Re: [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar.

David Vrabel-2
In reply to this post by Konrad Rzeszutek Wilk
On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> They use the same set of arguments, so it is just the matter
> of using the proper hypercall.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>

Acked-by: David Vrabel <[hidden email]>

But would it be better to fold this into the previous patch?

David

Re: [PATCH 5/8] xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0

David Vrabel
In reply to this post by Konrad Rzeszutek Wilk
On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> Otherwise we can get these meaningless:
> Freeing  bad80-badf4 pfn range: 0 pages freed
>
> We also can do this for the summary ones - no point of printing
> "Set 0 page(s) to 1-1 mapping"
>
> [v1: Extended to the summary printks]
> Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>

Acked-by: David Vrabel <[hidden email]>

David

> ---
>  arch/x86/xen/setup.c |   11 +++++++----
>  1 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 1ba8dff..7b0ab77 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -114,8 +114,9 @@ static unsigned long __init xen_release_chunk(unsigned long start,
>   len++;
>   }
>   }
> - printk(KERN_INFO "Freeing  %lx-%lx pfn range: %lu pages freed\n",
> -       start, end, len);
> + if (len)
> + printk(KERN_INFO "Freeing  %lx-%lx pfn range: %lu pages freed\n",
> +       start, end, len);
>  
>   return len;
>  }
> @@ -162,8 +163,10 @@ static unsigned long __init xen_set_identity_and_release(
>   }
>   }
>  
> - printk(KERN_INFO "Released %lu pages of unused memory\n", released);
> - printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
> + if (released)
> + printk(KERN_INFO "Released %lu pages of unused memory\n", released);
> + if (identity)
> + printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
>  
>   return released;
>  }


Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).

David Vrabel-2
In reply to this post by David Vrabel
On 03/05/12 12:48, David Vrabel wrote:

> On 01/05/12 17:37, Konrad Rzeszutek Wilk wrote:
>> On Mon, Apr 16, 2012 at 01:15:31PM -0400, Konrad Rzeszutek Wilk wrote:
>>> Changelog v5 [since v4]:
>>>  - used populate_physmap, fixed bugs.
>>> [v2-v4: not posted]
>>>  - reworked the code in setup.c to work properly.
>>> [v1: https://lkml.org/lkml/2012/3/30/492]
>>>  - initial patchset
>>
>> One bug I found was that with 'dom0_mem=max:1G' (with and without these
>> patches) I would get a bunch of
>>
>> (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
>> (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
>>
>> where the (0 of X), sometimes was 1, 2,3,4 or 17 -depending on the machine
>> I ran on it. I figured it out that the difference was in the ACPI tables
>> that are allocated - and that those regions - even though are returned
>> back to the hypervisor, cannot be repopulated. I can't find the actual
>> exact piece of code in the hypervisor to pin-point and say "Aha".
>
> It was tricky to track down what is going here but I think I see what's
> happening.
>
> The problem pages (on the system I looked at) were located just before
> the ISA memory region (so PFN < a0) and so they are mapped in the
> bootstrap page tables and have an additional ref so are not immediately
> freed when the page is released.  They do get freed later on, presumably
> when the page tables are swapped over.

It's not the bootstrap page tables but the ones constructed in
xen_setup_kernel_pagetable(), though this has the same effect.

> I think the mapping needs to be removed with
> HYPERVISOR_update_va_mapping() before releasing the page.  This is
> already done for the ISA region in xen_ident_map_ISA().

And here's a patch that does this.  I've not given it a lot of testing.

This is on top of your 8/8 patch; your "xen/setup: Cap amount to
populate based on current tot_pages count." patch is no longer needed.

David

8<---------------------
From 17900ce942ed34ccc85b8e6fbce392118d95d9d3 Mon Sep 17 00:00:00 2001
From: David Vrabel <[hidden email]>
Date: Thu, 3 May 2012 15:57:00 +0100
Subject: [PATCH] xen: update VA mapping when releasing memory during setup

In xen_memory_setup(), if a page that is being released has a VA
mapping, this must also be updated.  Otherwise, the page will not be
released completely -- it will still be referenced in Xen and won't be
freed until the mapping is removed, and this prevents it from being
reallocated at a different PFN.

This was already being done for the ISA memory region in
xen_ident_map_ISA(), but that omitted the few pages just below the
ISA region that many systems mark as reserved in the e820 map.

Signed-off-by: David Vrabel <[hidden email]>
---
 arch/x86/xen/enlighten.c |    1 -
 arch/x86/xen/mmu.c       |   23 -----------------------
 arch/x86/xen/setup.c     |   41 ++++++++++++++++++++++++++++++++++-------
 arch/x86/xen/xen-ops.h   |    1 -
 4 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a8f8844..ff9a20a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1306,7 +1306,6 @@ asmlinkage void __init xen_start_kernel(void)
 
  xen_raw_console_write("mapping kernel into physical memory\n");
  pgd = xen_setup_kernel_pagetable(pgd, xen_start_info->nr_pages);
- xen_ident_map_ISA();
 
  /* Allocate and initialize top and mid mfn levels for p2m structure */
  xen_build_mfn_list_list();
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b8e2794..b756d8c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 #endif
 }
 
-void __init xen_ident_map_ISA(void)
-{
- unsigned long pa;
-
- /*
- * If we're dom0, then linear map the ISA machine addresses into
- * the kernel's address space.
- */
- if (!xen_initial_domain())
- return;
-
- xen_raw_printk("Xen: setup ISA identity maps\n");
-
- for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
- pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
-
- if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))
- BUG();
- }
-
- xen_flush_tlb();
-}
-
 static void __init xen_post_allocator_init(void)
 {
  pv_mmu_ops.set_pte = xen_set_pte;
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 506a3e6..d5f8714 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -139,6 +139,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 
  return len;
 }
+
+static unsigned long __init xen_release_chunk(unsigned long start,
+      unsigned long end)
+{
+ return xen_do_chunk(start, end, true);
+}
+
 static unsigned long __init xen_populate_chunk(
  const struct e820entry *list, size_t map_size,
  unsigned long max_pfn, unsigned long *last_pfn,
@@ -197,6 +204,29 @@ static unsigned long __init xen_populate_chunk(
  }
  return done;
 }
+
+static void __init xen_set_identity_and_release_chunk(
+ unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
+ unsigned long *released, unsigned long *identity)
+{
+ unsigned long pfn;
+
+ /*
+ * If the PFNs are currently mapped, the VA mapping also needs
+ * to be updated to be 1:1.
+ */
+ for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
+ (void)HYPERVISOR_update_va_mapping(
+ (unsigned long)__va(pfn << PAGE_SHIFT),
+ mfn_pte(pfn, PAGE_KERNEL_IO), 0);
+
+ if (start_pfn < nr_pages)
+ *released += xen_release_chunk(
+ start_pfn, min(end_pfn, nr_pages));
+
+ *identity += set_phys_range_identity(start_pfn, end_pfn);
+}
+
 static unsigned long __init xen_set_identity_and_release(
  const struct e820entry *list, size_t map_size, unsigned long nr_pages)
 {
@@ -226,14 +256,11 @@ static unsigned long __init xen_set_identity_and_release(
  if (entry->type == E820_RAM)
  end_pfn = PFN_UP(entry->addr);
 
- if (start_pfn < end_pfn) {
- if (start_pfn < nr_pages)
- released += xen_do_chunk(
- start_pfn, min(end_pfn, nr_pages), true);
+ if (start_pfn < end_pfn)
+ xen_set_identity_and_release_chunk(
+ start_pfn, end_pfn, nr_pages,
+ &released, &identity);
 
- identity += set_phys_range_identity(
- start_pfn, end_pfn);
- }
  start = end;
  }
  }
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index b095739..506fa08 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -28,7 +28,6 @@ void xen_setup_shared_info(void);
 void xen_build_mfn_list_list(void);
 void xen_setup_machphys_mapping(void);
 pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
-void xen_ident_map_ISA(void);
 void xen_reserve_top(void);
 extern unsigned long xen_max_p2m_pfn;
 
--
1.7.2.5

Re: [PATCH 8/8] xen/setup: Combine the two hypercall functions - since they are quite similar.

Konrad Rzeszutek Wilk-2
In reply to this post by David Vrabel-2
On Thu, May 03, 2012 at 12:58:05PM +0100, David Vrabel wrote:
> On 16/04/12 18:15, Konrad Rzeszutek Wilk wrote:
> > They use the same set of arguments, so it is just the matter
> > of using the proper hypercall.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <[hidden email]>
>
> Acked-by: David Vrabel <[hidden email]>
>
> But would it be better to fold this into the previous patch?

Nah, one logical change per patch.
>
> David
>

Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).

David Vrabel
In reply to this post by David Vrabel-2
On 03/05/12 16:15, David Vrabel wrote:

>
> xen: update VA mapping when releasing memory during setup
>
> In xen_memory_setup(), if a page that is being released has a VA
> mapping this must also be updated.  Otherwise, the page will be not
> released completely -- it will still be referenced in Xen and won't be
> freed util the mapping is removed and this prevents it from being
> reallocated at a different PFN.
>
> This was already being done for the ISA memory region in
> xen_ident_map_ISA() but on many systems this was omitting a few pages
> as many systems marked a few pages below the ISA memory region as
> reserved in the e820 map.
>
> Signed-off-by: David Vrabel <[hidden email]>
> ---
[...]

> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
>  #endif
>  }
>  
> -void __init xen_ident_map_ISA(void)
> -{
> - unsigned long pa;
> -
> - /*
> - * If we're dom0, then linear map the ISA machine addresses into
> - * the kernel's address space.
> - */
> - if (!xen_initial_domain())
> - return;

It might look like this test has gone; however, the new code which
updates the VA mapping uses the e820 map, and for a domU the map will
not have an ISA region, so there's no mapping to be updated.

David

> -
> - xen_raw_printk("Xen: setup ISA identity maps\n");
> -
> - for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
> - pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
> -
> - if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))
> - BUG();
> - }
> -
> - xen_flush_tlb();
> -}
> -
>  static void __init xen_post_allocator_init(void)
>  {
>   pv_mmu_ops.set_pte = xen_set_pte;
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 506a3e6..d5f8714 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -139,6 +139,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
>  
>   return len;
>  }
> +
> +static unsigned long __init xen_release_chunk(unsigned long start,
> +      unsigned long end)
> +{
> + return xen_do_chunk(start, end, true);
> +}
> +
>  static unsigned long __init xen_populate_chunk(
>   const struct e820entry *list, size_t map_size,
>   unsigned long max_pfn, unsigned long *last_pfn,
> @@ -197,6 +204,29 @@ static unsigned long __init xen_populate_chunk(
>   }
>   return done;
>  }
> +
> +static void __init xen_set_identity_and_release_chunk(
> + unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
> + unsigned long *released, unsigned long *identity)
> +{
> + unsigned long pfn;
> +
> + /*
> + * If the PFNs are currently mapped, the VA mapping also needs
> + * to be updated to be 1:1.
> + */
> + for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
> + (void)HYPERVISOR_update_va_mapping(
> + (unsigned long)__va(pfn << PAGE_SHIFT),
> + mfn_pte(pfn, PAGE_KERNEL_IO), 0);
> +
> + if (start_pfn < nr_pages)
> + *released += xen_release_chunk(
> + start_pfn, min(end_pfn, nr_pages));
> +
> + *identity += set_phys_range_identity(start_pfn, end_pfn);
> +}
> +
>  static unsigned long __init xen_set_identity_and_release(
>   const struct e820entry *list, size_t map_size, unsigned long nr_pages)
>  {
> @@ -226,14 +256,11 @@ static unsigned long __init xen_set_identity_and_release(
>   if (entry->type == E820_RAM)
>   end_pfn = PFN_UP(entry->addr);
>  
> - if (start_pfn < end_pfn) {
> - if (start_pfn < nr_pages)
> - released += xen_do_chunk(
> - start_pfn, min(end_pfn, nr_pages), true);
> + if (start_pfn < end_pfn)
> + xen_set_identity_and_release_chunk(
> + start_pfn, end_pfn, nr_pages,
> + &released, &identity);
>  
> - identity += set_phys_range_identity(
> - start_pfn, end_pfn);
> - }
>   start = end;
>   }
>   }
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index b095739..506fa08 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -28,7 +28,6 @@ void xen_setup_shared_info(void);
>  void xen_build_mfn_list_list(void);
>  void xen_setup_machphys_mapping(void);
>  pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
> -void xen_ident_map_ISA(void);
>  void xen_reserve_top(void);
>  extern unsigned long xen_max_p2m_pfn;
>  


Re: [PATCH] auto balloon initial domain and fix dom0_mem=X inconsistencies (v5).

Konrad Rzeszutek Wilk
On Thu, May 03, 2012 at 05:27:27PM +0100, David Vrabel wrote:

> On 03/05/12 16:15, David Vrabel wrote:
> >
> > xen: update VA mapping when releasing memory during setup
> >
> > In xen_memory_setup(), if a page that is being released has a VA
> > mapping this must also be updated.  Otherwise, the page will be not
> > released completely -- it will still be referenced in Xen and won't be
> > freed util the mapping is removed and this prevents it from being
> > reallocated at a different PFN.
> >
> > This was already being done for the ISA memory region in
> > xen_ident_map_ISA() but on many systems this was omitting a few pages
> > as many systems marked a few pages below the ISA memory region as
> > reserved in the e820 map.
> >
> > Signed-off-by: David Vrabel <[hidden email]>
> > ---
> [...]
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -1929,29 +1929,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
> >  #endif
> >  }
> >  
> > -void __init xen_ident_map_ISA(void)
> > -{
> > - unsigned long pa;
> > -
> > - /*
> > - * If we're dom0, then linear map the ISA machine addresses into
> > - * the kernel's address space.
> > - */
> > - if (!xen_initial_domain())
> > - return;
>
> It might look like this test has gone, however the new code which
> updates the VA mapping uses the e820 map and for a domU its map will not
> have a ISA region so there's no mapping to be updated.

What if you use e820_hole=1 and the pci=xx in the guest?

>
> David
>
> > -
> > - xen_raw_printk("Xen: setup ISA identity maps\n");
> > -
> > - for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
> > - pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
> > -
> > - if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))
> > - BUG();
> > - }
> > -
> > - xen_flush_tlb();
> > -}
> > -
> >  static void __init xen_post_allocator_init(void)
> >  {
> >   pv_mmu_ops.set_pte = xen_set_pte;
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > index 506a3e6..d5f8714 100644
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -139,6 +139,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
> >  
> >   return len;
> >  }
> > +
> > +static unsigned long __init xen_release_chunk(unsigned long start,
> > +      unsigned long end)
> > +{
> > + return xen_do_chunk(start, end, true);
> > +}
> > +
> >  static unsigned long __init xen_populate_chunk(
> >   const struct e820entry *list, size_t map_size,
> >   unsigned long max_pfn, unsigned long *last_pfn,
> > @@ -197,6 +204,29 @@ static unsigned long __init xen_populate_chunk(
> >   }
> >   return done;
> >  }
> > +
> > +static void __init xen_set_identity_and_release_chunk(
> > + unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
> > + unsigned long *released, unsigned long *identity)
> > +{
> > + unsigned long pfn;
> > +
> > + /*
> > + * If the PFNs are currently mapped, the VA mapping also needs
> > + * to be updated to be 1:1.
> > + */
> > + for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
> > + (void)HYPERVISOR_update_va_mapping(
> > + (unsigned long)__va(pfn << PAGE_SHIFT),
> > + mfn_pte(pfn, PAGE_KERNEL_IO), 0);
> > +
> > + if (start_pfn < nr_pages)
> > + *released += xen_release_chunk(
> > + start_pfn, min(end_pfn, nr_pages));
> > +
> > + *identity += set_phys_range_identity(start_pfn, end_pfn);
> > +}
> > +
> >  static unsigned long __init xen_set_identity_and_release(
> >   const struct e820entry *list, size_t map_size, unsigned long nr_pages)
> >  {
> > @@ -226,14 +256,11 @@ static unsigned long __init xen_set_identity_and_release(
> >   if (entry->type == E820_RAM)
> >   end_pfn = PFN_UP(entry->addr);
> >  
> > - if (start_pfn < end_pfn) {
> > - if (start_pfn < nr_pages)
> > - released += xen_do_chunk(
> > - start_pfn, min(end_pfn, nr_pages), true);
> > + if (start_pfn < end_pfn)
> > + xen_set_identity_and_release_chunk(
> > + start_pfn, end_pfn, nr_pages,
> > + &released, &identity);
> >  
> > - identity += set_phys_range_identity(
> > - start_pfn, end_pfn);
> > - }
> >   start = end;
> >   }
> >   }
> > diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> > index b095739..506fa08 100644
> > --- a/arch/x86/xen/xen-ops.h
> > +++ b/arch/x86/xen/xen-ops.h
> > @@ -28,7 +28,6 @@ void xen_setup_shared_info(void);
> >  void xen_build_mfn_list_list(void);
> >  void xen_setup_machphys_mapping(void);
> >  pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
> > -void xen_ident_map_ISA(void);
> >  void xen_reserve_top(void);
> >  extern unsigned long xen_max_p2m_pfn;
> >  
>