A fix for the xend restart problems (2.0.x)

Jed Davis

The basic problem, which the list archives suggest I'm not the only
one running into, is this: the first time xend is restarted while any
guests are running, it immediately dies on an exception along the
lines of "Invalid backend domain" after destroying one of the domUs.
Further attempts to restart it fail with "Failed to map domain control
interface", unless the dom0 kernel is NetBSD built with DIAGNOSTIC, in
which case it panics.

After spending far too much time assuming this was a NetBSD-specific
problem, I eventually tracked it down to xend. Here is a patch that
probably isn't the Right solution, but nonetheless works:

--- tools/python/xen/xend/XendDomain.py.orig    2005-08-13 01:54:56.000000000 -0400
+++ tools/python/xen/xend/XendDomain.py 2005-08-13 01:55:17.000000000 -0400
@@ -147,7 +147,10 @@
             domid = str(d['dom'])
             doms[domid] = d
         dlist = []
-        for config in self.domain_db.values():
+        domkeys = map(int, self.domain_db.keys())
+        domkeys.sort()
+        for domkey in domkeys:
+            config = self.domain_db.get(str(domkey))
             domid = str(sxp.child_value(config, 'id'))
             if domid in doms:
                 d_dom = self._new_domain(config, doms[domid])

This change in traversal order avoids the exception shown below. The
exception occurs while a domU's information is being reconstructed:
the backend domain for its devices (here, dom0) is looked up but
doesn't appear to exist, because dom0 hasn't yet been restored from
the state files (or, for that matter, by querying the hypervisor). I
assume it's because of code shared with actual domain creation that
xend then tries to destroy the domain when this fails. The idea of the
patch above, then, is to restore the domains' state in the same order
in which they were created: iterating over domain_db.values() visits
the domains in no guaranteed order, whereas sorting the keys
numerically puts dom0 (ID 0) first.
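A minimal Python 3 sketch of the ordering the patch produces, assuming
domain_db maps domain IDs stored as decimal strings to saved configs
(the dictionary contents below are made up for illustration). The int
conversion matters, since a plain string sort would put "10" ahead of
"2". Under Python 3, map() returns an iterator, so sorted() stands in
for the patch's map() followed by .sort(), which relies on Python 2's
list-returning map().

```python
# Hypothetical stand-in for xend's domain_db: keys are domain IDs stored
# as strings, values stand in for the saved SXP configs.
domain_db = {"10": "config-10", "2": "config-2", "0": "config-0", "1": "config-1"}


def restore_order(db):
    """Return the saved configs in numeric domain-ID order, as the patch does."""
    # Convert keys to int before sorting; sorted() is needed because
    # map() is lazy in Python 3.
    domkeys = sorted(map(int, db.keys()))
    # Look each config back up by its string key, as the patch does.
    return [db[str(k)] for k in domkeys]


print(sorted(domain_db.keys()))   # ['0', '1', '10', '2']
print(restore_order(domain_db))   # ['config-0', 'config-1', 'config-2', 'config-10']
```

Looking each key back up with str(k) assumes the stored keys are
canonical decimal strings (no leading zeros), which the patch also
assumes.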

Here is the trace of the exception in question. Normally it is caught
partway up the stack and the "invalid backend domain" exception is
raised from there instead, but I commented out the try/except so I
could see the original exception:

Traceback (most recent call last):
  File "/usr/local/sbin/xend", line 121, in ?
    sys.exit(main())
  File "/usr/local/sbin/xend", line 107, in main
    return daemon.start()
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDaemon.py", line 525, in start
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDaemon.py", line 615, in run
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvServer.py", line 47, in create
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvRoot.py", line 29, in __init__
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDir.py", line 69, in get
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDir.py", line 39, in getobj
  File "/pkg/xentools-2.0.6/usr/lib/python/xen/xend/server/SrvDomainDir.py", line 25, in __init__
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 800, in instance
    inst = XendDomain()
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 65, in __init__
    self.initial_refresh()
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 154, in initial_refresh
    d_dom = self._new_domain(config, doms[domid])
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 189, in _new_domain
    deferred = XendDomainInfo.vm_recreate(savedinfo, info)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 218, in vm_recreate
    d = vm.construct(config)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 456, in construct
    deferred = self.configure()
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 975, in configure
    d = self.create_devices()
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 803, in create_devices
    v = dev_handler(self, dev, dev_index)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1110, in vm_dev_vif
    defer = ctrl.attachDevice(vif, val, recreate=recreate)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/server/netif.py", line 423, in attachDevice
    dev = self.addDevice(vif, config)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/server/netif.py", line 400, in addDevice
    dev = NetDev(vif, self, config)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/server/netif.py", line 105, in __init__
    self.configure(config)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/server/netif.py", line 150, in configure
    self.backendDomain = int(xd.domain_lookup(sxp.child_value(config, 'backend', '0')).id)
  File "/usr/local/lib/python2.4/site-packages/xen/xend/XendDomain.py", line 430, in domain_lookup
    raise XendError('invalid domain:' + name)
xen.xend.XendError.XendError: invalid domain:0
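
The failure in this trace can be reduced to a toy model. This is not
xend code: restore(), the config dictionaries, and the local XendError
class are hypothetical stand-ins. Each saved config lists the backend
domains its devices depend on, and recreating a domain fails the way
domain_lookup() does above if one of those backends hasn't been
restored yet.

```python
class XendError(Exception):
    """Local stand-in for xen.xend.XendError.XendError (hypothetical)."""


def restore(configs):
    """Recreate domains in the given order; return the set of restored IDs.

    A domain's devices can only be reattached once every backend domain
    they name has itself been restored, loosely mimicking the
    domain_lookup() call made from netif.py's configure() in the trace.
    """
    restored = set()
    for cfg in configs:
        for backend in cfg["backends"]:
            if backend not in restored:
                raise XendError("invalid domain:%d" % backend)
        restored.add(cfg["id"])
    return restored


dom0 = {"id": 0, "backends": []}   # serves the devices, depends on nothing
domu = {"id": 3, "backends": [0]}  # a vif whose backend is dom0

# Visiting the domU first reproduces the error from the trace...
try:
    restore([domu, dom0])
except XendError as e:
    print(e)                       # invalid domain:0

# ...while restoring in numeric ID order, as the patch does, succeeds.
print(sorted(restore(sorted([domu, dom0], key=lambda c: c["id"]))))  # [0, 3]
```

Ordering by ID only works because the backend here (dom0) has a lower
ID than the domU it serves; the model makes that dependency explicit.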



_______________________________________________
Xen-devel mailing list
http://lists.xensource.com/xen-devel