1000 Domains: Not able to access Domu via xm console from Dom0
1000 Domains: Not able to access Domu via xm console from Dom0

Paul Harvey
Hi all, 

I am running Xen 4.1.2 with an Ubuntu Dom0.

I have, essentially, 1000 modified Mini-OS DomUs running at the same time. When I try to access the 1000th domain's console:

xm console DOM1000
xenconsole: could not read tty from store: No such file or directory

The domain is alive and running according to xentop, and has been for some time.

I can successfully access the first 338 domains with xm console, but a sampling of the rest gives the above error.


Any help, or is this a limitation of Xen?

Thanks

Paul

_______________________________________________
Xen-users mailing list
[hidden email]
http://lists.xen.org/xen-users

Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Ian Campbell-10
On Thu, 2012-12-06 at 23:27 +0000, Paul Harvey wrote:

> Any help, or is this a limitation of Xen?

One limit you might be hitting is the number of event channels which
dom0 can handle. The maximum is currently 1024 for a 32-bit domain and
4096 for 64-bit (that's per domain, not total in the system). Depending
on the configuration of the mini-os domains (e.g. number of devices etc.)
you might be hitting this -- "lsevtchn 0" might give a clue if this is
happening (that tool is in /usr/lib/xen somewhere).

Work has just started on expanding these limits to ~32k and ~512k for
32- and 64-bit domains respectively; the hope is that this will be done
in time for 4.3. Look for posts from Wei Liu on xen-devel this week.

If you aren't hitting the evtchn limits then maybe you are hitting some
dom0 OS-level limitation, e.g. a ulimit on the number of open file
descriptors which xenconsoled can have, or some limit on the number of
ptys.

Ian.



Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Paul Harvey

Hi Ian, 

Thanks for the quick reply!

Have looked into your suggestions and:

* It is NOT the number of event channels; this is much less than the limits you mention

* It is NOT the number of allowable PTYs; the number used is much less than the limit

* The number of per-process file descriptors was set to 1024, but I have increased this to thousands:
ulimit -n
10240

To hammer this point home, I built a wee C file to allocate ptys. Before I changed the limit I got problems past 1024; now it works fine as root or any other user.

But when I create ~350 domains:

ls /proc/<xenconsoled>/fd | wc -l
1024

the count only ever goes as high as 1024, and does not increase for subsequently added domains.

Any other ideas?

Also, as a side note, any idea why the domain creation time grows quadratically?

Thanks 

Paul




Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Paul Harvey
Further to that last email: looking in the xenstore confirms that the tty (pty) is not being assigned to domains above 338.

root@desktop:~# xenstore-ls /local/domain/339/console
ring-ref = "750902"
port = "2"
limit = "1048576"
type = "xenconsoled"

Whereas for 338 we get:

root@desktop:~# xenstore-ls /local/domain/338/console
ring-ref = "737537"
port = "2"
limit = "1048576"
type = "xenconsoled"
tty = "/dev/pts/342"






Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Paul Harvey
So, I attached strace to xenconsoled to see if I could find what was going on, and I got this:

ioctl(1023, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1023, TIOCGPTN, [345])            = 0
stat("/dev/pts/345", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 345), ...}) = 0
open("/dev/pts/345", O_RDWR|O_NOCTTY)   = -1 EMFILE (Too many open files)
close(1023)                             = 0
write(2, "Failed to create tty for domain-"..., 70) = 70
open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 1023
fstat(1023, {st_mode=S_IFREG|0644, st_size=3661, ...}) = 0
fstat(1023, {st_mode=S_IFREG|0644, st_size=3661, ...}) = 0


So this is definitely a problem with file limits, but I don't understand it, as the current limit on files per process is 65000.




Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Ian Campbell-10
On Thu, 2012-12-13 at 12:24 +0000, Paul Harvey wrote:

> So, i attached strace to xenconsoled to see i could find what was
> going on and i got this
>
> ioctl(1023, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon
> echo ...}) = 0
> ioctl(1023, TIOCGPTN, [345])            = 0
> stat("/dev/pts/345", {st_mode=S_IFCHR|0620, st_rdev=makedev(136,
> 345), ...}) = 0
> open("/dev/pts/345", O_RDWR|O_NOCTTY)   = -1 EMFILE (Too many open
> files)
> close(1023)                             = 0
> write(2, "Failed to create tty for domain-"..., 70) = 70
> open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 1023
> fstat(1023, {st_mode=S_IFREG|0644, st_size=3661, ...}) = 0
> fstat(1023, {st_mode=S_IFREG|0644, st_size=3661, ...}) = 0
>
>
> So this is definitely a problem with file limits, but i don't
> understand as the current limit on files per process is 65000

I wrote the following yesterday and although I see it in my sent box I
can't see it in the list archives and you don't seem to have received it
either. I've no idea where it got to...


On Tue, 2012-12-11 at 22:07 +0000, Paul Harvey wrote:

> Hi Ian,
>
>
> Thanks for the quick reply!
>
>
> Have looked into your suggestions and:
>
>
> * It is NOT the number of evntchns, this is much less that the limits
> you mention

OOI how many event channels do your 1000 domains require?

> * It is NOT the number of allowable PTY's, the number used is much
> less than the limit

Again OOI how many?

> * The number of per process file descriptors was set to 1024, but i
> have increased this to thousands :
> ulimit -n
> 10240

Did you apply this to the xenconsoled and other daemon processes too?
Setting ulimit only affects the current process and its children.

> To hammer this point home, i built a wee C file to allocate pty's.
> Before i changed the limit i got problems past 1024, now it work fine
> as root, or any user.
>
>
> But, when i create ~350 domains:
>
>
> cat /proc/<xenconsoled>/fd | wc -l  
> 1024
>
>
> only ever goes as high as 1024, and does not increase for subsequently
> added domains.

I suspect you haven't actually increased the ulimit for this process.
What does /proc/<xenconsoled>/limits contain?

There may also be sysctls which limit the number of fds a process can
have.

> Any other ideas?

> Also, as a side note, any idea why the domain creation time grows
> quadratically?

Grows with the number of running domains you mean?

There were some memory allocator optimisations discussed on xen-devel
recently; I don't recall the details well enough to know whether they are
relevant here, but it could be that. Other than that I'm afraid I have
no ideas.

Ian.



Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Paul Harvey

Hi Ian, 

Thanks for getting back to me :)

So:

./lsevtchn 1000
   1: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 72
   2: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 73

cat /proc/sys/kernel/pty/max
4096

# with 338 domains; there were 9 system ones before starting
cat /proc/sys/kernel/pty/nr
347

I have changed the configuration file /etc/security/limits.config and rebooted the machine, assuming that this would have applied the new limits to the daemons, but you were right:

cat /proc/5388/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             87439                87439                processes 
Max open files            1024                 1024                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       87439                87439                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us       


I killed all the domains and restarted xenconsoled. This applied the new limits:

cat /proc/27677/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             87439                87439                processes 
Max open files            65000                65000                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       87439                87439                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us       

BUT:

There is now a buffer overflow happening somewhere which crashes the daemon when creating the 340th domain, as shown by strace:

write(4, "\v\0\0\0\0\0\0\0\0\0\0\0+\0\0\0", 16) = 16
write(4, "/local/domain/1020/console/tty\0", 31) = 31
write(4, "/dev/pts/345", 12)            = 12
futex(0xd95124, FUTEX_WAIT_PRIVATE, 14161, NULL) = 0
futex(0xd950f8, FUTEX_WAKE_PRIVATE, 1)  = 0
rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7fb5d50284a0}, NULL, 8) = 0
fcntl(1026, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = -1 ENXIO (No such device or address)
writev(2, [{"*** ", 4}, {"buffer overflow detected", 24}, {" ***: ", 6}, {"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {" terminated\n", 12}], 5) = 78
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb5d5eb3000
open("/usr/lib/xen-4.1/bin/../lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 1028
fstat(1028, {st_mode=S_IFREG|0644, st_size=85812, ...}) = 0
mmap(NULL, 85812, PROT_READ, MAP_PRIVATE, 1028, 0) = 0x7fb5d5e9e000
close(1028)                             = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 1028
read(1028, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320(\0\0\0\0\0\0"..., 832) = 832
fstat(1028, {st_mode=S_IFREG|0644, st_size=88384, ...}) = 0
mmap(NULL, 2184216, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 1028, 0) = 0x7fb5cf9d1000
mprotect(0x7fb5cf9e6000, 2093056, PROT_NONE) = 0
mmap(0x7fb5cfbe5000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 1028, 0x14000) = 0x7fb5cfbe5000
close(1028)                             = 0
mprotect(0x7fb5cfbe5000, 4096, PROT_READ) = 0
munmap(0x7fb5d5e9e000, 85812)           = 0
futex(0x7fb5d53aedf0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7fb5cfbe61a4, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "======= Backtrace: =========\n", 29) = 29
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__fortify_fail", 14}, {"+0x", 3}, {"37", 2}, {")", 1}, {"[0x", 3}, {"7fb5d50fc807", 12}, {"]\n", 2}], 9) = 69
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"109700", 6}, {")", 1}, {"[0x", 3}, {"7fb5d50fb700", 12}, {"]\n", 2}], 8) = 59
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"+0x", 3}, {"10a7be", 6}, {")", 1}, {"[0x", 3}, {"7fb5d50fc7be", 12}, {"]\n", 2}], 8) = 59
writev(2, [{"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {"[0x", 3}, {"403cb8", 6}, {"]\n", 2}], 4) = 43
writev(2, [{"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {"[0x", 3}, {"4021d5", 6}, {"]\n", 2}], 4) = 43
writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1}, {"__libc_start_main", 17}, {"+0x", 3}, {"ed", 2}, {")", 1}, {"[0x", 3}, {"7fb5d501376d", 12}, {"]\n", 2}], 9) = 72
writev(2, [{"/usr/lib/xen-4.1/bin/xenconsoled", 32}, {"[0x", 3}, {"4022ad", 6}, {"]\n", 2}], 4) = 43
write(2, "======= Memory map: ========\n", 29) = 29


On 13 December 2012 15:27, Paul Harvey <[hidden email]> wrote:
Sorry, thought that I pressed reply-all.


On 13 December 2012 15:19, Ian Campbell <[hidden email]> wrote:
Please can you keep this conversation on the mailing list.

On Thu, 2012-12-13 at 15:12 +0000, Paul Harvey wrote:
[...]






Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Ian Campbell-10
On Thu, 2012-12-13 at 15:28 +0000, Paul Harvey wrote:

> ./lsevtchn 1000
>    1: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 72
>    2: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 73

When I mentioned evtchn limitations I meant in dom0, IOW the other end
of all these. At two evtchns per mini-os domain you'd expect to hit
issues around 512 domains on a 32-bit domain 0.

> I have changed the configuration file /etc/security/limits.config and
>  rebooted the machines and assumed that this would have applied the
> new limits to the deamons, but you were right and

I don't have this file on Debian, so I guess it is particular to
whichever distro you use -- perhaps there is a dependency issue between
the xencommons initscript and whatever initscript applies the settings
from /etc/security/limits.config?

> I killed all the domains and restarted the xenconsoled. This applies
> the new limits:

Great!

> BUT:
>
>
> There is now a buffer overflow happening somewhere which is crashing
> the deamon when creating the 340th domain,

Not Great! :-/

I've added xen-devel@.

> as shown by strace:

Unfortunately strace doesn't give the sort of information needed to
diagnose this. Can you run the daemon under gdb? When it crashes you can
type "bt" to get a backtrace. If there are debuginfo packages available
in your distro installing the ones for the Xen packages would improve
the output of this too.

If you could figure out where (if anywhere) the daemon's stderr (AKA fd
2) is going, that would be useful too. It may be enough to run it in the
foreground.

Ian.




Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Paul Harvey
So:

# with 341 domains
./lsevtchn 0 | wc -l
724

Attaching gdb to xenconsoled:

Program received signal SIGABRT, Aborted.
0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fe588cabb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fe588ce639e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fe588d7c807 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007fe588d7b700 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007fe588d7c7be in __fdelt_warn () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x0000000000403ca8 in handle_io () at daemon/io.c:1059
#7  0x00000000004021c5 in main (argc=2, argv=0x7fff58691d48) at daemon/main.c:166

Unfortunately strace doesn't give the sort of information needed to
diagnose this. Can you run the daemon under gdb? When it crashes you can
type "bt" to get a backtrace. If there are debuginfo packages available
in your distro installing the ones for the Xen packages would improve
the output of this too.


I don't really know how to enable the debugging info for these libraries. I can't see anything on Google about debuginfo packages for Ubuntu 12.04. Incidentally, I just grabbed the Xen version in their repo, following this:

https://help.ubuntu.com/community/Xen

I did grab a copy of the source of Xen 4.1.2 and compiled it with debugging enabled in the tools, which is why I can see proper output for the first two frames.

Paul


Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Wei Liu-2
On Fri, 2012-12-14 at 13:06 +0000, Paul Harvey wrote:


libc raises an exception when it detects a memory violation.

You can probably try using valgrind to identify a memory leak in
xenconsoled.


Wei.



Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Paul Harvey
On 14 December 2012 14:57, Wei Liu <[hidden email]> wrote:

Feeling a little in over my head now.

I have run valgrind and attached the file with the output. As before, xenconsoled crashes, but I am not really sure how to read what I am seeing from valgrind. I am not sure whether it is telling me that these errors happen as it goes along, or whether the lost blocks are a result of the crash.

Valgrind was run with:

valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=20 --log-file="valgrind_output.txt" --track-fds=yes ./xenconsoled --pid-file=/var/run/xenconsoled.pid

If the attached file doesn't show, could you tell where it should go?

Paul


Attachment: valgrind_output.txt (642K)

Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Fajar A. Nugraha-4
In reply to this post by Paul Harvey
On Wed, Dec 12, 2012 at 5:07 AM, Paul Harvey <[hidden email]> wrote:
> increased this to thousands:
> ulimit -n
> 10240

By running "ulimit" manually? If yes, it's only applied in your current session.

>
> To hammer this point home, I built a small C program to allocate ptys. Before I
> changed the limit I hit problems past 1024; now it works fine as root, or as any
> other user.
>
> But, when i create ~350 domains:
>
> ls /proc/<xenconsoled pid>/fd | wc -l
> 1024
>
> only ever goes as high as 1024, and does not increase for subsequently added
> domains.

By default Ubuntu only allows 1024 open file descriptors per process,
for any user. Changing it manually with the "ulimit" command does not
change the global limit; this includes root. Setting it globally is kind of a pain:
- edit /etc/security/limits.conf, read about "nofile", and make
appropriate change
- edit /etc/pam.d/common-session and common-session-noninteractive,
include pam_limits.so (see settings for "su" for example)
- reboot
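
Concretely, the steps above might look like the following (a sketch; the exact values are arbitrary and the file locations are the Ubuntu defaults):

```
# /etc/security/limits.conf -- raise the per-process fd limit globally
*     soft  nofile  16384
*     hard  nofile  16384
root  soft  nofile  16384
root  hard  nofile  16384

# /etc/pam.d/common-session and common-session-noninteractive --
# make sure pam_limits actually applies the limits to new sessions
session required pam_limits.so
```

Note that daemons started from init scripts may not pass through PAM at all; in that case the limit has to be raised in the init script itself (e.g. an `ulimit -n` before exec'ing xenconsoled).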

--
Fajar


Re: 1000 Domains: Not able to access Domu via xm console from Dom0

Ian Campbell-10
In reply to this post by Paul Harvey
On Fri, 2012-12-14 at 13:06 +0000, Paul Harvey wrote:

> Program received signal SIGABRT, Aborted.
> 0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) bt
> #0  0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007fe588cabb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007fe588ce639e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x00007fe588d7c807 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x00007fe588d7b700 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #5  0x00007fe588d7c7be in __fdelt_warn () from /lib/x86_64-linux-gnu/libc.so.6
> #6  0x0000000000403ca8 in handle_io () at daemon/io.c:1059
> #7  0x00000000004021c5 in main (argc=2, argv=0x7fff58691d48) at daemon/main.c:166

daemon/io.c:1059 in 4.1.2 is:
                                    FD_ISSET(xc_evtchn_fd(d->xce_handle),
                                             &readfds))
                                        handle_ring_read(d);

I rather suspect this is overrunning the readfds array.
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_select.h.html suggests this is sized by FD_SETSIZE. On my system that appears to be statically 1024 (at least, strace doesn't show a syscall to determine it in a simple test app, although a grep of /usr/include suggests it might be configurable on some systems).

It doesn't seem likely that there will be a simple solution to this. We
probably need to switch to something other than select(2). poll(2) seems
to handle arbitrary numbers of file descriptors. epoll(7) would be nice
(it supposedly scales better than poll) but is Linux specific. Another
option might be to fork multiple worker processes (might be a good idea
if xenconsole becomes a bottleneck).

It seems likely (based on a quick grep) that xenstored (both the C
and ocaml variants) will suffer from the same issue.

I'm not sure why we have an evtchn handle per guest, other than this
comment which suggests it was simply expedient rather than a good
design:
        /* Opening evtchn independently for each console is a bit
         * wasteful, but that's how the code is structured... */
        dom->xce_handle = xc_evtchn_open(NULL, 0);
        if (dom->xce_handle == NULL) {
                err = errno;
                goto out;
        }
However this is just one open fd which scales with the number of domains
(the others are the pty related ones), so fixing it would only buy
a bit more time and not address the underlying issue.

Ian.




Re: [Xen-devel] 1000 Domains: Not able to access Domu via xm console from Dom0

Wei Liu-2
On Mon, 2012-12-17 at 11:56 +0000, Ian Campbell wrote:

> On Fri, 2012-12-14 at 13:06 +0000, Paul Harvey wrote:
> > Program received signal SIGABRT, Aborted.
> > 0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > (gdb) bt
> > #0  0x00007fe588ca8425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > #1  0x00007fe588cabb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > #2  0x00007fe588ce639e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #3  0x00007fe588d7c807 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
> > #4  0x00007fe588d7b700 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
> > #5  0x00007fe588d7c7be in __fdelt_warn () from /lib/x86_64-linux-gnu/libc.so.6
> > #6  0x0000000000403ca8 in handle_io () at daemon/io.c:1059
> > #7  0x00000000004021c5 in main (argc=2, argv=0x7fff58691d48) at daemon/main.c:166
>
> daemon/io.c:1059 in 4.1.2 is:
>                                     FD_ISSET(xc_evtchn_fd(d->xce_handle),
>                                              &readfds))
>                                         handle_ring_read(d);
>
> I rather suspect this is overrunning the readfds array.
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_select.h.html suggests this is sized by FD_SETSIZE. On my system that appears to be statically 1024 (at least strace doesn't show a syscall to determine it in a simple test app although grep /usr/include suggests that might be an option on some systems).
>
> It doesn't seem likely that there will be a simple solution to this. We
> probably need to switch to something other than select(2). poll(2) seems
> to handle arbitrary numbers of file descriptors. epoll(7) would be nice
> (it supposedly scales better than poll) but is Linux specific. Another
> option might be to fork multiple worker processes (might be a good idea
> if xenconsole becomes a bottleneck).

libevent wraps around different event APIs and provides consistent
interface across OSes. But I don't know whether adding libevent as Xen
tools dependency is a good idea.

> It seems likely (based on a quick grep) that both xenstore (both the C
> and ocaml variants) will suffer from the same issue.
>

Yes, I ran a test and hit this limit in both Xenstored and Xenconsoled.

> I'm not sure why we have an evtchn handle per guest, other than this
> comment which suggests it was simply expedient rather than a good
> design:
>         /* Opening evtchn independently for each console is a bit
>          * wasteful, but that's how the code is structured... */
>         dom->xce_handle = xc_evtchn_open(NULL, 0);
>         if (dom->xce_handle == NULL) {
>                 err = errno;
>                 goto out;
>         }
> However this is just one open fd which scales with number of domains
> (the others are the pty related ones) so just fixing this would just buy
> a bit more time but not fix the underlying issue.
>

Even if you work around this problem, you will still hit Xenstore limit.
So the underlying issue has to be fixed.


Wei.




Re: [Xen-devel] 1000 Domains: Not able to access Domu via xm console from Dom0

Ian Campbell-10
On Sat, 2012-12-29 at 16:21 +0000, Wei Liu wrote:
> On Mon, 2012-12-17 at 11:56 +0000, Ian Campbell wrote:

> > I rather suspect this is overrunning the readfds array.
> > http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_select.h.html suggests this is sized by FD_SETSIZE. On my system that appears to be statically 1024 (at least strace doesn't show a syscall to determine it in a simple test app although grep /usr/include suggests that might be an option on some systems).
> >
> > It doesn't seem likely that there will be a simple solution to this. We
> > probably need to switch to something other than select(2). poll(2) seems
> > to handle arbitrary numbers of file descriptors. epoll(7) would be nice
> > (it supposedly scales better than poll) but is Linux specific. Another
> > option might be to fork multiple worker processes (might be a good idea
> > if xenconsole becomes a bottleneck).
>
> libevent wraps around different event APIs and provides consistent
> interface across OSes. But I don't know whether adding libevent as Xen
> tools dependency is a good idea.

Using some reasonably widespread library to abstract away the
differences between Linux and *BSD here seems like a better idea than
rolling our own. I don't know enough about it to say if libevent fits
the bill or not.

Based on a cursory glance it seems like libevent implements a complete
event loop and requires the application to switch over to using it
entirely. This may not be a bad thing but it might be preferable (and/or
easier) to see if there is a library which only provides a basic simple
abstraction/wrapper over the various polling mechanisms.

Ian.


_______________________________________________
Xen-users mailing list
[hidden email]
http://lists.xen.org/xen-users

Re: [Xen-devel] 1000 Domains: Not able to access Domu via xm console from Dom0

Ian Jackson-2
Ian Campbell writes ("Re: [Xen-devel] [Xen-users] 1000 Domains: Not able to access Domu via xm console from Dom0"):
> Using some reasonably widespread library to abstract away the
> differences between Linux and *BSD here seems like a better idea than
> rolling our own. I don't know enough about it to say if libevent fits
> the bill or not.

I don't see why we wouldn't just change the code to use poll() right
away.  poll is available everywhere and works in (roughly) the same
way.  It will fix the specific bug here.

poll's downside compared to nonportable approaches is that if you have
many hundreds or thousands of fds it can be less efficient.  I think
we can deal with the efficiency problem later.  As Ian writes, it
might be better to fork instead.

Ian.
