Quantcast

Comments on Xen bug 1732

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Comments on Xen bug 1732

Haitao Shan
Hi, Jan,

As you may already notice the bug 1732, (http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is c/s 22182.

I see the following attached code in your patch. It is pointless to check msi->table_base against the value read from physical device if this function is a virtual function of SR-IOV device. VFs are required to have BARs zeroed by specifications. And for VFs, unless you can read these values from corresponding PF, you will have to trust the "table_base" passed from dom0 via hypercall. Actually, this parameter is specifically introduced for enabling SR-IOV.

I am not familiar with this patch and hence its story. But I think it would be very simple for you to fix this up?

BTW: I vaguely recall that MSI-X table base might not be the first page of the corresponding BAR register.

Shan Haitao

+    if ( !dev->msix_nr_entries )
+    {
+        u64 pba_paddr;
+        u32 pba_offset;
+
+        ASSERT(!dev->msix_used_entries);
+        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
+
+        dev->msix_nr_entries = nr_entries;
+        dev->msix_table.first = PFN_DOWN(table_paddr);
+        dev->msix_table.last = PFN_DOWN(table_paddr +
+                                        nr_entries * PCI_MSIX_ENTRY_SIZE - 1);
+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_table.first,
+                                        dev->msix_table.last));
+
+        pba_offset = pci_conf_read32(bus, slot, func,
+                                     msix_pba_offset_reg(pos));
+        bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
+        pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
+        WARN_ON(!pba_paddr);
+        pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
+
+        dev->msix_pba.first = PFN_DOWN(pba_paddr);
+        dev->msix_pba.last = PFN_DOWN(pba_paddr +
+                                      BITS_TO_LONGS(nr_entries) - 1);
+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
+                                        dev->msix_pba.last));
+
+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
+                                dev->msix_table.last) )
+            WARN();
+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
+                                dev->msix_pba.last) )
+            WARN();
+
+        if ( dev->domain )
+            p2m_change_entry_type_global(p2m_get_hostp2m(dev->domain),
+                                         p2m_mmio_direct, p2m_mmio_direct);
+        if ( !dev->domain || !paging_mode_translate(dev->domain) )
+        {
+            struct domain *d = dev->domain;



_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Jan Beulich
>>> On 31.01.11 at 05:54, Haitao Shan <[hidden email]> wrote:
> Hi, Jan,
>
> As you may already notice the bug 1732, (
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is
> c/s 22182.
>
> I see the following attached code in your patch. It is pointless to check
> msi->table_base against the value read from physical device if this function
> is a virtual function of SR-IOV device. VFs are required to have BARs zeroed
> by specifications. And for VFs, unless you can read these values from
> corresponding PF, you will have to trust the "table_base" passed from dom0
> via hypercall. Actually, this parameter is specifically introduced for
> enabling SR-IOV.

Quite possible, which would perhaps just require removing some
or all of the warnings. In doing so, it must however be avoided to
introduce new ways for things to go bad silently.

> I am not familiar with this patch and hence its story. But I think it would
> be very simple for you to fix this up?

Not really, no. I had posted this patch as a draft after there was
no reaction on the part of the original implementers of the MSI and
pass-through code to address the security problem we're dealing
with here (and afaict the issue still wasn't completely addressed,
as I don't recall having seen corresponding adjustments to qemu).
I never had hardware to test this with, and hence had to rely on
Yunhong's testing and ack-ing of the patch.

> BTW: I vaguely recall that MSI-X table base might not be the first page of
> the corresponding BAR register.

While I agree that the code is lacking the use of
msix_table_offset_reg(), I would question what else would be
in the range supplied by the BAR, as the specification allows
only MSI-X table and PBA to share a BAR.

Jan


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Jiang, Yunhong
>Not really, no. I had posted this patch as a draft after there was
>no reaction on the part of the original implementers of the MSI and
>pass-through code to address the security problem we're dealing
>with here (and afaict the issue still wasn't completely addressed,
>as I don't recall having seen corresponding adjustments to qemu).
>I never had hardware to test this with, and hence had to rely on
>Yunhong's testing and ack-ing of the patch.

Yes, my bad. I verified it, but didn't use the SR-IOV at that time :$

-jyh

_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Haitao Shan
In reply to this post by Jan Beulich


2011/1/31 Jan Beulich <[hidden email]>
>>> On 31.01.11 at 05:54, Haitao Shan <[hidden email]> wrote:
> Hi, Jan,
>
> As you may already notice the bug 1732, (
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is
> c/s 22182.
>
> I see the following attached code in your patch. It is pointless to check
> msi->table_base against the value read from physical device if this function
> is a virtual function of SR-IOV device. VFs are required to have BARs zeroed
> by specifications. And for VFs, unless you can read these values from
> corresponding PF, you will have to trust the "table_base" passed from dom0
> via hypercall. Actually, this parameter is specifically introduced for
> enabling SR-IOV.

Quite possible, which would perhaps just require removing some
or all of the warnings. In doing so, it must however be avoided to
introduce new ways for things to go bad silently.

> I am not familiar with this patch and hence its story. But I think it would
> be very simple for you to fix this up?

Not really, no. I had posted this patch as a draft after there was
no reaction on the part of the original implementers of the MSI and
pass-through code to address the security problem we're dealing
with here (and afaict the issue still wasn't completely addressed,
as I don't recall having seen corresponding adjustments to qemu).
I never had hardware to test this with, and hence had to rely on
Yunhong's testing and ack-ing of the patch.

> BTW: I vaguely recall that MSI-X table base might not be the first page of
> the corresponding BAR register.

While I agree that the code is lacking the use of
msix_table_offset_reg(), I would question what else would be
in the range supplied by the BAR, as the specification allows
only MSI-X table and PBA to share a BAR.

This is what I copied from PCI spec 3.0. I don't see that the spec only allows the two to be shared.
-----------------------------PCI----------
To enable system software to map MSI-X structures onto different processor pages for
improved access control, it is recommended that a function dedicate separate Base Address
registers for the MSI-X Table and MSI-X PBA, or else provide more than the minimum
required isolation with address ranges.
If dedicated separate Base Address registers is not feasible, it is recommended that a
function dedicate a single Base Address register for the MSI-X Table and MSI-X PBA.
If a dedicated Base Address register is not feasible, it is recommended that a function isolate
the MSI-X structures from the non-MSI-X structures with aligned 8 KB ranges rather than
the mandatory aligned 4 KB ranges. 
--------------------------spec---------------

Jan



_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Haitao Shan
We are heading Chinese New Year, which typically take 1.5 weeks. I am afraid we don't have time slot to make a fix for this.
Probably Keir can make a judgement on the impact of this bug? And decide whether a fix before 4.1 release is needed.

Shan Haitao 

2011/1/31 Haitao Shan <[hidden email]>


2011/1/31 Jan Beulich <[hidden email]>

>>> On 31.01.11 at 05:54, Haitao Shan <[hidden email]> wrote:
> Hi, Jan,
>
> As you may already notice the bug 1732, (
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is
> c/s 22182.
>
> I see the following attached code in your patch. It is pointless to check
> msi->table_base against the value read from physical device if this function
> is a virtual function of SR-IOV device. VFs are required to have BARs zeroed
> by specifications. And for VFs, unless you can read these values from
> corresponding PF, you will have to trust the "table_base" passed from dom0
> via hypercall. Actually, this parameter is specifically introduced for
> enabling SR-IOV.

Quite possible, which would perhaps just require removing some
or all of the warnings. In doing so, it must however be avoided to
introduce new ways for things to go bad silently.

> I am not familiar with this patch and hence its story. But I think it would
> be very simple for you to fix this up?

Not really, no. I had posted this patch as a draft after there was
no reaction on the part of the original implementers of the MSI and
pass-through code to address the security problem we're dealing
with here (and afaict the issue still wasn't completely addressed,
as I don't recall having seen corresponding adjustments to qemu).
I never had hardware to test this with, and hence had to rely on
Yunhong's testing and ack-ing of the patch.

> BTW: I vaguely recall that MSI-X table base might not be the first page of
> the corresponding BAR register.

While I agree that the code is lacking the use of
msix_table_offset_reg(), I would question what else would be
in the range supplied by the BAR, as the specification allows
only MSI-X table and PBA to share a BAR.

This is what I copied from PCI spec 3.0. I don't see that the spec only allows the two to be shared.
-----------------------------PCI----------
To enable system software to map MSI-X structures onto different processor pages for
improved access control, it is recommended that a function dedicate separate Base Address
registers for the MSI-X Table and MSI-X PBA, or else provide more than the minimum
required isolation with address ranges.
If dedicated separate Base Address registers is not feasible, it is recommended that a
function dedicate a single Base Address register for the MSI-X Table and MSI-X PBA.
If a dedicated Base Address register is not feasible, it is recommended that a function isolate
the MSI-X structures from the non-MSI-X structures with aligned 8 KB ranges rather than
the mandatory aligned 4 KB ranges. 
--------------------------spec---------------

Jan




_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Jan Beulich
In reply to this post by Haitao Shan
>>> On 31.01.11 at 09:52, Haitao Shan <[hidden email]> wrote:
>> > BTW: I vaguely recall that MSI-X table base might not be the first page
>> of
>> > the corresponding BAR register.
>>
>> While I agree that the code is lacking the use of
>> msix_table_offset_reg(), I would question what else would be
>> in the range supplied by the BAR, as the specification allows
>> only MSI-X table and PBA to share a BAR.
>>
>
> This is what I copied from PCI spec 3.0. I don't see that the spec only
> allows the two to be shared.
> -----------------------------PCI----------
> To enable system software to map MSI-X structures onto different processor
> pages for
> improved access control, it is recommended that a function dedicate separate
> Base Address
> registers for the MSI-X Table and MSI-X PBA, or else provide more than the
> minimum
> required isolation with address ranges.
> If dedicated separate Base Address registers is not feasible, it is
> recommended that a
> function dedicate a single Base Address register for the MSI-X Table and
> MSI-X PBA.
> If a dedicated Base Address register is not feasible, it is recommended that
> a function isolate
> the MSI-X structures from the non-MSI-X structures with aligned 8 KB ranges
> rather than
> the mandatory aligned 4 KB ranges.
> --------------------------spec---------------

Sorry, it should have been *page* instead of *BAR*. I certainly
can propose a fix to the not-at-offset-zero part of the problem,
but the VF (SR-IOV) specific part (i.e. the determination which
of the warnings can be dropped safely) should be done by
someone more familiar with all aspects of it.

Jan


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Keir Fraser-4
In reply to this post by Haitao Shan
If the primary symptom is the large number of call traces, then simply
removing the 'offending' WARN_ON() for 4.1 might be a suitable fix. Even if
the root cause is really elsewhere (but we do not have resources to fix it
in time)?

 -- Keir

On 31/01/2011 10:02, "Haitao Shan" <[hidden email]> wrote:

> We are heading Chinese New Year, which typically take 1.5 weeks. I am afraid
> we don't have time slot to make a fix for this.
> Probably Keir can make a judgement on the impact of this bug? And decide
> whether a fix before 4.1 release is needed.
>
> Shan Haitao 
>
> 2011/1/31 Haitao Shan <[hidden email]>
>>
>>
>> 2011/1/31 Jan Beulich <[hidden email]>
>>
>>>>>> On 31.01.11 at 05:54, Haitao Shan <[hidden email]> wrote:
>>>> Hi, Jan,
>>>>
>>>> As you may already notice the bug 1732, (
>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit
>>>> is
>>>> c/s 22182.
>>>>
>>>> I see the following attached code in your patch. It is pointless to check
>>>> msi->table_base against the value read from physical device if this
>>>> function
>>>> is a virtual function of SR-IOV device. VFs are required to have BARs
>>>> zeroed
>>>> by specifications. And for VFs, unless you can read these values from
>>>> corresponding PF, you will have to trust the "table_base" passed from dom0
>>>> via hypercall. Actually, this parameter is specifically introduced for
>>>> enabling SR-IOV.
>>>
>>> Quite possible, which would perhaps just require removing some
>>> or all of the warnings. In doing so, it must however be avoided to
>>> introduce new ways for things to go bad silently.
>>>
>>>> I am not familiar with this patch and hence its story. But I think it would
>>>> be very simple for you to fix this up?
>>>
>>> Not really, no. I had posted this patch as a draft after there was
>>> no reaction on the part of the original implementers of the MSI and
>>> pass-through code to address the security problem we're dealing
>>> with here (and afaict the issue still wasn't completely addressed,
>>> as I don't recall having seen corresponding adjustments to qemu).
>>> I never had hardware to test this with, and hence had to rely on
>>> Yunhong's testing and ack-ing of the patch.
>>>
>>>> BTW: I vaguely recall that MSI-X table base might not be the first page of
>>>> the corresponding BAR register.
>>>
>>> While I agree that the code is lacking the use of
>>> msix_table_offset_reg(), I would question what else would be
>>> in the range supplied by the BAR, as the specification allows
>>> only MSI-X table and PBA to share a BAR.
>>
>> This is what I copied from PCI spec 3.0. I don't see that the spec only
>> allows the two to be shared.
>> -----------------------------PCI----------
>> To enable system software to map MSI-X structures onto different processor
>> pages for
>> improved access control, it is recommended that a function dedicate separate
>> Base Address
>> registers for the MSI-X Table and MSI-X PBA, or else provide more than the
>> minimum
>> required isolation with address ranges.
>> If dedicated separate Base Address registers is not feasible, it is
>> recommended that a
>> function dedicate a single Base Address register for the MSI-X Table and
>> MSI-X PBA.
>> If a dedicated Base Address register is not feasible, it is recommended that
>> a function isolate
>> the MSI-X structures from the non-MSI-X structures with aligned 8 KB ranges
>> rather than
>> the mandatory aligned 4 KB ranges. 
>> --------------------------spec---------------
>>>
>>> Jan
>>>
>>
>
>



_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Haitao Shan
I am wondering whether we can trust msi->table_base for the moment?

Simply removing the warnings can of course suppress the symptom. But assuming MSIX table at pfn 0 and changing its p2m type are a concern to me. But probably this won't trigger anything bad, since this pfn is seldom used?

Shan Haitao 

2011/1/31 Keir Fraser <[hidden email]>
If the primary symptom is the large number of call traces, then simply
removing the 'offending' WARN_ON() for 4.1 might be a suitable fix. Even if
the root cause is really elsewhere (but we do not have resources to fix it
in time)?

 -- Keir

On 31/01/2011 10:02, "Haitao Shan" <[hidden email]> wrote:

> We are heading Chinese New Year, which typically take 1.5 weeks. I am afraid
> we don't have time slot to make a fix for this.
> Probably Keir can make a judgement on the impact of this bug? And decide
> whether a fix before 4.1 release is needed.
>
> Shan Haitao 
>
> 2011/1/31 Haitao Shan <[hidden email]>
>>
>>
>> 2011/1/31 Jan Beulich <[hidden email]>
>>
>>>>>> On 31.01.11 at 05:54, Haitao Shan <[hidden email]> wrote:
>>>> Hi, Jan,
>>>>
>>>> As you may already notice the bug 1732, (
>>>> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit
>>>> is
>>>> c/s 22182.
>>>>
>>>> I see the following attached code in your patch. It is pointless to check
>>>> msi->table_base against the value read from physical device if this
>>>> function
>>>> is a virtual function of SR-IOV device. VFs are required to have BARs
>>>> zeroed
>>>> by specifications. And for VFs, unless you can read these values from
>>>> corresponding PF, you will have to trust the "table_base" passed from dom0
>>>> via hypercall. Actually, this parameter is specifically introduced for
>>>> enabling SR-IOV.
>>>
>>> Quite possible, which would perhaps just require removing some
>>> or all of the warnings. In doing so, it must however be avoided to
>>> introduce new ways for things to go bad silently.
>>>
>>>> I am not familiar with this patch and hence its story. But I think it would
>>>> be very simple for you to fix this up?
>>>
>>> Not really, no. I had posted this patch as a draft after there was
>>> no reaction on the part of the original implementers of the MSI and
>>> pass-through code to address the security problem we're dealing
>>> with here (and afaict the issue still wasn't completely addressed,
>>> as I don't recall having seen corresponding adjustments to qemu).
>>> I never had hardware to test this with, and hence had to rely on
>>> Yunhong's testing and ack-ing of the patch.
>>>
>>>> BTW: I vaguely recall that MSI-X table base might not be the first page of
>>>> the corresponding BAR register.
>>>
>>> While I agree that the code is lacking the use of
>>> msix_table_offset_reg(), I would question what else would be
>>> in the range supplied by the BAR, as the specification allows
>>> only MSI-X table and PBA to share a BAR.
>>
>> This is what I copied from PCI spec 3.0. I don't see that the spec only
>> allows the two to be shared.
>> -----------------------------PCI----------
>> To enable system software to map MSI-X structures onto different processor
>> pages for
>> improved access control, it is recommended that a function dedicate separate
>> Base Address
>> registers for the MSI-X Table and MSI-X PBA, or else provide more than the
>> minimum
>> required isolation with address ranges.
>> If dedicated separate Base Address registers is not feasible, it is
>> recommended that a
>> function dedicate a single Base Address register for the MSI-X Table and
>> MSI-X PBA.
>> If a dedicated Base Address register is not feasible, it is recommended that
>> a function isolate
>> the MSI-X structures from the non-MSI-X structures with aligned 8 KB ranges
>> rather than
>> the mandatory aligned 4 KB ranges. 
>> --------------------------spec---------------
>>>
>>> Jan
>>>
>>
>
>




_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Jan Beulich
>>> On 31.01.11 at 13:02, Haitao Shan <[hidden email]> wrote:
> I am wondering whether we can trust msi->table_base for the moment?

But that's already the case. A warning is issued if it doesn't match
what we read from the BAR, but what comes from Dom0 is being
used. The PBA, which doesn't gets passed up from Dom0, gets
determined by reading the device's config space (and is going to
be zero or close to it when the BAR is blank). Xen itself doesn't
use it for anything but trying to write-protect the corresponding
MFN.

> Simply removing the warnings can of course suppress the symptom. But
> assuming MSIX table at pfn 0 and changing its p2m type are a concern to me.
> But probably this won't trigger anything bad, since this pfn is seldom used?

We should not use MFN zero here (and that really is what I
intended the warnings to catch).

Jan


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Jan Beulich
In reply to this post by Haitao Shan
>>> On 31.01.11 at 05:54, Haitao Shan <[hidden email]> wrote:
After taking a closer look:

> As you may already notice the bug 1732, (
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is
> c/s 22182.

The warnings are a result of the c/s, but if there are functionality
problems, they shouldn't be caused by this: The MSI-X table's base
address was always determined from the value passed from Dom0
(the raw address found in the BAR) plus the table offset as found
in the MSI-X capability structure.

> I see the following attached code in your patch. It is pointless to check
> msi->table_base against the value read from physical device if this function
> is a virtual function of SR-IOV device. VFs are required to have BARs zeroed
> by specifications. And for VFs, unless you can read these values from
> corresponding PF, you will have to trust the "table_base" passed from dom0
> via hypercall. Actually, this parameter is specifically introduced for
> enabling SR-IOV.

One important question then is whether there's a way for Xen to
determine the PF for the VF and the correct BAR to use without
additional help from Dom0. If that's not possible, passing down the
BAR contents needed for the PBA base address calculation on a
VF would be necessary, which would require a new sub-hypercall.

The only exception to this would be if both use the same BAR (and
really if that's a common case, a simple initial fix could be to use
the passed down table_base value also for pba_paddr if the two
BIRs match).

In any case I am of the opinion that all of the warnings make
sense currently, with the sole exception of the VF case of the
msi->table_base != read_pci_mem_bar() one (avoiding this
would require Xen to at least have a way to recognize a given
<bus>:<dev>.<func> is a VF).

> BTW: I vaguely recall that MSI-X table base might not be the first page of
> the corresponding BAR register.

Indeed - that's what is being accounted for using table_offset (read
from MSI-X capability structure + msix_table_offset_reg()).

Jan


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Stefano Stabellini-3
In reply to this post by Haitao Shan
On Mon, 31 Jan 2011, Haitao Shan wrote:
> We are heading Chinese New Year, which typically take 1.5 weeks.

Oh we didn't think about it.
Considering that we are very keen on getting your fixes, we'll be
slipping the code freeze by at least two weeks.

_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Haitao Shan
In reply to this post by Jan Beulich
Hi, Jan,

My carelessness, the patch did not cause any malfunction besides these additional warning messages. :)

What I am thinking about here is:
1> Given a BDF, how can Xen determine whether it is a VFs?
2> If it is really a VF, how can Xen find its PF? For example, if a VF looks like 07:03:0, its PF might be 07:00.0 or 06:00.0 ... 
Perhaps a little assist from Dom0 would be very good.

Another question: What would the purpose of your patch be? I mean, you are trying to remove MSIX table access right for DomUs, or you are also aiming at removing msi->table_base from the trust chain so that guests cannot pass arbitrary address down to Xen?

I guess you patch already addressed the former. The latter does need a reasonable solution but it won't be a blocker for Xen 4.1 release, right?

Shan Haitao

2011/1/31 Jan Beulich <[hidden email]>
>>> On 31.01.11 at 05:54, Haitao Shan <[hidden email]> wrote:
After taking a closer look:

> As you may already notice the bug 1732, (
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is
> c/s 22182.

The warnings are a result of the c/s, but if there are functionality
problems, they shouldn't be caused by this: The MSI-X table's base
address was always determined from the value passed from Dom0
(the raw address found in the BAR) plus the table offset as found
in the MSI-X capability structure.

> I see the following attached code in your patch. It is pointless to check
> msi->table_base against the value read from physical device if this function
> is a virtual function of SR-IOV device. VFs are required to have BARs zeroed
> by specifications. And for VFs, unless you can read these values from
> corresponding PF, you will have to trust the "table_base" passed from dom0
> via hypercall. Actually, this parameter is specifically introduced for
> enabling SR-IOV.

One important question then is whether there's a way for Xen to
determine the PF for the VF and the correct BAR to use without
additional help from Dom0. If that's not possible, passing down the
BAR contents needed for the PBA base address calculation on a
VF would be necessary, which would require a new sub-hypercall.

The only exception to this would be if both use the same BAR (and
really if that's a common case, a simple initial fix could be to use
the passed down table_base value also for pba_paddr if the two
BIRs match).

In any case I am of the opinion that all of the warnings make
sense currently, with the sole exception of the VF case of the
msi->table_base != read_pci_mem_bar() one (avoiding this
would require Xen to at least have a way to recognize a given
<bus>:<dev>.<func> is a VF).

> BTW: I vaguely recall that MSI-X table base might not be the first page of
> the corresponding BAR register.

Indeed - that's what is being accounted for using table_offset (read
from MSI-X capability structure + msix_table_offset_reg()).

Jan



_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Jan Beulich
>>> On 01.02.11 at 06:48, Haitao Shan <[hidden email]> wrote:
> What I am thinking about here is:
> 1> Given a BDF, how can Xen determine whether it is a VFs?
> 2> If it is really a VF, how can Xen find its PF? For example, if a VF looks
> like 07:03:0, its PF might be 07:00.0 or 06:00.0 ...
> Perhaps a little assist from Dom0 would be very good.

Getting away without assistance from Dom0 would be much
preferred, but input from your SR-IOV engineers would be
needed on how (if at all) this may be achieved.

> Another question: What would the purpose of your patch be? I mean, you are
> trying to remove MSIX table access right for DomUs, or you are also aiming
> at removing msi->table_base from the trust chain so that guests cannot pass
> arbitrary address down to Xen?

No, passing down the address from Dom0 is fine, but given that
we have to use an alternative approach (reading config space) to
determine the PBA address, it seemed rather reasonable (proven
by what we now see) to verify the value as passed up by Dom0
matches what we would calculate on our own (to provide a hint
at whether the PBA address determination needs any fixing,
which we now know it does).

Jan


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Haitao Shan
> Another question: What would the purpose of your patch be? I mean, you are
> trying to remove MSIX table access right for DomUs, or you are also aiming
> at removing msi->table_base from the trust chain so that guests cannot pass
> arbitrary address down to Xen?

No, passing down the address from Dom0 is fine, but given that
we have to use an alternative approach (reading config space) to
determine the PBA address, it seemed rather reasonable (proven
by what we now see) to verify the value as passed up by Dom0
matches what we would calculate on our own (to provide a hint
at whether the PBA address determination needs any fixing,
which we now know it does).
It would be nice that the value passed by dom0 could be verified, I agree. But the reality is that it is quite complex to calculate this value if the function is a VF.
Per my investigation, the process should be:
1. Determine the corresponding PF of an given VF.
    This is already in Xen. But such kind of information is passed to Xen by dom0. Again, do we trust dom0 for this?
2. Look through SRIOV capability in PF to find the actual MEM Bar for this VF.
    This would require Xen access extended PCI configuration space. The mechanism is not available for 32bit Xen. Is this justified to add this to 32bit Xen? I see this as something overkilling. At minimum, I don't see any value of adding so many code to Xen 4.1 instead of removing the unnecessary checking and warning (or necessary but not implementation friendly).

Shan Haitao 
 

Jan



_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Jan Beulich
>>> On 11.02.11 at 07:00, Haitao Shan <[hidden email]> wrote:
>>
>> > Another question: What would the purpose of your patch be? I mean, you
>> are
>> > trying to remove MSIX table access right for DomUs, or you are also
>> aiming
>> > at removing msi->table_base from the trust chain so that guests cannot
>> pass
>> > arbitrary address down to Xen?
>>
>> No, passing down the address from Dom0 is fine, but given that
>> we have to use an alternative approach (reading config space) to
>> determine the PBA address, it seemed rather reasonable (proven
>> by what we now see) to verify the value as passed up by Dom0
>> matches what we would calculate on our own (to provide a hint
>> at whether the PBA address determination needs any fixing,
>> which we now know it does).
>>
> It would be nice that the value passed by dom0 could be verified, I agree.
> But the reality is that it is quite complex to calculate this value if the
> function is a VF.
> Per my investigation, the process should be:
> 1. Determine the corresponding PF of an given VF.
>     This is already in Xen. But such kind of information is passed to Xen by
> dom0. Again, do we trust dom0 for this?

I think trusting Dom0 is the right thing to do - it's being trusted in
other places too.

> 2. Look through SRIOV capability in PF to find the actual MEM Bar for this
> VF.
>     This would require Xen access extended PCI configuration space. The
> mechanism is not available for 32bit Xen. Is this justified to add this to
> 32bit Xen? I see this as something overkilling. At minimum, I don't see any
> value of adding so many code to Xen 4.1 instead of removing the unnecessary
> checking and warning (or necessary but not implementation friendly).

I think we can leave aside 32-bit Xen, as it'll go away soon anyway.
There's other code requiring extended cfg space accesses, so
no reason not to add this instance.

As to "unnecessary checking and warning" I can only repeat that
we know by now that the checking and warning *is* appropriate,
as we currently don't calculate the PBA address correctly. Once
we're sure this is done right for all cases, the warning certainly
can go away (there's no question of checking here, as the value
doesn't get passed from Dom0).

Jan


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Li, Xin
> > Per my investigation, the process should be:
> > 1. Determine the corresponding PF of an given VF.
> >     This is already in Xen. But such kind of information is passed to Xen by
> > dom0. Again, do we trust dom0 for this?
>
> I think trusting Dom0 is the right thing to do - it's being trusted in
> other places too.

Yes, dom0 is trusted!
Thanks!
-Xin

_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Gianni Tedesco-2
In reply to this post by Jan Beulich
On Mon, 2011-01-31 at 13:18 +0000, Jan Beulich wrote:

> >>> On 31.01.11 at 05:54, Haitao Shan <[hidden email]> wrote:
> After taking a closer look:
>
> > As you may already notice the bug 1732, (
> > http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1732), the culprit is
> > c/s 22182.
>
> The warnings are a result of the c/s, but if there are functionality
> problems, they shouldn't be caused by this: The MSI-X table's base
> address was always determined from the value passed from Dom0
> (the raw address found in the BAR) plus the table offset as found
> in the MSI-X capability structure.

Actually I have some functionality problems which coincide with these
WARN()'s

> > I see the following attached code in your patch. It is pointless to check
> > msi->table_base against the value read from physical device if this function
> > is a virtual function of SR-IOV device. VFs are required to have BARs zeroed
> > by specifications. And for VFs, unless you can read these values from
> > corresponding PF, you will have to trust the "table_base" passed from dom0
> > via hypercall. Actually, this parameter is specifically introduced for
> > enabling SR-IOV.
>
> One important question then is whether there's a way for Xen to
> determine the PF for the VF and the correct BAR to use without
> additional help from Dom0. If that's not possible, passing down the
> BAR contents needed for the PBA base address calculation on a
> VF would be necessary, which would require a new sub-hypercall.

In my case (HVM) it looks like qemu has figured out the correct base
address for the PBA.

> The only exception to this would be if both use the same BAR (and
> really if that's a common case, a simple initial fix could be to use
> the passed down table_base value also for pba_paddr if the two
> BIRs match).
>
> In any case I am of the opinion that all of the warnings make
> sense currently, with the sole exception of the VF case of the
> msi->table_base != read_pci_mem_bar() one (avoiding this
> would require Xen to at least have a way to recognize a given
> <bus>:<dev>.<func> is a VF).

I see

> > BTW: I vaguely recall that MSI-X table base might not be the first page of
> > the corresponding BAR register.
>
> Indeed - that's what is being accounted for using table_offset (read
> from MSI-X capability structure + msix_table_offset_reg()).

In my case the device is ixgbe and yes, it seems to follow the 8KB
aligning recommendation.

The actual symptom I am having is a lot of stuff like this in the guest
with VF passed-through:

ixgbevf: eth: ixgbevf_reset: PF still resetting
ixgbevf: eth: ixgbevf_open: Unable to start - perhaps the PFDriver isn't up yet
ixgbevf: eth: ixgbevf_check_tx_hang: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH, TDT             <0>, <1>
  next_to_use          <1>
  next_to_clean        <0>
tx_buffer_info[next_to_clean]
  time_stamp           <fffd2f6b>
  jiffies              <fffd3db4>
ixgbevf: eth: ixgbevf_clean_tx_irq: tx hang 3 detected, resetting adapter
ixgbevf: eth: ixgbevf_watchdog_task: NIC Link is Up 10 Gbps

And correspondingly no Tx or Rx traffic at all. It all seems very much
like a lack of interrupts, but /proc/interrupts shows good numbers:

201:        146       PCI-MSI-X  eth-rx-0
209:        140       PCI-MSI-X  eth-tx-0
217:          8       PCI-MSI-X  eth:mbx

Furthermore this used to work on xen 3.4 but fails on 4.1 so it seems to
be a regression. One other notable change is the assignments of the
MSI-X vectors that I see when hitting the Q debug key:

On 3.4:
(XEN) 04:10.0 - dom 1   - MSIs < 66 74 82 >

On 4.1:
(XEN) 04:10.1 - dom 0   - MSIs < 117 118 119 >

However qemu seems happy with it all in either case:

Mar 15 18:00:30 localhost qemu.1[10344]: pt_register_regions: IO region registered (size=0x00004000 base_addr=0xdd700004)
Mar 15 18:00:30 localhost qemu.1[10344]: pt_register_regions: IO region registered (size=0x00004000 base_addr=0xdd800004)
Mar 15 18:00:30 localhost qemu.1[10344]: pt_msix_init: get MSI-X table bar base dd800000
Mar 15 18:00:30 localhost qemu.1[10344]: pt_msix_init: table_off = 0, total_entries = 3
Mar 15 18:00:30 localhost qemu.1[10344]: pt_msix_init: errno = 2
Mar 15 18:00:30 localhost qemu.1[10344]: pt_msix_init: mapping physical MSI-X table to b5d91000
Mar 15 18:00:30 localhost qemu.1[10344]: register_real_device: Real physical device 04:10.1 registered successfuly! IRQ type = INTx
...
Mar 15 18:01:13 localhost qemu.1[10344]: pt_msix_update_one: Update msix entry 0 with pirq 56 gvec b9
Mar 15 18:01:13 localhost qemu.1[10344]: pt_msix_update_one: Update msix entry 1 with pirq 55 gvec c1
Mar 15 18:01:13 localhost qemu.1[10344]: pt_msix_update_one: Update msix entry 2 with pirq 54 gvec c9

Any ideas?

Gianni


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Jan Beulich
>>> On 15.03.11 at 19:30, Gianni Tedesco <[hidden email]> wrote:
> ixgbevf: eth: ixgbevf_reset: PF still resetting

And nothing interesting in Dom0's logs?

> And correspondingly no Tx or Rx traffic at all. It all seems very much
> like a lack of interrupts, but /proc/interrupts shows good numbers:
>
> 201:        146       PCI-MSI-X  eth-rx-0
> 209:        140       PCI-MSI-X  eth-tx-0
> 217:          8       PCI-MSI-X  eth:mbx

With the above, I'd guess more towards a PF <-> VF communication
problem (which I can say nothing about).

> Furthermore this used to work on xen 3.4 but fails on 4.1 so it seems to
> be a regression. One other notable change is the assignments of the
> MSI-X vectors that I see when hitting the Q debug key:
>
> On 3.4:
> (XEN) 04:10.0 - dom 1   - MSIs < 66 74 82 >
>
> On 4.1:
> (XEN) 04:10.1 - dom 0   - MSIs < 117 118 119 >

dom 1 on 3.4 vs dom 0 on 4.1? And different functions? Doesn't
look like a 1:1 comparison to me.

> Any ideas?

Not really. Despite me not thinking that the change in question
(that introduced the WARN_ON()s) has any functionality impact
(it's really only about trying to write protect certain MMIO
ranges, with the WARN_ON()s reporting that this didn't work as
expected) - did you try reverting it (and its follow-up fixes)?

Jan


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Gianni Tedesco-2
On Wed, 2011-03-16 at 08:22 +0000, Jan Beulich wrote:
> >>> On 15.03.11 at 19:30, Gianni Tedesco <[hidden email]> wrote:
> > ixgbevf: eth: ixgbevf_reset: PF still resetting
>
> And nothing interesting in Dom0's logs?

Nothing of interest in the kernel messages no.

> > And correspondingly no Tx or Rx traffic at all. It all seems very much
> > like a lack of interrupts, but /proc/interrupts shows good numbers:
> >
> > 201:        146       PCI-MSI-X  eth-rx-0
> > 209:        140       PCI-MSI-X  eth-tx-0
> > 217:          8       PCI-MSI-X  eth:mbx
>
> With the above, I'd guess more towards a PF <-> VF communication
> problem (which I can say nothing about).

Actually, I do get this from the dom0 kernel:

ixgbe: eth5: ixgbe_rcv_msg_from_vf: Set MAC msg received from vf 0

> > Furthermore this used to work on xen 3.4 but fails on 4.1 so it seems to
> > be a regression. One other notable change is the assignments of the
> > MSI-X vectors that I see when hitting the Q debug key:
> >
> > On 3.4:
> > (XEN) 04:10.0 - dom 1   - MSIs < 66 74 82 >
> >
> > On 4.1:
> > (XEN) 04:10.1 - dom 0   - MSIs < 117 118 119 >
>
> dom 1 on 3.4 vs dom 0 on 4.1? And different functions? Doesn't
> look like a 1:1 comparison to me.

Yeah they are different machines with the same SR-IOV NIC (similar
enough hardware wise). But the point is the different assigned domains,
bear in mind that in both cases the function in question is assigned to
a guest at the time the debug key was pressed.

> > Any ideas?
>
> Not really. Despite me not thinking that the change in question
> (that introduced the WARN_ON()s) has any functionality impact
> (it's really only about trying to write protect certain MMIO
> ranges, with the WARN_ON()s reporting that this didn't work as
> expected) - did you try reverting it (and its follow-up fixes)?

No change.

Gianni


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Comments on Xen bug 1732

Jan Beulich
>>> On 16.03.11 at 14:50, Gianni Tedesco <[hidden email]> wrote:
> On Wed, 2011-03-16 at 08:22 +0000, Jan Beulich wrote:
>> >>> On 15.03.11 at 19:30, Gianni Tedesco <[hidden email]> wrote:
>> > Furthermore this used to work on xen 3.4 but fails on 4.1 so it seems to
>> > be a regression. One other notable change is the assignments of the
>> > MSI-X vectors that I see when hitting the Q debug key:
>> >
>> > On 3.4:
>> > (XEN) 04:10.0 - dom 1   - MSIs < 66 74 82 >
>> >
>> > On 4.1:
>> > (XEN) 04:10.1 - dom 0   - MSIs < 117 118 119 >
>>
>> dom 1 on 3.4 vs dom 0 on 4.1? And different functions? Doesn't
>> look like a 1:1 comparison to me.
>
> Yeah they are different machines with the same SR-IOV NIC (similar
> enough hardware wise). But the point is the different assigned domains,
> bear in mind that in both cases the function in question is assigned to
> a guest at the time the debug key was pressed.

And even iommu=verbose doesn't produce anything more
informative? Something must be going wrong during the
assignment...

Are the kernels in host and guest exactly the same in both the
3.4 and the 4.1 cases? Using pciback or pci-stub?

>> > Any ideas?
>>
>> Not really. Despite me not thinking that the change in question
>> (that introduced the WARN_ON()s) has any functionality impact
>> (it's really only about trying to write protect certain MMIO
>> ranges, with the WARN_ON()s reporting that this didn't work as
>> expected) - did you try reverting it (and its follow-up fixes)?
>
> No change.

With that, the regression then clearly must be elsewhere, and
I'm afraid we're having to hope that Intel folks will take a look.

Jan


_______________________________________________
Xen-devel mailing list
[hidden email]
http://lists.xensource.com/xen-devel
12
Loading...