GitHub Repository: torvalds/linux
Path: arch/powerpc/platforms/powernv/pci-sriov.c

// SPDX-License-Identifier: GPL-2.0-or-later

#include <linux/kernel.h>
#include <linux/ioport.h>
#include <linux/bitmap.h>
#include <linux/pci.h>

#include <asm/opal.h>

#include "pci.h"

/*
 * The majority of the complexity in supporting SR-IOV on PowerNV comes from
 * the need to put the MMIO space for each VF into a separate PE. Internally
 * the PHB maps MMIO addresses to a specific PE using the "Memory BAR Table".
 * The MBT historically only applied to the 64bit MMIO window of the PHB
 * so it's common to see it referred to as the "M64BT".
 *
 * An MBT entry stores the mapped range as a <base>,<mask> pair. This forces
 * the address range that we want to map to be power-of-two sized and aligned.
 * For conventional PCI devices this isn't really an issue since PCI device BARs
 * have the same requirement.
 *
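 * For example (illustrative numbers): a 256MB range starting at 0x6_0000_0000
 * would be stored as base = 0x6_0000_0000 with mask = ~(256MB - 1), so an
 * address falls in the window iff (addr & mask) == base. A 384MB range has
 * no such <base>,<mask> encoding, hence the power-of-two requirement.
 *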
 * For a SR-IOV BAR things are a little more awkward since size and alignment
 * are not coupled. The alignment is set based on the per-VF BAR size, but
 * the total BAR area is: number-of-vfs * per-vf-size. The number of VFs
 * isn't necessarily a power of two, so neither is the total size. To fix that
 * we need to finesse (read: hack) the Linux BAR allocator so that it will
 * allocate the SR-IOV BARs in a way that lets us map them using the MBT.
 *
 * The changes to size and alignment that we need to do depend on the "mode"
 * of MBT entry that we use. We only support SR-IOV on PHB3 (IODA2) and above,
 * so as a baseline we can assume that we have the following BAR modes
 * available:
 *
 * NB: $PE_COUNT is the number of PEs that the PHB supports.
 *
 * a) A segmented BAR that splits the mapped range into $PE_COUNT equally sized
 *    segments. The n'th segment is mapped to the n'th PE.
 * b) An un-segmented BAR that maps the whole address range to a specific PE.
 *
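 * As an illustration (sizes assumed): with $PE_COUNT = 256, a 256MB range
 * mapped in mode a) is split into 256 x 1MB segments with segment n routed
 * to PE#n, while the same range in mode b) is routed entirely to one PE.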
 *
 * We prefer to use mode a) since it only requires one MBT entry per SR-IOV
 * BAR. For comparison, b) requires one entry per-VF per-BAR, i.e.
 * (num-vfs * num-sriov-bars) entries in total. To use a) we need the size
 * of each segment to equal the size of the per-VF BAR area. So:
 *
 *        new_size = per-vf-size * number-of-PEs
 *
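 * For example (illustrative numbers): with a 1MB per-VF BAR on a PHB with
 * 256 PEs, new_size = 1MB * 256 = 256MB, which is power-of-two sized and
 * aligned and can therefore be covered by a single mode a) MBT entry,
 * regardless of how many VFs are actually enabled.
 *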
 * The alignment for the SR-IOV BAR also needs to be changed from per-vf-size
 * to "new_size", calculated above. Implementing this is a convoluted process
 * which requires several hooks in the PCI core:
 *
 * 1. In pcibios_device_add() we call pnv_pci_ioda_fixup_iov().
 *
 *    At this point the device has been probed and the device's BARs are sized,
 *    but no resource allocations have been done. The SR-IOV BARs are sized
 *    based on the maximum number of VFs supported by the device and we need
 *    to increase that to new_size.
 *
 * 2. Later, when Linux actually assigns resources it tries to make the resource
 *    allocations for each PCI bus as compact as possible. As a part of that it
 *    sorts the BARs on a bus by their required alignment, which is calculated
 *    using pci_resource_alignment().
 *
 *    For IOV resources this goes:
 *    pci_resource_alignment()
 *        pci_sriov_resource_alignment()
 *            pcibios_sriov_resource_alignment()
 *                pnv_pci_iov_resource_alignment()
 *
 *    Our hook overrides the default alignment, equal to the per-vf-size, with
 *    new_size computed above.
 *
 * 3. When userspace enables VFs for a device:
 *
 *    sriov_enable()
 *        pcibios_sriov_enable()
 *            pnv_pcibios_sriov_enable()
 *
 *    This is where we actually allocate PE numbers for each VF and set up the
 *    MBT mapping for each SR-IOV BAR. In steps 1) and 2) we set up an "arena"
 *    where each MBT segment is equal in size to the VF BAR so we can shift
 *    around the actual SR-IOV BAR location within this arena. We need this
 *    ability because the PE space is shared by all devices on the same PHB.
 *    When using mode a) described above, segment 0 maps to PE#0, which might
 *    already be in use by another device on the PHB.
 *
 *    As a result we need to allocate a contiguous range of PE numbers, then
 *    shift the address programmed into the SR-IOV BAR of the PF so that the
 *    address of VF0 matches up with the segment corresponding to the first
 *    allocated PE number. This is handled in pnv_pci_vf_resource_shift().
 *
 *    Once all that is done we return to the PCI core which then enables VFs,
 *    scans them and creates pci_devs for each. The init process for a VF is
 *    largely the same as for a normal device, but the VF is inserted into the
 *    IODA PE that we allocated for it rather than the PE associated with the
 *    bus.
 *
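 *    For reference, the usual userspace trigger for step 3 is the sysfs
 *    sriov_numvfs attribute, e.g. (device address illustrative):
 *
 *        echo 16 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
 *
 *    which reaches sriov_enable() via the PF driver's sriov_configure()
 *    callback and pci_enable_sriov().
 *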
 * 4. When userspace disables VFs we unwind the above in
 *    pnv_pcibios_sriov_disable(). Fortunately this is relatively simple since
 *    we don't need to validate anything, just tear down the mappings and
 *    move the SR-IOV resource back to its "proper" location.
 *
 * That's how mode a) works. In theory mode b) (single PE mapping) is less work
 * since we can map each individual VF with a separate BAR. However, there are
 * a few limitations:
 *
 * 1) For IODA2 mode b) has a minimum alignment requirement of 32MB. This makes
 *    it only usable for devices with very large per-VF BARs. Such devices are
 *    similar to Big Foot. They definitely exist, but I've never seen one.
 *
 * 2) The number of MBT entries that we have is limited. PHB3 and PHB4 only
 *    have 16 total and some are needed for other uses. Most SR-IOV capable
 *    network cards can support more than 16 VFs on each port.
 *
 * We use b) when using a) would use more than 1/4 of the entire 64 bit MMIO
 * window of the PHB.
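 *
 * As an illustrative example (window size assumed): if the PHB's 64 bit MMIO
 * window is 64GB and $PE_COUNT = 256, each mode a) segment is 256MB, so any
 * per-VF BAR larger than 64MB (1/4 of a segment) is mapped with mode b)
 * windows instead; this is the m64_segsize >> 2 check in
 * pnv_pci_ioda_fixup_iov_resources() below.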
 *
 * PHB4 (IODA3) added a few new features that would be useful for SR-IOV. It
 * allowed the MBT to map 32bit MMIO space in addition to 64bit, which allows
 * us to support SR-IOV BARs in the 32bit MMIO window. This is useful since
 * the Linux BAR allocation will place any BAR marked as non-prefetchable into
 * the non-prefetchable bridge window, which is 32bit only. It also added two
 * new modes:
 *
 * c) A segmented BAR similar to a), but each segment can be individually
 *    mapped to any PE. This matches how the 32bit MMIO window worked on
 *    IODA1&2.
 *
 * d) A segmented BAR with 8, 64, or 128 segments. This works similarly to a),
 *    but with fewer segments and a configurable base PE.
 *
 *    i.e. The n'th segment maps to the (n + base)'th PE.
 *
 *    The base PE is also required to be a multiple of the window size.
 *
 * Unfortunately, the OPAL API doesn't currently (as of skiboot v6.6) allow us
 * to exploit any of the IODA3 features.
 */

static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
{
        struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
        struct resource *res;
        int i;
        resource_size_t vf_bar_sz;
        struct pnv_iov_data *iov;
        int mul;

        iov = kzalloc(sizeof(*iov), GFP_KERNEL);
        if (!iov)
                goto disable_iov;
        pdev->dev.archdata.iov_data = iov;
        mul = phb->ioda.total_pe_num;

        for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
                res = &pdev->resource[i + PCI_IOV_RESOURCES];
                if (!res->flags || res->parent)
                        continue;
                if (!pnv_pci_is_m64_flags(res->flags)) {
                        dev_warn(&pdev->dev, "Don't support SR-IOV with non-M64 VF BAR%d: %pR\n",
                                 i, res);
                        goto disable_iov;
                }

                vf_bar_sz = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);

                /*
                 * Generally, one segmented M64 BAR maps one IOV BAR. However,
                 * if a VF BAR is too large we end up wasting a lot of space.
                 * If each VF needs more than 1/4 of the default m64 segment
                 * then each VF BAR should be mapped in single-PE mode to reduce
                 * the amount of space required. This does however limit the
                 * number of VFs we can support.
                 *
                 * The 1/4 limit is arbitrary and can be tweaked.
                 */
                if (vf_bar_sz > (phb->ioda.m64_segsize >> 2)) {
                        /*
                         * On PHB3, the minimum size alignment of an M64 BAR in
                         * single mode is 32MB. If this VF BAR is smaller than
                         * 32MB, but still too large for a segmented window,
                         * then we can't map it and need to disable SR-IOV for
                         * this device.
                         */
                        if (vf_bar_sz < SZ_32M) {
                                pci_err(pdev, "VF BAR%d: %pR can't be mapped in single PE mode\n",
                                        i, res);
                                goto disable_iov;
                        }

                        iov->m64_single_mode[i] = true;
                        continue;
                }

                /*
                 * This BAR can be mapped with one segmented window, so adjust
                 * the resource size to accommodate.
                 */
                pci_dbg(pdev, " Fixing VF BAR%d: %pR to\n", i, res);
                res->end = res->start + vf_bar_sz * mul - 1;
                pci_dbg(pdev, " %pR\n", res);

                pci_info(pdev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)\n",
                         i, res, mul);

                iov->need_shift = true;
        }

        return;

disable_iov:
        /* Save ourselves some MMIO space by disabling the unusable BARs */
        for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
                res = &pdev->resource[i + PCI_IOV_RESOURCES];
                res->flags = 0;
                res->end = res->start - 1;
        }

        pdev->dev.archdata.iov_data = NULL;
        kfree(iov);
}

void pnv_pci_ioda_fixup_iov(struct pci_dev *pdev)
{
        if (pdev->is_virtfn) {
                struct pnv_ioda_pe *pe = pnv_ioda_get_pe(pdev);

                /*
                 * VF PEs are single-device PEs so their pdev pointer needs to
                 * be set. The pdev doesn't exist when the PE is allocated (in
                 * pcibios_sriov_enable()) so we fix it up here.
                 */
                pe->pdev = pdev;
                WARN_ON(!(pe->flags & PNV_IODA_PE_VF));
        } else if (pdev->is_physfn) {
                /*
                 * For PFs adjust their allocated IOV resources to match what
                 * the PHB can support using its M64 BAR table.
                 */
                pnv_pci_ioda_fixup_iov_resources(pdev);
        }
}

resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
                                               int resno)
{
        resource_size_t align = pci_iov_resource_size(pdev, resno);
        struct pnv_phb *phb = pci_bus_to_pnvhb(pdev->bus);
        struct pnv_iov_data *iov = pnv_iov_get(pdev);

        /*
         * iov can be NULL if we have an SR-IOV device with an IOV BAR that
         * can't be placed in the m64 space (i.e. the BAR is 32bit or
         * non-prefetchable). In that case we don't allow VFs to be enabled
         * since one of their BARs would not be placed in the correct PE.
         */
        if (!iov)
                return align;

        /*
         * If we're using single mode then we can just use the native VF BAR
         * alignment. We validated that it's possible to use a single PE
         * window above when we did the fixup.
         */
        if (iov->m64_single_mode[resno - PCI_IOV_RESOURCES])
                return align;

        /*
         * On the PowerNV platform an IOV BAR is mapped by an M64 BAR and,
         * from the hardware's perspective, the range mapped by an M64 BAR
         * must be size-aligned.
         *
         * For a shared (segmented) window we return the total IOV BAR size,
         * i.e. the per-VF BAR size multiplied by the number of PEs, which is
         * what the fixup above resized the resource to.
         */
        return phb->ioda.total_pe_num * align;
}

static int pnv_pci_vf_release_m64(struct pci_dev *pdev, u16 num_vfs)
{
        struct pnv_iov_data *iov;
        struct pnv_phb *phb;
        int window_id;

        phb = pci_bus_to_pnvhb(pdev->bus);
        iov = pnv_iov_get(pdev);

        for_each_set_bit(window_id, iov->used_m64_bar_mask, MAX_M64_BARS) {
                opal_pci_phb_mmio_enable(phb->opal_id,
                                         OPAL_M64_WINDOW_TYPE,
                                         window_id,
                                         0);

                clear_bit(window_id, &phb->ioda.m64_bar_alloc);
        }

        return 0;
}

/*
 * PHB3 and beyond support segmented windows. The window's address range
 * is subdivided into phb->ioda.total_pe_num segments and there's a 1-1
 * mapping between PEs and segments.
 */
static int64_t pnv_ioda_map_m64_segmented(struct pnv_phb *phb,
                                          int window_id,
                                          resource_size_t start,
                                          resource_size_t size)
{
        int64_t rc;

        rc = opal_pci_set_phb_mem_window(phb->opal_id,
                                         OPAL_M64_WINDOW_TYPE,
                                         window_id,
                                         start,
                                         0, /* unused */
                                         size);
        if (rc)
                goto out;

        rc = opal_pci_phb_mmio_enable(phb->opal_id,
                                      OPAL_M64_WINDOW_TYPE,
                                      window_id,
                                      OPAL_ENABLE_M64_SPLIT);
out:
        if (rc)
                pr_err("Failed to map M64 window #%d: %lld\n", window_id, rc);

        return rc;
}

static int64_t pnv_ioda_map_m64_single(struct pnv_phb *phb,
                                       int pe_num,
                                       int window_id,
                                       resource_size_t start,
                                       resource_size_t size)
{
        int64_t rc;

        /*
         * The API for setting up m64 mmio windows seems to have been designed
         * with P7-IOC in mind. For that chip each M64 BAR (window) had a fixed
         * split of 8 equally sized segments, each of which could be
         * individually assigned to a PE.
         *
         * The problem with this is that the API doesn't have any way to
         * communicate the number of segments we want on a BAR. This wasn't
         * a problem for p7-ioc since you didn't have a choice, but the
         * single PE windows added in PHB3 don't map cleanly to this API.
         *
         * As a result we've got this slightly awkward process where we
         * call opal_pci_map_pe_mmio_window() to put the window in single
         * PE mode, and set the PE for the window before setting the address
         * bounds. We need to do it this way because the single PE windows
         * on PHB3 have different alignment requirements.
         */
        rc = opal_pci_map_pe_mmio_window(phb->opal_id,
                                         pe_num,
                                         OPAL_M64_WINDOW_TYPE,
                                         window_id,
                                         0);
        if (rc)
                goto out;

        /*
         * NB: In single PE mode the window needs to be aligned to 32MB
         */
        rc = opal_pci_set_phb_mem_window(phb->opal_id,
                                         OPAL_M64_WINDOW_TYPE,
                                         window_id,
                                         start,
                                         0, /* ignored by FW, m64 is 1-1 */
                                         size);
        if (rc)
                goto out;

        /*
         * Now actually enable it. We specified the BAR should be in "non-split"
         * mode so FW will validate that the BAR is in single PE mode.
         */
        rc = opal_pci_phb_mmio_enable(phb->opal_id,
                                      OPAL_M64_WINDOW_TYPE,
                                      window_id,
                                      OPAL_ENABLE_M64_NON_SPLIT);
out:
        if (rc)
                pr_err("Error mapping single PE BAR\n");

        return rc;
}

static int pnv_pci_alloc_m64_bar(struct pnv_phb *phb, struct pnv_iov_data *iov)
{
        int win;

        do {
                win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
                                         phb->ioda.m64_bar_idx + 1, 0);

                if (win >= phb->ioda.m64_bar_idx + 1)
                        return -1;
        } while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));

        set_bit(win, iov->used_m64_bar_mask);

        return win;
}

static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 num_vfs)
{
        struct pnv_iov_data *iov;
        struct pnv_phb *phb;
        int win;
        struct resource *res;
        int i, j;
        int64_t rc;
        resource_size_t size, start;
        int base_pe_num;

        phb = pci_bus_to_pnvhb(pdev->bus);
        iov = pnv_iov_get(pdev);

        for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
                res = &pdev->resource[i + PCI_IOV_RESOURCES];
                if (!res->flags || !res->parent)
                        continue;

                /* don't need single mode? map everything in one go! */
                if (!iov->m64_single_mode[i]) {
                        win = pnv_pci_alloc_m64_bar(phb, iov);
                        if (win < 0)
                                goto m64_failed;

                        size = resource_size(res);
                        start = res->start;

                        rc = pnv_ioda_map_m64_segmented(phb, win, start, size);
                        if (rc)
                                goto m64_failed;

                        continue;
                }

                /* otherwise map each VF with single PE BARs */
                size = pci_iov_resource_size(pdev, PCI_IOV_RESOURCES + i);
                base_pe_num = iov->vf_pe_arr[0].pe_number;

                for (j = 0; j < num_vfs; j++) {
                        win = pnv_pci_alloc_m64_bar(phb, iov);
                        if (win < 0)
                                goto m64_failed;

                        start = res->start + size * j;
                        rc = pnv_ioda_map_m64_single(phb,
                                                     base_pe_num + j,
                                                     win,
                                                     start,
                                                     size);
                        if (rc)
                                goto m64_failed;
                }
        }
        return 0;

m64_failed:
        pnv_pci_vf_release_m64(pdev, num_vfs);
        return -EBUSY;
}

static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
{
        struct pnv_phb *phb;
        struct pnv_ioda_pe *pe, *pe_n;

        phb = pci_bus_to_pnvhb(pdev->bus);

        if (!pdev->is_physfn)
                return;

        /* FIXME: Use pnv_ioda_release_pe()? */
        list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
                if (pe->parent_dev != pdev)
                        continue;

                pnv_pci_ioda2_release_pe_dma(pe);

                /* Remove from list */
                mutex_lock(&phb->ioda.pe_list_mutex);
                list_del(&pe->list);
                mutex_unlock(&phb->ioda.pe_list_mutex);

                pnv_ioda_deconfigure_pe(phb, pe);

                pnv_ioda_free_pe(pe);
        }
}

static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
{
        struct resource *res, res2;
        struct pnv_iov_data *iov;
        resource_size_t size;
        u16 num_vfs;
        int i;

        if (!dev->is_physfn)
                return -EINVAL;
        iov = pnv_iov_get(dev);

        /*
         * "offset" is in VFs. The M64 windows are sized so that when they
         * are segmented, each segment is the same size as the IOV BAR.
         * Each segment is in a separate PE, and the high order bits of the
         * address are the PE number. Therefore, each VF's BAR is in a
         * separate PE, and changing the IOV BAR start address changes the
         * range of PEs the VFs are in.
         */
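        /*
         * Illustrative example (numbers assumed): with a 1MB per-VF BAR,
         * VF0 initially lands in segment 0, i.e. PE#0. If the PEs allocated
         * for the VFs start at PE#80, we shift with offset = 80, moving the
         * PF's IOV BAR start up by 80 * 1MB so that VF0's BAR falls into
         * segment 80 and therefore PE#80.
         */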
        num_vfs = iov->num_vfs;
        for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
                res = &dev->resource[i + PCI_IOV_RESOURCES];
                if (!res->flags || !res->parent)
                        continue;
                if (iov->m64_single_mode[i])
                        continue;

                /*
                 * The actual IOV BAR range is determined by the start address
                 * and the size needed for num_vfs VF BARs. This check is to
                 * make sure that after shifting, the range will not overlap
                 * with another device.
                 */
                size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
                res2.flags = res->flags;
                res2.start = res->start + (size * offset);
                res2.end = res2.start + (size * num_vfs) - 1;

                if (res2.end > res->end) {
                        dev_err(&dev->dev, "VF BAR%d: %pR would extend past %pR (trying to enable %d VFs shifted by %d)\n",
                                i, &res2, res, num_vfs, offset);
                        return -EBUSY;
                }
        }

        /*
         * Since the M64 BAR shares segments among all possible 256 PEs,
         * we have to shift the beginning of the PF's IOV BAR to make it
         * start from the segment which belongs to the PE number assigned
         * to the first VF. This creates a "hole" in /proc/iomem which could
         * be used for allocating other resources, so we reserve this area
         * below and release it when IOV is disabled.
         */
        for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
                res = &dev->resource[i + PCI_IOV_RESOURCES];
                if (!res->flags || !res->parent)
                        continue;
                if (iov->m64_single_mode[i])
                        continue;

                size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
                res2 = *res;
                res->start += size * offset;

                dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (%sabling %d VFs shifted by %d)\n",
                         i, &res2, res, (offset > 0) ? "En" : "Dis",
                         num_vfs, offset);

                if (offset < 0) {
                        devm_release_resource(&dev->dev, &iov->holes[i]);
                        memset(&iov->holes[i], 0, sizeof(iov->holes[i]));
                }

                pci_update_resource(dev, i + PCI_IOV_RESOURCES);

                if (offset > 0) {
                        iov->holes[i].start = res2.start;
                        iov->holes[i].end = res2.start + size * offset - 1;
                        iov->holes[i].flags = IORESOURCE_BUS;
                        iov->holes[i].name = "pnv_iov_reserved";
                        devm_request_resource(&dev->dev, res->parent,
                                              &iov->holes[i]);
                }
        }
        return 0;
}

static void pnv_pci_sriov_disable(struct pci_dev *pdev)
{
        u16 num_vfs, base_pe;
        struct pnv_iov_data *iov;

        iov = pnv_iov_get(pdev);
        if (WARN_ON(!iov))
                return;

        num_vfs = iov->num_vfs;
        base_pe = iov->vf_pe_arr[0].pe_number;

        /* Release VF PEs */
        pnv_ioda_release_vf_PE(pdev);

        /* Un-shift the IOV BARs if we need to */
        if (iov->need_shift)
                pnv_pci_vf_resource_shift(pdev, -base_pe);

        /* Release M64 windows */
        pnv_pci_vf_release_m64(pdev, num_vfs);
}

static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
{
        struct pnv_phb *phb;
        struct pnv_ioda_pe *pe;
        int pe_num;
        u16 vf_index;
        struct pnv_iov_data *iov;
        struct pci_dn *pdn;

        if (!pdev->is_physfn)
                return;

        phb = pci_bus_to_pnvhb(pdev->bus);
        pdn = pci_get_pdn(pdev);
        iov = pnv_iov_get(pdev);

        /* Reserve PE for each VF */
        for (vf_index = 0; vf_index < num_vfs; vf_index++) {
                int vf_devfn = pci_iov_virtfn_devfn(pdev, vf_index);
                int vf_bus = pci_iov_virtfn_bus(pdev, vf_index);
                struct pci_dn *vf_pdn;

                pe = &iov->vf_pe_arr[vf_index];
                pe->phb = phb;
                pe->flags = PNV_IODA_PE_VF;
                pe->pbus = NULL;
                pe->parent_dev = pdev;
                pe->mve_number = -1;
                pe->rid = (vf_bus << 8) | vf_devfn;

                pe_num = pe->pe_number;
                pe_info(pe, "VF %04d:%02d:%02d.%d associated with PE#%x\n",
                        pci_domain_nr(pdev->bus), pdev->bus->number,
                        PCI_SLOT(vf_devfn), PCI_FUNC(vf_devfn), pe_num);

                if (pnv_ioda_configure_pe(phb, pe)) {
                        /* XXX What do we do here ? */
                        pnv_ioda_free_pe(pe);
                        pe->pdev = NULL;
                        continue;
                }

                /* Put the PE on the list */
                mutex_lock(&phb->ioda.pe_list_mutex);
                list_add_tail(&pe->list, &phb->ioda.pe_list);
                mutex_unlock(&phb->ioda.pe_list_mutex);

                /* associate this pe to its pdn */
                list_for_each_entry(vf_pdn, &pdn->parent->child_list, list) {
                        if (vf_pdn->busno == vf_bus &&
                            vf_pdn->devfn == vf_devfn) {
                                vf_pdn->pe_number = pe_num;
                                break;
                        }
                }

                pnv_pci_ioda2_setup_dma_pe(phb, pe);
        }
}

static int pnv_pci_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
{
        struct pnv_ioda_pe *base_pe;
        struct pnv_iov_data *iov;
        struct pnv_phb *phb;
        int ret;
        u16 i;

        phb = pci_bus_to_pnvhb(pdev->bus);
        iov = pnv_iov_get(pdev);

        /*
         * There are calls to the IODA2 PE setup code littered throughout.
         * We could probably fix that, but we'd still have problems due to
         * the restrictions inherent to IODA1 PHBs.
         *
         * NB: We class IODA3 as IODA2 since they're very similar.
         */
        if (phb->type != PNV_PHB_IODA2) {
                pci_err(pdev, "SR-IOV is not supported on this PHB\n");
                return -ENXIO;
        }

        if (!iov) {
                dev_info(&pdev->dev, "don't support SR-IOV with a non-64bit-prefetchable IOV BAR\n");
                return -ENOSPC;
        }

        /* allocate a contiguous block of PEs for our VFs */
        base_pe = pnv_ioda_alloc_pe(phb, num_vfs);
        if (!base_pe) {
                pci_err(pdev, "Unable to allocate PEs for %d VFs\n", num_vfs);
                return -EBUSY;
        }

        iov->vf_pe_arr = base_pe;
        iov->num_vfs = num_vfs;

        /* Assign M64 window accordingly */
        ret = pnv_pci_vf_assign_m64(pdev, num_vfs);
        if (ret) {
                dev_info(&pdev->dev, "Not enough M64 window resources\n");
                goto m64_failed;
        }

        /*
         * When using one M64 BAR to map one IOV BAR, we need to shift
         * the IOV BAR according to the PE# allocated to the VFs.
         * Otherwise, the PE# for the VF will conflict with others.
         */
        if (iov->need_shift) {
                ret = pnv_pci_vf_resource_shift(pdev, base_pe->pe_number);
                if (ret)
                        goto shift_failed;
        }

        /* Setup VF PEs */
        pnv_ioda_setup_vf_PE(pdev, num_vfs);

        return 0;

shift_failed:
        pnv_pci_vf_release_m64(pdev, num_vfs);

m64_failed:
        for (i = 0; i < num_vfs; i++)
                pnv_ioda_free_pe(&iov->vf_pe_arr[i]);

        return ret;
}

int pnv_pcibios_sriov_disable(struct pci_dev *pdev)
{
        pnv_pci_sriov_disable(pdev);

        /* Release PCI data */
        remove_sriov_vf_pdns(pdev);
        return 0;
}

int pnv_pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
{
        /* Allocate PCI data */
        add_sriov_vf_pdns(pdev);

        return pnv_pci_sriov_enable(pdev, num_vfs);
}
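
/*
 * For illustration only (not part of this file's logic): the hooks above are
 * ultimately driven by a PF driver's sriov_configure() callback. A minimal
 * sketch, with hypothetical names:
 *
 *        static int foo_sriov_configure(struct pci_dev *pdev, int num_vfs)
 *        {
 *                if (num_vfs == 0) {
 *                        // ends up in pnv_pcibios_sriov_disable()
 *                        pci_disable_sriov(pdev);
 *                        return 0;
 *                }
 *
 *                // sriov_enable() -> pcibios_sriov_enable()
 *                //                -> pnv_pcibios_sriov_enable()
 *                return pci_enable_sriov(pdev, num_vfs) ?: num_vfs;
 *        }
 */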