Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
freebsd
GitHub Repository: freebsd/freebsd-src
Path: blob/main/sys/netpfil/ipfw/dummynet.txt
39478 views
1
#
2
#
3
4
Notes on the internal structure of dummynet (2010 version)
5
by Riccardo Panicucci and Luigi Rizzo
6
Work supported by the EC project ONELAB2
7
8
9
*********
10
* INDEX *
11
*********
12
Implementation of new dummynet
13
Internal structure
14
Files
15
Packet arrival
16
The reconfiguration routine
17
dummynet_task()
18
Configuration
19
Add a pipe
20
Add a scheduler
21
Add a flowset
22
Listing object
23
Delete of object
24
Delete a pipe
25
Delete a flowset
26
Delete a scheduler
27
Compatibility with FreeBSD7.2 and FreeBSD 8 ipfw binary
28
ip_dummynet_glue.c
29
ip_fw_glue.c
30
How to configure dummynet
31
How to implement a new scheduler
32
33
34
35
OPEN ISSUES
36
------------------------------
37
20100131 deleting RR causes infinite loop
38
presumably in the rr_free_queue() call -- seems to hang
39
forever when deleting a live flow
40
------------------------------
41
42
Dummynet is a traffic shaper and network emulator. Packets are
43
selected by an external filter such as ipfw, and passed to the emulator
44
with a tag such as "pipe 10" or "queue 5" which tells what to
45
do with the packet. As an example
46
47
ipfw add queue 5 icmp from 10.0.0.2 to all
48
49
All packets with the same tag belong to a "flowset", or a set
50
of flows which can be further partitioned according to a mask.
51
Flowsets are then passed to a scheduler for processing. The
52
association of flowsets and schedulers is configurable e.g.
53
54
ipfw queue 5 config sched 10 weight 3 flow_mask xxxx
55
ipfw queue 8 config sched 10 weight 1 ...
56
ipfw queue 3 config sched 20 weight 1 ...
57
58
"sched 10" represents one or more scheduler instances,
59
selected through a mask on the 5-tuple itself.
60
61
ipfw sched 20 config type FIFO sched_mask yyy ...
62
63
There are in fact two masks applied to each packet:
64
+ the "sched_mask" sends packets arriving to a scheduler_id to
65
one of many instances.
66
+ the "flow_mask" together with the flowset_id is used to
67
collect packets into independent flows on each scheduler.
68
69
As an example, we can have
70
ipfw queue 5 config sched 10 flow_mask src-ip 0x000000ff
71
ipfw sched 10 config type WF2Q+ sched_mask src-ip 0xffffff00
72
73
means that sched 10 will have one instance per /24 source subnet,
74
and within that, each individual source will be a flow.
75
76
Internal structure
77
-----------------
78
Dummynet-related data is split into several data structures,
79
part of them constituting the userland-kernel API, and others
80
specific to the kernel.
81
NOTE: for up-to-date details please look at the relevant source
82
headers (ip_dummynet.h, ip_dn_private.h, dn_sched.h)
83
84
USERLAND-KERNEL API (ip_dummynet.h)
85
86
struct dn_link:
87
contains data about the physical link such as
88
bandwidth, delay, burst size;
89
90
struct dn_fs:
91
describes a flowset, i.e. a template for queues.
92
Main parameters are the scheduler we attach to, a flow_mask,
93
buckets, queue size, plr, weight, and other scheduler-specific
94
parameters.
95
96
struct dn_flow
97
contains information on a flow, including masks and
98
statistics
99
100
struct dn_sch:
101
defines a scheduler (and a link attached to it).
102
Parameters include scheduler type, sched_mask, number of
103
buckets, and possibly other scheduler-specific parameters,
104
105
struct dn_profile:
106
fields to simulate a delay profile
107
108
109
KERNEL REPRESENTATION (ip_dn_private.h)
110
111
struct mq
112
a queue of mbufs with head and tail.
113
114
struct dn_queue
115
individual queue of packets, created by a flowset using
116
flow_mask and attached to a scheduler instance selected
117
through sched_mask.
118
A dn_queue has a pointer to the dn_fsk (which in turn counts
119
how many queues point to it), a pointer to the
120
dn_sch_inst it attaches to, and is in a hash table in the
121
flowset. scheduler instances also should store queues in
122
their own containers used for scheduling (lists, trees, etc.)
123
CREATE: done on packet arrivals when a flow matches a flowset.
124
DELETE: done only when deleting the parent dn_sch_inst
125
or draining memory.
126
127
struct dn_fsk
128
includes a dn_fs; a pointer to the dn_schk; a link field
129
for the list of dn_fsk attached to the same scheduler,
130
or for the unlinked list;
131
a refcount for the number of queues pointing to it;
132
The dn_fsk is in a hash table, fshash.
133
CREATE: done on configuration commands.
134
DELETE: on configuration commands.
135
136
struct dn_sch_inst
137
a scheduler instance, created from a dn_schk applying sched_mask.
138
Contains a delay line, a reference to the parent, and scheduler-
139
specific info. Both dn_sch_inst and its delay line can be in the
140
evheap if they have events to be processed.
141
CREATE: created from a dn_schk applying sched_mask
142
DELETE: configuration command delete a scheduler which in turn
143
sweeps the hash table of instances deleting them
144
145
struct dn_schk
146
includes dn_sch, dn_link, a pointer to dn_profile,
147
a hash table of dn_sch_inst, a list of dn_fsk
148
attached to it.
149
CREATE: configuration command. If there are flowsets that
150
refer to this number, they are attached and moved
151
to the hash table
152
DELETE: manual, see dn_sch_inst
153
154
155
fshash schedhash
156
+---------------+ sched +--------------+
157
| sched-------------------->| NEW_SCHK|
158
-<----*sch_chain |<-----------------*fsk_list |
159
|NEW_FSK |<----. | [dn_link] |
160
+---------------+ | +--------------+
161
|qht (hash) | | | siht(hash) |
162
| [dn_queue] | | | [dn_si] |
163
| [dn_queue] | | | [dn_si] |
164
| ... | | | ... |
165
| +--------+ | | | +---------+ |
166
| |dn_queue| | | | |dn_si | |
167
| | fs *----------' | | | |
168
| | si *---------------------->| | |
169
| +---------+ | | +---------+ |
170
+---------------+ +--------------+
171
172
The following global data structures contain all
173
schedulers and flowsets.
174
175
- schedhash[x]: contains all scheduler templates in the system.
176
Looked up only on manual configurations, where flowsets
177
are attached to matching schedulers.
178
We have one entry per 'sched X config' command
179
(plus one for each 'pipe X config').
180
181
- fshash[x]: contains all flowsets.
182
We do a lookup on this for each packet.
183
We have one entry for each 'queue X config'
184
(plus one for each 'pipe X config').
185
186
Additionally, a list that contains all unlinked flowset:
187
- fsu: contains flowset that are not linked with any scheduler.
188
Flowset are put in this list when they refer to a non
189
existing scheduler.
190
We don't need an efficient data structure as we never search
191
here on a packet arrivals.
192
193
Scheduler instances and the delay lines associated with each scheduler
194
instance need to be woken up at certain times. Because we have many
195
such objects, we keep them in a priority heap (system_heap).
196
197
Almost all objects in this implementation are preceded by a structure
198
(struct dn_id) which makes it easier to identify them.
199
200
201
Files
202
-----
203
The dummynet code is split in several files.
204
All kernel code is in sys/netpfil/ipfw except ip_dummynet.h
205
All userland code is in sbin/ipfw.
206
Files are
207
- sys/netpfil/ip_dummynet.h defines the kernel-userland API
208
- ip_dn_private.h contains the kernel-specific APIs
209
and data structures
210
- dn_sched.h defines the scheduler API
211
- ip_dummynet.c cointains module glue and sockopt handlers, with all
212
functions to configure and list objects.
213
- ip_dn_io.c contains the functions directly related to packet processing,
214
and run in the critical path. It also contains some functions
215
exported to the schedulers.
216
- dn_heap.[ch] implement a binary heap and a generic hash table
217
- dn_sched_* implement the various scheduler modules
218
219
- dummynet.c is the file used to implement the user side of dummynet.
220
It contains the function to parsing command line, and functions to
221
show the output of dummynet objects.
222
Moreover, there are two new file (ip_dummynet_glue.c and ip_fw_glue.c) that
223
are used to allow compatibility with the "ipfw" binary from FreeBSD 7.2 and
224
FreeBSD 8.
225
226
LOCKING
227
=======
228
At the moment the entire processing occurs under a single lock
229
which is expected to be acquired in exclusive mode
230
DN_BH_WLOCK() / DN_BH_WUNLOCK().
231
232
In perspective we aim at the following:
233
- the 'busy' flag, 'pending' list and all structures modified by packet
234
arrivals and departures are protected by the BH_WLOCK.
235
This is normally acquired in exclusive mode by the packet processing
236
functions for short sections of code (exception -- the timer).
237
If 'busy' is not set, we can do regular packet processing.
238
If 'busy' is set, no pieces can be accessed.
239
We must enqueue the packet on 'pending' and return immediately.
240
241
- the 'busy' flag is set/cleared by long sections of code as follows:
242
UH_WLOCK(); KASSERT(busy == 0);
243
BH_WLOCK(); busy=1; BH_WUNLOCK();
244
... do processing ...
245
BH_WLOCK(); busy=0; drain_queue(pending); BH_WUNLOCK();
246
UH_WUNLOCK();
247
this normally happens when the upper half has something heavy
248
to do. The prologue and epilogue are not in the critical path.
249
250
- the main containers (fshash, schedhash, ...) are protected by
251
UH_WLOCK.
252
253
Packet processing
254
=================
255
A packet enters dummynet through dummynet_io(). We first lookup
256
the flowset number in fshash using dn_ht_find(), then find the scheduler
257
instance using ipdn_si_find(), then possibly identify the correct
258
queue with ipdn_q_find().
259
If successful, we call the scheduler's enqueue function(), and
260
if needed start I/O on the link calling serve_sched().
261
If the packet can be returned immediately, this is done by
262
leaving *m0 set. Otherwise, the packet is absorbed by dummynet
263
and we simply return, possibly with some appropriate error code.
264
265
Reconfiguration
266
---------------
267
Reconfiguration is the complex part of the system because we need to
268
keep track of the various objects and containers.
269
At the moment we do not use reference counts for objects so all
270
processing must be done under a lock.
271
272
The main entry points for configuration is the ip_dn_ctl() handler
273
for the IP_DUMMYNET3 sockopt (others are provided only for backward
274
compatibility). Modifications to the configuration call do_config().
275
The argument is a sequence of blocks each starting with a struct dn_id
276
which specifies its content.
277
The first dn_id must contain as obj.id the DN_API_VERSION
278
The obj.type is DN_CMD_CONFIG (followed by actual objects),
279
DN_CMD_DELETE (with the correct subtype and list of objects), or
280
DN_CMD_FLUSH.
281
282
DN_CMD_CONFIG is followed by objects to add/reconfigure. In general,
283
if an object already exists it is reconfigured, otherwise it is
284
created in a way that keeps the structure consistent.
285
We have the following objects in the system, normally numbered with
286
an identifier N between 1 and 65535. For certain objects we have
287
"shadow" copies numbered I+NMAX and I+ 2*NMAX which are used to
288
implement certain backward compatibility features.
289
290
In general we have the following linking
291
292
TRADITIONAL DUMMYNET QUEUES "queue N config ... pipe M ..."
293
corresponds to a dn_fs object numbered N
294
295
TRADITIONAL DUMMYNET PIPES "pipe N config ..."
296
dn_fs N+2*NMAX --> dn_sch N+NMAX type FIFO --> dn_link N+NMAX
297
298
GENERIC SCHEDULER "sched N config ... "
299
[dn_fs N+NMAX] --> dn_sch N --> dn_link N
300
The flowset N+NMAX is created only if the scheduler is not
301
of type MULTIQUEUE.
302
303
DELAY PROFILE "pipe N config profile ..."
304
it is always attached to an existing dn_link N
305
306
Because traditional dummynet pipes actually configure both a
307
'standalone' instance and one that can be used by queues,
308
we do the following:
309
310
"pipe N config ..." configures:
311
dn_sched N type WF2Q+
312
dn_sched N+NMAX type FIFO
313
dn_fs N+2NMAX attached to dn_sched N+NMAX
314
dn_pipe N
315
dn_pipe N+NMAX
316
317
"queue N config" configures
318
dn_fs N
319
320
"sched N config" configures
321
dn_sched N type as desired
322
dn_fs N+NMAX attached to dn_sched N
323
324
325
dummynet_task()
326
===============
327
The dummynet_task() function is the main dummynet processing function and is
328
called every tick. This function first calculate the new current time, then
329
it checks if it is the time to wake up object from the system_heap comparing
330
the current time and the key of the heap. Two types of object (really the
331
heap contains pointer to objects) are in the
332
system_heap:
333
334
- scheduler instance: if a scheduler instance is waked up, the dequeue()
335
function is called until it has credit. If the dequeue() returns packets,
336
the scheduler instance is inserted in the heap with a new key depending of
337
the data that will be send out. If the scheduler instance remains with
338
some credit, it means that is hasn't other packet to send and so the
339
instance is no longer inserted in the heap.
340
341
If the scheduler instance extracted from the heap has the DELETE flag set,
342
the dequeue() is not called and the instance is destroyed now.
343
344
- delay line: when extracting a delay line, the function transmit_event() is
345
called to send out packet from delay line.
346
347
If the scheduler instance associated with this delay line doesn't exists,
348
the delay line will be delete now.
349
350
Configuration
351
=============
352
To create a pipe, queue or scheduler, the user should type commands like:
353
"ipfw pipe x config"
354
"ipfw queue y config pipe x"
355
"ipfw pipe x config sched <type>"
356
357
The userland side of dummynet will prepare a buffer contains data to pass to
358
kernel side.
359
The buffer contains all struct needed to configure an object. In more detail,
360
to configure a pipe all three structs (dn_link, dn_sch, dn_fs) are needed,
361
plus the delay profile struct if the pipe has a delay profile.
362
363
If configuring a scheduler only the struct dn_sch is wrote in the buffer,
364
while if configuring a flowset only the dn_fs struct is wrote.
365
366
The first struct in the buffer contains the type of command request, that is
367
if it is configuring a pipe, a queue, or a scheduler. Then there are structs
368
need to configure the object, and finally there is the struct that mark
369
the end of the buffer.
370
371
To support the insertion of pipe and queue using the old syntax, when adding
372
a pipe it's necessary to create a FIFO flowset and a FIFO scheduler, which
373
have a number x + DN_PIPEOFFSET.
374
375
Add a pipe
376
----------
377
A pipe is only a template for a link.
378
If the pipe already exists, parameters are updated. If a delay profile exists
379
it is deleted and a new one is created.
380
If the pipe doesn't exist a new one is created. After the creation, the
381
flowset unlinked list is scanned to see if there are some flowset that would
382
be linked with this pipe. If so, these flowset will be of wf2q+ type (for
383
compatibility) and a new wf2q+ scheduler is created now.
384
385
Add a scheduler
386
---------------
387
If the scheduler already exists, and the type and the mask are the same, the
388
scheduler is simply reconfigured calling the config_scheduler() scheduler
389
function with the RECONFIGURE flag active.
390
If the type or the mask differ, it is necessary to delete the old scheduler
391
and create a new one.
392
If the scheduler doesn't exists, a new one is created. If the scheduler has
393
a mask, the hash table is created to store pointers to scheduler instances.
394
When a new scheduler is created, it is necessary to scan the unlinked
395
flowset list to search eventually flowset that would be linked with this
396
scheduler number. If some are found, flowsets became of the type of this
397
scheduler and they are configured properly.
398
399
Add a flowset
400
-------------
401
Flowset pointers are store in the system in two list. The unlinked flowset list
402
contains all flowset that aren't linked with a scheduler, the flowset list
403
contains flowset linked to a scheduler, and so they have a type.
404
When adding a new flowset, first it is checked if the flowset exists (that is,
405
it is in the flowset list) and if it doesn't exists a new flowset is created
406
and added to unlinked flowset list if the scheduler which the flowset would be
407
linked doesn't exists, or added in the flowset list and configured properly if
408
the scheduler exists. If the flowset (before to be created) was in the
409
unlinked flowset list, it is removed and deleted, and then recreated.
410
If the flowset exists, to allow reconfiguration of this flowset, the
411
scheduler number and types must match with the one in memory. If this isn't
412
so, the flowset is deleted and a new one will be created. Really, the flowset
413
it isn't deleted now, but it is removed from flowset list and it will be
414
deleted later because there could be some queues that are using it.
415
416
Listing of object
417
=================
418
The user can request a list of object present in dummynet through the command
419
"ipfw [-v] pipe|queue [x] list|show"
420
The kernel side of dummynet send a buffer to user side that contains all
421
pipe, all scheduler, all flowset, plus all scheduler instances and all queues.
422
The dummynet user land will format the output and show only the relevant
423
information.
424
The buffer sent start with all pipe from the system. The entire struct dn_link
425
is passed, except the delay_profile struct that is useless in user space.
426
After pipes, all flowset are wrote in the buffer. The struct contains
427
scheduler flowset specific data is linked with the flowset writing the
428
'obj' id of the extension into the 'alg_fs' pointer.
429
Then schedulers are wrote. If a scheduler has one or more scheduler instance,
430
these are linked to the parent scheduler writing the id of the parent in the
431
'ptr_sched' pointer. If a scheduler instance has queues, there are wrote in
432
the buffer and linked thorugh the 'obj' and 'sched_inst' pointer.
433
Finally, flowsets in the unlinked flowset list are write in the buffer, and
434
then a struct gen in saved in the buffer to mark the last struct in the buffer.
435
436
437
Delete of object
438
================
439
An object is usually removed by user through a command like
440
"ipfw pipe|queue x delete". XXX sched?
441
ipfw pass to the kernel a struct gen that contains the type and the number
442
of the object to remove
443
444
Delete of pipe x
445
----------------
446
A pipe can be deleted by the user through the command 'ipfw pipe x delete'.
447
To delete a pipe, the pipe is removed from the pipe list, and then deleted.
448
Also the scheduler associated with this pipe should be deleted.
449
For compatibility with old dummynet syntax, the associated FIFO scheduler and
450
FIFO flowset must be deleted.
451
452
Delete of flowset x
453
-------------------
454
To remove a flowset, we must be sure that is no longer referenced by any object.
455
If the flowset to remove is in the unlinked flowset list, there is not any
456
issue, the flowset can be safely removed calling a free() (the flowset
457
extension is not yet created if the flowset is in this list).
458
If the flowset is in the flowset list, first we remove from it so new packet
459
are discarded when arrive. Next, the flowset is marked as delete.
460
Now we must check if some queue is using this flowset.
461
To do this, a counter (active_f) is provided. This counter indicate how many
462
queues exist using this flowset.
463
The active_f counter is automatically incremented when a queue is created
464
and decremented when a queue is deleted.
465
If the counter is 0, the flowset can be safely deleted, and the delete_alg_fs()
466
scheduler function is called before deallocate memory.
467
If the counter is not 0, the flowset remain in memory until the counter become
468
zero. When a queue is delete (by dn_delete_queue() function) it is checked if
469
the linked flowset is deleting and if so the counter is decrementing. If the
470
counter reaches 0, the flowset is deleted.
471
The deletion of a queue can be done only by the scheduler, or when the scheduler
472
is destroyed.
473
474
Delete of scheduler x
475
---------------------
476
To delete a scheduler we must be sure that any scheduler instance of this type
477
are in the system_heap. To do so, a counter (inst_counter) is provided.
478
This counter is managed by the system: it is incremented every time it is
479
inserted in the system_heap, and decremented every time it is extracted from it.
480
To delete the scheduler, first we remove it from the scheduler list, so new
481
packet are discarded when they arrive, and mark the scheduler as deleting.
482
483
If the counter is 0, we can remove the scheduler safely calling the
484
really_deletescheduler() function. This function will scan all scheduler
485
instances and call the delete_scheduler_instance() function that will delete
486
the instance. When all instance are deleted, the scheduler template is
487
deleted calling the delete_scheduler_template(). If the delay line associate
488
with the scheduler is empty, it is deleted now, else it will be deleted when
489
it will became empy.
490
If the counter was not 0, we wait for it. Every time the dummynet_task()
491
function extract a scheduler from the system_heap, the counter is decremented.
492
If the scheduler has the delete flag enabled the dequeue() is not called and
493
delete_scheduler_instance() is called to delete the instance.
494
Obviously this scheduler instance is no longer inserted in the system_heap.
495
If the counter reaches 0, the delete_scheduler_template() function is called
496
all memory is released.
497
NOTE: Flowsets that belong to this scheduler are not deleted, so if a new
498
scheduler with the same number is inserted will use these flowsets.
499
To do so, the best approach would be insert these flowset in the
500
unlinked flowset list, but doing this now will be very expensive.
501
So flowsets will remain in memory and linked with a scheduler that no
502
longer exists until a packet belonging to this flowset arrives. When
503
this packet arrives, the reconfigure() function is called because the
504
generation number mismatch with one contains in the flowset and so
505
the flowset will be moved into the flowset unlinked list, or will be
506
linked with the new scheduler if a new one was created.
507
508
509
COMPATIBILITY WITH FREEBSD 7.2 AND FREEBSD 8 'IPFW' BINARY
510
==========================================================
511
Dummynet is not compatible with old ipfw binary because internal structs are
512
changed. Moreover, the old ipfw binary is not compatible with new kernels
513
because the struct that represents a firewall rule has changed. So, if a user
514
install a new kernel on a FreeBSD 7.2, the ipfw (and possibly many other
515
commands) will not work.
516
New dummynet uses a new socket option: IP_DUMMYNET3, used for both set and get.
517
The old option can be used to allow compatibility with the 'ipfw' binary of
518
older version (tested with 7.2 and 8.0) of FreeBSD.
519
Two file are provided for this purpose:
520
- ip_dummynet_glue.c translates old dummynet requests to the new ones,
521
- ip_fw_glue.c converts the rule format between 7.2 and 8 versions.
522
Let see in detail these two files.
523
524
IP_DUMMYNET_GLUE.C
525
------------------
526
The internal structs of new dummynet are very different from the original.
527
Because of there are some difference from between dummynet in FreeBSD 7.2 and
528
dummynet in FreeBSD 8 (the FreeBSD 8 version includes support to pipe delay
529
profile and burst option), I have to include both header files. I copied
530
the revision 191715 (for version 7.2) and the revision 196045 (for version 8)
531
and I appended a number to each struct to mark them.
532
533
The main function of this file is ip_dummynet_compat() that is called by
534
ip_dn_ctl() when it receive a request of old socket option.
535
536
A global variabile ('is7') store the version of 'ipfw' that FreeBSD is using.
537
This variable is set every time a request of configuration is done, because
538
with this request we receive a buffer of which size depending of ipfw version.
539
Because of in general the first action is a configuration, this variable is
540
usually set accordly. If the first action is a request of listing of pipes
541
or queues, the system cannot know the version of ipfw, and we suppose that
542
version 7.2 is used. If version is wrong, the output can be senseless, but
543
the application should not crash.
544
545
There are four request for old dummynet:
546
- IP_DUMMYNET_FLUSH: the flush options have no parameter, so simply the
547
dummynet_flush() function is called;
548
- IP_DUMMYNET_DEL: the delete option need to be translate.
549
It is only necessary to extract the number and the type of the object
550
(pipe or queue) to delete from the buffer received and build a new struct
551
gen contains the right parameters, then call the delete_object() function;
552
- IP_DUMMYNET_CONFIGURE: the configure command receive a buffer depending of
553
the ipfw version. After the properly extraction of all data, that depends
554
by the ipfw version used, new structures are filled and then the dummynet
555
config_link() function is properly called. Note that the 7.2 version does
556
not support some parameter as burst or delay profile.
557
- IP_DUMMYNET_GET: The get command should send to the ipfw the correct buffer
558
depending of its version. There are two function that build the
559
corrected buffer, ip_dummynet_get7() and ip_dummynet_get8(). These
560
functions reproduce the buffer exactly as 'ipfw' expect. The only difference
561
is that the weight parameter for a queue is no longer sent by dummynet and so
562
it is set to 0.
563
Moreover, because of the internal structure has changed, the bucket size
564
of a queue could not be correct, because now all flowset share the hash
565
table.
566
If the version of ipfw is wrong, the output could be senseless or truncated,
567
but the application should not crash.
568
569
IP_FW_GLUE.C
570
------------
571
The ipfw binary also is used to add rules to FreeBSD firewall. Because of the
572
struct ip_fw is changed from FreeBsd 7.2 to FreeBSD 8, it is necessary
573
to write some glue code to allow use ipfw from FreeBSD 7.2 with the kernel
574
provided with FreeBSD 8.
575
This file contains two functions to convert a rule from FreeBSD 7.2 format to
576
FreeBSD 8 format, and viceversa.
577
The conversion should be done when a rule passes from userspace to kernel space
578
and viceversa.
579
I have to modify the ip_fw2.c file to manage these two case, and added a
580
variable (is7) to store the ipfw version used, using an approach like the
581
previous file:
582
- when a new rule is added (option IP_FW_ADD) the is7 variable is set if the
583
size of the rule received correspond to FreeBSD 7.2 ipfw version. If so, the
584
rule is converted to version 8 calling the function convert_rule_to_8().
585
Moreover, after the insertion of the rule, the rule is now reconverted to
586
version 7 because the ipfw binary will print it.
587
- when the user request a list of rules (option IP_FW_GET) the is7 variable
588
should be set correctly because we suppose that a configure command was done,
589
else we suppose that the FreeBSD version is 8. The function ipfw_getrules()
590
in ip_fw2.c file return all rules, eventually converted to version 7 (if
591
the is7 is set) to the ipfw binary.
592
The conversion of a rule is quite simple. The only difference between the
593
two structures (struct ip_fw) is that in the new there is a new field
594
(uint32_t id). So, I copy the entire rule in a buffer and the copy the rule in
595
the right position in the new (or old) struct. The size of commands are not
596
changed, and the copy is done into a cicle.
597
598
How to configure dummynet
599
=========================
600
It is possible to configure dummynet through two main commands:
601
'ipfw pipe' and 'ipfw queue'.
602
To allow compatibility with old version, it is possible configure dummynet
603
using the old command syntax. Doing so, obviously, it is only possible to
604
configure a FIFO scheduler or a wf2q+ scheduler.
605
A new command, 'ipfw pipe x config sched <type>' is supported to add a new
606
scheduler to the system.
607
608
- ipfw pipe x config ...
609
create a new pipe with the link parameters
610
create a new scheduler fifo (x + offset)
611
create a new flowset fifo (x + offset)
612
the mask is eventually stored in the FIFO scheduler
613
614
- ipfw queue y config pipe x ...
615
create a new flowset y linked to sched x.
616
The type of flowset depends by the specified scheduler.
617
If the scheduler does not exist, this flowset is inserted in a special
618
list and will be not active.
619
If pipe x exists and sched does not exist, a new wf2q+ scheduler is
620
created and the flowset will be linked to this new scheduler (this is
621
done for compatibility with old syntax).
622
623
- ipfw pipe x config sched <type> ...
624
create a new scheduler x of type <type>.
625
Search into the flowset unlinked list if there are some flowset that
626
should be linked with this new scheduler.
627
628
- ipfw pipe x delete
629
delete the pipe x
630
delete the scheduler fifo (x + offset)
631
delete the scheduler x
632
delete the flowset fifo (x + offset)
633
634
- ipfw queue x delete
635
delete the flowset x
636
637
- ipfw sched x delete ///XXX
638
delete the scheduler x
639
640
Follow now some examples to how configure dummynet:
641
- Ex1:
642
ipfw pipe 10 config bw 1M delay 15 // create a pipe with band and delay
643
A FIFO flowset and scheduler is
644
also created
645
ipfw queue 5 config pipe 10 weight 56 // create a flowset. This flowset
646
will be of wf2q+ because a pipe 10
647
exists. Moreover, the wf2q+
648
scheduler is created now.
649
- Ex2:
650
ipfw queue 5 config pipe 10 weight 56 // Create a flowset. Scheduler 10
651
does not exist, so this flowset
652
is inserted in the unlinked
653
flowset list.
654
ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler.
655
Because of a flowset with 'pipe 10' exists,
656
a wf2q+ scheduler is created now and that
657
flowset is linked with this sceduler.
658
659
- Ex3:
660
ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler.
661
ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to
662
pipe 10
663
ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5. This flowset
664
will belong to scheduler 10 and
665
it is of type RR
666
667
- Ex4:
668
ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to
669
pipe 10 (not exist yet)
670
ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler.
671
ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5.This flowset
672
will belong to scheduler 10 and
673
it is of type RR
674
ipfw pipe 10 config sched wf2q+ // Modify the type of scheduler 10. It
675
becomes a wf2q+ scheduler.
676
When a new packet of flowset 5 arrives,
677
the flowset 5 becomes to wf2q+ type.
678
679
How to implement a new scheduler
680
================================
681
In dummynet, a scheduler algorithm is represented by two main structs, some
682
functions and other minor structs.
683
- A struct dn_sch_xyz (where xyz is the 'type' of scheduler algorithm
684
implemented) contains data relative to scheduler, as global parameter that
685
are common to all instances of the scheduler
686
- A struct dn_sch_inst_xyz contains data relative to a single scheduler
687
instance, as local status variable depending for example by flows that
688
are linked with the scheduler, and so on.
689
To add a scheduler to dummynet, the user should type a command like:
690
'ipfw pipe x config sched <type> [mask ... ...]'
691
This command creates a new struct dn_sch_xyz of type <type>, and
692
store the optional parameter in that struct.
693
694
The parameter mask determines how many scheduler instance of this
695
scheduler may exist. For example, it is possible to divide traffic
696
depending on the source port (or destination, or ip address...),
697
so that every scheduler instance act as an independent scheduler.
698
If the mask is not set, all traffic goes to the same instance.
699
700
When a packet arrives to a scheduler, the system search the corrected
701
scheduler instance, and if it does not exist it is created now (the
702
struct dn_sch_inst_xyz is allocated by the system, and the scheduler
703
fills the field correctly). It is a task of the scheduler to create
704
the struct that contains all queues for a scheduler instance.
705
Dummynet provides some function to create an hash table to store
706
queues, but the schedule algorithm can choice the own struct.
707
708
To link a flow to a scheduler, the user should type a command like:
709
'ipfw queue z config pipe x [mask... ...]'
710
711
This command creates a new 'dn_fs' struct that will be inserted
712
in the system. If the scheduler x exists, this flowset will be
713
linked to that scheduler and the flowset type become the same as
714
the scheduler type. At this point, the function create_alg_fs_xyz()
715
is called to allow store eventually parameter for the flowset that
716
depend by scheduler (for example the 'weight' parameter for a wf2q+
717
scheduler, or some priority...). A parameter mask can be used for
718
a flowset. If the mask parameter is set, the scheduler instance can
719
separate packet according to its flow id (src and dst ip, ports...)
720
and assign it to a separate queue. This is done by the scheduler,
721
so it can ignore the mask if it wants.
722
723
See now the two main structs:
724
struct dn_sch_xyz {
725
struct gen g; /* important the name g */
726
/* global params */
727
};
728
struct dn_sch_inst_xyz {
729
struct gen g; /* important the name g */
730
/* params of the instance */
731
};
732
It is important to embed the struct gen as first parameter. The struct gen
733
contains some values that the scheduler instance must fill (the 'type' of
734
scheduler, the 'len' of the struct...)
735
The function create_scheduler_xyz() should be implemented to initialize global
736
parameters in the first struct, and if memory allocation is done it is
737
mandatory to implement the delete_scheduler_template() function to free that
738
memory.
739
The function create_scheduler_instance_xyz() must be implemented even if the
740
scheduler instance does not use extra parameters. In this function the struct
741
gen fields must be filled with corrected infos. The
742
delete_scheduler_instance_xyz() function must bu implemented if the instance
743
has allocated some memory in the previous function.
744
745
To store data belonging to a flowset the follow struct is used:
746
struct alg_fs_xyz {
747
struct gen g;
748
/* fill correctly the gen struct
749
g.subtype = DN_XYZ;
750
g.len = sizeof(struct alg_fs_xyz)
751
...
752
*/
753
/* params for the flow */
754
};
755
The create_alg_fs_xyz() function is mandatory, because it must fill the struct
756
gen, but the delete_alg_fs_xyz() is mandatory only if the previous function
757
has allocated some memory.
758
759
A struct dn_queue contains packets belonging to a queue and some statistical
760
data. The scheduler could have to store data in this struct, so it must define
761
a dn_queue_xyz struct:
762
struct dn_queue_xyz {
763
struct dn_queue q;
764
/* parameter for a queue */
765
}
766
767
All structures are allocated by the system. To do so, the scheduler must
768
set the size of its structs in the scheduler descriptor:
769
scheduler_size: sizeof(dn_sch_xyz)
770
scheduler_i_size: sizeof(dn_sch_inst_xyz)
771
flowset_size: sizeof(alg_fs_xyz)
772
queue_size: sizeof(dn_queue_xyz);
773
The scheduler_size could be 0, but other struct must have at least a struct gen.
774
775
776
After the definition of structs, it is necessary to implement the
777
scheduler functions.
778
779
- int (*config_scheduler)(char *command, void *sch, int reconfigure);
780
Configure a scheduler, or reconfigure if 'reconfigure' == 1.
781
This function performs additional allocation and initialization of global
782
parameter for this scheduler.
783
If memory is allocated here, the delete_scheduler_template() function
784
should be implemented to remove this memory.
785
- int (*delete_scheduler_template)(void* sch);
786
Delete a scheduler template. This function is mandatory if the scheduler
787
uses extra data respect the struct dn_sch.
788
- int (*create_scheduler_instance)(void *s);
789
Create a new scheduler instance. The system allocate the necessary memory
790
and the schedulet can access it using the 's' pointer.
791
The scheduler instance stores all queues, and to do this can use the
792
hash table provided by the system.
793
- int (*delete_scheduler_instance)(void *s);
794
Delete a scheduler instance. It is important to free memory allocated
795
by create_scheduler_instance() function. The memory allocated by system
796
is freed by the system itself. The struct contains all queue also has
797
to be deleted.
798
- int (*enqueue)(void *s, struct gen *f, struct mbuf *m,
799
struct ipfw_flow_id *id);
800
Called when a packet arrives. The packet 'm' belongs to the scheduler
801
instance 's', has a flowset 'f' and the flowid 'id' has already been
802
masked. The enqueue() must call dn_queue_packet(q, m) function to really
803
enqueue packet in the queue q. The queue 'q' is chosen by the scheduler
804
and if it does not exist should be created calling the dn_create_queue()
805
function. If the schedule want to drop the packet, it must call the
806
dn_drop_packet() function and then return 1.
807
- struct mbuf * (*dequeue)(void *s);
808
Called when the timer expires (or when a packet arrives and the scheduler
809
instance is idle).
810
This function is called when at least a packet can be send out. The
811
scheduler choices the packet and returns it; if no packet are in the
812
schedulerinstance, the function must return NULL.
813
Before return a packet, it is important to call the function
814
dn_return_packet() to update some statistic of the queue and update the
815
queue counters.
816
- int (*drain_queue)(void *s, int flag);
817
The system request to scheduler to delete all queues that is not using
818
to free memory. The flag parameter indicate if a queue must be deleted
819
even if it is active.
820
821
- int (*create_alg_fs)(char *command, struct gen *g, int reconfigure);
822
It is called when a flowset is linked with a scheduler. This is done
823
when the scheduler is defined, so we can know the type of flowset.
824
The function initialize the flowset paramenter parsing the command
825
line. The parameter will be stored in the g struct that have the right
826
size allocated by the system. If the reconfigure flag is set, it means
827
that the flowset is reconfiguring
828
- int (*delete_alg_fs)(struct gen *f);
829
It is called when a flowset is deleting. Must remove the memory allocate
830
by the create_alg_fs() function.
831
832
- int (*create_queue_alg)(struct dn_queue *q, struct gen *f);
833
Called when a queue is created. The function should link the queue
834
to the struct used by the scheduler instance to store all queues.
835
- int (*delete_queue_alg)(struct dn_queue *q);
836
Called when a queue is deleting. The function should remove extra data
837
and update the struct contains all queues in the scheduler instance.
838
839
The struct scheduler represent the scheduler descriptor that is passed to
840
dummynet when a scheduler module is loaded.
841
This struct contains the type of scheduler, the length of all structs and
842
all function pointers.
843
If a function is not implemented should be initialize to NULL. Some functions
844
are mandatory, other are mandatory if some memory should be freed.
845
Mandatory functions:
846
- create_scheduler_instance()
847
- enqueue()
848
- dequeue()
849
- create_alg_fs()
850
- drain_queue()
851
Optional functions:
852
- config_scheduler()
853
- create_queue_alg()
854
Mandatory functions if the corresponding create...() has allocated memory:
855
- delete_scheduler_template()
856
- delete_scheduler_instance()
857
- delete_alg_fs()
858
- delete_queue_alg()
859
860
861