-
Notifications
You must be signed in to change notification settings - Fork 1
/
report.txt
414 lines (285 loc) · 16.5 KB
/
report.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
% Loadable Kernel Module based Fibers
% Beatrice Bevilacqua, Anxhelo Xhebraj
% September 2018
![Project overview](images/diagram.png){width=100%}
Introduction
------------
Coroutines are a programming paradigm that offers time-sharing
non-preemptive multitasking which can be used to implement many patterns
such as the Actor model, State machines and Communicating Sequential Processes.
This concept has influenced many programming languages such as Go and Clojure
especially for handling asynchronous I/O operations.
In this document we present a Loadable Kernel Module (LKM) implementation
taking as reference the Fibers implementation offered by Windows NT and
ReactOS. While user space implementations are generally preferred for their
low over-head and easier debugging, a kernel space implementation permits to
understand more deeply how kernel subsystems work.
The project is composed of two main parts:
- a LKM that implements the
facilities to perform the yielding of a fiber and managing their context with
an additional component that extends `procfs` to provide information about
fibers under `/proc/<tgid>`
- a user space library to invoke the previously
mentioned facilities
The code has been tested from kernel 4.16 up [^1].
Interface
---------
The entry points to kernel space code are exported by the library which
performs `ioctl()` calls to a special *[miscdevice]* registered by the module
at load time. Differently from "typical" character devices, miscdevices are
more suited for implementing just an entry point to kernel facilities than
reserving a full range of minor numbers representing actual devices.
**`module/fibers.c`**
~~~~ {caption="module/fibers.c" .lineAnchors include=module/fibers.c
startLine=10 endLine=22 .numberLines startFrom="10" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~ {caption="module/fibers.c" .lineAnchors include=module/fibers.c
startLine=39 endLine=51 .numberLines startFrom="39" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After the load of the module a new file called `fibers` is exposed in `/dev`
and the `.fibers` folder appears under `/proc`.
The file operations implemented for such device are `open`, `release` and
`ioctl` which are called upon the respective system calls invokations on
the file through the functions offered by the library.
**`lib/fibers.c`**
~~~~ {caption="lib/fibers.c" .lineAnchors include=lib/fibers.c
startLine=4 endLine=6 .numberLines startFrom="4" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~ {caption="lib/fibers.c" .lineAnchors include=lib/fibers.c
startLine=20 endLine=39 .numberLines startFrom="20" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The library provided defines a constructor function which
opens the file thus instantiating one *file descriptor*[^2] designated to
communicate with the module. Since a file descriptor represents a pool of
fibers all the threads belonging to the process are allowed to interact and
execute fibers of such process. Moreover, the code can easily be extended
to support multiple opens of the `fibers` file to create separate pools of
fibers and thus implement different scheduling strategies such as assigning
different pools to different threads.
In order to avoid unintended sharing of fibers pool from parent process to
child after a `fork()` caused by file descriptor inheritance, an *atfork*
function is registered to be run on the child after a fork, which closes the
parent descriptor and reopens the file.
Finally, in the library there are respectively implemented, based on `ioctl`
calls on the `__fibers_file` file descriptor, the functions to
convert a thread to fiber, create a new fiber, yielding from a fiber to
another one and allocate and manage fixed size memory (Fiber Local Storage).
**`include/lib/fibers.h`**
~~~~ {caption="include/lib/fibers.h" .lineAnchors include=include/lib/fibers.h
startLine=15 endLine=22 .numberLines startFrom="15" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All these functions prepare data structures to be passed to their relative
kernel space implementations and invoke the following wrapper of the `ioctl`
system call. The code saves the callee-save register in case of switching from
one fiber to another since they are not saved by the Linux system call
dispatcher. This is needed in order to correctly save and restore the execution
context of a fiber otherwise it would not be possible to obtain the state of
such registers when the system call is performed in kernel space.
**`lib/fibers.c`**
~~~~ {caption="lib/fibers.c" .lineAnchors include=lib/fibers.c
startLine=55 endLine=81 .numberLines startFrom="55" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Module
------
**`include/module/common.h`**
~~~~ {caption="include/module/common.h" .lineAnchors
include=include/module/common.h
startLine=34 endLine=39 .numberLines startFrom="34" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When the constructor defined by the library is executed, the `open` file
operation is performed which initializes a `struct fibers_data` that keeps the
per process information such as:
- `fibers_pool`: an *[idr]* data structure that maintains the fibers of the
process. Internally it uses a [radix tree], making efficient the assignment
of an id to fibers and later retrieving them by the same id
- `base`: the directory entry associated to `/proc/.fibers/<tgid>`, a directory
keeping one file for each fiber created by the process. As explained
[later](#procfs-extension) a symlink (`fibers`) to such directory is added
under `/proc/<tgid>`
- `fls`: an array keeping the allocation units provided to the fibers
- `bitmap`: bitmap to manage the allocation and deallocation of the
aforementioned units
The struct is then stored in the `private_data` field of the file descriptor
which is a free field that can be used by device drivers for such purposes to
later retrieve it on `ioctl`.
These data structures are finally freed on the `release` operation of the file
which is called when the `struct file` associated to the file is going to be
freed by the kernel, i.e. on close of the file or exit/fault of the process.
The fibers' pool maintains structures of the following type.
**`include/module/common.h`**
~~~~ {caption="include/module/common.h" .lineAnchors
include=include/module/common.h
startLine=20 endLine=32 .numberLines startFrom="20" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The `entry_point` is the address of the first instruction that a fiber will
execute on first schedule
- The `exec_context` and `fpuregs` fields are used to keep all the
registers content to properly maintain the state of a fiber for context
switching
- The `state` field is used to avoid a fiber being run by two threads
concurrently
- The additional fields are used for statistics about the fibers which the user
can retrieve from the file `/proc/<tgid>/fibers/<fiber_id>`
When the `ioctl` file operation is invoked, it calls the proper kernel
space function to handle the user space request.
### Fiber creation
A fiber data structure is created in case of a `create_fiber` or a `to_fiber`
call. The latter is only used to let the module acknowledge that some thread
can switch to other fibers and execution can resume at such thread.
In brief the steps taken are:
- Allocation of a `struct fiber_struct`, initialization of its fields and
insertion in `fibers_pool` protected through *[rcu]*.
In the case of `create_fiber` also the following actions are taken to
set the right instruction pointer, stack pointer and parameter which are
given by the userspace
**`module/fibers_api.c`**
~~~~ {caption="module/fibers_api.c" .lineAnchors
include=module/fibers_api.c
startLine=49 endLine=58 .numberLines startFrom="49" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The stack is allocated by the userspace library as follows
**`lib/fibers.c`**
~~~~ {caption="lib/fibers.c" .lineAnchors
include=lib/fibers.c
startLine=92 endLine=110 .numberLines startFrom="92" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- When adding the fiber in `fibers_pool` also a file under
`/proc/.fibers/<tgid>` is created with name the id of the fiber.
In the field `data` of `proc_dir_entry` corresponding to
`/proc/.fibers/<tgid>/<fiber_id>` is stored a pointer to the associated
`struct fiber_struct` to easily retrieve its statistics and then display them
to the user.
- Since it is needed to know which fiber is running on a given thread, at the
bottom of the kernel stack, immediatly above the `struct thread_info`, is
stored the pointer to the `struct fiber_struct`.
.
**`include/module/fibers_api.h`**
~~~~ {caption="include/module/fibers_api.h" .lineAnchors
include=include/module/fibers_api.h
startLine=23 endLine=27 .numberLines startFrom="22" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The macro shown above acts as a per-thread kernel variable.
Therefore `to_fiber` sets `current_fiber` as follows
**`module/fibers_api.c:to_fiber`**
~~~~ {caption="module/fibers_api.c" .lineAnchors
include=module/fibers_api.c
startLine=114 endLine=116 .numberLines startFrom="114" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
### Fiber switching
**`module/fibers_api.c`**
~~~~ {caption="module/fibers_api.c" .lineAnchors
include=module/fibers_api.c
startLine=154 endLine=159 .numberLines startFrom="154" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~ {caption="module/fibers_api.c" .lineAnchors
include=module/fibers_api.c
startLine=177 endLine=181 .numberLines startFrom="177" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~ {caption="module/fibers_api.c" .lineAnchors
include=module/fibers_api.c
startLine=191 endLine=206 .numberLines startFrom="191" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
First it is checked whether the fiber is already running through an atomic
test-and-set on the state of the fiber, in which case the
switch simply fails. The switch also fails in case the caller has not
performed `to_fiber` previously. This is ensured by checking whether
`current_fiber` is zero which is safe only if the kernel allocates a
zeroed stack to a thread and no stack overflow has happened.
Then the switch is performed by using the `struct pt_regs` found at the top of
the kernel stack which has been previously pushed by the Linux system call
dispatcher. By changing its fields and setting them to the values of the fiber
we want to switch to, when the dispatcher will restore the userspace context,
will let the execution proceed to the just set context.
Finally the *fpu* registers are changed by using the functions provided by the
kernel, the previously running fiber is released and the `current_fiber` is
set properly.
### Fiber Local Storage
The allocator implemented for fibers is very simple. There is a fixed number
of entries kept in `fls` that can be allocated which are managed by a bitmap
keeping one bit for each entry telling whether it's free (`0`) or occupied
(`1`).
**`module/fibers_api.c`**
~~~~ {caption="module/fibers_api.c" .lineAnchors
include=module/fibers_api.c
startLine=208 endLine=218 .numberLines startFrom="208" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~ {caption="module/fibers_api.c" .lineAnchors
include=module/fibers_api.c
startLine=243 endLine=247 .numberLines startFrom="243" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`procfs` extension
------------------
As we have mentioned in the previous parts a `.fibers` directory is created
under `/proc` at module load and a directory for each thread group id that
opens the `/dev/fibers` file is created under `/proc/.fibers`. In each
directory is added one file for each fiber instantiation of the group.
In order to make more *proc-oriented* such information a symbolic link is
added to `/proc/<tgid>` with name `fibers` pointing to the corresponding
`/proc/.fibers/<tgid>` directory.
Since the entries under `/proc/<tgid>` are [statically defined] at compile time
it's not possible to add such link through a module except by hooking the
functions that are called upon listing/moving a/to a directory namely
[`proc_tgid_base_readdir`], [`proc_tgid_base_lookup`] respectively stored
inside the `iter_shared` field of [`proc_tgid_base_operations`] and `lookup`
field of [`proc_tgid_base_inode_operations`].
**`module/proc.c`**
~~~~ {caption="module/proc.c" .lineAnchors
include=module/proc.c
startLine=52 endLine=60 .numberLines startFrom="52" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~ {caption="module/proc.c" .lineAnchors
include=module/proc.c
startLine=185 endLine=195 .numberLines startFrom="185" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~ {caption="module/proc.c" .lineAnchors
include=module/proc.c
startLine=207 endLine=215 .numberLines startFrom="207" .C}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The hooked `readdir` first calls the original function to fill the result with
the correct directory entries and then similarly to the original function
instantiates a new directory entry of type link. In order to do so, the
first three symbols in `hooked_syms_names` and additional re-definitions
of not exported structs used and function pointers types are needed.
Similar operations are taken for `lookup` but in this case the original
function is not invoked if the lookup is for the `fibers` directory.
Performance
-----------
![[FlameGraph Interactive view](images/perf.svg)](images/perf.svg){width=100%}
The FlameGraph produced by `perf` shows that a good amount of time in the
simulation is spent inside the `fls_get` function. Note that the graph is
slightly broken since the initial frame-pointers are not properly set for the
stacks of the created fibers producing the `[unknown]` tag.
The thread leader instead creates the fibers and then calls the `main_loop`
as shown in the graph with the block on top of `main`.
![](images/user.png){width=70% .center}
![](images/kernel.png){width=70% .center}
As expected, the module based implementation has no performance advantages
over the *sigaltstack* based one. This is due to the fact that calls to the
apis provided require a mode switch producing a double cost with respect to
the sigaltstack implementation since on a switch first the registers are
saved on stack by the Linux system call dispatcher and then copied on the
`fiber_struct` by the module. Same goes for the `fls_get` calls which as
well require a mode switch.
Conclusions
-----------
Overall the module has good performances and integrates smoothly with
the kernel. A good amount of time for the project has been spent in the
design choices trying to find the right data structures fitting the needs
of the project. The library and the module can be extended with
asynchronous I/O, multiple fibers pools per process and channels to
communicate between fibers.
[^1]: The implementation uses the recent *[xarray]* interface and assumes a
zeroed kernel stack.
[^2]: We use the term *file descriptor* to indicate both the `struct file`
kept by the kernel to describe file opened by a process and the number
returned to the process to identify the instance.
[miscdevice]: https://www.linuxjournal.com/article/2920
[xarray]: https://lwn.net/Articles/745073/
[idr]: https://www.kernel.org/doc/html/v4.17/core-api/idr.html
[radix tree]: https://lwn.net/Articles/175432/
[rcu]: https://lwn.net/Articles/262464/
[statically defined]: https://elixir.bootlin.com/linux/latest/source/fs/proc/base.c#L2902
[`proc_tgid_base_inode_operations`]: https://elixir.bootlin.com/linux/latest/source/fs/proc/base.c#L3017
[`proc_tgid_base_operations`]: https://elixir.bootlin.com/linux/latest/source/fs/proc/base.c#L3005
[`proc_tgid_base_readdir`]: https://elixir.bootlin.com/linux/latest/source/fs/proc/base.c#L2999
[`proc_tgid_base_lookup`]: https://elixir.bootlin.com/linux/latest/source/fs/proc/base.c#L3011