Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
allendowney
GitHub Repository: allendowney/cpython
Path: blob/main/Python/hamt.c
12 views
1
#include "Python.h"
2
3
#include "pycore_bitutils.h" // _Py_popcount32
4
#include "pycore_hamt.h"
5
#include "pycore_initconfig.h" // _PyStatus_OK()
6
#include "pycore_object.h" // _PyObject_GC_TRACK()
7
#include <stddef.h> // offsetof()
8
9
/*
10
This file provides an implementation of an immutable mapping using the
11
Hash Array Mapped Trie (or HAMT) datastructure.
12
13
This design allows to have:
14
15
1. Efficient copy: immutable mappings can be copied by reference,
16
making it an O(1) operation.
17
18
2. Efficient mutations: due to structural sharing, only a portion of
19
the trie needs to be copied when the collection is mutated. The
20
cost of set/delete operations is O(log N).
21
22
3. Efficient lookups: O(log N).
23
24
(where N is number of key/value items in the immutable mapping.)
25
26
27
HAMT
28
====
29
30
The core idea of HAMT is that the shape of the trie is encoded into the
31
hashes of keys.
32
33
Say we want to store a K/V pair in our mapping. First, we calculate the
34
hash of K, let's say it's 19830128, or in binary:
35
36
0b1001011101001010101110000 = 19830128
37
38
Now let's partition this bit representation of the hash into blocks of
39
5 bits each:
40
41
0b00_00000_10010_11101_00101_01011_10000 = 19830128
42
(6) (5) (4) (3) (2) (1)
43
44
Each block of 5 bits represents a number between 0 and 31. So if we have
45
a tree that consists of nodes, each of which is an array of 32 pointers,
46
those 5-bit blocks will encode a position on a single tree level.
47
48
For example, storing the key K with hash 19830128, results in the following
49
tree structure:
50
51
(array of 32 pointers)
52
+---+ -- +----+----+----+ -- +----+
53
root node | 0 | .. | 15 | 16 | 17 | .. | 31 | 0b10000 = 16 (1)
54
(level 1) +---+ -- +----+----+----+ -- +----+
55
|
56
+---+ -- +----+----+----+ -- +----+
57
a 2nd level node | 0 | .. | 10 | 11 | 12 | .. | 31 | 0b01011 = 11 (2)
58
+---+ -- +----+----+----+ -- +----+
59
|
60
+---+ -- +----+----+----+ -- +----+
61
a 3rd level node | 0 | .. | 04 | 05 | 06 | .. | 31 | 0b00101 = 5 (3)
62
+---+ -- +----+----+----+ -- +----+
63
|
64
+---+ -- +----+----+----+----+
65
a 4th level node | 0 | .. | 04 | 29 | 30 | 31 | 0b11101 = 29 (4)
66
+---+ -- +----+----+----+----+
67
|
68
+---+ -- +----+----+----+ -- +----+
69
a 5th level node | 0 | .. | 17 | 18 | 19 | .. | 31 | 0b10010 = 18 (5)
70
+---+ -- +----+----+----+ -- +----+
71
|
72
+--------------+
73
|
74
+---+ -- +----+----+----+ -- +----+
75
a 6th level node | 0 | .. | 15 | 16 | 17 | .. | 31 | 0b00000 = 0 (6)
76
+---+ -- +----+----+----+ -- +----+
77
|
78
V -- our value (or collision)
79
80
To rehash: for a K/V pair, the hash of K encodes where in the tree V will
81
be stored.
82
83
To optimize memory footprint and handle hash collisions, our implementation
84
uses three different types of nodes:
85
86
* A Bitmap node;
87
* An Array node;
88
* A Collision node.
89
90
Because we implement an immutable dictionary, our nodes are also
91
immutable. Therefore, when we need to modify a node, we copy it, and
92
do that modification to the copy.
93
94
95
Array Nodes
96
-----------
97
98
These nodes are very simple. Essentially they are arrays of 32 pointers
99
we used to illustrate the high-level idea in the previous section.
100
101
We use Array nodes only when we need to store more than 16 pointers
102
in a single node.
103
104
Array nodes do not store key objects or value objects. They are used
105
only as an indirection level - their pointers point to other nodes in
106
the tree.
107
108
109
Bitmap Node
110
-----------
111
112
Allocating a new 32-pointers array for every node of our tree would be
113
very expensive. Unless we store millions of keys, most of tree nodes would
114
be very sparse.
115
116
When we have less than 16 elements in a node, we don't want to use the
117
Array node, that would mean that we waste a lot of memory. Instead,
118
we can use bitmap compression and can have just as many pointers
119
as we need!
120
121
Bitmap nodes consist of two fields:
122
123
1. An array of pointers. If a Bitmap node holds N elements, the
124
array will be of N pointers.
125
126
2. A 32bit integer -- a bitmap field. If an N-th bit is set in the
127
bitmap, it means that the node has an N-th element.
128
129
For example, say we need to store a 3 elements sparse array:
130
131
+---+ -- +---+ -- +----+ -- +----+
132
| 0 | .. | 4 | .. | 11 | .. | 17 |
133
+---+ -- +---+ -- +----+ -- +----+
134
| | |
135
o1 o2 o3
136
137
We allocate a three-pointer Bitmap node. Its bitmap field will be
138
then set to:
139
140
0b_00100_00010_00000_10000 == (1 << 17) | (1 << 11) | (1 << 4)
141
142
To check if our Bitmap node has an I-th element we can do:
143
144
bitmap & (1 << I)
145
146
147
And here's a formula to calculate a position in our pointer array
148
which would correspond to an I-th element:
149
150
popcount(bitmap & ((1 << I) - 1))
151
152
153
Let's break it down:
154
155
* `popcount` is a function that returns a number of bits set to 1;
156
157
* `((1 << I) - 1)` is a mask to filter the bitmask to contain bits
158
set to the *right* of our bit.
159
160
161
So for our 17, 11, and 4 indexes:
162
163
* bitmap & ((1 << 17) - 1) == 0b100000010000 => 2 bits are set => index is 2.
164
165
* bitmap & ((1 << 11) - 1) == 0b10000 => 1 bit is set => index is 1.
166
167
* bitmap & ((1 << 4) - 1) == 0b0 => 0 bits are set => index is 0.
168
169
170
To conclude: Bitmap nodes are just like Array nodes -- they can store
171
a number of pointers, but use bitmap compression to eliminate unused
172
pointers.
173
174
175
Bitmap nodes have two pointers for each item:
176
177
+----+----+----+----+ -- +----+----+
178
| k1 | v1 | k2 | v2 | .. | kN | vN |
179
+----+----+----+----+ -- +----+----+
180
181
When kI == NULL, vI points to another tree level.
182
183
When kI != NULL, the actual key object is stored in kI, and its
184
value is stored in vI.
185
186
187
Collision Nodes
188
---------------
189
190
Collision nodes are simple arrays of pointers -- two pointers per
191
key/value. When there's a hash collision, say for k1/v1 and k2/v2
192
we have `hash(k1)==hash(k2)`. Then our collision node will be:
193
194
+----+----+----+----+
195
| k1 | v1 | k2 | v2 |
196
+----+----+----+----+
197
198
199
Tree Structure
200
--------------
201
202
All nodes are PyObjects.
203
204
The `PyHamtObject` object has a pointer to the root node (h_root),
205
and has a length field (h_count).
206
207
High-level functions accept a PyHamtObject object and dispatch to
208
lower-level functions depending on what kind of node h_root points to.
209
210
211
Operations
212
==========
213
214
There are three fundamental operations on an immutable dictionary:
215
216
1. "o.assoc(k, v)" will return a new immutable dictionary, that will be
217
a copy of "o", but with the "k/v" item set.
218
219
Functions in this file:
220
221
hamt_node_assoc, hamt_node_bitmap_assoc,
222
hamt_node_array_assoc, hamt_node_collision_assoc
223
224
`hamt_node_assoc` function accepts a node object, and calls
225
other functions depending on its actual type.
226
227
2. "o.find(k)" will lookup key "k" in "o".
228
229
Functions:
230
231
hamt_node_find, hamt_node_bitmap_find,
232
hamt_node_array_find, hamt_node_collision_find
233
234
3. "o.without(k)" will return a new immutable dictionary, that will be
235
a copy of "o", buth without the "k" key.
236
237
Functions:
238
239
hamt_node_without, hamt_node_bitmap_without,
240
hamt_node_array_without, hamt_node_collision_without
241
242
243
Further Reading
244
===============
245
246
1. http://blog.higher-order.net/2009/09/08/understanding-clojures-persistenthashmap-deftwice.html
247
248
2. http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html
249
250
3. Clojure's PersistentHashMap implementation:
251
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentHashMap.java
252
253
254
Debug
255
=====
256
257
The HAMT datatype is accessible for testing purposes under the
258
`_testcapi` module:
259
260
>>> from _testcapi import hamt
261
>>> h = hamt()
262
>>> h2 = h.set('a', 2)
263
>>> h3 = h2.set('b', 3)
264
>>> list(h3)
265
['a', 'b']
266
267
When CPython is built in debug mode, a '__dump__()' method is available
268
to introspect the tree:
269
270
>>> print(h3.__dump__())
271
HAMT(len=2):
272
BitmapNode(size=4 count=2 bitmap=0b110 id=0x10eb9d9e8):
273
'a': 2
274
'b': 3
275
*/
276
277
278
#define IS_ARRAY_NODE(node) Py_IS_TYPE(node, &_PyHamt_ArrayNode_Type)
279
#define IS_BITMAP_NODE(node) Py_IS_TYPE(node, &_PyHamt_BitmapNode_Type)
280
#define IS_COLLISION_NODE(node) Py_IS_TYPE(node, &_PyHamt_CollisionNode_Type)
281
282
283
/* Return type for 'find' (lookup a key) functions.
284
285
* F_ERROR - an error occurred;
286
* F_NOT_FOUND - the key was not found;
287
* F_FOUND - the key was found.
288
*/
289
typedef enum {F_ERROR, F_NOT_FOUND, F_FOUND} hamt_find_t;
290
291
292
/* Return type for 'without' (delete a key) functions.
293
294
* W_ERROR - an error occurred;
295
* W_NOT_FOUND - the key was not found: there's nothing to delete;
296
* W_EMPTY - the key was found: the node/tree would be empty
297
if the key is deleted;
298
* W_NEWNODE - the key was found: a new node/tree is returned
299
without that key.
300
*/
301
typedef enum {W_ERROR, W_NOT_FOUND, W_EMPTY, W_NEWNODE} hamt_without_t;
302
303
304
/* Low-level iterator protocol type.
305
306
* I_ITEM - a new item has been yielded;
307
* I_END - the whole tree was visited (similar to StopIteration).
308
*/
309
typedef enum {I_ITEM, I_END} hamt_iter_t;
310
311
312
#define HAMT_ARRAY_NODE_SIZE 32
313
314
315
typedef struct {
316
PyObject_HEAD
317
PyHamtNode *a_array[HAMT_ARRAY_NODE_SIZE];
318
Py_ssize_t a_count;
319
} PyHamtNode_Array;
320
321
322
typedef struct {
323
PyObject_VAR_HEAD
324
int32_t c_hash;
325
PyObject *c_array[1];
326
} PyHamtNode_Collision;
327
328
329
static PyHamtObject *
330
hamt_alloc(void);
331
332
static PyHamtNode *
333
hamt_node_assoc(PyHamtNode *node,
334
uint32_t shift, int32_t hash,
335
PyObject *key, PyObject *val, int* added_leaf);
336
337
static hamt_without_t
338
hamt_node_without(PyHamtNode *node,
339
uint32_t shift, int32_t hash,
340
PyObject *key,
341
PyHamtNode **new_node);
342
343
static hamt_find_t
344
hamt_node_find(PyHamtNode *node,
345
uint32_t shift, int32_t hash,
346
PyObject *key, PyObject **val);
347
348
#ifdef Py_DEBUG
349
static int
350
hamt_node_dump(PyHamtNode *node,
351
_PyUnicodeWriter *writer, int level);
352
#endif
353
354
static PyHamtNode *
355
hamt_node_array_new(Py_ssize_t);
356
357
static PyHamtNode *
358
hamt_node_collision_new(int32_t hash, Py_ssize_t size);
359
360
static inline Py_ssize_t
361
hamt_node_collision_count(PyHamtNode_Collision *node);
362
363
364
#ifdef Py_DEBUG
365
static void
366
_hamt_node_array_validate(void *obj_raw)
367
{
368
PyObject *obj = _PyObject_CAST(obj_raw);
369
assert(IS_ARRAY_NODE(obj));
370
PyHamtNode_Array *node = (PyHamtNode_Array*)obj;
371
Py_ssize_t i = 0, count = 0;
372
for (; i < HAMT_ARRAY_NODE_SIZE; i++) {
373
if (node->a_array[i] != NULL) {
374
count++;
375
}
376
}
377
assert(count == node->a_count);
378
}
379
380
#define VALIDATE_ARRAY_NODE(NODE) \
381
do { _hamt_node_array_validate(NODE); } while (0);
382
#else
383
#define VALIDATE_ARRAY_NODE(NODE)
384
#endif
385
386
387
/* Returns -1 on error */
388
static inline int32_t
389
hamt_hash(PyObject *o)
390
{
391
Py_hash_t hash = PyObject_Hash(o);
392
393
#if SIZEOF_PY_HASH_T <= 4
394
return hash;
395
#else
396
if (hash == -1) {
397
/* exception */
398
return -1;
399
}
400
401
/* While it's somewhat suboptimal to reduce Python's 64 bit hash to
402
32 bits via XOR, it seems that the resulting hash function
403
is good enough (this is also how Long type is hashed in Java.)
404
Storing 10, 100, 1000 Python strings results in a relatively
405
shallow and uniform tree structure.
406
407
Also it's worth noting that it would be possible to adapt the tree
408
structure to 64 bit hashes, but that would increase memory pressure
409
and provide little to no performance benefits for collections with
410
fewer than billions of key/value pairs.
411
412
Important: do not change this hash reducing function. There are many
413
tests that need an exact tree shape to cover all code paths and
414
we do that by specifying concrete values for test data's `__hash__`.
415
If this function is changed most of the regression tests would
416
become useless.
417
*/
418
int32_t xored = (int32_t)(hash & 0xffffffffl) ^ (int32_t)(hash >> 32);
419
return xored == -1 ? -2 : xored;
420
#endif
421
}
422
423
static inline uint32_t
424
hamt_mask(int32_t hash, uint32_t shift)
425
{
426
return (((uint32_t)hash >> shift) & 0x01f);
427
}
428
429
static inline uint32_t
430
hamt_bitpos(int32_t hash, uint32_t shift)
431
{
432
return (uint32_t)1 << hamt_mask(hash, shift);
433
}
434
435
static inline uint32_t
436
hamt_bitindex(uint32_t bitmap, uint32_t bit)
437
{
438
return (uint32_t)_Py_popcount32(bitmap & (bit - 1));
439
}
440
441
442
/////////////////////////////////// Dump Helpers
443
#ifdef Py_DEBUG
444
445
static int
446
_hamt_dump_ident(_PyUnicodeWriter *writer, int level)
447
{
448
/* Write `' ' * level` to the `writer` */
449
PyObject *str = NULL;
450
PyObject *num = NULL;
451
PyObject *res = NULL;
452
int ret = -1;
453
454
str = PyUnicode_FromString(" ");
455
if (str == NULL) {
456
goto error;
457
}
458
459
num = PyLong_FromLong((long)level);
460
if (num == NULL) {
461
goto error;
462
}
463
464
res = PyNumber_Multiply(str, num);
465
if (res == NULL) {
466
goto error;
467
}
468
469
ret = _PyUnicodeWriter_WriteStr(writer, res);
470
471
error:
472
Py_XDECREF(res);
473
Py_XDECREF(str);
474
Py_XDECREF(num);
475
return ret;
476
}
477
478
static int
479
_hamt_dump_format(_PyUnicodeWriter *writer, const char *format, ...)
480
{
481
/* A convenient helper combining _PyUnicodeWriter_WriteStr and
482
PyUnicode_FromFormatV.
483
*/
484
PyObject* msg;
485
int ret;
486
487
va_list vargs;
488
va_start(vargs, format);
489
msg = PyUnicode_FromFormatV(format, vargs);
490
va_end(vargs);
491
492
if (msg == NULL) {
493
return -1;
494
}
495
496
ret = _PyUnicodeWriter_WriteStr(writer, msg);
497
Py_DECREF(msg);
498
return ret;
499
}
500
501
#endif /* Py_DEBUG */
502
/////////////////////////////////// Bitmap Node
503
504
505
static PyHamtNode *
506
hamt_node_bitmap_new(Py_ssize_t size)
507
{
508
/* Create a new bitmap node of size 'size' */
509
510
PyHamtNode_Bitmap *node;
511
Py_ssize_t i;
512
513
if (size == 0) {
514
/* Since bitmap nodes are immutable, we can cache the instance
515
for size=0 and reuse it whenever we need an empty bitmap node.
516
*/
517
return (PyHamtNode *)Py_NewRef(&_Py_SINGLETON(hamt_bitmap_node_empty));
518
}
519
520
assert(size >= 0);
521
assert(size % 2 == 0);
522
523
/* No freelist; allocate a new bitmap node */
524
node = PyObject_GC_NewVar(
525
PyHamtNode_Bitmap, &_PyHamt_BitmapNode_Type, size);
526
if (node == NULL) {
527
return NULL;
528
}
529
530
Py_SET_SIZE(node, size);
531
532
for (i = 0; i < size; i++) {
533
node->b_array[i] = NULL;
534
}
535
536
node->b_bitmap = 0;
537
538
_PyObject_GC_TRACK(node);
539
540
return (PyHamtNode *)node;
541
}
542
543
static inline Py_ssize_t
544
hamt_node_bitmap_count(PyHamtNode_Bitmap *node)
545
{
546
return Py_SIZE(node) / 2;
547
}
548
549
static PyHamtNode_Bitmap *
550
hamt_node_bitmap_clone(PyHamtNode_Bitmap *node)
551
{
552
/* Clone a bitmap node; return a new one with the same child notes. */
553
554
PyHamtNode_Bitmap *clone;
555
Py_ssize_t i;
556
557
clone = (PyHamtNode_Bitmap *)hamt_node_bitmap_new(Py_SIZE(node));
558
if (clone == NULL) {
559
return NULL;
560
}
561
562
for (i = 0; i < Py_SIZE(node); i++) {
563
clone->b_array[i] = Py_XNewRef(node->b_array[i]);
564
}
565
566
clone->b_bitmap = node->b_bitmap;
567
return clone;
568
}
569
570
static PyHamtNode_Bitmap *
571
hamt_node_bitmap_clone_without(PyHamtNode_Bitmap *o, uint32_t bit)
572
{
573
assert(bit & o->b_bitmap);
574
assert(hamt_node_bitmap_count(o) > 1);
575
576
PyHamtNode_Bitmap *new = (PyHamtNode_Bitmap *)hamt_node_bitmap_new(
577
Py_SIZE(o) - 2);
578
if (new == NULL) {
579
return NULL;
580
}
581
582
uint32_t idx = hamt_bitindex(o->b_bitmap, bit);
583
uint32_t key_idx = 2 * idx;
584
uint32_t val_idx = key_idx + 1;
585
uint32_t i;
586
587
for (i = 0; i < key_idx; i++) {
588
new->b_array[i] = Py_XNewRef(o->b_array[i]);
589
}
590
591
assert(Py_SIZE(o) >= 0 && Py_SIZE(o) <= 32);
592
for (i = val_idx + 1; i < (uint32_t)Py_SIZE(o); i++) {
593
new->b_array[i - 2] = Py_XNewRef(o->b_array[i]);
594
}
595
596
new->b_bitmap = o->b_bitmap & ~bit;
597
return new;
598
}
599
600
static PyHamtNode *
601
hamt_node_new_bitmap_or_collision(uint32_t shift,
602
PyObject *key1, PyObject *val1,
603
int32_t key2_hash,
604
PyObject *key2, PyObject *val2)
605
{
606
/* Helper method. Creates a new node for key1/val and key2/val2
607
pairs.
608
609
If key1 hash is equal to the hash of key2, a Collision node
610
will be created. If they are not equal, a Bitmap node is
611
created.
612
*/
613
614
int32_t key1_hash = hamt_hash(key1);
615
if (key1_hash == -1) {
616
return NULL;
617
}
618
619
if (key1_hash == key2_hash) {
620
PyHamtNode_Collision *n;
621
n = (PyHamtNode_Collision *)hamt_node_collision_new(key1_hash, 4);
622
if (n == NULL) {
623
return NULL;
624
}
625
626
n->c_array[0] = Py_NewRef(key1);
627
n->c_array[1] = Py_NewRef(val1);
628
629
n->c_array[2] = Py_NewRef(key2);
630
n->c_array[3] = Py_NewRef(val2);
631
632
return (PyHamtNode *)n;
633
}
634
else {
635
int added_leaf = 0;
636
PyHamtNode *n = hamt_node_bitmap_new(0);
637
if (n == NULL) {
638
return NULL;
639
}
640
641
PyHamtNode *n2 = hamt_node_assoc(
642
n, shift, key1_hash, key1, val1, &added_leaf);
643
Py_DECREF(n);
644
if (n2 == NULL) {
645
return NULL;
646
}
647
648
n = hamt_node_assoc(n2, shift, key2_hash, key2, val2, &added_leaf);
649
Py_DECREF(n2);
650
if (n == NULL) {
651
return NULL;
652
}
653
654
return n;
655
}
656
}
657
658
static PyHamtNode *
659
hamt_node_bitmap_assoc(PyHamtNode_Bitmap *self,
660
uint32_t shift, int32_t hash,
661
PyObject *key, PyObject *val, int* added_leaf)
662
{
663
/* assoc operation for bitmap nodes.
664
665
Return: a new node, or self if key/val already is in the
666
collection.
667
668
'added_leaf' is later used in '_PyHamt_Assoc' to determine if
669
`hamt.set(key, val)` increased the size of the collection.
670
*/
671
672
uint32_t bit = hamt_bitpos(hash, shift);
673
uint32_t idx = hamt_bitindex(self->b_bitmap, bit);
674
675
/* Bitmap node layout:
676
677
+------+------+------+------+ --- +------+------+
678
| key1 | val1 | key2 | val2 | ... | keyN | valN |
679
+------+------+------+------+ --- +------+------+
680
where `N < Py_SIZE(node)`.
681
682
The `node->b_bitmap` field is a bitmap. For a given
683
`(shift, hash)` pair we can determine:
684
685
- If this node has the corresponding key/val slots.
686
- The index of key/val slots.
687
*/
688
689
if (self->b_bitmap & bit) {
690
/* The key is set in this node */
691
692
uint32_t key_idx = 2 * idx;
693
uint32_t val_idx = key_idx + 1;
694
695
assert(val_idx < (size_t)Py_SIZE(self));
696
697
PyObject *key_or_null = self->b_array[key_idx];
698
PyObject *val_or_node = self->b_array[val_idx];
699
700
if (key_or_null == NULL) {
701
/* key is NULL. This means that we have a few keys
702
that have the same (hash, shift) pair. */
703
704
assert(val_or_node != NULL);
705
706
PyHamtNode *sub_node = hamt_node_assoc(
707
(PyHamtNode *)val_or_node,
708
shift + 5, hash, key, val, added_leaf);
709
if (sub_node == NULL) {
710
return NULL;
711
}
712
713
if (val_or_node == (PyObject *)sub_node) {
714
Py_DECREF(sub_node);
715
return (PyHamtNode *)Py_NewRef(self);
716
}
717
718
PyHamtNode_Bitmap *ret = hamt_node_bitmap_clone(self);
719
if (ret == NULL) {
720
return NULL;
721
}
722
Py_SETREF(ret->b_array[val_idx], (PyObject*)sub_node);
723
return (PyHamtNode *)ret;
724
}
725
726
assert(key != NULL);
727
/* key is not NULL. This means that we have only one other
728
key in this collection that matches our hash for this shift. */
729
730
int comp_err = PyObject_RichCompareBool(key, key_or_null, Py_EQ);
731
if (comp_err < 0) { /* exception in __eq__ */
732
return NULL;
733
}
734
if (comp_err == 1) { /* key == key_or_null */
735
if (val == val_or_node) {
736
/* we already have the same key/val pair; return self. */
737
return (PyHamtNode *)Py_NewRef(self);
738
}
739
740
/* We're setting a new value for the key we had before.
741
Make a new bitmap node with a replaced value, and return it. */
742
PyHamtNode_Bitmap *ret = hamt_node_bitmap_clone(self);
743
if (ret == NULL) {
744
return NULL;
745
}
746
Py_SETREF(ret->b_array[val_idx], Py_NewRef(val));
747
return (PyHamtNode *)ret;
748
}
749
750
/* It's a new key, and it has the same index as *one* another key.
751
We have a collision. We need to create a new node which will
752
combine the existing key and the key we're adding.
753
754
`hamt_node_new_bitmap_or_collision` will either create a new
755
Collision node if the keys have identical hashes, or
756
a new Bitmap node.
757
*/
758
PyHamtNode *sub_node = hamt_node_new_bitmap_or_collision(
759
shift + 5,
760
key_or_null, val_or_node, /* existing key/val */
761
hash,
762
key, val /* new key/val */
763
);
764
if (sub_node == NULL) {
765
return NULL;
766
}
767
768
PyHamtNode_Bitmap *ret = hamt_node_bitmap_clone(self);
769
if (ret == NULL) {
770
Py_DECREF(sub_node);
771
return NULL;
772
}
773
Py_SETREF(ret->b_array[key_idx], NULL);
774
Py_SETREF(ret->b_array[val_idx], (PyObject *)sub_node);
775
776
*added_leaf = 1;
777
return (PyHamtNode *)ret;
778
}
779
else {
780
/* There was no key before with the same (shift,hash). */
781
782
uint32_t n = (uint32_t)_Py_popcount32(self->b_bitmap);
783
784
if (n >= 16) {
785
/* When we have a situation where we want to store more
786
than 16 nodes at one level of the tree, we no longer
787
want to use the Bitmap node with bitmap encoding.
788
789
Instead we start using an Array node, which has
790
simpler (faster) implementation at the expense of
791
having preallocated 32 pointers for its keys/values
792
pairs.
793
794
Small hamt objects (<30 keys) usually don't have any
795
Array nodes at all. Between ~30 and ~400 keys hamt
796
objects usually have one Array node, and usually it's
797
a root node.
798
*/
799
800
uint32_t jdx = hamt_mask(hash, shift);
801
/* 'jdx' is the index of where the new key should be added
802
in the new Array node we're about to create. */
803
804
PyHamtNode *empty = NULL;
805
PyHamtNode_Array *new_node = NULL;
806
PyHamtNode *res = NULL;
807
808
/* Create a new Array node. */
809
new_node = (PyHamtNode_Array *)hamt_node_array_new(n + 1);
810
if (new_node == NULL) {
811
goto fin;
812
}
813
814
/* Create an empty bitmap node for the next
815
hamt_node_assoc call. */
816
empty = hamt_node_bitmap_new(0);
817
if (empty == NULL) {
818
goto fin;
819
}
820
821
/* Make a new bitmap node for the key/val we're adding.
822
Set that bitmap node to new-array-node[jdx]. */
823
new_node->a_array[jdx] = hamt_node_assoc(
824
empty, shift + 5, hash, key, val, added_leaf);
825
if (new_node->a_array[jdx] == NULL) {
826
goto fin;
827
}
828
829
/* Copy existing key/value pairs from the current Bitmap
830
node to the new Array node we've just created. */
831
Py_ssize_t i, j;
832
for (i = 0, j = 0; i < HAMT_ARRAY_NODE_SIZE; i++) {
833
if (((self->b_bitmap >> i) & 1) != 0) {
834
/* Ensure we don't accidentally override `jdx` element
835
we set few lines above.
836
*/
837
assert(new_node->a_array[i] == NULL);
838
839
if (self->b_array[j] == NULL) {
840
new_node->a_array[i] =
841
(PyHamtNode *)Py_NewRef(self->b_array[j + 1]);
842
}
843
else {
844
int32_t rehash = hamt_hash(self->b_array[j]);
845
if (rehash == -1) {
846
goto fin;
847
}
848
849
new_node->a_array[i] = hamt_node_assoc(
850
empty, shift + 5,
851
rehash,
852
self->b_array[j],
853
self->b_array[j + 1],
854
added_leaf);
855
856
if (new_node->a_array[i] == NULL) {
857
goto fin;
858
}
859
}
860
j += 2;
861
}
862
}
863
864
VALIDATE_ARRAY_NODE(new_node)
865
866
/* That's it! */
867
res = (PyHamtNode *)new_node;
868
869
fin:
870
Py_XDECREF(empty);
871
if (res == NULL) {
872
Py_XDECREF(new_node);
873
}
874
return res;
875
}
876
else {
877
/* We have less than 16 keys at this level; let's just
878
create a new bitmap node out of this node with the
879
new key/val pair added. */
880
881
uint32_t key_idx = 2 * idx;
882
uint32_t val_idx = key_idx + 1;
883
uint32_t i;
884
885
*added_leaf = 1;
886
887
/* Allocate new Bitmap node which can have one more key/val
888
pair in addition to what we have already. */
889
PyHamtNode_Bitmap *new_node =
890
(PyHamtNode_Bitmap *)hamt_node_bitmap_new(2 * (n + 1));
891
if (new_node == NULL) {
892
return NULL;
893
}
894
895
/* Copy all keys/values that will be before the new key/value
896
we are adding. */
897
for (i = 0; i < key_idx; i++) {
898
new_node->b_array[i] = Py_XNewRef(self->b_array[i]);
899
}
900
901
/* Set the new key/value to the new Bitmap node. */
902
new_node->b_array[key_idx] = Py_NewRef(key);
903
new_node->b_array[val_idx] = Py_NewRef(val);
904
905
/* Copy all keys/values that will be after the new key/value
906
we are adding. */
907
assert(Py_SIZE(self) >= 0 && Py_SIZE(self) <= 32);
908
for (i = key_idx; i < (uint32_t)Py_SIZE(self); i++) {
909
new_node->b_array[i + 2] = Py_XNewRef(self->b_array[i]);
910
}
911
912
new_node->b_bitmap = self->b_bitmap | bit;
913
return (PyHamtNode *)new_node;
914
}
915
}
916
}
917
918
static hamt_without_t
919
hamt_node_bitmap_without(PyHamtNode_Bitmap *self,
920
uint32_t shift, int32_t hash,
921
PyObject *key,
922
PyHamtNode **new_node)
923
{
924
uint32_t bit = hamt_bitpos(hash, shift);
925
if ((self->b_bitmap & bit) == 0) {
926
return W_NOT_FOUND;
927
}
928
929
uint32_t idx = hamt_bitindex(self->b_bitmap, bit);
930
931
uint32_t key_idx = 2 * idx;
932
uint32_t val_idx = key_idx + 1;
933
934
PyObject *key_or_null = self->b_array[key_idx];
935
PyObject *val_or_node = self->b_array[val_idx];
936
937
if (key_or_null == NULL) {
938
/* key == NULL means that 'value' is another tree node. */
939
940
PyHamtNode *sub_node = NULL;
941
942
hamt_without_t res = hamt_node_without(
943
(PyHamtNode *)val_or_node,
944
shift + 5, hash, key, &sub_node);
945
946
switch (res) {
947
case W_EMPTY:
948
/* It's impossible for us to receive a W_EMPTY here:
949
950
- Array nodes are converted to Bitmap nodes when
951
we delete 16th item from them;
952
953
- Collision nodes are converted to Bitmap when
954
there is one item in them;
955
956
- Bitmap node's without() inlines single-item
957
sub-nodes.
958
959
So in no situation we can have a single-item
960
Bitmap child of another Bitmap node.
961
*/
962
Py_UNREACHABLE();
963
964
case W_NEWNODE: {
965
assert(sub_node != NULL);
966
967
if (IS_BITMAP_NODE(sub_node)) {
968
PyHamtNode_Bitmap *sub_tree = (PyHamtNode_Bitmap *)sub_node;
969
if (hamt_node_bitmap_count(sub_tree) == 1 &&
970
sub_tree->b_array[0] != NULL)
971
{
972
/* A bitmap node with one key/value pair. Just
973
merge it into this node.
974
975
Note that we don't inline Bitmap nodes that
976
have a NULL key -- those nodes point to another
977
tree level, and we cannot simply move tree levels
978
up or down.
979
*/
980
981
PyHamtNode_Bitmap *clone = hamt_node_bitmap_clone(self);
982
if (clone == NULL) {
983
Py_DECREF(sub_node);
984
return W_ERROR;
985
}
986
987
PyObject *key = sub_tree->b_array[0];
988
PyObject *val = sub_tree->b_array[1];
989
990
Py_XSETREF(clone->b_array[key_idx], Py_NewRef(key));
991
Py_SETREF(clone->b_array[val_idx], Py_NewRef(val));
992
993
Py_DECREF(sub_tree);
994
995
*new_node = (PyHamtNode *)clone;
996
return W_NEWNODE;
997
}
998
}
999
1000
#ifdef Py_DEBUG
1001
/* Ensure that Collision.without implementation
1002
converts to Bitmap nodes itself.
1003
*/
1004
if (IS_COLLISION_NODE(sub_node)) {
1005
assert(hamt_node_collision_count(
1006
(PyHamtNode_Collision*)sub_node) > 1);
1007
}
1008
#endif
1009
1010
PyHamtNode_Bitmap *clone = hamt_node_bitmap_clone(self);
1011
if (clone == NULL) {
1012
return W_ERROR;
1013
}
1014
1015
Py_SETREF(clone->b_array[val_idx],
1016
(PyObject *)sub_node); /* borrow */
1017
1018
*new_node = (PyHamtNode *)clone;
1019
return W_NEWNODE;
1020
}
1021
1022
case W_ERROR:
1023
case W_NOT_FOUND:
1024
assert(sub_node == NULL);
1025
return res;
1026
1027
default:
1028
Py_UNREACHABLE();
1029
}
1030
}
1031
else {
1032
/* We have a regular key/value pair */
1033
1034
int cmp = PyObject_RichCompareBool(key_or_null, key, Py_EQ);
1035
if (cmp < 0) {
1036
return W_ERROR;
1037
}
1038
if (cmp == 0) {
1039
return W_NOT_FOUND;
1040
}
1041
1042
if (hamt_node_bitmap_count(self) == 1) {
1043
return W_EMPTY;
1044
}
1045
1046
*new_node = (PyHamtNode *)
1047
hamt_node_bitmap_clone_without(self, bit);
1048
if (*new_node == NULL) {
1049
return W_ERROR;
1050
}
1051
1052
return W_NEWNODE;
1053
}
1054
}
1055
1056
static hamt_find_t
1057
hamt_node_bitmap_find(PyHamtNode_Bitmap *self,
1058
uint32_t shift, int32_t hash,
1059
PyObject *key, PyObject **val)
1060
{
1061
/* Lookup a key in a Bitmap node. */
1062
1063
uint32_t bit = hamt_bitpos(hash, shift);
1064
uint32_t idx;
1065
uint32_t key_idx;
1066
uint32_t val_idx;
1067
PyObject *key_or_null;
1068
PyObject *val_or_node;
1069
int comp_err;
1070
1071
if ((self->b_bitmap & bit) == 0) {
1072
return F_NOT_FOUND;
1073
}
1074
1075
idx = hamt_bitindex(self->b_bitmap, bit);
1076
key_idx = idx * 2;
1077
val_idx = key_idx + 1;
1078
1079
assert(val_idx < (size_t)Py_SIZE(self));
1080
1081
key_or_null = self->b_array[key_idx];
1082
val_or_node = self->b_array[val_idx];
1083
1084
if (key_or_null == NULL) {
1085
/* There are a few keys that have the same hash at the current shift
1086
that match our key. Dispatch the lookup further down the tree. */
1087
assert(val_or_node != NULL);
1088
return hamt_node_find((PyHamtNode *)val_or_node,
1089
shift + 5, hash, key, val);
1090
}
1091
1092
/* We have only one key -- a potential match. Let's compare if the
1093
key we are looking at is equal to the key we are looking for. */
1094
assert(key != NULL);
1095
comp_err = PyObject_RichCompareBool(key, key_or_null, Py_EQ);
1096
if (comp_err < 0) { /* exception in __eq__ */
1097
return F_ERROR;
1098
}
1099
if (comp_err == 1) { /* key == key_or_null */
1100
*val = val_or_node;
1101
return F_FOUND;
1102
}
1103
1104
return F_NOT_FOUND;
1105
}
1106
1107
static int
1108
hamt_node_bitmap_traverse(PyHamtNode_Bitmap *self, visitproc visit, void *arg)
1109
{
1110
/* Bitmap's tp_traverse */
1111
1112
Py_ssize_t i;
1113
1114
for (i = Py_SIZE(self); --i >= 0; ) {
1115
Py_VISIT(self->b_array[i]);
1116
}
1117
1118
return 0;
1119
}
1120
1121
static void
1122
hamt_node_bitmap_dealloc(PyHamtNode_Bitmap *self)
1123
{
1124
/* Bitmap's tp_dealloc */
1125
1126
Py_ssize_t len = Py_SIZE(self);
1127
Py_ssize_t i;
1128
1129
if (Py_SIZE(self) == 0) {
1130
/* The empty node is statically allocated. */
1131
assert(self == &_Py_SINGLETON(hamt_bitmap_node_empty));
1132
#ifdef Py_DEBUG
1133
_Py_FatalRefcountError("deallocating the empty hamt node bitmap singleton");
1134
#else
1135
return;
1136
#endif
1137
}
1138
1139
PyObject_GC_UnTrack(self);
1140
Py_TRASHCAN_BEGIN(self, hamt_node_bitmap_dealloc)
1141
1142
if (len > 0) {
1143
i = len;
1144
while (--i >= 0) {
1145
Py_XDECREF(self->b_array[i]);
1146
}
1147
}
1148
1149
Py_TYPE(self)->tp_free((PyObject *)self);
1150
Py_TRASHCAN_END
1151
}
1152
1153
#ifdef Py_DEBUG
1154
static int
1155
hamt_node_bitmap_dump(PyHamtNode_Bitmap *node,
1156
_PyUnicodeWriter *writer, int level)
1157
{
1158
/* Debug build: __dump__() method implementation for Bitmap nodes. */
1159
1160
Py_ssize_t i;
1161
PyObject *tmp1;
1162
PyObject *tmp2;
1163
1164
if (_hamt_dump_ident(writer, level + 1)) {
1165
goto error;
1166
}
1167
1168
if (_hamt_dump_format(writer, "BitmapNode(size=%zd count=%zd ",
1169
Py_SIZE(node), Py_SIZE(node) / 2))
1170
{
1171
goto error;
1172
}
1173
1174
tmp1 = PyLong_FromUnsignedLong(node->b_bitmap);
1175
if (tmp1 == NULL) {
1176
goto error;
1177
}
1178
tmp2 = _PyLong_Format(tmp1, 2);
1179
Py_DECREF(tmp1);
1180
if (tmp2 == NULL) {
1181
goto error;
1182
}
1183
if (_hamt_dump_format(writer, "bitmap=%S id=%p):\n", tmp2, node)) {
1184
Py_DECREF(tmp2);
1185
goto error;
1186
}
1187
Py_DECREF(tmp2);
1188
1189
for (i = 0; i < Py_SIZE(node); i += 2) {
1190
PyObject *key_or_null = node->b_array[i];
1191
PyObject *val_or_node = node->b_array[i + 1];
1192
1193
if (_hamt_dump_ident(writer, level + 2)) {
1194
goto error;
1195
}
1196
1197
if (key_or_null == NULL) {
1198
if (_hamt_dump_format(writer, "NULL:\n")) {
1199
goto error;
1200
}
1201
1202
if (hamt_node_dump((PyHamtNode *)val_or_node,
1203
writer, level + 2))
1204
{
1205
goto error;
1206
}
1207
}
1208
else {
1209
if (_hamt_dump_format(writer, "%R: %R", key_or_null,
1210
val_or_node))
1211
{
1212
goto error;
1213
}
1214
}
1215
1216
if (_hamt_dump_format(writer, "\n")) {
1217
goto error;
1218
}
1219
}
1220
1221
return 0;
1222
error:
1223
return -1;
1224
}
1225
#endif /* Py_DEBUG */
1226
1227
1228
/////////////////////////////////// Collision Node
1229
1230
1231
static PyHamtNode *
1232
hamt_node_collision_new(int32_t hash, Py_ssize_t size)
1233
{
1234
/* Create a new Collision node. */
1235
1236
PyHamtNode_Collision *node;
1237
Py_ssize_t i;
1238
1239
assert(size >= 4);
1240
assert(size % 2 == 0);
1241
1242
node = PyObject_GC_NewVar(
1243
PyHamtNode_Collision, &_PyHamt_CollisionNode_Type, size);
1244
if (node == NULL) {
1245
return NULL;
1246
}
1247
1248
for (i = 0; i < size; i++) {
1249
node->c_array[i] = NULL;
1250
}
1251
1252
Py_SET_SIZE(node, size);
1253
node->c_hash = hash;
1254
1255
_PyObject_GC_TRACK(node);
1256
1257
return (PyHamtNode *)node;
1258
}
1259
1260
static hamt_find_t
1261
hamt_node_collision_find_index(PyHamtNode_Collision *self, PyObject *key,
1262
Py_ssize_t *idx)
1263
{
1264
/* Lookup `key` in the Collision node `self`. Set the index of the
1265
found key to 'idx'. */
1266
1267
Py_ssize_t i;
1268
PyObject *el;
1269
1270
for (i = 0; i < Py_SIZE(self); i += 2) {
1271
el = self->c_array[i];
1272
1273
assert(el != NULL);
1274
int cmp = PyObject_RichCompareBool(key, el, Py_EQ);
1275
if (cmp < 0) {
1276
return F_ERROR;
1277
}
1278
if (cmp == 1) {
1279
*idx = i;
1280
return F_FOUND;
1281
}
1282
}
1283
1284
return F_NOT_FOUND;
1285
}
1286
1287
static PyHamtNode *
1288
hamt_node_collision_assoc(PyHamtNode_Collision *self,
1289
uint32_t shift, int32_t hash,
1290
PyObject *key, PyObject *val, int* added_leaf)
1291
{
1292
/* Set a new key to this level (currently a Collision node)
1293
of the tree. */
1294
1295
if (hash == self->c_hash) {
1296
/* The hash of the 'key' we are adding matches the hash of
1297
other keys in this Collision node. */
1298
1299
Py_ssize_t key_idx = -1;
1300
hamt_find_t found;
1301
PyHamtNode_Collision *new_node;
1302
Py_ssize_t i;
1303
1304
/* Let's try to lookup the new 'key', maybe we already have it. */
1305
found = hamt_node_collision_find_index(self, key, &key_idx);
1306
switch (found) {
1307
case F_ERROR:
1308
/* Exception. */
1309
return NULL;
1310
1311
case F_NOT_FOUND:
1312
/* This is a totally new key. Clone the current node,
1313
add a new key/value to the cloned node. */
1314
1315
new_node = (PyHamtNode_Collision *)hamt_node_collision_new(
1316
self->c_hash, Py_SIZE(self) + 2);
1317
if (new_node == NULL) {
1318
return NULL;
1319
}
1320
1321
for (i = 0; i < Py_SIZE(self); i++) {
1322
new_node->c_array[i] = Py_NewRef(self->c_array[i]);
1323
}
1324
1325
new_node->c_array[i] = Py_NewRef(key);
1326
new_node->c_array[i + 1] = Py_NewRef(val);
1327
1328
*added_leaf = 1;
1329
return (PyHamtNode *)new_node;
1330
1331
case F_FOUND:
1332
/* There's a key which is equal to the key we are adding. */
1333
1334
assert(key_idx >= 0);
1335
assert(key_idx < Py_SIZE(self));
1336
Py_ssize_t val_idx = key_idx + 1;
1337
1338
if (self->c_array[val_idx] == val) {
1339
/* We're setting a key/value pair that's already set. */
1340
return (PyHamtNode *)Py_NewRef(self);
1341
}
1342
1343
/* We need to replace old value for the key
1344
with a new value. Create a new Collision node.*/
1345
new_node = (PyHamtNode_Collision *)hamt_node_collision_new(
1346
self->c_hash, Py_SIZE(self));
1347
if (new_node == NULL) {
1348
return NULL;
1349
}
1350
1351
/* Copy all elements of the old node to the new one. */
1352
for (i = 0; i < Py_SIZE(self); i++) {
1353
new_node->c_array[i] = Py_NewRef(self->c_array[i]);
1354
}
1355
1356
/* Replace the old value with the new value for the our key. */
1357
Py_SETREF(new_node->c_array[val_idx], Py_NewRef(val));
1358
1359
return (PyHamtNode *)new_node;
1360
1361
default:
1362
Py_UNREACHABLE();
1363
}
1364
}
1365
else {
1366
/* The hash of the new key is different from the hash that
1367
all keys of this Collision node have.
1368
1369
Create a Bitmap node inplace with two children:
1370
key/value pair that we're adding, and the Collision node
1371
we're replacing on this tree level.
1372
*/
1373
1374
PyHamtNode_Bitmap *new_node;
1375
PyHamtNode *assoc_res;
1376
1377
new_node = (PyHamtNode_Bitmap *)hamt_node_bitmap_new(2);
1378
if (new_node == NULL) {
1379
return NULL;
1380
}
1381
new_node->b_bitmap = hamt_bitpos(self->c_hash, shift);
1382
new_node->b_array[1] = Py_NewRef(self);
1383
1384
assoc_res = hamt_node_bitmap_assoc(
1385
new_node, shift, hash, key, val, added_leaf);
1386
Py_DECREF(new_node);
1387
return assoc_res;
1388
}
1389
}
1390
1391
static inline Py_ssize_t
1392
hamt_node_collision_count(PyHamtNode_Collision *node)
1393
{
1394
return Py_SIZE(node) / 2;
1395
}
1396
1397
static hamt_without_t
1398
hamt_node_collision_without(PyHamtNode_Collision *self,
1399
uint32_t shift, int32_t hash,
1400
PyObject *key,
1401
PyHamtNode **new_node)
1402
{
1403
if (hash != self->c_hash) {
1404
return W_NOT_FOUND;
1405
}
1406
1407
Py_ssize_t key_idx = -1;
1408
hamt_find_t found = hamt_node_collision_find_index(self, key, &key_idx);
1409
1410
switch (found) {
1411
case F_ERROR:
1412
return W_ERROR;
1413
1414
case F_NOT_FOUND:
1415
return W_NOT_FOUND;
1416
1417
case F_FOUND:
1418
assert(key_idx >= 0);
1419
assert(key_idx < Py_SIZE(self));
1420
1421
Py_ssize_t new_count = hamt_node_collision_count(self) - 1;
1422
1423
if (new_count == 0) {
1424
/* The node has only one key/value pair and it's for the
1425
key we're trying to delete. So a new node will be empty
1426
after the removal.
1427
*/
1428
return W_EMPTY;
1429
}
1430
1431
if (new_count == 1) {
1432
/* The node has two keys, and after deletion the
1433
new Collision node would have one. Collision nodes
1434
with one key shouldn't exist, so convert it to a
1435
Bitmap node.
1436
*/
1437
PyHamtNode_Bitmap *node = (PyHamtNode_Bitmap *)
1438
hamt_node_bitmap_new(2);
1439
if (node == NULL) {
1440
return W_ERROR;
1441
}
1442
1443
if (key_idx == 0) {
1444
node->b_array[0] = Py_NewRef(self->c_array[2]);
1445
node->b_array[1] = Py_NewRef(self->c_array[3]);
1446
}
1447
else {
1448
assert(key_idx == 2);
1449
node->b_array[0] = Py_NewRef(self->c_array[0]);
1450
node->b_array[1] = Py_NewRef(self->c_array[1]);
1451
}
1452
1453
node->b_bitmap = hamt_bitpos(hash, shift);
1454
1455
*new_node = (PyHamtNode *)node;
1456
return W_NEWNODE;
1457
}
1458
1459
/* Allocate a new Collision node with capacity for one
1460
less key/value pair */
1461
PyHamtNode_Collision *new = (PyHamtNode_Collision *)
1462
hamt_node_collision_new(
1463
self->c_hash, Py_SIZE(self) - 2);
1464
if (new == NULL) {
1465
return W_ERROR;
1466
}
1467
1468
/* Copy all other keys from `self` to `new` */
1469
Py_ssize_t i;
1470
for (i = 0; i < key_idx; i++) {
1471
new->c_array[i] = Py_NewRef(self->c_array[i]);
1472
}
1473
for (i = key_idx + 2; i < Py_SIZE(self); i++) {
1474
new->c_array[i - 2] = Py_NewRef(self->c_array[i]);
1475
}
1476
1477
*new_node = (PyHamtNode*)new;
1478
return W_NEWNODE;
1479
1480
default:
1481
Py_UNREACHABLE();
1482
}
1483
}
1484
1485
static hamt_find_t
1486
hamt_node_collision_find(PyHamtNode_Collision *self,
1487
uint32_t shift, int32_t hash,
1488
PyObject *key, PyObject **val)
1489
{
1490
/* Lookup `key` in the Collision node `self`. Set the value
1491
for the found key to 'val'. */
1492
1493
Py_ssize_t idx = -1;
1494
hamt_find_t res;
1495
1496
res = hamt_node_collision_find_index(self, key, &idx);
1497
if (res == F_ERROR || res == F_NOT_FOUND) {
1498
return res;
1499
}
1500
1501
assert(idx >= 0);
1502
assert(idx + 1 < Py_SIZE(self));
1503
1504
*val = self->c_array[idx + 1];
1505
assert(*val != NULL);
1506
1507
return F_FOUND;
1508
}
1509
1510
1511
static int
1512
hamt_node_collision_traverse(PyHamtNode_Collision *self,
1513
visitproc visit, void *arg)
1514
{
1515
/* Collision's tp_traverse */
1516
1517
Py_ssize_t i;
1518
1519
for (i = Py_SIZE(self); --i >= 0; ) {
1520
Py_VISIT(self->c_array[i]);
1521
}
1522
1523
return 0;
1524
}
1525
1526
static void
1527
hamt_node_collision_dealloc(PyHamtNode_Collision *self)
1528
{
1529
/* Collision's tp_dealloc */
1530
1531
Py_ssize_t len = Py_SIZE(self);
1532
1533
PyObject_GC_UnTrack(self);
1534
Py_TRASHCAN_BEGIN(self, hamt_node_collision_dealloc)
1535
1536
if (len > 0) {
1537
1538
while (--len >= 0) {
1539
Py_XDECREF(self->c_array[len]);
1540
}
1541
}
1542
1543
Py_TYPE(self)->tp_free((PyObject *)self);
1544
Py_TRASHCAN_END
1545
}
1546
1547
#ifdef Py_DEBUG
1548
static int
1549
hamt_node_collision_dump(PyHamtNode_Collision *node,
1550
_PyUnicodeWriter *writer, int level)
1551
{
1552
/* Debug build: __dump__() method implementation for Collision nodes. */
1553
1554
Py_ssize_t i;
1555
1556
if (_hamt_dump_ident(writer, level + 1)) {
1557
goto error;
1558
}
1559
1560
if (_hamt_dump_format(writer, "CollisionNode(size=%zd id=%p):\n",
1561
Py_SIZE(node), node))
1562
{
1563
goto error;
1564
}
1565
1566
for (i = 0; i < Py_SIZE(node); i += 2) {
1567
PyObject *key = node->c_array[i];
1568
PyObject *val = node->c_array[i + 1];
1569
1570
if (_hamt_dump_ident(writer, level + 2)) {
1571
goto error;
1572
}
1573
1574
if (_hamt_dump_format(writer, "%R: %R\n", key, val)) {
1575
goto error;
1576
}
1577
}
1578
1579
return 0;
1580
error:
1581
return -1;
1582
}
1583
#endif /* Py_DEBUG */
1584
1585
1586
/////////////////////////////////// Array Node
1587
1588
1589
static PyHamtNode *
1590
hamt_node_array_new(Py_ssize_t count)
1591
{
1592
Py_ssize_t i;
1593
1594
PyHamtNode_Array *node = PyObject_GC_New(
1595
PyHamtNode_Array, &_PyHamt_ArrayNode_Type);
1596
if (node == NULL) {
1597
return NULL;
1598
}
1599
1600
for (i = 0; i < HAMT_ARRAY_NODE_SIZE; i++) {
1601
node->a_array[i] = NULL;
1602
}
1603
1604
node->a_count = count;
1605
1606
_PyObject_GC_TRACK(node);
1607
return (PyHamtNode *)node;
1608
}
1609
1610
static PyHamtNode_Array *
1611
hamt_node_array_clone(PyHamtNode_Array *node)
1612
{
1613
PyHamtNode_Array *clone;
1614
Py_ssize_t i;
1615
1616
VALIDATE_ARRAY_NODE(node)
1617
1618
/* Create a new Array node. */
1619
clone = (PyHamtNode_Array *)hamt_node_array_new(node->a_count);
1620
if (clone == NULL) {
1621
return NULL;
1622
}
1623
1624
/* Copy all elements from the current Array node to the new one. */
1625
for (i = 0; i < HAMT_ARRAY_NODE_SIZE; i++) {
1626
clone->a_array[i] = (PyHamtNode*)Py_XNewRef(node->a_array[i]);
1627
}
1628
1629
VALIDATE_ARRAY_NODE(clone)
1630
return clone;
1631
}
1632
1633
static PyHamtNode *
1634
hamt_node_array_assoc(PyHamtNode_Array *self,
1635
uint32_t shift, int32_t hash,
1636
PyObject *key, PyObject *val, int* added_leaf)
1637
{
1638
/* Set a new key to this level (currently a Collision node)
1639
of the tree.
1640
1641
Array nodes don't store values, they can only point to
1642
other nodes. They are simple arrays of 32 BaseNode pointers/
1643
*/
1644
1645
uint32_t idx = hamt_mask(hash, shift);
1646
PyHamtNode *node = self->a_array[idx];
1647
PyHamtNode *child_node;
1648
PyHamtNode_Array *new_node;
1649
Py_ssize_t i;
1650
1651
if (node == NULL) {
1652
/* There's no child node for the given hash. Create a new
1653
Bitmap node for this key. */
1654
1655
PyHamtNode_Bitmap *empty = NULL;
1656
1657
/* Get an empty Bitmap node to work with. */
1658
empty = (PyHamtNode_Bitmap *)hamt_node_bitmap_new(0);
1659
if (empty == NULL) {
1660
return NULL;
1661
}
1662
1663
/* Set key/val to the newly created empty Bitmap, thus
1664
creating a new Bitmap node with our key/value pair. */
1665
child_node = hamt_node_bitmap_assoc(
1666
empty,
1667
shift + 5, hash, key, val, added_leaf);
1668
Py_DECREF(empty);
1669
if (child_node == NULL) {
1670
return NULL;
1671
}
1672
1673
/* Create a new Array node. */
1674
new_node = (PyHamtNode_Array *)hamt_node_array_new(self->a_count + 1);
1675
if (new_node == NULL) {
1676
Py_DECREF(child_node);
1677
return NULL;
1678
}
1679
1680
/* Copy all elements from the current Array node to the
1681
new one. */
1682
for (i = 0; i < HAMT_ARRAY_NODE_SIZE; i++) {
1683
new_node->a_array[i] = (PyHamtNode*)Py_XNewRef(self->a_array[i]);
1684
}
1685
1686
assert(new_node->a_array[idx] == NULL);
1687
new_node->a_array[idx] = child_node; /* borrow */
1688
VALIDATE_ARRAY_NODE(new_node)
1689
}
1690
else {
1691
/* There's a child node for the given hash.
1692
Set the key to it./ */
1693
child_node = hamt_node_assoc(
1694
node, shift + 5, hash, key, val, added_leaf);
1695
if (child_node == NULL) {
1696
return NULL;
1697
}
1698
else if (child_node == (PyHamtNode *)self) {
1699
Py_DECREF(child_node);
1700
return (PyHamtNode *)self;
1701
}
1702
1703
new_node = hamt_node_array_clone(self);
1704
if (new_node == NULL) {
1705
Py_DECREF(child_node);
1706
return NULL;
1707
}
1708
1709
Py_SETREF(new_node->a_array[idx], child_node); /* borrow */
1710
VALIDATE_ARRAY_NODE(new_node)
1711
}
1712
1713
return (PyHamtNode *)new_node;
1714
}
1715
1716
static hamt_without_t
1717
hamt_node_array_without(PyHamtNode_Array *self,
1718
uint32_t shift, int32_t hash,
1719
PyObject *key,
1720
PyHamtNode **new_node)
1721
{
1722
uint32_t idx = hamt_mask(hash, shift);
1723
PyHamtNode *node = self->a_array[idx];
1724
1725
if (node == NULL) {
1726
return W_NOT_FOUND;
1727
}
1728
1729
PyHamtNode *sub_node = NULL;
1730
hamt_without_t res = hamt_node_without(
1731
(PyHamtNode *)node,
1732
shift + 5, hash, key, &sub_node);
1733
1734
switch (res) {
1735
case W_NOT_FOUND:
1736
case W_ERROR:
1737
assert(sub_node == NULL);
1738
return res;
1739
1740
case W_NEWNODE: {
1741
/* We need to replace a node at the `idx` index.
1742
Clone this node and replace.
1743
*/
1744
assert(sub_node != NULL);
1745
1746
PyHamtNode_Array *clone = hamt_node_array_clone(self);
1747
if (clone == NULL) {
1748
Py_DECREF(sub_node);
1749
return W_ERROR;
1750
}
1751
1752
Py_SETREF(clone->a_array[idx], sub_node); /* borrow */
1753
*new_node = (PyHamtNode*)clone; /* borrow */
1754
return W_NEWNODE;
1755
}
1756
1757
case W_EMPTY: {
1758
assert(sub_node == NULL);
1759
/* We need to remove a node at the `idx` index.
1760
Calculate the size of the replacement Array node.
1761
*/
1762
Py_ssize_t new_count = self->a_count - 1;
1763
1764
if (new_count == 0) {
1765
return W_EMPTY;
1766
}
1767
1768
if (new_count >= 16) {
1769
/* We convert Bitmap nodes to Array nodes, when a
1770
Bitmap node needs to store more than 15 key/value
1771
pairs. So we will create a new Array node if we
1772
the number of key/values after deletion is still
1773
greater than 15.
1774
*/
1775
1776
PyHamtNode_Array *new = hamt_node_array_clone(self);
1777
if (new == NULL) {
1778
return W_ERROR;
1779
}
1780
new->a_count = new_count;
1781
Py_CLEAR(new->a_array[idx]);
1782
1783
*new_node = (PyHamtNode*)new; /* borrow */
1784
return W_NEWNODE;
1785
}
1786
1787
/* New Array node would have less than 16 key/value
1788
pairs. We need to create a replacement Bitmap node. */
1789
1790
Py_ssize_t bitmap_size = new_count * 2;
1791
uint32_t bitmap = 0;
1792
1793
PyHamtNode_Bitmap *new = (PyHamtNode_Bitmap *)
1794
hamt_node_bitmap_new(bitmap_size);
1795
if (new == NULL) {
1796
return W_ERROR;
1797
}
1798
1799
Py_ssize_t new_i = 0;
1800
for (uint32_t i = 0; i < HAMT_ARRAY_NODE_SIZE; i++) {
1801
if (i == idx) {
1802
/* Skip the node we are deleting. */
1803
continue;
1804
}
1805
1806
PyHamtNode *node = self->a_array[i];
1807
if (node == NULL) {
1808
/* Skip any missing nodes. */
1809
continue;
1810
}
1811
1812
bitmap |= 1U << i;
1813
1814
if (IS_BITMAP_NODE(node)) {
1815
PyHamtNode_Bitmap *child = (PyHamtNode_Bitmap *)node;
1816
1817
if (hamt_node_bitmap_count(child) == 1 &&
1818
child->b_array[0] != NULL)
1819
{
1820
/* node is a Bitmap with one key/value pair, just
1821
merge it into the new Bitmap node we're building.
1822
1823
Note that we don't inline Bitmap nodes that
1824
have a NULL key -- those nodes point to another
1825
tree level, and we cannot simply move tree levels
1826
up or down.
1827
*/
1828
PyObject *key = child->b_array[0];
1829
PyObject *val = child->b_array[1];
1830
1831
new->b_array[new_i] = Py_NewRef(key);
1832
new->b_array[new_i + 1] = Py_NewRef(val);
1833
}
1834
else {
1835
new->b_array[new_i] = NULL;
1836
new->b_array[new_i + 1] = Py_NewRef(node);
1837
}
1838
}
1839
else {
1840
1841
#ifdef Py_DEBUG
1842
if (IS_COLLISION_NODE(node)) {
1843
Py_ssize_t child_count = hamt_node_collision_count(
1844
(PyHamtNode_Collision*)node);
1845
assert(child_count > 1);
1846
}
1847
else if (IS_ARRAY_NODE(node)) {
1848
assert(((PyHamtNode_Array*)node)->a_count >= 16);
1849
}
1850
#endif
1851
1852
/* Just copy the node into our new Bitmap */
1853
new->b_array[new_i] = NULL;
1854
new->b_array[new_i + 1] = Py_NewRef(node);
1855
}
1856
1857
new_i += 2;
1858
}
1859
1860
new->b_bitmap = bitmap;
1861
*new_node = (PyHamtNode*)new; /* borrow */
1862
return W_NEWNODE;
1863
}
1864
1865
default:
1866
Py_UNREACHABLE();
1867
}
1868
}
1869
1870
static hamt_find_t
1871
hamt_node_array_find(PyHamtNode_Array *self,
1872
uint32_t shift, int32_t hash,
1873
PyObject *key, PyObject **val)
1874
{
1875
/* Lookup `key` in the Array node `self`. Set the value
1876
for the found key to 'val'. */
1877
1878
uint32_t idx = hamt_mask(hash, shift);
1879
PyHamtNode *node;
1880
1881
node = self->a_array[idx];
1882
if (node == NULL) {
1883
return F_NOT_FOUND;
1884
}
1885
1886
/* Dispatch to the generic hamt_node_find */
1887
return hamt_node_find(node, shift + 5, hash, key, val);
1888
}
1889
1890
static int
1891
hamt_node_array_traverse(PyHamtNode_Array *self,
1892
visitproc visit, void *arg)
1893
{
1894
/* Array's tp_traverse */
1895
1896
Py_ssize_t i;
1897
1898
for (i = 0; i < HAMT_ARRAY_NODE_SIZE; i++) {
1899
Py_VISIT(self->a_array[i]);
1900
}
1901
1902
return 0;
1903
}
1904
1905
static void
1906
hamt_node_array_dealloc(PyHamtNode_Array *self)
1907
{
1908
/* Array's tp_dealloc */
1909
1910
Py_ssize_t i;
1911
1912
PyObject_GC_UnTrack(self);
1913
Py_TRASHCAN_BEGIN(self, hamt_node_array_dealloc)
1914
1915
for (i = 0; i < HAMT_ARRAY_NODE_SIZE; i++) {
1916
Py_XDECREF(self->a_array[i]);
1917
}
1918
1919
Py_TYPE(self)->tp_free((PyObject *)self);
1920
Py_TRASHCAN_END
1921
}
1922
1923
#ifdef Py_DEBUG
1924
static int
1925
hamt_node_array_dump(PyHamtNode_Array *node,
1926
_PyUnicodeWriter *writer, int level)
1927
{
1928
/* Debug build: __dump__() method implementation for Array nodes. */
1929
1930
Py_ssize_t i;
1931
1932
if (_hamt_dump_ident(writer, level + 1)) {
1933
goto error;
1934
}
1935
1936
if (_hamt_dump_format(writer, "ArrayNode(id=%p):\n", node)) {
1937
goto error;
1938
}
1939
1940
for (i = 0; i < HAMT_ARRAY_NODE_SIZE; i++) {
1941
if (node->a_array[i] == NULL) {
1942
continue;
1943
}
1944
1945
if (_hamt_dump_ident(writer, level + 2)) {
1946
goto error;
1947
}
1948
1949
if (_hamt_dump_format(writer, "%zd::\n", i)) {
1950
goto error;
1951
}
1952
1953
if (hamt_node_dump(node->a_array[i], writer, level + 1)) {
1954
goto error;
1955
}
1956
1957
if (_hamt_dump_format(writer, "\n")) {
1958
goto error;
1959
}
1960
}
1961
1962
return 0;
1963
error:
1964
return -1;
1965
}
1966
#endif /* Py_DEBUG */
1967
1968
1969
/////////////////////////////////// Node Dispatch
1970
1971
1972
static PyHamtNode *
1973
hamt_node_assoc(PyHamtNode *node,
1974
uint32_t shift, int32_t hash,
1975
PyObject *key, PyObject *val, int* added_leaf)
1976
{
1977
/* Set key/value to the 'node' starting with the given shift/hash.
1978
Return a new node, or the same node if key/value already
1979
set.
1980
1981
added_leaf will be set to 1 if key/value wasn't in the
1982
tree before.
1983
1984
This method automatically dispatches to the suitable
1985
hamt_node_{nodetype}_assoc method.
1986
*/
1987
1988
if (IS_BITMAP_NODE(node)) {
1989
return hamt_node_bitmap_assoc(
1990
(PyHamtNode_Bitmap *)node,
1991
shift, hash, key, val, added_leaf);
1992
}
1993
else if (IS_ARRAY_NODE(node)) {
1994
return hamt_node_array_assoc(
1995
(PyHamtNode_Array *)node,
1996
shift, hash, key, val, added_leaf);
1997
}
1998
else {
1999
assert(IS_COLLISION_NODE(node));
2000
return hamt_node_collision_assoc(
2001
(PyHamtNode_Collision *)node,
2002
shift, hash, key, val, added_leaf);
2003
}
2004
}
2005
2006
static hamt_without_t
2007
hamt_node_without(PyHamtNode *node,
2008
uint32_t shift, int32_t hash,
2009
PyObject *key,
2010
PyHamtNode **new_node)
2011
{
2012
if (IS_BITMAP_NODE(node)) {
2013
return hamt_node_bitmap_without(
2014
(PyHamtNode_Bitmap *)node,
2015
shift, hash, key,
2016
new_node);
2017
}
2018
else if (IS_ARRAY_NODE(node)) {
2019
return hamt_node_array_without(
2020
(PyHamtNode_Array *)node,
2021
shift, hash, key,
2022
new_node);
2023
}
2024
else {
2025
assert(IS_COLLISION_NODE(node));
2026
return hamt_node_collision_without(
2027
(PyHamtNode_Collision *)node,
2028
shift, hash, key,
2029
new_node);
2030
}
2031
}
2032
2033
static hamt_find_t
2034
hamt_node_find(PyHamtNode *node,
2035
uint32_t shift, int32_t hash,
2036
PyObject *key, PyObject **val)
2037
{
2038
/* Find the key in the node starting with the given shift/hash.
2039
2040
If a value is found, the result will be set to F_FOUND, and
2041
*val will point to the found value object.
2042
2043
If a value wasn't found, the result will be set to F_NOT_FOUND.
2044
2045
If an exception occurs during the call, the result will be F_ERROR.
2046
2047
This method automatically dispatches to the suitable
2048
hamt_node_{nodetype}_find method.
2049
*/
2050
2051
if (IS_BITMAP_NODE(node)) {
2052
return hamt_node_bitmap_find(
2053
(PyHamtNode_Bitmap *)node,
2054
shift, hash, key, val);
2055
2056
}
2057
else if (IS_ARRAY_NODE(node)) {
2058
return hamt_node_array_find(
2059
(PyHamtNode_Array *)node,
2060
shift, hash, key, val);
2061
}
2062
else {
2063
assert(IS_COLLISION_NODE(node));
2064
return hamt_node_collision_find(
2065
(PyHamtNode_Collision *)node,
2066
shift, hash, key, val);
2067
}
2068
}
2069
2070
#ifdef Py_DEBUG
2071
static int
2072
hamt_node_dump(PyHamtNode *node,
2073
_PyUnicodeWriter *writer, int level)
2074
{
2075
/* Debug build: __dump__() method implementation for a node.
2076
2077
This method automatically dispatches to the suitable
2078
hamt_node_{nodetype})_dump method.
2079
*/
2080
2081
if (IS_BITMAP_NODE(node)) {
2082
return hamt_node_bitmap_dump(
2083
(PyHamtNode_Bitmap *)node, writer, level);
2084
}
2085
else if (IS_ARRAY_NODE(node)) {
2086
return hamt_node_array_dump(
2087
(PyHamtNode_Array *)node, writer, level);
2088
}
2089
else {
2090
assert(IS_COLLISION_NODE(node));
2091
return hamt_node_collision_dump(
2092
(PyHamtNode_Collision *)node, writer, level);
2093
}
2094
}
2095
#endif /* Py_DEBUG */
2096
2097
2098
/////////////////////////////////// Iterators: Machinery
2099
2100
2101
static hamt_iter_t
2102
hamt_iterator_next(PyHamtIteratorState *iter, PyObject **key, PyObject **val);
2103
2104
2105
static void
2106
hamt_iterator_init(PyHamtIteratorState *iter, PyHamtNode *root)
2107
{
2108
for (uint32_t i = 0; i < _Py_HAMT_MAX_TREE_DEPTH; i++) {
2109
iter->i_nodes[i] = NULL;
2110
iter->i_pos[i] = 0;
2111
}
2112
2113
iter->i_level = 0;
2114
2115
/* Note: we don't incref/decref nodes in i_nodes. */
2116
iter->i_nodes[0] = root;
2117
}
2118
2119
static hamt_iter_t
2120
hamt_iterator_bitmap_next(PyHamtIteratorState *iter,
2121
PyObject **key, PyObject **val)
2122
{
2123
int8_t level = iter->i_level;
2124
2125
PyHamtNode_Bitmap *node = (PyHamtNode_Bitmap *)(iter->i_nodes[level]);
2126
Py_ssize_t pos = iter->i_pos[level];
2127
2128
if (pos + 1 >= Py_SIZE(node)) {
2129
#ifdef Py_DEBUG
2130
assert(iter->i_level >= 0);
2131
iter->i_nodes[iter->i_level] = NULL;
2132
#endif
2133
iter->i_level--;
2134
return hamt_iterator_next(iter, key, val);
2135
}
2136
2137
if (node->b_array[pos] == NULL) {
2138
iter->i_pos[level] = pos + 2;
2139
2140
int8_t next_level = level + 1;
2141
assert(next_level < _Py_HAMT_MAX_TREE_DEPTH);
2142
iter->i_level = next_level;
2143
iter->i_pos[next_level] = 0;
2144
iter->i_nodes[next_level] = (PyHamtNode *)
2145
node->b_array[pos + 1];
2146
2147
return hamt_iterator_next(iter, key, val);
2148
}
2149
2150
*key = node->b_array[pos];
2151
*val = node->b_array[pos + 1];
2152
iter->i_pos[level] = pos + 2;
2153
return I_ITEM;
2154
}
2155
2156
static hamt_iter_t
2157
hamt_iterator_collision_next(PyHamtIteratorState *iter,
2158
PyObject **key, PyObject **val)
2159
{
2160
int8_t level = iter->i_level;
2161
2162
PyHamtNode_Collision *node = (PyHamtNode_Collision *)(iter->i_nodes[level]);
2163
Py_ssize_t pos = iter->i_pos[level];
2164
2165
if (pos + 1 >= Py_SIZE(node)) {
2166
#ifdef Py_DEBUG
2167
assert(iter->i_level >= 0);
2168
iter->i_nodes[iter->i_level] = NULL;
2169
#endif
2170
iter->i_level--;
2171
return hamt_iterator_next(iter, key, val);
2172
}
2173
2174
*key = node->c_array[pos];
2175
*val = node->c_array[pos + 1];
2176
iter->i_pos[level] = pos + 2;
2177
return I_ITEM;
2178
}
2179
2180
static hamt_iter_t
2181
hamt_iterator_array_next(PyHamtIteratorState *iter,
2182
PyObject **key, PyObject **val)
2183
{
2184
int8_t level = iter->i_level;
2185
2186
PyHamtNode_Array *node = (PyHamtNode_Array *)(iter->i_nodes[level]);
2187
Py_ssize_t pos = iter->i_pos[level];
2188
2189
if (pos >= HAMT_ARRAY_NODE_SIZE) {
2190
#ifdef Py_DEBUG
2191
assert(iter->i_level >= 0);
2192
iter->i_nodes[iter->i_level] = NULL;
2193
#endif
2194
iter->i_level--;
2195
return hamt_iterator_next(iter, key, val);
2196
}
2197
2198
for (Py_ssize_t i = pos; i < HAMT_ARRAY_NODE_SIZE; i++) {
2199
if (node->a_array[i] != NULL) {
2200
iter->i_pos[level] = i + 1;
2201
2202
int8_t next_level = level + 1;
2203
assert(next_level < _Py_HAMT_MAX_TREE_DEPTH);
2204
iter->i_pos[next_level] = 0;
2205
iter->i_nodes[next_level] = node->a_array[i];
2206
iter->i_level = next_level;
2207
2208
return hamt_iterator_next(iter, key, val);
2209
}
2210
}
2211
2212
#ifdef Py_DEBUG
2213
assert(iter->i_level >= 0);
2214
iter->i_nodes[iter->i_level] = NULL;
2215
#endif
2216
2217
iter->i_level--;
2218
return hamt_iterator_next(iter, key, val);
2219
}
2220
2221
static hamt_iter_t
2222
hamt_iterator_next(PyHamtIteratorState *iter, PyObject **key, PyObject **val)
2223
{
2224
if (iter->i_level < 0) {
2225
return I_END;
2226
}
2227
2228
assert(iter->i_level < _Py_HAMT_MAX_TREE_DEPTH);
2229
2230
PyHamtNode *current = iter->i_nodes[iter->i_level];
2231
2232
if (IS_BITMAP_NODE(current)) {
2233
return hamt_iterator_bitmap_next(iter, key, val);
2234
}
2235
else if (IS_ARRAY_NODE(current)) {
2236
return hamt_iterator_array_next(iter, key, val);
2237
}
2238
else {
2239
assert(IS_COLLISION_NODE(current));
2240
return hamt_iterator_collision_next(iter, key, val);
2241
}
2242
}
2243
2244
2245
/////////////////////////////////// HAMT high-level functions
2246
2247
2248
PyHamtObject *
2249
_PyHamt_Assoc(PyHamtObject *o, PyObject *key, PyObject *val)
2250
{
2251
int32_t key_hash;
2252
int added_leaf = 0;
2253
PyHamtNode *new_root;
2254
PyHamtObject *new_o;
2255
2256
key_hash = hamt_hash(key);
2257
if (key_hash == -1) {
2258
return NULL;
2259
}
2260
2261
new_root = hamt_node_assoc(
2262
(PyHamtNode *)(o->h_root),
2263
0, key_hash, key, val, &added_leaf);
2264
if (new_root == NULL) {
2265
return NULL;
2266
}
2267
2268
if (new_root == o->h_root) {
2269
Py_DECREF(new_root);
2270
return (PyHamtObject*)Py_NewRef(o);
2271
}
2272
2273
new_o = hamt_alloc();
2274
if (new_o == NULL) {
2275
Py_DECREF(new_root);
2276
return NULL;
2277
}
2278
2279
new_o->h_root = new_root; /* borrow */
2280
new_o->h_count = added_leaf ? o->h_count + 1 : o->h_count;
2281
2282
return new_o;
2283
}
2284
2285
PyHamtObject *
2286
_PyHamt_Without(PyHamtObject *o, PyObject *key)
2287
{
2288
int32_t key_hash = hamt_hash(key);
2289
if (key_hash == -1) {
2290
return NULL;
2291
}
2292
2293
PyHamtNode *new_root = NULL;
2294
2295
hamt_without_t res = hamt_node_without(
2296
(PyHamtNode *)(o->h_root),
2297
0, key_hash, key,
2298
&new_root);
2299
2300
switch (res) {
2301
case W_ERROR:
2302
return NULL;
2303
case W_EMPTY:
2304
return _PyHamt_New();
2305
case W_NOT_FOUND:
2306
return (PyHamtObject*)Py_NewRef(o);
2307
case W_NEWNODE: {
2308
assert(new_root != NULL);
2309
2310
PyHamtObject *new_o = hamt_alloc();
2311
if (new_o == NULL) {
2312
Py_DECREF(new_root);
2313
return NULL;
2314
}
2315
2316
new_o->h_root = new_root; /* borrow */
2317
new_o->h_count = o->h_count - 1;
2318
assert(new_o->h_count >= 0);
2319
return new_o;
2320
}
2321
default:
2322
Py_UNREACHABLE();
2323
}
2324
}
2325
2326
static hamt_find_t
2327
hamt_find(PyHamtObject *o, PyObject *key, PyObject **val)
2328
{
2329
if (o->h_count == 0) {
2330
return F_NOT_FOUND;
2331
}
2332
2333
int32_t key_hash = hamt_hash(key);
2334
if (key_hash == -1) {
2335
return F_ERROR;
2336
}
2337
2338
return hamt_node_find(o->h_root, 0, key_hash, key, val);
2339
}
2340
2341
2342
int
2343
_PyHamt_Find(PyHamtObject *o, PyObject *key, PyObject **val)
2344
{
2345
hamt_find_t res = hamt_find(o, key, val);
2346
switch (res) {
2347
case F_ERROR:
2348
return -1;
2349
case F_NOT_FOUND:
2350
return 0;
2351
case F_FOUND:
2352
return 1;
2353
default:
2354
Py_UNREACHABLE();
2355
}
2356
}
2357
2358
2359
int
2360
_PyHamt_Eq(PyHamtObject *v, PyHamtObject *w)
2361
{
2362
if (v == w) {
2363
return 1;
2364
}
2365
2366
if (v->h_count != w->h_count) {
2367
return 0;
2368
}
2369
2370
PyHamtIteratorState iter;
2371
hamt_iter_t iter_res;
2372
hamt_find_t find_res;
2373
PyObject *v_key;
2374
PyObject *v_val;
2375
PyObject *w_val;
2376
2377
hamt_iterator_init(&iter, v->h_root);
2378
2379
do {
2380
iter_res = hamt_iterator_next(&iter, &v_key, &v_val);
2381
if (iter_res == I_ITEM) {
2382
find_res = hamt_find(w, v_key, &w_val);
2383
switch (find_res) {
2384
case F_ERROR:
2385
return -1;
2386
2387
case F_NOT_FOUND:
2388
return 0;
2389
2390
case F_FOUND: {
2391
int cmp = PyObject_RichCompareBool(v_val, w_val, Py_EQ);
2392
if (cmp < 0) {
2393
return -1;
2394
}
2395
if (cmp == 0) {
2396
return 0;
2397
}
2398
}
2399
}
2400
}
2401
} while (iter_res != I_END);
2402
2403
return 1;
2404
}
2405
2406
Py_ssize_t
2407
_PyHamt_Len(PyHamtObject *o)
2408
{
2409
return o->h_count;
2410
}
2411
2412
static PyHamtObject *
2413
hamt_alloc(void)
2414
{
2415
PyHamtObject *o;
2416
o = PyObject_GC_New(PyHamtObject, &_PyHamt_Type);
2417
if (o == NULL) {
2418
return NULL;
2419
}
2420
o->h_count = 0;
2421
o->h_root = NULL;
2422
o->h_weakreflist = NULL;
2423
PyObject_GC_Track(o);
2424
return o;
2425
}
2426
2427
#define _empty_hamt \
2428
(&_Py_INTERP_SINGLETON(_PyInterpreterState_GET(), hamt_empty))
2429
2430
PyHamtObject *
2431
_PyHamt_New(void)
2432
{
2433
/* HAMT is an immutable object so we can easily cache an
2434
empty instance. */
2435
return (PyHamtObject*)Py_NewRef(_empty_hamt);
2436
}
2437
2438
#ifdef Py_DEBUG
2439
static PyObject *
2440
hamt_dump(PyHamtObject *self)
2441
{
2442
_PyUnicodeWriter writer;
2443
2444
_PyUnicodeWriter_Init(&writer);
2445
2446
if (_hamt_dump_format(&writer, "HAMT(len=%zd):\n", self->h_count)) {
2447
goto error;
2448
}
2449
2450
if (hamt_node_dump(self->h_root, &writer, 0)) {
2451
goto error;
2452
}
2453
2454
return _PyUnicodeWriter_Finish(&writer);
2455
2456
error:
2457
_PyUnicodeWriter_Dealloc(&writer);
2458
return NULL;
2459
}
2460
#endif /* Py_DEBUG */
2461
2462
2463
/////////////////////////////////// Iterators: Shared Iterator Implementation
2464
2465
2466
static int
2467
hamt_baseiter_tp_clear(PyHamtIterator *it)
2468
{
2469
Py_CLEAR(it->hi_obj);
2470
return 0;
2471
}
2472
2473
static void
2474
hamt_baseiter_tp_dealloc(PyHamtIterator *it)
2475
{
2476
PyObject_GC_UnTrack(it);
2477
(void)hamt_baseiter_tp_clear(it);
2478
PyObject_GC_Del(it);
2479
}
2480
2481
static int
2482
hamt_baseiter_tp_traverse(PyHamtIterator *it, visitproc visit, void *arg)
2483
{
2484
Py_VISIT(it->hi_obj);
2485
return 0;
2486
}
2487
2488
static PyObject *
2489
hamt_baseiter_tp_iternext(PyHamtIterator *it)
2490
{
2491
PyObject *key;
2492
PyObject *val;
2493
hamt_iter_t res = hamt_iterator_next(&it->hi_iter, &key, &val);
2494
2495
switch (res) {
2496
case I_END:
2497
PyErr_SetNone(PyExc_StopIteration);
2498
return NULL;
2499
2500
case I_ITEM: {
2501
return (*(it->hi_yield))(key, val);
2502
}
2503
2504
default: {
2505
Py_UNREACHABLE();
2506
}
2507
}
2508
}
2509
2510
static Py_ssize_t
2511
hamt_baseiter_tp_len(PyHamtIterator *it)
2512
{
2513
return it->hi_obj->h_count;
2514
}
2515
2516
static PyMappingMethods PyHamtIterator_as_mapping = {
2517
(lenfunc)hamt_baseiter_tp_len,
2518
};
2519
2520
static PyObject *
2521
hamt_baseiter_new(PyTypeObject *type, binaryfunc yield, PyHamtObject *o)
2522
{
2523
PyHamtIterator *it = PyObject_GC_New(PyHamtIterator, type);
2524
if (it == NULL) {
2525
return NULL;
2526
}
2527
2528
it->hi_obj = (PyHamtObject*)Py_NewRef(o);
2529
it->hi_yield = yield;
2530
2531
hamt_iterator_init(&it->hi_iter, o->h_root);
2532
2533
return (PyObject*)it;
2534
}
2535
2536
#define ITERATOR_TYPE_SHARED_SLOTS \
2537
.tp_basicsize = sizeof(PyHamtIterator), \
2538
.tp_itemsize = 0, \
2539
.tp_as_mapping = &PyHamtIterator_as_mapping, \
2540
.tp_dealloc = (destructor)hamt_baseiter_tp_dealloc, \
2541
.tp_getattro = PyObject_GenericGetAttr, \
2542
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC, \
2543
.tp_traverse = (traverseproc)hamt_baseiter_tp_traverse, \
2544
.tp_clear = (inquiry)hamt_baseiter_tp_clear, \
2545
.tp_iter = PyObject_SelfIter, \
2546
.tp_iternext = (iternextfunc)hamt_baseiter_tp_iternext,
2547
2548
2549
/////////////////////////////////// _PyHamtItems_Type
2550
2551
2552
PyTypeObject _PyHamtItems_Type = {
2553
PyVarObject_HEAD_INIT(NULL, 0)
2554
"items",
2555
ITERATOR_TYPE_SHARED_SLOTS
2556
};
2557
2558
static PyObject *
2559
hamt_iter_yield_items(PyObject *key, PyObject *val)
2560
{
2561
return PyTuple_Pack(2, key, val);
2562
}
2563
2564
PyObject *
2565
_PyHamt_NewIterItems(PyHamtObject *o)
2566
{
2567
return hamt_baseiter_new(
2568
&_PyHamtItems_Type, hamt_iter_yield_items, o);
2569
}
2570
2571
2572
/////////////////////////////////// _PyHamtKeys_Type
2573
2574
2575
PyTypeObject _PyHamtKeys_Type = {
2576
PyVarObject_HEAD_INIT(NULL, 0)
2577
"keys",
2578
ITERATOR_TYPE_SHARED_SLOTS
2579
};
2580
2581
static PyObject *
2582
hamt_iter_yield_keys(PyObject *key, PyObject *val)
2583
{
2584
return Py_NewRef(key);
2585
}
2586
2587
PyObject *
2588
_PyHamt_NewIterKeys(PyHamtObject *o)
2589
{
2590
return hamt_baseiter_new(
2591
&_PyHamtKeys_Type, hamt_iter_yield_keys, o);
2592
}
2593
2594
2595
/////////////////////////////////// _PyHamtValues_Type
2596
2597
2598
PyTypeObject _PyHamtValues_Type = {
2599
PyVarObject_HEAD_INIT(NULL, 0)
2600
"values",
2601
ITERATOR_TYPE_SHARED_SLOTS
2602
};
2603
2604
static PyObject *
2605
hamt_iter_yield_values(PyObject *key, PyObject *val)
2606
{
2607
return Py_NewRef(val);
2608
}
2609
2610
PyObject *
2611
_PyHamt_NewIterValues(PyHamtObject *o)
2612
{
2613
return hamt_baseiter_new(
2614
&_PyHamtValues_Type, hamt_iter_yield_values, o);
2615
}
2616
2617
2618
/////////////////////////////////// _PyHamt_Type
2619
2620
2621
#ifdef Py_DEBUG
2622
static PyObject *
2623
hamt_dump(PyHamtObject *self);
2624
#endif
2625
2626
2627
static PyObject *
2628
hamt_tp_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
2629
{
2630
return (PyObject*)_PyHamt_New();
2631
}
2632
2633
static int
2634
hamt_tp_clear(PyHamtObject *self)
2635
{
2636
Py_CLEAR(self->h_root);
2637
return 0;
2638
}
2639
2640
2641
static int
2642
hamt_tp_traverse(PyHamtObject *self, visitproc visit, void *arg)
2643
{
2644
Py_VISIT(self->h_root);
2645
return 0;
2646
}
2647
2648
static void
2649
hamt_tp_dealloc(PyHamtObject *self)
2650
{
2651
if (self == _empty_hamt) {
2652
/* The empty one is statically allocated. */
2653
#ifdef Py_DEBUG
2654
_Py_FatalRefcountError("deallocating the empty hamt singleton");
2655
#else
2656
return;
2657
#endif
2658
}
2659
2660
PyObject_GC_UnTrack(self);
2661
if (self->h_weakreflist != NULL) {
2662
PyObject_ClearWeakRefs((PyObject*)self);
2663
}
2664
(void)hamt_tp_clear(self);
2665
Py_TYPE(self)->tp_free(self);
2666
}
2667
2668
2669
static PyObject *
2670
hamt_tp_richcompare(PyObject *v, PyObject *w, int op)
2671
{
2672
if (!PyHamt_Check(v) || !PyHamt_Check(w) || (op != Py_EQ && op != Py_NE)) {
2673
Py_RETURN_NOTIMPLEMENTED;
2674
}
2675
2676
int res = _PyHamt_Eq((PyHamtObject *)v, (PyHamtObject *)w);
2677
if (res < 0) {
2678
return NULL;
2679
}
2680
2681
if (op == Py_NE) {
2682
res = !res;
2683
}
2684
2685
if (res) {
2686
Py_RETURN_TRUE;
2687
}
2688
else {
2689
Py_RETURN_FALSE;
2690
}
2691
}
2692
2693
static int
2694
hamt_tp_contains(PyHamtObject *self, PyObject *key)
2695
{
2696
PyObject *val;
2697
return _PyHamt_Find(self, key, &val);
2698
}
2699
2700
static PyObject *
2701
hamt_tp_subscript(PyHamtObject *self, PyObject *key)
2702
{
2703
PyObject *val;
2704
hamt_find_t res = hamt_find(self, key, &val);
2705
switch (res) {
2706
case F_ERROR:
2707
return NULL;
2708
case F_FOUND:
2709
return Py_NewRef(val);
2710
case F_NOT_FOUND:
2711
PyErr_SetObject(PyExc_KeyError, key);
2712
return NULL;
2713
default:
2714
Py_UNREACHABLE();
2715
}
2716
}
2717
2718
static Py_ssize_t
2719
hamt_tp_len(PyHamtObject *self)
2720
{
2721
return _PyHamt_Len(self);
2722
}
2723
2724
static PyObject *
2725
hamt_tp_iter(PyHamtObject *self)
2726
{
2727
return _PyHamt_NewIterKeys(self);
2728
}
2729
2730
static PyObject *
2731
hamt_py_set(PyHamtObject *self, PyObject *args)
2732
{
2733
PyObject *key;
2734
PyObject *val;
2735
2736
if (!PyArg_UnpackTuple(args, "set", 2, 2, &key, &val)) {
2737
return NULL;
2738
}
2739
2740
return (PyObject *)_PyHamt_Assoc(self, key, val);
2741
}
2742
2743
static PyObject *
2744
hamt_py_get(PyHamtObject *self, PyObject *args)
2745
{
2746
PyObject *key;
2747
PyObject *def = NULL;
2748
2749
if (!PyArg_UnpackTuple(args, "get", 1, 2, &key, &def)) {
2750
return NULL;
2751
}
2752
2753
PyObject *val = NULL;
2754
hamt_find_t res = hamt_find(self, key, &val);
2755
switch (res) {
2756
case F_ERROR:
2757
return NULL;
2758
case F_FOUND:
2759
return Py_NewRef(val);
2760
case F_NOT_FOUND:
2761
if (def == NULL) {
2762
Py_RETURN_NONE;
2763
}
2764
return Py_NewRef(def);
2765
default:
2766
Py_UNREACHABLE();
2767
}
2768
}
2769
2770
static PyObject *
2771
hamt_py_delete(PyHamtObject *self, PyObject *key)
2772
{
2773
return (PyObject *)_PyHamt_Without(self, key);
2774
}
2775
2776
static PyObject *
2777
hamt_py_items(PyHamtObject *self, PyObject *args)
2778
{
2779
return _PyHamt_NewIterItems(self);
2780
}
2781
2782
static PyObject *
2783
hamt_py_values(PyHamtObject *self, PyObject *args)
2784
{
2785
return _PyHamt_NewIterValues(self);
2786
}
2787
2788
static PyObject *
2789
hamt_py_keys(PyHamtObject *self, PyObject *Py_UNUSED(args))
2790
{
2791
return _PyHamt_NewIterKeys(self);
2792
}
2793
2794
#ifdef Py_DEBUG
2795
static PyObject *
2796
hamt_py_dump(PyHamtObject *self, PyObject *Py_UNUSED(args))
2797
{
2798
return hamt_dump(self);
2799
}
2800
#endif
2801
2802
2803
static PyMethodDef PyHamt_methods[] = {
2804
{"set", _PyCFunction_CAST(hamt_py_set), METH_VARARGS, NULL},
2805
{"get", _PyCFunction_CAST(hamt_py_get), METH_VARARGS, NULL},
2806
{"delete", _PyCFunction_CAST(hamt_py_delete), METH_O, NULL},
2807
{"items", _PyCFunction_CAST(hamt_py_items), METH_NOARGS, NULL},
2808
{"keys", _PyCFunction_CAST(hamt_py_keys), METH_NOARGS, NULL},
2809
{"values", _PyCFunction_CAST(hamt_py_values), METH_NOARGS, NULL},
2810
#ifdef Py_DEBUG
2811
{"__dump__", _PyCFunction_CAST(hamt_py_dump), METH_NOARGS, NULL},
2812
#endif
2813
{NULL, NULL}
2814
};
2815
2816
static PySequenceMethods PyHamt_as_sequence = {
2817
0, /* sq_length */
2818
0, /* sq_concat */
2819
0, /* sq_repeat */
2820
0, /* sq_item */
2821
0, /* sq_slice */
2822
0, /* sq_ass_item */
2823
0, /* sq_ass_slice */
2824
(objobjproc)hamt_tp_contains, /* sq_contains */
2825
0, /* sq_inplace_concat */
2826
0, /* sq_inplace_repeat */
2827
};
2828
2829
static PyMappingMethods PyHamt_as_mapping = {
2830
(lenfunc)hamt_tp_len, /* mp_length */
2831
(binaryfunc)hamt_tp_subscript, /* mp_subscript */
2832
};
2833
2834
PyTypeObject _PyHamt_Type = {
2835
PyVarObject_HEAD_INIT(&PyType_Type, 0)
2836
"hamt",
2837
sizeof(PyHamtObject),
2838
.tp_methods = PyHamt_methods,
2839
.tp_as_mapping = &PyHamt_as_mapping,
2840
.tp_as_sequence = &PyHamt_as_sequence,
2841
.tp_iter = (getiterfunc)hamt_tp_iter,
2842
.tp_dealloc = (destructor)hamt_tp_dealloc,
2843
.tp_getattro = PyObject_GenericGetAttr,
2844
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,
2845
.tp_richcompare = hamt_tp_richcompare,
2846
.tp_traverse = (traverseproc)hamt_tp_traverse,
2847
.tp_clear = (inquiry)hamt_tp_clear,
2848
.tp_new = hamt_tp_new,
2849
.tp_weaklistoffset = offsetof(PyHamtObject, h_weakreflist),
2850
.tp_hash = PyObject_HashNotImplemented,
2851
};
2852
2853
2854
/////////////////////////////////// Tree Node Types
2855
2856
2857
PyTypeObject _PyHamt_ArrayNode_Type = {
2858
PyVarObject_HEAD_INIT(&PyType_Type, 0)
2859
"hamt_array_node",
2860
sizeof(PyHamtNode_Array),
2861
0,
2862
.tp_dealloc = (destructor)hamt_node_array_dealloc,
2863
.tp_getattro = PyObject_GenericGetAttr,
2864
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,
2865
.tp_traverse = (traverseproc)hamt_node_array_traverse,
2866
.tp_free = PyObject_GC_Del,
2867
.tp_hash = PyObject_HashNotImplemented,
2868
};
2869
2870
PyTypeObject _PyHamt_BitmapNode_Type = {
2871
PyVarObject_HEAD_INIT(&PyType_Type, 0)
2872
"hamt_bitmap_node",
2873
sizeof(PyHamtNode_Bitmap) - sizeof(PyObject *),
2874
sizeof(PyObject *),
2875
.tp_dealloc = (destructor)hamt_node_bitmap_dealloc,
2876
.tp_getattro = PyObject_GenericGetAttr,
2877
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,
2878
.tp_traverse = (traverseproc)hamt_node_bitmap_traverse,
2879
.tp_free = PyObject_GC_Del,
2880
.tp_hash = PyObject_HashNotImplemented,
2881
};
2882
2883
PyTypeObject _PyHamt_CollisionNode_Type = {
2884
PyVarObject_HEAD_INIT(&PyType_Type, 0)
2885
"hamt_collision_node",
2886
sizeof(PyHamtNode_Collision) - sizeof(PyObject *),
2887
sizeof(PyObject *),
2888
.tp_dealloc = (destructor)hamt_node_collision_dealloc,
2889
.tp_getattro = PyObject_GenericGetAttr,
2890
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,
2891
.tp_traverse = (traverseproc)hamt_node_collision_traverse,
2892
.tp_free = PyObject_GC_Del,
2893
.tp_hash = PyObject_HashNotImplemented,
2894
};
2895
2896