Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
sagemath
GitHub Repository: sagemath/sagesmc
Path: blob/master/src/sage/graphs/base/static_sparse_graph.pyx
8815 views
1
r"""
2
Static Sparse Graphs
3
4
What is the point ?
5
-------------------
6
7
This class implements a Cython (di)graph structure made for efficiency. The
8
graphs are *static*, i.e. no add/remove vertex/edges methods are available, nor
9
can they easily or efficiently be implemented within this data structure.
10
11
The data structure, however, is made to save the maximum amount of computations
12
for graph algorithms whose main operation is to *list the out-neighbours of a
13
vertex* (which is precisely what BFS, DFS, distance computations and the
14
flow-related stuff waste their life on).
15
16
The code contained in this module is written C-style. While Sage needs a class
17
for static graphs (not available today, i.e. 2012-01-13) it is not what we try
18
to address here. The purpose is efficiency and simplicity.
19
20
Author:
21
22
- Nathann Cohen (2011)
23
24
Data structure
25
--------------
26
27
.. image:: ../../../media/structure.png
28
29
The data structure is actually pretty simple and compact. ``short_digraph`` has
30
five fields
31
32
* ``n`` (``int``) -- the number of vertices in the graph.
33
34
* ``m`` (``int``) -- the number of edges in the graph.
35
36
* ``edges`` (``uint32_t *``) -- array whose length is the number of edges of
37
the graph.
38
39
* ``neighbors`` (``uint32_t **``) -- this array has size `n+1`, and
40
describes how the data of ``edges`` should be read : the neighbors of
41
vertex `i` are the elements of ``edges`` addressed by
42
``neighbors[i]...neighbors[i+1]-1``. The element ``neighbors[n]``, which
43
corresponds to no vertex (they are numbered from `0` to `n-1`) is present
44
so that it remains easy to enumerate the neighbors of vertex `n-1` : the
45
last of them is the element addressed by ``neighbors[n]-1``.
46
47
* ``edge_labels`` -- this cython list associates a label to each edge of the
48
graph. If a given edge is represented by ``edges[i]``, this its associated
49
label can be found at ``edge_labels[i]``. This object is usually NULL,
50
unless the call to ``init_short_digraph`` explicitly requires the labels
51
to be stored in the data structure.
52
53
In the example given above, vertex 0 has 2,3,5,7,8 and 9 as out-neighbors, but
54
not 4, which is an out-neighbour of vertex 1. Vertex `n-1` has 2, 5, 8 and 9 as
55
out-neighbors. `\text{neighbors[n]}` points toward the cell immediately *after*
56
the end of `\text{edges}`, hence *outside of the allocated memory*. It is used
57
to indicate the end of the outneighbors of vertex `n-1`
58
59
**Iterating over the edges**
60
61
This is *the one thing* to have in mind when working with this data structure::
62
63
cdef list_edges(short_digraph g):
64
cdef int i, j
65
for i in range(g.n):
66
for j in range(g.neighbors[i+1]-g.neighbors[i]):
67
print "There is an edge from",str(i),"to",g.neighbors[i][j]
68
69
**Advantages**
70
71
Two great points :
72
73
* The neighbors of a vertex are C types, and are contiguous in memory.
74
* Storing such graphs is incredibly cheaper than storing Python structures.
75
76
Well, I think it would be hard to have anything more efficient than that to
77
enumerate out-neighbors in sparse graphs ! :-)
78
79
Technical details
80
-----------------
81
82
* When creating a ``fast_digraph`` from a ``Graph`` or ``DiGraph`` named
83
``G``, the `i^{\text{th}}` vertex corresponds to ``G.vertices()[i]``
84
85
* Some methods return ``bitset_t`` objets when lists could be
86
expected. There is a very useful ``bitset_list`` function for this kind of
87
problems :-)
88
89
* When the edges are labelled, most of the space taken by this graph is
90
taken by edge labels. If no edge is labelled then this space is not
91
allocated, but if *any* edge has a label then a (possibly empty) label is
92
stored for each edge, which can double the memory needs.
93
94
* The data structure stores the number of edges, even though it appears that
95
this number can be reconstructed with
96
``g.neighbors[n]-g.neighbors[0]``. The trick is that not all elements of
97
the ``g.edges`` array are necessarily used : when an undirected graph
98
contains loops, only one entry of the array of size `2m` is used to store
99
it, instead of the expected two. Storing the number of edges is the only
100
way to avoid an uselessly costly computation to obtain the number of edges
101
of an undirected, looped, AND labelled graph (think of several loops on
102
the same vertex with different labels).
103
104
* The codes of this module are well documented, and many answers can be
105
found directly in the code.
106
107
Cython functions
108
----------------
109
110
.. csv-table::
111
:class: contentstable
112
:widths: 30, 70
113
:delim: |
114
115
``init_short_digraph(short_digraph g, G)`` | Initializes ``short_digraph g`` from a Sage (Di)Graph.
116
``int n_edges(short_digraph g)`` | Returns the number of edges in ``g``
117
``int out_degree(short_digraph g, int i)`` | Returns the out-degree of vertex `i` in ``g``
118
``has_edge(short_digraph g, int u, int v)`` | Tests the existence of an edge.
119
``edge_label(short_digraph g, int * edge)`` | Returns the label associated with a given edge
120
``init_empty_copy(short_digraph dst, short_digraph src)`` | Allocates ``dst`` so that it can contain as many vertices and edges as ``src``.
121
``init_reverse(short_digraph dst, short_digraph src)`` | Initializes ``dst`` to a copy of ``src`` with all edges in the opposite direction.
122
``free_short_digraph(short_digraph g)`` | Free the ressources used by ``g``
123
124
**Connectivity**
125
126
``can_be_reached_from(short_digraph g, int src, bitset_t reached)``
127
128
Assuming ``bitset_t reached`` has size at least ``g.n``, this method updates
129
``reached`` so that it represents the set of vertices that can be reached
130
from ``src`` in ``g``.
131
132
``strongly_connected_component_containing_vertex(short_digraph g, short_digraph g_reversed, int v, bitset_t scc)``
133
134
Assuming ``bitset_t reached`` has size at least ``g.n``, this method updates
135
``scc`` so that it represents the vertices of the strongly connected
136
component containing ``v`` in ``g``. The variable ``g_reversed`` is assumed
137
to represent the reverse of ``g``.
138
139
What is this module used for ?
140
------------------------------
141
142
At the moment, it is only used in the :mod:`sage.graphs.distances_all_pairs` module.
143
144
Python functions
145
----------------
146
147
These functions are available so that Python modules from Sage can call the
148
Cython routines this module implements (as they can not directly call methods
149
with C arguments).
150
"""
151
include "sage/misc/bitset.pxi"
152
cimport cpython
153
154
##############################################################################
155
# Copyright (C) 2010 Nathann Cohen <[email protected]>
156
# Distributed under the terms of the GNU General Public License (GPL)
157
# The full text of the GPL is available at:
158
# http://www.gnu.org/licenses/
159
##############################################################################
160
161
from sage.graphs.base.c_graph cimport CGraph
162
from libc.stdint cimport INT32_MAX
163
164
cdef int init_short_digraph(short_digraph g, G, edge_labelled = False) except -1:
165
r"""
166
Initializes ``short_digraph g`` from a Sage (Di)Graph.
167
168
If ``G`` is a ``Graph`` objet (and not a ``DiGraph``), an edge between two
169
vertices `u` and `v` is replaced by two arcs in both directions.
170
"""
171
g.edge_labels = NULL
172
173
if G.order() >= INT32_MAX:
174
raise ValueError("This structure can handle at most "+str(INT32_MAX)+" vertices !")
175
else:
176
g.n = G.order()
177
178
cdef int isdigraph
179
180
from sage.graphs.all import Graph, DiGraph
181
182
if isinstance(G, DiGraph):
183
isdigraph = 1
184
elif isinstance(G, Graph):
185
isdigraph = 0
186
else:
187
raise ValueError("The source graph must be either a DiGraph or a Graph object !")
188
189
cdef list vertices = G.vertices()
190
cdef dict v_to_id = {}
191
cdef int i,j,v_id
192
cdef list neighbor_label
193
cdef list edge_labels
194
195
g.m = G.size()
196
cdef int n_edges = g.m if isdigraph else 2*g.m
197
198
for i, v in enumerate(vertices):
199
v_to_id[v] = i
200
201
g.edges = <uint32_t *> sage_malloc(n_edges*sizeof(uint32_t))
202
if g.edges == NULL:
203
raise ValueError("Problem while allocating memory (edges)")
204
205
g.neighbors = <uint32_t **> sage_malloc((1+<int>g.n)*sizeof(uint32_t *))
206
if g.neighbors == NULL:
207
raise ValueError("Problem while allocating memory (neighbors)")
208
209
# Initializing the value of neighbors
210
g.neighbors[0] = g.edges
211
cdef CGraph cg = <CGraph> G._backend
212
213
if not G.has_loops():
214
# Normal case
215
for i in range(1,(<int>g.n)+1):
216
g.neighbors[i] = g.neighbors[i-1] + <int> (cg.out_degree(vertices[i-1]) if isdigraph else G.degree(vertices[i-1]))
217
else:
218
# In the presence of loops. For a funny reason, if a vertex v has a loop
219
# attached to it and no other incident edge, Sage declares that it has
220
# degree 2. This way, the sum of the degrees of the vertices is twice
221
# the number of edges, but then the degree of a vertex is not the number
222
# of its neighbors anymore. One should never try to think. It never ends
223
# well.
224
for i in range(1,(<int>g.n)+1):
225
g.neighbors[i] = g.neighbors[i-1] + <int> len(G.edges_incident(vertices[i-1]))
226
227
if not edge_labelled:
228
for u,v in G.edge_iterator(labels = False):
229
i = v_to_id[u]
230
j = v_to_id[v]
231
232
g.neighbors[i][0] = j
233
g.neighbors[i] += 1
234
235
if not isdigraph and i!=j:
236
g.neighbors[j][0] = i
237
g.neighbors[j] += 1
238
239
# Reinitializing the value of neighbors
240
for g.n> i >0:
241
g.neighbors[i] = g.neighbors[i-1]
242
243
g.neighbors[0] = g.edges
244
245
# Sorting the neighbors
246
for i in range(g.n):
247
qsort(g.neighbors[i],g.neighbors[i+1]-g.neighbors[i],sizeof(int),compare_uint32_p)
248
249
else:
250
edge_labels = [None]*n_edges
251
for v in G:
252
neighbor_label = [(v_to_id[uu],l) if uu != v else (v_to_id[u],l)
253
for u,uu,l in G.edges_incident(v)]
254
neighbor_label.sort()
255
v_id = v_to_id[v]
256
257
for i,(j,label) in enumerate(neighbor_label):
258
g.neighbors[v_id][i] = j
259
edge_labels[(g.neighbors[v_id]+i)-g.edges] = label
260
261
g.edge_labels = <PyObject *> <void *> edge_labels
262
cpython.Py_XINCREF(g.edge_labels)
263
264
cdef inline int n_edges(short_digraph g):
265
# The number of edges is nothing but a difference of pointers
266
return <int> (g.neighbors[g.n]-g.edges)
267
268
cdef inline int out_degree(short_digraph g, int i):
269
# The out-degree is nothing but a difference of pointers
270
return <int> (g.neighbors[i+1]-g.neighbors[i])
271
272
cdef int init_empty_copy(short_digraph dst, short_digraph src) except -1:
273
dst.n = src.n
274
dst.m = src.m
275
dst.edge_labels = NULL
276
cdef list edge_labels
277
278
dst.edges = <uint32_t *> sage_malloc(n_edges(src)*sizeof(uint32_t))
279
if dst.edges == NULL:
280
raise ValueError("Problem while allocating memory (edges)")
281
282
dst.neighbors = <uint32_t **> sage_malloc((src.n+1)*sizeof(uint32_t *))
283
if dst.neighbors == NULL:
284
raise ValueError("Problem while allocating memory (neighbors)")
285
286
if src.edge_labels != NULL:
287
edge_labels = [None]*n_edges(src)
288
dst.edge_labels = <PyObject *> <void *> edge_labels
289
cpython.Py_XINCREF(dst.edge_labels)
290
291
cdef int init_reverse(short_digraph dst, short_digraph src) except -1:
292
cdef int i,j,v
293
# Allocates memory for dst
294
init_empty_copy(dst, src)
295
296
# Avoiding a later segfault
297
if dst.n == 0:
298
return 0
299
300
#### 1/3
301
#
302
# In a first pass, we count the in-degrees of each vertex and store it in a
303
# vector. With this information, we can initialize dst.neighbors to its
304
# correct value. The content of dst.edges is not touched at this level.
305
306
cdef int * in_degree = <int *> sage_malloc(src.n*sizeof(int))
307
if in_degree == NULL:
308
raise ValueError("Problem while allocating memory (in_degree)")
309
310
# Counting the degrees
311
memset(in_degree, 0, src.n*sizeof(int))
312
313
for i in range(n_edges(src)):
314
in_degree[src.edges[i]] += 1
315
316
# Updating dst.neighbors
317
dst.neighbors[0] = dst.edges
318
for i in range(1, src.n+1):
319
dst.neighbors[i] = dst.neighbors[i-1] + in_degree[i-1]
320
sage_free(in_degree)
321
322
#### 2/3
323
#
324
# Second pass : we list the edges again, and add them in dst.edges. Doing
325
# so, we will change the value of dst.neighbors, but that is not so bad as
326
# we can fix it afterwards.
327
for i in range(0, src.n):
328
for j in range(out_degree(src,i)):
329
v = src.neighbors[i][j]
330
dst.neighbors[v][0] = i
331
332
if dst.edge_labels != NULL:
333
(<list> dst.edge_labels)[dst.neighbors[v]-dst.edges] = edge_label(src,src.neighbors[i]+j)
334
335
dst.neighbors[v] += 1
336
337
#### 3/3
338
#
339
# Final step : set the correct values of dst.neighbors again. It is easy, as
340
# the correct value of dst.neighbors[i] is actually dst.neighbors[i-1]
341
for src.n> i >0:
342
dst.neighbors[i] = dst.neighbors[i-1]
343
dst.neighbors[0] = dst.edges
344
345
return 0
346
347
cdef int compare_uint32_p(const_void *a, const_void *b):
348
return (<uint32_t *> a)[0] - (<uint32_t *> b)[0]
349
350
cdef inline uint32_t * has_edge(short_digraph g, int u, int v):
351
r"""
352
Tests the existence of an edge.
353
354
Assumes that the neighbors of each vertex are sorted.
355
"""
356
return <uint32_t *> bsearch(&v, g.neighbors[u], g.neighbors[u+1]-g.neighbors[u], sizeof(uint32_t), compare_uint32_p)
357
358
cdef inline object edge_label(short_digraph g, uint32_t * edge):
359
r"""
360
Returns the label associated with a given edge
361
"""
362
if g.edge_labels == NULL:
363
return None
364
else:
365
return (<list> g.edge_labels)[edge-g.edges]
366
367
cdef int can_be_reached_from(short_digraph g, int src, bitset_t reached) except -1:
368
if g.n == 0:
369
return 0
370
371
# Initializing the set of vertices reached by setting only bit src
372
bitset_set_first_n(reached, 0)
373
bitset_add(reached, src)
374
375
# We will be doing a Depth-First Search. We allocate the stack we need for
376
# that, and put "src" on top of it.
377
cdef int * stack = <int *> sage_malloc(g.n*sizeof(int))
378
if stack == NULL:
379
raise ValueError("Problem while allocating memory (stack)")
380
381
stack[0] = src
382
cdef int stack_size = 1
383
384
# What we need to iterate over the edges
385
cdef int i
386
cdef uint32_t * v
387
cdef uint32_t * end
388
389
# Plain old DFS ...
390
#
391
#If there is something left on the stack, we remove it consider each of its
392
# neighbors. If we find any which has not been reached yet, we set its
393
# corresponding bit in the reached bitset, and add it on top of the stack.
394
395
while stack_size:
396
stack_size -= 1
397
i = stack[stack_size]
398
399
v = g.neighbors[i]
400
end = g.neighbors[i+1]
401
402
while v < end:
403
if not bitset_in(reached, v[0]):
404
bitset_add(reached, v[0])
405
stack[stack_size] = v[0]
406
stack_size += 1
407
408
v += 1
409
410
sage_free(stack)
411
412
cdef strongly_connected_component_containing_vertex(short_digraph g, short_digraph g_reversed, int v, bitset_t scc):
413
414
# Computing the set of vertices that can be reached from v in g
415
can_be_reached_from(g, v, scc)
416
# Computing the set of vertices that can be reached from v in g *reversed*
417
cdef bitset_t scc_reversed
418
bitset_init(scc_reversed, g.n)
419
can_be_reached_from(g_reversed, v, scc_reversed)
420
# The scc containing v is the intersection of both sets
421
bitset_intersection(scc, scc, scc_reversed)
422
423
cdef void free_short_digraph(short_digraph g):
424
if g.edges != NULL:
425
sage_free(g.edges)
426
427
if g.neighbors != NULL:
428
sage_free(g.neighbors)
429
430
if g.edge_labels != NULL:
431
cpython.Py_XDECREF(g.edge_labels)
432
433
def strongly_connected_components(G):
434
r"""
435
Returns the strongly connected components of the given DiGraph.
436
437
INPUT:
438
439
- ``G`` -- a DiGraph.
440
441
.. NOTE::
442
443
This method has been written as an attempt to solve the slowness
444
reported in :trac:`12235`. It is not the one used by
445
:meth:`sage.graphs.digraph.DiGraph.strongly_connected_components` as
446
saving some time on the computation of the strongly connected components
447
is not worth copying the whole graph, but it is a nice way to test this
448
module's functions. It is also tested in the doctest or
449
:meth:`sage.graphs.digraph.DiGraph.strongly_connected_components`.
450
451
EXAMPLE::
452
453
sage: from sage.graphs.base.static_sparse_graph import strongly_connected_components
454
sage: g = digraphs.ButterflyGraph(2)
455
sage: strongly_connected_components(g)
456
[[('00', 0)], [('00', 1)], [('00', 2)], [('01', 0)], [('01', 1)], [('01', 2)],
457
[('10', 0)], [('10', 1)], [('10', 2)], [('11', 0)], [('11', 1)], [('11', 2)]]
458
"""
459
460
if G.order() == 0:
461
return [[]]
462
463
# To compute the connected component containing a given vertex v, we take
464
# the intersection of the set of vertices that can be reached from v in G
465
# and the set of vertices that can be reached from v in G reversed.
466
#
467
# That's all that happens here.
468
469
cdef list answer = []
470
cdef list vertices = G.vertices()
471
cdef short_digraph g, gr
472
473
init_short_digraph(g, G)
474
init_reverse(gr, g)
475
476
cdef bitset_t seen
477
bitset_init(seen, g.n)
478
bitset_set_first_n(seen, 0)
479
480
cdef bitset_t scc
481
bitset_init(scc, g.n)
482
bitset_set_first_n(scc, 0)
483
484
cdef int v
485
while bitset_len(seen) < g.n:
486
v = bitset_first_in_complement(seen)
487
strongly_connected_component_containing_vertex(g, gr, v, scc)
488
answer.append([vertices[i] for i in bitset_list(scc)])
489
bitset_union(seen, seen, scc)
490
491
bitset_free(seen)
492
bitset_free(scc)
493
free_short_digraph(g)
494
free_short_digraph(gr)
495
return answer
496
497