Path: blob/master/src/sage/graphs/base/static_sparse_graph.pyx
8815 views
r"""1Static Sparse Graphs23What is the point ?4-------------------56This class implements a Cython (di)graph structure made for efficiency. The7graphs are *static*, i.e. no add/remove vertex/edges methods are available, nor8can they easily or efficiently be implemented within this data structure.910The data structure, however, is made to save the maximum amount of computations11for graph algorithms whose main operation is to *list the out-neighbours of a12vertex* (which is precisely what BFS, DFS, distance computations and the13flow-related stuff waste their life on).1415The code contained in this module is written C-style. While Sage needs a class16for static graphs (not available today, i.e. 2012-01-13) it is not what we try17to address here. The purpose is efficiency and simplicity.1819Author:2021- Nathann Cohen (2011)2223Data structure24--------------2526.. image:: ../../../media/structure.png2728The data structure is actually pretty simple and compact. ``short_digraph`` has29five fields3031* ``n`` (``int``) -- the number of vertices in the graph.3233* ``m`` (``int``) -- the number of edges in the graph.3435* ``edges`` (``uint32_t *``) -- array whose length is the number of edges of36the graph.3738* ``neighbors`` (``uint32_t **``) -- this array has size `n+1`, and39describes how the data of ``edges`` should be read : the neighbors of40vertex `i` are the elements of ``edges`` addressed by41``neighbors[i]...neighbors[i+1]-1``. The element ``neighbors[n]``, which42corresponds to no vertex (they are numbered from `0` to `n-1`) is present43so that it remains easy to enumerate the neighbors of vertex `n-1` : the44last of them is the element addressed by ``neighbors[n]-1``.4546* ``edge_labels`` -- this cython list associates a label to each edge of the47graph. If a given edge is represented by ``edges[i]``, this its associated48label can be found at ``edge_labels[i]``. This object is usually NULL,49unless the call to ``init_short_digraph`` explicitly requires the labels50to be stored in the data structure.5152In the example given above, vertex 0 has 2,3,5,7,8 and 9 as out-neighbors, but53not 4, which is an out-neighbour of vertex 1. Vertex `n-1` has 2, 5, 8 and 9 as54out-neighbors. `\text{neighbors[n]}` points toward the cell immediately *after*55the end of `\text{edges}`, hence *outside of the allocated memory*. It is used56to indicate the end of the outneighbors of vertex `n-1`5758**Iterating over the edges**5960This is *the one thing* to have in mind when working with this data structure::6162cdef list_edges(short_digraph g):63cdef int i, j64for i in range(g.n):65for j in range(g.neighbors[i+1]-g.neighbors[i]):66print "There is an edge from",str(i),"to",g.neighbors[i][j]6768**Advantages**6970Two great points :7172* The neighbors of a vertex are C types, and are contiguous in memory.73* Storing such graphs is incredibly cheaper than storing Python structures.7475Well, I think it would be hard to have anything more efficient than that to76enumerate out-neighbors in sparse graphs ! :-)7778Technical details79-----------------8081* When creating a ``fast_digraph`` from a ``Graph`` or ``DiGraph`` named82``G``, the `i^{\text{th}}` vertex corresponds to ``G.vertices()[i]``8384* Some methods return ``bitset_t`` objets when lists could be85expected. There is a very useful ``bitset_list`` function for this kind of86problems :-)8788* When the edges are labelled, most of the space taken by this graph is89taken by edge labels. If no edge is labelled then this space is not90allocated, but if *any* edge has a label then a (possibly empty) label is91stored for each edge, which can double the memory needs.9293* The data structure stores the number of edges, even though it appears that94this number can be reconstructed with95``g.neighbors[n]-g.neighbors[0]``. The trick is that not all elements of96the ``g.edges`` array are necessarily used : when an undirected graph97contains loops, only one entry of the array of size `2m` is used to store98it, instead of the expected two. Storing the number of edges is the only99way to avoid an uselessly costly computation to obtain the number of edges100of an undirected, looped, AND labelled graph (think of several loops on101the same vertex with different labels).102103* The codes of this module are well documented, and many answers can be104found directly in the code.105106Cython functions107----------------108109.. csv-table::110:class: contentstable111:widths: 30, 70112:delim: |113114``init_short_digraph(short_digraph g, G)`` | Initializes ``short_digraph g`` from a Sage (Di)Graph.115``int n_edges(short_digraph g)`` | Returns the number of edges in ``g``116``int out_degree(short_digraph g, int i)`` | Returns the out-degree of vertex `i` in ``g``117``has_edge(short_digraph g, int u, int v)`` | Tests the existence of an edge.118``edge_label(short_digraph g, int * edge)`` | Returns the label associated with a given edge119``init_empty_copy(short_digraph dst, short_digraph src)`` | Allocates ``dst`` so that it can contain as many vertices and edges as ``src``.120``init_reverse(short_digraph dst, short_digraph src)`` | Initializes ``dst`` to a copy of ``src`` with all edges in the opposite direction.121``free_short_digraph(short_digraph g)`` | Free the ressources used by ``g``122123**Connectivity**124125``can_be_reached_from(short_digraph g, int src, bitset_t reached)``126127Assuming ``bitset_t reached`` has size at least ``g.n``, this method updates128``reached`` so that it represents the set of vertices that can be reached129from ``src`` in ``g``.130131``strongly_connected_component_containing_vertex(short_digraph g, short_digraph g_reversed, int v, bitset_t scc)``132133Assuming ``bitset_t reached`` has size at least ``g.n``, this method updates134``scc`` so that it represents the vertices of the strongly connected135component containing ``v`` in ``g``. The variable ``g_reversed`` is assumed136to represent the reverse of ``g``.137138What is this module used for ?139------------------------------140141At the moment, it is only used in the :mod:`sage.graphs.distances_all_pairs` module.142143Python functions144----------------145146These functions are available so that Python modules from Sage can call the147Cython routines this module implements (as they can not directly call methods148with C arguments).149"""150include "sage/misc/bitset.pxi"151cimport cpython152153##############################################################################154# Copyright (C) 2010 Nathann Cohen <[email protected]>155# Distributed under the terms of the GNU General Public License (GPL)156# The full text of the GPL is available at:157# http://www.gnu.org/licenses/158##############################################################################159160from sage.graphs.base.c_graph cimport CGraph161from libc.stdint cimport INT32_MAX162163cdef int init_short_digraph(short_digraph g, G, edge_labelled = False) except -1:164r"""165Initializes ``short_digraph g`` from a Sage (Di)Graph.166167If ``G`` is a ``Graph`` objet (and not a ``DiGraph``), an edge between two168vertices `u` and `v` is replaced by two arcs in both directions.169"""170g.edge_labels = NULL171172if G.order() >= INT32_MAX:173raise ValueError("This structure can handle at most "+str(INT32_MAX)+" vertices !")174else:175g.n = G.order()176177cdef int isdigraph178179from sage.graphs.all import Graph, DiGraph180181if isinstance(G, DiGraph):182isdigraph = 1183elif isinstance(G, Graph):184isdigraph = 0185else:186raise ValueError("The source graph must be either a DiGraph or a Graph object !")187188cdef list vertices = G.vertices()189cdef dict v_to_id = {}190cdef int i,j,v_id191cdef list neighbor_label192cdef list edge_labels193194g.m = G.size()195cdef int n_edges = g.m if isdigraph else 2*g.m196197for i, v in enumerate(vertices):198v_to_id[v] = i199200g.edges = <uint32_t *> sage_malloc(n_edges*sizeof(uint32_t))201if g.edges == NULL:202raise ValueError("Problem while allocating memory (edges)")203204g.neighbors = <uint32_t **> sage_malloc((1+<int>g.n)*sizeof(uint32_t *))205if g.neighbors == NULL:206raise ValueError("Problem while allocating memory (neighbors)")207208# Initializing the value of neighbors209g.neighbors[0] = g.edges210cdef CGraph cg = <CGraph> G._backend211212if not G.has_loops():213# Normal case214for i in range(1,(<int>g.n)+1):215g.neighbors[i] = g.neighbors[i-1] + <int> (cg.out_degree(vertices[i-1]) if isdigraph else G.degree(vertices[i-1]))216else:217# In the presence of loops. For a funny reason, if a vertex v has a loop218# attached to it and no other incident edge, Sage declares that it has219# degree 2. This way, the sum of the degrees of the vertices is twice220# the number of edges, but then the degree of a vertex is not the number221# of its neighbors anymore. One should never try to think. It never ends222# well.223for i in range(1,(<int>g.n)+1):224g.neighbors[i] = g.neighbors[i-1] + <int> len(G.edges_incident(vertices[i-1]))225226if not edge_labelled:227for u,v in G.edge_iterator(labels = False):228i = v_to_id[u]229j = v_to_id[v]230231g.neighbors[i][0] = j232g.neighbors[i] += 1233234if not isdigraph and i!=j:235g.neighbors[j][0] = i236g.neighbors[j] += 1237238# Reinitializing the value of neighbors239for g.n> i >0:240g.neighbors[i] = g.neighbors[i-1]241242g.neighbors[0] = g.edges243244# Sorting the neighbors245for i in range(g.n):246qsort(g.neighbors[i],g.neighbors[i+1]-g.neighbors[i],sizeof(int),compare_uint32_p)247248else:249edge_labels = [None]*n_edges250for v in G:251neighbor_label = [(v_to_id[uu],l) if uu != v else (v_to_id[u],l)252for u,uu,l in G.edges_incident(v)]253neighbor_label.sort()254v_id = v_to_id[v]255256for i,(j,label) in enumerate(neighbor_label):257g.neighbors[v_id][i] = j258edge_labels[(g.neighbors[v_id]+i)-g.edges] = label259260g.edge_labels = <PyObject *> <void *> edge_labels261cpython.Py_XINCREF(g.edge_labels)262263cdef inline int n_edges(short_digraph g):264# The number of edges is nothing but a difference of pointers265return <int> (g.neighbors[g.n]-g.edges)266267cdef inline int out_degree(short_digraph g, int i):268# The out-degree is nothing but a difference of pointers269return <int> (g.neighbors[i+1]-g.neighbors[i])270271cdef int init_empty_copy(short_digraph dst, short_digraph src) except -1:272dst.n = src.n273dst.m = src.m274dst.edge_labels = NULL275cdef list edge_labels276277dst.edges = <uint32_t *> sage_malloc(n_edges(src)*sizeof(uint32_t))278if dst.edges == NULL:279raise ValueError("Problem while allocating memory (edges)")280281dst.neighbors = <uint32_t **> sage_malloc((src.n+1)*sizeof(uint32_t *))282if dst.neighbors == NULL:283raise ValueError("Problem while allocating memory (neighbors)")284285if src.edge_labels != NULL:286edge_labels = [None]*n_edges(src)287dst.edge_labels = <PyObject *> <void *> edge_labels288cpython.Py_XINCREF(dst.edge_labels)289290cdef int init_reverse(short_digraph dst, short_digraph src) except -1:291cdef int i,j,v292# Allocates memory for dst293init_empty_copy(dst, src)294295# Avoiding a later segfault296if dst.n == 0:297return 0298299#### 1/3300#301# In a first pass, we count the in-degrees of each vertex and store it in a302# vector. With this information, we can initialize dst.neighbors to its303# correct value. The content of dst.edges is not touched at this level.304305cdef int * in_degree = <int *> sage_malloc(src.n*sizeof(int))306if in_degree == NULL:307raise ValueError("Problem while allocating memory (in_degree)")308309# Counting the degrees310memset(in_degree, 0, src.n*sizeof(int))311312for i in range(n_edges(src)):313in_degree[src.edges[i]] += 1314315# Updating dst.neighbors316dst.neighbors[0] = dst.edges317for i in range(1, src.n+1):318dst.neighbors[i] = dst.neighbors[i-1] + in_degree[i-1]319sage_free(in_degree)320321#### 2/3322#323# Second pass : we list the edges again, and add them in dst.edges. Doing324# so, we will change the value of dst.neighbors, but that is not so bad as325# we can fix it afterwards.326for i in range(0, src.n):327for j in range(out_degree(src,i)):328v = src.neighbors[i][j]329dst.neighbors[v][0] = i330331if dst.edge_labels != NULL:332(<list> dst.edge_labels)[dst.neighbors[v]-dst.edges] = edge_label(src,src.neighbors[i]+j)333334dst.neighbors[v] += 1335336#### 3/3337#338# Final step : set the correct values of dst.neighbors again. It is easy, as339# the correct value of dst.neighbors[i] is actually dst.neighbors[i-1]340for src.n> i >0:341dst.neighbors[i] = dst.neighbors[i-1]342dst.neighbors[0] = dst.edges343344return 0345346cdef int compare_uint32_p(const_void *a, const_void *b):347return (<uint32_t *> a)[0] - (<uint32_t *> b)[0]348349cdef inline uint32_t * has_edge(short_digraph g, int u, int v):350r"""351Tests the existence of an edge.352353Assumes that the neighbors of each vertex are sorted.354"""355return <uint32_t *> bsearch(&v, g.neighbors[u], g.neighbors[u+1]-g.neighbors[u], sizeof(uint32_t), compare_uint32_p)356357cdef inline object edge_label(short_digraph g, uint32_t * edge):358r"""359Returns the label associated with a given edge360"""361if g.edge_labels == NULL:362return None363else:364return (<list> g.edge_labels)[edge-g.edges]365366cdef int can_be_reached_from(short_digraph g, int src, bitset_t reached) except -1:367if g.n == 0:368return 0369370# Initializing the set of vertices reached by setting only bit src371bitset_set_first_n(reached, 0)372bitset_add(reached, src)373374# We will be doing a Depth-First Search. We allocate the stack we need for375# that, and put "src" on top of it.376cdef int * stack = <int *> sage_malloc(g.n*sizeof(int))377if stack == NULL:378raise ValueError("Problem while allocating memory (stack)")379380stack[0] = src381cdef int stack_size = 1382383# What we need to iterate over the edges384cdef int i385cdef uint32_t * v386cdef uint32_t * end387388# Plain old DFS ...389#390#If there is something left on the stack, we remove it consider each of its391# neighbors. If we find any which has not been reached yet, we set its392# corresponding bit in the reached bitset, and add it on top of the stack.393394while stack_size:395stack_size -= 1396i = stack[stack_size]397398v = g.neighbors[i]399end = g.neighbors[i+1]400401while v < end:402if not bitset_in(reached, v[0]):403bitset_add(reached, v[0])404stack[stack_size] = v[0]405stack_size += 1406407v += 1408409sage_free(stack)410411cdef strongly_connected_component_containing_vertex(short_digraph g, short_digraph g_reversed, int v, bitset_t scc):412413# Computing the set of vertices that can be reached from v in g414can_be_reached_from(g, v, scc)415# Computing the set of vertices that can be reached from v in g *reversed*416cdef bitset_t scc_reversed417bitset_init(scc_reversed, g.n)418can_be_reached_from(g_reversed, v, scc_reversed)419# The scc containing v is the intersection of both sets420bitset_intersection(scc, scc, scc_reversed)421422cdef void free_short_digraph(short_digraph g):423if g.edges != NULL:424sage_free(g.edges)425426if g.neighbors != NULL:427sage_free(g.neighbors)428429if g.edge_labels != NULL:430cpython.Py_XDECREF(g.edge_labels)431432def strongly_connected_components(G):433r"""434Returns the strongly connected components of the given DiGraph.435436INPUT:437438- ``G`` -- a DiGraph.439440.. NOTE::441442This method has been written as an attempt to solve the slowness443reported in :trac:`12235`. It is not the one used by444:meth:`sage.graphs.digraph.DiGraph.strongly_connected_components` as445saving some time on the computation of the strongly connected components446is not worth copying the whole graph, but it is a nice way to test this447module's functions. It is also tested in the doctest or448:meth:`sage.graphs.digraph.DiGraph.strongly_connected_components`.449450EXAMPLE::451452sage: from sage.graphs.base.static_sparse_graph import strongly_connected_components453sage: g = digraphs.ButterflyGraph(2)454sage: strongly_connected_components(g)455[[('00', 0)], [('00', 1)], [('00', 2)], [('01', 0)], [('01', 1)], [('01', 2)],456[('10', 0)], [('10', 1)], [('10', 2)], [('11', 0)], [('11', 1)], [('11', 2)]]457"""458459if G.order() == 0:460return [[]]461462# To compute the connected component containing a given vertex v, we take463# the intersection of the set of vertices that can be reached from v in G464# and the set of vertices that can be reached from v in G reversed.465#466# That's all that happens here.467468cdef list answer = []469cdef list vertices = G.vertices()470cdef short_digraph g, gr471472init_short_digraph(g, G)473init_reverse(gr, g)474475cdef bitset_t seen476bitset_init(seen, g.n)477bitset_set_first_n(seen, 0)478479cdef bitset_t scc480bitset_init(scc, g.n)481bitset_set_first_n(scc, 0)482483cdef int v484while bitset_len(seen) < g.n:485v = bitset_first_in_complement(seen)486strongly_connected_component_containing_vertex(g, gr, v, scc)487answer.append([vertices[i] for i in bitset_list(scc)])488bitset_union(seen, seen, scc)489490bitset_free(seen)491bitset_free(scc)492free_short_digraph(g)493free_short_digraph(gr)494return answer495496497