Possibly all the ways to get loop-finding in graphs wrong

(greenend.org.uk)

230 points | by matt_d 126 days ago

13 comments

xelxebar 126 days ago
First time I've heard of the disjoint-set data structure. For a DSF implementation, Wikipedia [0] gives the wildest asymptotic complexity I've seen, using the inverse Ackermann function! "Almost constant time" indeed.
[0]:https://en.wikipedia.org/wiki/Disjoint-set_data_structure
[-]
- kragen 126 days ago
  more commonly it's called union-find; it came up on here last thursday: https://news.ycombinator.com/item?id=41450948
- shepherdjerred 125 days ago
  Disjoint sets are awesome! You can use them with a _very_ simple algorithm to form a minimum spanning tree.
  https://en.wikipedia.org/wiki/Kruskal%27s_algorithm
- kevinventullo 125 days ago
  What’s especially surprising to me is that inverse Ackerman is provably optimal for this problem, regardless of implementation.
- LudwigNagasena 125 days ago
  It’s a standard part of Data Structures & Algorithms curricula. You can find it in CLRS.
- agumonkey 125 days ago
  Ha, I've been looking for an inverse ackermann instance for quite a while. Thanks
karmakaze 125 days ago
I've always thought Floyd's tortoise and hare[0] algorithm was handy as it can find a cycle in a DAG with low memory usage.
[0] https://en.wikipedia.org/wiki/Cycle_detection#Floyd's_tortoi...
[-]
- water-data-dude 125 days ago
  Here is where a smartass might point out that by definition you won’t find any cycles in your DAG
  [-]
  - pavon 125 days ago
    It's short for Directed Accidentally-cyclic Graph.
    [-]
    - alfiedotwtf 125 days ago
      Thanks for the morning laugh :)
- konstantinua00 125 days ago
  how does it work on DAG?
  I always thought it's a linked list algo
  [-]
  - tejtm 125 days ago
    By never finding a cycle, it proves the graph is a DAG.
    [-]
    - jerf 125 days ago
      But you have to run it on a sequence, or perhaps even more accurately, something that is putatively a sequence but may have an infinite loop in it. Tortoise & hare requires an unambiguous "next" function, which a graph does not have. Modifying it to not require that so it can run on a graph may be possible but that would constitute a different algorithm for sure, with different complexity analysis, etc.
  - pavon 125 days ago
    It is useful for determining whether a given traversal of a graph has encountered a cycle, rather than determining whether the entire graph has any cycles. So different from what the article was about, but still a nice tool to have in the box.
zeroonetwothree 125 days ago
This problem is called “online cycle detection” in the literature. You can search for various better algorithms that have been developed.
[-]
- layer8 125 days ago
  The article isn’t interested in the “online” aspect, but in finding all (what turns out to be called) non-bridge edges, in order to visualize the loops.
  [-]
  - konstantinua00 125 days ago
    it is interested in both
tromp 126 days ago
The problem of finding cycles in (pseudo randomly generated) graphs is actually one of the first examples of an asymmetric Proof-of-Work puzzle [1]. Asymmetry means that while finding a cycle takes a lot of memory and effort, a solution can be verified instantly.
Several of the suggested algorithms were in fact implemented in the various Cuckoo Cycle solvers [2]. But by maintaining directed paths, my union-find based algorithm did identify the entire cycle and not just the cycle completing edge [1] (implemented in [3]).
This algorithm suffers from three problems though. One, it only finds one among a set of interconnected cycles. Second, it's rather slow, being latency bound by the random memory accesses. Third, it uses much more memory than needed.
So repeatedly trimming edges with only one endpoint is a much faster approach. And for the random graphs that the PoW generates, it can be done using only 1 bit per edge to record if it's been trimmed.
Once sufficiently trimmed, a loop tracing algorithm (like [4] for the Cuckatoo Cycle variant) can find all remaining (simple) cycles.
[1] https://www.semanticscholar.org/paper/Cuckoo-Cycle%3A-A-Memo...
[2] https://github.com/tromp/cuckoo
[3] https://github.com/tromp/cuckoo/blob/master/src/cuckoo/cycle...
[4] https://github.com/tromp/cuckoo/blob/master/src/cuckatoo/gra...
[-]
- pxeger1 126 days ago
  I initially took issue with "first asymmetric proof of work", thinking that in practice any NP-complete problem would surely be fine, even if we can’t prove that it can’t be solved quickly. But is it difficult to find NP-complete problems which can't also be quickly approximated? And I suppose it might also be hard to find NP-complete problems in a "Goldilocks zone" of difficulty to make for practical proofs of work, since their complexity increases so sharply.
  [-]
  - tromp 126 days ago
    For Proof-of-Work applications, at least in blockchains, you need puzzles that
    * can be pseudo randomly generated from a seed (usually the hash of a partial blockchain header)
    * have a known optimal solution method with a (nearly) constant running time that's at most about one second (for progress freeness this should be a small fraction of the target block time)
    NP-hard problems rarely satisfy both these constraints. Finding fixed length cycles in random (bipartite) graphs with billions of edges does.
  - JohnKemeny 126 days ago
    Approximation doesn't help, since you can require optimal solutions, but yes, it's very difficult to come up with hard instances. After all, you need to know the correct answer, so you can't just randomly generate instances. In addition, SAT solvers are ridiculously good these days.
- JohnKemeny 126 days ago
  After trimming, you can also branch on edges: either it is in a cycle, in which you find one, otherwise you can remove it and further reduce the instance by trimming more edges.
JohnVideogames 126 days ago
I had to tackle a problem much like this one when analysing the polygons in planar graphs (analysis of planar chemical networks). A fun approach I found was to use a Delaunay triangulation (the dual of a Voronoi diagram; draw the Voronoi polygons and join their centres if they share an edge). Then join the triangles into larger polygons by removing edges that are in the Delaunay triangulation but not in your original planar graph. Once you've got no more edges to remove you've got all the polygons in the graph and convenient triangles for rendering / analysis.
For planar graphs on the surface of a torus, like they faced here, I just ended up adding extra images of the central unit cell, and then removing the excess. Not elegant (lots of weird corner cases) but convenient for visualisation.
[-]
- gilleain 125 days ago
  Chemistry does ring finding in various ways. I've always liked the SSSR (smallest set of smallest rings) just for the name :)
  https://depth-first.com/articles/2020/08/31/a-smallest-set-o...
  [-]
  - slashdave 125 days ago
    Unfortunately, as noted in your link, the SSSR solution is not unique. Also unfortunate, Daylight (the inventor of SMILES) added SSSR as an optional atom identifier in their SMARTS query specification without realizing this. Because OpenEye doesn't like SSSR, they actually leave out this feature in their implement of SMARTS, breaking from standard.
    https://daylight.com/dayhtml/doc/theory/theory.smarts.html
    BTW, there is a very clever algorithm for finding SSSR
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2765087/
- JohnKemeny 126 days ago
  Can you expand on what you wanted to analyze? Why didn't you just output the polygons immediately?
yohannesk 126 days ago
If this article triggers a desire to work on an actual problem with graphs try https://leetcode.com/problems/freedom-trail
[-]
- thaumasiotes 125 days ago
  This looks like a dynamic programming problem taking space proportional to |ring length|*|key length|. By that analysis, graphs don't really come into it.
  What's the graph-theoretical approach? Does it have nicer characteristics?
vouwfietsman 125 days ago
Funny I've had to do this occasionally and I always do the following: for each edge E, attempt to construct a clockwise loop from that edge by traversing each vertex and picking the edge that is "most clockwise" compared to the incoming edge. Converse for counter clockwise. This gives you, for each edge, the two loops the edge is a part of (if they exist).
It sounds horribly slow, but isn't because the loop finding process for one edge can mark all edges it traverses as either being on the same loop, or not having a loop, for the orientation its checking. These edges can be skipped henceforth.
This does not find all loops, only the smallest disjoint loops (as every edge is at most part of two disjoint loops). But basically results in a similar planar "face" partition the author describes.
Its not clear to me whether my approach does not work (perhaps the angle compares fail on a torus? I've never had to deal with torii), or simply isn't suited because it misses some loops, but I thought I'd share it anyway :-)
[-]
- cvoss 125 days ago
  Notwithstanding the limitation of assuming the graph is embedded in a coordinate space where you can try to measure angles, I think this algorithm still doesn't work.
  I may be missing a detail, but it seems to fail on a graph consisting of a loop with some extra edges branching off the loop toward its interior and some edges branching off toward the exterior. The left hand walk hits a dead end in one branch, and the right hand walk dead ends in the other, no matter where on the loop you start (except for some special cases).
  Edit: strengthening the proposed counterexample
  [-]
  - vouwfietsman 125 days ago
    Hmm I'm having a hard time visualizing your counter example, I'll try to think of it some more. An important detail I left out though is that in my problem sets edges do not cross, which I suppose simplifies the entire problem a lot :-)
    [-]
    - mrgoldenbrown 125 days ago
      I think you are trying to say your method only works if the graph is "planar", ie it can be drawn on a flat piece of paper with no lines crossing.
      [-]
      - vouwfietsman 125 days ago
        Indeed, thanks for the clarification!
    - eigenket 125 days ago
      If I understand your algorithm correctly I think it fails on this graph (drawn on my phone, using an online implementation of Paint).
      https://imgur.com/a/ihPrwrX
      I think this is roughly the counterexample the person above was suggesting.
      [-]
      - vouwfietsman 125 days ago
        Aha! Yes to fix these cases a bit of bookkeeping is needed: a protruding edge like that would force the "walking" to back up, because its a dead end indeed. However, it also immediately means such an edge is not part of any loops, can be marked as such, and ignored upon retracing the steps.
        It's an edge case that should be handled, but does not prevent the general algo from working.
        [-]
        eigenket 124 days ago
        Ah yeah, if you're not trying to identify all loops but just check if some loops exist then that works. I think you can dispense with the "walking around taking clockwise/anticlockwise edges" part though. If you repeatedly do the "remove all nodes with only 1 edge going to them" step then you've already identified whether or not at least one loop exists.
        [-]
        vouwfietsman 124 days ago
        Repeatedly removing those edges would help indeed! I am interested in finding the loops, but in the example provided the "dead end" edges do not contribute to the loop. Officially a loop is not allowed to contain the same vertex twice (besides start/end), otherwise any sequence of vertices could be considered a loop just by repeating them in reverse order.
        However, indeed it remains true that it does not find all loops, only the smallest set of loops that cover all edges.
- konstantinua00 125 days ago
  how is it different from the "sidewalk" algo in the article?
  you'd get the same pitfall of torus topology with 2 perpendicular loops
Hendrikto 126 days ago
I think it is strange that there is no mention of either time or space complexity, in an article about graph algorithms.
If you don‘t need to be efficient, any problem is relatively easy to solve.
[-]
- 082349872349872 126 days ago
  If you're not yet correct, complexity is moot.
  > When in doubt, use brute force. —KLT
emmanueloga_ 125 days ago
I thought I recognized the name; this article was written by the putty guy [1] :-)
---
1: https://www.chiark.greenend.org.uk/~sgtatham/putty/
sim7c00 125 days ago
one of the more interesting reads ive had on here so firstly thanks for that! love all the explanations of the intermediate solutions. ive been working lately with graphs to deduce network topologies and finding loops is one of my problems. excited to learn about the final answer, and happy to see it involves spanning tree :') kind of saves me reading through how that protocol works (or should work) so i can get on with implementing it! saving me likely all sorts of wrong solutions <3 thanks!
[-]
- orbisvicis 124 days ago
  Same. It might also be pertinent to the radicle protocol, which was posted a few days ago and which I'm still slogging through. This has led to one of the most satisfying wikipedia dives in a long time, and I haven't even finished the original article. The wikipedia entry for DFS is very well written. Intuitively I would have structured find via path splitting, so I'm glad that the article covered all three find variations. Then that led to MSTs, and Kruskal's algorithm, then radix search (still reading). This thread is the first time hearing about CLR, so I'm definitely going to add that to my reading list, alongside Purely Functional Data Structures.
penguin_booze 125 days ago
I'm curious: If I were to share the same link again now, I'll be forwarded to the latest submission instead. Interestingly, this link was shared twice in the past few days (click the "past" link above - and, yes, one of the submitters was I). How, then, was it that neither of the recent submitters, including myself, were forwarded to the latest at the time, but instead new submissions were created?
wonnage 126 days ago
Any time you’re looking for a graph algorithm there’s a pretty good chance Tarjan’s name is on it
[-]
- 082349872349872 126 days ago
  Tarjan's SCC is elegant with a heavily imperative flavour; has anyone come up with an elegantly pure (yet still linear) functional variation?
  EDIT: looks like Okasaki has a linear-time functional; I'm pragmatic enough that 2 passes are (even small constant passes would be) fine for me — my graphs are all tiny enough that I'm not about to store them on tape.
  EDIT2: my bad, not Okasaki himself, King & Launchbury: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...
```
    scc' :: Graph -> Forest Vertex
    scc' g = dfs g (reverse (postOrd (transposeG g)))
```
  (but their sol'n requires laziness to avoid generating an infinite intermediate; has anyone done it linearly, functionally, and eagerly?)
  [-]
  - n4r9 125 days ago
    I'm not too deep into CS theory so I'm not sure if this is an answer to your question but... SCC algorithms are useful when building routeable road networks from source data (like OSM) in order to filter out isolated sub-networks (like an airfield). We use dotnet but have something that looks possibly similar to what you wrote(?):
```
  StronglyConnectedVertices(Vertex source) => ReachableVertices(source).Intersect(ReverseReachableVertices(source))
```
    The methods ReachableVertices and ReverseReachableVertices are effectively depth-first searches on the graph.
    The challenge - in terms of performance - is actually not finding a single SCC but returning the full set of potentially thousands SCCs in large networks. I originally implemented an iterative (imperative) version of Gabow's "Path-Based" algorithm but it was too slow on continent-sized datasets. We did some experimentation with a parallel variant of Tarjan which didn't improve things much.
    Eventually I settled for something "good enough" - randomly pick a vertex, calculate the SCC with the above code, return it if it's large enough (>1k vertices) otherwise filter it out, keep going until I've accounted for >90% of the graph. The remaining <10% gets filtered out. That's basically imperative (using a while loop) but could probably be functionalised.
    -----------------------------------------------------
    EDIT I've taken a better look at the linked paper and what I wrote is definitely not an answer to your question. But I'll leave it in case anyone's interested.
devit 125 days ago
Seems like a trivial problem to me.
If the graph is directed, do an SCC decomposition in linear time using a graph library and then any SCC with size more than one has at least a loop, trivially extracted by following any edges in the SCC.
If the graph is undirected, compute a spanning tree in linear time using a graph library and then any edge outside the tree forms a loop, again trivially extractable from the edge and the paths from its endpoints to their least common ancestor in the tree.
Since those algorithms are already asymptotically optimal, it's just an engineering problem of finding the solution with the best constant factor given the precise data structures and goal in question.
[-]
- zeroonetwothree 125 days ago
  The most obvious way is to just run a DFS, if you ever come back to a node you’ve seen then you have a cycle (and by storing parent pointers you can get the cycle back). That’s like CS 101 level.
  However in the article it’s actually a graph being modified. If you don’t want to run a full dfs every time, there are better ways. This problem is called “online cycle detection” and you can google some other algorithms.
- layer8 125 days ago
  You obviously didn’t read the article. It’s about undirected graphs, and about visualizing the loops, not just testing for their existence. The final solution is indeed based on spanning trees.
- ludston 125 days ago
  > Says problem is trivial
  > Spews out computer science jargon that 99% of the world's population wouldn't even recognise, let alone understand.
  Hmm.