SQL Server: How to limit CTE recursion to rows just recursivly added?

26,297

Try this:

WITH Nodes AS
(
   --initialization
   SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
   FROM ##Nodes

   UNION ALL

   ----recursive execution
   SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
   FROM Nodes AS P
      INNER JOIN ##Nodes AS N
      ON P.NodeID = N.ParentNodeID
   WHERE P.GenerationsRemoved <= 10

)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM Nodes
ORDER BY ParentNodeID, NodeID, GenerationsRemoved

Basically removing the "only show me absolute parents" from the initialization query; That way it generates the results starting from each of them and decending from there. I also added in the "WHERE P.GenerationsRemoved <= 10" as an infinite recursion catch(replace 10 with any number up to 100 to fit your needs). Then add the sort so it looks like the results you wanted.

Share:
26,297
mistertodd
Author by

mistertodd

Any code is public domain. No attribution required. జ్ఞా &lt;sup&gt;🕗&lt;/sup&gt;🕗 Yes, i do write i with a lowercase i. The Meta Stackexchange answer that I am most proud of

Updated on July 05, 2022

Comments

  • mistertodd
    mistertodd almost 2 years

    Simpler Example

    Let's try a simpler example, so people can wrap their heads around the concepts, and have a practical example that you can copy&paste into SQL Query Analizer:

    Imagine a Nodes table, with a heirarchy:

    A
     - B
        - C
    

    We can start testing in Query Analizer:

    CREATE TABLE ##Nodes
    (
     NodeID varchar(50) PRIMARY KEY NOT NULL,
     ParentNodeID varchar(50) NULL
    )
    
    INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('A', null)
    INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('B', 'A')
    INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('C', 'B')
    

    Desired output:

    ParentNodeID    NodeID    GenerationsRemoved
    ============    ======    ==================
    NULL            A         1
    NULL            B         2
    NULL            C         3
    A               B         1
    A               C         2
    B               C         1
    

    Now the suggested CTE expression, with it's incorrect output:

    WITH NodeChildren AS
    (
       --initialization
       SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
       FROM ##Nodes
       WHERE ParentNodeID IS NULL
    
       UNION ALL
    
       --recursive execution
       SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
       FROM NodeChildren AS P
          INNER JOIN ##Nodes AS N
          ON P.NodeID = N.ParentNodeID
    )
    SELECT ParentNodeID, NodeID, GenerationsRemoved
    FROM NodeChildren
    

    Actual output:

    ParentNodeID    NodeID    GenerationsRemoved
    ============    ======    ==================
    NULL            A         1
    NULL            B         2
    NULL            C         3
    

    Note: If SQL Server 2005† CTE cannot do what i was doing before in 2000‡, that's fine, and that's the answer. And whoever gives "it's not possible" as the answer will win the bounty. But i will wait a few days to make sure everyone concur's that it's not possible before i irrecovably give 250 reputation for a non-solution to my problem.

    Nitpickers Corner

    †not 2008

    ‡without resorting to a UDF*, which is the solution already have

    *unless you can see a way to improve the performance of the UDF in the original question


    Original Question

    i have a table of Nodes, each with a parent that points to another Node (or to null).

    For illustration:

    1 My Computer
        2 Drive C
             4 Users
             5 Program Files
             7 Windows
                 8 System32
        3 Drive D
             6 mp3
    

    i want a table that returns all the parent-child relationships, and the number of generations between them

    For for all direct parent relationships:

    ParentNodeID  ChildNodeID  GenerationsRemoved
    ============  ===========  ===================
    (null)        1            1
    1             2            1
    2             4            1
    2             5            1
    2             7            1
    1             3            1
    3             6            1
    7             8            1
    

    But then there's the grandparent relationships:

    ParentNodeID  ChildNodeID  GenerationsRemoved
    ============  ===========  ===================
    (null)        2            2
    (null)        3            2
    1             4            2
    1             5            2
    1             7            2
    1             6            2
    2             8            2
    

    And the there's the great-grand-grandparent relationships:

    ParentNodeID  ChildNodeID  GenerationsRemoved
    ============  ===========  ===================
    (null)        4            3
    (null)        5            3
    (null)        7            3
    (null)        6            3
    1             8            3
    

    So i can figure out the basic CTE initialization:

    WITH (NodeChildren) AS
    {
       --initialization
       SELECT ParentNodeID, NodeID AS ChildNodeID, 1 AS GenerationsRemoved
       FROM Nodes
    } 
    

    The problem now is the recursive part. The obvious answer, of course, doesn't work:

    WITH (NodeChildren) AS
    {
       --initialization
       SELECT ParentNodeID, ChildNodeID, 1 AS GenerationsRemoved
       FROM Nodes
    
       UNION ALL
    
       --recursive execution
       SELECT parents.ParentNodeID, children.NodeID, parents.Generations+1
       FROM NodeChildren parents
        INNER JOIN NodeParents children
        ON parents.NodeID = children.ParentNodeID
    } 
    
    Msg 253, Level 16, State 1, Line 1
    Recursive member of a common table expression 'NodeChildren' has multiple recursive references.
    

    All the information needed to generate the entire recursive list is present in the inital CTE table. But if that's not allowed i'll try:

    WITH (NodeChildren) AS
    {
       --initialization
       SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
       FROM Nodes
    
       UNION ALL
    
       --recursive execution
       SELECT parents.ParentNodeID, Nodes.NodeID, parents.Generations+1
       FROM NodeChildren parents
        INNER JOIN Nodes
        ON parents.NodeID = nodes.ParentNodeID
    } 
    

    But that fails because it's not only joining on the recursive elements, but keeps recursivly adding the same rows over and over:

    Msg 530, Level 16, State 1, Line 1
    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
    

    In SQL Server 2000 i simulated a CTE by using a User Defined Function (UDF):

    CREATE FUNCTION [dbo].[fn_NodeChildren] ()
    RETURNS @Result TABLE (
        ParentNodeID int NULL,
        ChildNodeID int NULL,
        Generations int NOT NULL) 
    AS  
    /*This UDF returns all "ParentNode" - "Child Node" combinations
        ...even multiple levels separated
    BEGIN 
        DECLARE @Generations int
        SET @Generations = 1
    
        --Insert into the Return table all "Self" entries
        INSERT INTO @Result
        SELECT ParentNodeID, NodeID, @Generations
        FROM Nodes
        WHILE @@rowcount > 0 
        BEGIN
            SET @Generations = @Generations + 1
            --Add to the Children table: 
            --  children of all nodes just added 
            -- (i.e. Where @Result.Generation = CurrentGeneration-1)
            INSERT @Result
            SELECT CurrentParents.ParentNodeID, Nodes.NodeID, @Generations
            FROM Nodes
                INNER JOIN @Result CurrentParents
                ON Nodes.ParentNodeID = CurrentParents.ChildNodeID
            WHERE CurrentParents.Generations = @Generations - 1
        END
        RETURN
    END
    

    And the magic to keep it from blowing up was the limiting where clause: WHERE CurrentParents.Generations - @Generations-1

    How do you prevent a recursive CTE from recursing forever?

  • mistertodd
    mistertodd about 15 years
    No, don't have 2008. And in 2003 when i originally need the problem solved, people said to wait for 2005.
  • mistertodd
    mistertodd about 15 years
    That doesn't work because it only starts from a specific node - i need all nodes. Then all those nodes children. Then all those node's children. etc.
  • mistertodd
    mistertodd about 15 years
    i can't limit it to rows with no parent, because then it limits to rows with no parent. i need to include rows with a parent also.
  • mistertodd
    mistertodd about 15 years
    Additionally, i can't get ahold of 2008, intall it, port the database to 2008, redesign the entire system, get the customer to buy, install and configured 2008 - all in the next few hours.
  • ZygD
    ZygD about 15 years
    You can limit recursions using OPTION (MAXRECURSION nn) correctly. This is not a correct solution
  • Chris Shaffer
    Chris Shaffer about 15 years
    OPTION(MAXRECURSION nn) causes the statement to fail; My goal with including the WHERE clause was to guarantee a return (obviously there will be times that a failure is better).
  • Michael K
    Michael K about 7 years
    This solution works for me where MAXRECURSION will not. I have a special situation with legacy data where recursing too far can often find self referencing data. This allows me to only descend n deep where if you set MAXRECURSION to say, 5, it will throw the error on 5. This solution returns the first 5 regardless if there would have been an infinite loop.