Oracle SQL Hierarchical Query: Flatten Hierarchy and Perform Aggregation

sql oracle aggregate-functions hierarchical-data hierarchical

10,267

Solution 1

You can use SYS_CONNECT_BY_PATH to generate an expression, the product of all quantities in a branch. Then use a function to dynamically execute that expression to get the final quantity.

It's not an ideal solution. The context switches between SQL and PL/SQL will take some time. And you'll need to worry about SQL injection. But at least you can avoid querying the same table twice.

(A recursive CTE, as Dan A. and podiluska suggested, would very likely be the best solution. In my experience, even when the two syntaxes are doing the same thing and using similar access paths, the recursive CTE can be significantly faster than connect by. But you'll need to wait until an upgrade to 11gR2.)

CREATE OR REPLACE FUNCTION EVALUATE_EXPRESSION(P_EXPRESSION IN VARCHAR2) RETURN NUMBER AS
    R_QTY  INTEGER;
BEGIN
    EXECUTE IMMEDIATE 'SELECT '||P_EXPRESSION||' FROM DUAL' INTO R_QTY;
    RETURN R_QTY;
END;
/


SELECT 'ASSY001' || SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH,
      GETQTY(SYS_CONNECT_BY_PATH(CHILD,'/'), 'ASSY001') QTY,
      SUBSTR(SYS_CONNECT_BY_PATH(QUANTITY,'*'), 2) QTY_EXPRESSION,
      EVALUATE_EXPRESSION(SUBSTR(SYS_CONNECT_BY_PATH(QUANTITY,'*'), 2)) QTY2,
      CHILD
FROM ITEMHIER
WHERE ISLEAF = 1
START WITH PARENT = 'ASSY001'
CONNECT BY PRIOR CHILD = PARENT;

Also, you mentioned that the table has indexes. But are the queries using the indexes? Can you post the explain plan?

Finally, with a query this slow you may need to look into parallelism. Unfortunately, I've never had much luck using parallelism and connect by.

Solution 2

Podiluska's suggestion is good. If you have Oracle 11g R2, common table expressions are the way to go. The recursive nature of the new syntax will allow you to ditch the sys_connect_by_path combined with instr, which is going to seriously hurt your performance.

Try this:

select
  child,
  sum(total_quantity) total_quantity
from (
  with h (parent, child, isleaf, quantity, total_quantity) as (
    select 
      parent,
      child,
      isleaf,
      quantity,
      quantity total_quantity
    from
      itemhier
    where
      parent = 'ASSY001' 
    union all
    select
      ih.parent,
      ih.child,
      ih.isleaf,
      ih.quantity,
      ih.quantity * h.total_quantity total_quantity
    from
      itemhier ih
    join 
      h on h.child = ih.parent
  )
  select * from h
  where isleaf = 1
)
group by child;

Here's the sqlfiddle: http://sqlfiddle.com/#!4/9840f/6

Solution 3

You should look at the WITH statement, and common table expressions/subquery factoring, which will allow you to traverse your hierarchy in a single SQL statement. It will probably be faster too.

eg:

to find all Leafs of 'assy002'

with cte as
(
    select * from #ITEMHIER
    union all
    select i.PARENT, cte.CHILD, cte.QUANTITY, cte.ISLEAF
    from #ITEMHIER i
        inner join cte on i.CHILD = cte.PARENT
)
    select CHILD,QUANTITY, isleaf from cte
    where PARENT='assy002'
    and isleaf=1;

10,267

Author by

Admin

Updated on June 28, 2022

Comments

Admin almost 2 years

I am trying to improve performance for a proof of concept I have already written and am having no luck. I think the approach is probably flawed, but I’m struggling to find another solution. I’ve covered all the Ask Tom articles and forum posts I can find.

We’re running Oracle 10g R2.

We have items arranged in a hierarchical structure. Quantities are defined on the relationships. There are two types of objects in the hierarchy: assemblies that are logical groupings, and items that represent an actual item. So if we were representing a full tool set we would have a root representing the whole tool set, and a leaf that represent an actual tool. So:

tool set -> screw drivers -> flat head screw drivers -> small flat head screw driver

The assemblies can be reused in the hierarchy, as can the items.

I need to flatten the hierarchy so each instance of an item has a row, and the quantity. Any of the relationships can have a quantity >= 1. To get the quantity of an item we need to get the product of the quantities from all the relationships from the root to the leaf.

My solution works, but it does not scale well. Running against actual data it is taking about 8 minutes to produce 6000+ rows, and we have hierarchies that would produce 50k+ rows. Ideally this would be completed in 10 seconds or less, but I know that’s… optimistic ;)

My solution and simplified dataset is below. Any feedback will be much appreciated!

CREATE TABLE ITEMHIER
(
  PARENT          VARCHAR2(30 BYTE),
  CHILD           VARCHAR2(30 BYTE),
  QUANTITY        NUMBER(15,2),
  ISLEAF          NUMBER
);

INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM001',2,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM002',1,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY005','ITEM003',5,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY006','ITEM002',10,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY006','ITEM004',3,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY007','ITEM005',12,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY007','ITEM006',1,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY008','ITEM006',2,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY008','ITEM005',5,1);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY002','ASSY005',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY002','ASSY007',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY003','ASSY006',3,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY003','ASSY008',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY007',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY005',3,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY004','ASSY006',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY002',1,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY003',2,0);
INSERT INTO ITEMHIER (PARENT, CHILD, QUANTITY, ISLEAF) VALUES ('ASSY001','ASSY004',1,0);

COMMIT;
/

CREATE OR REPLACE FUNCTION GETQTY(P_NAVPATH   IN VARCHAR2,
                                  P_STARTWITH IN VARCHAR2) RETURN INTEGER AS

R_QTY  INTEGER;

BEGIN

    SELECT EXP(SUM(LN(QUANTITY)))
    INTO R_QTY
    FROM (
           SELECT QUANTITY, SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH
           FROM ITEMHIER
           START WITH PARENT = P_STARTWITH
           CONNECT BY PRIOR  CHILD = PARENT
         )
    WHERE INSTR(P_NAVPATH, NAV_PATH) = 1; 

    RETURN R_QTY;
END;
/

SELECT 'ASSY001' || SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH,
      GETQTY(SYS_CONNECT_BY_PATH(CHILD,'/'), 'ASSY001') QTY,
      CHILD
FROM ITEMHIER
WHERE ISLEAF = 1
START WITH PARENT = 'ASSY001'
CONNECT BY PRIOR CHILD = PARENT;

----EDIT

Using the WITH clause I was able to cut processing time in about 1/2, which is a great gain! Any other ideas?

with
h as (
    select sys_connect_by_path(child,'/') navpath,
          child,
          quantity qty,
          isleaf
    from itemhier
    start with parent = 'ASSY001'
    connect by prior child = parent
)
select h1.navpath,
       h1.child,
       (SELECT exp(sum(ln(h2.qty)))
        FROM h h2
        WHERE instr(h1.navpath, h2.navpath) = 1) qty
from h h1
where isleaf = 1

EDIT 2

jonearles suggestion to use the sys_connect_by_path to build an arithmetic expression, then use PL/SQL to evaluate it appears to be the way to go. Running against my largest dataset I was able to produce 77k rows of output in 55 seconds.

I also attempted to use parallelism, but as he noted there was little to no performance gain to be had.

Admin over 11 years

Wouldn't I get the same result using START WITH? SELECT QUANTITY, CHILD FROM ITEMHIER WHERE ISLEAF = 1 START WITH PARENT = 'ASSY002' CONNECT BY PRIOR CHILD = PARENT My challenge has been getting the product of the quantities for a single path down the hierarchy.
podiluska over 11 years

Not sure. My oracle syntax is shoddy at best :)
Admin over 11 years

Thanks for the suggestion to use the with clause though, I'm going to play with that and see if I can get it to only do one hierarchical query. That should save some time...
Dan A. over 11 years

I missed the part about you running on Oracle 10g R2. But there are some great suggestions here: stackoverflow.com/a/4786672/537166
Admin over 11 years

That is much more efficient, but I need the quantity for the branch of the hierarchy, not the total quantity in the hierarchy. That's why I am using the instr function, which I know is costly, but I can't think of another way to narrow the aggregation to a single branch.
Dan A. over 11 years

Not sure I follow you on that. This solution recursively traverses the branches of the hierarchy and calculates the product of the branches. If an leaf node item is only associated with one branch of the hierarchy, you'll only get the product of that branch. Using the sample data you provided, items are located on multiple branches, so it sums up the totals of any potential branches the item falls within. But as I mentioned above, it only works with Oracle 11g R2 or later.
Admin over 11 years

If you look at the output from the query in the first post it produces a row for each branch the item is in, and the quantity column only reflects the quantity in that branch.
Admin over 11 years

I think that's the answer, it's much faster. We are planning an 11g upgrade, but it's still early in the works. I will revisit once the upgrade is complete, but this is something I can certainly work with until then.
Dan A. over 11 years

Ah, OK. If you run the inner query (starting with "with h" and ending with "where isleaf = 1"), you'll get a row for each individual branch.
Admin about 11 years

So we recently upgraded to 11g and I tried this solution. As expected it is much faster than the other methods, and lets us do some real time processing we had previously resigned to persisting in tables. So, thanks :)