Represent a tree hierarchy using an Excel spreadsheet to be easily parsed by Python CSV reader?


For future readers: I ended up using a column-based hierarchy where each row is the complete traversal from the root to a leaf, so you end up with as many rows as there are leaves.

Electronics | Computers    | Laptops
Electronics | Computers    | Desktop
Electronics | Game Systems | Xbox
Electronics | Game Systems | PS3
Electronics | Game Systems | Wii
Electronics | MP3 Players  | iPod Shuffle
Clothing    | Menswear     | Pants         | Shorts
Clothing    | Menswear     | Pants         | Pajamas

In the script, Python traverses the sheet row by row and cell by cell, keeping track of both the current row and the previous row. Because each row is read from left to right, it runs from root to leaf. Whenever a cell in the current row differs from the cell in the same column of the previous row, we must have gone down a new branch, so we add a new node to the tree; every cell to its right in that row then belongs to the new branch as well.
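The row-by-row comparison described above can be sketched roughly as follows. This is a minimal illustration rather than the original script: the `Node` class, the `build_tree` function, and the comma-delimited sample (with trailing empty cells standing in for the padding a spreadsheet export may add to shorter rows) are all assumptions made for the example.

```python
import csv
import io


class Node:
    """Hypothetical tree node; the original post does not show its data structure."""

    def __init__(self, name):
        self.name = name
        self.children = []


def build_tree(rows):
    """Build a tree from rows that each hold a full root-to-leaf path."""
    root = Node("root")
    path = [root]  # path[d + 1] is the node for column d of the previous row
    prev = []      # cleaned cells of the previous row
    for row in rows:
        cells = [cell.strip() for cell in row]
        while cells and not cells[-1]:
            cells.pop()  # drop trailing empty cells from shorter rows
        # Find the first column where this row departs from the previous one.
        depth = 0
        while depth < len(cells) and depth < len(prev) and cells[depth] == prev[depth]:
            depth += 1
        # Keep the shared ancestors, then add a new node for every remaining cell.
        del path[depth + 1:]
        for name in cells[depth:]:
            node = Node(name)
            path[-1].children.append(node)
            path.append(node)
        prev = cells
    return root


def print_tree(node, indent=0):
    """Print the tree with two spaces of indentation per level."""
    print(" " * indent + node.name)
    for child in node.children:
        print_tree(child, indent + 2)


# Assumed CSV export of part of the sheet above.
SAMPLE = """\
Electronics,Computers,Laptops,
Electronics,Computers,Desktop,
Electronics,Game Systems,Xbox,
Clothing,Menswear,Pants,Shorts
Clothing,Menswear,Pants,Pajamas
"""

tree_root = build_tree(csv.reader(io.StringIO(SAMPLE)))
print_tree(tree_root)
```

Because each row is compared only with the row directly before it, this sketch assumes that rows sharing a branch are adjacent in the sheet. If the same branch reappears later, after an unrelated row, it would be added a second time.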


Author by

Erich

Updated on August 02, 2022

Comments

  • Erich almost 2 years

    I have a non-technical client who has some hierarchical product data that I'll be loading into a tree structure with Python. The tree has a variable number of levels, and a variable number of nodes and leaf nodes at each level.

    The client already knows the hierarchy of products and would like to put everything into an Excel spreadsheet for me to parse.

    What format can we use that allows the client to easily input and maintain data, and that I can easily parse into a tree with Python's CSV reader? Going with a column for each level isn't without its hiccups (especially if we introduce multiple node types).

  • DevLounge almost 11 years
    I also recommend this solution. IMO, indenting rows with empty cells is not a good idea, especially if you want to use data filters in Excel: every cell should have a value. So Erich's solution is clean on both the Excel and Python sides.
  • DevLounge almost 11 years
    Then, on the Python side, you can just use a nested structure of defaultdict(dict).
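As an illustration of the nested-dictionary idea in the last comment, here is a minimal sketch. Note that it swaps in a recursive `defaultdict` rather than the literal `defaultdict(dict)`, since the latter only creates one level automatically while the product tree has a variable number of levels. The helper names `tree`, `insert_path`, and `to_plain_dict` are hypothetical and chosen for this example.

```python
from collections import defaultdict


def tree():
    """Return a dict whose missing keys are created as further nested trees."""
    return defaultdict(tree)


def insert_path(root, path):
    """Walk one root-to-leaf row, creating any missing levels on the way."""
    node = root
    for name in path:
        node = node[name]  # a lookup creates the child if it does not exist yet


def to_plain_dict(node):
    """Convert the nested defaultdicts to ordinary dicts for display or comparison."""
    return {name: to_plain_dict(child) for name, child in node.items()}


# A few rows from the sheet above, each a complete path to a leaf.
rows = [
    ["Electronics", "Computers", "Laptops"],
    ["Electronics", "Computers", "Desktop"],
    ["Electronics", "Game Systems", "Xbox"],
    ["Clothing", "Menswear", "Pants", "Shorts"],
    ["Clothing", "Menswear", "Pants", "Pajamas"],
]

products = tree()
for row in rows:
    insert_path(products, row)

print(to_plain_dict(products))
```

With this approach, rows do not need to be compared with the previous row, because repeated path prefixes simply resolve to the same dictionary keys. The trade-off is that sibling names must be unique under a given parent, and each leaf is represented by an empty dictionary.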