Convert nested JSON array into separate columns in CSV file

11,464

Solution 1

The CSV conversion/export cmdlets have no way of "flattening" an object, and I may be missing something, but I know of no way to do this with a built-in cmdlet or feature. If you can guarantee that disciplines and areas will always have the same number of elements, you can trivialize it by using Select-Object with derived properties to do this:

$properties=@('id','title','slug','url','email','link','subject','level',
    @{Name='discipline_1';Expression={$_.disciplines[0]}}
    @{Name='discipline_2';Expression={$_.disciplines[1]}}
    @{Name='discipline_3';Expression={$_.disciplines[2]}}
    @{Name='area_1';Expression={$_.areas[0]}}
    @{Name='area_2';Expression={$_.areas[1]}}
)
(Get-Content 'PATH_TO\test.json' -Raw | ConvertFrom-Json)| Select-Object -Property $properties | Export-CSV -NoTypeInformation -Path 'PATH_TO\test.csv'

However, I am assuming that disciplines and areas will be variable length for each record. In that case, you will have to loop over the input and pull the highest count value for both disciplines and areas, then build the properties array dynamically:

$inputData = Get-Content 'PATH_TO\test.json' -Raw | ConvertFrom-Json
$counts = $inputData | Select-Object -Property     @{Name='disciplineCount';Expression={$_.disciplines.Count}},@{Name='areaCount';Expression={$_.areas.count}}
$maxDisciplines = $counts | Measure-Object -Maximum -Property disciplineCount | Select-Object -ExpandProperty     Maximum
$maxAreas = $counts | Measure-Object -Maximum -Property areaCount | Select-Object -ExpandProperty Maximum

$properties=@('id','title','slug','url','email','link','subject','level')

1..$maxDisciplines | % {
  $properties += @{Name="discipline_$_";Expression=[scriptblock]::create("`$_.disciplines[$($_ - 1)]")}
}

1..$maxAreas | % {
  $properties += @{Name="area_$_";Expression=[scriptblock]::create("`$_.areas[$($_ - 1)]")}
}

$inputData | Select-Object -Property $properties | Export-CSV -NoTypeInformation -Path 'PATH_TO\test.csv'

This code hasn't been fully tested, so it may need some tweaking to work 100%, but I believe the ideas are solid =)

Solution 2

2017-11-20, Completely rewrote function to improve performance and add features as -ArrayBase and support for PSStandardMembers and grouped objects.

Flatten-Object

Recursively flattens objects containing arrays, hash tables and (custom) objects. All added properties of the supplied objects will be aligned with the rest of the objects.

Requires PowerShell version 2 or higher.

Cmdlet

Function Flatten-Object {                                       # Version 00.02.12, by iRon
    [CmdletBinding()]Param (
        [Parameter(ValueFromPipeLine = $True)][Object[]]$Objects,
        [String]$Separator = ".", [ValidateSet("", 0, 1)]$Base = 1, [Int]$Depth = 5, [Int]$Uncut = 1,
        [String[]]$ToString = ([String], [DateTime], [TimeSpan]), [String[]]$Path = @()
    )
    $PipeLine = $Input | ForEach {$_}; If ($PipeLine) {$Objects = $PipeLine}
    If (@(Get-PSCallStack)[1].Command -eq $MyInvocation.MyCommand.Name -or @(Get-PSCallStack)[1].Command -eq "<position>") {
        $Object = @($Objects)[0]; $Iterate = New-Object System.Collections.Specialized.OrderedDictionary
        If ($ToString | Where {$Object -is $_}) {$Object = $Object.ToString()}
        ElseIf ($Depth) {$Depth--
            If ($Object.GetEnumerator.OverloadDefinitions -match "[\W]IDictionaryEnumerator[\W]") {
                $Iterate = $Object
            } ElseIf ($Object.GetEnumerator.OverloadDefinitions -match "[\W]IEnumerator[\W]") {
                $Object.GetEnumerator() | ForEach -Begin {$i = $Base} {$Iterate.($i) = $_; $i += 1}
            } Else {
                $Names = If ($Uncut) {$Uncut--} Else {$Object.PSStandardMembers.DefaultDisplayPropertySet.ReferencedPropertyNames}
                If (!$Names) {$Names = $Object.PSObject.Properties | Where {$_.IsGettable} | Select -Expand Name}
                If ($Names) {$Names | ForEach {$Iterate.$_ = $Object.$_}}
            }
        }
        If (@($Iterate.Keys).Count) {
            $Iterate.Keys | ForEach {
                Flatten-Object @(,$Iterate.$_) $Separator $Base $Depth $Uncut $ToString ($Path + $_)
            }
        }  Else {$Property.(($Path | Where {$_}) -Join $Separator) = $Object}
    } ElseIf ($Objects -ne $Null) {
        @($Objects) | ForEach -Begin {$Output = @(); $Names = @()} {
            New-Variable -Force -Option AllScope -Name Property -Value (New-Object System.Collections.Specialized.OrderedDictionary)
            Flatten-Object @(,$_) $Separator $Base $Depth $Uncut $ToString $Path
            $Output += New-Object PSObject -Property $Property
            $Names += $Output[-1].PSObject.Properties | Select -Expand Name
        }
        $Output | Select ([String[]]($Names | Select -Unique))
    }
}; Set-Alias Flatten Flatten-Object

Syntax

<Object[]> Flatten-Object [-Separator <String>] [-Base "" | 0 | 1] [-Depth <Int>] [-Uncut<Int>] [ToString <Type[]>]

or:

Flatten-Object <Object[]> [[-Separator] <String>] [[-Base] "" | 0 | 1] [[-Depth] <Int>] [[-Uncut] <Int>] [[ToString] <Type[]>]

Parameters

-Object[] <Object[]>
The object (or objects) to be flatten.

-Separator <String> (Default: .)
The separator used between the recursive property names. .

-Depth <Int> (Default: 5)
The maximal depth of flattening a recursive property. Any negative value will result in an unlimited depth and could cause a infinitive loop.

-Uncut <Int> (Default: 1)
The number of object iterations that will left uncut further object properties will be limited to just the DefaultDisplayPropertySet. Any negative value will reveal all properties of all objects.

-Base "" | 0 | 1 (Default: 1)
The first index name of an embedded array:

  • 1, arrays will be 1 based: <Parent>.1, <Parent>.2, <Parent>.3, ...
  • 0, arrays will be 0 based: <Parent>.0, <Parent>.1, <Parent>.2, ...
  • "", the first item in an array will be unnamed and than followed with 1: <Parent>, <Parent>.1, <Parent>.2, ...

-ToString <Type[]= [String], [DateTime], [TimeSpan]>
A list of value types (default [String], [DateTime], [TimeSpan]) that will be converted to string rather the further flattened. E.g. a [DateTime] could be flattened with additional properties like Date, Day, DayOfWeek etc. but will be converted to a single (String) property instead.

Note:
The parameter -Path is for internal use but could but used to prefix property names.

Examples

Answering the specific question:

(Get-Content "PATH_TO\test.json" -Raw | ConvertFrom-Json) | Flatten-Object | Convertto-CSV -NoTypeInformation | Set-Content "PATH_TO\test.csv"

Result:

{
    "url":  "http://test.test",
    "slug":  "slug",
    "id":  10011,
    "link":  "http://test.er",
    "level":  1,
    "areas.2":  "area_b",
    "areas.1":  "area_a",
    "disciplines.3":  "discipline_c",
    "disciplines.2":  "discipline_b",
    "disciplines.1":  "discipline_a",
    "subject":  "testing",
    "title":  "Test procedure",
    "email":  "[email protected]"
}

Stress testing a more complex custom object:

New-Object PSObject @{
    String    = [String]"Text"
    Char      = [Char]65
    Byte      = [Byte]66
    Int       = [Int]67
    Long      = [Long]68
    Null      = $Null
    Booleans  = $False, $True
    Decimal   = [Decimal]69
    Single    = [Single]70
    Double    = [Double]71
    Array     = @("One", "Two", @("Three", "Four"), "Five")
    HashTable = @{city="New York"; currency="Dollar"; postalCode=10021; Etc = @("Three", "Four", "Five")}
    Object    = New-Object PSObject -Property @{Name = "One";   Value = 1; Text = @("First", "1st")}
} | Flatten

Result:

Double               : 71
Decimal              : 69
Long                 : 68
Array.1              : One
Array.2              : Two
Array.3.1            : Three
Array.3.2            : Four
Array.4              : Five
Object.Name          : One
Object.Value         : 1
Object.Text.1        : First
Object.Text.2        : 1st
Int                  : 67
Byte                 : 66
HashTable.postalCode : 10021
HashTable.currency   : Dollar
HashTable.Etc.1      : Three
HashTable.Etc.2      : Four
HashTable.Etc.3      : Five
HashTable.city       : New York
Booleans.1           : False
Booleans.2           : True
String               : Text
Char                 : A
Single               : 70
Null                 :

Flatting grouped objects:

$csv | Group Name | Flatten | Format-Table # https://stackoverflow.com/a/47409634/1701026

Flatting common objects:

(Get-Process)[0] | Flatten-Object

Or a list (array) of objects:

Get-Service | Flatten-Object -Depth 3 | Export-CSV Service.csv

Note that a command as below takes hours to compute:

Get-Process | Flatten-Object | Export-CSV Process.csv

Why? because it results in a table with a few hundred rows and several thousand columns. So if you if would like to use this for flatting process, you beter limit the number of rows (using the Where-Object cmdlet) or the number of columns (using the Select-Object cmdlet).

For the latest Flatten-Object version, see: https://powersnippets.com/flatten-object/

Share:
11,464
user2758935
Author by

user2758935

Updated on July 21, 2022

Comments

  • user2758935
    user2758935 almost 2 years

    I have a JSON file that looks like this:

    {
        "id": 10011,
        "title": "Test procedure",
        "slug": "slug",
        "url": "http://test.test",
        "email": "[email protected]",
        "link": "http://test.er",
        "subject": "testing",
        "level": 1,
        "disciplines": [
          "discipline_a",
          "discipline_b",
          "discipline_c"
        ],
        "areas": [
          "area_a",
          "area_b"
        ]
      },
    

    I was trying to use the following command to convert that into the CSV file:

    (Get-Content "PATH_TO\test.json" -Raw | ConvertFrom-Json)| Convertto-CSV -NoTypeInformation | Set-Content "PATH_TO\test.csv"
    

    However, for disciplines and areas I am getting System.Object[] in the resulting CSV file.

    Is there a way to put all those nested values as a separate columns in CSV file like area_1, area_2 etc. And the same for disciplines.