Parse XML and find all instances of a string
Solution 1
Assuming the XML is located at file.xml, the following XPath returns the Name attribute of every Task in which the string "C:\" appears:
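//Task[contains(text(), "C:\") or .//*[contains(text(), "C:\")] or .//*[@*[contains(., "C:\")]]]/@Name
Explanation: the string "C:\" could be in
- the text of the Task tag,
- the text of any descendant element, or
- any attribute of any descendant element.
The leading "." in ".//*" keeps each check inside the current Task. A bare "//*" starts from the document root, so every Task would match as soon as any one of them contained "C:\".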
PowerShell sample:
# Read the XML file
$xml = [xml](Get-Content -Encoding utf8 .\file.xml)
# Select the Name attribute of every matching Task and print its value
$xml |
    Select-Xml '//Task[contains(text(), "C:\") or .//*[contains(text(), "C:\")] or .//*[@*[contains(., "C:\")]]]/@Name' |
    % { $_.Node."#text" }
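To check the scoping issue, and the "c:\" vs "C:\" question (XPath 1.0 contains() is case-sensitive), here is a small sketch against a hypothetical inline sample; the task names are made up. It compares a descendant-scoped query, which also looks at the Task's own attributes and uses translate() to fold "c" to "C", with an unscoped //* predicate:

```powershell
# Hypothetical sample: three tasks, two of which reference a C:\ path
$sample = @'
<Tasks>
  <Task Name="UpperCase">
    <Parameters><moreParameters>C:\pathGoesHere</moreParameters></Parameters>
  </Task>
  <Task Name="LowerCase" Path="c:\path"/>
  <Task Name="NoPath">
    <Source Path="D:\elsewhere"/>
  </Task>
</Tasks>
'@

# Scoped: ".//" stays inside the current Task, ".//@*" also covers the Task's own
# attributes, and translate() turns "c" into "C" so "c:\" matches as well
$scoped = '//Task[.//@*[contains(translate(., "c", "C"), "C:\")] or .//text()[contains(translate(., "c", "C"), "C:\")]]/@Name'

# Unscoped: "//*" starts at the document root, so the predicate is the same for every Task
$unscoped = '//Task[//*[contains(text(), "C:\")]]/@Name'

$names    = Select-Xml -Content $sample -XPath $scoped   | ForEach-Object { $_.Node.Value }
$allNames = Select-Xml -Content $sample -XPath $unscoped | ForEach-Object { $_.Node.Value }

"Scoped:   $($names -join ', ')"     # Scoped:   UpperCase, LowerCase
"Unscoped: $($allNames -join ', ')"  # Unscoped: UpperCase, LowerCase, NoPath
```

Using $scoped in the file-based Select-Xml pipeline should work the same way; the translate() part only matters if lower-case "c:\" paths actually occur in your files.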
Solution 2
When you cast to [xml], you can access everything using a really nice "property" syntax: child elements and attributes show up as properties, and multiple nodes with the same tag are exposed as arrays. The InnerXml property then gives you the raw XML markup inside the current node (its children, but not the node's own attributes). You then just need to do a simple -like match against your search string; -like is case-insensitive, so "c:\" matches too.
Assuming you have multiple "Task" nodes under a single "Tasks" node in one file:
$tasks = [xml] (Get-Content .\Tasks.xml)
$tasks.Tasks.Task |?{ $_.InnerXml -like '*C:\*' } | select -expand Name
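Because InnerXml only covers what is inside the node, a "C:\" that appears only in one of the Task's own attributes would be missed. A minimal sketch with a hypothetical inline task (not taken from the question's data) shows the difference between InnerXml and OuterXml:

```powershell
# Hypothetical task whose only C:\ reference is an attribute on <Task> itself
$doc  = [xml]'<Tasks><Task Name="OwnAttr" Path="C:\path"><Info/></Task></Tasks>'
$task = $doc.Tasks.Task

$inner = $task.InnerXml -like '*C:\*'   # InnerXml is just the child markup (<Info />)
$outer = $task.OuterXml -like '*C:\*'   # OuterXml includes the <Task ...> start tag

"InnerXml match: $inner; OuterXml match: $outer"  # InnerXml match: False; OuterXml match: True
```

If the Task's own attributes can hold the path, replacing $_.InnerXml with $_.OuterXml in the filters above should cover that case as well.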
Or, if there is a single Task node in each of multiple files:
dir *.xml |%{ [xml] (Get-Content $_) } |?{ $_.Task.InnerXml -like '*C:\*' } | select -expand Name
These will get you the task names. Getting every tag within the node that contains the search string is a bit trickier. Here's a hacky regex approach (I know, I know, don't parse XML with regex...). Again, this assumes a single Task node in each XML file:
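# Load every .xml file in the current directory as an XmlDocument
$taskXmls = dir *.xml |%{ [xml](Get-Content $_.FullName) }
foreach($taskXml in $taskXmls)
{
    if($taskXml.Task.InnerXml -like '*C:\*')
    {
        # Each match is a tag (or an element plus its text) containing "C:\"
        $hits = [Regex]::Matches($taskXml.Task.InnerXml, '<[^<]*C:\\[^>]*>')
        $hitList = $null
        if($hits)
        {
            $hitList = $hits | select -expand Value
        }
        # One result object per task: its name plus the matching fragments
        new-object psobject -prop @{TaskName = $taskXml.Task.Name; Hits = $hitList}
    }
}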
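If you'd rather avoid regex entirely, a similar "where was it referenced" report can be produced with XPath: select every attribute and text node under each Task whose value contains the search string, then report the element it belongs to. A minimal sketch, using a hypothetical inline sample instead of the question's files; $search, $report and the output column names are my own choices:

```powershell
# Search string; it is embedded in single-quoted XPath literals below,
# so it must not itself contain a single quote
$search = 'C:\'

# Hypothetical sample standing in for one of the question's task files
$doc = [xml]@'
<Tasks>
  <Task Name="Sample">
    <Parameters><moreParameters>C:\pathGoesHere</moreParameters></Parameters>
    <Source Type="FileSystem" Path="C:\path"/>
  </Task>
</Tasks>
'@

$report = foreach ($task in $doc.SelectNodes('//Task')) {
    # Attributes (the Task's own included) and text nodes whose value contains $search
    $xpath = ".//@*[contains(., '$search')] | .//text()[contains(., '$search')]"
    foreach ($hit in $task.SelectNodes($xpath)) {
        # LocalName rather than Name: PowerShell's XML adapter would return an
        # element's Name *attribute* (e.g. on <Task>) instead of its tag name
        if ($hit -is [System.Xml.XmlAttribute]) {
            $element  = $hit.OwnerElement.LocalName
            $location = '@' + $hit.LocalName
        } else {
            $element  = $hit.ParentNode.LocalName
            $location = 'text()'
        }
        [pscustomobject]@{
            TaskName = $task.GetAttribute('Name')
            Element  = $element
            Location = $location
            Value    = $hit.Value
        }
    }
}

$report | Format-Table -AutoSize
```

For real files you would load each document with [xml](Get-Content ...) as in the loop above instead of the here-string. Unlike -like, XPath contains() is case-sensitive, so "c:\" would need its own search (or a translate() call).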
Question
mhopkins321 (updated on August 02, 2022)
I'm working with an XML file that looks similar to the following, except that the Task element is repeated thousands of times. I will be using PowerShell to parse the XML.
I need to find the name of every task in which the string "c:\" shows up. This would be easy if there were only one place the string could appear, but it can show up almost anywhere in a task. In this particular task I have put C:\ in 4 different places.
I'm hoping to get output showing the task name and every place the given path is referenced...
<Task ID="00000000" Name="Task name goes here" Active="0" NextEID="22" CacheNames="random" AR="0" TT="COS">
  <Info>
    <Description> </Description>
    <Notes> </Notes>
  </Info>
  <Parameters>
    <moreParameters>C:\pathGoesHere</moreParameters>
  </Parameters>
  <Schedules/>
  <Source HostID="0" Type="FileSystem" Path="C:\path" FileMask="[Parm:parameter].txt" DeleteOrig="0" NewFilesOnly="0" SearchSubdirs="0" Unzip="0" RetryIfNoFiles="0" UseDefRetryCount="1" UseDefRetryTimeoutSecs="1" UseDefRescanSecs="1" UDMxFi="1" UDMxBy="1" ID="11"/>
  <For ID="13">
    <Destination HostID="000000" Type="siLock" FolderID="" FolderType="4" FolderName="Home/[Parm:parameter]/" Subject="" FileName="[OnlyName]_[YYYY][MM][DD].bai" UseOrigName="0" ForceDir="1" OverwriteOrig="1" UseRelativeSubdirs="1" Zip="0" UseDefRetryCount="1" UseDefRetryTimeoutSecs="1" UseDefUser="1" UseDefClientCert="1" ID="12"/>
    <If ID="14">
      <When>
        <Criteria>
          <comp a="[ErrorCodeFile]" test="NEQ" b="0"/>
        </Criteria>
        <UpdOrig Action="d" ID="15"/>
        <Destination HostID="0000000000" Type="Share" Path="C:\anotherCPath" FileName="[Parm:parameter]_[YYYY][MM][DD].bai" UseOrigName="0" ForceDir="1" OverwriteOrig="1" UseRelativeSubdirs="1" Zip="0" UseDefRetryCount="1" UseDefRetryTimeoutSecs="1" ID="17"/>
      </When>
    </If>
  </For>
  <If ID="19">
    <When>
      <Criteria>
        <comp a="[ErrorCodeTask]" test="NNE" b="0"/>
      </Criteria>
      <Email HostID="385322183" Subject="[TaskStatus]-[TaskName]" Message="" AddressTo="[email protected]" Attachment = "C:\path\" UseDefRetryCount="1" UseDefRetryTimeoutSecs="1" ID="20"/>
    </When>
  </If>
</Task>