Extract lines matching a pattern from all text files in a folder to a single output file

Solution 1

I think that mklement0's suggestion to use Select-String is the way to go. Adding to his answer, you can pipe the output of Get-ChildItem into Select-String so that the entire process becomes a PowerShell one-liner.

Something like this:

Get-ChildItem "folder" -Filter *.txt | Select-String -Pattern '^%%' | Select -ExpandProperty line | Set-Content "Output.txt"
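If the .txt files also sit in subfolders, the same pipeline can be made recursive. The following is a minimal sketch rather than a definitive recipe: the scratch folder, the sample file names and the Output.log name are made up for illustration, and -File assumes PSv3+.

```powershell
# Build a throwaway folder with two sample files (hypothetical names).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("demo_" + [guid]::NewGuid().ToString('N'))
$sub  = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub | Out-Null
Set-Content -Path (Join-Path $root 'a.txt') -Value '%% first', 'skip me'
Set-Content -Path (Join-Path $sub 'b.txt')  -Value 'x %% not at start', '%% second'

# The output file does not end in .txt, so the *.txt filter never picks it up.
$out = Join-Path $root 'Output.log'

# -Recurse walks the subfolders; -File skips directories.
Get-ChildItem -Path $root -Filter *.txt -Recurse -File |
    Select-String -Pattern '^%%' |
    Select-Object -ExpandProperty Line |
    Set-Content -Path $out

$result = @(Get-Content -Path $out)
```

Writing the result outside the set of scanned files avoids the output being read back in as input on a later run.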

Solution 2

The Select-String cmdlet offers a much simpler solution (PSv3+ syntax):

(Select-String -Path folder\*.txt -Pattern '^%%').Line | Set-Content Output.txt
  • Select-String accepts a filename/path pattern via its -Path parameter, so, in this simple case, there is no need for Get-ChildItem.

    • If, by contrast, your input file selection is recursive or uses more complex criteria, you can pipe Get-ChildItem's output to Select-String, as demonstrated in Dave Sexton's helpful answer.
    • Note that, according to the docs, Select-String by default assumes that the input files are UTF-8-encoded, but you can change that with the -Encoding parameter; also consider the output encoding discussed below.
  • Select-String's -Pattern parameter expects a regular expression rather than a wildcard expression.
    ^%% only matches literal %% at the start (^) of a line.

  • Select-String outputs [Microsoft.PowerShell.Commands.MatchInfo] objects that contain information about each match; each object's .Line property contains the full text of an input line that matched.

  • Set-Content Output.txt sends all matching lines to a single output file, Output.txt.

    • In Windows PowerShell, Set-Content uses the system's legacy Windows code page (an 8-bit single-byte encoding, even though the documentation mistakenly claims that ASCII files are produced); PowerShell (Core) 6+ defaults to BOM-less UTF-8 instead.
      If you want to control the output encoding explicitly, use the -Encoding parameter; e.g., ... | Set-Content Output.txt -Encoding Utf8.
    • By contrast, in Windows PowerShell, >, the output redirection operator, creates UTF-16LE files (an encoding PowerShell calls Unicode), as does Out-File by default (which can be changed with -Encoding).
      Also note that > / Out-File apply PowerShell's default formatting to the input objects to obtain the string representation to write to the output file, whereas Set-Content treats the input as strings (calls .ToString() on input objects, if necessary). In the case at hand, since all input objects are already strings, there is no difference (except for the character encoding, potentially).
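Because each match is a MatchInfo object rather than a bare string, the output can also record where a line came from. A minimal sketch, assuming a scratch folder with made-up sample files; it uses the Filename, LineNumber and Line properties of MatchInfo and sets the output encoding explicitly:

```powershell
# Throwaway folder and sample files (hypothetical names).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("demo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'one.txt') -Value 'intro', '%% alpha'
Set-Content -Path (Join-Path $dir 'two.txt') -Value '%% beta', 'a %% b'

# Format each match as "file:line-number: text".
$tagged = Select-String -Path (Join-Path $dir '*.txt') -Pattern '^%%' |
    ForEach-Object { '{0}:{1}: {2}' -f $_.Filename, $_.LineNumber, $_.Line }

# Choose the output encoding explicitly instead of relying on the default.
$tagged | Set-Content -Path (Join-Path $dir 'Output.log') -Encoding UTF8
```

The line 'a %% b' is skipped because the regular expression ^%% is anchored to the start of the line.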

As for what you've tried:

  • $_ is not set by a foreach ($file in $files) statement; the current file is in $file (a [System.IO.FileInfo] object). Even with $file in place of $_, you would be evaluating your wildcard expression *%%* against the input file's name or path rather than its contents.

  • Aside from that, wildcard pattern *%%* will match %% anywhere in the input string, not just at its start (you'd have to use %%* instead).

  • The Set-Content "Output.txt" call is missing input, because it is not part of a pipeline and, in the absence of pipeline input, no -Value argument was passed.

    • Even if you did provide input, however, output file Output.txt would get rewritten as a whole in each iteration of your foreach loop.
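Putting those points together, a corrected version of the original loop might look like the following. This is only a sketch: the scratch folder, the sample file and the Output.log name are assumptions made for illustration. It reads each file's lines with Get-Content, tests them with the anchored wildcard %%*, and writes the output once, after the loop.

```powershell
# Throwaway folder with one sample file (hypothetical names).
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("demo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $folder | Out-Null
Set-Content -Path (Join-Path $folder 'in.txt') -Value '%% keep', 'drop %% this', 'drop'

$files = Get-ChildItem -Path $folder -Filter *.txt

# @(...) collects everything the loop emits into an array.
$matched = @(foreach ($file in $files) {
    # Test each line of the file's content, not the FileInfo object itself.
    Get-Content -Path $file.FullName | Where-Object { $_ -like '%%*' }
})

# Write once, after the loop, so lines from earlier files are not overwritten.
Set-Content -Path (Join-Path $folder 'Output.log') -Value $matched
```

Inside the Where-Object script block, $_ is valid because it is part of a pipeline; inside the body of the foreach statement itself, only $file is available.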

Solution 3

ls *.txt | %{
  $f = $_
  gc $f.FullName | %{
     # StartsWith returns a Boolean, so no comparison is needed
     if($_.StartsWith("%%")){
        $_ >> Output.txt
     }#end if
  }#end gc
}#end ls

Alias

ls - Get-ChildItem
gc - Get-Content
% - ForEach-Object
$_ - Current pipeline object inside a script block
>> - Appending redirection operator
# - Comment

http://ss64.com/ps/

Solution 4

First, use Get-Content to read the content of each file. Then do the string match on each line and, based on the result, write the matching lines to the output file. To do that, put a second loop inside the foreach that iterates over all the lines Get-Content returns.

I hope this logic helps you
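The logic described here could be sketched with two explicit loops, as below. The scratch folder, sample files and Output.log name are assumptions made for illustration, not part of the original question.

```powershell
# Throwaway folder and sample files (hypothetical names).
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ("demo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $folder | Out-Null
Set-Content -Path (Join-Path $folder 'x.txt') -Value '%% one', 'two'
Set-Content -Path (Join-Path $folder 'y.txt') -Value 'three', '%% four'

$lines = New-Object 'System.Collections.Generic.List[string]'

# Outer loop: one iteration per file.
foreach ($file in Get-ChildItem -Path $folder -Filter *.txt) {
    # Inner loop: one iteration per line of that file.
    foreach ($line in Get-Content -Path $file.FullName) {
        if ($line.StartsWith('%%')) {
            $lines.Add($line)
        }
    }
}

# Write all collected lines in one go.
Set-Content -Path (Join-Path $folder 'Output.log') -Value $lines.ToArray()
```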

Author: Jabir Jamal

Updated on July 19, 2022

Comments

  • Jabir Jamal, almost 2 years

    I am trying to extract each line starting with "%%" in all files in a folder and then copy those lines to a separate text file. I am currently using this PowerShell code, but I am not getting any results.

    $files = Get-ChildItem "folder" -Filter *.txt
    foreach ($file in $files)
    {
    if ($_ -like "*%%*")
    {
    Set-Content "Output.txt" 
    }  
    }