Get last n lines or bytes of a huge file in Windows (like Unix's tail). Avoid time consuming options

80,666

Solution 1

How about this (reads last 8 bytes for demo):

$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-8, 'End') | Out-Null
for ($i = 0; $i -lt 8; $i++)
{
    $fs.ReadByte()
}

UPDATE. To interpret bytes as string (but be sure to select correct encoding - here UTF8 is used):

$N = 8
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null
$buffer = new-object Byte[] $N
$fs.Read($buffer, 0, $N) | Out-Null
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer)

UPDATE 2. To read last M lines, we'll be reading the file by portions until there are more than M newline char sequences in the result:

$M = 3
$fpath = "C:\10GBfile.dat"

$result = ""
$seq = "`r`n"
$buffer_size = 10
$buffer = new-object Byte[] $buffer_size

$fs = [IO.File]::OpenRead($fpath)
while (([regex]::Matches($result, $seq)).Count -lt $M)
{
    $fs.Seek(-($result.Length + $buffer_size), [System.IO.SeekOrigin]::End) | Out-Null
    $fs.Read($buffer, 0, $buffer_size) | Out-Null
    $result = [System.Text.Encoding]::UTF8.GetString($buffer) + $result
}
$fs.Close()

($result -split $seq) | Select -Last $M

Try playing with bigger $buffer_size - this ideally is equal to expected average line length to make fewer disk operations. Also pay attention to $seq - this could be \r\n or just \n. This is very dirty code without any error handling and optimizations.

Solution 2

If you have PowerShell 3 or higher, you can use the -Tail parameter for Get-Content to get the last n lines.

Get-content -tail 5 PATH_TO_FILE;

On a 34MB text file on my local SSD, this returned in 1 millisecond vs. 8.5 seconds for get-content |select -last 5

Solution 3

When the file is already opened, it's better to use

Get-Content $fpath -tail 10

because of "exception calling "OpenRead" with "1" argument(s): "The process cannot access the file..."

Solution 4

With the awesome answer by Aziz Kabyshev, which solves the issue of speed, and with some googling, I ended up using this script

$fpath = $Args[1]
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$Args[0], 'End') | Out-Null
$mystr = ''
for ($i = 0; $i -lt $Args[0]; $i++)
{
    $mystr = ($mystr) + ([char[]]($fs.ReadByte()))
}
$fs.Close()
Write-Host $mystr

which I call from a batch file containing

@PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& '.\myscript.ps1' %1 %2"

(thanks to How to run a PowerShell script from a batch file).

Solution 5

This is not an answer, but a large comment as reply to sancho.s' answer.

When you want to use small PowerShell scripts from a Batch file, I suggest you to use the method below, that is simpler and allows to keep all the code in the same Batch file:

@PowerShell  ^
   $fpath = %2;  ^
   $fs = [IO.File]::OpenRead($fpath);  ^
   $fs.Seek(-%1, 'End') ^| Out-Null;  ^
   $mystr = '';  ^
   for ($i = 0; $i -lt %1; $i++)  ^
   {  ^
      $mystr = ($mystr) + ([char[]]($fs.ReadByte()));  ^
   }  ^
   Write-Host $mystr
%End PowerShell%
Share:
80,666
sancho.s ReinstateMonicaCellio
Author by

sancho.s ReinstateMonicaCellio

user245595 Part of the story about Monica Cellio. Parties in a dispute have usually better chance by talking to each other (not infallible, but worth a try). https://judaism.meta.stackexchange.com/questions/5193/stack-overflow-inc-sinat-chinam-and-the-goat-for-azazel

Updated on October 12, 2021

Comments