C# StreamReader, "ReadLine" For Custom Delimiters
Solution 1
I figured I would post my own solution. It seems to work pretty well and the code is relatively simple. Feel free to comment.
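// Extension method: declare inside a static class (needs using System.IO; using System.Text;).
public static String ReadUntil(this StreamReader sr, String delim)
{
    StringBuilder sb = new StringBuilder();
    bool found = false;
    while (!found && !sr.EndOfStream)
    {
        // Try to match the delimiter one character at a time.
        for (int i = 0; i < delim.Length; i++)
        {
            int next = sr.Read();
            if (next < 0)   // stream ended part-way through a possible match
                break;
            Char c = (char)next;
            sb.Append(c);
            // On a mismatch, matching restarts at delim[0] with the NEXT character;
            // the mismatched character itself is not re-checked.
            if (c != delim[i])
                break;
            if (i == delim.Length - 1)
            {
                // Whole delimiter matched: strip it from the result.
                sb.Remove(sb.Length - delim.Length, delim.Length);
                found = true;
            }
        }
    }
    return sb.ToString();
}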
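The loop above restarts matching at delim[0] after any mismatch, so a delimiter can be missed when a failed partial match overlaps the real one (with "xy", the "xy" inside "axxyb" is skipped). It also returns an empty string rather than null at end of stream, so the question's while (... != null) loop would never terminate. Below is a minimal sketch that addresses both by swapping in the Knuth-Morris-Pratt failure table, which is not the technique this answer uses. The name ReadUntilKmp is hypothetical; it takes a TextReader so it works on a StreamReader as well as a StringReader.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamReaderDelimiterExtensions
{
    // Hypothetical helper: reads up to (not including) the next occurrence of delim.
    // Returns null once the reader is exhausted, mirroring StreamReader.ReadLine().
    public static string? ReadUntilKmp(this TextReader reader, string delim)
    {
        if (string.IsNullOrEmpty(delim))
            throw new ArgumentException("Delimiter must be non-empty.", nameof(delim));

        // failure[i] = length of the longest proper prefix of delim[0..i]
        // that is also a suffix of delim[0..i].
        int[] failure = new int[delim.Length];
        for (int i = 1, k = 0; i < delim.Length; i++)
        {
            while (k > 0 && delim[i] != delim[k]) k = failure[k - 1];
            if (delim[i] == delim[k]) k++;
            failure[i] = k;
        }

        var sb = new StringBuilder();
        int matched = 0;      // number of delimiter chars matched so far
        bool readAny = false;
        int next;
        while ((next = reader.Read()) >= 0)
        {
            readAny = true;
            char c = (char)next;
            sb.Append(c);
            // On a mismatch, fall back to the longest partial match still possible.
            while (matched > 0 && c != delim[matched]) matched = failure[matched - 1];
            if (c == delim[matched]) matched++;
            if (matched == delim.Length)
            {
                sb.Length -= delim.Length;   // drop the delimiter itself
                return sb.ToString();
            }
        }
        // End of stream: return the trailing text, or null if nothing was left.
        return readAny ? sb.ToString() : null;
    }
}
```

With this contract the question's loop works as written: on "axxyb" with "xy" it yields "ax", then "b", then null. Each character is examined a bounded number of times overall, so the cost stays linear in the input length.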
Solution 2
This code should work for any string separator.
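// Needs: using System.Collections.Generic; using System.IO; using System.Linq; using System.Text;
public static IEnumerable<string> ReadChunks(this TextReader reader, string chunkSep)
{
    var sb = new StringBuilder();
    // Holds the characters that currently form a (partial) separator match.
    var sepbuffer = new Queue<char>(chunkSep.Length);
    int read;
    // Read() returns -1 only when no more characters are available.
    while ((read = reader.Read()) >= 0)
    {
        var nextChar = (char)read;
        if (nextChar == chunkSep[sepbuffer.Count])
        {
            sepbuffer.Enqueue(nextChar);
            if (sepbuffer.Count == chunkSep.Length)
            {
                // Full separator matched: emit the chunk and start over.
                yield return sb.ToString();
                sb.Length = 0;
                sepbuffer.Clear();
            }
        }
        else
        {
            // Mismatch: move buffered chars to the output until what is left
            // is again a prefix of the separator (possibly empty).
            sepbuffer.Enqueue(nextChar);
            while (sepbuffer.Count > 0)
            {
                sb.Append(sepbuffer.Dequeue());
                if (sepbuffer.SequenceEqual(chunkSep.Take(sepbuffer.Count)))
                    break;
            }
        }
    }
    // Trailing chunk, including any unmatched partial separator.
    yield return sb.ToString() + new string(sepbuffer.ToArray());
}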
Disclaimer:
I did a little testing on this, and it is actually slower than the ReadLine method. I suspect that is due to the Enqueue/Dequeue/SequenceEqual calls, which ReadLine can avoid because its line terminators are fixed (\n, \r, or \r\n).
Again, I only ran a few tests. It should work, but don't take it as perfect, and feel free to correct it. ;)
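If the Enqueue/Dequeue/SequenceEqual calls are the bottleneck, one way to drop the queue entirely is to keep everything, including a partial separator, in the StringBuilder and, after each append, compare only its last chunkSep.Length characters against the separator. This is a sketch, not a benchmarked replacement: ReadChunksNoQueue and EndsWithOrdinal are hypothetical names, and it costs O(chunkSep.Length) per character while allocating a string only when a chunk is complete.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkReaderExtensions
{
    // Hypothetical queue-free variant of ReadChunks: sb holds everything read
    // since the last separator, including any partially matched separator.
    public static IEnumerable<string> ReadChunksNoQueue(this TextReader reader, string chunkSep)
    {
        if (string.IsNullOrEmpty(chunkSep))
            throw new ArgumentException("Separator must be non-empty.", nameof(chunkSep));

        var sb = new StringBuilder();
        int next;
        while ((next = reader.Read()) >= 0)
        {
            sb.Append((char)next);
            if (EndsWithOrdinal(sb, chunkSep))
            {
                sb.Length -= chunkSep.Length;   // drop the separator
                yield return sb.ToString();
                sb.Clear();
            }
        }
        // Like ReadChunks, always yield the trailing chunk (possibly empty).
        yield return sb.ToString();
    }

    // Compares the last sep.Length chars of sb with sep without calling sb.ToString().
    private static bool EndsWithOrdinal(StringBuilder sb, string sep)
    {
        if (sb.Length < sep.Length) return false;
        int offset = sb.Length - sep.Length;
        for (int i = 0; i < sep.Length; i++)
        {
            if (sb[offset + i] != sep[i]) return false;
        }
        return true;
    }
}
```

Because the tail is re-checked after every character, a failed partial match needs no special fallback: "axxyb" split on "xy" still yields "ax" and "b".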
Solution 3
Here is a simple parser I use when needed (usually, if streaming is not paramount, just reading everything and calling .Split does the job). It's not heavily optimized, but it should work fine.
(It's more of a Split-like method; more notes below.)
// Buffer size for each ReadBlock call. The original refers to a _bufffer_len
// field without showing it; 4096 is an assumed value.
private const int _bufffer_len = 4096;

public static IEnumerable<string> Split(this Stream stream, string delimiter, StringSplitOptions options)
{
    var buffer = new char[_bufffer_len];
    StringBuilder output = new StringBuilder();
    int read;
    using (var reader = new StreamReader(stream))
    {
        do
        {
            read = reader.ReadBlock(buffer, 0, buffer.Length);
            output.Append(buffer, 0, read);
            var text = output.ToString();
            int id = 0, total = 0;
            // Ordinal search: the IndexOf(string, int) overload is culture-sensitive.
            while ((id = text.IndexOf(delimiter, id, StringComparison.Ordinal)) >= 0)
            {
                var line = text.Substring(total, id - total);
                id += delimiter.Length;
                if (options != StringSplitOptions.RemoveEmptyEntries || line != string.Empty)
                    yield return line;
                total = id;
            }
            // Keep only the unconsumed tail (it may hold the start of a delimiter).
            output.Remove(0, total);
        }
        while (read == buffer.Length);   // a short read means end of stream
    }
    if (options != StringSplitOptions.RemoveEmptyEntries || output.Length > 0)
        yield return output.ToString();
}
...and if you need char delimiters instead, just replace the IndexOf(delimiter, id) call in the while loop with
while ((id = text.IndexOfAny(delimiters, id)) >= 0)
(use id++ instead of id += delimiter.Length, and change the signature to this Stream stream, StringSplitOptions options, params char[] delimiters).
...It also removes empty entries, etc.
Hope it helps.
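A sketch of the char-delimiter variant described above, assuming the same buffered loop: IndexOfAny replaces IndexOf, id++ skips the single delimiter char, and the signature takes params char[] delimiters. BufferLength (4096) is an assumed value, and the HasFlag check is a small deviation from the original's != test so that combined flags still honor RemoveEmptyEntries.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamSplitExtensions
{
    // Assumed buffer size; the original answer does not show its value.
    private const int BufferLength = 4096;

    public static IEnumerable<string> Split(this Stream stream, StringSplitOptions options, params char[] delimiters)
    {
        var buffer = new char[BufferLength];
        var output = new StringBuilder();
        bool removeEmpty = options.HasFlag(StringSplitOptions.RemoveEmptyEntries);
        int read;
        using (var reader = new StreamReader(stream))
        {
            do
            {
                read = reader.ReadBlock(buffer, 0, buffer.Length);
                output.Append(buffer, 0, read);
                var text = output.ToString();
                int id = 0, total = 0;
                // Find the next occurrence of ANY of the delimiter chars.
                while ((id = text.IndexOfAny(delimiters, id)) >= 0)
                {
                    var line = text.Substring(total, id - total);
                    id++;   // every delimiter is exactly one char long
                    if (!removeEmpty || line.Length > 0)
                        yield return line;
                    total = id;
                }
                output.Remove(0, total);   // keep the unconsumed tail
            }
            while (read == buffer.Length);   // a short read means end of stream
        }
        if (!removeEmpty || output.Length > 0)
            yield return output.ToString();
    }
}
```

For example, a stream containing "a,b;;c" split on ',' and ';' gives "a", "b", "c" with RemoveEmptyEntries, and "a", "b", "", "c" with None. Note that disposing the StreamReader also closes the underlying stream.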
Solution 4
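public static String ReadUntil(this StreamReader streamReader, String delimiter)
{
    StringBuilder stringBuilder = new StringBuilder();
    while (!streamReader.EndOfStream)
    {
        stringBuilder.Append(value: (Char) streamReader.Read());
        // Ordinal comparison: EndsWith(string) on its own is culture-sensitive.
        // Note that ToString() copies the whole builder on every character.
        if (stringBuilder.ToString().EndsWith(value: delimiter, comparisonType: StringComparison.Ordinal))
        {
            // Strip the delimiter from the returned text.
            stringBuilder.Remove(stringBuilder.Length - delimiter.Length, delimiter.Length);
            break;
        }
    }
    return stringBuilder.ToString();
}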
Eric
Updated on June 05, 2022
Comments
-
Eric almost 2 years
What is the best way to have the functionality of the StreamReader.ReadLine() method, but with custom (String) delimiters? I'd like to do something like:
String text;
while ((text = myStreamReader.ReadUntil("my_delim")) != null)
{
    Console.WriteLine(text);
}
I attempted to make my own using Peek() and StringBuilder, but it's too inefficient. I'm looking for suggestions or possibly an open-source solution. Thanks.
Edit
I should have clarified this earlier... I have seen this answer; however, I'd prefer not to read the entire file into memory.
-
Jon Coombs about 10 years
It would be slightly clearer (to me) if you put a "break" right after "found = true" as well. It requires a little less mental processing.
-
Jirka Hanika almost 10 years
This solution only works in some cases. For example, if the delimiter is "xy", this algorithm will miss the delimiter in "axxyb" and will read until the end of the stream.