Split a collection into `n` parts with LINQ?
Solution 1
A pure LINQ solution, and the simplest one, is shown below.
static class LinqExtensions
{
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
    {
        int i = 0;
        var splits = from item in list
                     group item by i++ % parts into part
                     select part.AsEnumerable();
        return splits;
    }
}
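A quick usage sketch. The extension method is restated verbatim so the snippet compiles on its own; the sample data is just illustrative. Grouping by `i++ % parts` deals items out round-robin rather than keeping them contiguous:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqExtensions
{
    // Restated from the answer above.
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
    {
        int i = 0;
        return from item in list
               group item by i++ % parts into part
               select part.AsEnumerable();
    }
}

public class Demo
{
    public static void Main()
    {
        var parts = new[] { 1, 2, 3, 4, 5, 6, 7 }
            .Split(3)
            .Select(p => string.Join(",", p))
            .ToList();
        // Indexes 0..6 map to keys 0,1,2,0,1,2,0 - a round-robin deal.
        Console.WriteLine(string.Join(" | ", parts));
        // prints "1,4,7 | 2,5 | 3,6"
    }
}
```

Because the counter `i` is captured by the query, enumerating the result more than once keeps incrementing it and produces different groups; materialize with `ToList()` if you need to iterate repeatedly.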
Solution 2
EDIT: Okay, it looks like I misread the question. I read it as "pieces of length n" rather than "n pieces". Doh! Considering deleting answer...
(Original answer)
I don't believe there's a built-in way of partitioning, although I intend to write one in my set of additions to LINQ to Objects. Marc Gravell has an implementation here, although I would probably modify it to return a read-only view:
public static IEnumerable<IEnumerable<T>> Partition<T>
    (this IEnumerable<T> source, int size)
{
    T[] array = null;
    int count = 0;
    foreach (T item in source)
    {
        if (array == null)
        {
            array = new T[size];
        }
        array[count] = item;
        count++;
        if (count == size)
        {
            yield return new ReadOnlyCollection<T>(array);
            array = null;
            count = 0;
        }
    }
    if (array != null)
    {
        Array.Resize(ref array, count);
        yield return new ReadOnlyCollection<T>(array);
    }
}
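A short usage sketch of the "pieces of length n" behavior (the method is restated verbatim so the snippet compiles standalone; the data is illustrative). Seven items in pieces of length three leave a ragged tail:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

static class PartitionExtensions
{
    // Restated from the answer above.
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int size)
    {
        T[] array = null;
        int count = 0;
        foreach (T item in source)
        {
            if (array == null)
                array = new T[size];
            array[count] = item;
            count++;
            if (count == size)
            {
                yield return new ReadOnlyCollection<T>(array);
                array = null;
                count = 0;
            }
        }
        if (array != null)
        {
            // Trim the final, partially filled buffer before exposing it.
            Array.Resize(ref array, count);
            yield return new ReadOnlyCollection<T>(array);
        }
    }
}

public class Demo
{
    public static void Main()
    {
        var pieces = Enumerable.Range(1, 7).Partition(3).ToList();
        Console.WriteLine(string.Join(" | ", pieces.Select(p => string.Join(",", p))));
        // prints "1,2,3 | 4,5,6 | 7"
    }
}
```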
Solution 3
static class LinqExtensions
{
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> list, int parts)
    {
        return list.Select((item, index) => new { index, item })
                   .GroupBy(x => x.index % parts)
                   .Select(x => x.Select(y => y.item));
    }
}
Solution 4
Ok, I'll throw my hat in the ring. The advantages of my algorithm:
- No expensive multiplication, division, or modulus operators
- All operations are O(1) (see note below)
- Works for IEnumerable<> source (no Count property needed)
- Simple
The code:
public static IEnumerable<IEnumerable<T>>
    Section<T>(this IEnumerable<T> source, int length)
{
    if (length <= 0)
        throw new ArgumentOutOfRangeException("length");

    var section = new List<T>(length);
    foreach (var item in source)
    {
        section.Add(item);
        if (section.Count == length)
        {
            yield return section.AsReadOnly();
            section = new List<T>(length);
        }
    }
    if (section.Count > 0)
        yield return section.AsReadOnly();
}
As pointed out in the comments below, this approach doesn't actually address the original question which asked for a fixed number of sections of approximately equal length. That said, you can still use my approach to solve the original question by calling it this way:
myEnum.Section(myEnum.Count() / number_of_sections + 1)
When used in this manner, the approach is no longer O(1) as the Count() operation is O(N).
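A sketch of that workaround (`Section` restated so the snippet compiles standalone; the data is illustrative). Note that the size formula `Count / n + 1` can also return fewer than `n` sections when the count divides evenly (e.g. 6 items with 3 requested yields two sections of 4 and 2):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerableExtensions
{
    // Restated from the answer above.
    public static IEnumerable<IEnumerable<T>> Section<T>(this IEnumerable<T> source, int length)
    {
        if (length <= 0)
            throw new ArgumentOutOfRangeException("length");
        var section = new List<T>(length);
        foreach (var item in source)
        {
            section.Add(item);
            if (section.Count == length)
            {
                yield return section.AsReadOnly();
                section = new List<T>(length);
            }
        }
        if (section.Count > 0)
            yield return section.AsReadOnly();
    }
}

public class Demo
{
    public static void Main()
    {
        var source = Enumerable.Range(1, 10).ToList();
        // "n pieces" via the workaround: section length = Count / parts + 1.
        var sections = source.Section(source.Count / 3 + 1).ToList();
        Console.WriteLine(string.Join(" | ", sections.Select(s => string.Join(",", s))));
        // prints "1,2,3,4 | 5,6,7,8 | 9,10"
    }
}
```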
Solution 5
This is the same as the accepted answer, but with a much simpler representation:
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items,
                                                   int numOfParts)
{
    int i = 0;
    return items.GroupBy(x => i++ % numOfParts);
}
The above method splits an IEnumerable<T> into N chunks of equal, or close to equal, size.
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
                                                       int partitionSize)
{
    int i = 0;
    return items.GroupBy(x => i++ / partitionSize).ToArray();
}
The above method splits an IEnumerable<T> into chunks of a desired fixed size, with the total number of chunks left unspecified - which is not what the question is about.
The problem with the Split method, besides being slower, is that it scrambles the output: the grouping is done on the basis of the i-th multiple of N for each position; in other words, you don't get the chunks in the original order.
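To make that difference concrete, here is a small comparison sketch (the two one-liners above are inlined as plain static methods; the data is illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Demo
{
    // Inlined version of the Split one-liner above.
    public static IEnumerable<IEnumerable<int>> Split(IEnumerable<int> items, int numOfParts)
    {
        int i = 0;
        return items.GroupBy(x => i++ % numOfParts);
    }

    // Inlined version of the Partition one-liner above.
    public static IEnumerable<IEnumerable<int>> Partition(IEnumerable<int> items, int partitionSize)
    {
        int i = 0;
        return items.GroupBy(x => i++ / partitionSize).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ",
            Split(Enumerable.Range(1, 6), 2).Select(g => string.Join(",", g))));
        // prints "1,3,5 | 2,4,6" -- two parts as requested, but interleaved

        Console.WriteLine(string.Join(" | ",
            Partition(Enumerable.Range(1, 6), 2).Select(g => string.Join(",", g))));
        // prints "1,2 | 3,4 | 5,6" -- order kept, but three chunks, not two
    }
}
```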
Almost every answer here either doesn't preserve order, or is about partitioning rather than splitting, or is plainly wrong. Try this one instead; it's faster and preserves order, but is a little more verbose:
public static IEnumerable<IEnumerable<T>> Split<T>(this ICollection<T> items,
                                                   int numberOfChunks)
{
    if (numberOfChunks <= 0 || numberOfChunks > items.Count)
        throw new ArgumentOutOfRangeException("numberOfChunks");

    int sizePerPacket = items.Count / numberOfChunks;
    int extra = items.Count % numberOfChunks;

    for (int i = 0; i < numberOfChunks - extra; i++)
        yield return items.Skip(i * sizePerPacket).Take(sizePerPacket);

    int alreadyReturnedCount = (numberOfChunks - extra) * sizePerPacket;
    // Each of the remaining `extra` chunks carries exactly one extra item,
    // keeping all chunk sizes within 1 of each other.
    int toReturnCount = sizePerPacket + 1;
    for (int i = 0; i < extra; i++)
        yield return items.Skip(alreadyReturnedCount + i * toReturnCount).Take(toReturnCount);
}
The equivalent method for a Partition
operation here
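A usage sketch of the order-preserving Split (the method is restated so the snippet compiles standalone, with the trailing-chunk size pinned to `sizePerPacket + 1` so the last `extra` chunks each carry exactly one extra item; the data is illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SplitExtensions
{
    public static IEnumerable<IEnumerable<T>> Split<T>(this ICollection<T> items, int numberOfChunks)
    {
        if (numberOfChunks <= 0 || numberOfChunks > items.Count)
            throw new ArgumentOutOfRangeException("numberOfChunks");

        int sizePerPacket = items.Count / numberOfChunks;
        int extra = items.Count % numberOfChunks;

        for (int i = 0; i < numberOfChunks - extra; i++)
            yield return items.Skip(i * sizePerPacket).Take(sizePerPacket);

        int alreadyReturnedCount = (numberOfChunks - extra) * sizePerPacket;
        // Distribute the remainder: the last `extra` chunks get one more item each.
        int toReturnCount = sizePerPacket + 1;
        for (int i = 0; i < extra; i++)
            yield return items.Skip(alreadyReturnedCount + i * toReturnCount).Take(toReturnCount);
    }
}

public class Demo
{
    public static void Main()
    {
        var chunks = Enumerable.Range(1, 11).ToList().Split(3);
        Console.WriteLine(string.Join(" | ", chunks.Select(c => string.Join(",", c))));
        // prints "1,2,3 | 4,5,6,7 | 8,9,10,11" -- original order, sizes within 1
    }
}
```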
aruno
Updated on September 01, 2021

Comments
- aruno over 2 years: Is there a nice way to split a collection into n parts with LINQ? Not necessarily evenly, of course. That is, I want to divide the collection into sub-collections, each containing a subset of the elements, where the last collection can be ragged.
- Marc Gravell over 15 years: You really don't like those "array[count++]", eh ;-p
- Jon Skeet over 15 years: Ironically I might well have used it before yesterday. But having written that answer it would have been hypocritical - and looking at both versions I think this is slightly easier to read at a glance. At first I used a list instead - Add is even more readable :)
- Marc Gravell over 15 years: Well, I just added a slightly different answer (to address "n pieces" rather than "pieces of length n"), and mixed in parts of your version ;-p
- Jonathan Allen about 15 years: Doing all those modulus operations can get a bit expensive on long lists.
- Marc Gravell over 14 years: It would be better to use the Select overload that includes the index.
- reustmd about 13 years: I've added a response that uses the Select overload and method-chaining syntax.
- ShadowChaser about 13 years: This version breaks the contract of IEnumerator. It's not valid to throw InvalidOperationException when Reset is called - I believe many of the LINQ extension methods rely on this behavior.
- ShadowChaser about 13 years: I initially programmed the same thing, but the pattern breaks when Reset is called on one of the nested IEnumerable<T> instances.
- ShadowChaser about 13 years: Brilliant - best solution here! A few optimizations: clear the linked list instead of creating a new one for each section (a reference to the linked list is never returned to the caller, so it's completely safe), and don't create the linked list until you reach the first item - that way there's no allocation if the source is empty.
- Mike about 13 years: @ShadowChaser According to MSDN, clearing the LinkedList is O(N) complexity, so it would ruin my goal of O(1). Of course, you could argue that the foreach is O(N) to begin with... :)
- aruno over 12 years: of course this time around I needed pieces of 8 rather than 8 pieces :-) and this worked great
- Brad over 12 years: Does this still work if you only enumerate the partition and not the inner enumerable? Since the inner enumerator is deferred, none of the code for the inner (take from current) will execute until it is enumerated, so MoveNext() will only be called by the outer partition function, right? If my assumptions are true, this can potentially yield n partitions with n elements in the original enumerable, and the inner enumerables will yield unexpected results.
- Robin Maben over 12 years: @HasanKhan: What if I don't know how many parts I'm splitting it into? I mean, split into groups of 3, let's say.
- Elmer over 12 years: Please explain why. I have been using this function without any trouble!
- Muhammad Hasan Khan over 12 years: Read the question again and see if you get n (almost) equal-length parts with your function.
- Gishu over 11 years: Thank you for not deleting even though it is not an answer for the OP; I wanted the exact same thing - pieces of length n :).
- nawfal over 11 years: Your answer is right, but the question is wrong for it. Your answer gives an unknown number of chunks with a fixed size for each chunk. But the OP wants a Split functionality that gives a fixed number of chunks with any size per chunk (hopefully of equal or close-to-equal sizes). Perhaps more suited here: stackoverflow.com/questions/3773403/…
- nawfal over 11 years: @Elmer your answer is right, but the question is wrong for it. Your answer gives an unknown number of chunks with a fixed size for each chunk (exactly as Partition, the name you have given it). But the OP wants a Split functionality that gives a fixed number of chunks with any size per chunk (hopefully of equal or close-to-equal sizes). Perhaps more suited here: stackoverflow.com/questions/3773403/…
- Mike over 11 years: @nawfal - I agree. I approached it this way because it doesn't require knowing the enumerable's length ahead of time, which is more efficient for some applications. If you want a fixed number of chunks then you can call myEnum.Section(myEnum.Count() / number_of_chunks + 1). I will update my answer to reflect this.
- nawfal over 11 years: It can be done. Instead of direct casting, just make the function generic and then call it for your int array.
- nawfal over 11 years: @Mike, why all the hassle of LinkedList, a separate static function, etc.? Why not a straightforward approach, which I believe makes the function even faster? In the previous thread I posted, here's my answer, which is a similar implementation that is faster with no hassles of collection structures.
- Mike over 11 years: @nawfal - I wanted an approach that was O(1), or "constant time." The Skip() inside your loop makes your approach O(N) because the time required to process the skip increases linearly with the number of items. Consider an enumerable with 1e12 items being separated into groups of two - the further you get into the enumerable, the slower it will go. I was also trying to avoid multiplication, which is expensive to compute relative to the other operations.
- nawfal over 11 years: @Mike have you benchmarked it? I hope you know O(1) doesn't mean faster; it only means that the time required for partitioning doesn't scale. I'm just wondering what's your rationale for blindly sticking to O(1) when it can be slower than other O(n)'s for all the real-life scenarios. I even tested it for a crazy 10^8-strength list and mine appeared to be faster still. I hope you know there are not even standard collection types that can hold 10^12 items.
- nawfal over 11 years: A few points: 1) why do you keep insisting that multiplication/modulo operations etc. are expensive? 2) myEnum.Section(myEnum.Count() / number_of_sections + 1) enumerates the first collection (and for the given question can be inefficient), so please edit the advantages you have mentioned for your approach.
- nawfal over 11 years: 3) You may use a lightweight collection structure instead of LinkedList<T>, as in Jon Skeet's or Brad's answers; they tend to be faster. As I said, O(1) doesn't guarantee speed. The possible overhead with LinkedList outweighs its gains here (they are meant for random removal and insertion). See this link here.
- Mike over 11 years: @nawfal - Thank you for your detailed analysis; it helps keep me on my toes. Linked lists in general are known for efficient end-inserts, which is why I selected it here. However, I just benchmarked it and indeed the List<> is much faster. I suspect this is some sort of .NET implementation detail, perhaps deserving of a separate StackOverflow question. I have modified my answer to use List<> as per your suggestion. Preallocating the list capacity guarantees that end-insertion is still O(1) and meets my original design goal. I also switched to the built-in .AsReadOnly() in .NET 4.5.
- Mike over 11 years: @nawfal - At the machine-code level, mult/div/mod are generally much more expensive (by an order of magnitude) than simple operations such as set/get/compare. Of course it varies by processor, compiler, and implementation, so you are indeed wise to benchmark your specific case if this sectioning operation is creating a bottleneck.
- toong almost 11 years: @ShadowChaser I think Reset() should throw a NotSupportedException and everything would be fine. From the MSDN documentation: "The Reset method is provided for COM interoperability. It does not necessarily need to be implemented; instead, the implementer can simply throw a NotSupportedException."
- ShadowChaser almost 11 years: @toong Wow, you're right. Not sure how I missed that after all this time.
- ironic over 10 years: Sorry, this may be a dumb question, but why do we need the .AsReadOnly() call here?
- Mike over 10 years: Technically you don't need it, but it's kind of a "best practice" since we're returning an IEnumerable<>, which implies that it is immutable.
- piedar over 10 years: I have an irrational dislike of SQL-style LINQ, so this is my favorite answer.
- CodeFox over 10 years: @Jon, I just wonder why you prefer the ReadOnlyCollection to a read-only array in this case?
- CodeFox about 10 years: @Jon: Ah - good point! I had the fixed length of arrays in mind, but of course - in contrast to the ReadOnlyCollection - the elements could be replaced. Thanks for your reply.
- Frank Tzanabetis about 10 years: @Gishu - Indeed - it's exactly what I needed too!
- Dejan about 10 years: The problem with this approach is that it traverses the complete source list (requiring the whole list to be in memory) before moving on. Does anyone know a better solution for large enumerables?
- Jon Skeet about 10 years: @Dejan: No it doesn't. Note the use of yield return. It requires one batch to be in memory at a time, but that's all.
- Dejan about 10 years: Yes, I've seen the yield return, and of course my comment "needs to traverse the complete list" is plainly wrong. Sorry. What happened was this: I was trying to use the method in a Parallel.ForEach in order to partition my data, which exploded in memory usage. I guess that the TPL spawns too many Tasks, because the yield return only comes after filling a partition, so the first phase will be just preparing partitions in parallel. I guess what I need is a proper implementation of msdn.microsoft.com/en-us/library/dd381768(VS.100).aspx.
- Jon Skeet about 10 years: @Dejan: Right - I wouldn't like to guess about how it interacts with Parallel LINQ partitioning, to be honest :)
- SalientBrain over 9 years: This is not working at all! The best possible is here: stackoverflow.com/questions/13709626/…! See comments.
- SalientBrain over 9 years: It is buggy! I don't remember exactly, but (as far as I remember) it performs an unwanted step, and it can lead to ugly side effects (with a DataReader, for example). The best solution is here (Jeppe Stig Nielsen): stackoverflow.com/questions/13709626/…
- SalientBrain over 9 years: Bufferization is a disadvantage if you have big chunks. I think the best solution is here (Jeppe Stig Nielsen): stackoverflow.com/questions/13709626/…
- Jon Skeet over 9 years: @SalientBrain: Um, Jeppe Stig Nielsen's solution still uses batching - note the call to ToList in yield return GetNextBatch(enumerator).ToList();. Batching is necessary, as otherwise you end up with very odd results if you skip one partition.
- SalientBrain over 9 years: @Jon Skeet: Yep, I missed that)) So no magic(. Same implementation in other words. Thanks.
- Lucas Rodrigues Sena over 9 years: Mine doesn't work; I just get one split with count 1 and all my items inside that one position. Any idea?
- Muhammad Hasan Khan over 9 years: @LucasRodriguesSena This method asks you for the no. of splits, not how much you want in each split. You can get the no. of splits by dividing your desired size by the total no. of items.
- drzaus almost 9 years: @Brad it will "fail" as you expect, similar to some of the issues in this thread: stackoverflow.com/questions/419019/… (specifically stackoverflow.com/a/20953521/1037948)
- Alex almost 9 years: .AsEnumerable() is not necessary; IGrouping<T> is already an IEnumerable<T>.
- Muhammad Hasan Khan almost 9 years: @Alex If you return part directly, the return type would be IEnumerable<IGrouping<T>> instead of IEnumerable<IEnumerable<T>>.
- Alex almost 9 years: @HasanKhan I didn't run the code, but I tried and it compiles just fine. I think since the return type is explicitly IEnumerable<IEnumerable<T>>, an implicit cast is taking place when you return part. No?
- Muhammad Hasan Khan almost 9 years: @Alex that's right. When I wrote it, covariance hadn't been introduced in C#.
- Alex almost 9 years: @HasanKhan no harm done. Being explicit in code usually is a good thing.
Jake Drew almost 9 yearsI think you can just change i.Index / partitionSize to i.Index % partitionSize and get the requested result. I also prefer this over the accepted answer as it is more compact and readable.
-
Karthik Arthik over 8 years@manu08, I have tried ur code, i have a list
var dept = {1,2,3,4,5}
. After splitting the result is likedept1 = {1,3,5}
anddept2 = { 2,4 }
whereparts = 2
. But the result i need isdept1 = {1,2,3}
anddept2 = {4,5}
-
goodeye over 7 yearsI had the same issue with modulo, so I calculated the column length with
int columnLength = (int)Math.Ceiling((decimal)(list.Count()) / parts);
then did division with.GroupBy(x => x.index / columnLength)
. One downside is Count() enumerates the list. -
MBoros about 7 yearsNext time try: var nrs = Enumerable.Range(1,2000).ToList();
-
chinookf almost 7 yearsFor VB.NET, I had difficulty with query syntax due to lack of i++ in VB (and struggled with the Into keyword), so I used method syntax instead:
Dim i As Integer = 0 Dim splits As IEnumerable(Of IEnumerable(Of T)) = list.GroupBy(Function(item) Dim groupRemainder As Integer = i Mod parts i += 1 Return groupRemainder End Function).Select(Function(group) group.AsEnumerable())
-
Edward Brey about 3 yearsThis method (and others like it) relies on the list size being larger than the number of parts it's being split into.
-
claudekennilol over 2 yearsI see that this was in an RC for .net 6, but it doesn't look like this actually made it.
-
Paul Hodgson over 2 yearsMy preferred answer as it maintains the order of the list, I had to make a modification to evenly distribute the "extra" chunks uniformly throughout the data set though.
-
Alex Dresko over 2 yearsThis code does not do the same thing as .NET 6 Chunk. If you call .NET 6 Chunk with 200 items, and a chunk value of 100, you will get 2 chunks. If you do the same thing with the code above, you'll get 100 chunks, each with 2 items.