Pandas every nth row

python pandas resampling


Solution 1

I'd use iloc, which takes row and column slices based on integer position, using normal Python slice syntax. If you want every 5th row:

df.iloc[::5, :]
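A minimal sketch of the iloc approach on a made-up 12-row frame with string labels. Because iloc is purely positional, the labels play no role, and changing the slice start shifts which row is picked first.

```python
import pandas as pd

# Hypothetical 12-row frame with non-integer labels; iloc ignores the labels
df = pd.DataFrame({"value": range(12)}, index=[f"r{i}" for i in range(12)])

every_5th = df.iloc[::5]             # positions 0, 5, 10
every_5th_from_2nd = df.iloc[1::5]   # positions 1, 6, 11

print(every_5th["value"].tolist())           # [0, 5, 10]
print(every_5th_from_2nd["value"].tolist())  # [1, 6, 11]
```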

Solution 2

Though @chrisb's accepted answer does answer the question, I would like to add the following.

A simple method I use to select every nth row, or to drop it, is the following:

df1 = df[df.index % 3 != 0]  # Excludes every 3rd row, starting from row 0
df2 = df[df.index % 3 == 0]  # Selects every 3rd row, starting from row 0

This arithmetic-based sampling also makes more complex row selections possible.

This assumes, of course, that the index consists of ordered, consecutive integers starting at 0.
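A minimal sketch of the modulo selection on a made-up six-row frame with the default 0..5 index. The last step shows why the assumption matters: once the index is no longer consecutive (for example after a previous filter), the test is applied to label values, so it stops tracking row position.

```python
import pandas as pd

df = pd.DataFrame({"value": range(6)})  # default RangeIndex 0..5

keep = df[df.index % 3 == 0]   # rows labelled 0 and 3
drop = df[df.index % 3 != 0]   # rows labelled 1, 2, 4, 5

# drop's labels are 1, 2, 4, 5 -> remainders 1, 2, 1, 2, so nothing matches,
# even though positionally "every 3rd row of drop" would be non-empty
again = drop[drop.index % 3 == 0]
```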

Solution 3

There is an even simpler alternative to the accepted answer: slice the DataFrame directly, which invokes df.__getitem__.

import pandas as pd

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

For example, to get every 2nd row, you can do

df[::2]

   a  b  c
0  x  x  x
2  x  x  x
4  x  x  x
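The [] slice is not tied to integer labels. A quick sketch with the same toy frame, but with made-up string labels, suggests that a step-only slice behaves positionally, just like iloc:

```python
import pandas as pd

# Same toy frame as above, but with string labels instead of 0..4
df = pd.DataFrame("x", index=list("pqrst"), columns=list("abc"))

# A step-only slice in [] is interpreted by position here
sliced = df[::2]
print(sliced.index.tolist())  # ['p', 'r', 't']
```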

There's also GroupBy.first/GroupBy.head, where you group on the index:

df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')

df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

The index is floor-divided by the stride (2, in this case). If the index is non-numeric, instead do

# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
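The two groupby variants are not interchangeable when data is missing. GroupBy.first returns the first non-null value of each column within a group, whereas head(1) returns the first row as-is and keeps its original label. A small sketch on a made-up frame with string labels and a NaN in the first row:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: string labels (so df.index // 2 is not an option)
# and a missing value in the very first row
df = pd.DataFrame({"a": [np.nan, 1.0, 2.0, 3.0, 4.0], "b": [10, 11, 12, 13, 14]},
                  index=list("vwxyz"))

pos = np.arange(len(df)) // 2     # group number per row: 0, 0, 1, 1, 2

firsts = df.groupby(pos).first()  # first non-null value per column, per group
heads = df.groupby(pos).head(1)   # first row of each group, unchanged

print(firsts["a"].tolist())   # [1.0, 2.0, 4.0]  (NaN skipped; 1.0 comes from row 'w')
print(heads["a"].tolist())    # [nan, 2.0, 4.0]
print(heads.index.tolist())   # ['v', 'x', 'z']
```

So if you need exactly the rows that iloc[::2] would return, head(1) is the closer match.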

Solution 4

Adding reset_index() to metastableB's answer removes the assumptions about the index values: you only need the rows to be in the order you want to sample them.

df1 = df[df.reset_index().index % 3 != 0]  # Excludes every 3rd row starting from 0
df2 = df[df.reset_index().index % 3 == 0]  # Selects every 3rd row starting from 0

df.reset_index().index will create an index that starts at 0 and increments by 1, allowing you to use the modulo easily.
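A small check of the reset_index() variant on a made-up frame with a shuffled, non-consecutive integer index. The np.arange(len(df)) mask is a swapped-in equivalent, not part of the original answer; it builds the same positional mask without creating a reset copy of the frame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a shuffled, non-consecutive integer index
df = pd.DataFrame({"v": list("abcdef")}, index=[10, 3, 7, 42, 5, 8])

# reset_index() gives a fresh 0..n-1 index, so the modulo now tracks position
kept = df[df.reset_index().index % 3 == 0]

# Same positional mask built directly from the row count
kept_arange = df[np.arange(len(df)) % 3 == 0]

print(kept["v"].tolist())  # ['a', 'd']
```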

Solution 5

I had a similar requirement, but I wanted the nth item within each group. This is how I solved it.

groups = data.groupby('group_key')
# transform returns a boolean mask aligned with data's original index
selection = groups['index_col'].transform(lambda x: x % 3 == 0)
subset = data[selection]
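The lambda above tests the values of index_col, so it only picks every nth row of each group if index_col already counts rows within the group. If it doesn't, one option (a swapped-in technique, not the original author's) is GroupBy.cumcount, which numbers the rows 0, 1, 2, ... inside each group. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical data: two interleaved groups
data = pd.DataFrame({
    "group_key": ["a", "b", "a", "a", "b", "a", "b", "a"],
    "value":     [0,   1,   2,   3,   4,   5,   6,   7],
})

# cumcount() numbers rows 0, 1, 2, ... within each group, keeping the original index
pos_in_group = data.groupby("group_key").cumcount()

# Keep the 1st, 4th, 7th, ... row of every group
subset = data[pos_in_group % 3 == 0]
print(subset["value"].tolist())  # [0, 1, 5]
```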



mikael

Updated on June 25, 2021

Comments

mikael almost 3 years

DataFrame.resample() works only with time-series data. I cannot find a way to get every nth row from non-time-series data. What is the best method?
Little Bobby Tables over 7 years

For those who might want, for example, every fifth row but starting at the 2nd row, it would be df.iloc[1::5, :].
Constantine almost 6 years

This is not a good answer because it makes three assumptions that are frequently not met: (1) the index is numeric, (2) the index starts at zero, and (3) the index values are consecutive. The last one is especially important, since you can't use your suggested method more than once without resetting the index.
metastableB almost 6 years

I take your point. Will edit the answer to make the assumptions more explicit.
joctee over 5 years

You can omit the column part: df.iloc[::5]
Readler almost 5 years

@Constantine still, wouldn't that be faster than the other solution as you can simply add an index?
FabioSpaghetti over 4 years

@chrisb how do I specify the starting row? Like every 5th row, starting from the second row?
ppwater over 3 years

While this code may answer the question, including an explanation of how or why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
JohnAndrews about 3 years

How do you include it from the back?
Raksha almost 3 years

How do you make it not include the 0th row?
topher217 almost 3 years

What is this slicing syntax called and where can I read more about it?
David Parks over 2 years

This is standard Python slicing. See stackoverflow.com/questions/509211/understanding-slice-notation
banderlog013 over 2 years

To get every 3rd row counting from the 3rd one (rows 3, 6, 9, ...), you need the unintuitive df.iloc[2::3].
Lodinn about 2 years

@banderlog013 No, that's intuitive: just df.iloc[::3] would suffice. What you want ("intuitively") is for the first row in the selection not to be the first row of the DataFrame. It's not hard to see that for any given N ("give me every Nth row, starting with the naturally-counted Nth row") the indexing is df.iloc[(N-1)::N]. This behavior is rarely needed, however...
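The (N-1)::N rule above can be checked on a made-up 10-row frame whose column holds the natural (1-based) row number; nth_rows_natural is a hypothetical helper name used only for this sketch.

```python
import pandas as pd

def nth_rows_natural(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return rows n, 2n, 3n, ... counting the first row as row 1 (hypothetical helper)."""
    return df.iloc[n - 1::n]

df = pd.DataFrame({"row_number": range(1, 11)})  # rows numbered 1..10

print(nth_rows_natural(df, 3)["row_number"].tolist())  # [3, 6, 9]
print(df.iloc[::3]["row_number"].tolist())             # [1, 4, 7, 10]
```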