Pandas every nth row


Solution 1

I'd use iloc, which takes a row/column slice based on integer position and follows normal Python slice syntax. If you want every 5th row:

df.iloc[::5, :]
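
As a minimal sketch of that slice in action, assume a made-up 12-row frame (the column name `value` is purely illustrative). It also shows the start offset and reverse stride asked about in the comments, and that the column part of the slice can be left out:

```python
import pandas as pd

# Hypothetical 12-row frame; the column name is illustrative only.
df = pd.DataFrame({"value": range(12)})

every_5th = df.iloc[::5, :]   # positions 0, 5, 10
from_second = df.iloc[1::5]   # positions 1, 6, 11 (column slice omitted)
from_back = df.iloc[::-5]     # positions 11, 6, 1, counting from the last row
```

Because iloc works on position, none of this depends on what the index labels are.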

Solution 2

Though @chrisb's accepted answer does answer the question, I would like to add to it the following.

A simple method I use to select every nth row, or to drop every nth row, is the following:

df1 = df[df.index % 3 != 0]  # Excludes every 3rd row starting from 0
df2 = df[df.index % 3 == 0]  # Selects every 3rd row starting from 0

This arithmetic-based selection also makes more complex row selections possible.

This assumes, of course, that your index consists of ordered, consecutive integers starting at 0.
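
To illustrate what "more complex" could look like, here is a small sketch on a hypothetical 10-row frame with the default 0..9 index (so the assumption above holds). The variable and column names are made up for the example:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..9.
df = pd.DataFrame({"value": range(10)})

# Keep the first two rows out of every block of five.
first_two_of_five = df[df.index % 5 < 2]   # labels 0, 1, 5, 6

# Every 3rd row, but starting at label 1 instead of 0.
offset_by_one = df[df.index % 3 == 1]      # labels 1, 4, 7
```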

Solution 3

There is an even simpler alternative to the accepted answer: slice the DataFrame directly, which invokes df.__getitem__.

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

For example, to get every 2nd row, you can do

df[::2]

   a  b  c
0  x  x  x
2  x  x  x
4  x  x  x

You can also use GroupBy.first or GroupBy.head, grouping on the floor-divided index:

df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')

df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x

The index is floor-divided by the stride (2, in this case). If the index is non-numeric, instead do

# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
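
One thing worth knowing when choosing between the two GroupBy variants: by default, first() takes the first non-null value per column within each group, while head(1) returns the first row unchanged. A minimal sketch on a made-up frame with one missing value (all names here are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; row 0 has a missing value in column "a".
df = pd.DataFrame({"a": [np.nan, 1.0, 2.0, 3.0], "b": [10, 11, 12, 13]})
keys = np.arange(len(df)) // 2     # group keys [0, 0, 1, 1]

firsts = df.groupby(keys).first()  # skips the NaN: "a" is 1.0 for group 0
heads = df.groupby(keys).head(1)   # original rows 0 and 2, NaN kept
sliced = df[::2]                   # same rows as heads
```

So if the frame may contain missing values and you want the actual nth rows, head(1) or plain slicing is the closer match.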

Solution 4

Adding reset_index() to metastableB's answer means you no longer need an integer index that starts at 0; the selection then depends only on the order of the rows.

df1 = df[df.reset_index().index % 3 != 0]  # Excludes every 3rd row starting from 0
df2 = df[df.reset_index().index % 3 == 0]  # Selects every 3rd row starting from 0

df.reset_index().index will create an index that starts at 0 and increments by 1, allowing you to use the modulo easily.
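
A short sketch of this on a hypothetical frame with string labels, where a modulo on the index itself would not be meaningful. It also shows that np.arange(len(df)) produces the same mask without building a second frame (names are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-numeric index.
df = pd.DataFrame({"value": range(7)}, index=list("abcdefg"))

mask_reset = df.reset_index().index % 3 == 0   # positional mask via reset_index
mask_arange = np.arange(len(df)) % 3 == 0      # same mask from plain positions

every_3rd = df[mask_reset]                     # rows labelled a, d, g
```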

Solution 5

I had a similar requirement, but I wanted the n'th item in a particular group. This is how I solved it.

groups = data.groupby(['group_key'])
selection = groups['index_col'].apply(lambda x: x % 3 == 0)
subset = data[selection]
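
If there is no ready-made per-group counter column, one possible alternative (not the approach above, and assuming "nth" means position within each group) is GroupBy.cumcount, which numbers the rows of each group from 0. The frame and column names below are made up for the sketch:

```python
import pandas as pd

# Hypothetical data: group "a" has four rows, group "b" has three.
data = pd.DataFrame({
    "group_key": ["a", "a", "a", "a", "b", "b", "b"],
    "value": [1, 2, 3, 4, 5, 6, 7],
})

# Position of each row within its group: 0, 1, 2, 3, 0, 1, 2
pos_in_group = data.groupby("group_key").cumcount()

# Every 3rd row of each group, starting from the group's first row.
subset = data[pos_in_group % 3 == 0]
```

The result of cumcount is aligned with the original index, so it can be used directly as a boolean mask after the modulo test.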

Author by

mikael

Updated on June 25, 2021

Comments

  • mikael
    mikael almost 3 years

    DataFrame.resample() works only with time series data. I cannot find a way of getting every nth row from non-time-series data. What is the best method?

  • Little Bobby Tables
    Little Bobby Tables over 7 years
    For those who might want, for example, every fifth row but starting at the 2nd row, it would be df.iloc[1::5, :].
  • Constantine
    Constantine almost 6 years
    This is not a good answer because it makes three assumptions that are frequently not met: (1) the index is numeric, (2) the index starts at zero, (3) the index values are consecutive. The last one is especially important, since you can't use your suggested method more than once without resetting the index.
  • metastableB
    metastableB almost 6 years
    I take your point. Will edit the answer to make the assumptions more explicit.
  • joctee
    joctee over 5 years
    You can omit the column part: df.iloc[::5]
  • Readler
    Readler almost 5 years
    @Constantine still, wouldn't that be faster than the other solution as you can simply add an index?
  • FabioSpaghetti
    FabioSpaghetti over 4 years
    @chrisb How do I specify the starting row? For example, every 5th row, starting from the second row?
  • ppwater
    ppwater over 3 years
    While this code may answer the question, including an explanation of how or why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
  • JohnAndrews
    JohnAndrews about 3 years
    How do you include it from the back?
  • Raksha
    Raksha almost 3 years
    how do you make it not include 0th row?
  • topher217
    topher217 almost 3 years
    What is this slicing syntax called and where can I read more about it?
  • David Parks
    David Parks over 2 years
  • banderlog013
    banderlog013 over 2 years
    For every 3rd row it is, unintuitively, df.iloc[2::3]
  • Lodinn
    Lodinn about 2 years
    @banderlog013 No, that's intuitive: plain df.iloc[::3] would suffice. What you want ("intuitively") is for the first selected row not to be the first row of the DataFrame. It's not hard to see that for any given N ("give me every Nth row, starting with the naturally counted Nth row") the indexing is df.iloc[(N-1)::N]. This behavior is rarely needed, however...