Dask Dataframe: Get row count?
Solution 1
import dask.dataframe as dd

# keep the blocksize small enough that each partition fits comfortably in memory
ddf = dd.read_csv('*.csv', blocksize="10MB")
ddf.shape[0].compute()
From the documentation:
blocksize <str, int or None> (optional) — Number of bytes by which to cut up larger files. The default value is computed based on available physical memory and the number of cores, up to a maximum of 64MB. Can be a number like 64000000 or a string like "64MB". If None, a single block is used for each file.
Solution 2
If you only need the number of rows -
you can load a subset of the columns, picking ones with low memory usage (such as category or integer columns rather than string/object columns), and then run len(df.index)
usbToaster
Updated on June 07, 2022

Comments:
usbToaster, about 2 years ago
Simple question: I have a Dask dataframe containing about 300 million records. I need to know the exact number of rows it contains. Is there an easy way to do this?
When I try to run
dataframe.x.count().compute()
it looks like it tries to load the entire dataset into RAM, for which there is no space, and it crashes.
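One subtlety worth noting: `.count()` on a column counts only non-null values, which is not necessarily the row count. The same semantics hold in plain pandas, shown here on a tiny Series:

```python
import pandas as pd

# count() excludes nulls; len() gives the number of rows.
s = pd.Series([1.0, None, 3.0])
non_null = s.count()
n_rows = len(s)
print(non_null, n_rows)
```

So even when it does run, `dataframe.x.count().compute()` answers "how many non-null values in x", while `len(dataframe)` (or `ddf.shape[0].compute()`) answers "how many rows".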