Oracle count (*) is taking too much time

16,292

Solution 1

Counting the number of rows of large table takes long time. It's natural. Some DBMS stores the number of records, however, this kinds of DBMS limits concurrency. It should lock the entire table before DML operation on the table. (The entire table lock is necessary to update the count properly.)

The value in ALL_TABLES.NUM_ROWS (or USER_TABLES.NUM_ROWS) is just a statistical information generated by analyze table ... or dbms_stats.gather_table_stats procedure. It's not accurate, not real-time information.

If you don't need the exact number of rows, you can use the statistical information. However you shouldn't depend on it. It's used by Oracle optimizer, but shouldn't in application program.

I'm not sure why you have to count the number of rows of the table. If you need it in the batch program which is run infrequently, you can partition the table to increase the parallelism. If you need the count in online program, you should find a way not to use the count.

Solution 2

A few issues to mention:

  1. For "select count(*) from table" to use an index, the indexed column must be non-nullable, or the index must be a bitmap type.
  2. If there are known to be no nulls in the column but there is no not null constraint on it, then use "select count(*) from table where column_name is not null".
  3. It does of course have to be more efficient to scan the index than the table, but with so many table columns you're probably fine there.
  4. If you really want a parallel index scan, use the parallel_index hint, not parallel. But with only 7 million rows you might not find any need for parallelism.
  5. You need to check the execution plan to see if an index and/or parallel query is in use.
  6. If you can use an estimated number of rows then consider using the sample clause: for example "select 1000*count(*) from table sample(0.1)"
Share:
16,292
Brabin
Author by

Brabin

Updated on June 04, 2022

Comments

  • Brabin
    Brabin almost 2 years

    I was trying to fetch the count(*) from the table, which has almost 7 million records and it taking more than an hour for returning the result.

    Also the table has 153 columns out of which index has been created for column 123, so tried to run the following query in parallel, but it didn't help.

    select /*+ parallel (5) */ count(123) from <table_name>
    

    Please suggest if there is alternative way.

    When I ran desc on the table in Toad, the index tab holds the value of no. of rows. Any idea how that value is getting updated there?

  • Jon Heller
    Jon Heller almost 11 years
    +1 But in general parallelism and partitioning are independent of each other. That was less true in older versions, and there are still some places where parallelism and partitioning must work together, like with partition-wise joins. But for just counting all the rows, dividing up the extents among 5 different processes probably wouldn't benefit from partitioning.
  • Jon Heller
    Jon Heller almost 11 years
    Why should the DOP be a power of two? Are you thinking about the number of hash partitions instead?
  • APC
    APC almost 11 years
    @jonearles - nope, I'm definitely think about parallel query. The rule of thumb for DOP is 2*CPUs (assuming there are no other processes on the system contending for resource). Although the reason is the same as for hash partitions - even distribution, in this case of workload.
  • Jon Heller
    Jon Heller almost 11 years
    Here's some more information about Granules of Parallelism if you're curious.
  • ntalbs
    ntalbs almost 11 years
    @jonearles Yes, you are right. Thank you for your kindness and useful information.
  • Jon Heller
    Jon Heller almost 11 years
    The power of 2 rule is useful for hash partitioning because of the way ORA_HASH works. But I can't find any source that implies ORA_HASH, or a similar function, plays a role in distributing blocks to parallel servers. I assume that however the list of blocks and extents are stored, for this query it's a simple matter of taking that list and splitting it into N groups. Dividing by 5 should work as well as 4 or 8. And I've seen lots of Oracle examples and some auto-DOPs that are not powers of 2. The number is almost always a multiple of 2, but I wonder if even that really matters.
  • APC
    APC almost 11 years
    @jonearles - we're straying off topic here. But anyway, see my edit.
  • David Aldridge
    David Aldridge almost 11 years
    @jonearles important to note though that parallel query are not a prerequisite for benefiting from partition-wise joins. Even a serially executed hash join will improve.