Benchmark: bigint vs int on PostgreSQL


In a 64-bit system the two tables are nearly identical. The column description_id in description_int occupies 8 bytes (4 for the integer value and 4 of alignment padding). Try this test:

select 
    pg_relation_size('description_int')/10000000 as table_int, 
    pg_relation_size('description_bigint')/10000000 as table_bigint,
    pg_relation_size('description_int_pkey')/10000000 as index_int,
    pg_relation_size('description_bigint_pkey')/10000000 as index_bigint;
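
For a more readable report, the same sizes can be formatted with pg_size_pretty() (a sketch using the same relation names as above):

select 
    pg_size_pretty(pg_relation_size('description_int')) as table_int, 
    pg_size_pretty(pg_relation_size('description_bigint')) as table_bigint,
    pg_size_pretty(pg_relation_size('description_int_pkey')) as index_int,
    pg_size_pretty(pg_relation_size('description_bigint_pkey')) as index_bigint;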

The average row size of both tables is virtually the same. This is because the integer column occupies 8 bytes (4 bytes for the value and 4 bytes of alignment padding), exactly like bigint (8 bytes for the value, with no filler). The same applies to index entries. This is a special case, however. If we add one more integer column to the first table:

CREATE TABLE two_integers
(
  description_id INT PRIMARY KEY NOT NULL,
  one_more_int INT,
  description VARCHAR(200),
  constraint description_id_positive CHECK (description_id >= 0)
);

the average row size should remain the same, because the first 8 bytes will now hold the two integers with no filler.
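
A quick way to sanity-check this (a sketch: the first query is exact, the second assumes two_integers has been loaded with the same 10 million rows, which is not shown here):

select
    pg_column_size(1::int) as int_bytes,       -- 4
    pg_column_size(1::bigint) as bigint_bytes; -- 8

select pg_relation_size('two_integers')/10000000 as table_two_integers;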

Find more details in Calculating and saving space in PostgreSQL.

Author by Valter Silva

Updated on June 22, 2022

Comments

  • Valter Silva about 2 years

    I want to increase my database performance. In a project, all tables went from int to bigint, which I think is a bad choice, not only regarding storage (int requires 4 bytes, bigint requires 8 bytes) but also regarding performance. So I created a small table with 10 million entries, using a script in Python:

    import uuid

    rows = 10000000

    output = 'insert_description_bigint.sql'

    # Write one INSERT per row; range(1, rows + 1) makes the file contain
    # exactly 10 million rows, and the with-block closes (and flushes) the file.
    with open(output, 'w') as f:
        f.write("SET search_path = norma;\n")
        for i in range(1, rows + 1):
            random_string = uuid.uuid4()
            query = "insert into description_bigint (description_id, description) values (%d, '%s');\n"
            f.write(query % (i, random_string))
    

    And this is how I created my two tables:

    -- BIGINT
    
    DROP TABLE IF EXISTS description_bigint;
    
    CREATE TABLE description_bigint
    (
      description_id BIGINT PRIMARY KEY NOT NULL,
      description VARCHAR(200),
      constraint description_id_positive CHECK (description_id >= 0)
    );
    
    select count(1) from description_bigint;
    select * from description_bigint;
    select * from description_bigint where description_id = 9999999;
    
    -- INT
    
    DROP TABLE IF EXISTS description_int;
    
    CREATE TABLE description_int
    (
      description_id INT PRIMARY KEY NOT NULL,
      description VARCHAR(200),
      constraint description_id_positive CHECK (description_id >= 0)
    );
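
    The generated file is plain SQL, so it can be loaded from psql with something like this (using the file name produced by the script above):

    \i insert_description_bigint.sql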
    

    After inserting all this data, I run a query against both tables to measure the difference between them. And to my surprise, they both have the same performance:

    select * from description_bigint; -- 11m55s
    select * from description_int; -- 11m55s
    

    Am I doing something wrong with my benchmark? Shouldn't int be faster than bigint? Especially since the primary key is by definition an index, which means that creating an index for bigint should be slower than creating one for int with the same amount of data, right?
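
    For the index comparison specifically, I assume the more direct test would be to time the primary-key lookups themselves, e.g. (just a sketch, not timed yet):

    explain analyze select * from description_int where description_id = 9999999;
    explain analyze select * from description_bigint where description_id = 9999999;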

    I know that this alone is not going to make a huge difference in the performance of my database, but I want to ensure that we are using best practices and keeping performance in mind here.

  • Valter Silva almost 8 years
    Hi @klin, I got this result: 76;76;22;22. Would you mind clarifying it?
  • LaVache over 7 years
    @klin You seem to be using the word "bits" instead of "bytes" in many places.