Is it bad for performance to select all columns?

Solution 1

The issue here isn't so much the database server as the network communication. By selecting all columns, you're telling the server to send every one of them back to you. Concerns over IO are addressed nicely in the question and answer @Karamba linked in a comment: select * vs select column. But for most real-world applications (and I use "applications" in every sense), the main concern is network traffic and how long it takes to serialize, transmit, and then deserialize the data. Either way, the answer comes out the same.

So pulling back all the columns is great, if you intend to use them all, but that can be a lot of extra data transfer, particularly if you store, say, lengthy strings in your columns. In many cases, of course, the difference will be undetectable and is mostly just a matter of principle. Not all, but a significant majority.

It's really just a trade-off between your aforementioned laziness (and trust me, we all feel that way) now and how important performance really is.

That all said, if you do intend to use all the column values, you're much better off pulling them all back at once than you are firing off a bunch of separate queries.
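To make the one-query-versus-many point concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, reuses the table and column names from the question, and the `run` helper is hypothetical. An in-memory database has no network, so the statement count is only a proxy for round trips to a real server.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table (id INTEGER PRIMARY KEY, name TEXT, status TEXT, date TEXT)"
)
conn.execute(
    "INSERT INTO user_table (name, status, date) VALUES ('yoshi', 'active', '2022-06-11')"
)

issued = []  # every statement sent through run() is recorded here


def run(sql, params=()):
    """Hypothetical helper: execute a query, record it, return the first row."""
    issued.append(sql)
    return conn.execute(sql, params).fetchone()


# One query that fetches every column we need.
single_row = run("SELECT id, name, status, date FROM user_table WHERE id = ?", (1,))
single_queries = len(issued)

# The same data fetched one column at a time.
issued.clear()
piecewise_row = tuple(
    run(f"SELECT {column} FROM user_table WHERE id = ?", (1,))[0]
    for column in ("id", "name", "status", "date")
)
piecewise_queries = len(issued)

print(single_queries, piecewise_queries)
```

Both approaches end up with the same values, but the second one pays the per-query overhead four times instead of once.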

Think of it like doing a web search: you do your search, you find your page, and you only need one detail. You could read the entire page and know everything about the subject, or you could just jump to the part you're looking for and be done. The latter is a lot faster if that's all you ever want, but if you're then going to have to learn about the other aspects, you'd be way better off reading them the first time than having to run your search again and find the page all over.

If you aren't sure whether you'll need the other column values in the future, then it's your call as the developer to judge which case is more likely.

It all depends on what your application is, what your data is, how you're using it, and how important performance really is to you.

Solution 2

Selecting a single column can have a large effect on the performance of certain queries. For example, it is more efficient for the query engine to process an index rather than look up data in the original data pages. If a covering index is available -- that is, an index that contains all the columns needed for a query -- then the query will run faster. For large tables that are too big for available memory, the use of a covering index can be a big, big win. (Think orders of magnitude improvement in performance in some cases.)
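Here is a small sketch of the covering-index effect. It assumes SQLite (through Python's built-in sqlite3 module) rather than MySQL, and the index `idx_user_status` is a hypothetical one added for illustration. SQLite's `EXPLAIN QUERY PLAN` reports whether a query can be answered from the index alone. On MySQL you would look for "Using index" in the `Extra` column of `EXPLAIN` instead.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table ("
    " id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT, status TEXT, date TEXT)"
)
# Hypothetical index that contains user_id and status.
conn.execute("CREATE INDEX idx_user_status ON user_table (user_id, status)")
conn.executemany(
    "INSERT INTO user_table (user_id, name, status, date) VALUES (?, ?, ?, ?)",
    [(i % 100, f"name{i}", "active", "2022-06-11") for i in range(1000)],
)


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last field is the detail text


# Only indexed columns are requested, so the index can cover the query.
narrow_plan = plan("SELECT user_id, status FROM user_table WHERE user_id = ?")
# SELECT * also needs name and date, so the table itself must be read.
wide_plan = plan("SELECT * FROM user_table WHERE user_id = ?")

print(narrow_plan)
print(wide_plan)
```

The narrow query's plan mentions a covering index, while the `SELECT *` plan uses the same index only to locate rows and then goes back to the table for the remaining columns.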

Another case where a limited number of columns is beneficial is when one or more of the columns are very large, such as a BLOB or TEXT column. These can grow to tens of thousands of bytes or even megabytes. Retrieving them can put a big load on the server.
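As a rough illustration of how much a single large column can add, the sketch below assumes SQLite via Python's sqlite3 module. The `bio` column and the `payload_bytes` helper are hypothetical, and the byte count is a crude estimate of result size, not a measurement of actual wire traffic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_table (id INTEGER PRIMARY KEY, name TEXT, status TEXT, bio TEXT)"
)
# 20 rows, each with a hypothetical 50,000-character TEXT value.
conn.executemany(
    "INSERT INTO user_table (name, status, bio) VALUES (?, ?, ?)",
    [(f"user{i}", "active", "x" * 50_000) for i in range(20)],
)


def payload_bytes(sql):
    """Crude result-set size: total UTF-8 length of every returned value."""
    return sum(
        len(str(value).encode("utf-8"))
        for row in conn.execute(sql)
        for value in row
    )


wide_bytes = payload_bytes("SELECT * FROM user_table")
narrow_bytes = payload_bytes("SELECT id, name, status FROM user_table")

print(wide_bytes, narrow_bytes)
```

In this setup the `SELECT *` result carries an extra 1,000,000 bytes (20 rows of 50,000 characters each) that the narrow query never touches.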

There is a danger in using *, if you have prepared statements and the underlying structure of the table changes. The query itself could get out-of-date (I've had this problem on other databases, but not specifically on MySQL). The underlying change could be as simple as changing the name of a column. What would be caught as a compile time error is instead a run-time error that might be much more mysterious.
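A related hazard is easy to reproduce. The sketch below is not the prepared-statement problem itself, just a simpler cousin of it, and it assumes SQLite via Python's sqlite3 module with hypothetical helper names. Code that depends on the shape of a `SELECT *` result breaks at run time when a column is added, while an explicit column list keeps working.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_table (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.execute("INSERT INTO user_table (name, status) VALUES ('yoshi', 'active')")


def load_with_star():
    # Silently relies on the table having exactly three columns, in this order.
    user_id, name, status = conn.execute("SELECT * FROM user_table").fetchone()
    return user_id, name, status


def load_with_columns():
    # Names the columns it needs, so extra columns do not affect it.
    return conn.execute("SELECT id, name, status FROM user_table").fetchone()


before = load_with_star()

# The table structure changes underneath the application.
conn.execute("ALTER TABLE user_table ADD COLUMN date TEXT")

try:
    load_with_star()
    star_broke = False
except ValueError:
    # The row now has four values, so unpacking into three names fails.
    star_broke = True

after = load_with_columns()
print(before, star_broke, after)
```

The failure only shows up when the code path runs, which is the same kind of late, puzzling error the paragraph above describes.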

In general, the reasons given for avoiding * have more to do with network performance. In many cases, it is not going to make much difference. If you are returning 20 rows from a table where each row contains, on average, 100 or 200 bytes, then the difference between selecting all the columns and a subset of the columns will be minor in most hardware environments. The vast majority of the time spent on the query will go to compiling it, executing it in the engine, and reading the data pages. Returning 200 bytes versus 2000 bytes probably won't make a big difference.

However, there are cases (such as the ones listed above) where it can make a big difference. So, avoiding * is a good habit, but using it now and then probably isn't going to bring down your system.

yoshi
Author by

yoshi

Updated on June 11, 2022

Comments

  • yoshi
    yoshi almost 2 years

    Is it bad to SELECT all columns at once even though you probably don't need all of them? You might need them in another task, though, and you are too lazy to write a query for every task.

    Should you only do queries where you SELECT only columns you need and do this query again if you need another column?

    So basically the question is: Does it have any effect on performance to SELECT one column vs multiple columns?

    The query is very simple (no functions, joins etc.) For example:

    SELECT
    id, name, status, date
    FROM user_table
    WHERE user_id = :user_id
    
    • zerkms
      zerkms over 9 years
      It does affect the performance, but the size of the effect varies depending on a lot of factors. Generally, you want your DBMS server to do no more work than is required to fulfill your requirements. On the other hand, the shorter the query, the faster it is parsed by MySQL.
    • Karamba
      Karamba over 9 years