Python string formatting: is '%' more efficient than 'format' function?
- Yes,
%
string formatting is faster than the.format
method - most likely (this may have a much better explanation) due to
%
being a syntactical notation (hence fast execution), whereas.format
involves at least one extra method call - because attribute value access also involves an extra method call, viz.
__getattr__
I ran a slightly better analysis (on Python 3.8.2) using timeit
of various formatting methods, results of which are as follows (pretty-printed with BeautifulTable) -
+-----------------+-------+-------+-------+-------+-------+--------+ | Type \ num_vars | 1 | 2 | 5 | 10 | 50 | 250 | +-----------------+-------+-------+-------+-------+-------+--------+ | f_str_str | 0.056 | 0.063 | 0.115 | 0.173 | 0.754 | 3.717 | +-----------------+-------+-------+-------+-------+-------+--------+ | f_str_int | 0.055 | 0.148 | 0.354 | 0.656 | 3.186 | 15.747 | +-----------------+-------+-------+-------+-------+-------+--------+ | concat_str | 0.012 | 0.044 | 0.169 | 0.333 | 1.888 | 10.231 | +-----------------+-------+-------+-------+-------+-------+--------+ | pct_s_str | 0.091 | 0.114 | 0.182 | 0.313 | 1.213 | 6.019 | +-----------------+-------+-------+-------+-------+-------+--------+ | pct_s_int | 0.09 | 0.141 | 0.248 | 0.479 | 2.179 | 10.768 | +-----------------+-------+-------+-------+-------+-------+--------+ | dot_format_str | 0.143 | 0.157 | 0.251 | 0.461 | 1.745 | 8.259 | +-----------------+-------+-------+-------+-------+-------+--------+ | dot_format_int | 0.141 | 0.192 | 0.333 | 0.62 | 2.735 | 13.298 | +-----------------+-------+-------+-------+-------+-------+--------+ | dot_format2_str | 0.159 | 0.195 | 0.33 | 0.634 | 3.494 | 18.975 | +-----------------+-------+-------+-------+-------+-------+--------+ | dot_format2_int | 0.158 | 0.227 | 0.422 | 0.762 | 4.337 | 25.498 | +-----------------+-------+-------+-------+-------+-------+--------+
The trailing _str
& _int
represent the operation was carried out on respective value types.
Kindly note that the concat_str
result for a single variable is essentially just the string itself, so it shouldn't really be considered.
My setup for arriving at the results -
from timeit import timeit
from beautifultable import BeautifulTable # pip install beautifultable
times = {}
for num_vars in (250, 50, 10, 5, 2, 1):
f_str = "f'{" + '}{'.join([f'x{i}' for i in range(num_vars)]) + "}'"
# "f'{x0}{x1}'"
concat = '+'.join([f'x{i}' for i in range(num_vars)])
# 'x0+x1'
pct_s = '"' + '%s'*num_vars + '" % (' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
# '"%s%s" % (x0,x1)'
dot_format = '"' + '{}'*num_vars + '".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
# '"{}{}".format(x0,x1)'
dot_format2 = '"{' + '}{'.join([f'{i}' for i in range(num_vars)]) + '}".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'
# '"{0}{1}".format(x0,x1)'
vars = ','.join([f'x{i}' for i in range(num_vars)])
vals_str = tuple(map(str, range(num_vars))) if num_vars > 1 else '0'
setup_str = f'{vars} = {vals_str}'
# "x0,x1 = ('0', '1')"
vals_int = tuple(range(num_vars)) if num_vars > 1 else 0
setup_int = f'{vars} = {vals_int}'
# 'x0,x1 = (0, 1)'
times[num_vars] = {
'f_str_str': timeit(f_str, setup_str),
'f_str_int': timeit(f_str, setup_int),
'concat_str': timeit(concat, setup_str),
# 'concat_int': timeit(concat, setup_int), # this will be summation, not concat
'pct_s_str': timeit(pct_s, setup_str),
'pct_s_int': timeit(pct_s, setup_int),
'dot_format_str': timeit(dot_format, setup_str),
'dot_format_int': timeit(dot_format, setup_int),
'dot_format2_str': timeit(dot_format2, setup_str),
'dot_format2_int': timeit(dot_format2, setup_int),
}
table = BeautifulTable()
table.column_headers = ['Type \ num_vars'] + list(map(str, times.keys()))
# Order is preserved, so I didn't worry much
for key in ('f_str_str', 'f_str_int', 'concat_str', 'pct_s_str', 'pct_s_int', 'dot_format_str', 'dot_format_int', 'dot_format2_str', 'dot_format2_int'):
table.append_row([key] + [times[num_vars][key] for num_vars in (1, 2, 5, 10, 50, 250)])
print(table)
I couldn't go beyond num_vars=250
because of the max arguments (255) limit with timeit
.
tl;dr - Python string formatting performance : f-strings
are fastest and more elegant, but at times (due to some implementation restrictions & being Py3.6+ only), you might have to use other formatting options as necessary.
Related videos on Youtube
Jean-Francois T.
Working in Software Testing for embedded application in Safety-Critical Embedded Systems. Passionate about Python and OCaml but also C language and other script language (Perl, TCL), and trying to learn Google Scripts/JS. Perpetual learner and love experiment new things. One true admirer of VS Code and Notepad++. Recently embarked on the Deep Learning adventure with startup "UnpackAI".
Updated on June 04, 2022Comments
-
Jean-Francois T. almost 2 years
I wanted to compare different to build a string in Python from different variables:
- using
+
to concatenate (referred to as 'plus') - using
%
- using
"".join(list)
- using
format
function - using
"{0.<attribute>}".format(object)
I compared for 3 types of scenari
- string with 2 variables
- string with 4 variables
- string with 4 variables, each used twice
I measured 1 million operations of each time and performed an average over 6 measures. I came up with the following timings:
test_plus: 0.29480 test_percent: 0.47540 test_join: 0.56240 test_format: 0.72760 test_formatC: 0.90000 test_plus_long: 0.50520 test_percent_long: 0.58660 test_join_long: 0.64540 test_format_long: 1.03400 test_formatC_long: 1.28020 test_plus_long2: 0.95220 test_percent_long2: 0.81580 test_join_long2: 0.88400 test_format_long2: 1.51500 test_formatC_long2: 1.97160
In each scenario, I came up with the following conclusion
- Concatenation seems to be one of the fastest method
- Formatting using
%
is much faster than formatting withformat
function
I believe
format
is much better than%
(e.g. in this question) and%
was almost deprecated.I have therefore several questions:
- Is
%
really faster thanformat
? - If so, why is that?
- Why is
"{} {}".format(var1, var2)
more efficient than"{0.attribute1} {0.attribute2}".format(object)
?
For reference, I used the following code to measure the different timings.
import time def timing(f, n, show, *args): if show: print f.__name__ + ":\t", r = range(n/10) t1 = time.clock() for i in r: f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args) t2 = time.clock() timing = round(t2-t1, 3) if show: print timing return timing class values(object): def __init__(self, a, b, c="", d=""): self.a = a self.b = b self.c = c self.d = d def test_plus(a, b): return a + "-" + b def test_percent(a, b): return "%s-%s" % (a, b) def test_join(a, b): return ''.join([a, '-', b]) def test_format(a, b): return "{}-{}".format(a, b) def test_formatC(val): return "{0.a}-{0.b}".format(val) def test_plus_long(a, b, c, d): return a + "-" + b + "-" + c + "-" + d def test_percent_long(a, b, c, d): return "%s-%s-%s-%s" % (a, b, c, d) def test_join_long(a, b, c, d): return ''.join([a, '-', b, '-', c, '-', d]) def test_format_long(a, b, c, d): return "{0}-{1}-{2}-{3}".format(a, b, c, d) def test_formatC_long(val): return "{0.a}-{0.b}-{0.c}-{0.d}".format(val) def test_plus_long2(a, b, c, d): return a + "-" + b + "-" + c + "-" + d + "-" + a + "-" + b + "-" + c + "-" + d def test_percent_long2(a, b, c, d): return "%s-%s-%s-%s-%s-%s-%s-%s" % (a, b, c, d, a, b, c, d) def test_join_long2(a, b, c, d): return ''.join([a, '-', b, '-', c, '-', d, '-', a, '-', b, '-', c, '-', d]) def test_format_long2(a, b, c, d): return "{0}-{1}-{2}-{3}-{0}-{1}-{2}-{3}".format(a, b, c, d) def test_formatC_long2(val): return "{0.a}-{0.b}-{0.c}-{0.d}-{0.a}-{0.b}-{0.c}-{0.d}".format(val) def test_plus_superlong(lst): string = "" for i in lst: string += str(i) return string def test_join_superlong(lst): return "".join([str(i) for i in lst]) def mean(numbers): return float(sum(numbers)) / max(len(numbers), 1) nb_times = int(1e6) n = xrange(5) lst_numbers = xrange(1000) from collections import defaultdict metrics = defaultdict(list) list_functions = [ test_plus, test_percent, test_join, test_format, test_formatC, test_plus_long, test_percent_long, test_join_long, test_format_long, test_formatC_long, test_plus_long2, test_percent_long2, test_join_long2, test_format_long2, test_formatC_long2, # test_plus_superlong, test_join_superlong, ] val = values("123", "456", "789", "0ab") for i in n: for f in list_functions: print ".", name = f.__name__ if "formatC" in name: t = timing(f, nb_times, False, val) elif '_long' in name: t = timing(f, nb_times, False, "123", "456", "789", "0ab") elif '_superlong' in name: t = timing(f, nb_times, False, lst_numbers) else: t = timing(f, nb_times, False, "123", "456") metrics[name].append(t) # Get Average print "\n===AVERAGE OF TIMINGS===" for f in list_functions: name = f.__name__ timings = metrics[name] print "{:>20}:\t{:0.5f}".format(name, mean(timings))
-
Maximilian Peters over 7 yearsUse
timeit
instead of your custom function, it might that the first execution is slow but the subsequent function execution are faster but in reality you would only call the function once. docs.python.org/2/library/timeit.html -
Moinuddin Quadri over 7 yearsAs mentioned by @MaximilianPeters you should be using
timeit
for getting the trust-worthy results -
Jean-Francois T. over 7 yearsThanks guys. I checked
timeit
but I should have been high that day because I believed it was only supported on Python 3.x and I am mainly using 2.7. -
vishes_shell over 7 yearsFor the 3rd question that answer could help you understand.
-
pylang over 7 yearsConsider adding f-strings to you analysis from Python 3.6. It would be interesting to compare those results too. Nice code!
-
Jean-Francois T. over 7 years@vishes_shell very interesting post. That indeed provides a nice insight.
-
Antti Haapala -- Слава Україні over 6 years
- using
-
Mateen Ulhaq over 5 yearsPython: where it's OK to build strings to use to do timing tests on various methods for building strings... and then import an external library that builds a custom object with it's own
__str__
that builds a string (and likely building that string out of strings that build strings within the process) out of all the results of your timing tests. -
ExternalCompilerError almost 4 yearsDo you know or have at least an idea why the
f-string
andformat
versions seem to take more time for one variable than for two? -
MisterMiyagi almost 4 yearsThe setup code for the 1-vars case is broken. It resolves to
x0 = ('0',)
, which does not unpack the tuple.x0, = ('0',)
would be correct. Usesetup_str = f'{vars}, = {vals_str}'
andsetup_int = f'{vars}, = {vals_int}'
(or attach the,
tovars
) instead to force unpacking. -
shad0w_wa1k3r almost 4 years@ExternalCompilerError the issue was due to the incorrect setup as pointed out by MisterMiyagi. I've fixed it & the results are as expected now.