Generate variable containing number of characters in a string variable
There is no egen function because there has long been a function (in the strict sense) to do this. In recent versions of Stata, the function is called strlen(), but the older name length() continues to work:
```
. sysuse auto
(1978 Automobile Data)

. gen l1 = length(make)

. gen l2 = strlen(make)

. su l?

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          l1 |         74    11.77027    2.155257          6         17
          l2 |         74    11.77027    2.155257          6         17
```
See help functions and (e.g.) this tutorial column.
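Applied to the setup in the question, a minimal sketch might look like this. The names stringvar and countvar are the placeholders from the question, and the three responses are made up for illustration. The ustrlen() line assumes Stata 14 or later, where strlen() counts bytes and ustrlen() counts Unicode characters; the two agree for plain ASCII text like this but can differ for accented or other non-ASCII responses.

```stata
* Sketch: one character count per observation, using generate rather than egen
* stringvar/countvar are the question's placeholder names; the data are made up
clear
input str20 stringvar
"hello"
"two words"
""
end

* strlen() counts bytes (Stata 14+); for ASCII text that equals characters
generate countvar = strlen(stringvar)

* ustrlen() counts Unicode characters, which matters for non-ASCII text
generate ucountvar = ustrlen(stringvar)

list
```

Note that spaces count as characters, so "two words" gives 9, and an empty response gives 0.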
Author: harre
Updated on June 15, 2022

Comments

harre (almost 2 years ago):
In a survey dataset I have a string variable (type: str244) with qualitative responses. I want to count the number of characters in each response/string and generate a new variable containing this number. Using egenmore, I have already counted the number of words using nwords, but I cannot find the counterpart for counting characters. EXAMPLE:

```stata
egen countvar = nwords(stringvar)
```

where countvar is the new variable name and stringvar is the string variable. Does such an egen function exist for counting characters?

Agustín Indaco (almost 5 years ago): What about counting digits in a numeric variable?

Nick Cox (almost 5 years ago): That's really a new question, as there are subtle differences. Do you mean integers, or do you include decimal parts? If you mean integers, log10(x) + 1 is a good start. If you include numbers with decimal parts, the question is a lot messier without knowing a display format.

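For the integer case, a minimal sketch of the log10() idea follows. Taking the integer part, floor(log10(x)) + 1, turns the "good start" into an exact count for positive integers; abs() keeps a minus sign out of the count, and zero, where log10() is undefined, is set by hand. The names x and ndigits and the example values are hypothetical, and the code assumes x holds integers of modest size: for values near the limits of double precision, floating-point rounding in log10() can shift the result by one.

```stata
* Sketch: digits in an integer-valued numeric variable (hypothetical names)
clear
input double x
7
10
99
100
12345
-250
0
end

* Integer part of log10, plus one; abs() ignores the minus sign
generate ndigits = floor(log10(abs(x))) + 1 if x != 0

* log10(0) is missing, so zero (one digit) is handled separately
replace ndigits = 1 if x == 0

list
```

Missing values of x give a missing ndigits, which is usually what you want.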
Admin (over 3 years ago): In case you want the count for a numeric variable.

Nick Cox (over 3 years ago): This has to seem naive. See my comment underneath my answer. The "length" of a numeric variable is well defined only in certain cases. In your example, price is reported as a positive integer, and for that you don't need to convert to a string variable. You just need to push the maximum value through ceil(log10()). Your code could be problematic for variables in which any numeric value was negative or contained fractional parts, depending on precision issues and what you want precisely.
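The maximum-value approach for a positive integer variable such as price in the auto data could be sketched as below. This is an illustrative variant rather than code from the thread: it uses floor(log10()) + 1 instead of ceil(log10()), because the two agree except when the maximum is an exact power of ten (log10(1000) is 3, yet 1000 has four digits), where ceil() comes out one short. Like the comment, it assumes no negative or fractional values.

```stata
* Sketch: digits needed for the largest value of price in the auto data
sysuse auto, clear

* meanonly suppresses the output table but still stores r(max)
summarize price, meanonly

* Integer part of log10 of the maximum, plus one
local width = floor(log10(r(max))) + 1
display "price needs `width' digits"
```

Because only the maximum is used, no string conversion of the whole variable is needed.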