Generate variable containing number of characters in a string variable


There is no egen function for this because Stata has long had a function, in the strict sense, that does it. In recent versions of Stata the function is called strlen(), but the older name length() continues to work:

. sysuse auto
(1978 Automobile Data)

. gen l1 = length(make)

. gen l2 = strlen(make)

. su l?

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          l1 |         74    11.77027    2.155257          6         17
          l2 |         74    11.77027    2.155257          6         17

See help functions and (e.g.) this tutorial column.
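
Applied to the variable names in the question, a minimal sketch might look like the following. The two-row toy dataset and the name countvar_u are hypothetical, made up for illustration. The sketch assumes Stata 14 or later, where strlen() returns the length in bytes and ustrlen() returns the number of Unicode characters, and it assumes the do-file is saved as UTF-8. For plain ASCII responses the two results agree.

```stata
* Sketch only: toy data standing in for the survey responses
clear
input str20 stringvar
"cafe"
"café"
end

* Length in bytes (identical to the character count for plain ASCII text)
generate countvar = strlen(stringvar)

* Length in Unicode characters (Stata 14 or later)
generate countvar_u = ustrlen(stringvar)

* The accented row shows 5 bytes but 4 characters
list
```

If the responses may contain accented letters or other non-ASCII text, ustrlen() is probably the count that matches what a reader would call the number of characters.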

Author: harre

Updated on June 15, 2022

Comments

  • harre (almost 2 years ago)

    In a survey dataset I have a string variable (type: str244) with qualitative responses. I want to count the number of characters in each response/string and generate a new variable containing this number.

    Using egenmore, I have already counted the number of words with nwords(), but I cannot find a counterpart for counting characters.

    EXAMPLE:

    egen countvar = nwords(stringvar)
    

    where countvar is the new variable name and stringvar is the string variable.

    Does such an egen function exist for counting characters?

  • Agustín Indaco (almost 5 years ago)
    What about for counting digits in a numeric variable?
  • Nick Cox (almost 5 years ago)
    That's really a new question, as there are subtle differences. Do you mean integers, or do you include decimal parts? If you mean integers, log10(x) + 1 is a good start. If you include numbers with decimal parts, the question is a lot messier without knowing a display format.
  • Admin (over 3 years ago)
    In case you want the count for a numeric variable
  • Nick Cox (over 3 years ago)
    This has to seem naive. See my comment underneath my answer. The "length" of a numeric variable is well defined only in certain cases. In your example, price is reported as a positive integer, and for that you don't need to convert to a string variable. You just need to push the maximum value through ceil(log10()). Your code could be problematic for variables in which any numeric value was negative or contained fractional parts, depending on precision issues and what you want precisely.
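
The log10() suggestions in the comments above could be sketched as follows for the simple case of strictly positive integers, using price from the auto dataset. The variable name ndigits is hypothetical. This sketch takes the integer part with floor() before adding 1. Negative values, zeros, and values with fractional parts are not handled, for the reasons given in the comments.

```stata
* Sketch only: digit counts for a strictly positive integer variable
sysuse auto, clear

* Digits in each value: for example, 4099 has 4 digits and 15906 has 5
generate ndigits = floor(log10(price)) + 1

* Digits needed for the largest value, without creating a string variable
summarize price, meanonly
display floor(log10(r(max))) + 1

* ceil(log10()) gives the same answer here
display ceil(log10(r(max)))
```

One caveat on the choice between the two forms: for an exact power of 10 such as 1000, ceil(log10()) returns 3 although the number has 4 digits, so floor(log10()) + 1 is arguably the safer expression.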