How do I delete observations with no data in Stata?

Solution 1

This also works with string variables, provided their missing values are stored as empty strings:

ds id*, not
egen num_nonmiss = rownonmiss(`r(varlist)'), strok
drop if num_nonmiss == 0

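ds id*, not leaves in r(varlist) every variable whose name does not match id*. rownonmiss() then counts the nonmissing values among those variables in each observation (strok allows string variables in the list), and any observation with a count of zero, that is, one that has only an id, is dropped.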
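As a minimal sketch, here is the same approach run on the three-row example from the question. The lowercase variable names (id, val1 to val4) are an assumption made for illustration; the question writes the identifier as ID.

```stata
* Rebuild the three-row example from the question (names are illustrative)
clear
input id val1 val2 val3 val4
1 23 . 24 75
2 . . . .
3 45 45 70 9
end

* Put every variable not matching id* into r(varlist)
ds id*, not
* Count nonmissing values per observation; strok admits string variables too
egen num_nonmiss = rownonmiss(`r(varlist)'), strok
drop if num_nonmiss == 0
* Remove the helper variable once it has done its job
drop num_nonmiss
list
```

Only the observation with id 2 has no nonmissing values, so it is the only one dropped.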

Solution 2

Brian Albert Monroe is quite correct that anyone using dropmiss (SJ) needs to install it first. As there is interest in varying ways of solving this problem, I will add another.

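 * Collect every val* variable that is missing in all observations
 foreach v of var val* {
     qui count if missing(`v')
     if r(N) == _N local todrop `todrop' `v'
 }
 * Note: this drops variables (columns), not observations
 if "`todrop'" != "" drop `todrop'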
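The loop above collects variables that are missing in every observation. For the question as asked (dropping observations that are missing on every variable), a row-wise analogue of the same idea might look like the sketch below. It is not part of the original answer; the flag variable anydata and the lowercase names in the example data are assumptions made for illustration.

```stata
* Example data from the question (names are illustrative)
clear
input id val1 val2 val3 val4
1 23 . 24 75
2 . . . .
3 45 45 70 9
end

* Flag observations that have at least one nonmissing val* value
generate byte anydata = 0
foreach v of varlist val* {
    quietly replace anydata = 1 if !missing(`v')
}

* Drop observations that never got flagged, then remove the helper
drop if anydata == 0
drop anydata
```

Because missing() accepts both numeric and string arguments, the same loop should also cope with string variables.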

Although it should be a comment under Brian's answer, I will add a comment here as (a) this format is more suited for showing code and (b) the comment follows from my code above. I agree that unab is a useful command and have often commended it in public. Here, however, it is unnecessary, as Brian's loops could easily start something like

 foreach v of var * { 

UPDATE September 2015: See http://www.statalist.org/forums/forum/general-stata-discussion/general/1308777-missings-now-available-from-ssc-new-program-for-managing-missings for information on missings, considered by the author of both to be an improvement on dropmiss. The syntax to drop observations if and only if all values are missing is missings dropobs.
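A sketch of how missings dropobs might be used on the question's example follows. It assumes internet access for the SSC installation and that the data variables match val*; the lowercase names are illustrative. As I understand the help file, the force option is needed when the dataset in memory has changed since it was last saved, so check help missings before relying on it.

```stata
* One-time installation from SSC (requires internet access)
ssc install missings, replace

* Example data from the question (names are illustrative)
clear
input id val1 val2 val3 val4
1 23 . 24 75
2 . . . .
3 45 45 70 9
end

* Drop observations in which every val* variable is missing.
* id is left out of the varlist because it is never missing.
missings dropobs val*, force
```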

Author by

Admin

Updated on July 05, 2022

Comments

  • Admin
    Admin almost 2 years

    I have data with IDs which may or may not have all values present. I want to delete ONLY the observations with no data in them; if there are observations with even one value, I want to retain them. E.g., if my data set is:

    ID val1 val2 val3 val4
    1 23 . 24 75
    2 . . . .
    3 45 45 70 9
    

    I want to drop only ID 2 as it is the only one with no data -- just an ID.

    I have tried Statalist and Google but couldn't find anything relevant.

  • Admin
    Admin almost 10 years
    Thanks Dimitriy, but the critical variables are numeric.
  • Nick Cox
    Nick Cox almost 10 years
    dropmiss (SJ) is dedicated to this problem. Your search strategy should start with search in Stata, not the internet.
  • dimitriy
    dimitriy almost 10 years
    dropmiss is definitely the way to go. One can re/discover it easily with findit drop observations with missing data.
  • Roberto Ferrer
    Roberto Ferrer almost 10 years
    "local variables" in your text should read "local macros". Stata's (12) limit on characters for a macro is: 8,681 (small), 165,200 (IC), and 1,081,511 (MP/SE). That's enough to hold quite a few variable names. help limits is the reference here. In your example, r(varlist) really holds all variable names. You just need to run display "`r(varlist)'" (note the quotes).
  • Roberto Ferrer
    Roberto Ferrer almost 10 years
    As a side note: if you want to refer to all variables in the dataset, you can use _all. For example, foreach vname of varlist _all { ... . No need for unab here.
  • Nick Cox
    Nick Cox almost 10 years
    Your code assumes numeric variables. missing() is the way to a more general test.
  • Brian Albert Monroe
    Brian Albert Monroe almost 10 years
    I think it is still important to note that the point I made initially was that local macros can store logical conditions. In many datasets where the dataset is derived from surveys, missing data is often recoded as a -9 or -99, also a response of "I don't know" may be recorded as a -33. It is often appropriate to drop these data for certain analyses, in which case missing() is inadequate. The code I display can be easily modified for this purpose.
  • Brian Albert Monroe
    Brian Albert Monroe almost 10 years
    Also, I am quite new to these forums, and haven't quite learned the etiquette when something I've posted is incorrect. I've tested ds and it is certainly correct that `r(varlist)' contains all variables of the pattern. Should I edit and delete that part of my post? Other than unab being unnecessary, I think the logic of the solution is very valid, as well as demonstrating the usefulness of macros, which is applicable to this and other scenarios.
  • Nick Cox
    Nick Cox almost 10 years
    It's certainly good practice to correct mistakes by editing. The code I wrote can also be modified to cope with other conventions on what indicates missing, but one Stata recommendation is to use mvdecode in such cases.
  • Nick Cox
    Nick Cox almost 9 years
    See also UPDATE to my answer on missings (SSC).
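To illustrate the mvdecode suggestion in the comments: a minimal sketch, assuming the survey codes -9, -99 and -33 mentioned above all stand for missing data. The tiny dataset and its variable names are made up for illustration.

```stata
* Made-up data in which -9, -99 and -33 stand for missing
clear
input id val1 val2
1 23 -9
2 -99 -33
3 45 70
end

* Recode the survey codes to system missing in all val* variables
mvdecode val*, mv(-9 -99 -33)

* Observation 2 now has no data, so the rownonmiss() approach drops it
egen num_nonmiss = rownonmiss(val*)
drop if num_nonmiss == 0
drop num_nonmiss
```

Once the codes are recoded, any of the approaches in the answers treats those values like ordinary missings.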