How do I delete observations with no data in Stata?
Solution 1
This will also work with strings as long as they are empty:
ds id*, not
egen num_nonmiss = rownonmiss(`r(varlist)'), strok
drop if num_nonmiss == 0
This gets a list of variables that are not the id and drops any observations that only have the id.
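As a minimal sketch of how this plays out on the example data from the question, the block below rebuilds that dataset and applies the same three commands. The lower-case variable name id is an assumption (Stata variable names are case-sensitive, so ID would need ds ID*, not), and the final drop of the helper variable is an addition for tidiness.

```stata
* Rebuild the example data from the question (lower-case names assumed)
clear
input id val1 val2 val3 val4
1 23 . 24 75
2 . . . .
3 45 45 70 9
end

* List every variable except those whose names start with "id"
ds id*, not

* Count non-missing values in each observation; strok lets string variables in
egen num_nonmiss = rownonmiss(`r(varlist)'), strok

* Drop observations that have nothing but the id, then remove the helper
drop if num_nonmiss == 0
drop num_nonmiss

list
```

Only the observation with id 2 should be dropped, since it is the only one with no non-missing values.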
Solution 2
Brian Albert Monroe is quite correct that anyone using dropmiss (SJ) needs to install it first. As there is interest in varying ways of solving this problem, I will add another.
foreach v of var val* {
    qui count if missing(`v')
    if r(N) == _N local todrop `todrop' `v'
}
if "`todrop'" != "" drop `todrop'
Although it should be a comment under Brian's answer, I will add a comment here as (a) this format is better suited to showing code and (b) the comment follows from my code above. I agree that unab is a useful command and have often commended it in public. Here, however, it is unnecessary, as Brian's loops could easily start something like
foreach v of var * {
UPDATE September 2015: See http://www.statalist.org/forums/forum/general-stata-discussion/general/1308777-missings-now-available-from-ssc-new-program-for-managing-missings for information on missings, considered by the author of both to be an improvement on dropmiss. The syntax to drop observations if and only if all values are missing is missings dropobs.
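A hedged sketch of the missings route, using the example data from the question: the variable names are assumed to be lower-case, and the force option is included on the assumption that missings requires it before changing a dataset in memory that has not been saved. Check help missings after installing for the exact syntax of your version.

```stata
* One-time install from SSC (needs internet access)
ssc install missings

* Rebuild the example data from the question (lower-case names assumed)
clear
input id val1 val2 val3 val4
1 23 . 24 75
2 . . . .
3 45 45 70 9
end

* Drop observations in which every one of val1-val4 is missing.
* force is assumed to be required because the data have not been saved.
missings dropobs val*, force

list
```

Restricting the varlist to val* matters here: the id variable is never missing, so including it would mean that no observation counts as missing on all values.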
Updated on July 05, 2022
Comments
-
Admin almost 2 years
I have data with IDs which may or may not have all values present. I want to delete ONLY the observations with no data in them; if there are observations with even one value, I want to retain them. For example, if my data set is:
ID val1 val2 val3 val4
1 23 . 24 75
2 . . . .
3 45 45 70 9
I want to drop only ID 2 as it is the only one with no data -- just an ID.
I have tried Statalist and Google but couldn't find anything relevant.
-
Admin almost 10 years
Thanks Dimitriy, but the critical variables are numeric.
-
Nick Cox almost 10 years
dropmiss (SJ) is dedicated to this problem. Your search strategy should start with search in Stata, not the internet.
-
dimitriy almost 10 years
dropmiss is definitely the way to go. One can re/discover it easily with findit drop observations with missing data.
-
Roberto Ferrer almost 10 years
"local variables" in your text should read "local macros". Stata's (12) limit on characters for a macro is: 8,681 (small), 165,200 (IC), and 1,081,511 (MP/SE). That's enough to hold quite a few variable names. help limits is the reference here. In your example, r(varlist) really holds all variable names. You just need to run display "`r(varlist)'" (note the quotes).
-
Roberto Ferrer almost 10 years
As a side note: if you want to refer to all variables in the dataset, you can use _all. For example, foreach vname of varlist _all { ... . No need for unab here.
-
Nick Cox almost 10 years
Your code assumes numeric variables. missing() is the way to a more general test.
-
Brian Albert Monroe almost 10 years
I think it is still important to note that the point I made initially was that local macros can store logical conditions. In many datasets derived from surveys, missing data are often coded as -9 or -99, and a response of "I don't know" may be recorded as -33. It is often appropriate to drop these data for certain analyses, in which case missing() is inadequate. The code I display can be easily modified for this purpose.
-
Brian Albert Monroe almost 10 years
Also, I am quite new to these forums, and haven't quite learned the etiquette when something I've posted is incorrect. I've tested ds and it is certainly correct that "`r(varlist)'" contains all variables of the pattern. Should I edit and delete that part of my post? Other than unab being unnecessary, I think the logic of the solution is very valid, and it demonstrates the usefulness of macros, which is applicable to this and other scenarios.
-
Nick Cox almost 10 years
It's certainly good practice to correct mistakes by editing. The code I wrote can also be modified to cope with other conventions on what indicates missing, but one Stata recommendation is to use mvdecode in such cases.
-
Nick Cox almost 9 years
See also UPDATE to my answer on missings (SSC).
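The comments above mention survey-style codes such as -9, -99 and -33 standing in for missing values, and suggest mvdecode for that case. A minimal sketch under stated assumptions: the data below are hypothetical (they are not from the question), and the three codes are taken from the comment as examples only.

```stata
* Hypothetical data in which -9, -99 and -33 stand for missing responses
clear
input id val1 val2 val3 val4
1 23 -9 24 75
2 -9 -99 -33 -9
3 45 45 70 9
end

* Recode the assumed survey codes to system missing (.)
mvdecode val*, mv(-9 -99 -33)

* Then drop observations with no non-missing values, as in Solution 1
egen num_nonmiss = rownonmiss(val*)
drop if num_nonmiss == 0
drop num_nonmiss

list
```

After the recode, the observation with id 2 has no real values left and is dropped, while id 1 keeps its three valid values.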