How to remove duplicate observations in Stata
It is no surprise that duplicates does not do what you want, as it does not fit your problem. For example, the observation with id == 2, disease == 0 is not a duplicate of any other observation. More generally, duplicates does not purport to be a general-purpose command for dropping observations you don't want.
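To see the mismatch concretely, here is a minimal sketch that re-creates the eight observations from the question and compares what duplicates reports for all variables with what it reports for id alone. It then runs duplicates drop, which removes only exact copies (the repeated 1/0 and 4/0 rows). Two observations remain for each of id 1 and id 2, so the result is not the one-per-id data wanted.

```stata
* Sketch: re-create the example data from the question
clear
input id disease
1 0
1 1
1 0
2 0
2 1
3 0
4 0
4 0
end

* Duplicates in terms of all variables: only the repeated (1, 0) and (4, 0) rows
duplicates report

* Duplicates in terms of id alone: ids 1, 2 and 4 each occur more than once
duplicates report id

* Drop exact copies only; ids 1 and 2 still have two observations each
duplicates drop
list, sepby(id)
```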
Your criteria appear to be:
1. Keep one observation for each id.
2. If an id has any observation with disease == 1, that observation is to be kept.
A solution to that is:
bysort id (disease) : keep if _n == _N
That keeps the last observation for each distinct id: after sorting within id on disease, observations with the disease (disease == 1) necessarily come at the end of each group.
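As a sketch, applying the one-liner above to the question's data gives exactly the four rows asked for. The isid line is an optional check that fails with an error if any id still occurs more than once.

```stata
* Sketch: apply the one-liner to the question's example data
clear
input id disease
1 0
1 1
1 0
2 0
2 1
3 0
4 0
4 0
end

* Within each id, sort on disease so any 1 comes last, then keep that last row
bysort id (disease) : keep if _n == _N

* Optional check: errors out unless id now uniquely identifies observations
isid id

* Leaves one row per id: (1, 1), (2, 1), (3, 0), (4, 0)
list
```

If id and disease are the only variables you need, collapse (max) disease, by(id) produces the same four rows. Unlike keep, though, collapse discards every variable not named in the command.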
Question from statuser, updated on June 04, 2022:
Let's say I have the following data:

id   disease
1    0
1    1
1    0
2    0
2    1
3    0
4    0
4    0
I would like to remove the duplicate observations in Stata. For example, I want to end up with:

id   disease
1    1
2    1
3    0
4    0
For group id = 1, keep observation 2.
For group id = 2, keep observation 2.
For group id = 3, keep observation 1 (because it has only one observation).
For group id = 4, keep observation 1 (or either of them, but only one observation).

I am trying the Stata duplicates command,

duplicates tag id if disease==0, generate(info)
drop if info==1

but it's not working as I required.
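A minimal sketch, assuming the data are exactly the eight observations listed above, shows why the attempt fails. Among the disease == 0 rows, the two rows for id 1 and the two rows for id 4 each get info == 1, so drop removes all four of them. As a result, id 4 disappears entirely, while id 2 keeps both of its rows because its single disease == 0 row has info == 0.

```stata
* Sketch: run the commands from the question on the question's data
clear
input id disease
1 0
1 1
1 0
2 0
2 1
3 0
4 0
4 0
end

* Only disease == 0 rows are considered; info counts other such rows with the same id
duplicates tag id if disease==0, generate(info)

* Drops the two (1, 0) rows and both id 4 rows, which all have info == 1
drop if info==1

* id 2 still has two rows, and id 4 has none left
list, sepby(id)
```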