winsor2 winsorizes or trims (if the trim option is specified) the variables in varlist at the percentiles specified by option cuts(#1 #2). By default, new variables are generated with the suffix "_w" or "_tr"; the suffix can be changed with the suffix() option. The replace option replaces the variables with their winsorized or trimmed versions.
Difference between winsorizing and trimming
Winsorizing is not equivalent to simply excluding data, which is a simpler procedure, called trimming or truncation. In a trimmed estimator, the extreme values are discarded; in a Winsorized estimator, the extreme values are instead replaced by certain percentiles, specified by option cuts(# #). For details, see winsor (if installed), trimmean (if installed).
For example, the following commands report the 1st and 99th percentiles of the variable wage, which are 1.930993 and 38.70926, respectively.
. sysuse nlsw88, clear
. sum wage, detail
By default, winsor2 winsorizes wage at the 1st and 99th percentiles,
. winsor2 wage, replace cuts(1 99)
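which can also be done by hand:
. replace wage=1.930993 if wage<1.930993
. replace wage=38.70926 if wage>38.70926 & !missing(wage)
Note that values smaller than the 1st percentile are replaced by the 1st percentile, and values greater than the 99th percentile are replaced by the 99th percentile. The !missing(wage) condition keeps missing values, which Stata treats as larger than any number, from being overwritten.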
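As a rough sketch, the rounded cutoffs typed above can be avoided by reading the percentiles from the results that summarize stores in r(p1) and r(p99). The scalar names lo and hi are illustrative only, and the sketch assumes that winsor2 computes its percentiles the same way summarize does.
*- store the 1st and 99th percentiles, then winsorize by hand
. sysuse nlsw88, clear
. quietly summarize wage, detail
. scalar lo = r(p1)
. scalar hi = r(p99)
. replace wage = scalar(lo) if wage < scalar(lo)
*- !missing() is needed because missing values compare as larger than any number
. replace wage = scalar(hi) if wage > scalar(hi) & !missing(wage)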
Things change when the trim option is specified:
. winsor2 wage, replace cuts(1 99) trim
which can also be done by hand:
. replace wage=. if wage<1.930993
. replace wage=. if wage>38.70926
In this case, values smaller than the 1st percentile or greater than the 99th percentile are discarded (set to missing). This is trimming.
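Trimming by hand can likewise be sketched with the percentiles that summarize stores in r(p1) and r(p99), writing the result to a new variable so that wage itself is left untouched. The scalar names lo and hi and the variable name wage_tr_hand are illustrative and not part of winsor2; the sketch assumes winsor2 uses the same percentile definition as summarize.
*- copy wage, set values outside (p1 p99) to missing, and count the discarded values
. sysuse nlsw88, clear
. quietly summarize wage, detail
. scalar lo = r(p1)
. scalar hi = r(p99)
. generate wage_tr_hand = wage
*- observations where wage is already missing also satisfy wage > hi; they simply stay missing
. replace wage_tr_hand = . if wage < scalar(lo) | wage > scalar(hi)
. count if missing(wage_tr_hand)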
Options
suffix(string) specifies the suffix of the new variables. The default is "_w", or "_tr" when trim is specified.
replace replaces the variables with their winsorized or trimmed counterparts. It cannot be combined with suffix(string).
trim trims the variables instead of winsorizing them.
cuts(# #) specifies the percentiles at which the data are winsorized or trimmed. The default, cuts(1 99), winsorizes (or trims) at the 1st and 99th percentiles. The order of the two numbers does not matter: cuts(1 99) and cuts(99 1) are equivalent.
by(groupvar) specifies that winsorizing or trimming is done within each group defined by groupvar.
Examples
*- winsorize at (p1 p99), get new variable "wage_w"
. sysuse nlsw88, clear
. winsor2 wage
*- winsorize 3 variables at the 0.5th and 99.5th percentiles, and overwrite the old variables
. winsor2 wage age hours, cuts(0.5 99.5) replace
*- winsorize 3 variables at (p1 p99), generate new variables with suffix _win, and add variable labels
. winsor2 wage age hours, suffix(_win) label
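Two further examples are sketched here for the trim and by() options described under Options. The grouping variable industry is a variable in nlsw88 and is chosen only for illustration.
*- trim at (p1 p99), get new variable "wage_tr"
. sysuse nlsw88, clear
. winsor2 wage, trim
*- winsorize at (p1 p99) within each industry, get new variable "wage_w"
. winsor2 wage, by(industry)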