|
From help missing
The manual should cover this in more detail
Title
[U] 12.2.1 Missing values
Description
Stata has 27 numeric missing values:
., the default, which is called the "system missing value" or sysmiss
and
.a, .b, .c, ..., .z, which are called the "extended missing values".
Numeric missing values are represented by large positive values. The ordering is
all nonmissing numbers < . < .a < .b < ... < .z
Thus, the expression age > 60 is true if variable age is greater than 60 or
missing.
To exclude missing values, ask whether the value is less than ".". For instance,
. list if age > 60 & age < .
To specify missing values, ask whether the value is greater than or equal to ".".
For instance,
. list if age >= .
Stata has one string missing value, which is denoted by "" (blank).
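
As a minimal sketch of the comparisons above, the following do-file fragment builds a four-observation dataset and counts the observations that each condition selects. The variables age and name and their values are made up for illustration.

```stata
* Illustrative data only: age and name are made-up variables.
clear
input age str5 name
45 "Ann"
72 "Bob"
.  ""
.a "Cy"
end

* age > 60 also picks up the two missing ages (. and .a).
count if age > 60

* Excluding missing values leaves only the observation with age 72.
count if age > 60 & age < .

* Both . and .a satisfy age >= .
count if age >= .

* The string missing value is the empty string.
count if name == ""
```

Run as a do-file, the four counts should be 3, 1, 2, and 1.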
Remarks
More details concerning missing values and their treatment in Stata are provided
under the following headings:
Overview
Expressions
Operators
Functions
Matrices
Useful commands
Value labels
Estimation commands
Technical note: checking if a value is missing
Overview
1. Stata supports different types of numeric missing values that can be used to
specify different reasons that a value is unknown. The most frequently used
missing value ., referred to as sysmiss, is nearly always generated by Stata
when it cannot assign a specific value. The 26 extended missing values .a,
.b, ..., .z are available to users requiring more elaborate tracking of
missing values.
Empty strings are treated as missing values of type string.
2. Numeric missing values are represented by large positive values. This means
that an expression such as income > 100 evaluates to true for observations in
which income is missing, as well as for those in which income is greater than
100. Also, the simple expression if varname evaluates to true for all nonzero
values of varname, including missing values.
3. The ordering of missing values is
all nonmissing numbers < . < .a < .b < ... < .z
4. Most Stata statistical commands deal with missing values by disregarding
observations with one or more missing values (called "listwise deletion" or
"complete cases only").
Expressions
Expressions occur in many places in Stata (see [P] syntax and exp). For example,
. generate newvarname = exp
evaluates the expression exp for each observation of the variable newvarname.
Observations of newvarname are set to missing if exp evaluates to missing.
Expressions are also used to restrict a command's operation to a subset of the
observations. For instance,
. summarize varname if exp
summarizes varname by using all observations for which exp evaluates to true (not
zero), including observations for which exp evaluates to missing.
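
The two uses of expressions described above can be sketched as follows. The variables x, y, and z and their values are hypothetical, chosen so that each rule shows up in a three-observation dataset.

```stata
* Illustrative data only: x and y are made-up variables.
clear
input x y
1 2
3 .
. 0
end

* z is missing wherever x + y evaluates to missing (observations 2 and 3).
generate z = x + y
count if missing(z)

* "if y" is true where y is nonzero, which includes the missing y in
* observation 2, so observations 1 and 2 are used and observation 3 is not.
summarize x if y
```

Here count should report 2 missing values of z, and summarize should use 2 observations of x.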
Operators
The relational operators (see operators) interpret missing values as large
positive numbers (see above). All the following thus evaluate to true
    73 < .        . == .        .a == .a
    .a != .       .a < .b       .a <= .b
whereas all the following evaluate to false
    73 >= .       . == .a       . > .a
The numerical operators (+, -, and so on) return missing if any of their arguments
is missing.
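
These rules can be checked directly with display, which prints 1 for true and 0 for false. The fragment below is only a sketch that replays some of the comparisons listed above and adds two arithmetic cases.

```stata
* Relational operators: the first three display 1, the next two display 0.
display 73 < .
display .a != .
display .a < .b
display 73 >= .
display . == .a

* Arithmetic on a missing value yields a missing result, so both display 1.
display missing(1 + .)
display missing(.a * 2)
```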
Functions
Stata has a few special functions for dealing with missing values:
missing() returns 1 (meaning true) if any of its arguments, numeric or
string, evaluates to missing and 0 (meaning false) otherwise.
mi() is a shorthand for missing().
matmissing(K) returns 1 (meaning true) if any elements of the matrix K are
missing and 0 (meaning false) otherwise.
Some Stata functions interpret . in a special way. For instance, the function
inrange(x,a,b) returns 1 if x belongs in the interval [a,b]. This function
interprets a==. as -infinity and b==. as +infinity. These special interpretations
are discussed in functions.
Other Stata functions return missing (.) if one or more of the arguments are
missing or invalid.
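
A short sketch of the functions above, using display to show each result. The matrix K and the literal arguments are made up for illustration.

```stata
* missing() accepts numeric and string arguments; mi() is the same function.
display missing(.a)
display missing(5, "")
display mi(5, "x")

* inrange() reads a missing lower bound as -infinity and a missing
* upper bound as +infinity.
display inrange(5, ., 10)
display inrange(5, 1, .)

* matmissing() checks a whole matrix; K is a made-up example matrix.
matrix K = (1, . \ 3, 4)
display matmissing(K)
```

The six display commands should print 1, 1, 0, 1, 1, and 1.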
Matrices
Matrices may contain all types of missing values. The matrix operators (see
matrix operators)
    -    negate
    '    transpose
    \    row join
    ,    column join
    +    add
    -    subtract
    *    multiply (including multiply by scalar)
    /    division by scalar
    #    Kronecker product
generate missing values elementwise.
In the matrix product C=A*B, C[i,j] is missing if row i of A or column j of B
contains a missing value.
Matrix division by scalar C=A/b is not allowed if the scalar b is a missing value.
Otherwise, missing values in matrix A generate missing values in C elementwise.
Like the list command, the matrix list command has a nodotz option to display
extended missing value .z as a blank string rather than as ".z".
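
The following sketch illustrates the elementwise rule and the matrix-product rule described above. The matrices A, B, S, and C are made-up examples, and the comments restate what the text above predicts rather than recorded output.

```stata
* A and B are made-up example matrices; A has one missing element.
matrix A = (1, . \ 3, 4)
matrix B = (1, 0 \ 0, 1)

* Addition is elementwise: only element [1,2] of S should be missing.
matrix S = A + B
matrix list S

* In the product, row 1 of A contains a missing value, so every element
* in row 1 of C should be missing; row 2 is computed normally.
matrix C = A * B
matrix list C
display missing(C[1,1])
display C[2,2]
```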
Useful commands
----------------------------------------------------------------------------------
mvencode            changes missing values into numeric values
mvdecode            changes numeric values into missing values
codebook            provides extensive information about variables, including the
                    occurrence of simple and extended missing values
misstable           tabulates missing values
egen, rownonmiss()  number of nonmissing values in a varlist, for each observation
egen, rowmiss()     number of missing values in a varlist, for each observation
recode              recodes a variable, optionally into a new variable, with
                    special facilities to recode missing values
mi                  multiple imputation of missing values
xtdescribe          describes participation patterns in panel data
----------------------------------------------------------------------------------
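
A brief sketch of how some of these commands fit together. The variables score1 and score2 are hypothetical, and -9 and -8 are assumed here to be survey codes that stand for two different reasons an answer is missing.

```stata
* Illustrative data only: score1 and score2 are made-up variables in
* which -9 and -8 are assumed to be codes for missing answers.
clear
input score1 score2
10 -9
-9 20
-8 -8
30 40
end

* Turn the numeric codes into extended missing values.
mvdecode score1 score2, mv(-9=.a \ -8=.b)

* Count missing and nonmissing values across the two variables
* within each observation.
egen nmiss = rowmiss(score1 score2)
egen nvalid = rownonmiss(score1 score2)
list

* Tabulate the missing values by variable.
misstable summarize score1 score2

* Turn the extended missing values back into the numeric codes.
mvencode score1 score2, mv(.a=-9 \ .b=-8)
```

Using .a and .b rather than a single . keeps the two reasons for missingness distinguishable after decoding, which is what allows mvencode to restore the original codes.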
Value labels
It is possible to define value labels for the extended missing values .a to .z,
but not for sysmiss (.). These value labels show up in the same way as value labels
for nonmissing values. See [D] label.
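
As a small sketch, extended missing values can be labeled like any other value. The variable answer, the label name anslbl, and the label text are all hypothetical.

```stata
* Illustrative data only: answer and anslbl are made-up names.
clear
input answer
1
2
.a
.b
end

* Label two ordinary values and two extended missing values.
label define anslbl 1 "Yes" 2 "No" .a "Refused" .b "Not asked"
label values answer anslbl

* The missing option makes tabulate include the missing categories.
tabulate answer, missing
```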
Estimation commands
Most Stata commands ignore observations that are missing in one or more of the
variables referred to in the command. For instance, the regression command
regress disregards all observations that have a missing value for the dependent
variable or missing values for any of the independent variables. This method is
known as "listwise deletion", "complete cases only", etc. It is statistically
appropriate only if the missing values are "at random". In an if or weight
expression supplied to a command, the expression is evaluated first, and missing
values are handled according to the operator and function rules described above.
Stata commands that can treat multiple observations as being related to one
observational unit (for example, observations from a panel in xt models, episodes
in st models) ignore specific observations from the "group", namely, those that
have missing values.
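
Listwise deletion can be seen with the auto dataset that ships with Stata, in which rep78 is missing for 5 of the 74 observations. This is only a sketch; the choice of model is arbitrary.

```stata
sysuse auto, clear

* rep78 is missing in 5 of the 74 observations.
count if missing(rep78)

* regress drops every observation with a missing value in any of
* price, mpg, or rep78.
regress price mpg rep78

* e(N) is the number of observations actually used in the estimation.
display e(N)

* The same number, counted directly from the data.
count if !missing(price, mpg, rep78)
```

Both the display and the final count should report 69.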
Technical note: checking if a value is missing
You might think you can test whether an expression or variable exp is missing with
the expression "exp == .". Remember, however, that Stata has 27 different missing
values (., .a, .b, ..., .z).
"exp == ." means that the expression exp equals a specific missing value, namely,
sysmiss (.). It returns false if exp equals one of the extended missing-value
types such as .a or .z. To test whether exp is missing, that is, equals either .
or one of the extended missing values, one should use the expression
exp >= .
or
missing(exp)
which can be abbreviated to
mi(exp)
To test whether exp is not missing, use one of the following forms:
exp < .
!missing(exp)
!mi(exp)
An advantage of the last two forms is that the missing functions missing() and
mi() allow multiple (numeric or string) arguments to test whether any of the
arguments is missing.
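
The difference between these tests can be sketched on a small made-up variable x that holds one ordinary value, sysmiss, and two extended missing values.

```stata
* Illustrative data only: x is a made-up variable.
clear
input x
1
.
.a
.z
end

* Only sysmiss matches x == . so this counts 1 observation.
count if x == .

* All three missing values are caught by these two tests.
count if x >= .
count if missing(x)

* Only the single nonmissing observation passes these two tests.
count if x < .
count if !mi(x)
```

The five counts should be 1, 3, 3, 1, and 1.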
|