Title
gsample -- Sampling
Syntax
gsample [#|varname] [if] [in] [weight] [, options]
options               Description
----------------------------------------------------------------------
percent               sample size is in percent
wor                   sample without replacement
strata(varlist)       variables identifying strata
cluster(varlist)      variables identifying resampling clusters
idcluster(newvar)     create new cluster ID variable
keep                  keep observations that do not meet if and in
generate(newvar)      store sampling frequencies in newvar
replace               overwrite existing variables
----------------------------------------------------------------------
aweights are allowed; see weight.
Description
gsample draws a random sample from the data in memory. Simple random sampling (SRS) is supported, as well as unequal probability sampling (UPS), of which sampling with probabilities proportional to size (PPS) is a special case. Both methods, SRS and UPS/PPS, are available with and without replacement. Stratified sampling and cluster sampling are also supported.
# specifies the size of the sample. If # is not specified or if #==., the sample size is equal to the observed number of units in the data. Sampling units are either single observations or clusters identified by the cluster() option. In the case of sampling without replacement (see the wor option), the sample size must be less than or equal to the number of sampling units in the data.

By default, gsample replaces the data in memory with the sampled observations in random order. Alternatively, gsample may store a new variable containing the sampling frequencies of the observations (see the generate(newvar) option).

For stratified sampling, # units will be selected from each stratum identified by the strata() option. Alternatively, specify varname instead of #, where varname is a variable containing the sample size for each stratum. varname is assumed to be constant within strata.
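The following is a minimal sketch of stratified sampling without replacement, first with a fixed # per stratum and then with stratum-specific sizes taken from a variable. It assumes that gsample and the moremata package are installed; the auto dataset, the seed value, and the variable name n_h are illustrative choices only.

```stata
* Illustrative sketch: auto.dta ships with Stata; seed and n_h are arbitrary
sysuse auto, clear
set seed 12345

* SRS without replacement: draw 10 cars from each stratum of foreign
gsample 10, wor strata(foreign)
tabulate foreign

* Stratum-specific sizes: n_h (hypothetical name) is constant within strata
sysuse auto, clear
generate n_h = cond(foreign, 5, 15)
gsample n_h, wor strata(foreign)
tabulate foreign
```

Because generate() is not specified, each call replaces the data in memory with the sampled observations.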
Specifying aweights causes unequal probability sampling (UPS/PPS) to be performed. In this case, the sampling probabilities of the observations will be proportional to the specified weights.

gsample is implemented as a wrapper for the mm_sample() function from the moremata package. See help for mm_sample() for methodological details and references. Note that many different algorithms for unequal probability sampling without replacement have been proposed in the literature, and there may be better solutions than the method implemented here. In addition, UPS without replacement may fail if the distribution of weights is very uneven (see help for mm_sample() for an explanation of this problem).
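As an illustration of UPS/PPS, the sketch below draws 20 observations without replacement with probabilities proportional to a size measure and records the result in a frequency variable instead of dropping observations. It assumes that gsample and moremata are installed; the auto dataset, the use of weight as the size measure, the seed, and the name freq are illustrative choices only.

```stata
* Illustrative sketch: PPS without replacement, size measure = vehicle weight
sysuse auto, clear
set seed 12345

* generate() keeps all observations and stores the sampling frequencies
gsample 20 [aweight=weight], wor generate(freq)
tabulate freq
```

With the wor option, each observation can be selected at most once, so freq marks the selected observations.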
If you are serious about sampling, you should first set the random number seed; see help generate.
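A minimal sketch of why the seed matters: resetting the same seed before each call should reproduce the same draw. It assumes that gsample and moremata are installed; the auto dataset, the seed value, and the variable names f1 and f2 are illustrative choices only.

```stata
* Illustrative sketch: the same seed is set before each draw
sysuse auto, clear

set seed 2024
gsample 10, wor generate(f1)

set seed 2024
gsample 10, wor generate(f2)

* Compare the two frequency variables observation by observation
compare f1 f2
```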