Random sampling & matrix of histograms problem - SPSS Forum

#1 (original post)

ReneeBK posted on 2014-5-7 00:04:09

Dear people on the list,
I have been trying to get my head around one syntax command. My problems are:

1. I have an empirical distribution of a variable x with, say, 1000 observations.
2. I want to take 100 (or n) random samples (with replacement) from x so that each sample's size is, for example, 10% of x.
3. I need those random samples as variables x1...x100 in a new data set.
4. Is there a possibility to plot several histograms (of different variables) as a matrix with set dimensions (say, a 3x3 matrix)? This is a common way to plot your results, but I haven't yet figured out any way other than restructuring the whole data into a list with a grouping variable that names the old variables, and then using that grouping variable as the categorical variable in a panel plot for the whole list...

Thanks a lot!

Petro P.

#2

ReneeBK posted on 2014-5-7 00:05:09

Yes, I don't know of an easy way to random sample. Below is a thought I had a while ago: generate a second dataset and then table match to that. Unfortunately this approach can't be written into a MACRO (you can't include INPUT PROGRAM in a MACRO) - so I look forward to other solutions. This just takes advantage of the RV.UNIFORM function, drawing ids between 1 and n (the original sample size), and then makes 9 runs. The data isn't returned in wide format as requested, but IMO it is frequently better to have the data like this anyway and use SPLIT FILE to return stats on the subsets.

************************************************************************************************.
*Original Dataset.
set seed = 5.
input program.
loop #i = 1 to 1600.
   compute X = RV.NORMAL(0,1).
   compute id = #i.
   end case.
end loop.
end file.
end input program.
dataset name orig.
exe.

*Making a dataset with random samples with replacement - need to know N of original dataset beforehand.
set seed = 10.
input program.
loop #iter = 1 to 9.          /*This is the number of replications (samples) */.
   loop #rand = 1 to 100. /*This is the number of draws with replacement per sample */.
      compute #n = 1600. /*You need to supply this info - this is the number of records in original database */.
      compute id = TRUNC(RV.UNIFORM(1,#n + 1)).
      compute run = #iter.
      end case.
   end loop.
end loop.
end file.
end input program.
dataset name rand_samps.
exe.

*now just table match the orig dataset to the random samples dataset.
dataset activate rand_samps.
sort cases by id.
match files file = *
/table = 'orig'
/by id.
exe.
************************************************************************************************.

I wonder if the newer bootstrapping procedures (or maybe even the old one in NLR) can be hacked to return the needed IDs with replacement. A matrix procedure should be possible as well (which can be called in a macro), but I'm not savvy enough with that to give a quick answer. I don't know what you're asking for in 4; SPSS can produce small multiples, if that is what you're asking.
Andy W
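
For readers who want to see the logic of the syntax above outside SPSS, here is a minimal Python sketch of the same idea: draw row ids uniformly with replacement, then look each id up in the original data. It is only an illustration, not a translation of the SPSS code; the function name `draw_with_replacement`, the seeds, and the simulated `x` are assumptions made for the example. It also stores the result "wide" (x1...x100), as the original question asked.

```python
import random

def draw_with_replacement(values, n_samples, sample_size, seed=None):
    """Return n_samples lists, each holding sample_size values drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    samples = []
    for _ in range(n_samples):
        # Draw row positions uniformly from 0..n-1; the same position may repeat.
        ids = [rng.randrange(n) for _ in range(sample_size)]
        # "Table match": look each drawn position up in the original data.
        samples.append([values[i] for i in ids])
    return samples

# Hypothetical data standing in for the empirical variable x (1000 observations).
data_rng = random.Random(5)
x = [data_rng.gauss(0, 1) for _ in range(1000)]

# 100 samples, each 10% of len(x), stored "wide" under the names x1..x100.
samples = draw_with_replacement(x, n_samples=100, sample_size=len(x) // 10, seed=10)
wide = {f"x{k}": s for k, s in enumerate(samples, start=1)}
```

Because every sample has the same length here, the wide layout is rectangular; the long layout (one row per draw, plus a run identifier) is what the SPSS syntax above produces.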

#3

ReneeBK posted on 2014-5-7 00:07:31

"(you can't include INPUT PROGRAM in a MACRO) - so I look forward to other solutions."
Sure you can.
Err... Have you tried?
INPUT PROGRAM is perfectly usable within MACRO!!
BEGIN DATA and END DATA are not permitted (for some reason that maybe Jon Peck can elaborate on).

#4

ReneeBK posted on 2014-5-7 00:08:34

Perhaps I have missed it, but has anyone suggested the old SPSS command "Sample"? Here is info from the SPSS v18 syntax reference:

SAMPLE

SAMPLE {decimal value} or {n FROM m }

This command does not read the active dataset. It is stored, pending execution with the next command that reads the dataset. For more information, see the topic Command Order on p. 40.
Example
SAMPLE .25.
Overview

SAMPLE permanently draws a random sample of cases for processing in all subsequent procedures.

For a temporary sample, use a TEMPORARY command before SAMPLE.

Basic Specification

The basic specification is either a decimal value between 0 and 1 or the sample size followed by keyword FROM and the size of the active dataset.

To select an approximate percentage of cases, specify a decimal value between 0 and 1.
To select an exact-size random sample, specify a positive integer that is less than the file size, and follow it with keyword FROM and the file size.

Operations

SAMPLE is a permanent transformation.
Sampling is based on a pseudo-random-number generator that depends on a seed value that is established by the program. On some implementations of the program, this number defaults to a fixed integer, and a SAMPLE command that specifies n FROM m will generate the identical sample whenever a session is rerun. To generate a different sample each time, use the SET command to reset SEED to a different value for each session. See the SET command for more information.

So, I think something like:

Temporary.
Sample 100 from 1000.
save outfile= etc.

Probably embed it in a loop or other structure to generate as many samples as one wants (and probably create a new variable ranging from 1 to 100 in all files, which would allow one to use MATCH FILES to combine them all into a single file). Or something like that.

Mike Palij

#5

graylens posted on 2014-5-7 00:09:26

[Content hidden: the author was banned or the content was deleted.]

#6

ReneeBK posted on 2014-5-7 00:14:08

new file.
input program.
loop id = 1 to 1000.
   compute x = rv.normal(0,1).

   end case.
end loop.
end file.
end input program.

formats x(f6.3).
execute.
dataset name madeup.

new file.
set seed 20130307.
input program.
vector sampleflag (100,f1).
loop id = 1 to 1000.
   loop #sample= 1 to 100.
      compute sampleflag(#sample) =rv.uniform(0,1) le .10.
   end loop.

   end case.
end loop.
end file.
end input program.

dataset name sampleflags.
descriptives vars= sampleflag1 to sampleflag100.

match files file= madeup /file=sampleflags /by id.
dataset name combined.
* flag stands in for sampleflag1 to sampleflag100 in turn.
do repeat
xsample = xsample1 to xsample100
/flag = sampleflag1 to sampleflag100.
do if not flag.
   compute xsample = 99999.
else if flag.
   compute xsample = x.
ELSE.
   print 'oops!'.
end if.
end repeat.
formats xsample1 to xsample100 (f6.3).
missing values xsample1 to xsample100 (99999).
descriptives vars = xsample1 to xsample100.

Art Kendall
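
The flag approach above gives every case an independent 10% chance of entering each sample, so the sample sizes vary around 100 and no case can appear twice within one sample. For comparison, the Python sketch below contrasts that kind of approximate-percentage selection with an exact-size draw without replacement (similar in spirit to `SAMPLE 100 FROM 1000`). The helper names `bernoulli_sample` and `exact_sample` are made up for the illustration, and nothing here claims to reproduce how SPSS implements SAMPLE internally.

```python
import random

def bernoulli_sample(values, p, rng):
    """Keep each value independently with probability p (size varies, no repeats)."""
    return [v for v in values if rng.random() < p]

def exact_sample(values, k, rng):
    """Draw exactly k distinct cases, i.e. without replacement."""
    return rng.sample(values, k)

rng = random.Random(20130307)
ids = list(range(1, 1001))            # hypothetical case ids 1..1000

approx = bernoulli_sample(ids, 0.10, rng)   # roughly 10% of the cases
exact = exact_sample(ids, 100, rng)         # exactly 100 cases

print(len(approx), len(exact))
```

Neither variant samples with replacement: in both, a given id occurs at most once per sample.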

#7

ReneeBK posted on 2014-5-7 00:14:46

Random sampling with replacement is simple to do with Complex Samples procedures. Just set the "with replacement" button in the Sampling Wizard (Analyze > Complex Samples > Select a Sample), specify your sample size, and give it a dataset name. The wizard generates CSPLAN and CSSELECT commands.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
peck@us.ibm.com
new phone: 720-342-5621

#8

ReneeBK posted on 2014-5-7 00:17:17

Ahh yes, David, you are correct: I was confusing INPUT PROGRAM with the BEGIN DATA/END DATA commands (and I even have it in some of my macros!). I should have known to search the list to see if you had already posted a solution!

Art's solution still only produces random sampling WITHOUT replacement. As Mike stated, if that is all you want, SPSS's SAMPLE command within a (MACRO) loop will probably be fine.

David's matrix solution avoids building the massive dataset that mine does, which is certainly preferable for many big data problems (or a large number of repetitions). Also, I don't know: is it usual in bootstrapping to make each bootstrap sample the same size as the original dataset?

One of the annoyances with the matrix procedure, though, is that you can't run the more complex regression procedures (or at least I don't have the chops to write them up myself in matrix language). It seems that if you have the license for Complex Samples, it makes this all somewhat moot (although I don't have it, so it is not moot for me personally!).

Andy

#9

ReneeBK posted on 2014-5-7 00:18:32

Try this.

new file.
set seed 20130407.
input program.
vector PopX(1000,f6.3).
loop #i = 1 to 1000.
   compute PopX(#i) = rv.normal(0,1).
end loop.
end case.

end file.
end input program.

execute.
dataset name madeup.
dataset activate madeup.
* from pop of 1000 draw 100 samples of size 50 with replacement.
vector PopX = PopX1 to PopX1000.
display vector.
numeric SampledX (f6.3).
loop sample_id = 1 to 100.
loop draw = 1 to 50.
   compute SampledX = PopX(rnd(rv.uniform(.5,1000.5))).
   xsave outfile = 'c:\project\long.sav' /keep =sample_id draw SampledX.
end loop.
end loop.
execute.
get file= 'c:\project\long.sav'.
dataset name longy.
descriptives variables = SampledX.

Art Kendall

#10

Lisrelchen posted on 2014-5-7 00:20:29

This is something I was trying to tell Mike as well (in an off-Nabble email correspondence). Sampling multiple times WITHOUT REPLACEMENT is not the same as sampling one time with replacement. Your code, still, is just a sample without replacement, conducted 50 times.

Please look at the code I posted, and/or David's matrix bootstrapping procedure. You should see the potential difference: within one iteration the same id can be sampled multiple times, whereas your approaches (and the SAMPLE command) can never re-sample the same record more than once in any iteration.

Andy
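
The distinction drawn above can be checked directly by counting how often each id occurs within a single sample. The short Python sketch below is an illustration only (the ids, the seed, and the helper `max_repeats` are assumptions for the example): `random.sample` draws without replacement, while `random.choices` draws with replacement.

```python
import random
from collections import Counter

rng = random.Random(1)
ids = list(range(1, 1001))              # hypothetical case ids 1..1000

without = rng.sample(ids, 100)          # without replacement: each id at most once
with_repl = rng.choices(ids, k=100)     # with replacement: ids may repeat

def max_repeats(sample):
    """Largest number of times any single id occurs in the sample."""
    return max(Counter(sample).values())

print(max_repeats(without))    # always 1 for a draw without replacement
print(max_repeats(with_repl))  # 1 or more; repeats are possible but not guaranteed
```

Repeating the without-replacement draw many times still never produces a duplicate inside any one sample, which is why it is not a substitute for a single draw with replacement.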
