Original poster: anklebreak

[Q&A] How do I merge multiple cases in SPSS? Experts, please help

1
anklebreak posted on 2014-3-17 10:48:33

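I have multiple cases (data files), all with the same variables, and I want to merge them into one. But every time I merge I can only select one file, so I have to add them one by one, which is too tedious. How can I select multiple files when browsing, so that everything is done in one go? Screenshots: QQ截图20140317104705.png QQ截图20140317104648.png QQ截图20140317104629.png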

2
kuangsir6 posted on 2014-3-17 15:46:03
It can be done with programming (syntax).

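3
anklebreak posted on 2014-3-19 11:21:37
Quoting kuangsir6 (2014-3-17 15:46): "It can be done with programming (syntax)."
Can't it be done with the ordinary menu operations? Could you explain in more detail?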

4
ReneeBK posted on 2014-3-20 05:41:51

You need a macro because you are going to do the same thing for every file.


For file 1 you have this.

Get data ….

*   y is the variable whose values are recorded minute by minute every day

varstocases make y from day1 to day366/index=day.

sort cases by day time.

Save outfile=<cumulant file name>

*   File 2 and on.

Get data ….

varstocases make y from day1 to day366/index=day.

sort cases by day time.

Match files file=<cumulant file name>/file=*/rename=(y=y2)/by day time.

Save outfile=<cumulant file name>

If you run SPSS with 'have multiple data files open' enabled, which I don't do, I think the structure will be slightly different, because you then need to keep track of opening and closing datasets.
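
Since the same steps repeat for every file, the syntax can also be generated rather than typed out. Below is a minimal Python sketch of that idea, not the macro itself. It assumes the GET commands are supplied by you (the `GET FILE='fileN.sav'.` strings and `cumulative.sav` are hypothetical placeholders), and it numbers the renamed variable per file (y2, y3, ...), which is my assumption so that the columns do not collide after the second file.

```python
def build_merge_syntax(get_commands, cumulative_file):
    """Return SPSS syntax that restructures each input and merges it into one file."""
    blocks = []
    for n, get_command in enumerate(get_commands, start=1):
        lines = [
            get_command,
            "VARSTOCASES /MAKE y FROM day1 TO day366 /INDEX=day.",
            "SORT CASES BY day time.",
        ]
        if n > 1:
            # Files after the first are matched onto the cumulative file;
            # y is renamed per file (y2, y3, ...), an assumption of this sketch.
            lines.append(
                "MATCH FILES /FILE='%s' /FILE=* /RENAME=(y=y%d) /BY day time."
                % (cumulative_file, n)
            )
        lines.append("SAVE OUTFILE='%s'." % cumulative_file)
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Hypothetical inputs: replace these placeholders with your own GET commands.
get_commands = ["GET FILE='file%d.sav'." % i for i in (1, 2, 3)]
syntax = build_merge_syntax(get_commands, "cumulative.sav")
print(syntax)
```

The printed text can be pasted into a syntax window and reviewed before running it.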



5
ReneeBK posted on 2014-4-1 02:55:58
Please try the code below. The first block should generate test data and the second block should

  • read the variable names from the first row of the first sheet of the first workbook
  • read all data from all lines in all sheets in all workbooks (from line 2)
  • output an active DataSet containing the source_file, source_sheet and all data
  • string lengths in SPSS should be exactly as long as required given the data contained in the work books

This hasn't been thoroughly tested yet so there may be complications but it seems to work on the test data provided. Please keep us informed on how things are going, OK?

Kind regards,



*Create test data.
begin program.
import os, random
import xlwt
rdir = r"d:\temp"  # Please specify a folder in which test files can be created.
for year in range(2004, 2014):
    wb = xlwt.Workbook()
    ws = wb.add_sheet("data")
    # Row 0 holds the variable names.
    for col, cont in enumerate(['EmployeeID', 'JobTitle', 'YearSalary', 'DaysAbsent']):
        ws.write(0, col, cont)
    # Rows 1-5 hold the data, written one column at a time.
    for row, emp_id in enumerate([104, 21, 60, 2, 1030]):
        ws.write(row + 1, 0, emp_id)
    for row in range(5):
        ws.write(row + 1, 1, random.choice(['Developer', 'Tester', 'Manager']))
    for row in range(5):
        ws.write(row + 1, 2, random.randrange(40, 80) * 1000)
    for row in range(5):
        ws.write(row + 1, 3, random.choice(range(20)))
    wb.save(os.path.join(rdir, 'data_%d.xls' % year))
end program.

*Read and merge all xls workbooks.

begin program.
import os
import xlrd, spss
rdir = r"d:\temp"  # Please specify folder holding .xls files
# Should probably be ".xlsx" in your case.
fils = [fil for fil in os.listdir(rdir) if fil.endswith(".xls")]
allData = []
for cnt, fil in enumerate(fils):
    wb = xlrd.open_workbook(os.path.join(rdir, fil))
    for ws in wb.sheets():
        # Skip row 0 (variable names) and prefix each row with its source.
        for row in range(1, ws.nrows):
            allData.append([fil] + [ws.name] + [val for val in ws.row_values(row)])
    if cnt == 0:
        # Variable names come from the first row of the first sheet of the first workbook.
        vNames = ["source_file"] + ["source_sheet"] + wb.sheet_by_index(0).row_values(0)
# Longest string per column; 0 means the column holds no strings.
mxLens = [0] * len(vNames)
for line in allData:
    for cnt in range(len(line)):
        # basestring exists only in Python 2; use str under Python 3.
        if isinstance(line[cnt], basestring) and len(line[cnt]) > mxLens[cnt]:
            mxLens[cnt] = len(line[cnt])
with spss.DataStep():
    nds = spss.Dataset('*')  ### nds = "New Data Set"
    for vrbl in zip(vNames, mxLens):
        nds.varlist.append(vrbl[0], vrbl[1])
    for line in allData:
        nds.cases.append(line)
end program.
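
The string-width bookkeeping (mxLens) is the part of the second block that is easiest to check outside SPSS. Here is a small standalone sketch of the same logic, assuming Python 3 (so `str` instead of `basestring`). The function name `string_widths` and the sample rows are made up for illustration.

```python
def string_widths(rows):
    """Return, per column, the longest string length found (0 if the column has no strings)."""
    widths = [0] * len(rows[0]) if rows else []
    for row in rows:
        for i, value in enumerate(row):
            if isinstance(value, str):
                widths[i] = max(widths[i], len(value))
    return widths


# Made-up rows in the same layout as allData: source file, sheet name, then the data.
rows = [
    ["data_2004.xls", "data", 104.0, "Developer"],
    ["data_2005.xls", "data", 21.0, "Manager"],
]
widths = string_widths(rows)
print(widths)
```

As far as I understand the block above, a width of 0 is what leaves a column numeric when it is passed to nds.varlist.append, so a numeric column that contains even one stray text cell would become a string variable.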

Some notes:

  • Make sure you have no open dataset when you run this
  • A crucial assumption is that the structure (column orders) are identical over sheets over workbooks
  • The first rows of all sheets in all workbooks should hold (identical) variable names
  • You need to have 1) SPSS, 2) SPSS Python essentials and 3) the Python xlrd module properly installed
  • You may need to replace ".xls" with ".xlsx" in the second block
  • Date variables should be no problem but will look weird in SPSS. To convert a date called "date_1" to a normal date, try

compute date_new=datesum(date.dmy(3,1,1900),date_1,"days").
format date_new(datetime22).


  • This should work although there seems to be some kind of bug somewhere so please check carefully.
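
On the date note: the values look odd because Excel hands over dates as serial day counts. A minimal Python sketch of the usual conversion follows, in case you prefer to check a few values before they reach SPSS. It assumes the workbook uses Excel's 1900 date system and that the serial falls after February 1900. The function name is my own, not part of xlrd or SPSS.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system. Counting from 1899-12-30 absorbs Excel's
# fictitious 29 February 1900, so this holds for serials of 61 and above.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (days, with the time as a fraction) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


converted = excel_serial_to_datetime(41640)
print(converted.date())
```

Comparing a handful of converted values against what Excel displays is a quick way to spot an offset of a day or two.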

6
ReneeBK posted on 2014-4-1 03:05:14
Jon Peck from IBM SPSS wrote an extension command that would fit the bill here without having to roll your own Python - the
`SPSSINC PROCESS FILES` command

Here is an example <http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14573567> in the developerWorks forum of the tool in action in a nearly identical situation.

Also, with a really consistent naming structure and no missing IDs, it would be pretty simple to write this up as a macro; see this other answer I gave recently <http://spssx-discussion.1045642.n5.nabble.com/Looping-td5716527.html>. All that would need to change (and what I have done anyway) is to have the first pass of the loop create a base file and then successively concatenate all of the new files to that base file.

7
ReneeBK posted on 2014-4-1 03:15:58
SPSSINC PROCESS FILES  can do this pretty easily.  You will wind up doing an ADD FILES for each Excel file (after the first) even though ADD FILES can handle 50 files at a time, but that's probably not going to be an issue unless you have to do this several times per second.

A few tips beyond the example that Andy pointed out.

Getting this process started is a little bit tricky, since ADD FILES requires that you already have a data file open.
Move one of your Excel files to a different directory and open it with GET DATA /TYPE=XLSX or interactively.  Give it a dataset name, say, ACTIVE, so that it will remain open and referenceable  as other files are read.

Your syntax file to be applied to each dataset by PROCESS FILES would just have statements like
GET DATA /TYPE XLS .../FILE="JOB_INPUTFILE" ...
DATASET NAME=FRED.
ADD FILES /FILE=ACTIVE /FILE="JOB_INPUTFILE".
DATASET CLOSE FRED.
JOB_INPUTFILE is defined by PROCESS FILES as a file handle for the name of the current input.  It will be redefined each time another file is processed.

You can then construct the PROCESS FILES command from the menus via Utilities > Process Data Files.
The input filespec would be something like
c:\mydata\*.xlsx

After process files is run, you can save the constructed file in the usual way.

You can, of course, do this with Python or even Basic scripting more directly, but it probably isn't worth the trouble to do that.
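
For readers who do want the more direct route, here is a rough Python sketch that only builds the ADD FILES syntax text, using the 50-files-per-command limit mentioned above. It assumes the inputs are already saved as .sav files with identical variables. The function name and the part001.sav style file names are hypothetical, and the paths are assumed not to contain apostrophes.

```python
def add_files_syntax(paths, max_per_command=50):
    """Build ADD FILES commands that stack all paths, at most max_per_command inputs each."""
    commands = []
    remaining = list(paths)
    first = True
    while remaining:
        # After the first command, the active dataset (*) takes one of the input slots.
        room = max_per_command if first else max_per_command - 1
        batch, remaining = remaining[:room], remaining[room:]
        specs = ([] if first else ["/FILE=*"]) + ["/FILE='%s'" % p for p in batch]
        commands.append("ADD FILES " + "\n  ".join(specs) + ".\nEXECUTE.")
        first = False
    return "\n".join(commands)


# Hypothetical file names under the folder used in the example above.
paths = ["c:\\mydata\\part%03d.sav" % i for i in range(1, 61)]
syntax = add_files_syntax(paths)
print(syntax)
```

With 60 inputs this prints two commands: the first stacks 50 files, and the second adds the remaining 10 to the active dataset. Paste the output into a syntax window and save the result afterwards in the usual way.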

8
ReneeBK posted on 2014-4-1 03:20:01
* Open the first Excel file from its own separate directory and give it the dataset name 'active'.

GET DATA /TYPE=XLSX
  /FILE='U:\.AU Work\Client Files\XXXXX\CGMS data processing\first file\DD001_Baseline_Excel_Raw Data.xlsx'
  /SHEET=name 'SPSS'
  /CELLRANGE=full
  /READNAMES=on
  /ASSUMEDSTRWIDTH=32767.
EXECUTE.
DATASET NAME active.
* Call the PROCESS FILES command to loop over all other Excel files.
DATASET ACTIVATE active.
SPSSINC PROCESS FILES INPUTDATA="U:\.AU Work\Client Files\XXXXX\CGMS data processing\*.xlsx"  
SYNTAX="U:\.AU Work\Client Files\XXXXX\CGMS data processing\ImportFromExcelAndMerge.sps"
CONTINUEONERROR=YES
VIEWERFILE= "U:\.AU Work\Client Files\XXXXX\CGMS data processing\final output.spv"
CLOSEDATA=NO
MACRONAME="!JOB"
LOGFILEMODE=APPEND
/MACRODEFS ITEMS.

============

And the called syntax file, ImportFromExcelAndMerge.sps:
GET DATA
  /TYPE=XLSX
  /FILE="JOB_INPUTFILE"
  /SHEET=name 'SPSS'
  /ASSUMEDSTRWIDTH=32767.
DATASET NAME incoming.
DATASET ACTIVATE active.
ADD FILES /FILE=* /FILE='incoming'.
EXECUTE.
DATASET CLOSE incoming.

