已经看过楼主给的数据。我的方法是这样:
(1)在EXCEL软件中,先动用VBA工具在该数据表的空白处添加一个按钮,并在调试状态下将鼠标移到该按钮控件处,点击右键,选”查看代码“,并在其中录入如下代码:
- Private Sub CommandButton1_Click()
- Dim colRange(1 To 1000) As Integer, rowRange(1 To 1000) As Integer, s As String
-
- For i = 1 To 1000
- colRange(i) = -1
- rowRange(i) = -1
- Next i
-
- colBegin = 2
- rowBegin = 2
-
- For i = 1 To 1000
-
- a = Cells(6, 1 + i)
-
- If Trim(a) = "" Then Exit For
-
- For j = 8 To 1000
-
- b = Cells(j, 1)
-
- If Trim(b) = "" Then Exit For
- If Trim(b) = Trim(a) Then
- colRange(colBegin) = i + 1
- colBegin = colBegin + 1
- Exit For
- End If
-
- Next j
-
- Next i
-
- colEnd = colBegin - 1
- colRange(1) = 1
-
- For i = 1 To 1000
-
- a = Cells(7 + i, 1)
-
- If Trim(a) = "" Then Exit For
-
- For j = 2 To 1000
-
- b = Cells(6, j)
-
- If Trim(b) = "" Then Exit For
- If Trim(b) = Trim(a) Then
- rowRange(rowBegin) = i + 7
- rowBegin = rowBegin + 1
- Exit For
- End If
-
- Next j
-
- Next i
-
- rowEnd = rowBegin - 1
- rowRange(1) = 6
-
-
- TargetFileName = "C:\2012Export2.txt" '创建该txt文本文件,以存储处理好的数据。数据用逗号间隔。
- Open TargetFileName For Output As #1
-
-
- For row_Index = 1 To rowEnd
- For col_Index = 1 To colEnd - 1
- s = Trim(Cells(rowRange(row_Index), colRange(col_Index)))
- If s = ".." Or s = "_" Then s = "NA"
- Write #1, s;
- Next col_Index
- s = Trim(Cells(rowRange(row_Index), colRange(colEnd)))
- If s = ".." Or s = "_" Then s = "NA"
- Write #1, s
-
- Next row_Index
-
- Close #1
- End Sub
复制代码
录好后,保存程序,退出调试状态。之后点击该按钮,以执行该段代码。代码将创建一个txt文档,用于存储“过滤”后的数据。
(2)在R中读取数据:
- a = read.csv(file = "c:/2012Export2.txt", header = TRUE)
复制代码
a的具体内容你可以自己看一下(可以用View(a)命令查看)。这里给出a的尺寸信息:
> dim(a)
[1] 222 223
数据框的第一列用于存储EXCEL文件数据的行名称(注意,a本身没有行名称,EXCEL数据的行名称是作为第一列信息存入a当中的)。剔除该列,有效数据列有222个,恰与行数222相等。