Question

Suppose I have the following dataset:

cntry <-c(1,2,3,4,1,2,3,4,1,2,3,4)
year<-c(1990,1990,1990,1990,1991,1991,1991,1991,1992,1992,1992,1992)
exist<-c(1,1,0,0,1,1,1,0,1,1,1,1)
region<-c(1,2,2,1,1,2,2,1,1,2,2,1)

data<-data.frame(cntry,year,exist,region)
split.data<-split(data,data$year)

$`1990`
  cntry year exist region
1     1 1990     1      1
2     2 1990     1      2
3     3 1990     0      2
4     4 1990     0      1

$`1991`
  cntry year exist region
5     1 1991     1      1
6     2 1991     1      2
7     3 1991     1      2
8     4 1991     0      1

$`1992`
   cntry year exist region
9      1 1992     1      1
10     2 1992     1      2
11     3 1992     1      2
12     4 1992     1      1

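cntry: country, year: year of observation, exist: whether the country actually exists in that year, region: the region the country is located in.

For each year, I want to create a matrix that indicates whether two countries both exist and are located in the same region, and ideally store these matrices in a list.

For 1991, the result would look like this (only countries 2 and 3 both exist and are in the same region):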

b<-matrix(NA, nrow=length(unique(cntry)), ncol=length(unique(cntry)))
colnames(b)<-unique(cntry)
rownames(b)<-unique(cntry)

d <- split.data$`1991`
for (j in seq_along(d$cntry)) {
  for (i in seq_along(d$cntry)) {
    # 1 only if both countries exist and share a region
    if (d$region[i] == d$region[j] && d$exist[i] == 1 && d$exist[j] == 1) {
      b[j, i] <- 1
    } else {
      b[j, i] <- 0
    }
  }
}
# a country is not paired with itself
diag(b) <- 0

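I need output like this for every year.

I am having trouble finding a way to include the year dimension (also for storing the results), and I also wonder whether a for loop is really an efficient way to solve this.

Any input is highly appreciated!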

Answer 1

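Here is one possibility. The output is a list (with one named element per year) that contains, for each region with more than one existing country, a data.frame of the countries sharing that region: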

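res = lapply(split(data, data$year), function(u){
    # keep only the countries that exist in this year
    df = subset(u, exist==1, select=c("cntry", "region"))
    # keep only regions with more than one existing country
    Filter(function(x) nrow(x)>1, split(df, df$region))
})
# drop years in which no region has more than one existing country
Filter(function(x) length(x)>0, res)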

#$`1991`
#$`1991`$`2`
#  cntry region
#6     2      2
#7     3      2


#$`1992`
#$`1992`$`1`
#   cntry region
#9      1      1
#12     4      1

#$`1992`$`2`
#   cntry region
#10     2      2
#11     3      2

So that:

#> res$'1991'
#$`2`
#  cntry region
#6     2      2
#7     3      2
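
If explicit country pairs are more convenient than data.frames, the result above could be post-processed with combn. This is only a sketch that builds on this answer's code: the name pairs is hypothetical, and it relies on every retained data.frame having at least two rows, which the Filter step guarantees.

```r
# Rebuild the question's data so the sketch runs on its own
data <- data.frame(
  cntry  = rep(1:4, times = 3),
  year   = rep(1990:1992, each = 4),
  exist  = c(1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1),
  region = rep(c(1, 2, 2, 1), times = 3)
)

# Same steps as in the answer above
res <- lapply(split(data, data$year), function(u) {
  df <- subset(u, exist == 1, select = c("cntry", "region"))
  Filter(function(x) nrow(x) > 1, split(df, df$region))
})
res <- Filter(function(x) length(x) > 0, res)

# 'pairs' is a hypothetical name: one two-column matrix of country pairs
# per region and year (each row is one pair)
pairs <- lapply(res, function(regions) {
  lapply(regions, function(d) t(combn(d$cntry, 2)))
})

pairs$`1991`$`2`
```

For 1991 this yields a single pair, countries 2 and 3, in region 2.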

Answer 2

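Here is an option using tcrossprod. Loop over the list ("split.data") with lapply, subset the rows of each dataset where "exist" equals 1 (x$exist==1), and select the columns (c('cntry', 'region')) to create "x1". Convert "cntry" to a factor whose levels are the unique elements of "cntry" from "data" (factor(x1$cntry, levels=lvls)), so that countries that do not exist in a given year still get a row. Then take the table of "x1", apply tcrossprod, and set the diagonal to 0. Optionally, the names of the dimnames of the result can be removed.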

 lvls <- unique(data$cntry)

 lst <- lapply(split.data, function(x) {
     x1 <- x[x$exist==1, c('cntry', 'region')]
     x1$cntry <- factor(x1$cntry, levels=lvls)
     tbl <- table(x1)
     t1 <- tcrossprod(tbl)
     diag(t1) <- 0
     names(dimnames(t1)) <- NULL
     t1
 })

 lst
 #$`1990`
 #  1 2 3 4
 #1 0 0 0 0
 #2 0 0 0 0
 #3 0 0 0 0
 #4 0 0 0 0

 #$`1991`
 #  1 2 3 4
 #1 0 0 0 0
 #2 0 0 1 0
 #3 0 1 0 0
 #4 0 0 0 0

 #$`1992`
 #  1 2 3 4
 #1 0 0 0 1
 #2 0 0 1 0
 #3 0 1 0 0
 #4 1 0 0 0
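
As a cross-check, the same matrices can also be built with outer, which mirrors the condition in the question's loop more literally. This is a sketch and not part of the original answers: pair_matrix and lst2 are hypothetical names, and the code assumes that every country has exactly one row in every year, as in the example data.

```r
# Rebuild the question's data so the sketch runs on its own
data <- data.frame(
  cntry  = rep(1:4, times = 3),
  year   = rep(1990:1992, each = 4),
  exist  = c(1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1),
  region = rep(c(1, 2, 2, 1), times = 3)
)

# pair_matrix is a hypothetical helper name
pair_matrix <- function(x) {
  x <- x[order(x$cntry), ]                  # fix the row/column order by country
  alive <- x$exist == 1
  m <- outer(x$region, x$region, "==") &    # same region
    outer(alive, alive, "&")                # both countries exist
  m <- m * 1L                               # logical -> 0/1
  diag(m) <- 0                              # a country is not paired with itself
  dimnames(m) <- list(x$cntry, x$cntry)
  m
}

# One matrix per year, stored in a list named by year
lst2 <- lapply(split(data, data$year), pair_matrix)
lst2$`1991`
```

The outer version compares every pair of rows directly, so it uses memory quadratic in the number of countries per year, just like the loop, but avoids the explicit iteration.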

Create a list of matrices based on several conditions
