Question

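How can I make the following code faster? So far, the whole process for P = 1 (i.e. a single iteration of the outer loop) takes about 15 minutes. I know the problem is probably the for loops, and I have read several related questions, but I could not understand how they work.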

In the script below, P and R are about 1000, and TOLTarget and TOLSource can be up to 500.

Any help would be greatly appreciated.

for(i in 1:P)
{
  Source <- MITLinks[i,1]
  Target <- MITLinks[i,2]
  TOLTarget <- sum(!is.na(MITMatrix[Target,]))-1                  # TOLTarget would be the number of concepts for the target course 
  TOLSource <- sum(!is.na(MITMatrix[Source,]))-1
  for(q in 2:TOLSource)                                           # since the first column is the courseID
  {
    DD <- vector(length = R)
    ConceptIDSource <- MITMatrix[Source,q]
    counterq <- 1                                                 # counterq points to the next free cell of vector DD, which keeps the courses from the other university
    for(c in 1:R)
    {
      if(CALBinary[c,match(ConceptIDSource,BB)]==1)             # if(CALBinary[c,"ConceptIDSource"]==1)
      {
        DD[counterq] <- c                                     # it is the courseID
        counterq <- counterq+1
      }
    }
    DD <- DD[ DD != 0 ]                                           # DD is a vector that keeps all courses from the other university that share the same concept as the source course at the first university (MIT)
    for(j in 2:TOLTarget)                                         # since the first column is the courseID
    {
      ZZ <- vector(length = R)
      ConceptIDTarget <- MITMatrix[Target,j]
      counter <- 1
      for(v in 1:R)
      {
        if(CALBinary[v,match(ConceptIDTarget,BB)]==1)          #if(CALBinary[v,"ConceptIDTarget"]==1)
        {
          ZZ[counter] <- v                                   # v is courseID
          counter <- counter+1
        }
      }
      ZZ <- ZZ[ ZZ != 0 ]                                        # delete the zero elements from the vector
      Jadval<- expand.grid(Source,Target,ConceptIDSource,ConceptIDTarget,DD,ZZ)
      Total<-rbind(Total,Jadval)                                 # to make all possible pairs of the courses for the source and the target course
      Total
    }
  }
}

Answer 1

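There are many ways to improve this code and make it faster. It looks like you are basically writing C-style code instead of taking advantage of R's built-in vectorized functions. Here is one example. This part of the code: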

DD <- vector(length = R)
ConceptIDSource <- MITMatrix[Source,q]
counterq <- 1                                                 # counterq points to the next free cell of vector DD, which keeps the courses from the other university
for(c in 1:R)
{
  if(CALBinary[c,match(ConceptIDSource,BB)]==1)             # if(CALBinary[c,"ConceptIDSource"]==1)
  {
    DD[counterq] <- c                                     # it is the courseID
    counterq <- counterq+1
  }
}
DD <- DD[ DD != 0 ]

can be written like this:

ConceptIDSource <- MITMatrix[Source,q]
CalBinaryBB <- CALBinary[,match(ConceptIDSource,BB)]
DD<-which(CalBinaryBB[1:R]==1)

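In your code, match is called on every iteration of the loop, which is unnecessary: neither ConceptIDSource nor BB changes inside the loop, so the column index only needs to be looked up once. And since you are trying to find the indices where CALBinary[c,match(ConceptIDSource,BB)]==1, the R function which will do that much faster.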
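To check that the two versions return the same thing, here is a small self-contained sketch. The toy CALBinary, BB and concept values, and the two function names, are made up for illustration and are not your data.

```r
# Toy data (hypothetical): 6 courses x 3 concepts, 1 = course covers the concept
R <- 6
BB <- c("c10", "c20", "c30")                  # concept IDs, one per column of CALBinary
CALBinary <- matrix(c(1, 0, 1, 0, 0, 1,       # column 1: courses 1, 3, 6
                      0, 1, 1, 0, 1, 0,       # column 2: courses 2, 3, 5
                      0, 0, 0, 1, 1, 1),      # column 3: courses 4, 5, 6
                    nrow = R)
concept <- "c20"

# Loop version, following the structure of the code in the question
loop_version <- function(CALBinary, BB, concept, R) {
  DD <- vector(length = R)
  counterq <- 1
  for (row in 1:R) {
    if (CALBinary[row, match(concept, BB)] == 1) {   # match() runs on every pass
      DD[counterq] <- row
      counterq <- counterq + 1
    }
  }
  DD[DD != 0]
}

# Vectorized version: look up the column once, then let which() find the rows
vectorized_version <- function(CALBinary, BB, concept, R) {
  col <- match(concept, BB)
  which(CALBinary[1:R, col] == 1)
}

DD_loop <- loop_version(CALBinary, BB, concept, R)
DD_vec  <- vectorized_version(CALBinary, BB, concept, R)
stopifnot(all(DD_loop == DD_vec))             # both pick the same courses
```

On your real matrices you could wrap each call in system.time() to see how much the change buys you.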

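It looks like you can do the same thing in the second part of the loop, and there are probably other opportunities for optimization as well.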
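As a rough sketch of what that could look like for one Source/Target pair: the helper name pairs_for_link, its arguments and the toy data are hypothetical, and it takes the concept IDs as plain vectors instead of reading them from MITMatrix. Two things here go beyond the which() replacement and are my assumptions about where time is lost. First, in the original the ZZ vector is rebuilt for every source concept even though it depends only on the target concept, so it is computed once per target concept here. Second, calling rbind on Total inside the loop copies the growing data frame on every pass, so the pieces are collected in a list and bound once at the end.

```r
# Hypothetical helper: every (source concept, target concept, course, course)
# combination for a single Source/Target pair.
pairs_for_link <- function(Source, Target, src_concepts, tgt_concepts,
                           CALBinary, BB) {
  # Look up each concept's column once, outside any loop
  src_cols <- match(src_concepts, BB)
  tgt_cols <- match(tgt_concepts, BB)

  # Rows (courses) that cover each concept, computed once per concept
  src_rows <- lapply(src_cols, function(col) which(CALBinary[, col] == 1))
  tgt_rows <- lapply(tgt_cols, function(col) which(CALBinary[, col] == 1))

  # Collect the pieces in a pre-sized list instead of growing a data frame
  pieces <- vector("list", length(src_concepts) * length(tgt_concepts))
  k <- 0
  for (q in seq_along(src_concepts)) {
    for (j in seq_along(tgt_concepts)) {
      k <- k + 1
      pieces[[k]] <- expand.grid(
        Source = Source, Target = Target,
        ConceptIDSource = src_concepts[q],
        ConceptIDTarget = tgt_concepts[j],
        DD = src_rows[[q]], ZZ = tgt_rows[[j]]
      )
    }
  }
  do.call(rbind, pieces)                      # bind once at the end
}

# Toy data (hypothetical): 6 courses x 3 concepts with numeric concept IDs
BB <- c(10, 20, 30)
CALBinary <- matrix(c(1, 0, 1, 0, 0, 1,
                      0, 1, 1, 0, 1, 0,
                      0, 0, 0, 1, 1, 1), nrow = 6)

res <- pairs_for_link(Source = 1, Target = 2,
                      src_concepts = c(10, 20), tgt_concepts = 30,
                      CALBinary = CALBinary, BB = BB)
```

The outer loop over the rows of MITLinks could then call this helper once per link and bind the per-link results in the same way.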

Optimizing a for loop - R
