I have a distance matrix, df1, that gives the distances between 8 locations, a to h.
x <- c("a","b","c","d","e","f","g","h")
df1 <- data.frame(a=c(0,1,2,3,4,5,6,7), b=c(1,0,1,2,3,4,5,6),
c=c(2,1,0,1,2,3,4,5), d=c(3,2,1,0,1,2,3,4),
e=c(4,3,2,1,0,1,2,3), f=c(5,4,3,2,1,0,1,2),
g=c(6,5,4,3,2,1,0,1), h=c(7,6,5,4,3,2,1,0),
row.names=x)
> df1
a b c d e f g h
a 0 1 2 3 4 5 6 7
b 1 0 1 2 3 4 5 6
c 2 1 0 1 2 3 4 5
d 3 2 1 0 1 2 3 4
e 4 3 2 1 0 1 2 3
f 5 4 3 2 1 0 1 2
g 6 5 4 3 2 1 0 1
h 7 6 5 4 3 2 1 0
I also have another data frame, df2, that shows the location recorded in each month.
df2 <- data.frame(Month=c(rep(11,3),rep(12,4),rep(1,3)),
Location=sample(letters[1:8],10,replace=TRUE))
> df2
Month Location
1 11 c
2 11 a
3 11 d
4 12 f
5 12 c
6 12 f
7 12 a
8 1 b
9 1 b
10 1 h
I want to extract the maximum distance between the locations recorded in each month. The output should look like this:
Month Max.Distance
1 11 3
2 12 5
3 1 6
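I would also like to calculate the cumulative distance between the locations in each month, in the order they were recorded, which should give the following output: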
Month Cum.Distance
1 11 5
2 12 11
3 1 6
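I hope this makes sense. I considered using a for loop, but my knowledge of loops in R is limited, so any help would be greatly appreciated. Many thanks!
Answer 0 (score: 0)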
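For the maximum distance per month (this works because, in df1, the distance between two locations equals the difference between their positions in the alphabet):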
df2 <- read.table(text = "
Month Location
1 11 c
2 11 a
3 11 d
4 12 f
5 12 c
6 12 f
7 12 a
8 1 b
9 1 b
10 1 h", h = T)
# Map each location to its position in the alphabet, then take the spread (max - min) per month
aggregate(Location ~ Month, df2, function(j) diff(range(match(as.character(j), letters))))
Month Location
1 1 6
2 11 3
3 12 5
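The approach above depends on the locations being letters whose alphabet positions happen to reproduce the distances. If the distances should instead be read from the distance matrix itself, a minimal sketch could look like the following. The names dist_mat, max_pairwise and res_max are made up for this illustration, the matrix is rebuilt compactly from the pattern visible in df1, and df2 is a fixed copy of the sample shown in the question.

```r
# Distance matrix from the question, rebuilt compactly: entry [i, j] is |i - j|
locs <- letters[1:8]
dist_mat <- abs(outer(1:8, 1:8, "-"))
dimnames(dist_mat) <- list(locs, locs)

# Fixed copy of the df2 printed in the question (no random sampling)
df2 <- data.frame(
  Month = c(11, 11, 11, 12, 12, 12, 12, 1, 1, 1),
  Location = c("c", "a", "d", "f", "c", "f", "a", "b", "b", "h"),
  stringsAsFactors = FALSE
)

# Largest pairwise distance among the locations recorded in one month
# (hypothetical helper; looks distances up by name in dist_mat)
max_pairwise <- function(loc) {
  u <- unique(as.character(loc))
  max(dist_mat[u, u])
}

# aggregate() sorts the groups, so the months come out as 1, 11, 12
res_max <- aggregate(Location ~ Month, df2, max_pairwise)
names(res_max)[2] <- "Max.Distance"
res_max
```

Because the lookup is by name, this should also work when the distance matrix is not simply a difference of positions.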
Answer 1 (score: 0)
First, define the data frames as in your example.
x <- c("a","b","c","d","e","f","g","h")
df1 <- data.frame(a=c(0,1,2,3,4,5,6,7), b=c(1,0,1,2,3,4,5,6),
c=c(2,1,0,1,2,3,4,5), d=c(3,2,1,0,1,2,3,4),
e=c(4,3,2,1,0,1,2,3), f=c(5,4,3,2,1,0,1,2),
g=c(6,5,4,3,2,1,0,1), h=c(7,6,5,4,3,2,1,0),
row.names=x)
df2 <- data.frame(Month=c(rep(11,3),rep(12,4),rep(1,3)),
Location=sample(letters[1:8],10,replace=TRUE))
# Month Location
# 1 11 d
# 2 11 c
# 3 11 h
# 4 12 e
# 5 12 c
# 6 12 b
# 7 12 h
# 8 1 h
# 9 1 g
# 10 1 b
Next, I define a function that finds all possible pairs of locations for month m and then looks for the maximum distance.
# Find maximum distance
max_dist <- function(m){
# Check if it's just one location
if(sum(df2$Month == m) == 1)return(0)
# Get all combinations of locations for given month
tmp <- t(combn(match(df2$Location[df2$Month == m], rownames(df1)), 2))
# Get max value over these location pairs (a two-column matrix indexes one cell per pair)
max(df1[tmp])
}
Finally, I apply the function to all months in df2 and repackage the result as a data frame.
# Run function on all months
data.frame(month = unique(df2$Month), max_dist = unlist(lapply(unique(df2$Month), max_dist)))
# month max_dist
# 1 11 5
# 2 12 6
# 3 1 6
The following gives the total distance:
tot_dist <- function(m){
tmp <- match(df2$Location[df2$Month == m], rownames(df1))
sum(df1[cbind(head(tmp, -1), tail(tmp, -1))])
}
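As a usage sketch, the same step-by-step sum can be applied to every month at once. The version below takes the locations and the distance matrix as arguments instead of reading global variables; cum_dist, dist_mat, by_month and res_cum are names invented for this example, and df2 is a fixed copy of the sample shown in the question so that the result can be compared with the desired output there.

```r
# Distance matrix from the question, rebuilt compactly: entry [i, j] is |i - j|
locs <- letters[1:8]
dist_mat <- abs(outer(1:8, 1:8, "-"))
dimnames(dist_mat) <- list(locs, locs)

# Fixed copy of the df2 printed in the question (no random sampling)
df2 <- data.frame(
  Month = c(11, 11, 11, 12, 12, 12, 12, 1, 1, 1),
  Location = c("c", "a", "d", "f", "c", "f", "a", "b", "b", "h"),
  stringsAsFactors = FALSE
)

# Sum of distances between consecutive locations (hypothetical helper)
cum_dist <- function(loc, dist) {
  idx <- match(as.character(loc), rownames(dist))
  # Pair each location with the one recorded right after it
  sum(dist[cbind(head(idx, -1), tail(idx, -1))])
}

# Split locations by month, keeping the months in order of appearance
by_month <- split(df2$Location, factor(df2$Month, levels = unique(df2$Month)))

res_cum <- data.frame(
  Month = unique(df2$Month),
  Cum.Distance = unname(sapply(by_month, cum_dist, dist = dist_mat))
)
res_cum
```

With the question's sample this gives 5, 11 and 6 for months 11, 12 and 1, matching the Cum.Distance table in the question.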
In response to your comment, I think this works:
# Find maximum distance
max_dist <- function(m){
# Check if it's just one location
if(sum(df2$Month == m) == 1)return(0)
# Get all rows for month m, plus the row that follows them, if there is one
locs <- which(df2$Month == m)
if(tail(locs, 1) != nrow(df2))locs <- c(locs, tail(locs, 1) + 1)
# Get all combinations of locations for given month
tmp <- t(combn(match(df2$Location[locs], rownames(df1)), 2))
# Get max value over these location pairs (a two-column matrix indexes one cell per pair)
max(df1[tmp])
}
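In essence, it just also picks up the row that follows month m, provided there is another row. The equivalent total distance is as follows: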
tot_dist <- function(m){
# Get all rows for month m, plus the row that follows them, if there is one
locs <- which(df2$Month == m)
if(tail(locs, 1) != nrow(df2))locs <- c(locs, tail(locs, 1) + 1)
tmp <- match(df2$Location[locs], rownames(df1))
sum(df1[cbind(head(tmp, -1), tail(tmp, -1))])
}
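To see what the "include the next row" variant does in practice, here is a small self-contained sketch run on a fixed copy of the sample df2 from the question. The helper rows_with_next and the names dist_mat, months and res are assumptions made for this illustration. Like the functions above, it assumes the rows of each month are contiguous, and it does not special-case months with a single row.

```r
# Distance matrix from the question, rebuilt compactly: entry [i, j] is |i - j|
locs <- letters[1:8]
dist_mat <- abs(outer(1:8, 1:8, "-"))
dimnames(dist_mat) <- list(locs, locs)

# Fixed copy of the df2 printed in the question (no random sampling)
df2 <- data.frame(
  Month = c(11, 11, 11, 12, 12, 12, 12, 1, 1, 1),
  Location = c("c", "a", "d", "f", "c", "f", "a", "b", "b", "h"),
  stringsAsFactors = FALSE
)

# Row indices for month m, plus the following row when one exists
# (hypothetical helper)
rows_with_next <- function(m, df) {
  rows <- which(df$Month == m)
  last <- tail(rows, 1)
  if (last != nrow(df)) rows <- c(rows, last + 1)
  rows
}

months <- unique(df2$Month)
res <- data.frame(
  Month = months,
  # Largest pairwise distance among the selected locations
  Max.Distance = sapply(months, function(m) {
    u <- unique(df2$Location[rows_with_next(m, df2)])
    max(dist_mat[u, u])
  }),
  # Sum of distances between consecutive selected locations
  Cum.Distance = sapply(months, function(m) {
    idx <- match(df2$Location[rows_with_next(m, df2)], rownames(dist_mat))
    sum(dist_mat[cbind(head(idx, -1), tail(idx, -1))])
  })
)
res
```

For month 11, for example, the selected locations become c, a, d, f, so the maximum rises from 3 to 5 and the cumulative distance from 5 to 7, because the step from d to the first location of month 12 is now counted.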