从嵌套列表中删除元素

时间:2017-04-14 12:46:40

标签: r mongodb

我有一个问题,我已经工作了几天,但找不到合适的答案。

我有一个需要放入mongo数据库的列表。它看起来像这样:

    listtest = list(
  list(section_id = NULL, name = "Name1", slug = "slug1"),
  list(section_id = NULL, name = 'Name2', slug = 'slug2'),
  list(section_id = NULL, name = 'Name3', slug = 'slug3', categories = 
         list(
           list(section_id = NULL, name = 'Name31', slug = 'slug31'),
           list(section_id = NULL, name = 'Name32', slug = 'slug32', categories = 
                  list(
                    list(section_id = NULL, name = 'Name321', slug = 'slug321'),
                    list(section_id = NULL, name = 'Name322', slug = 'slug322'),
                    list(section_id = NULL, name = 'Name323', slug = 'slug323')
                  )),
           list(section_id = NULL, name = 'Name33', slug = 'slug33', categories = 
                  list(
                    list(section_id = NULL, name = 'Name331', slug = 'slug331'),
                    list(section_id = NULL, name = 'Name332', slug = 'slug332'),
                    list(section_id = NULL, name = 'Name333', slug = 'slug333'),
                    list(section_id = NULL, name = 'Name334', slug = 'slug334'),
                    list(section_id = NULL, name = 'Name335', slug = 'slug335')
                  )),
           list(section_id = NULL, name = 'Name34', slug = 'slug34'),
           list(section_id = NULL, name = 'Name35', slug = 'slug35', categories = 
                  list(
                    list(section_id = NULL, name = 'Name351', slug = 'slug351', categories = 
                           list(
                             list(section_id = NULL, name = 'Name3511', slug = 'slug3511'),
                             list(section_id = NULL, name = 'Name3512', slug = 'slug3512'),
                             list(section_id = NULL, name = 'Name3513', slug = 'slug3513')
                           )
                    )
)
)
)
)
)

问题是我有一个带有section_ids的数据框,我希望根据名称或slug将其放入嵌套列表中。我设法做到了这一点,但是当数据帧中没有secion_id时,它仍然留下一些等于字符(0)的section_ids。如何删除section_id等于字符(0)的列表?我也可以将section_id更改为mongoDB中的object_id吗?或者只能在JSON中完成,而不是在列表中完成?

干杯。

1 个答案:

答案 0 :(得分:2)

我不得不重新创建一个关键表以供参考。

unique(unlist(listtest, use.names = FALSE))  %>% {
    data.frame(name_var = .[c(T,F)], slug_var = .[c(F,T)]) %>% 
        mutate(section_id = sample(4678:92456,length(name_var))) %>% 
        select(section_id, name_var, slug_var)
} -> key_table

供参考:

> head(key_table)
   section_id name_var slug_var
1       78002    Name1    slug1
2       48508    Name2    slug2
3       16510    Name3    slug3
4       89004   Name31   slug31
5       55853   Name32   slug32
6       65886  Name321  slug321

因此,从密钥表中,第一个函数匹配list元素中的name变量,并对密钥表进行索引并返回section id。

fn <- function(l, pat = NULL){
    key_table[l[['name']] == key_table[['name_var']],'section_id']
}

然后递归,遍历列表并在每次迭代时重复fn调用,用关键表中匹配的id替换section_id名称槽。在现实世界中,我猜你需要在没有明显匹配的情况下应用NA变量。但是这应该指向你,你可以根据需要进行调整:

递归函数: L是列表元素,因此在这种情况下listtest

f <- function(L){
    if("name" %in% names(L)){
    L[['section_id']] <- fn(l = L)
    } else {
        L <- L
    }
    if(is.list(L)){
        lapply(L, f)
    }else {
        L
    }
}

输出:

glimpse(toJSON(f(listtest), auto_unbox = TRUE, pretty = F))
Class 'json'  chr "[{\"section_id\":16667,\"name\":\"Name1\",\"slug\":\"slug1\"},{\"section_id\":76003,\"name\":\"Name2\",\"slug\""| __truncated__

更新以实际解决删除问题:注意:很多代码......

前言:我从上面对数据集进行了采样并创建了一个较小的key_tbl来复制不匹配的id或名称模式的条件

key_tbl <- sample_frac(key_table, 0.3)
#' Handles the matching from the key table, and substituting of the indexed match from the table, or replaces the `NULL` with an `NA`

fn <- function(l, pat = NULL){
    check <- match(l[['name']], key_tbl$name_var)
    if(is.na(check)){
        NA
    }else {
        key_tbl[check,'section_id']
    }
}

递归迭代的最大问题是处理check if a condition is met, if it is, for each sub-iteration, check if the first condition is met, and evaluate additional conditions; if any are not met, do this..else..do this.. ; then step back through and find which slots didn't meet certain conditions, and drop those

以下是嵌套列表的条件:

#' Determine if a list and not a data.frame which is an array of lists
is_list <- function(x){
  (!is.data.frame(x) && inherits(x, "list"))
}
#' Forcefull NULL detection
is_null <- function(x){
  identical(x, eval(parse(text = typeof(NULL))))
}
#' Forecful NA detection
is_na <- function(x){
  if(identical(is.na(x),logical(0)) || is.na(x)){
    TRUE
  }else {
    FALSE
  }
}

长度检查

#' Trick for determining empty objects that are still class-or-object oriented
#' and can throw an empty return
is.empty <- function(x){
  if(is.list(x)){
    chk <- length(x)
  }else if(is.character(x)){
    chk <- nchar(x)
  }else if(is.data.frame(x)){
    chk <- nrow(x)
  }else {
    chk <- 1
  }
  if(chk == 0){
    return(TRUE)
  }else{
    return(FALSE)
  }
}
#' Checks most/all the above
is.invalid <- function(x){
  if(is_null(x)){
    return(TRUE)
  }else if(is_na(x)){
    return(TRUE)
  }else if(!length(x)){
    return(TRUE)
  }else if(is.empty(x)){
    return(TRUE)
  }else {
    return(FALSE)
  }
}

以上所有的包装:

#' Vectorized function to remove any items within a list that meet any of the above tests.
drop.invalids <- function(x){
  x[!mapply(is.invalid,x)]
}

全部放在一起:

f <- function(L){
#' Since we're in a loop, we need to ensure that there is a 'name' variable
#' to match against
    if('name' %in% names(L)){
      #' We've now substituted the indexed id, or an NA
        L[['section_id']] <- fn(l = L)
    }else {
       #' We didn't do a thing, since we're not in a nested iteration yet. 
        L <- L
    }
    #' Since this is nested, and each list is a named list..
    #' Check if ANY of those items, in each sub-item, is NA.
    if(any(mapply(is.na, L[!mapply(is.list, L)]))){
        #' For each sub item, make sure that it has children, and 
        #' if so, get only those kids, since we're dropping the array
        #' that has an NA value in it.
        if(any(mapply(is.list, L))){
                L <- L[!mapply(function(x)any(is_na(x)), L)]
        }else {
            L <- NA
        }

    }else {
        L <- L
    }
#' If it's a list, recursively do all the above, 
#' but only return the good-stuff
    if(is.list(L)){
        drop.invalids(lapply(L,f))
    }else {
        drop.invalids(L)
    }
}

现在运行它:

> toJSON(f(listtest), pretty= TRUE, auto_unbox = TRUE)
[{"name":"Name3","slug":"slug3","categories":[{"name":"Name32","slug":"slug32","categories":[{"section_id":89623,"name":"Name322","slug":"slug322"}]},{"name":"Name33","slug":"slug33","categories":[{"section_id":55220,"name":"Name331","slug":"slug331"},{"section_id":44116,"name":"Name334","slug":"slug334"}]},{"section_id":42746,"name":"Name34","slug":"slug34"},{"section_id":60862,"name":"Name35","slug":"slug35","categories":[{"section_id":37357,"name":"Name351","slug":"slug351"}]}]}]