Question

我有一个URL，以JSON格式显示某个网站API的内容（文件ID）。为了以编程方式执行此操作，我使用jsonlite包的fromJSON（txt）函数，然后将JSON解析为向量（或列表，不确定）。

这在我的家用电脑上完美运行。但是，当我在工作中运行相同的相同代码时，似乎fromJSON（txt）不会识别URL，而是尝试解析实际的URL文本，因为我收到以下错误：

 Error: lexical error: invalid char in json text.
                                       https://api.gdc.cancer.gov/file
                     (right here) ------^

我已多次检查并重新检查我的代码和网址。粘贴到浏览器中并返回JSON格式的文本时，URL可以正常工作。

我尝试了几种替代方法，例如jsonlite包的unserializeJSON（）和RJSONIO包的fromJSON（），后者会产生不同的错误。

我很感激在解决错误方面有任何帮助......

以下是我的代码的相关部分：

# The URL (works fine in a browser):
urlForIDs <- "https://api.gdc.cancer.gov/files?filters=%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%0A%20%20%20%20%22op%22%3A%20%22and%22%2C%0A%20%20%20%20%22content%22%3A%20%5B%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22op%22%3A%20%22in%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22content%22%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22field%22%3A%20%22cases.project.program.name%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22value%22%3A%20%22TCGA%22%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%7D%2C%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22op%22%3A%20%22and%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22content%22%3A%20%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22op%22%3A%20%22in%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22content%22%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22field%22%3A%20%22cases.project.disease_type%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22value%22%3A%20%22%2ACarcinoma%22%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22op%22%3A%20%22in%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22content%22%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22field%22%3A%20%22cases.project.primary_site%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22value%22%3A%20%22Breast%22%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%5D%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%5D%0A%7D%0A%2C%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22type%22%2C%22value%22%3A%22copy_number_segment%22%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22data_category%22%2C%22value%22%3A%22Copy%20Number%20Variation%22%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22data_type%22%2C%22value%22%3A%22Masked%20Copy%20Number%20Segment%22%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22experimental_strategy%22%2C%22value%22%3A%22Genotyping%20Array%22%7D%7D%5D%7D%5D%7D&fields=file_id&size=5000&related_files=false"

# Another URL which I tried, that does the same thing, but when creating this one I minimised the JSON (removed white spaces) before encoding it:
# The first one worked on Chrome browser but not in Explorer, this one does work in Explorer, but not in the fromJSON() function:
url2 <- "https://api.gdc.cancer.gov/files?filters=%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22cases.project.program.name%22%2C%22value%22%3A%22TCGA%22%7D%7D%2C%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22cases.project.disease_type%22%2C%22value%22%3A%22%2ACarcinoma%22%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22cases.project.primary_site%22%2C%22value%22%3A%22Breast%22%7D%7D%5D%7D%5D%7D%2C%7B%22op%22%3A%22and%22%2C%22content%22%3A%5B%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22type%22%2C%22value%22%3A%22copy_number_segment%22%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22data_category%22%2C%22value%22%3A%22Copy%20Number%20Variation%22%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22data_type%22%2C%22value%22%3A%22Masked%20Copy%20Number%20Segment%22%7D%7D%2C%7B%22op%22%3A%22in%22%2C%22content%22%3A%7B%22field%22%3A%22experimental_strategy%22%2C%22value%22%3A%22Genotyping%20Array%22%7D%7D%5D%7D%5D%7D&fields=file_id&size=5000&related_files=false"

fileIDs <- fromJSON(urlForIDs) # I have tried changing parameters, such as 'simplifyVector = FALSE' but nothing seems to work.

# The following line is not executed, since the error happens before
fileIDs$data$hits$file_id

也许最奇怪的是，相同的代码，复制和粘贴，在我的家用电脑上运行良好。

提前致谢。

更新试图调试问题，我发现问题是当jsonlite包中的以下函数到达时，它似乎检查是否有URL，否则将其视为JSON文本。出于某种原因，它进入＆＃34;否则＆＃34; part ...这是功能：

function (txt, bigint_as_char = FALSE) 
{
    if (inherits(txt, "connection")) {
        parse_con(txt, bigint_as_char)
    }
    else {
        parse_string(txt, bigint_as_char)
    }
}

更新＃2：我意识到，当我将链接粘贴到Microsoft Edge或Internet Explorer时，某些URL会被切断，然后我会收到一条消息，告知它不是有效的JSON。我更改了默认设置以将Chrome用作默认浏览器，因为Chrome无法将其关闭。但它仍然无法运作...... 这可能是问题吗？有什么建议吗？

最后更新：我写信给包装的创建者Jeroen Ooms，他建议我从GitHub下载包，因为那里的问题已得到解决。这是一年多以前的事情，所以我想现在从CRAN下载时包也没有这个问题。感谢所有回复的人！

Answer 1

您可以使用readLines直接从URL中读取文本，手动为其分配一个'json'类，然后使用jsonlite转换为R对象。

注意：您会收到一些关于不完整行尾

的警告

res <- readLines(urlForIDs)
res2 <- readLines(url2)

class(res) <- "json"
class(res2) <- "json"

## View the raw JSON
jsonlite::prettify(res)
jsonlite::prettify(res2)

## convert to data.frame
df <- jsonlite::fromJSON(res)
df2 <- jsonlite::fromJSON(res2)

str(df)
# List of 2
# $ data    :List of 2
# ..$ hits      :'data.frame':  2223 obs. of  2 variables:
#   .. ..$ file_id: chr [1:2223] "2f22c96a-7b69-4e9c-96ac-be58fc2a79f1" "38d7d00a-594d-4bdc-a34c-660bfc195ff0" "03596a48-d4d1-4d8e-b76b-75fe8c0f0b75" "6bfe38b2-f0bb-4a79-83fd-b0c18c0f6a79" ...
# .. ..$ id     : chr [1:2223] "2f22c96a-7b69-4e9c-96ac-be58fc2a79f1" "38d7d00a-594d-4bdc-a34c-660bfc195ff0" "03596a48-d4d1-4d8e-b76b-75fe8c0f0b75" "6bfe38b2-f0bb-4a79-83fd-b0c18c0f6a79" ...
# ..$ pagination:List of 7
# .. ..$ count: int 2223
# .. ..$ sort : chr ""
# .. ..$ from : int 0
# .. ..$ page : int 1
# .. ..$ total: int 2223
# .. ..$ pages: int 1
# .. ..$ size : int 5000
# $ warnings: Named list()


str(df2)
# List of 2
# $ data    :List of 2
# ..$ hits      :'data.frame':  2223 obs. of  2 variables:
#   .. ..$ file_id: chr [1:2223] "2f22c96a-7b69-4e9c-96ac-be58fc2a79f1" "38d7d00a-594d-4bdc-a34c-660bfc195ff0" "03596a48-d4d1-4d8e-b76b-75fe8c0f0b75" "6bfe38b2-f0bb-4a79-83fd-b0c18c0f6a79" ...
# .. ..$ id     : chr [1:2223] "2f22c96a-7b69-4e9c-96ac-be58fc2a79f1" "38d7d00a-594d-4bdc-a34c-660bfc195ff0" "03596a48-d4d1-4d8e-b76b-75fe8c0f0b75" "6bfe38b2-f0bb-4a79-83fd-b0c18c0f6a79" ...
# ..$ pagination:List of 7
# .. ..$ count: int 2223
# .. ..$ sort : chr ""
# .. ..$ from : int 0
# .. ..$ page : int 1
# .. ..$ total: int 2223
# .. ..$ pages: int 1
# .. ..$ size : int 5000
# $ warnings: Named list()

另外，请注意length of the URL

Answer 2

问题已修复（一年前，但可以共享以防其他人使用）。

我写信给该软件包的创建者Jeroen Ooms，他建议我从GitHub下载该软件包，因为该问题已得到解决。这是一年多以前的事，所以我想现在从CRAN下载该标准软件包时也没有这个问题。

要从GitHub下载：

devtools::install_github("jeroen/jsonlite")

在J

2 个答案: