Converting numbers written as words into numerals in R

Posted: 2019-05-07 08:37:44

Tags: r

My challenge is to convert the words "ten" and "one" in an input sentence into the numerals 10 and 1:

example_input <- "I have ten apple and one orange"

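The numbers may change depending on what the user enters, and the input sentence can be tokenized if that helps. The output I want to get is: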

my_output_toget <- "I have 10 apple and 1 orange"

4 answers:

Answer 0 (score: 5)

We can pass a list of key/value pairs as the replacement argument of gsubfn, so that each matched word is replaced by its number:

library(english)
library(gsubfn)
gsubfn("\\w+", setNames(as.list(1:10), as.english(1:10)), example_input)
#[1] "I have 10 apple and 1 orange"
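For readers who would rather avoid the gsubfn and english dependencies, the same lookup idea (find every word with a regular expression, look it up in a named key/value vector, leave unmatched words alone) can be sketched in base R. The hand-written lookup for one to ten and the helper name replace_words are my own assumptions standing in for as.english(1:10); gsubfn itself may behave differently in edge cases.

```r
# Hand-written key/value lookup (assumption: stands in for as.english(1:10))
lookup <- setNames(as.character(1:10),
                   c("one", "two", "three", "four", "five",
                     "six", "seven", "eight", "nine", "ten"))

# Hypothetical helper: replace whole words that appear as keys in `dict`
replace_words <- function(x, dict) {
  matches <- gregexpr("\\w+", x)          # positions of every word
  words <- regmatches(x, matches)         # list of words per input string
  regmatches(x, matches) <- lapply(words, function(w) {
    hit <- w %in% names(dict)             # which words are number words
    w[hit] <- dict[w[hit]]                # swap them for their numerals
    w
  })
  x
}

replace_words("I have ten apple and one orange", lookup)
#[1] "I have 10 apple and 1 orange"
```

Because only complete \w+ tokens are looked up, words such as "someone" or "often" are left untouched. Matching is case-sensitive, so "Ten" would not be converted unless it is added to the lookup.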

Answer 1 (score: 2)

textclean makes this task easy:

library(textclean)
mgsub(example_input, replace_number(seq_len(10)), seq_len(10))

[1] "I have 10 apple and 1 orange"

You only need to set the seq_len() argument to the largest number that appears in your data.

Some examples:

example_input <- c("I have one hundred apple and one orange")

mgsub(example_input, replace_number(seq_len(100)), seq_len(100))

[1] "I have 100 apple and 1 orange"

example_input <- c("I have one tousand apple and one orange")

mgsub(example_input, replace_number(seq_len(1000)), seq_len(1000))

[1] "I have 1 tousand apple and 1 orange"

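In the last example "tousand" is misspelled, so it does not match "one thousand" and only "one" is converted. If you don't know the largest number in advance, you can simply pick a sufficiently large value.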
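One caveat worth checking: if mgsub is matching fixed strings (which I believe is textclean's default), a number word inside a longer word could also be replaced, for example "one" in "someone" or "ten" in "often". A minimal base-R sketch that replaces whole words only, longest phrase first so that "one hundred" is handled before "one", might look like this; replace_whole_words and the small word list are hypothetical, not part of textclean.

```r
# Hypothetical helper: whole-word replacement, longest phrase first
replace_whole_words <- function(x, words, values) {
  ord <- order(nchar(words), decreasing = TRUE)  # "one hundred" before "one"
  for (i in ord) {
    # \b word boundaries stop "one" from matching inside "someone"
    x <- gsub(paste0("\\b", words[i], "\\b"), values[i], x, perl = TRUE)
  }
  x
}

words  <- c("one", "ten", "one hundred")
values <- c("1", "10", "100")

replace_whole_words("someone has one hundred and ten apples", words, values)
#[1] "someone has 100 and 10 apples"
```

In practice words and values could be built with replace_number(seq_len(n)) and seq_len(n) as in the answer above; each phrase is one gsub pass, so a very large n costs one regex pass per phrase.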

Answer 2 (score: 2)

I wrote an R package for this, https://github.com/fsingletonthorn/words_to_numbers, which should handle many more use cases.

devtools::install_github("fsingletonthorn/words_to_numbers")

library(wordstonumbers)

example_input <- "I have ten apple and one orange"

words_to_numbers(example_input)

[1] "I have 10 apple and 1 orange"

It also works for more complex cases, for example:

words_to_numbers("The Library of Babel (by Jorge Luis Borges) describes a library that contains all possible four-hundred and ten page books made with a character set of twenty five characters (twenty two letters, as well as spaces, periods, and commas), with eighty lines per book and forty characters per line.")
#> [1] "The Library of Babel (by Jorge Luis Borges) describes a library that contains all possible 410 page books made with a character set of 25 characters (22 letters, as well as spaces, periods, and commas), with 80 lines per book and 40 characters per line."

words_to_numbers("300 billion, 2 hundred and 79 cats")
#> [1] "300000000279 cats"
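The package's internals aren't shown here, but the arithmetic behind turning "four-hundred and ten" into 410 or "twenty five" into 25 can be sketched with a running-total parser: small words add to the current group, "hundred" multiplies it, and a scale word such as "billion" closes the group. This is my own simplified illustration, not the wordstonumbers implementation; it parses a single number phrase (no surrounding text, no digits mixed in), and the vocabulary tables and the name phrase_to_number are assumptions.

```r
# Simplified sketch, NOT the wordstonumbers implementation.
small <- c(zero = 0, one = 1, two = 2, three = 3, four = 4, five = 5,
           six = 6, seven = 7, eight = 8, nine = 9, ten = 10,
           eleven = 11, twelve = 12, thirteen = 13, fourteen = 14,
           fifteen = 15, sixteen = 16, seventeen = 17, eighteen = 18,
           nineteen = 19, twenty = 20, thirty = 30, forty = 40,
           fifty = 50, sixty = 60, seventy = 70, eighty = 80, ninety = 90)
scales <- c(thousand = 1e3, million = 1e6, billion = 1e9)

# Hypothetical helper: convert ONE number phrase into a numeric value
phrase_to_number <- function(phrase) {
  tokens <- strsplit(tolower(phrase), "[\\s-]+", perl = TRUE)[[1]]
  tokens <- tokens[tokens != "and"]       # "and" carries no value
  total <- 0    # groups already closed by a scale word
  current <- 0  # group still being built
  for (tok in tokens) {
    if (tok %in% names(small)) {
      current <- current + small[[tok]]
    } else if (tok == "hundred") {
      current <- current * 100
    } else if (tok %in% names(scales)) {
      total <- total + current * scales[[tok]]
      current <- 0
    } else {
      stop("unrecognised number word: ", tok)
    }
  }
  total + current
}

phrase_to_number("four-hundred and ten")
#[1] 410
phrase_to_number("twenty five")
#[1] 25
phrase_to_number("three hundred billion two hundred and seventy nine")  # value is 300000000279
```

A full solution also has to locate where each number phrase starts and ends inside a sentence, which this sketch deliberately leaves out.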

Answer 3 (score: 1)

Less elegant than akrun's answer, but it uses only base R:

nums <- c("one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "ten")
example_input <- "I have ten apple and one orange"

aux <- strsplit(example_input, " ")[[1]]   # split the sentence into words
idx <- match(aux, nums)                    # position in nums = numeric value
aux[!is.na(idx)] <- idx[!is.na(idx)]       # replace matched words by their value
example_output <- paste(aux, collapse = " ")
example_output
#[1] "I have 10 apple and 1 orange"

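We first split the sentence on spaces, find which words match an entry in nums, replace each match with its position in nums (which equals the number itself, because nums is in order), and then paste the words back together.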
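The same split/match/paste steps can be wrapped in a function that accepts a vector of sentences and ignores case. The name words_to_digits and the tolower() step are my additions, not part of the answer above. Like the original, it splits on single spaces only, so a number word with punctuation attached (e.g. "one,") is left unchanged.

```r
nums <- c("one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "ten")

# Hypothetical helper: apply the split/match/paste approach to each sentence
words_to_digits <- function(x, dict = nums) {
  vapply(strsplit(x, " ", fixed = TRUE), function(aux) {
    idx <- match(tolower(aux), dict)      # position in dict == numeric value
    aux[!is.na(idx)] <- idx[!is.na(idx)]  # integers are coerced to character
    paste(aux, collapse = " ")
  }, character(1))
}

words_to_digits(c("I have ten apple and one orange", "Three cats"))
#[1] "I have 10 apple and 1 orange" "3 cats"
```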