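I am trying to separate numbers and characters in a string column. So far I have been using tidyr::separate for this, but I run into errors in "unusual" cases.
Suppose I have the following data: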
df <- data.frame(c1 = c("5.5K", "2M", "3.1", "M"))
I would like to obtain a data frame with these columns:
data.frame(c2 = c("5.5", "2", "3.1", NA),
           c3 = c("K", "M", NA, "M"))
So far I have been using tidyr::separate:
library(tidyr)
df %>%
  separate(c1, into = c("c2", "c3"), sep = "(?<=[0-9])(?=[A-Za-z])")
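But this only works for the first three cases. I realize this is because the lookbehind (?<=...) and the lookahead (?=...) both have to match, so a digit must be present before the letter. How can I modify this code to also capture the cases where the number is missing before the letters? I have been trying the extract function too, but without success.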
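To see why the lookaround separator cannot handle the fourth value, it can help to check where the pattern actually finds a split position. The following is a minimal base R sketch, not part of the original question; the vector x simply repeats the example values.

```r
# Same values as in the question
x <- c("5.5K", "2M", "3.1", "M")

# Position of the zero-width split point (digit before, letter after);
# -1 means the pattern matches nowhere in that string
pos <- as.integer(regexpr("(?<=[0-9])(?=[A-Za-z])", x, perl = TRUE))
names(pos) <- x
pos
```

Neither "3.1" nor "M" contains a split point (both give -1). In both cases separate leaves the whole string in c2 and fills c3 with NA, which happens to be the desired result for "3.1" but not for "M".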
Edit: I suppose one workaround is to break it down into
library(stringr)
df$c2 <- as.numeric(str_extract(df$c1, "[0-9.]+"))
df$c3 <- str_extract(df$c1, "[A-Za-z]+")
But I am curious whether there are other ways to handle this.
Answer 0 (score: 2)
You can simply use extract this way:
extract(df, c1, into = c("c2", "c3"), "([\\.\\d]*)([a-zA-Z]*)")
#    c2 c3
# 1 5.5  K
# 2   2  M
# 3 3.1
# 4      M
but there should be a more elegant way to do it with separate.
Answer 1 (score: 1)
We can use base R sub to remove the characters and the digits separately, which gives the two columns.
df$c2 <- sub("[A-Za-z]+", "", df$c1)
df$c3 <- sub("\\d*\\.?\\d*", "", df$c1)
df
# c1 c2 c3
#1 5.5K 5.5 K
#2 2M 2 M
#3 3.1 3.1
#4 M M
If c1 is not needed afterwards, you can remove the column with df$c1 <- NULL.
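The sub approach leaves empty strings where a part is missing, whereas the question asks for NA. Below is a minimal base R sketch of one way to get NA instead. It assumes that every value consists of an optional number followed by optional letters; the pattern and the helper names m and parts are illustrative and do not come from the answer above.

```r
df <- data.frame(c1 = c("5.5K", "2M", "3.1", "M"), stringsAsFactors = FALSE)

# One match per row: full match, numeric part, letter part
m <- regmatches(df$c1, regexec("^([0-9.]*)([A-Za-z]*)$", df$c1))
parts <- do.call(rbind, m)

# Turn empty captures into NA, then convert the numeric part
parts[parts == ""] <- NA
df$c2 <- as.numeric(parts[, 2])
df$c3 <- parts[, 3]
df
```

A value that does not fit the assumed shape (for example letters before digits) would produce no match for that row, so the input should be checked first if that can happen.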
Answer 2 (score: 0)
Answer 3 (score: 0)
We can use extract from tidyr:
library(tidyr)
extract(df, c1, into = c("c2", "c3"), "^([0-9.]*)([A-Z]*)",
convert = TRUE, remove = FALSE)
# c1 c2 c3
#1 5.5K 5.5 K
#2 2M 2.0 M
#3 3.1 3.1
#4 M NA M
Or use read.csv from base R:
read.csv(text= sub("^([0-9.]*)", "\\1,", df$c1),
header = FALSE, stringsAsFactors = FALSE, col.names = c("c2", "c3"))
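The read.csv variant relies on sub inserting a comma right after the leading numeric part, so that each value becomes a two-field CSV line. A small sketch with the example values (the names x and csv_lines are only for illustration) shows the intermediate text that read.csv receives:

```r
x <- c("5.5K", "2M", "3.1", "M")

# Insert a comma after the (possibly empty) leading run of digits and dots
csv_lines <- sub("^([0-9.]*)", "\\1,", x)
csv_lines
# gives: "5.5,K", "2,M", "3.1,", ",M"
```

read.csv then parses the first field as a number, reading the empty first field of ",M" as NA.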
Answer 4 (score: 0)
You can use the unglue package:
df <- data.frame(c1 = c("5.5K", "2M", "3.1", "M"))
library(unglue)
unglue_unnest(df, c1, "{c2}{c3=\\D*}", convert = TRUE)
#> c2 c3
#> 1 5.5 K
#> 2 2.0 M
#> 3 3.1
#> 4 NA M