Conditional split

Date: 2021-07-05 17:27:49

Tags: parsing split stata

I have a string column that looks like a series of key-value pairs, something like this:

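| Index | String column |
|---|---|
| 1 | A:blahblahblah. B:whatever whatever. C: idkidk. |
| 2 | A:blahblah. C: idkidk |
| 3 | B:whatever whatever. C: idkidk |
| 4 | B:whatever whatever. D: randomstuff |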

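I need to generate a new column for each distinct key, holding the corresponding value. The problem is that a plain split(.) does not work, because not all entries have the same keys.

This is basically what I would like to achieve: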

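| A | B | C | D |
|---|---|---|---|
| blahblahblah | whatever whatever | idkidk | |
| blahblah | | idkidk | |
| | whatever whatever | idkidk | |
| | whatever whatever | | randomstuff |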

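I have been struggling with this for a while, but nothing seems to work. Any suggestions?

1 answer:

Answer 0 (score: 2)

Here is a solution that works as long as no value or key contains : and no key contains a space. I changed one of the keys in the example data to test a multi-letter key.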

* Example generated by -dataex-. For more info, type help dataex
clear
input byte Index str68 String_column
1 "A:blahblahblah. non_single_letter_key: whatever whatever. C: idkidk."
2 "A:blahblah. C: idkidk"                           
3 "B:whatever whatever. C: idkidk"                  
4 "B:whatever whatever. D: randomstuff"             
end

* Get the number of rows and loop over them
count 
forvalues row = 1/`r(N)' {
    
    *Get the raw string for this row
    local raw_string = String_column[`row']

    *Get the first key in the raw string (anything before the first :)
    gettoken nextkey raw_string : raw_string , parse(":")
    local raw_string = subinstr("`raw_string'",":","",1) //Remove the parse character ":"
    
    *Loop over the raw_string until it is empty
    while "`raw_string'" != "" {
        
        *Get the key from above or last loop
        local key "`nextkey'"
        
        *For the last pair in the string when raw_string only contains the last value
        if strpos("`raw_string'",":") == 0 {
            local value "`raw_string'"
            local raw_string ""
        }
        
        *Not yet last pair, parse out this value and next key
        else {
            *Get all content until next parse character
            gettoken value_and_nextkey raw_string : raw_string , parse(":")
            local raw_string = subinstr("`raw_string'",":","",1) //Remove the parse character ":"
            
            *Reverse that content and get the first word in the reversed result
            local v_and_nk_reversed = strreverse("`value_and_nextkey'")
            gettoken next_key_reversed value_reversed : v_and_nk_reversed , parse(" ")
            
            *Reverse the value for this pair and next key
            local value   = strreverse("`value_reversed'")
            local nextkey = strreverse("`next_key_reversed'")
        }
        
        *Test if a variable exist for this key, if not create it
        cap confirm variable `key'
        if _rc != 0 {
            gen `key' = "" 
        }
        
        *Add the value for this row in the variable for this key
        replace `key' = "`value'" if _n == `row'
    }
}
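
The loop stores each value exactly as it was parsed, so the new variables can still carry leading or trailing blanks and the trailing period from the raw string, which the desired output does not show. Below is a minimal sketch of one way to tidy them. It is not part of the solution above, and it assumes that a trailing period is only a separator and never part of a value. The small dataset is made up to mimic two of the variables the loop creates, so that the snippet runs on its own.

```stata
* Toy data mimicking two key variables as the loop above leaves them
* (hypothetical values, for illustration only)
clear
input byte Index str20 A str20 C
1 "blahblahblah. " " idkidk."
2 "blahblah. " " idkidk"
end

* Collect every variable except the identifier
* (after running the loop above, exclude String_column here as well)
ds Index, not
foreach v of varlist `r(varlist)' {
    * Remove leading and trailing blanks left over from parsing
    replace `v' = strtrim(`v')
    * Assumption: a trailing period is a separator, not data
    replace `v' = substr(`v', 1, strlen(`v') - 1) if substr(`v', -1, 1) == "."
}

list, noobs
```

When running this directly after the loop, skip the clear and input steps and use ds Index String_column, not so that the original column is left untouched.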