Extracting a string after the first occurrence of a string pattern

Time: 2017-05-30 18:44:12

Tags: r string

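I'm unable to extract all of the text after the first occurrence of the word ' PRODUCTS'. The text I'm working with is shown below and is stored in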

test$description

(there is more text, but R truncates the last part)

[1] "Hey guys! Been wanting to film a Get Ready With Me for a while, just to sit back and chill and chit chat with you all! It has been a MINUTE since I have done one of these so I hope you enjoy this first impressions get ready with me :D Love you guys! \n\nDONT FORGET TO HIT SUBSCRIBE! :D \n---------------------------------------------------------------------------------------------------------------\nFACE PRODUCTS : \n\nH2O Green Tea Matcha Facial Essence -  \nMILK Makeup Blur Stick - \nLoreal Total Coverage Foundation - \nGallany Concealer - \n\nBecca Soft Light Powder - \nPixie X Maryam NYC Glow and Bronze Palete - \nClinique Honey Cheek Pop Blush -\n---------------------------------------------------------------------------------------------------------------\nEYE PRODUCTS! \n\nColourpop Pressed Eyeshadows -  <truncated>

When I use: sub(".*PRODUCTS",'',test$description)

I get:

[1] "! \n\nColourpop Pressed Eyeshadows - \n\nTarte Cosmetics Fake Away Pencil - \n\nKat Vond D Trooper Eyeliner - \n\nNubounsom Dragon Li Lashes - Use code MANNYMUA to save 20% - \n---------------------------------------------------------------------------------------------------------------\nLIPS \n\nMorphe Brushes Liquid Lipstick in the shade Mood - USE CODE MANNYMUA to save money -\n--------------------------------------------------------------------------------------------\nBRUSHES AND TOOLS - \n\nMorphe Brushes - use code \"MANNYMUA\" all caps for 10% off everything! - \n- \nMorphe E2 Bronzer Brush - \nMorphe E4 Blush Brush - \nMorphe MB13 Nose Contour - \nMorphe M510 Highlight Brush - \n\nEYES:\nE2... <truncated>

So I only get everything after the second occurrence of ' PRODUCTS'.

When I use: sub(".*PRODUCTS ",'',test$description)

I get:

[1] ": \n\nH2O Green Tea Matcha Facial Essence -  \nMILK Makeup Blur Stick - \n\nLoreal Total Coverage Foundation - \nGallany Concealer - \n\nBecca Soft Light Powder - \n\nPixie X Maryam NYC Glow and Bronze Palete - \n\nClinique Honey Cheek Pop Blush - \n\n---------------------------------------------------------------------------------------------------------------\nEYE PRODUCTS! \n\nColourpop Pressed Eyeshadows - \n\nTarte Cosmetics Fake Away Pencil - \n\nKat Vond D Trooper Eyeliner - \n\nNubounsom Dragon Li Lashes - Use code MANNYMUA to save 20% - \n\n---------------------------------------------------------------------------------------------------------------\nLIPS \n\nMorphe Brushes Liquid Lipstick in the shade Mood - USE CODE MANNYMUA to save money... <truncated>

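I think the problem is the space between "PRODUCTS" and the colon in the first occurrence, and the lack of a space between "PRODUCTS" and the exclamation mark in the second occurrence. But I'm trying to tell R to just look for the string ' PRODUCTS'. How can I get it to ignore the spacing?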

1 answer:

Answer 0: (score: 2)

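You almost have it. Use sub(".*?PRODUCTS",'',test$description) instead.
Note the added ?, and that there is no space after PRODUCTS. By default, matching is "greedy": it matches as much as possible, so .*PRODUCTS runs all the way to the last copy of PRODUCTS. Adding ? turns off greedy matching for that quantifier, so the match stops at the first instance.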
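To make the difference concrete, here is a minimal sketch on a shortened, made-up stand-in for test$description (the variable name description and its contents are assumptions for illustration, not the asker's data). It compares the greedy and lazy patterns, and also shows a perl = TRUE variant, where the (?s) flag is needed so that . can match the newlines in the text.

```r
# Hypothetical, shortened stand-in for test$description (not the real data)
description <- "intro text\nFACE PRODUCTS : \nitem A\nEYE PRODUCTS! \nitem B"

# Greedy: .* takes as much as it can, so the match ends at the LAST "PRODUCTS"
greedy <- sub(".*PRODUCTS", "", description)

# Lazy: .*? takes as little as it can, so the match ends at the FIRST "PRODUCTS"
lazy <- sub(".*?PRODUCTS", "", description)

# Same lazy idea with the PCRE engine; (?s) lets "." match newlines there
lazy_perl <- sub("(?s).*?PRODUCTS", "", description, perl = TRUE)

print(greedy)
print(lazy)
print(identical(lazy, lazy_perl))
```

Because sub() replaces only the first match, and that match starts at the beginning of the string, the lazy pattern removes everything up to and including the first PRODUCTS and leaves the rest untouched, whatever spacing or punctuation follows it.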