Question

I want to transform a given input in my C program, for example:

foo_bar_something-like_this

into this:

thissomethingbarfoolike

Explanation:

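Every time I get a _, the text that follows it, up to but not including the next _ or - (or the end of the line), needs to go to the beginning, and the _ itself needs to be removed. Every time I get a -, the text that follows it, up to but not including the next _ or - (or the end of the line), needs to be appended to the end, and the - itself needs to be removed.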
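Another way to state the rule: every piece that follows a _ ends up in front, with later pieces further left. The text before the first delimiter stays in the middle. Every piece that follows a - is appended in its original order.

The sketch below builds the output in exactly that order without regular expressions. The function name `transform_segments` and the `MAX_SEGS` cap are illustrative assumptions. `out` must have room for `strlen(in) + 1` bytes.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEGS 256  /* assumed cap on delimiters per line (illustrative) */

/* One piece of text that followed a '_' or '-' delimiter. */
struct seg {
    const char *start;
    size_t len;
};

/* Writes the transformed form of `in` to `out`, which needs strlen(in) + 1 bytes.
 * Output order: '_' pieces (last one first), leading text, '-' pieces (in order). */
static void transform_segments(const char *in, char *out)
{
    struct seg front[MAX_SEGS], back[MAX_SEGS];
    size_t nfront = 0, nback = 0;
    size_t head = strcspn(in, "_-");      /* text before the first delimiter */
    const char *p = in + head;
    char *o = out;

    while (*p != '\0') {                  /* here *p is always '_' or '-' */
        char delim = *p++;
        struct seg s = { p, strcspn(p, "_-") };  /* text up to the next delimiter or end */
        if (delim == '_') {
            assert(nfront < MAX_SEGS);
            front[nfront++] = s;
        } else {
            assert(nback < MAX_SEGS);
            back[nback++] = s;
        }
        p += s.len;
    }
    while (nfront > 0) {                  /* the newest '_' piece goes furthest left */
        nfront--;
        memcpy(o, front[nfront].start, front[nfront].len);
        o += front[nfront].len;
    }
    memcpy(o, in, head);                  /* leading text stays in the middle */
    o += head;
    for (size_t i = 0; i < nback; i++) {  /* '-' pieces keep their input order */
        memcpy(o, back[i].start, back[i].len);
        o += back[i].len;
    }
    *o = '\0';
}
```

For the example above, `transform_segments("foo_bar_something-like_this", buf)` leaves `thissomethingbarfoolike` in `buf`. Each character is copied once, so the work is linear in the line length.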

If possible, I would like to use regular expressions for this. If there is a way to do it directly from standard input, that would be ideal.

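Note that it doesn't have to be done with a single regular expression; I can use some kind of loop. In that case I think I would first have to capture the data in a variable and then run my algorithm on it.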
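If a regular expression is wanted at all, a realistic job for it in C is splitting the line into tokens, with the reordering still done in a loop.

Here is a minimal sketch, assuming a POSIX system. `<regex.h>` (`regcomp`, `regexec`, `regfree`) is part of POSIX, not ISO/ANSI C. The extended regex `[_-]?[^_-]*` matches one token: an optional delimiter plus the text after it. The function name `transform_regex` is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Splits `in` into tokens with the POSIX ERE "[_-]?[^_-]*" (an optional
 * delimiter plus the text after it) and rebuilds the line in `out`,
 * which needs at least strlen(in) + 1 bytes. Returns 0 on success,
 * -1 if the pattern fails to compile. */
static int transform_regex(const char *in, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m;
    const char *p = in;
    size_t len = 0;                        /* current length of out */

    assert(outsize > strlen(in));
    if (regcomp(&re, "[_-]?[^_-]*", REG_EXTENDED) != 0)
        return -1;
    out[0] = '\0';
    /* The pattern also matches the empty string, so stop at the end of the
     * input or as soon as a match consumes nothing. */
    while (*p != '\0' && regexec(&re, p, 1, &m, 0) == 0 && m.rm_eo > m.rm_so) {
        const char *tok = p + m.rm_so;
        size_t toklen = (size_t)(m.rm_eo - m.rm_so);
        if (*tok == '_') {                 /* "_text": text goes to the front */
            size_t n = toklen - 1;
            memmove(out + n, out, len + 1);    /* shift current text and '\0' right */
            memcpy(out, tok + 1, n);
            len += n;
        } else {                           /* "-text" or leading text: goes to the end */
            size_t skip = (*tok == '-') ? 1 : 0;
            memcpy(out + len, tok + skip, toklen - skip);
            len += toklen - skip;
            out[len] = '\0';
        }
        p += m.rm_eo;
    }
    regfree(&re);
    return 0;
}
```

The regex only finds the boundaries here. The decision about where each piece goes is still ordinary C, which is why a plain character loop is usually simpler.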

I have to do this for every line of the input, and each line ends with \n.
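Reading directly from standard input works without storing whole words. Each character is placed as soon as it arrives, either appended at the end or inserted at a moving "front" position, and the buffer is printed at every \n.

The sketch below assumes a fixed `MAX_LINE` limit (an illustrative value) and silently drops characters beyond it. The name `transform_stream` is hypothetical. A program would call it as `transform_stream(stdin, stdout)`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024  /* assumed per-line limit; extra characters are dropped */

/* Reads `in` character by character and writes one transformed line to `out`
 * for every input line (a final line without '\n' is handled too). */
static void transform_stream(FILE *in, FILE *out)
{
    char buf[MAX_LINE];
    size_t len = 0;      /* characters currently in buf */
    size_t front = 0;    /* insertion point for the current '_' piece */
    int to_front = 0;    /* 1 after '_', 0 at line start or after '-' */
    int pending = 0;     /* 1 if the current line has unprinted input */
    int c;               /* int, so EOF can be told apart from a character */

    assert(in != NULL && out != NULL);
    buf[0] = '\0';
    while ((c = getc(in)) != EOF) {
        if (c == '\n') {                       /* end of line: print and reset */
            fprintf(out, "%s\n", buf);
            len = front = 0;
            to_front = pending = 0;
            buf[0] = '\0';
            continue;
        }
        pending = 1;
        if (c == '_') { to_front = 1; front = 0; continue; }
        if (c == '-') { to_front = 0; continue; }
        if (len + 2 > MAX_LINE)                /* no room for c plus '\0' */
            continue;
        if (to_front) {                        /* insert at front, keeping the piece in order */
            memmove(buf + front + 1, buf + front, len - front + 1);
            buf[front++] = (char)c;
        } else {                               /* append at the end */
            buf[len] = (char)c;
            buf[len + 1] = '\0';
        }
        len++;
    }
    if (pending)
        fprintf(out, "%s\n", buf);
}
```

Inserting at the front costs one `memmove` per character, which is fine for short lines but quadratic in the worst case.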

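Edit: I have already written code for this without anything regex-related; I should have posted it in the first place, apologies. I know scanf shouldn't be used because of buffer overflows, but the strings are validated before they are used in the program. The code is as follows: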

#include <stdio.h>
#include <stdlib.h>
#define MAX_LENGTH 100001 //A fixed maximum amount of characters per line
int main(){
  char c;
  /*
  *home: 1 (append to the start), 0 (append to the end)
  *str: array of words appended to the beginning
  *strlen: length of str
  *line: string of words appended to the end
  *linelen: length of line
  *word: word between a combination of symbols - and _
  *wordlen: length of the actual word
  */
  int home,strlen,linelen,wordlen;
  char **str,*line,*word;
  str=(char**)malloc(MAX_LENGTH*sizeof(char*));
  while(scanf("%c",&c)==1){ //read the first character of the next line
    line=(char*)malloc(MAX_LENGTH);
    word=(char*)malloc(MAX_LENGTH);
    line[0]=word[0]='\0';
    home=strlen=linelen=wordlen=0;
    while(c!='\n'){
      if(c=='-'){ //text after '-' goes to the end: put word in str and restart word
        home=0;
        str[strlen++]=word;
        word=(char*)malloc(MAX_LENGTH);
        wordlen=0;
        word[0]='\0';
      }else if(c=='_'){ //text after '_' goes to the start: put word in str and restart word
        home=1;
        str[strlen++]=word;
        word=(char*)malloc(MAX_LENGTH);
        wordlen=0;
        word[0]='\0';
      }else if(home){ //append c to word
        word[wordlen++]=c;
        word[wordlen]='\0';
      }else{ //append c to line
        line[linelen++]=c;
        line[linelen]='\0';
      }
      if(scanf("%c",&c)!=1) break; //stop at EOF even if the last line has no '\n'
    }
    printf("%s",word); //print the last word
    free(word);
    while(strlen--){ //print each word stored in the array, most recent first
      printf("%s",str[strlen]);
      free(str[strlen]);
    }
    printf("%s\n",line); //print the text appended to the end
    free(line);
  }
  free(str);
  return 0;
}

Answer 1

I don't think a regular expression can do what you want, so I wrote a simple state-machine solution in C.

//
//Description: This program takes a string of character input, and parses it
//using underscore and hyphen as cues to either send data to
//the beginning or end of the output. Each input line is handled separately.
//
//Date: 11/18/2017
//
//Author: Elizabeth Harasymiw
//

#include <stdio.h>
#include <string.h>
#define MAX_SIZE 100 //No bounds checking: each line must fit in MAX_SIZE-1 characters

typedef enum{ AppendEnd, AppendBegin } State; //Used to track either writing to beginning or end of output

//Finish adding the last Word to Output, print the line, then reset for the next line
static void FinishLine(State *state, char *Buffer, char *Word){
        switch(*state){
                case AppendEnd:
                        strcat(Buffer, Word); //Add Word to end of Output
                        break;
                case AppendBegin:
                        strcat(Word, Buffer); //Add Output to end of Word
                        strcpy(Buffer, Word); //Move Output from Word back to Output
                        break;
        }
        printf("%s\n", Buffer);
        memset(Buffer, 0, MAX_SIZE);  //Clear Output
        memset(Word, 0, MAX_SIZE);    //Clear Word (appending below relies on zero bytes)
        *state = AppendEnd;
}

int main(void){
        int ch;                    //Character currently being looked at (int, so EOF is detectable)
        int pending=0;             //Nonzero while the current line has unprinted input
        State state=AppendEnd;     //creates the State
        char Buffer[MAX_SIZE]={0}; //Current Output
        char Word[MAX_SIZE]={0};   //Pending data to the Buffer
        char *c;                   //Used to index and clear Word
        while((ch = getc(stdin)) != EOF){
                if(ch=='\n'){      //End of line: print it and start over
                        FinishLine(&state, Buffer, Word);
                        pending=0;
                        continue;
                }
                pending=1;
                switch(state){
                        case AppendEnd:
                                if( ch == '-' )
                                        break;
                                if( ch == '_'){
                                        state = AppendBegin;     //Change State
                                        strcat(Buffer, Word);    //Add Word to end of Output
                                        for(c=Word;*c;c++)*c=0;  //Clear Word
                                        break;
                                }
                                {
                                        int position = -1;
                                        while(Word[++position]);     //Find end of Word
                                        Word[position] = (char)ch;   //Add Character to end of Word
                                }
                                break;
                        case AppendBegin:
                                if( ch == '-' ){
                                        state = AppendEnd;       //Change State
                                        strcat(Word, Buffer);    //Add Output to end of Word
                                        strcpy(Buffer, Word);    //Move Output from Word back to Output
                                        for(c=Word;*c;c++)*c=0;  //Clear Word
                                        break;
                                }
                                if( ch == '_'){
                                        strcat(Word, Buffer);    //Add Output to end of Word
                                        strcpy(Buffer, Word);    //Move Output from Word back to Output
                                        for(c=Word;*c;c++)*c=0;  //Clear Word
                                        break;
                                }
                                {
                                        int position = -1;
                                        while(Word[++position]);     //Find end of Word
                                        Word[position] = (char)ch;   //Add Character to end of Word
                                }
                                break;
                }
        }
        if(pending)        //Last line had no trailing '\n'
                FinishLine(&state, Buffer, Word);
        return 0;
}

Answer 2

This can be done with regular expressions in a loop, assuming you are not strictly limited to ANSI C. The following uses PCRE.

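(Note: this answer intentionally does not show C code. It is only meant to guide the OP by showing a possible technique using regular expressions, since it isn't obvious how to do it.)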

Approach A

Use two different regular expressions.

Part 1/2 (Demo)

Regex: ([^_\n]*)_([^_\n]*)(_.*)?
Replacement: $2--$1$3

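This moves the text after the first underscore (up to the next underscore) to the beginning, followed by --, and removes that underscore. Repeat this replacement in a loop until there are no more matches.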

For your example, this results in the following string:

this--something-like--bar--foo
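Part 1 only uses features that POSIX extended regular expressions also support, so the replacement loop can be tried in C with `<regex.h>` instead of PCRE.

Here is a minimal sketch, assuming a POSIX system. `part1_loop` and `put` are hypothetical names. Because `<regex.h>` has no replace function, the substitution `$2--$1$3` is assembled by hand from the `regmatch_t` offsets; an unmatched optional group is reported with `rm_so == -1`.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Appends s[so, eo) to dst at offset n; an unmatched group (so == -1) adds nothing. */
static size_t put(char *dst, size_t n, const char *s, regoff_t so, regoff_t eo)
{
    if (so < 0)
        return n;
    memcpy(dst + n, s + so, (size_t)(eo - so));
    return n + (size_t)(eo - so);
}

/* Repeats  ([^_]*)_([^_]*)(_.*)?  ->  $2--$1$3  on `s` (capacity `size`) until
 * nothing matches. Each pass grows the string by one byte.
 * Returns 0 on success, -1 on regex/memory errors or if `size` is too small. */
static int part1_loop(char *s, size_t size)
{
    regex_t re;
    regmatch_t m[4];                       /* whole match plus groups 1 to 3 */
    char *tmp;
    int rc = 0;

    assert(s != NULL);
    if (regcomp(&re, "([^_]*)_([^_]*)(_.*)?", REG_EXTENDED) != 0)
        return -1;
    tmp = malloc(size);
    if (tmp == NULL) {
        regfree(&re);
        return -1;
    }
    while (regexec(&re, s, 4, m, 0) == 0) {
        size_t len = strlen(s), n = 0;
        if (len + 2 > size) {              /* result needs len + 1 chars plus '\0' */
            rc = -1;
            break;
        }
        n = put(tmp, n, s, 0, m[0].rm_so);               /* text before the match */
        n = put(tmp, n, s, m[2].rm_so, m[2].rm_eo);      /* $2 */
        memcpy(tmp + n, "--", 2);
        n += 2;
        n = put(tmp, n, s, m[1].rm_so, m[1].rm_eo);      /* $1 */
        n = put(tmp, n, s, m[3].rm_so, m[3].rm_eo);      /* $3 (may be unset) */
        n = put(tmp, n, s, m[0].rm_eo, (regoff_t)len);   /* text after the match */
        tmp[n] = '\0';
        memcpy(s, tmp, n + 1);
    }
    free(tmp);
    regfree(&re);
    return rc;
}
```

Starting from `foo_bar_something-like_this`, three passes leave `this--something-like--bar--foo`, the intermediate string shown above.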

Part 2/2 (Demo):

Regex: (.*)(?<!-)-(?!-)(\w+)(.*)
Replacement: $1$3--$2

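This moves the text after a single hyphen to the end, with -- in front of it, and removes the hyphen. Repeat this replacement in a loop until there are no more matches.

Because (.*) is greedy, each pass handles the rightmost remaining single hyphen. As a result, several hyphen pieces inside the same underscore-delimited part come out reversed. For example, a-b-c becomes a--c--b, which gives acb rather than abc.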

For your example, this results in the following string:

this--something--bar--foo--like

Remove the hyphens from the string to get the result.
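Part 2 needs lookbehind and lookahead, which POSIX `<regex.h>` does not support. As a stand-in, the sketch below hand-codes the same substitution:

- Like the greedy (.*), it looks for the rightmost single hyphen that is followed by a word character. This sketch assumes \w means ASCII letters, digits and _, which is PCRE's default.
- It moves that word to the end behind -- and repeats.
- A final pass strips every hyphen.

The names `part2_once`, `part2_finish` and `is_word` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed meaning of \w: ASCII letters, digits and underscore. */
static int is_word(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* One pass of  (.*)(?<!-)-(?!-)(\w+)(.*)  ->  $1$3--$2 , done by hand.
 * Returns 1 if something moved, 0 if nothing matched, -1 on error
 * (each pass grows the string by one byte, so `size` must allow that). */
static int part2_once(char *s, size_t size)
{
    size_t len = strlen(s);
    for (size_t i = len; i-- > 0; ) {      /* rightmost first, like greedy (.*) */
        if (s[i] != '-' || (i > 0 && s[i - 1] == '-') || !is_word(s[i + 1]))
            continue;
        size_t j = i + 1;
        while (is_word(s[j]))
            j++;                           /* $2 = s[i+1, j), $3 = s[j, len) */
        size_t wlen = j - (i + 1), rest = len - j;
        if (len + 2 > size)
            return -1;
        char *word = malloc(wlen);
        if (word == NULL)
            return -1;
        memcpy(word, s + i + 1, wlen);
        memmove(s + i, s + j, rest);       /* $3 slides left over "-$2" */
        memcpy(s + i + rest, "--", 2);
        memcpy(s + i + rest + 2, word, wlen);
        s[i + rest + 2 + wlen] = '\0';
        free(word);
        return 1;
    }
    return 0;
}

/* Runs part2_once until nothing matches, then deletes every '-'. */
static int part2_finish(char *s, size_t size)
{
    int rc;
    char *dst = s;

    assert(s != NULL);
    while ((rc = part2_once(s, size)) == 1)
        ;
    if (rc < 0)
        return -1;
    for (const char *src = s; *src != '\0'; src++)
        if (*src != '-')
            *dst++ = *src;
    *dst = '\0';
    return 0;
}
```

Starting from `this--something-like--bar--foo`, one pass of `part2_once` gives `this--something--bar--foo--like`, and `part2_finish` ends with `thissomethingbarfoolike`.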

Note that the first regex can be simplified to the following and still work:

([^_]*)_([^_]*)(_.*)?

The \n is only needed so that the demo can show the intermediate loop results.

Here is why -- is used as the new separator:

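A separator is needed so that the Part 2 regex can find the correct end of the hyphen-prefixed text;
An underscore can't be used, because it would interfere with the Part 1 regex and cause an infinite loop;
A hyphen can't be used, because it would make the Part 2 regex match unrelated text;
Any single-character separator that never appears in the input would work and would allow a simpler Part 2 regex, but -- is the one separator that permits any and all characters in the input*;
\n is actually the perfect* separator, but it can't be used in this answer because the demo could then not show the intermediate results. (Hint: it should be the separator you actually use.)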

Approach B

Combine the two regexes.

(Demo)

Regex: ([^_\n]*)_([^_\n]*)(_.*)?|(.*)(?<!-)-(?!-)(\w+)(.*)
Replacement: $2--$1$3$4$6--$5

For your example, this results in the following string:

----this------something--bar--foo----like

As before, remove all hyphens from the string to get the result.

As before, the regex can be simplified to the following and still work:

([^_]*)_([^_]*)(_.*)?|(.*)(?<!-)-(?!-)(\w+)(.*)

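This combined regex works because capture groups 1, 2 and 3 are mutually exclusive with groups 4, 5 and 6. It does, however, have the side effect of producing extra hyphens.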

Caveat:

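*Using -- as the separator fails if the input contains consecutive hyphens. Every other "good" separator has a similar failing edge case. Only \n is guaranteed not to be present in the input, and is therefore fail-safe.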

Scan and swap string values using regex in ANSI C
