Loop through double quotes but ignore double quotes inside single quotes

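Time: 2013-10-22 21:46:00

Tags: c# iteration quotes double-quotes

I think this is a logic problem. I'm coding in C#, but general pseudocode solutions are welcome.

I have a text file that contains, for example, this text: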

blah "hello john"
blah 'the code is "flower" '
blah "good night"

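I want to loop through the double-quoted strings and do something with each of them, but I want to ignore double quotes that appear inside single quotes. This is how I get the positions of the opening and closing double quotes (string data holds the contents of the text file):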

C#

// Start searching from beginning
int quotestart = 0, quoteend = 0;

while (data.IndexOf('"', quotestart) != -1)
{
  // Get opening double quote
  quotestart = data.IndexOf('"', quotestart);
  // Get ending double quote
  quoteend = data.IndexOf('"', quotestart + 1);

  string sub = data.Substring(quotestart + 1, quoteend - quotestart - 1);
  Console.WriteLine(sub);

  // Set the start position for the next round
  quotestart = quoteend + 1;
}

With my code, the output is:

hello john
flower
good night

Because "flower" is inside single quotes, I want my output to be:

hello john
good night

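Edit

I'm currently working on an approach where I first overwrite all the data between single quotes with a filler character, for example 'A'. That way, when I loop through the double quotes, any data between single quotes is ignored. I'm not sure whether this is the right approach.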
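
A minimal sketch of that masking idea, assuming single quotes are always balanced and never occur inside a double-quoted string (an apostrophe as in "don't" would break it). The names QuoteMasking and MaskSingleQuoted are hypothetical and do not come from the question:

```csharp
using System;
using System.Text;

static class QuoteMasking
{
    // Overwrites every character between a pair of single quotes with `filler`.
    // The string length is unchanged, so indexes found in the masked string
    // are still valid positions in the original data.
    public static string MaskSingleQuoted(string data, char filler = 'A')
    {
        var masked = new StringBuilder(data);
        int start = data.IndexOf('\'');
        while (start != -1)
        {
            int end = data.IndexOf('\'', start + 1);
            if (end == -1) break; // unmatched single quote: leave the rest alone
            for (int i = start + 1; i < end; i++) masked[i] = filler;
            start = data.IndexOf('\'', end + 1);
        }
        return masked.ToString();
    }

    static void Main()
    {
        string data = "blah 'the code is \"flower\" '";
        Console.WriteLine(MaskSingleQuoted(data));
    }
}
```

Searching the masked string for double quotes then skips "flower", because its quotes have been overwritten, while Substring can still be called on the original data with the same positions.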

4 Answers:

Answer 0: (score: 7)

  

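"I tried googling finite state machines, but without formal computer engineering training I must admit I got a bit lost. Do you have any other pointers?"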

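An FSM is one of the simplest forms of computer. The idea is that you have a finite amount of "state" information and a steady stream of inputs. Each input causes the state to change in a predictable way, based solely on the current state and the current input, and causes a predictable output to happen.

So suppose your inputs are single characters and the output is either a single character or the "null" character. Here is an FSM that does what you want: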

  • The states are OUTSIDE, INSIDEDOUBLE, and INSIDESINGLE.
  • The inputs are ", ', and x. (Without loss of generality, let x stand for any other character.)

We have three states and three inputs, so there are nine possible combinations.

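  • If we are OUTSIDE and get x, stay OUTSIDE and emit null
  • If we are OUTSIDE and get ", go to INSIDEDOUBLE and emit null
  • If we are OUTSIDE and get ', go to INSIDESINGLE and emit null
  • If we are INSIDEDOUBLE and get x, stay INSIDEDOUBLE and emit x
  • If we are INSIDEDOUBLE and get ", go to OUTSIDE and emit null
  • If we are INSIDEDOUBLE and get ', stay INSIDEDOUBLE and emit '
  • If we are INSIDESINGLE and get x, stay INSIDESINGLE and emit null
  • If we are INSIDESINGLE and get ", stay INSIDESINGLE and emit null
  • If we are INSIDESINGLE and get ', go to OUTSIDE and emit null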

The only thing left is to say that the start state is OUTSIDE.

So let's suppose the input is 1 " 2 " 3 ' 4 " 5 " ' 6. The states and outputs are:

  • OUTSIDE gets 1, emits null, stays OUTSIDE
  • OUTSIDE gets ", emits null, goes to INSIDEDOUBLE
  • INSIDEDOUBLE gets 2, emits 2, stays INSIDEDOUBLE
  • INSIDEDOUBLE gets ", emits null, goes to OUTSIDE
  • OUTSIDE gets 3, emits null, stays OUTSIDE
  • OUTSIDE gets ', emits null, goes to INSIDESINGLE

... fill in the rest yourself.
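
If you would rather let the computer print the trace, here is a rough sketch of the nine rules as a single step function. The names QuoteState, QuoteTrace and Step are made up for illustration, and it assumes the spaces in the sample input are only separators, so they are left out:

```csharp
using System;

enum QuoteState { Outside, InsideDouble, InsideSingle }

static class QuoteTrace
{
    // One step of the machine: returns the next state and sets `emit`
    // to the character to output, or to null when nothing is emitted.
    public static QuoteState Step(QuoteState state, char c, out char? emit)
    {
        emit = null;
        switch (state)
        {
            case QuoteState.Outside:
                if (c == '"') return QuoteState.InsideDouble;
                if (c == '\'') return QuoteState.InsideSingle;
                return QuoteState.Outside;
            case QuoteState.InsideDouble:
                if (c == '"') return QuoteState.Outside;
                emit = c; // both x and ' are emitted inside double quotes
                return QuoteState.InsideDouble;
            default: // InsideSingle: nothing is ever emitted
                return c == '\'' ? QuoteState.Outside : QuoteState.InsideSingle;
        }
    }

    static void Main()
    {
        QuoteState state = QuoteState.Outside; // the start state
        foreach (char c in "1\"2\"3'4\"5\"'6")
        {
            QuoteState next = Step(state, c, out char? emit);
            string shown = emit.HasValue ? emit.Value.ToString() : "null";
            Console.WriteLine($"{state} gets {c}, emits {shown}, next state {next}");
            state = next;
        }
    }
}
```

Each printed line corresponds to one bullet of the trace above, with the state names spelled as enum members.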

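Is that enough of a sketch for you to write the code?

Answer 1: (score: 5)

Nice solution; using a switch statement is the traditional way to do this for small FSMs, but it becomes unwieldy when the number of states and inputs grows large and complicated. Here is an alternative that is easier to extend: a table-driven solution. That is, put the facts about the transitions and actions into arrays, and then the FSM is nothing more than a series of array lookups: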

// States
const int Outside = 0;
const int InDouble = 1;
const int InSingle = 2;

// Inputs
const int Other = 0;
const int DoubleQuote = 1;
const int SingleQuote = 2;

static readonly int[,] stateTransitions =
{   /*               Other     DoubleQ   SingleQ */
    /* Outside */  { Outside,  InDouble, InSingle },
    /* InDouble */ { InDouble, Outside,  InDouble },
    /* InSingle */ { InSingle, InSingle, Outside }
};

// Do we emit the character or ignore it?
static readonly bool[,] actions =
{   /*              Other   DoubleQ SingleQ */
    /* Outside */ { false,  false,  false },
    /* InDouble */{ true,   false,  true  },
    /* InSingle */{ false,  false,  false }
};

static int Classify(char c)
{
    switch (c)
    {
        case '\'': return SingleQuote;
        case '\"': return DoubleQuote;
        default: return Other;
    }
}

static IEnumerable<char> FSM(IEnumerable<char> inputs)
{
    int state = Outside;
    foreach (char input in inputs)
    {
        int kind = Classify(input);
        if (actions[state, kind]) 
            yield return input;
        state = stateTransitions[state, kind];
    }
}

Now we can get the result with:

string.Join("", FSM(@"1""2'3""4""5'6""7'8""9""A'B"))

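Answer 2: (score: 2)

Many thanks to Eric Lippert for the logic behind this solution. I'm providing my C# solution below in case anyone needs it. I've left in some unnecessary reassignments for clarity.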

string state = "outside";

for (int i = 0; i < data.Length; i++)
{
    char c = data[i];
    switch (state)
    {
        case "outside":
            switch (c)
            {
                case '\'':
                    state = "insidesingle";
                    break;
                case '"':
                    state = "insidedouble";
                    break;
                default:
                    state = "outside";
                    break;
            }
            break;

        case "insidedouble":
            switch (c)
            {
                case '\'':
                    state = "insidedouble";
                    Console.Write(c);
                    break;
                case '"':
                    state = "outside";
                    break;
                default:
                    state = "insidedouble";
                    Console.Write(c);
                    break;
            }
            break;  

        case "insidesingle":
            switch (c)
            {
                case '\'':
                    state = "outside";
                    break;
                case '"':
                    state = "insidesingle";
                    break;
                default:
                    state = "insidesingle";
                    break;
            }
            break;
    }
}
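
Note that the loop above writes the emitted characters one after another with Console.Write, so for the sample in the question the two strings run together on one line. Here is a sketch of a variant that collects each double-quoted string separately. QuotedStrings and Extract are hypothetical names, and it assumes an unterminated double-quoted string at the end of the input should simply be dropped:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuotedStrings
{
    enum State { Outside, InsideDouble, InsideSingle }

    // Returns the contents of every double-quoted string that is not
    // itself inside single quotes.
    public static List<string> Extract(string data)
    {
        var results = new List<string>();
        var current = new StringBuilder();
        var state = State.Outside;
        foreach (char c in data)
        {
            switch (state)
            {
                case State.Outside:
                    if (c == '"') state = State.InsideDouble;
                    else if (c == '\'') state = State.InsideSingle;
                    break;
                case State.InsideDouble:
                    if (c == '"')
                    {
                        // Closing quote: the collected string is complete.
                        results.Add(current.ToString());
                        current.Clear();
                        state = State.Outside;
                    }
                    else current.Append(c);
                    break;
                case State.InsideSingle:
                    if (c == '\'') state = State.Outside;
                    break;
            }
        }
        return results;
    }

    static void Main()
    {
        string data = "blah \"hello john\"\nblah 'the code is \"flower\" '\nblah \"good night\"";
        foreach (string s in Extract(data)) Console.WriteLine(s);
    }
}
```

For the question's sample text this prints hello john and good night on separate lines.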

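Answer 3: (score: 2)

Just for fun, I decided to solve this problem using a very lightweight FSM library called stateless.

Here is how the code would look if you were to use this library.

Like Eric's solution, the code below can easily be changed to accommodate new requirements.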

void Main()
{
    Console.WriteLine(string.Join("", GetCharacters(@"1""2'3""4""5'6""7'8""9""A'B")));  
}

public enum CharacterType
{
    Other,
    SingleQuote,
    DoubleQuote
}

public enum State
{
    OutsideQuote,
    InsideSingleQuote,
    InsideDoubleQuote
}

public IEnumerable<char> GetCharacters(string input)
{
    //Initial state of the machine is OutsideQuote.
    var sm = new StateMachine<State, CharacterType>(State.OutsideQuote);

    //Below, we configure state transitions.
    //For a given state, we configure how CharacterType 
    //transitions state machine to a new state.

    sm.Configure(State.OutsideQuote)
        .Ignore(CharacterType.Other)        
        //If you are outside quote and you receive a double quote, 
        //state transitions to InsideDoubleQuote.
        .Permit(CharacterType.DoubleQuote, State.InsideDoubleQuote)
        //If you are outside quote and you receive a single quote,
        //state transitions to InsideSingleQuote.
        //Same logic applies for other state transitions below.
        .Permit(CharacterType.SingleQuote, State.InsideSingleQuote);

    sm.Configure(State.InsideDoubleQuote)
        .Ignore(CharacterType.Other)
        .Ignore(CharacterType.SingleQuote)
        .Permit(CharacterType.DoubleQuote, State.OutsideQuote);

    sm.Configure(State.InsideSingleQuote)
        .Ignore(CharacterType.Other)
        .Ignore(CharacterType.DoubleQuote)
        .Permit(CharacterType.SingleQuote, State.OutsideQuote);

    foreach (var character in input)
    {
        var characterType = GetCharacterType(character);
        sm.Fire(characterType);
        if(sm.IsInState(State.InsideDoubleQuote) && characterType != CharacterType.DoubleQuote)
            yield return character;
    }       

}

public CharacterType GetCharacterType(char input)
{
    switch (input)
    {
        case '\'': return CharacterType.SingleQuote;
        case '\"': return CharacterType.DoubleQuote;
        default: return CharacterType.Other;
    }
}