Extract colors and sizes from a string

Date: 2015-08-12 11:20:00

Tags: c#

Given the following product names, my task is to extract all colors and sizes.

Example: Nike Relay Women's Running Capris - **Black**, **L/XS**

Color = Black
Size = [XS,L]

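What is the best way to do this? My idea was to keep a dictionary of all colors and sizes and then simply match against it.

But there must be a better and more maintainable way. The biggest problem I see is that there are many different combinations:

  1. Nautica S Blue Bone Woven Pajama Pants
  2. Nike Relay Women's Running Capris - Black, XS
  3. Nautica Mens J-Class Pajama Pants-Small, NAVY
  4. Nautica J-Class Woven Pajama Pant L, Maritime Navy
  5. Nike Legend Tank - Womens - Black/Black
  6. Nike 3PK DF Cushion No Show Tab Socks - Womens - Black/White/Black
  7. Stance Casual Socks - Men's Mahalo, L/XL
  8. Nautica Wrinkle Resistant Dress Pant 30x30, Grey
  9. Nautica Wrinkle Resistant Dress Pant 36x30, Black
  10. Nautica Wrinkle Resistant Dress Pant 33x32, Black
  11. RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, L
  12. RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, M
  13. RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, S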

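2 answers:

Answer 0 (score: 3)

This is long-winded, but it serves the purpose. The whole idea is that you need a List/Collection of the available colors and sizes, and then you iterate over them and check each one against the product name.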

enum ColorBase {
    [Description("Black")] // requires: using System.ComponentModel;
    Black,
    [Description("Blue")]
    Blue,
    [Description("White")]
    White,
    [Description("Grey")]
    Grey,
    [Description("Magenta")]
    Magenta,
    [Description("Pale")]
    Pale,
    [Description("Maritime Navy")]
    MaritimeNavy,
    [Description("Navy")]
    Navy,
    [Description("Bluestone")]
    Bluestone,
}

enum SizeBase
{
    [Description("XL")]
    XL,
    [Description("XXL")]
    XXL,
    [Description("L")]
    L,
    [Description("M")]
    M,
    [Description("S")]
    S,
    [Description("XS")]
    XS,
    [Description("30X30")]
    S30X30,
    [Description("36X30")]
    S36X30,
    [Description("33X32")]
    S33X32
}

A helper method that uses System.Reflection to return the Description declared on the enum values above:

 public static string GetEnumDescription(Enum value)
    {
        FieldInfo fi = value.GetType().GetField(value.ToString());

        DescriptionAttribute[] attributes =
            (DescriptionAttribute[])fi.GetCustomAttributes(
            typeof(DescriptionAttribute),
            false);

        if (attributes != null &&
            attributes.Length > 0)
            return attributes[0].Description;
        else
            return value.ToString();
    }

Here is how they are used:

 // requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
 static void Main(string[] args)
    {
        List<string> capries = new List<string>{"Nautica S Blue Bone Woven Pajama Pants",
                                                "Nike Relay Women's Running Capris - Black, XS",
                                                "Nautica Mens J-Class Pajama Pants-Small, NAVY",
                                                "Nautica J-Class Woven Pajama Pant L, Maritime Navy",
                                                "Nike Legend Tank - Womens - Black/Black",
                                                "Nike 3PK DF Cushion No Show Tab Socks - Womens - Black/White/Black",
                                                "Stance Casual Socks - Men's Mahalo, L/XL",
                                                "Nautica Wrinkle Resistant Dress Pant 30x30, Grey",
                                                "Nautica Wrinkle Resistant Dress Pant 36x30, Black",
                                                "Nautica Wrinkle Resistant Dress Pant 33x32, Black",
                                                "RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, L",
                                                "RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, M",
                                                "RVCA VA Flipped Box Slim T-Shirt - Short-Sleeve - Men's Bluestone, S",
                                                };

        Console.WriteLine("Available Parameters");

        foreach (var caprie in capries)
        {
            // colors and sizes are collected per product, so one product never hides another's values
            List<string> colors = FindMatches(caprie, typeof(ColorBase));
            List<string> sizes = FindMatches(caprie, typeof(SizeBase));

            Console.Write("Color : {0}", string.Join("/", colors));
            if (sizes.Count > 0)
                Console.Write(" ,Size : {0}", string.Join("/", sizes));
            Console.WriteLine();
        }
    }

    // Returns the descriptions of all values of the given enum that occur in the text.
    static List<string> FindMatches(string text, Type enumType)
    {
        List<string> found = new List<string>();
        foreach (Enum value in Enum.GetValues(enumType))
        {
            string item = Program.GetEnumDescription(value);
            // skip "Navy" when "Maritime Navy" (declared earlier in the enum) was already found
            if (ContainsWholeWord(text, item) && !found.Exists(f => ContainsWholeWord(f, item)))
                found.Add(item);
        }
        return found;
    }

    // WORD level precision, case-insensitive: "S" is not found inside "Socks" or "Men's",
    // "Blue" is not found inside "Bluestone", and "30X30" still matches "30x30".
    static bool ContainsWholeWord(string text, string word)
    {
        string pattern = @"(?<![\w'])" + Regex.Escape(word) + @"(?![\w'])";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }

Answer 1 (score: 2)

I would build the regular expressions in layers. I have built a system like this that works well, although it was used for log parsing.

//basic definitions (apply them with RegexOptions.IgnoreCase):
String colorsRegex = "(?:black|red|blue|orange|navy|cyan|white)";
String sizesRegex = "(?:small|large|medium)";
String sizesShortRegex = "(?:s|m|l|xl|xxl|xxxl)";

// some more complex definitions
// always start the array with the most complex regex, so that as much is captured as possible ("blue-green" instead of just "blue")
// the '-' inside the character class is escaped; an unescaped "[/- ]" is an invalid range
String[] colorFinders = {colorsRegex + "(?:[/\\- ]+" + colorsRegex + ")+", colorsRegex};
String[] sizesFinders = {sizesRegex + "(?:[/\\- ]+" + sizesRegex + ")+", sizesShortRegex + "(?:[/\\- ]+" + sizesShortRegex + ")+", sizesRegex};

// match the string for each complex definition
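
The "match the string for each complex definition" step is only a comment in the snippet above. Here is a minimal sketch of one way it could be done, assuming the finders are tried in order and the first one that matches wins. The names `LayeredFinder`, `FindFirst` and `ColorFinders`, the `\b` word-boundary wrapping and the choice to return an empty string on failure are illustrative assumptions, not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LayeredFinder
{
    // Hypothetical helper: tries each pattern in order (most specific first)
    // and returns the text of the first match, or an empty string if none match.
    public static string FindFirst(string input, string[] finders)
    {
        foreach (string finder in finders)
        {
            // \b on both sides keeps "blue" from matching inside "bluestone".
            Match m = Regex.Match(input, @"\b(?:" + finder + @")\b", RegexOptions.IgnoreCase);
            if (m.Success)
                return m.Value;
        }
        return string.Empty;
    }

    // Illustrative pattern set: compound colors ("Black/White") first, single colors last.
    public static readonly string[] ColorFinders = BuildColorFinders();

    private static string[] BuildColorFinders()
    {
        string colors = "(?:black|red|blue|orange|navy|cyan|white)";
        return new[] { colors + @"(?:[/\- ]+" + colors + ")+", colors };
    }
}
```

Calling `LayeredFinder.FindFirst(name, LayeredFinder.ColorFinders)` on a product name returns the whole compound when there is one, and falls back to a single color otherwise. A size lookup would follow the same shape with its own finder array.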

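For every line that this system fails to match (or matches incorrectly), build a dedicated "finder". Repeat until all the data is matched.

Watch out for invalid cross-matches. Log unmatched lines in both test and production environments. Remember to look out for partial matches, and exclude any part of the string that could confuse your algorithm (imagine a company named "Blue Moon", which would always match).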
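
One way to act on the last point is to blank out known brand phrases before matching, and to collect the lines that still yield nothing so they can be logged. This is a minimal sketch only: the `BrandFilter` class, the `IgnoredPhrases` list and the single color pattern are hypothetical and would need to be replaced with real data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BrandFilter
{
    // Hypothetical list of phrases that contain a color word but are not colors.
    public static readonly string[] IgnoredPhrases = { "Blue Moon" };

    private static readonly Regex Color =
        new Regex(@"\b(?:black|red|blue|orange|navy|cyan|white)\b", RegexOptions.IgnoreCase);

    // Replaces every ignored phrase with a space so it cannot produce a match.
    public static string StripIgnored(string input)
    {
        foreach (string phrase in IgnoredPhrases)
            input = Regex.Replace(input, Regex.Escape(phrase), " ", RegexOptions.IgnoreCase);
        return input;
    }

    // Returns the first color found after stripping, or an empty string.
    // Lines without any color are added to 'unmatched' so the caller can log them.
    public static string FindColor(string input, List<string> unmatched)
    {
        Match m = Color.Match(StripIgnored(input));
        if (!m.Success)
        {
            unmatched.Add(input);
            return string.Empty;
        }
        return m.Value;
    }
}
```

After a batch has been processed, the `unmatched` list holds exactly the lines that need a new finder or a new ignored phrase.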
