Remove repeated words from an array/list

Time: 2017-05-21 10:59:48

Tags: c# regex linq

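I am trying to extract the main words from a large set of very long strings, to simplify displaying them.

Suppose we have a string array that outputs: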

Something One
Something [ABC] Two
Something [ABC] Three
Something Four Section 1
Something Four Section 2
Something Five

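How can I remove the non-constant repeated words, such as Something and [ABC], so that only the unique identifier of each string remains, such as One, Two, Three, and output this list:

One
Two
Three
Four Section 1
Four Section 2
Five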

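Given that:

  • A duplicate is any word that is repeated more than once in the list.

  • The identifiers {"One", "Two", "Three", ...} above are not constant. They are only examples and could be anything else, such as {"Alpha", "Bravo", "Charlie"} or {"Nu", "Xi", "Pi"}, as long as they do not repeat.

  • If a certain word is present (in this case "Section 1"), keep the word before it, so that "Something Four Section 1" becomes "Four Section 1".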

2 Answers:

Answer 0: (score: 1)

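Apart from certain words such as "Section 1", this solution assumes you know nothing (just like Jon Snow). It works for arbitrary string input. It has two main points.

1) FindRepeatedWords is a method that fills the UniqueWords hash set and the Repeats hash set. UniqueWords, as the name suggests, holds every word in the list, and Repeats holds the words that occur more than once.

2) CleanUpWordsAndDoNotChangeList is the main method that does what you want. It decides which words to remove, taking the certain words into account.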

namespace StackOverfFLow {

    using System;
    using System.Collections.Generic;
    using System.Linq;

    internal class Program {
        private static readonly HashSet<string> UniqueWords = new HashSet<string>();
        private static readonly HashSet<string> Repeats = new HashSet<string>();
        private static readonly List<string> CertainWords = new List<string> { "Section 1", "Section 2" };
        private static readonly List<string> Words = new List<string> { "Something One", "Something [ABC] Two", "Something [ABC] Three", "Something Four Section 1", "Something Four Section 2", "Something Five" };

        private static void Main(string[] args) {
            FindRepeatedWords();
            var result = CleanUpWordsAndDoNotChangeList();
            result.ForEach(Console.WriteLine);
            Console.ReadKey();
        }

        /// <summary>
        /// Builds a cleaned-up copy of the words without changing the original list.
        /// </summary>
        /// <returns>A new list with the repeated words removed from each entry.</returns>
        private static List<string> CleanUpWordsAndDoNotChangeList() {
            var newList = new List<string>();
            foreach(var t in Words) {
                var sp = SeperateStringByString(t);
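                for(var index = 0; index < sp.Count; index++) {
                    // Only repeated words are candidates for removal.
                    if(!Repeats.Contains(sp[index])) { continue; }
                    var fixedTocheck = sp.ElementAtOrDefault(index + 1);
                    // Keep a repeated word when it is the last word or directly precedes a certain word.
                    if(fixedTocheck == null || CertainWords.Contains(fixedTocheck)) { continue; }
                    sp.RemoveAt(index);
                    // Step back so the element that shifted into this slot is checked next.
                    index = index - 1;
                }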
                newList.Add(string.Join(" ", sp));
            }
            return newList;
        }

        /// <summary>
        /// Finds Unique and Repeated Words.
        /// </summary>
        private static void FindRepeatedWords() {
            foreach(var eachWord in Words) {
                foreach(var element in SeperateStringByString(eachWord)) {
                    if(!UniqueWords.Add(element)) { Repeats.Add(element); }
                }
            }
        }

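        /// <summary>
        /// Separates a string into words, keeping any certain word (e.g. "Section 1") as a single element.
        /// </summary>
        /// <param name="source">Source string</param>
        /// <returns>The words of the source string; text after a certain word is not included.</returns>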
        private static List<string> SeperateStringByString(string source) {
            var seperatedStringByString = new List<string>();
            foreach(var certainWord in CertainWords) {
                var indexOf = source.IndexOf(certainWord);
                if(indexOf <= -1) { continue; }
                var a = source.Substring(0, indexOf).Trim().Split(' ');
                seperatedStringByString.AddRange(a);
                seperatedStringByString.Add(certainWord);
            }
            if(seperatedStringByString.Count < 1) { seperatedStringByString.AddRange(source.Split(' ')); }
            return seperatedStringByString;
        }
    }
}
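
The repeat detection in FindRepeatedWords relies on HashSet<T>.Add returning false when the item is already in the set. A minimal standalone sketch of just that step follows. It assumes plain splitting on spaces, so "Section 1" is not treated as a single unit here, and the names RepeatFinder and FindRepeats are hypothetical, not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;

internal static class RepeatFinder {
    // Hypothetical helper (not from the answer above): returns every word
    // that occurs more than once across all lines.
    internal static HashSet<string> FindRepeats(IEnumerable<string> lines) {
        var seen = new HashSet<string>();
        var repeats = new HashSet<string>();
        foreach (var line in lines) {
            // Assumption: words are separated by single or multiple spaces.
            foreach (var word in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)) {
                // Add returns false when the word was already seen.
                if (!seen.Add(word)) { repeats.Add(word); }
            }
        }
        return repeats;
    }
}
```

For the six sample strings this yields Something, [ABC], Four and Section. Section is reported because this sketch splits on spaces only, which is why the answer keeps "Section 1" together as one element.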

Answer 1: (score: 0)

I'm not sure this is what you want, but here is my code.

Quick code:

        string itemName = "";
        List<string> destinationArray = new List<string>();

        List<string> inputArrayList = new List<string>();
        inputArrayList.Add("Something One");
        inputArrayList.Add("Something [ABC] Two");
        inputArrayList.Add("Something [ABC] Three");
        inputArrayList.Add("Something Four Section 1");
        inputArrayList.Add("Something Four Section 2");
        inputArrayList.Add("Something Five");
        inputArrayList.Add("Other Text");

        List<string> allWordList = new List<string>();

        foreach (var item in inputArrayList)
        {
            allWordList.AddRange(item.Split(' '));
        }

        // Words that occur more than once across all input strings.
        List<string> searchingArrayList = allWordList.GroupBy(x => x)
                    .Where(group => group.Count() > 1)
                    .Select(group => group.Key).ToList();

        foreach (var itemInput in inputArrayList)
        {
            itemName = itemInput;
            foreach (var itemSearching in searchingArrayList)
            {
                // Replace works on substrings, so the spaces around a removed word are left behind.
                itemName = itemName.Replace(itemSearching, "");
            }
            destinationArray.Add(itemName);
        }

        destinationArray.ForEach(Console.WriteLine);
        Console.ReadKey();
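
Because string.Replace removes substrings rather than whole words, the code above also strips Four and Section and leaves stray spaces. A possible word-level variant of the same GroupBy idea is sketched below. The names WordFilter and RemoveRepeatedWords and the markers parameter are hypothetical; the sketch assumes words are separated by spaces and that a marker word such as "Section" should be kept together with the word directly before it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal static class WordFilter {
    // Hypothetical helper: drops whole repeated words, but keeps a marker
    // word and the word directly before it.
    internal static List<string> RemoveRepeatedWords(IEnumerable<string> lines, ISet<string> markers) {
        // Split every line into words once, up front.
        var tokenized = lines
            .Select(l => l.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .ToList();

        // Words that occur more than once across all lines.
        var repeated = new HashSet<string>(
            tokenized.SelectMany(t => t)
                     .GroupBy(w => w)
                     .Where(g => g.Count() > 1)
                     .Select(g => g.Key));

        // Keep a word if it is not repeated, is a marker, or precedes a marker.
        return tokenized
            .Select(tokens => string.Join(" ", tokens.Where((word, i) =>
                !repeated.Contains(word)
                || markers.Contains(word)
                || (i + 1 < tokens.Length && markers.Contains(tokens[i + 1])))))
            .ToList();
    }
}
```

Called with the seven input strings above and a marker set containing only "Section", this gives One, Two, Three, Four Section 1, Four Section 2, Five and Other Text.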