对带有连字符的字母数字数据进行排序

时间:2014-05-22 12:14:28

标签: excel vba loops sorting

我在工作簿中有两张表,每张表都有自己的电子邮件地址列,以及其他数据。我将引用Sheet1中的Column1和Sheet2中的Column2,其中只有Column1可能列出了重复的电子邮件地址。

我需要确定Column1中的电子邮件地址是否在Column2中找到,每次都是如此,必须运行某些代码。

我用两个嵌套的Do While循环解决了这个问题,其中外部循环遍历Column1中的每个单元格,名为Cell1,从上到下,内部循环将Cell1与Column2中的每个单元格进行比较,名为Cell2,也来自顶部到底部,如果找到相同的值,则提前退出内部循环。

为了提高效率,我想按升序对每一列进行排序,让每个Cell1只查看Column2,直到Cell2中字符串的值大于Cell1中字符串的值,以及下一个Cell1被迭代它将从最后一个循环停止的Cell2继续,因为早期的Cell2值都小于Cell1并且不能具有相等的值。

我提出的代码是一个外部循环,它通过Column1中的每个单元格,以及一个内部循环,如下所示:

'x1 is the row number of Cell1
'x2 is the row number of Cell2
'below is the code for the internal loop looking through Column2

Do While Sheets(2).Cells(x2, 1).Value <> 0
    If LCase(Sheets(1).Cells(x1, 1).Value) < LCase(Sheets(2).Cells(x2, 1).Value) Then
        Exit Do
    ElseIf LCase(Sheets(1).Cells(x1, 1).Value) = LCase(Sheets(2).Cells(x2, 1).Value) Then

        '... code is run

        Exit Do
    End If
    x2 = x2 + 1
Loop

问题是电子邮件地址可以有连字符( - )和撇号(')。虽然Excel在对列进行排序时会忽略它们,但在比较字母数字值时,VBA不会忽略它们。

如果我有:

     A           B
1  Noemi      Noemi
2  no-reply   no-reply
3  notify     notify

代码会将A1与B1进行比较并查看A1=B1,然后使用B1查看A2并查看A2<B1,然后跳至A3。

我的第一个问题是,我可以强制Excel对包括连字符和撇号的字母数字文本进行排序吗?

如果没有,到目前为止,我只考虑了一个解决方法,通过查看Cell1和Cell2是否有 - 或'在其中,如果其中任何一个为TRUE,然后使用新变量从Cell1和Cell2中提取文本连字符和撇号,并继续在内部循环中使用这些新值。

我的第二个问题是,我如何以更有效的方式解决这个问题?

修改

Microsoft认识到Excel在排序时会忽略短划线和撇号:

http://office.microsoft.com/en-001/excel-help/default-sort-orders-HP005199669.aspx http://support.microsoft.com/kb/322067

2 个答案:

答案 0 :(得分:2)

如果我昨天被问过,我会同意大卫对Excel排序的预期结果的看法。然而,经过实验,我被迫同意Dirk。这一点很重要:

  

忽略撇号(')和连字符( - ),但有一个例外:如果除连字符外两个文本字符串相同,则带连字符的文本最后排序。 {{3 }}

Sort examples

A列包含我用于测试Dirk声明的未排序值。

B列已经过定期的Excel排序。如您所见,该列不是ASCII / Unicode序列,因为“单引号”应该在“连字符”应该出现在“字母a”之前。

Excel使用波浪号(〜)作为查找的转义字符,所以我想知道它是否会对Sort执行相同操作。 AdjustedSort1将“tilde single quote”替换为“single quote”,将“hyphen”替换为“tilde hyphen”,对“single quote”和“hyphen”进行排序然后恢复。结果显示在C列中。序列更好但不是ASCII / Unicode,因为“aa-b”出现在“aa'c”之前。

D列使用了我多年前写过的VBA Shell Sort例程。如果您的列表非常大,您可能最好在网上搜索“VBA快速排序”,但我的排序应该为合理大小的列表提供可接受的性能。

Sub AdjustedSort1()

  With Worksheets("Sheet2").Columns("C")

    .Replace What:="'", Replacement:="~'", LookAt:=xlPart, _
             SearchOrder:=xlByRows, MatchCase:=False, _
             SearchFormat:=False, ReplaceFormat:=False
    .Replace What:="-", Replacement:="~-", LookAt:=xlPart, _
             SearchOrder:=xlByRows, MatchCase:=False, _
             SearchFormat:=False, ReplaceFormat:=False
    .Sort Key1:=Range("C2"), Order1:=xlAscending, Header:=xlYes, _
          OrderCustom:=1, MatchCase:=False, _
          Orientation:=xlTopToBottom, DataOption1:=xlSortNormal
    .Replace What:="~~-", Replacement:="-", LookAt:=xlPart, _
          SearchOrder:=xlByRows, MatchCase:=False, _
          SearchFormat:=False, ReplaceFormat:=False
    .Replace What:="~~'", Replacement:="'", LookAt:=xlPart, _
          SearchOrder:=xlByRows, MatchCase:=False, _
          SearchFormat:=False, ReplaceFormat:=False

  End With

End Sub
Sub AdjustedSort2()

  Dim Inx As Long
  Dim RngValue As Variant
  Dim RowLast As Long
  Dim ColValue() As String

  With Worksheets("Sheet2")

    RowLast = .Cells(Rows.Count, "D").End(xlUp).Row

    ' Load values from column D excluding header
    RngValue = .Range(.Cells(2, "D"), .Cells(RowLast, "D")).Value

    ' Copy values from 2D array to 1D array
    ReDim ColValue(LBound(RngValue, 1) To UBound(RngValue, 1))
    For Inx = LBound(RngValue, 1) To UBound(RngValue, 1)
      ColValue(Inx) = RngValue(Inx, 1)
    Next

    ' Sort array
    Call ShellSort(ColValue, UBound(ColValue))

    ' Copy values back to 2D array
    For Inx = LBound(ColValue) To UBound(ColValue)
      RngValue(Inx, 1) = ColValue(Inx)
    Next

    ' Copy values back to column D
    .Range(.Cells(2, "D"), .Cells(RowLast, "D")).Value = RngValue

  End With

End Sub
Public Sub ShellSort(arrstgTgt() As String, inxLastToSort As Integer)

  ' Coded 2 March 07
  ' Algorithm and text from Algorithms (Second edition) by Robert Sedgewick

  '   The most basic sort is the insertion sort in which adjacent elements are compared
  ' and swapped as necessary.  This can be very slow if the smallest elements are at
  ' end.  ShellSort is a simple extension which gains speed by allowing exchange of
  ' elements that are far apart.
  '   The idea is to rearrange the file to give it the property that taking every h-th
  ' element (starting anywhere) yields a sorted file.  Such a file is said to be
  ' h-sorted.  Put another way, an h-sorted file is h independent sorted files,
  ' interleaved together.  By h-sorting for large value of H, we can move elements
  ' in the array long distances and thus make it easier to h-sort for smaller values of
  ' h.  Using such a procedure for any sequence of values of h which ends in 1 will
  ' produce a sorted file.
  '   This program uses the increment sequence: ..., 1093, 364, 121, 40, 13, 4, 1.  This
  ' is known to be a good sequence but cannot be proved to be the best.
  '   The code looks faulty but it is not.  The inner loop compares an
  ' entry with the previous in the sequence and if necessary moves it back down the
  ' sequence to its correct position.  It does not continue with the rest of the sequence
  ' giving the impression it only partially sorts a sequence.  However, the code is not
  ' sorting one sequence then the next and so on.  It examines the entries in element
  ' number order.  Having compared an entry against the previous in its sequence, it will
  ' be intH loops before the next entry in the sequence in compared against it.

  Dim intNumRowsToSort          As Integer
  Dim intLBoundAdjust           As Integer
  Dim intH                      As Integer
  Dim inxRowA                   As Integer
  Dim inxRowB                   As Integer
  Dim inxRowC                   As Integer
  Dim stgTemp                   As String

  intNumRowsToSort = inxLastToSort - LBound(arrstgTgt) + 1
  intLBoundAdjust = LBound(arrstgTgt) - 1

  ' Set intH to 1, 4, 13, 40, 121, ..., 3n+1, ... until intH > intNumRowsToSort
  intH = 1
  Do While intH <= intNumRowsToSort
    intH = 3 * intH + 1
  Loop

  Do While True
    If intH = 1 Then Exit Do
    ' The minimum value on entry to this do-loop will be 4 so there is at least
    ' one repeat of the loop.
    intH = intH \ 3
    For inxRowA = intH + 1 To intNumRowsToSort
      stgTemp = arrstgTgt(inxRowA + intLBoundAdjust)
      inxRowB = inxRowA
      Do While True
        ' The value of element inxRowA has been saved.  Now move the element intH back
        ' from row inxRowA into this row if it is smaller than the saved value.  Repeat
        ' this for earlier elements until one is found that is larger than the saved
        ' value which is placed in the gap.
        inxRowC = inxRowB - intH
        If arrstgTgt(inxRowC + intLBoundAdjust) <= stgTemp Then Exit Do
        arrstgTgt(inxRowB + intLBoundAdjust) = arrstgTgt(inxRowC + intLBoundAdjust)
        inxRowB = inxRowC
        If inxRowB <= intH Then Exit Do
      Loop
      arrstgTgt(inxRowB + intLBoundAdjust) = stgTemp
    Next
  Loop

End Sub

答案 1 :(得分:0)

替换所有&#34; - &#34;用&#34; ^&#34;然后Excel不会忽略&#34; ^&#34;按照&#34; - &#34;。

的排序

然后可以替换&#34; ^&#34;回到&#34; - &#34;如果你愿意的话。