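Sort alphabetically, with lowercase letters before uppercase?

Asked: 2018-07-06 03:44:06

Tags: powershell sorting output

I started with Project Gutenberg's "The Complete Works of William Shakespeare by William Shakespeare", a UTF-8 text file available from http://www.gutenberg.org/ebooks/100. In PowerShell, I ran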

Get-Content -Tail 50 $filename | Sort-Object -CaseSensitive

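which, I thought, would pipe the last 50 lines of the file (i.e., strings delimited by line breaks) to Sort-Object, configured to sort alphabetically with strings starting with a lowercase letter first, followed by strings starting with an uppercase letter.

Why is the output in the picture below (especially in the P's) not sorted according to the -CaseSensitive switch? What is a solution?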

Link to Sort-Output Picture

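3 Answers:

Answer 0 (score: 2)

Note: This answer focuses on the general case of sorting whole strings (by all of their characters, not just the first one).

You're looking for ordinal sorting, where characters are sorted numerically by their Unicode code point ("ASCII value"), so that all uppercase letters, as a group, sort before all lowercase letters.

As of Windows PowerShell v5.1 / PowerShell Core v6.1.0, Sort-Object always uses word sorting (using the invariant culture by default, but this can be changed with the -Culture parameter), where case-sensitive sorting simply means that the lowercase form of a given letter comes directly before its uppercase form, not that all lowercase letters collectively come first; e.g., b sorts before B, but both come after a and A (also note that the logic is the reverse of the ordinal case, where uppercase letters come first):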

PS> 'B', 'b', 'A', 'a' | Sort-Object -CaseSensitive
a
A
b
B

Therefore, for ordinal sorting, you currently have to use the .NET Framework directly (but note that enhancing Sort-Object is being considered):

# Get the last 50 lines as a list.
[Collections.Generic.List[string]] $lines = Get-Content -Tail 50 $filename

# Sort the list in place, using ordinal sorting
$lines.Sort([StringComparer]::Ordinal)

# Output the result.
# Note that uppercase letters come first.
$lines

[StringComparer]::Ordinal returns an object that implements the [System.Collections.IComparer] interface.
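As a quick check of the difference, the four letters from the earlier Sort-Object example can be sorted ordinally. This is only a minimal sketch; the variable name $letters is chosen here for illustration and is not part of the answer above:

```powershell
# The same four letters as in the Sort-Object example ($letters is an illustrative name).
[Collections.Generic.List[string]] $letters = 'B', 'b', 'A', 'a'

# Ordinal sorting compares code points, so 'A' (65) and 'B' (66)
# come before 'a' (97) and 'b' (98).
$letters.Sort([StringComparer]::Ordinal)

$letters  # A, B, a, b
```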

This solution can also be used in a pipeline, but it requires sending the array of lines through the pipeline as a single item:

(, (Get-Content -Tail 50 $filename)) | ForEach-Object { 
  ($lines = [Collections.Generic.List[string]] $_).Sort([StringComparer]::Ordinal)
  $lines # output the sorted lines 
}

Note: As stated, this sorts uppercase letters first.


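To sort all lowercase letters first, you need to implement custom sorting via a [System.Comparison[string]] delegate, which in PowerShell can be implemented as a script block ({ ... }) that accepts two input strings and returns their sort ranking: -1 (or any negative value) for less than, 0 for equal, and 1 (or any positive value) for greater than.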

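A minimal sketch of such a script-block comparison, assuming English (ASCII) text. The sample strings and the rule used here (any lowercase letter sorts before every other character, otherwise code points decide) are assumptions made for illustration, not necessarily the exact logic the answer had in mind:

```powershell
# Hypothetical sample input; in practice use: Get-Content -Tail 50 $filename
[Collections.Generic.List[string]] $lines = 'Banana', 'apple', 'Apple', 'banana', 'cherry'

# Sort in place with a script block cast to a Comparison[string] delegate.
$lines.Sort([Comparison[string]] {
  param([string] $x, [string] $y)
  $len = [Math]::Min($x.Length, $y.Length)
  for ($i = 0; $i -lt $len; $i++) {
    $cx = $x[$i]; $cy = $y[$i]
    # Identical code points: move on to the next character.
    if ([int] $cx -eq [int] $cy) { continue }
    $xLower = [char]::IsLower($cx); $yLower = [char]::IsLower($cy)
    # A lowercase letter sorts before any character that is not lowercase.
    if ($xLower -and -not $yLower) { return -1 }
    if ($yLower -and -not $xLower) { return 1 }
    # Same class: fall back to code point order.
    return [Math]::Sign([int] $cx - [int] $cy)
  }
  # One string is a prefix of the other: the shorter one sorts first.
  return [Math]::Sign($x.Length - $y.Length)
})

$lines  # apple, banana, cherry, Apple, Banana
```

Treating "lowercase first" as a class that outranks all other characters keeps the comparison consistent (transitive); swapping only lowercase against uppercase while leaving punctuation in code point order would not be.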

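Note: For English text, the above should work fine, but supporting all Unicode text, which may contain surrogate code unit pairs and different normalization forms (composed vs. decomposed accented characters), requires even more work.

Answer 1 (score: 1)

One way to get the desired result is to take the first character of each string and cast it to an int, which gives you that character's ASCII code; you can then sort numerically on that code to get the order you want.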

Get-Content -Tail 50 $filename | Sort-Object -Property @{E={[int]$_[0]};Ascending=$true} 

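We can create an expression with the -Property parameter of Sort-Object: [int] casts to an int, $_ takes the current string/line from the pipeline, and [0] takes the first character of that string; the result is sorted in ascending order.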
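To see the numeric keys this expression produces, here is a minimal sketch on a few hypothetical sample strings ($sample and $keys are illustrative names):

```powershell
# Hypothetical sample lines; $sample and $keys are illustrative names.
$sample = 'works.', 'Most', 'approach'

# Show each string next to the code point of its first character.
$keys = $sample | ForEach-Object { '{0} -> {1}' -f $_, [int] $_[0] }

$keys
# works. -> 119
# Most -> 77
# approach -> 97
```

Because 'M' (77) is lower than 'a' (97), an ascending sort on this key places uppercase-initial lines first.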

This gives the following output.

You may want to trim the whitespace from the output; however, I'll leave that for you to decide.

 

















    DONATIONS or determine the status of compliance for any particular state
    Foundation, how to help produce our new eBooks, and how to subscribe to
    Gutenberg-tm eBooks with only a loose network of volunteer support.
    International donations are gratefully accepted, but we cannot make any
    Most people start at our Web site which has the main PG search facility:
    Project Gutenberg-tm eBooks are often created from several printed
    Please check the Project Gutenberg Web pages for current donation
    Professor Michael S. Hart was the originator of the Project Gutenberg-tm
    Section 5. General Information About Project Gutenberg-tm electronic
    This Web site includes information about Project Gutenberg-tm, including
    While we cannot and do not solicit contributions from states where we
    against accepting unsolicited donations from donors in such states who
    approach us with offers to donate.
    concept of a library of electronic works that could be freely shared
    considerable effort, much paperwork and many fees to meet and keep up
    editions, all of which are confirmed as not protected by copyright in
    have not met the solicitation requirements, we know of no prohibition
    how to make donations to the Project Gutenberg Literary Archive
    including checks, online payments and credit card donations. To donate,
    methods and addresses. Donations are accepted in a number of other ways
    necessarily keep eBooks in compliance with any particular paper edition.
    our email newsletter to hear about new eBooks.
    please visit: www.gutenberg.org/donate
    statements concerning tax treatment of donations received from outside
    the United States. U.S. laws alone swamp our small staff.
    the U.S. unless a copyright notice is included. Thus, we do not
    visit www.gutenberg.org/donate
    with anyone. For forty years, he produced and distributed Project
    www.gutenberg.org
    we have not received written confirmation of compliance. To SEND
    with these requirements. We do not solicit donations in locations where
    works.

Update

To sort lowercase letters first and drop the blank lines: essentially, I just multiply the ASCII code of an uppercase first letter by an arbitrary factor so that it is numerically higher than that of any lowercase letter.

In the sample text, no line starts with a special character or punctuation mark; the code may need to be modified to handle those cases correctly.

Get-Content -Tail 50 $filename | ? { -not [string]::IsNullOrEmpty($_) } | Sort-Object -Property {
    if($_[0] -cmatch "[A-Z]")
    {
        5*[int]$_[0]
    }
    else
    {
        [int]$_[0]
    } 
}
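An alternative to the arbitrary multiplier, sketched here as an assumption rather than as part of the answer, is to sort on two keys: first on whether the first character is uppercase, then on its code point. The sample strings and the names $sample and $sorted are illustrative:

```powershell
# Hypothetical sample lines; $sample and $sorted are illustrative names.
$sample = 'Project', 'approach', 'Most', 'works.', 'concept'

# Key 1: $false (not uppercase) sorts before $true (uppercase).
# Key 2: code point of the first character, ascending.
$sorted = $sample | Sort-Object -Property @{ Expression = { [char]::IsUpper($_[0]) } },
                                          @{ Expression = { [int] $_[0] } }

$sorted  # approach, concept, works., Most, Project
```

Like the original, this looks only at the first character of each line, so lines that share a first character are not ordered any further.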

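Answer 2 (score: 0)

Comparing Jacob's and mklement0's answers: Jacob's solution has the advantages of being visually simple, intuitive, pipeline-based, and extensible to sorting by the second character of the first word, the first character of the second word, and so on. mklement0's solution has the advantage of being faster, and it gave me ideas about how to sort lowercase first, then uppercase.

Below, I'd like to share my extension of Jacob's solution, which sorts by the first character of the second word. That is not especially useful for the complete works of Shakespeare, but it is very useful for a comma-delimited table.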

Function Replace-Nulls($line) {

    # An empty or null line gets placeholder control characters as its first two "words".
    if ( !($line) ) {
        $line = [char]0 + " " + [char]0 + " [THIS WAS A LINE OF NULL WHITESPACE]"
    } # End if

    # A line without a second word gets a placeholder second word.
    if ( !(($line.split())[1]) ) {
        $line += " " + [char]8 + " [THIS WAS A LINE WITH ONE WORD AND THE REST NULL WHITESPACE]"
    } # End if

    return $line

} # End Replace-Nulls

echo "."
# Pass the argument PowerShell-style (space-separated, no parentheses).
$cleaned_output = Get-Content -Tail 20 $filename | ForEach-Object { Replace-Nulls $_ }
# Sort by the code point of the first character of the second word.
$cleaned_output | Sort-Object -Property { [int]((($_).split())[1])[0] }
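
A small illustration of the second-word key on hypothetical comma-separated sample data. The strings and the names $rows and $bySecondWord are made up for this sketch, and every row is assumed to have a second word, so no placeholder handling is needed:

```powershell
# Hypothetical rows; each has at least two whitespace-separated words.
$rows = 'Smith, john', 'Doe, Alice', 'Lee, bob'

# Split() with no arguments splits on whitespace; [1] is the second word,
# and [0] is that word's first character.
$bySecondWord = $rows | Sort-Object -Property { [int] ($_.Split()[1])[0] }

$bySecondWord  # Doe, Alice / Lee, bob / Smith, john
```

Here 'A' (65) sorts before 'b' (98) and 'j' (106), so this key is also uppercase-first.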