Find the longest word in a text file

Time: 2012-01-22 16:11:44

Tags: linux bash unix

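I'm trying to write a simple script that uses bash to find the longest word in a text file, along with its length. I know this is simple and straightforward with awk, but I want to try it this way. I know that if a=wmememememe, I can get its length with echo ${#a} and print the word itself with echo ${a}. But I want to apply that to the following: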

for i in `cat so.txt` do

where so.txt contains words. I hope that makes sense.

8 answers:

Answer 0 (score: 21)

A bash one-liner:

cat YOUR_FILENAME | sed 's/ /\n/g' | sort | uniq | awk '{print length, $0}' | sort -nr | head
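  1. Print the file (via cat)
  2. Split the words (via sed)
  3. Remove the duplicates (via sort | uniq)
  4. Prefix each word with its length (awk)
  5. Sort the list by word length
  6. Print the longest words.

Yes, this will be slower than some of the other solutions, but it also doesn't require remembering the semantics of bash for loops.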
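
The steps above can be sketched as a small function. This is a variant under stated assumptions rather than the answer's exact command: it swaps sed for tr -s '[:space:]' so that tabs and repeated spaces also separate words, and it uses sort -u in place of sort | uniq. The function name list_longest_words and the sample data are hypothetical.

```shell
#!/bin/sh
# list_longest_words is a hypothetical helper name, not part of the original answer.
list_longest_words() {
    # $1: path of the file to scan
    tr -s '[:space:]' '\n' < "$1" |     # one word per line, splitting on any whitespace
        sort -u |                        # remove duplicate words
        awk 'NF {print length, $0}' |    # prefix each non-empty word with its length
        sort -nr |                       # longest words first
        head                             # keep the ten longest
}

# Demo on a throwaway file (sample data is made up for illustration)
tmp=$(mktemp)
printf 'a bb ccc\nbb dddd\n' > "$tmp"
list_longest_words "$tmp"
rm -f "$tmp"
```

The NF guard in awk skips the empty line that tr can emit when the file starts with whitespace.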

Answer 1 (score: 12)

Normally you would use a while read loop instead of for i in $(cat), but since you want every word split out, the for loop works fine in this case.

#!/bin/bash
longest=0
for word in $(<so.txt)
do
    len=${#word}
    if (( len > longest ))
    then
        longest=$len
        longword=$word
    fi
done
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"
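
For comparison, here is a minimal sketch of the while read form mentioned above, written in POSIX-compatible shell. The function name find_longest and the sample file are hypothetical. The set -f call is added on the assumption that a word such as * should be treated as text and not expanded as a filename pattern.

```shell
#!/bin/sh
# find_longest is a hypothetical helper name used only for this sketch.
find_longest() {
    longest=0
    longword=
    # IFS= and -r keep each line intact; the || test also processes a
    # final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        set -f                      # disable globbing while the line is split
        for word in $line; do       # unquoted on purpose: split on whitespace
            len=${#word}
            if [ "$len" -gt "$longest" ]; then
                longest=$len
                longword=$word
            fi
        done
        set +f                      # restore globbing
    done < "$1"
    printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"
}

# Demo on a throwaway file (sample data is made up for illustration)
tmp=$(mktemp)
printf 'short longer\nlongestword x' > "$tmp"
find_longest "$tmp"
rm -f "$tmp"
```

Because the loop reads from a redirection and not from a pipeline, longest and longword are still set when the loop finishes.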

Answer 2 (score: 5)

longest=""
for word in $(cat so.txt); do
    if [ ${#word} -gt ${#longest} ]; then
        longest=$word
    fi
done

echo "$longest"

Answer 3 (score: 4)

Another solution:

for item in  $(cat "$infile"); do
  length[${#item}]=$item          # use word length as index
done
maxword=${length[@]: -1}          # select last array element

printf "longest word '%s', length %d\n" "$maxword" "${#maxword}"

Answer 4 (score: 3)

An awk script:

#!/usr/bin/awk -f

# Initialize two variables
BEGIN {
  maxlength=0;
  maxword=0
} 

# Loop through each word on the line
{
  for(i=1;i<=NF;i++) 

  # Assign the maxlength variable if length of word found is greater. Also, assign
  # the word to maxword variable.
  if (length($i)>maxlength) 
  {
    maxlength=length($i); 
    maxword=$i;
  }
}

# Print out the maxword and the maxlength  
END {
  print maxword,maxlength;
}

TEXTFILE:

[jaypal:~/Temp] cat textfile 
AWK utility is a data_extraction and reporting tool that uses a data-driven scripting language 
consisting of a set of actions to be taken against textual data (either in files or data streams) 
for the purpose of producing formatted reports. 
The language used by awk extensively uses the string datatype, 
associative arrays (that is, arrays indexed by key strings), and regular expressions.

Test:

[jaypal:~/Temp] ./script.awk textfile 
data_extraction 15

Answer 5 (score: 1)

for i in $(cat so.txt); do echo ${#i}; done | paste - so.txt | sort -n | tail -1
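
The command above lines up lengths and words correctly only if so.txt holds exactly one word per line, because paste joins the n-th length with the n-th line of the file. A hedged variant that prints each word next to its length inside the loop avoids that assumption. The name longest_by_loop and the sample data are hypothetical.

```shell
#!/bin/sh
# longest_by_loop is a hypothetical helper name used only for this sketch.
longest_by_loop() {
    # $1: path of the file to scan
    for i in $(cat "$1"); do
        printf '%s %s\n' "${#i}" "$i"   # emit "length word" for every word
    done | sort -n | tail -n 1          # numeric sort, keep the longest
}

# Demo on a throwaway file with several words on one line
tmp=$(mktemp)
printf 'aa bbbb\nc\n' > "$tmp"
longest_by_loop "$tmp"
rm -f "$tmp"
```

As with the original loop, words containing glob characters would still be expanded by the unquoted $(cat ...).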

Answer 6 (score: 0)

A modified POSIX shell version of jimis's xargs-based answer; it is still slow, taking two to three minutes:

tr "'" '_'  < /usr/share/dict/words |
xargs -P$(nproc) -n1 -i sh -c 'set -- {} ; echo ${#1} "$1"' | 
sort -n | tail | tr '_' "'"

Note that the tr steps at the beginning and end work around GNU xargs's trouble with single quotes.

Answer 7 (score: -1)

Slow because of the large number of forks, but pure shell, with no need for awk or special bash features:

$ cat /usr/share/dict/words | \
    xargs -n1 -i sh -c 'echo `echo -n {} | wc -c` {}' | sort -n | tail
23 Pseudolamellibranchiata
23 pseudolamellibranchiate
23 scientificogeographical
23 thymolsulphonephthalein
23 transubstantiationalist
24 formaldehydesulphoxylate
24 pathologicopsychological
24 scientificophilosophical
24 tetraiodophenolphthalein
24 thyroparathyroidectomize

You can easily parallelize this, for example by passing -P4 to xargs to use 4 CPUs.
