Writing a grep expression with special characters?

Time: 2019-04-09 19:44:43

Tags: bash grep

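I have been trying to write a grep expression that goes through all the text files in a directory and returns only the files that contain all of the patterns I am looking for. A sample input file looks like this: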

A   29  LIJ uniteresting_numbers    uniteresting_numbers    uniteresting_numbers
A   30  RTX uniteresting_numbers    uniteresting_numbers    uniteresting_numbers    <=B
A   31  BRN uniteresting_numbers    uniteresting_numbers    uniteresting_numbers    <=B
A   32  SJY uniteresting_numbers    uniteresting_numbers    uniteresting_numbers    <=B
A   33  MRT uniteresting_numbers    uniteresting_numbers    uniteresting_numbers
A   34  MUY uniteresting_numbers    uniteresting_numbers    uniteresting_numbers
A   35  OOP uniteresting_numbers    uniteresting_numbers    uniteresting_numbers    

I want to search all the .txt files in a directory and return only the files that contain all of the following lines:

A   30  RTX uniteresting_numbers    uniteresting_numbers    uniteresting_numbers    <=B
A   31  BRN uniteresting_numbers    uniteresting_numbers    uniteresting_numbers    <=B
A   32  SJY uniteresting_numbers    uniteresting_numbers    uniteresting_numbers    <=B

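If these three lines are not all present, I want the file to be skipped. In each case I will know the two-digit numbers and three-letter codes I am looking for, and I would like to take them as user input stored in variables. What I am after is the files in which every two-digit number and three-letter code of interest has <=B at the end of its line.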

Here is the code I have thus far:

echo What do you want to name your output file? 
read myoutput
for file in *.txt; do
    if grep -q "RTX$(printf '\t')*[0-9]$(printf '\t')*[0-9]$(printf '\t')*[0-9]" <"$file"; then
        if grep -q "BRN$(printf '\t')*[0-9]$(printf '\t')*[0-9]$(printf '\t')*[0-9]" <"$file"; then
            if grep -q "SJY$(printf '\t')*[0-9]$(printf '\t')*[0-9]$(printf '\t')*[0-9]" <"$file"; then
                echo "$file" >>"$myoutput".txt
            else
                echo not found
            fi
        fi
    fi
done

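Note that I have not yet added the part where the user enters the three-letter codes and two-digit numbers; that should not be too hard. In the input data, a single tab separates each column. So far, I can match everything up to the final tab and <=B.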

I tried this, with no luck:

echo What do you want to name your output file? 
read myoutput
for file in *.txt; do
    if grep -q "RTX$(printf '\t')*[0-9]$(printf '\t')*[0-9]$(printf '\t')*[0-9]$(printf '\t')$(printf '<=B')" <"$file"; then
        if grep -q "BRN$(printf '\t')*[0-9]$(printf '\t')*[0-9]$(printf '\t')*[0-9]$(printf '\t')$(printf '<=B')" <"$file"; then
            if grep -q "SJY$(printf '\t')*[0-9]$(printf '\t')*[0-9]$(printf '\t')*[0-9]*$(printf '\t')$(printf '<=B')*" <"$file"; then
                echo "$file" >>"$myoutput".txt
            else
                echo not found
            fi
        fi
    fi
done

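Any help would be greatly appreciated. In some cases there will be more than three lines I am looking for. Is there a simple way to modify this to look for n <=B lines? Thank you all very much!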

Edit: I moved to awk, as suggested.

For this, I entered the following:

#!/bin/bash
echo What do you want to name your output file? 
read myoutput
for file in *.txt; do
    if awk '/30/ && /RTX/ && /B/' "$file"; then
        echo it worked
    fi
done

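The phrase "it worked" appears 6 times. The mini directory I am testing this script in contains 6 files, and only 3 of them actually match the awk pattern. How do I get the code after "then" to execute only for files that contain the awk pattern? I also tried the following, based on the tutorial here: https://www.thegeekstuff.com/2010/02/awk-conditional-statements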

#!/bin/bash
echo What do you want to name your output file? 
read myoutput
for file in *.txt; do
    $ awk '{
    if ($2 =="30" || $3 == "RTX" || $7 == "B")
        echo it worked
}' "$file"
done

I had no success. Thank you for your guidance!

1 answer:

Answer 0: (score: 1)

Although it may differ from your approach, please try the following:

myoutput="myoutput.txt"
for f in *.txt; do
    awk -v output="$myoutput" -v numbers="30 31 32" -v strings="RTX BRN SJY" '
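    BEGIN {
        # num[i] and str[i] together describe the i-th required line
        split(numbers, num)
        split(strings, str)
        delete matched
    }
    {
        # record which of the required lines occur in this file
        for (n in num) {
            if (match($0, "^A\t" num[n] "\t" str[n] "\t[0-9]+\t[0-9]+\t[0-9]+\t<=B$")) {
                matched[n]++
            }
        }
    }
    END {
        # leave without printing if any required line was never seen
        for (n in num) {
            if (!matched[n]) {
                exit
            }
        }
        print FILENAME >> output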
    } ' "$f"
done

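You can assign lists of any length to the shell variables numbers and strings, depending on what the user wants. The two lists must have the same length, because the n-th number is paired with the n-th string.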
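The question also asks how to make the code after `then` run only for matching files. One possible way, sketched below, is to have awk report the result through its exit status instead of printing the file name itself. The function names `has_all_lines` and `list_matching_files`, their argument order, and the `exit 1` convention are illustrative assumptions, not part of the original script. The sketch also keeps the assumption from the code above that the three numeric columns are plain integers.

```shell
#!/bin/bash
# has_all_lines FILE NUMBERS STRINGS   (hypothetical helper)
# Returns 0 only if FILE contains, for every position i, a line of the form
#   A<TAB>number_i<TAB>code_i<TAB>int<TAB>int<TAB>int<TAB><=B
has_all_lines() {
    awk -v numbers="$2" -v strings="$3" '
    BEGIN {
        n = split(numbers, num, " ")
        split(strings, str, " ")
    }
    {
        # mark each required number/code pair found on this line
        for (i = 1; i <= n; i++) {
            if ($0 ~ ("^A\t" num[i] "\t" str[i] "\t[0-9]+\t[0-9]+\t[0-9]+\t<=B$")) {
                seen[i] = 1
            }
        }
    }
    END {
        # a non-zero exit status tells the shell that a pair is missing
        for (i = 1; i <= n; i++) {
            if (!seen[i]) {
                exit 1
            }
        }
        exit 0
    }' "$1"
}

# list_matching_files OUTPUT NUMBERS STRINGS FILE...   (hypothetical helper)
# Appends to OUTPUT the name of every FILE that has all required lines.
list_matching_files() {
    out=$1; nums=$2; codes=$3
    shift 3
    for f in "$@"; do
        if has_all_lines "$f" "$nums" "$codes"; then
            printf '%s\n' "$f" >>"$out"
        fi
    done
}

# Example interactive use (the prompts are illustrative):
#   read -r -p "Output file name: " myoutput
#   read -r -p "Two-digit numbers, space separated: " numbers
#   read -r -p "Three-letter codes, space separated: " strings
#   list_matching_files "$myoutput.txt" "$numbers" "$strings" *.txt
```

The first awk attempt in the question reported "it worked" for every file because awk normally exits with status 0 whether or not any line matched. An explicit `exit 1` in the `END` block gives the shell `if` something to test.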