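Extract substring in Bash

Time: 2009-01-09 13:53:23

Tags: string bash shell substring

Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

So to emphasize the point: I have a filename with x number of characters, then a five-digit sequence surrounded by a single underscore on either side, then another set of x number of characters. I want to take the 5-digit number and put that into a variable.

I am interested in the different ways this can be accomplished.

22 answers:

Answer 0 (score: 944)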

If x is constant, the following parameter expansion performs substring extraction:

b=${a:12:5}

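where 12 is the offset (zero-based) and 5 is the length.

If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps: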

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"

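If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.

Both solutions presented are pure bash, with no process spawning involved, hence very fast.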
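
A minimal sketch that runs both approaches on the question's sample filename. The variable names by_offset and by_strip are made up for this illustration; the offset 12 assumes the prefix is exactly "someletters_".

```bash
a='someletters_12345_moreleters.ext'

# Fixed-offset form: skip the 12 characters of "someletters_", take 5.
by_offset=${a:12:5}

# Prefix/suffix form: works when the prefix length varies, provided
# the only underscores are the two around the digits.
tmp=${a#*_}          # remove shortest prefix ending in "_"    -> 12345_moreleters.ext
by_strip=${tmp%_*}   # remove shortest suffix starting with "_" -> 12345

echo "$by_offset $by_strip"   # prints: 12345 12345
```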

Answer 1 (score: 593)

Use cut:

echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

More generic:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo "$SUBSTRING"

Answer 2 (score: 87)

Generic solution where the number can be anywhere in the filename, using the first of such sequences:

number=$(echo "$filename" | grep -E -o '[[:digit:]]{5}' | head -n1)

Another solution, to extract exactly one part of a variable:

number=${filename:offset:length}

If your filename always has the format stuff_digits_..., you can use awk:

number=$(echo "$filename" | awk -F _ '{ print $2 }')

Yet another solution, which removes everything except digits:

number=$(echo "$filename" | tr -cd '[:digit:]')

Answer 3 (score: 76)

Try using cut -c startIndx-stopIndx
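
As a rough illustration on the question's sample filename: cut -c counts characters starting from 1, so the zero-based offset 12 becomes position 13. The positions 13-17 assume the prefix is exactly "someletters_".

```bash
filename='someletters_12345_moreleters.ext'

# Characters 13 through 17 (1-based, inclusive) are the five digits.
digits=$(printf '%s\n' "$filename" | cut -c 13-17)
echo "$digits"   # prints: 12345
```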

Answer 4 (score: 31)

In case someone wants more rigorous information, you can also search for it in man bash like this:

$ man bash [press return key]
/substring  [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

Result:

${parameter:offset}
       ${parameter:offset:length}
              Substring Expansion.  Expands to  up  to  length  characters  of
              parameter  starting  at  the  character specified by offset.  If
              length is omitted, expands to the substring of parameter  start‐
              ing at the character specified by offset.  length and offset are
              arithmetic expressions (see ARITHMETIC  EVALUATION  below).   If
              offset  evaluates  to a number less than zero, the value is used
              as an offset from the end of the value of parameter.  Arithmetic
              expressions  starting  with  a - must be separated by whitespace
              from the preceding : to be distinguished from  the  Use  Default
              Values  expansion.   If  length  evaluates to a number less than
              zero, and parameter is not @ and not an indexed  or  associative
              array,  it is interpreted as an offset from the end of the value
              of parameter rather than a number of characters, and the  expan‐
              sion is the characters between the two offsets.  If parameter is
              @, the result is length positional parameters beginning at  off‐
              set.   If parameter is an indexed array name subscripted by @ or
              *, the result is the length members of the array beginning  with
              ${parameter[offset]}.   A  negative  offset is taken relative to
              one greater than the maximum index of the specified array.  Sub‐
              string  expansion applied to an associative array produces unde‐
              fined results.  Note that a negative offset  must  be  separated
              from  the  colon  by  at least one space to avoid being confused
              with the :- expansion.  Substring indexing is zero-based  unless
              the  positional  parameters are used, in which case the indexing
              starts at 1 by default.  If offset  is  0,  and  the  positional
              parameters are used, $0 is prefixed to the list.
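
A few of the cases described in that man page excerpt, sketched on the question's sample filename. The helper name second_and_third is made up for this illustration, and the negative length form assumes bash 4.2 or newer.

```bash
a='someletters_12345_moreleters.ext'

# Negative offset: counts from the end. The space before -4 keeps
# the expression from being read as the ${a:-4} default-value form.
ext=${a: -4}

# Negative length: an end position counted from the end of the value.
# "_moreleters.ext" is 15 characters long, so stop 15 before the end.
digits=${a:12:-15}

# With positional parameters, indexing starts at 1.
second_and_third() { echo "${@:2:2}"; }

echo "$ext $digits"        # prints: .ext 12345
second_and_third a b c d   # prints: b c
```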

Answer 5 (score: 18)

Building on jor's answer (which doesn't work for me):

substring=$(expr "$filename" : '.*_\([^_]*\)_.*')

Answer 6 (score: 18)

I'm surprised this pure bash solution didn't come up:

a="someletters_12345_moreleters.ext"
IFS="_"
set -- $a
echo "$2"
# prints 12345

You probably want to reset IFS to its previous value, or unset IFS afterwards!
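
One way to sidestep the IFS cleanup entirely, sketched under the assumption that a small helper function is acceptable: give the function a subshell body, so the IFS change, the set -f, and the new positional parameters all disappear when it returns. The name second_field is made up for this illustration.

```bash
# The ( ... ) body runs in a subshell, so nothing leaks to the caller.
second_field() (
    IFS='_'
    set -f      # no globbing: a * in the input stays literal
    set -- $1   # unquoted on purpose: split on "_" into $1, $2, ...
    printf '%s\n' "$2"
)

a='someletters_12345_moreleters.ext'
digits=$(second_field "$a")
echo "$digits"   # prints: 12345
```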

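Answer 7 (score: 12)

Following the requirements

  "I have a filename with x number of characters, then a five-digit sequence surrounded by a single underscore on either side, then another set of x number of characters. I want to take the 5-digit number and put that into a variable."

I found some grep ways that may be useful: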

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+" 
12345

or better

$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}" 
12345

And then with the -Po syntax:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+' 
12345

Or if you want it to match exactly 5 characters:

$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}' 
12345

Finally, to store it in a variable, just use the var=$(command) syntax.
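
For instance, a sketch using the -Eo form, chosen here because -P is not available in every grep build. The head -n 1 is an addition that keeps only the first match if the name contains several five-digit runs.

```bash
filename='someletters_12345_moreleters.ext'

# Capture grep's output in a variable with command substitution.
number=$(printf '%s\n' "$filename" | grep -Eo '[[:digit:]]{5}' | head -n 1)
echo "$number"   # prints: 12345
```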

Answer 8 (score: 10)

Without any sub-processes you can:

shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of this will also work in ksh93.
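
The same two expansions as a complete sketch, with the input variable set to the question's sample filename (an assumption, since the snippet leaves it undefined):

```bash
shopt -s extglob   # enables the +(...) "one or more" pattern

input='someletters_12345_moreleters.ext'
front=${input%%_+([a-zA-Z]).*}   # drop "_moreleters.ext" -> someletters_12345
digits=${front##+([a-zA-Z])_}    # drop "someletters_"    -> 12345
echo "$digits"                   # prints: 12345
```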

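Answer 9 (score: 10)

If we focus on the concept of: "A run of (one or several) digits"

We could use several external tools to extract the numbers. We could quite easily erase all other characters, with either sed or tr: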

name='someletters_12345_moreleters.ext'

echo $name | sed 's/[^0-9]*//g'    # 12345
echo $name | tr -c -d 0-9          # 12345

But if $name contains several runs of digits, the above will fail:

If name=someletters_12345_moreleters_323_end.ext, then:

echo $name | sed 's/[^0-9]*//g'    # 12345323
echo $name | tr -c -d 0-9          # 12345323

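We need to use regular expressions (regex). To select only the first run (12345, not 323) with sed and perl:

echo $name | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
perl -e 'my ($num) = $ARGV[0] =~ /(\d+)/; print "$num\n";' "$name"

But we could also do it directly in bash (1):

regex='[^0-9]*([0-9]{1,}).*$'
[[ $name =~ $regex ]] && echo "${BASH_REMATCH[1]}"

This allows us to extract the FIRST run of digits of any length, surrounded by any other text/characters.

Note: regex='[^0-9]*([0-9]{5,5}).*$' will match only 5-digit runs. :-)

(1): faster than calling an external tool for each short text. Not faster than doing all the processing inside sed or awk for large files.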
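
A small sketch of the bash-only variant wrapped in a function, so the no-match case can be handled by the caller. The function name first_digits is made up for this illustration.

```bash
# Print the first run of digits in $1; return non-zero if there is none.
first_digits() {
    local regex='[^0-9]*([0-9]{1,}).*$'
    [[ $1 =~ $regex ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

first_digits 'someletters_12345_moreleters_323_end.ext'   # prints: 12345
first_digits 'no-digits-here' || echo 'no match'          # prints: no match
```

Keeping the pattern in a variable and expanding it unquoted on the right-hand side of =~ makes bash treat it as a regex rather than a literal string.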

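Answer 10 (score: 9)

Here's a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches the first block of digits and does not depend on the surrounding underscores: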

str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}"   # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}"            # strip off non-digit suffix from s1
echo "$s2"                           # 12345

Answer 11 (score: 8)

Here's how I'd do it:

FN=someletters_12345_moreleters.ext
[[ $FN =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}

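Note: the above is a regular expression and is restricted to your specific scenario of five digits surrounded by underscores. Change the regular expression if you need different matching.

Answer 12 (score: 6)

I love sed's ability to deal with regex groups: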

> var="someletters_12345_moreletters.ext"
> digits=$( echo $var | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345

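A slightly more general option would be not to assume that an underscore _ marks the start of your digit sequence, and instead to strip off all the non-digits that come before it: s/[^0-9]\+\([0-9]\+\).*/\1/p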

> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
    refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

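More on this, in case you're not too confident with regexps:

  • s stands for _s_ubstitute
  • [0-9]+ matches 1+ digits
  • \1 refers to group 1 of the regex match (group 0 is the whole match, group 1 is the match within parentheses in this case)
  • the p flag is for _p_rinting

All the \ escapes are there to make sed's regexp processing work.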
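
If the backslashes get in the way, a possible alternative is sed's extended-regex mode, sketched here on the assumption that your sed accepts -E (GNU and BSD sed both do):

```bash
var='someletters_12345_moreletters.ext'

# With -E, + and ( ) are regex operators without needing backslashes.
digits=$(printf '%s\n' "$var" | sed -n -E 's/[^0-9]+([0-9]+).*/\1/p')
echo "$digits"   # prints: 12345
```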

Answer 13 (score: 5)

Given that test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ":

cut -b19-20 test.txt > test1.txt # This will extract chars 19 & 20 "ST" 
while read -r; do
> x=$REPLY
> done < test1.txt
echo $x
ST

Answer 14 (score: 3)

My answer gives you more control over what you want out of your string. Here is the code for extracting 12345 from your string:

str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str

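This will be more efficient if you want to extract something that contains letters such as abc or special characters such as _ or -. For example, if your string looks like this and you want everything after someletters_ and before _moreleters.ext: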

str="someletters_123-45-24a&13b-1_moreleters.ext"

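With my code you can specify exactly what you want. Explanation:

#* removes the preceding string, including the matching key. Here the key is _
% removes the following string, including the matching key. Here the key is '_more*'

Do some experiments yourself and you will find this interesting.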
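
As one such experiment, here are the same two expansions applied to the string with special characters. The variable name middle is made up for this illustration.

```bash
str='someletters_123-45-24a&13b-1_moreleters.ext'

middle=${str#*_}          # drop through the first "_" -> 123-45-24a&13b-1_moreleters.ext
middle=${middle%_more*}   # drop from "_more" onward  -> 123-45-24a&13b-1
echo "$middle"
```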

Answer 15 (score: 2)

shell cut - print a specific range of characters or a given part of a string

#method1) using bash

 str=2020-08-08T07:40:00.000Z
 echo ${str:11:8}

#method2) using cut

 str=2020-08-08T07:40:00.000Z
 cut -c12-19 <<< $str

#method3) using awk

 str=2020-08-08T07:40:00.000Z
 awk '{time=gensub(/.{11}(.{8}).*/,"\\1","g",$1); print time}' <<< $str
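
Since gensub is a gawk extension, a sketch with the standard substr function may be more portable. Like cut, substr counts from 1. The variable name hhmmss is made up for this illustration.

```bash
str=2020-08-08T07:40:00.000Z

# substr(string, start, length): 8 characters starting at position 12.
hhmmss=$(printf '%s\n' "$str" | awk '{ print substr($0, 12, 8) }')
echo "$hhmmss"   # prints: 07:40:00
```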

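Answer 16 (score: 2)

OK, here goes pure parameter substitution with an empty string. The caveat is that I have defined someletters and moreletters as letters only. If they are alphanumeric, this will not work as is.

shopt -s extglob   # required for the @(...) and +(...) patterns
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}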
echo $substring
12345

Answer 17 (score: 2)

Similar to substr('abcdefg', 2-1, 3) in PHP:

echo 'abcdefg'|tail -c +2|head -c 3
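
For comparison, a sketch of the pipeline next to the equivalent bash expansion. tail -c +N starts at byte N counted from 1, while ${s:offset:length} counts from 0, hence the +2 versus 1. The variable names are made up for this illustration.

```bash
s='abcdefg'

via_pipe=$(printf '%s' "$s" | tail -c +2 | head -c 3)   # bytes 2..4
via_expansion=${s:1:3}                                   # offset 1, length 3

echo "$via_pipe $via_expansion"   # prints: bcd bcd
```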

Answer 18 (score: 1)

A bash solution:

IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'

This will clobber a variable called x. The variable x could be replaced by the variable _:

input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"

Answer 19 (score: 1)

A little late, but I just ran across this problem and found the following:

host:/tmp$ asd=someletters_12345_moreleters.ext 
host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
12345
host:/tmp$ 

I used it to get millisecond resolution on an embedded system whose date does not support %N:

set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction

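Answer 20 (score: 1)

There's also the expr command (an external utility rather than a bash builtin; the match keyword is not specified by POSIX):

INPUT="someletters_12345_moreleters.ext"
SUBSTRING=`expr match "$INPUT" '.*_\([[:digit:]]*\)_.*'`
echo $SUBSTRING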

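Answer 21 (score: 0)

Takes start and end indices, similar to the JS and Java implementations, but with an inclusive end. Remove the +1 if you want an exclusive end instead.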

Example:

substring() {
    local str="$1" start="$2" end="$3"

    # Default to the whole string when start or end is omitted.
    if [[ -z "$start" ]]; then start=0; fi
    if [[ -z "$end" ]]; then end=${#str}; fi

    # Inclusive end: drop the +1 for an exclusive end.
    local length=$(( end - start + 1 ))

    echo "${str:start:length}"
}

More example calls:

    substring 01234 0
    01234
    substring 012345 0
    012345
    substring 012345 0 0
    0
    substring 012345 1 1
    1
    substring 012345 1 2
    12
    substring 012345 0 1
    01
    substring 012345 0 2
    012
    substring 012345 0 3
    0123
    substring 012345 0 4
    01234
    substring 012345 0 5
    012345

You're welcome.