# The grep Command in Bash

grep [OPTION]... PATTERNS [FILE]...

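!subtitle:Function

Searches the given files for lines that match one or more patterns and prints each matching line.

!subtitle:Type

Executable file (/usr/bin/grep).

!subtitle:Arguments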

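  • OPTION - options:

    • -E, --extended-regexp - interpret PATTERNS as extended regular expressions (EREs)

    • -F, --fixed-strings - interpret PATTERNS as fixed strings, not regular expressions

    • -G, --basic-regexp - interpret PATTERNS as basic regular expressions (BREs); this is the default

    • -P, --perl-regexp - interpret PATTERNS as Perl-compatible regular expressions (PCREs)

    • -e PATTERNS, --regexp=PATTERNS - use PATTERNS as the patterns; may be given multiple times to search for several patterns

    • -f FILE, --file=FILE - read patterns from FILE, one pattern per line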

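    • -i, --ignore-case - ignore case distinctions

    • --no-ignore-case - do not ignore case distinctions; this is the default

    • -v, --invert-match - invert the match (select non-matching lines)

    • -w, --word-regexp - select only lines containing matches that form whole words

    • -x, --line-regexp - select only matches that exactly match the whole line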

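    • -c, --count - suppress normal output and print only a count of matching lines for each input file

    • --color[=WHEN], --colour[=WHEN] - when to highlight output in color; WHEN is never, always, or auto

    • -L, --files-without-match - print only the names of files that contain no matching lines

    • -l, --files-with-matches - print only the names of files that contain matching lines

    • -m NUM, --max-count=NUM - stop reading a file after NUM matching lines

    • -o, --only-matching - print only the matched parts of a matching line, each on its own output line

    • -q, --quiet, --silent - print nothing; the exit status alone reports whether a match was found (0) or not (non-zero)

    • -s, --no-messages - suppress error messages about nonexistent or unreadable files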

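    • -b, --byte-offset - print the 0-based byte offset before each line of output

    • -H, --with-filename - print the file name for each match; this is the default when more than one file is searched

    • -h, --no-filename - do not print file names; this is the default when only one file (or only standard input) is searched

    • --label=LABEL - display input coming from standard input as if it came from a file named LABEL

    • -n, --line-number - print the 1-based line number before each line of output

    • -T, --initial-tab - make sure the first character of the actual line content lies on a tab stop

    • -Z, --null - output a zero byte (\0) after each file name instead of the character that normally follows it (usually : or a newline)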

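    • -A NUM, --after-context=NUM - print NUM lines of context after each matching line

    • -B NUM, --before-context=NUM - print NUM lines of context before each matching line

    • -C NUM, -NUM, --context=NUM - print NUM lines of context both before and after each matching line

    • --group-separator=SEP - when -A, -B, or -C is in use, print SEP between groups of lines instead of the default --

    • --no-group-separator - when -A, -B, or -C is in use, do not print a separator between groups of lines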

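    • -a, --text - process a binary file as if it were text; equivalent to --binary-files=text

    • --binary-files=TYPE - treat binary files as being of type TYPE

      • binary - suppress the matching lines and report only that the binary file matches; this is the default

      • text - treat the file as text and match normally

      • without-match - assume the file does not match

    • -D ACTION, --devices=ACTION - what to do when an input file is a device, FIFO, or socket

      • read - read it as if it were an ordinary file; this is the default

      • skip - skip it silently

    • -d ACTION, --directories=ACTION - what to do when an input file is a directory

      • read - read it as if it were an ordinary file; this is the default

      • skip - skip it silently

      • recurse - recursively read all files under the directory

    • --exclude=GLOB - skip files whose name matches the wildcard pattern GLOB

    • --exclude-from=FILE - skip files whose name matches any of the wildcard patterns listed in FILE

    • --exclude-dir=GLOB - skip directories whose name matches the wildcard pattern GLOB

    • -I - process a binary file as if it contained no matching data; equivalent to --binary-files=without-match

    • --include=GLOB - search only files whose name matches the wildcard pattern GLOB

    • -r, --recursive - search directories recursively; equivalent to --directories=recurse

    • -R, --dereference-recursive - search directories recursively and follow all symbolic links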

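    • --line-buffered - use line buffering on output

    • -U, --binary - treat files as binary (has an effect only on MS-DOS and MS-Windows)

    • -z, --null-data - treat a zero byte (\0), rather than a newline (\n), as the line terminator

    • --help - display a usage message

    • -V, --version - display the version number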

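  • PATTERNS - one or more patterns (regular expressions), separated by newlines

  • FILE - the files to search; may be directories

    • If FILE is -, grep reads standard input

    • If FILE is omitted, a recursive search examines the current working directory and a non-recursive search reads standard input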
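
As a rough illustration of how PATTERNS and FILE interact, the sketch below assumes GNU grep and uses made-up sample text and variable names. It passes two patterns with repeated -e, reads standard input through the explicit - operand, and combines -c with -v.

```shell
# Hypothetical sample input; any text would do.
sample='alpha
beta
gamma
-delta'

# Two patterns via repeated -e: a line is selected if it matches either one.
# The trailing "-" names standard input explicitly.
two=$(printf '%s\n' "$sample" | grep -e 'alpha' -e 'gamma' -)

# -e also protects a pattern that begins with "-", which would
# otherwise be parsed as an option.
dash=$(printf '%s\n' "$sample" | grep -e '-delta')

# -c prints the number of selected lines; -v inverts the selection,
# so this counts the lines that do not start with "b".
count=$(printf '%s\n' "$sample" | grep -c -v '^b')

printf '%s\n' "$two" "$dash" "$count"
```

With no FILE operand at all, a non-recursive grep would read standard input in the same way, so the - is optional here and shown only to make the behavior explicit.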

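# Regular Expressions

grep mainly supports three regular-expression flavors:

  • Basic regular expressions (BRE, Basic Regular Expression)

  • Extended regular expressions (ERE, Extended Regular Expression)

  • Perl regular expressions (PCRE, Perl Compatible Regular Expressions)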

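Key differences

Feature                   Basic (BRE)      Extended (ERE)   Perl (PCRE)
Option                    none (default)   -E               -P
Repetition count {n,m}    \{n,m\}          {n,m}            {n,m}
Grouping ()               \(\)             ()               ()
Alternation |             \|               |                |
One or more +             \+               +                +
Zero or one ?             \?               ?                ?
Shorthand class \d        not supported    not supported    supported
Non-greedy matching       not supported    not supported    supported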

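# Basic Regular Expressions

This is the mode grep uses by default. Its guiding principle is to treat characters as ordinary text wherever possible, so several metacharacters take on their special meaning only when escaped with a backslash \.

  • ?, +, {, |, (, ) are treated as ordinary characters in BRE.

  • To use them as regex operators, escape them: \?, \+, \{, \|, \(, \)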
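
The difference can be seen in a small sketch, assuming GNU grep (where \+ and \| are accepted in BRE as extensions); the sample lines and variable names are invented for illustration.

```shell
# In BRE (the default), a bare + is an ordinary character,
# so this selects only the line that literally contains "a+b".
literal=$(printf 'a+b\naab\n' | grep 'a+b')

# Escaped, \+ becomes the "one or more" quantifier:
# one or more "a" followed by "b" selects "aab" instead.
quantified=$(printf 'a+b\naab\n' | grep 'a\+b')

# Grouping and alternation also need backslashes in BRE.
either=$(printf 'cat\ndog\nbird\n' | grep '^\(cat\|dog\)$')

printf '%s\n' "$literal" "$quantified" "$either"
```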

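# Extended Regular Expressions

ERE simplifies the syntax by dropping most of the backslashes: it treats these symbols as operators by default.

  • Enabled with the -E option

  • ?, +, {, |, (, ) act as regex metacharacters directly, with no escaping needed.

  • To use them as ordinary characters, escape them: \?, \+, \{, \|, \(, \)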
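
A short sketch of the same idea under -E, again with invented sample lines: operators work unescaped, and a backslash turns them back into literals.

```shell
# With -E, |, ( ) and { } are operators without backslashes.
either=$(printf 'cat\ndog\nbird\n' | grep -E '^(cat|dog)$')

# A bounded repetition applied to a group: "ab" two or three times.
repeat=$(printf 'ab\nabab\nababab\n' | grep -E '^(ab){2,3}$')

# A backslash makes an operator literal again, so this selects "a+b".
literal=$(printf 'a+b\naab\n' | grep -E 'a\+b')

printf '%s\n' "$either" "$repeat" "$literal"
```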

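# Perl Regular Expressions

This is the most powerful flavor. It covers essentially all ERE functionality and adds advanced features such as lookaround and non-greedy matching.

  • Enabled with the -P option

  • Supports shorthand character classes such as \d (digit), \w (letter, digit, or underscore), and \s (whitespace). GNU grep also accepts \w and \s in BRE and ERE, but \d requires -P.

  • Supports non-greedy matching (append ? to a quantifier).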
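
The sketch below contrasts greedy and non-greedy quantifiers and uses \d with a lookbehind. Because -P only works when grep was built with PCRE support, it probes for that first; the sample strings and variable names are illustrative assumptions.

```shell
# Probe for PCRE support; -P fails on builds compiled without it.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    pcre_ok=yes
    # Greedy: .+ runs on to the last double quote on the line.
    greedy=$(printf 'say "hi" and "bye"\n' | grep -oP '".+"')
    # Non-greedy: .+? stops at the first closing quote it can,
    # so each quoted word is reported separately.
    lazy=$(printf 'say "hi" and "bye"\n' | grep -oP '".+?"')
    # \d is available under -P; the lookbehind keeps "id=" out of the output.
    digits=$(printf 'id=42 name=x\n' | grep -oP '(?<=id=)\d+')
    printf '%s\n' "$greedy" "$lazy" "$digits"
else
    pcre_ok=no
    echo 'grep -P is not available in this build'
fi
```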

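Common regular-expression syntax, listed as character - description. This is a general reference: entries that mention the RegExp object, VBScript, or JScript describe other regex engines rather than grep.

  • \ - Marks the next character as a special character (File Format Escape; see the entries in this list), a literal character (Identity Escape; there are 12 of these: ^$()*+?.[\{|), a back-reference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(".

  • ^ - Matches the start of the input string. If the RegExp object's Multiline property is set, ^ also matches the position after "\n" or "\r".

  • $ - Matches the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before "\n" or "\r".

  • * - Matches the preceding subexpression zero or more times. For example, "zo*" matches "z", "zo", and "zoo". Equivalent to {0,}.

  • + - Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo" but not "z". Equivalent to {1,}.

  • ? - Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" as well as "does". Equivalent to {0,1}.

  • {n} - n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob" but matches the two o's in "food".

  • {n,} - n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".

  • {n,m} - m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that no space is allowed between the comma and the numbers.

  • ? (after a quantifier) - Non-greedy quantifier: when this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), matching becomes non-greedy. Non-greedy matching consumes as little of the searched string as possible, whereas the default greedy matching consumes as much as possible. For example, against the string "oooo", "o+?" matches a single "o", while "o+" matches all of them.

  • . - Matches any single character except "\r" and "\n". To match any character including "\r" and "\n", use a pattern such as "(.|\r|\n)".

  • (pattern) - Matches pattern and captures the matched substring, which can then be used in back-references. The captured matches can be retrieved from the resulting Matches collection; VBScript uses the SubMatches collection and JScript uses the $0…$9 properties. To match parenthesis characters, use "\(" or "\)". May be followed by a quantifier.

  • (?:pattern) - Matches pattern without capturing the matched substring (a shy group); the match is not stored for back-references. This is useful when combining parts of a pattern with the alternation character "(|)". For example, "industr(?:y|ies)" is a more concise expression than "industry|industries".

  • (?=pattern) - Positive lookahead: matches at any position where a string matching pattern begins. This is a non-capturing match, so nothing is stored for later use. For example, "Windows(?=95|98|NT|2000)" matches the "Windows" in "Windows2000" but not the "Windows" in "Windows3.1". Lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters covered by the lookahead.

  • (?!pattern) - Negative lookahead: matches at any position where a string not matching pattern begins. This is a non-capturing match, so nothing is stored for later use. For example, "Windows(?!95|98|NT|2000)" matches the "Windows" in "Windows3.1" but not the "Windows" in "Windows2000". Lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters covered by the lookahead.

  • (?<=pattern) - Positive lookbehind: like positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches the "Windows" in "2000Windows" but not the "Windows" in "3.1Windows".

  • (?<!pattern) - Negative lookbehind: like negative lookahead, but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" matches the "Windows" in "3.1Windows" but not the "Windows" in "2000Windows".

  • x|y - Alternation: matches x or y. When not enclosed in (), its scope is the entire regular expression. For example, "z|food" matches "z" or "food", while "(?:z|f)ood" matches "zood" or "food".

  • [xyz] - Character class. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain". Inside the class only the backslash \ keeps its special meaning, for escaping; other special characters such as the asterisk, the plus sign, and the various brackets are ordinary characters. A caret ^ in first position negates the class; anywhere else it is an ordinary character. A hyphen - in the middle denotes a character range; in first (or last) position it is an ordinary character. A right square bracket should be escaped, or it may appear as the first character.

  • [^xyz] - Negated character class. Matches any character not listed. For example, "[^abc]" matches the "p", "l", "i", and "n" in "plain".

  • [a-z] - Character range. Matches any character within the specified range of the Unicode code table. For example, "[a-z]" matches any lowercase letter from "a" to "z".

  • [^a-z] - Negated character range. Matches any character not within the specified range of the Unicode code table. For example, "[^a-z]" matches any character that is not between "a" and "z".

  • [:name:] - Adds the characters of a named character class to the expression. Valid only inside a bracket expression.

  • [=elt=] - Adds the elements that collate as equivalent to the character "elt" in the current locale. For example, [=a=] might add ä, á, à, ă, ắ, ằ, ẵ, ẳ, â, ấ, ầ, ẫ, ẩ, ǎ, å, ǻ, ä, ǟ, ã, ȧ, ǡ, ą, ā, ả, ȁ, ȃ, ạ, ặ, ậ, ḁ, ⱥ, ᶏ, ɐ, ɑ. Valid only inside a bracket expression.

  • [.elt.] - Adds the collation element elt to the expression. This exists because some collation elements consist of several characters. For example, in the 29-letter Spanish alphabet "CH" is a single letter sorted after C, which produces the ordering "cinco, credo, chispa". Valid only inside a bracket expression.

  • \b - Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".

  • \B - Matches a non-word boundary. "er\B" matches the "er" in "verb" but not the "er" in "never".

  • \cx - Matches the control character indicated by x. The value of x must be in A-Z or a-z; otherwise c is treated as a literal "c". The control character's value is the lowest 5 bits of the value of x (that is, the remainder modulo 32). For example, \cM matches Control-M, the carriage return. \ca is equivalent to \u0001, \cb to \u0002, and so on.

  • \d - Matches a digit character. Equivalent to [0-9]. Note that Unicode regular expressions also match full-width digits.

  • \D - Matches a non-digit character. Equivalent to [^0-9].

  • \f - Matches a form feed. Equivalent to \x0c and \cL.

  • \n - Matches a newline. Equivalent to \x0a and \cJ.

  • \r - Matches a carriage return. Equivalent to \x0d and \cM.

  • \s - Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v]. Note that Unicode regular expressions also match the full-width space.

  • \S - Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

  • \t - Matches a tab. Equivalent to \x09 and \cI.

  • \v - Matches a vertical tab. Equivalent to \x0b and \cK.

  • \w - Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]". Note that Unicode regular expressions also match Chinese characters.

  • \W - Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".

  • \xnn - Hexadecimal escape sequence. Matches the character given by the two hexadecimal digits nn. For example, "\x41" matches "A", while "\x041" is equivalent to "\x04" followed by "1". ASCII codes can be used in regular expressions this way.

  • \num - Back-reference to the substring matched by the num-th parenthesized capture group of the regular expression. num is a positive decimal integer starting at 1; depending on the implementation its upper limit may be 9, 31, 99, or unlimited. For example, "(.)\1" matches two consecutive identical characters.

  • \n (n a digit) - Denotes an octal escape value or a back-reference. If \n is preceded by at least n captured subexpressions, n is a back-reference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.

  • \nm - Denotes an octal escape value or a back-reference. If \nm is preceded by at least nm captured subexpressions, nm is a back-reference. If \nm is preceded by at least n captures, n is a back-reference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.

  • \nml - Three-digit octal escape. If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.

  • \un - Unicode escape sequence, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).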

# Examples

!subtitle:Basic examples

$ grep "error" log.txt                      # search log.txt for lines containing "error"
$ grep -i "error" log.txt                   # ignore case
$ grep -n "error" log.txt                   # show line numbers
$ grep -c "error" log.txt                   # count the matching lines
$ grep -v "error" log.txt                   # search log.txt for lines that do not contain "error"

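!subtitle:Searching directories recursively

$ grep "TODO" -r ./src                      # search ./src recursively for lines containing "TODO"; symbolic links inside it are not followed
$ grep "error" -R /var/log                  # search /var/log recursively for lines containing "error", following symbolic links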
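
A recursive search is often narrowed with --include and --exclude-dir. The following is a minimal sketch assuming GNU grep; the directory tree and file names are hypothetical and are created in a temporary directory only for the demonstration.

```shell
# Build a throwaway tree (all names here are hypothetical).
root=$(mktemp -d)
mkdir -p "$root/src" "$root/build"
printf '// TODO: refactor\n'  > "$root/src/main.c"
printf 'TODO: write docs\n'   > "$root/src/notes.txt"
printf '// TODO: generated\n' > "$root/build/out.c"

# -r descends into directories; --include keeps only *.c files and
# --exclude-dir prunes the build directory. -l lists file names only.
hits=$(cd "$root" && grep -rl --include='*.c' --exclude-dir=build 'TODO' .)
echo "$hits"

rm -rf "$root"
```

Only src/main.c is reported: notes.txt is filtered out by --include and build/out.c by --exclude-dir.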

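!subtitle:Showing context

$ grep "TODO" -r -A 10 ./src                # show each matching line and the 10 lines after it
$ grep "TODO" -r -B 10 ./src                # show each matching line and the 10 lines before it
$ grep "TODO" -r -C 10 ./src                # show each matching line and 10 lines on either side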
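
When context is printed, groups of lines that are not adjacent are separated by a -- line. The sketch below, using an invented log and assuming GNU grep, shows the default separator and how --no-group-separator removes it.

```shell
# Hypothetical log text for the demonstration.
log='line 1
line 2
ERROR first
line 4
line 5
line 6
ERROR second
line 8'

# One line of trailing context per match; the two groups are not
# adjacent, so a "--" line is printed between them.
after=$(printf '%s\n' "$log" | grep -A 1 'ERROR')

# Same search with the separator suppressed.
plain=$(printf '%s\n' "$log" | grep -A 1 --no-group-separator 'ERROR')

printf '%s\n' "$after"
printf '%s\n' "$plain"
```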

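!subtitle:Regular expressions

$ grep "^#" file.txt                        # search for lines that start with #
$ grep -E "error|fail|panic" log.txt        # extended regex; alternation needs no escaping
$ grep -E "[0-9]+" file.txt                 # match runs of decimal digits
$ grep -P "[\w.-]+@[\w.-]+\.\w+" file.txt   # match email addresses with a Perl regex, which supports shorthand classes such as \w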

!subtitle:Extracting information

$ grep -o "[0-9]\+" file.txt                # print only the matched parts
$ grep -oP "(?<=:)\d+" url.txt              # extract port numbers (digits that follow a colon)
$ grep -oP "(?<=user=)\w+" log.txt          # extract the value of user=
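
Where -P is unavailable, a similar extraction can be approximated with plain -o plus a second tool to strip the prefix. This is only a sketch: the log lines are made up, and sed is a substitute for the lookbehind rather than something the examples above use.

```shell
# Hypothetical log lines.
log='user=alice action=login
user=bob action=login
user=alice action=logout'

# -o prints each match on its own line; sed then strips the "user="
# prefix, which avoids needing -P and a lookbehind.
users=$(printf '%s\n' "$log" | grep -o 'user=[a-z]*' | sed 's/^user=//')

# Count distinct values: the empty pattern matches every line,
# so grep -c '' simply counts lines.
unique=$(printf '%s\n' "$users" | sort -u | grep -c '')

printf '%s\n' "$users" "$unique"
```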

# Manual

GREP(1)                          User Commands                         GREP(1)

NAME
       grep, egrep, fgrep, rgrep - print lines that match patterns

SYNOPSIS
       grep [OPTION...] PATTERNS [FILE...]
       grep [OPTION...] -e PATTERNS ... [FILE...]
       grep [OPTION...] -f PATTERN_FILE ... [FILE...]

DESCRIPTION
       grep  searches  for  PATTERNS  in  each  FILE.  PATTERNS is one or more
       patterns separated by newline characters, and  grep  prints  each  line
       that  matches a pattern.  Typically PATTERNS should be quoted when grep
       is used in a shell command.

       A FILE of “-”  stands  for  standard  input.   If  no  FILE  is  given,
       recursive  searches  examine  the  working  directory, and nonrecursive
       searches read standard input.

       Debian also includes the  variant  programs  egrep,  fgrep  and  rgrep.
       These   programs  are  the  same  as  grep -E,  grep -F,  and  grep -r,
       respectively.  These  variants  are  deprecated  upstream,  but  Debian
       provides  for  backward  compatibility.  For portability reasons, it is
       recommended to avoid the  variant  programs,  and  use  grep  with  the
       related option instead.

OPTIONS
   Generic Program Information
       --help Output a usage message and exit.

       -V, --version
              Output the version number of grep and exit.

   Pattern Syntax
       -E, --extended-regexp
              Interpret  PATTERNS  as  extended regular expressions (EREs, see
              below).

       -F, --fixed-strings
              Interpret PATTERNS as fixed strings, not regular expressions.

       -G, --basic-regexp
              Interpret PATTERNS  as  basic  regular  expressions  (BREs,  see
              below).  This is the default.

       -P, --perl-regexp
              Interpret   PATTERNS   as  Perl-compatible  regular  expressions
              (PCREs).  This option is experimental when combined with the  -z
              (--null-data)  option,  and  grep  -P  may warn of unimplemented
              features.

   Matching Control
       -e PATTERNS, --regexp=PATTERNS
              Use PATTERNS as the patterns.  If this option is  used  multiple
              times or is combined with the -f (--file) option, search for all
              patterns  given.   This  option can be used to protect a pattern
              beginning with “-”.

       -f FILE, --file=FILE
              Obtain patterns from FILE, one per line.  If this option is used
              multiple times or is combined with  the  -e  (--regexp)  option,
              search  for  all  patterns  given.  The empty file contains zero
              patterns, and therefore matches nothing.  If FILE is  -  ,  read
              patterns from standard input.

       -i, --ignore-case
              Ignore  case  distinctions  in  patterns and input data, so that
              characters that differ only in case match each other.

       --no-ignore-case
              Do not ignore case distinctions  in  patterns  and  input  data.
              This is the default.  This option is useful for passing to shell
              scripts  that  already use -i, to cancel its effects because the
              two options override each other.

       -v, --invert-match
              Invert the sense of matching, to select non-matching lines.

       -w, --word-regexp
              Select only those  lines  containing  matches  that  form  whole
              words.   The  test is that the matching substring must either be
              at the  beginning  of  the  line,  or  preceded  by  a  non-word
              constituent  character.  Similarly, it must be either at the end
              of the line or followed by  a  non-word  constituent  character.
              Word-constituent   characters   are  letters,  digits,  and  the
              underscore.  This option has no effect if -x is also specified.

       -x, --line-regexp
              Select only those matches that exactly  match  the  whole  line.
              For  a  regular  expression pattern, this is like parenthesizing
              the pattern and then surrounding it with ^ and $.

   General Output Control
       -c, --count
              Suppress normal output; instead print a count of matching  lines
              for  each  input  file.  With the -v, --invert-match option (see
              above), count non-matching lines.

       --color[=WHEN], --colour[=WHEN]
              Surround  the  matched  (non-empty)  strings,  matching   lines,
              context  lines,  file  names,  line  numbers,  byte offsets, and
              separators (for fields and groups of context lines) with  escape
              sequences  to display them in color on the terminal.  The colors
              are defined by the environment variable  GREP_COLORS.   WHEN  is
              never, always, or auto.

       -L, --files-without-match
              Suppress  normal  output;  instead  print the name of each input
              file from which no output would normally have been printed.

       -l, --files-with-matches
              Suppress normal output; instead print the  name  of  each  input
              file  from  which  output  would  normally  have  been  printed.
              Scanning each input file stops upon first match.

       -m NUM, --max-count=NUM
              Stop reading a file after NUM matching lines.  If NUM  is  zero,
              grep  stops  right  away  without reading input.  A NUM of -1 is
              treated as infinity and grep does not stop; this is the default.
              If the input is standard input from  a  regular  file,  and  NUM
              matching  lines are output, grep ensures that the standard input
              is positioned to  just  after  the  last  matching  line  before
              exiting,  regardless  of the presence of trailing context lines.
              This enables a calling process to resume a  search.   When  grep
              stops  after NUM matching lines, it outputs any trailing context
              lines.  When the -c or --count option is also  used,  grep  does
              not   output   a  count  greater  than  NUM.   When  the  -v  or
              --invert-match option is also used, grep stops after  outputting
              NUM non-matching lines.

       -o, --only-matching
              Print  only  the  matched  (non-empty) parts of a matching line,
              with each such part on a separate output line.

       -q, --quiet, --silent
              Quiet;  do  not  write  anything  to  standard   output.    Exit
              immediately  with  zero status if any match is found, even if an
              error was detected.  Also see the -s or --no-messages option.

       -s, --no-messages
              Suppress error messages about nonexistent or unreadable files.

   Output Line Prefix Control
       -b, --byte-offset
              Print the 0-based byte offset within the input file before  each
              line of output.  If -o (--only-matching) is specified, print the
              offset of the matching part itself.

       -H, --with-filename
              Print  the  file  name for each match.  This is the default when
              there is more than one file to search.  This is a GNU extension.

       -h, --no-filename
              Suppress the prefixing of file names on  output.   This  is  the
              default  when there is only one file (or only standard input) to
              search.

       --label=LABEL
              Display input actually  coming  from  standard  input  as  input
              coming  from  file  LABEL.  This can be useful for commands that
              transform a file's contents before  searching,  e.g.,  gzip  -cd
              foo.gz  |  grep  --label=foo -H 'some pattern'.  See also the -H
              option.

       -n, --line-number
              Prefix each line of output with the 1-based line  number  within
              its input file.

       -T, --initial-tab
              Make  sure  that the first character of actual line content lies
              on a tab stop, so that the alignment of tabs looks normal.  This
              is useful with options that prefix their output  to  the  actual
              content:  -H,-n,  and  -b.   In order to improve the probability
              that lines from a single file will all start at the same column,
              this also causes the line number and byte offset (if present) to
              be printed in a minimum size field width.

       -Z, --null
              Output a zero byte (the ASCII  NUL  character)  instead  of  the
              character  that normally follows a file name.  For example, grep
              -lZ outputs a zero byte after each  file  name  instead  of  the
              usual  newline.   This option makes the output unambiguous, even
              in the presence of file names containing unusual characters like
              newlines.  This option can  be  used  with  commands  like  find
              -print0,  perl  -0,  sort  -z, and xargs -0 to process arbitrary
              file names, even those that contain newline characters.
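
The pairing of -Z with xargs -0 described above can be sketched as follows. The temporary directory and file names are hypothetical, and GNU grep and an xargs that supports -0 are assumed.

```shell
# Throwaway files, one with a space in its name (names are hypothetical).
dir=$(mktemp -d)
printf 'needle\n' > "$dir/plain.txt"
printf 'needle\n' > "$dir/with space.txt"
printf 'hay\n'    > "$dir/other.txt"

# -l lists matching files; -Z ends each name with a NUL byte instead of
# a newline, so xargs -0 passes "with space.txt" as a single argument.
names=$(grep -lZ 'needle' "$dir"/*.txt | xargs -0 -n 1 basename)

rm -rf "$dir"
printf '%s\n' "$names"
```

Without -Z and -0, xargs would split the second name at the space and hand basename two separate arguments.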

   Context Line Control
       -A NUM, --after-context=NUM
              Print NUM  lines  of  trailing  context  after  matching  lines.
              Places   a  line  containing  a  group  separator  (--)  between
              contiguous groups of matches.  With the  -o  or  --only-matching
              option, this has no effect and a warning is given.

       -B NUM, --before-context=NUM
              Print  NUM  lines  of  leading  context  before  matching lines.
              Places  a  line  containing  a  group  separator  (--)   between
              contiguous  groups  of  matches.  With the -o or --only-matching
              option, this has no effect and a warning is given.

       -C NUM, -NUM, --context=NUM
              Print NUM lines of output context.  Places a line  containing  a
              group separator (--) between contiguous groups of matches.  With
              the  -o  or  --only-matching  option,  this  has no effect and a
              warning is given.

       --group-separator=SEP
              When -A, -B, or -C are in use, print SEP instead of  --  between
              groups of lines.

       --no-group-separator
              When  -A, -B, or -C are in use, do not print a separator between
              groups of lines.

   File and Directory Selection
       -a, --text
              Process a binary file as if it were text; this is equivalent  to
              the --binary-files=text option.

       --binary-files=TYPE
              If  a  file's  data  or metadata indicate that the file contains
              binary data, assume that the file is  of  type  TYPE.   Non-text
              bytes  indicate  binary data; these are either output bytes that
              are improperly encoded for the current  locale,  or  null  input
              bytes when the -z option is not given.

              By  default,  TYPE  is  binary, and grep suppresses output after
              null input binary data  is  discovered,  and  suppresses  output
              lines that contain improperly encoded data.  When some output is
              suppressed,  grep  follows any output with a message to standard
              error saying that a binary file matches.

              If TYPE is without-match, when grep discovers null input  binary
              data  it  assumes that the rest of the file does not match; this
              is equivalent to the -I option.

              If TYPE is text, grep processes a binary  file  as  if  it  were
              text; this is equivalent to the -a option.

              When  type  is  binary,  grep  may  treat non-text bytes as line
              terminators even without the -z  option.   This  means  choosing
              binary  versus text can affect whether a pattern matches a file.
              For example, when type is binary the pattern q$  might  match  q
              immediately  followed  by  a  null byte, even though this is not
              matched when type is text.  Conversely, when type is binary  the
              pattern . (period) might not match a null byte.

              Warning:  The  -a  option might output binary garbage, which can
              have nasty side effects if the output is a terminal and  if  the
              terminal driver interprets some of it as commands.  On the other
              hand,  when  reading  files whose text encodings are unknown, it
              can  be  helpful  to  use  -a  or  to  set  LC_ALL='C'  in   the
              environment,  in  order to find more matches even if the matches
              are unsafe for direct display.

       -D ACTION, --devices=ACTION
              If an input file is a device, FIFO  or  socket,  use  ACTION  to
              process  it.   By  default,  ACTION  is  read,  which means that
              devices are read just as if they were ordinary files.  If ACTION
              is skip, devices are silently skipped.

       -d ACTION, --directories=ACTION
              If an input file is a directory, use ACTION to process  it.   By
              default,  ACTION is read, i.e., read directories just as if they
              were  ordinary  files.   If  ACTION  is  skip,   silently   skip
              directories.   If  ACTION  is recurse, read all files under each
              directory, recursively, following symbolic links  only  if  they
              are on the command line.  This is equivalent to the -r option.

       --exclude=GLOB
              Skip  any  command-line file with a name suffix that matches the
              pattern GLOB, using wildcard matching; a name suffix  is  either
              the  whole name, or a trailing part that starts with a non-slash
              character immediately after a  slash  (/)  in  the  name.   When
              searching  recursively, skip any subfile whose base name matches
              GLOB; the base name is the part after the last slash.  A pattern
              can use *, ?, and [...] as wildcards, and \ to quote a  wildcard
              or backslash character literally.

       --exclude-from=FILE
              Skip  files  whose  base name matches any of the file-name globs
              read from FILE  (using  wildcard  matching  as  described  under
              --exclude).

       --exclude-dir=GLOB
              Skip  any command-line directory with a name suffix that matches
              the  pattern  GLOB.   When  searching  recursively,   skip   any
              subdirectory whose base name matches GLOB.  Ignore any redundant
              trailing slashes in GLOB.

       -I     Process  a  binary  file as if it did not contain matching data;
              this is equivalent to the --binary-files=without-match option.

       --include=GLOB
              Search only files whose base name matches GLOB  (using  wildcard
              matching   as  described  under  --exclude).   If  contradictory
              --include and --exclude options are given, the last matching one
              wins.  If no --include or --exclude options  match,  a  file  is
              included unless the first such option is --include.

       -r, --recursive
              Read  all  files  under  each  directory, recursively, following
              symbolic links only if they are on the command line.  Note  that
              if   no  file  operand  is  given,  grep  searches  the  working
              directory.  This is equivalent to the -d recurse option.

       -R, --dereference-recursive
              Read all files under each directory,  recursively.   Follow  all
              symbolic links, unlike -r.

   Other Options
       --line-buffered
              Use  line  buffering  on  output.   This can cause a performance
              penalty.

       -U, --binary
              Treat the file(s) as binary.  By default, under MS-DOS  and  MS-
              Windows,  grep  guesses  whether  a  file  is  text or binary as
              described for the --binary-files option.  If  grep  decides  the
              file  is  a  text  file,  it  strips  the CR characters from the
              original file contents (to make regular expressions with ^ and $
              work  correctly).   Specifying  -U  overrules  this   guesswork,
              causing  all  files  to  be  read  and  passed  to  the matching
              mechanism verbatim; if the file is a text file with CR/LF  pairs
              at   the  end  of  each  line,  this  will  cause  some  regular
              expressions to fail.  This option has  no  effect  on  platforms
              other than MS-DOS and MS-Windows.

       -z, --null-data
              Treat  input  and  output  data  as  sequences  of  lines,  each
              terminated by a zero byte (the ASCII NUL character) instead of a
              newline.  Like the -Z or --null option, this option can be  used
              with commands like sort -z to process arbitrary file names.

REGULAR EXPRESSIONS
       A  regular  expression  is  a  pattern that describes a set of strings.
       Regular  expressions  are   constructed   analogously   to   arithmetic
       expressions, by using various operators to combine smaller expressions.

       grep understands three different versions of regular expression syntax:
       “basic”  (BRE), “extended” (ERE) and “perl” (PCRE).  In GNU grep, basic
       and extended regular expressions are merely different notations for the
       same pattern-matching functionality.  In other  implementations,  basic
       regular  expressions are ordinarily less powerful than extended, though
       occasionally it is the other way  around.   The  following  description
       applies  to extended regular expressions; differences for basic regular
       expressions  are  summarized   afterwards.    Perl-compatible   regular
       expressions   have  different  functionality,  and  are  documented  in
       pcre2syntax(3) and pcre2pattern(3), but work only if  PCRE  support  is
       enabled.

       The  fundamental building blocks are the regular expressions that match
       a single character.  Most characters, including all letters and digits,
       are regular expressions that match themselves.  Any meta-character with
       special meaning may be quoted by preceding it with a backslash.

       The period . matches any single character.  It is  unspecified  whether
       it matches an encoding error.

   Character Classes and Bracket Expressions
       A  bracket  expression is a list of characters enclosed by [ and ].  It
       matches any single character in that list.  If the first  character  of
       the  list is the caret ^ then it matches any character not in the list;
       it is unspecified whether it matches an encoding error.   For  example,
       the regular expression [0123456789] matches any single digit.

       Within  a  bracket  expression,  a  range  expression  consists  of two
       characters separated by a hyphen.  It matches any single character that
       sorts  between  the  two  characters,  inclusive,  using  the  locale's
       collating  sequence  and  character set.  For example, in the default C
       locale, [a-d] is equivalent to [abcd].  Many locales sort characters in
       dictionary  order,  and  in  these  locales  [a-d]  is  typically   not
       equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example.
       To  obtain  the  traditional interpretation of bracket expressions, you
       can use the C locale by setting the LC_ALL environment variable to  the
       value C.

       Finally,  certain  named  classes  of  characters are predefined within
       bracket expressions, as follows.  Their names are self explanatory, and
       they  are  [:alnum:],  [:alpha:],  [:blank:],   [:cntrl:],   [:digit:],
       [:graph:],  [:lower:],  [:print:], [:punct:], [:space:], [:upper:], and
       [:xdigit:].  For example, [[:alnum:]]  means  the  character  class  of
       numbers  and  letters in the current locale.  In the C locale and ASCII
       character set encoding, this is the same as  [0-9A-Za-z].   (Note  that
       the  brackets  in these class names are part of the symbolic names, and
       must be included in addition to the  brackets  delimiting  the  bracket
       expression.)   Most  meta-characters  lose their special meaning inside
       bracket expressions.  To include a literal ]  place  it  first  in  the
       list.   Similarly,  to include a literal ^ place it anywhere but first.
       Finally, to include a literal - place it last.

   Anchoring
       The caret ^ and the dollar sign $ are meta-characters that respectively
       match the empty string at the beginning and end of a line.

   The Backslash Character and Special Expressions
       The symbols \< and \>  respectively  match  the  empty  string  at  the
       beginning and end of a word.  The symbol \b matches the empty string at
       the  edge  of a word, and \B matches the empty string provided it's not
       at the edge of a word.  The symbol \w is a synonym for [_[:alnum:]] and
       \W is a synonym for [^_[:alnum:]].
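
The word-edge symbols can be compared side by side in a small sketch, assuming GNU grep; the sample lines are made up. Note that the underscore counts as a word constituent, so "cat_food" does not contain "cat" as a whole word.

```shell
# Hypothetical sample lines.
text='cat
concatenate
the cat sat
cat_food'

# \< and \> anchor at the beginning and end of a word.
whole=$(printf '%s\n' "$text" | grep '\<cat\>')

# -w applies a similar whole-word test to the entire pattern.
same=$(printf '%s\n' "$text" | grep -w 'cat')

# \B requires that the position is not at the edge of a word,
# so this finds "cat" only when it is buried inside a longer word.
inside=$(printf '%s\n' "$text" | grep '\Bcat\B')

printf '%s\n' "$whole" "$inside"
```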

   Repetition
       A regular expression may be  followed  by  one  of  several  repetition
       operators:
       ?      The preceding item is optional and matched at most once.
       *      The preceding item will be matched zero or more times.
       +      The preceding item will be matched one or more times.
       {n}    The preceding item is matched exactly n times.
       {n,}   The preceding item is matched n or more times.
       {,m}   The  preceding  item  is matched at most m times.  This is a GNU
              extension.
       {n,m}  The preceding item is matched at least n  times,  but  not  more
              than m times.

   Concatenation
       Two  regular  expressions  may  be  concatenated; the resulting regular
       expression matches any string formed by  concatenating  two  substrings
       that respectively match the concatenated expressions.

   Alternation
       Two  regular  expressions  may  be  joined by the infix operator |; the
       resulting  regular  expression  matches  any  string  matching   either
       alternate expression.

   Precedence
       Repetition  takes  precedence  over  concatenation, which in turn takes
       precedence over alternation.  A whole expression  may  be  enclosed  in
       parentheses   to   override   these   precedence   rules   and  form  a
       subexpression.
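
       To see how precedence changes what a pattern means, the sketch below
       compares patterns with and without parentheses.  It assumes GNU grep
       with -E and -x; the helper full_matches and the sample lines are
       hypothetical.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'ab' 'cd' 'abd' 'acd' 'abb' 'abab')

# Print the sample lines that the pattern matches in full.
full_matches() {
  printf '%s\n' "$sample" | grep -Ex "$1"
}

# Alternation binds loosest: "ab" or "cd".
alt_loose=$(full_matches 'ab|cd')        # ab, cd
# Parentheses limit the alternation to the middle character.
alt_grouped=$(full_matches 'a(b|c)d')    # abd, acd
# Repetition binds tightest: * applies to "b" only.
rep_item=$(full_matches 'ab*')           # ab, abb
# Parentheses make * apply to the whole "ab".
rep_group=$(full_matches '(ab)*')        # ab, abab

printf '%s\n' "$alt_loose" "$alt_grouped" "$rep_item" "$rep_group"
```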

   Back-references and Subexpressions
       The back-reference \n, where n is a single digit, matches the substring
       previously matched  by  the  nth  parenthesized  subexpression  of  the
       regular expression.
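
       As an illustrative sketch (GNU grep assumed, sample lines invented),
       \1 can be used to find a repeated character or a repeated word:

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'hello' 'world' 'bye bye' 'abc abd')

# (.)\1 matches any character immediately followed by itself.
doubled=$(printf '%s\n' "$sample" | grep -E '(.)\1')        # hello

# (.+) \1 with -x matches a line made of the same text twice, separated by a space.
repeated=$(printf '%s\n' "$sample" | grep -Ex '(.+) \1')    # bye bye

printf '%s\n' "$doubled" "$repeated"
```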

   Basic vs Extended Regular Expressions
       In  basic  regular expressions the meta-characters ?, +, {, |, (, and )
       lose their special meaning; instead use the  backslashed  versions  \?,
       \+, \{, \|, \(, and \).
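
       The difference can be seen by running the same pattern with and
       without -E.  This is a sketch assuming GNU grep; the sample lines are
       made up.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'a+b' 'aab' 'ab')

# Basic syntax: + is an ordinary character, \+ is the repetition operator.
bre_plain=$(printf '%s\n' "$sample" | grep 'a+b')       # a+b
bre_escaped=$(printf '%s\n' "$sample" | grep 'a\+b')    # aab, ab

# Extended syntax: the roles are swapped.
ere_plain=$(printf '%s\n' "$sample" | grep -E 'a+b')    # aab, ab
ere_escaped=$(printf '%s\n' "$sample" | grep -E 'a\+b') # a+b

printf '%s\n' "$bre_plain" "$bre_escaped" "$ere_plain" "$ere_escaped"
```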

EXIT STATUS
       Normally the exit status is 0 if a line is selected, 1 if no lines were
       selected, and 2 if an error occurred.  However, if the -q, --quiet, or
       --silent option is used and a line is selected, the exit status is 0
       even if an error occurred.
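
       A minimal sketch of checking the three exit statuses from a script.
       The path /nonexistent/grep-demo-file is an assumption chosen so that
       the file does not exist; -s hides the resulting error message.

```shell
# Status 0: at least one line is selected.
found=0
printf 'alpha\nbeta\n' | grep -q 'alpha' || found=$?

# Status 1: no line is selected.
missing=0
printf 'alpha\nbeta\n' | grep -q 'gamma' || missing=$?

# Status 2: an error occurred (the file cannot be opened).
failed=0
grep -s 'alpha' /nonexistent/grep-demo-file || failed=$?

printf 'found=%s missing=%s failed=%s\n' "$found" "$missing" "$failed"
# prints: found=0 missing=1 failed=2
```

       Capturing the status with "|| var=$?" rather than reading $? on the
       next line keeps the sketch usable in scripts that run under set -e.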

ENVIRONMENT
       The  behavior  of  grep  is  affected  by  the  following   environment
       variables.

       The  locale  for  category  LC_foo  is specified by examining the three
       environment variables LC_ALL, LC_foo, LANG, in that order.   The  first
       of  these  variables that is set specifies the locale.  For example, if
       LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then the  Brazilian
       Portuguese  locale  is used for the LC_MESSAGES category.  The C locale
       is used if none of these environment variables are set, if  the  locale
       catalog  is  not  installed,  or if grep was not compiled with national
       language support (NLS).  The shell command locale -a lists locales that
       are currently available.

       GREP_COLORS
              Controls how the --color option highlights output.  Its value is
              a  colon-separated  list  of  capabilities  that   defaults   to
              ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36  with  the  rv
              and ne boolean capabilities omitted  (i.e.,  false).   Supported
              capabilities are as follows.

              sl=    SGR  substring  for  whole selected lines (i.e., matching
                     lines when the -v command-line option is omitted, or non-
                     matching lines when -v is  specified).   If  however  the
                     boolean  rv capability and the -v command-line option are
                     both specified, it  applies  to  context  matching  lines
                     instead.   The  default  is  empty  (i.e., the terminal's
                     default color pair).

              cx=    SGR substring for whole context lines (i.e., non-matching
                     lines when the -v  command-line  option  is  omitted,  or
                     matching  lines  when  -v  is specified).  If however the
                     boolean rv capability and the -v command-line option  are
                     both specified, it applies to selected non-matching lines
                     instead.   The  default  is  empty  (i.e., the terminal's
                     default color pair).

              rv     Boolean value that reverses (swaps) the meanings  of  the
                     sl=  and cx= capabilities when the -v command-line option
                     is specified.  The default is false (i.e., the capability
                     is omitted).

              mt=01;31
                     SGR substring for matching non-empty text in any matching
                     line (i.e., a selected  line  when  the  -v  command-line
                     option   is  omitted,  or  a  context  line  when  -v  is
                     specified).  Setting this is equivalent to  setting  both
                     ms=  and mc= at once to the same value.  The default is a
                     bold  red  text  foreground   over   the   current   line
                     background.

              ms=01;31
                     SGR  substring  for matching non-empty text in a selected
                     line.  (This is only used when the -v command-line option
                     is omitted.)  The effect  of  the  sl=  (or  cx=  if  rv)
                     capability  remains  active  when  this  kicks  in.   The
                     default is a bold red text foreground  over  the  current
                     line background.

              mc=01;31
                     SGR  substring  for  matching non-empty text in a context
                     line.  (This is only used when the -v command-line option
                     is specified.)  The effect of the  cx=  (or  sl=  if  rv)
                     capability  remains  active  when  this  kicks  in.   The
                     default is a bold red text foreground  over  the  current
                     line background.

              fn=35  SGR  substring for file names prefixing any content line.
                     The  default  is  a  magenta  text  foreground  over  the
                     terminal's default background.

              ln=32  SGR  substring  for  line  numbers  prefixing any content
                     line.  The default is a green text  foreground  over  the
                     terminal's default background.

              bn=32  SGR  substring  for  byte  offsets  prefixing any content
                     line.  The default is a green text  foreground  over  the
                     terminal's default background.

              se=36  SGR  substring  for  separators that are inserted between
                     selected line fields (:), between context line fields (-),
                     and between groups of adjacent lines when nonzero context
                     is specified (--).  The default is a cyan text foreground
                     over the terminal's default background.

              ne     Boolean  value  that prevents clearing to the end of line
                     using Erase in Line (EL) to Right  (\33[K)  each  time  a
                     colorized  item  ends.   This  is  needed on terminals on
                     which EL is not supported.  It  is  otherwise  useful  on
                     terminals  for  which  the back_color_erase (bce) boolean
                     terminfo capability  does  not  apply,  when  the  chosen
                     highlight colors do not affect the background, or when EL
                     is  too  slow or causes too much flicker.  The default is
                     false (i.e., the capability is omitted).

              Note that boolean capabilities have  no  =...  part.   They  are
              omitted (i.e., false) by default and become true when specified.

              See   the   Select   Graphic  Rendition  (SGR)  section  in  the
              documentation of the text terminal that is  used  for  permitted
              values   and  their  meaning  as  character  attributes.   These
              substring values are integers in decimal representation and  can
              be  concatenated with semicolons.  grep takes care of assembling
              the result into a  complete  SGR  sequence  (\33[...m).   Common
              values to concatenate include 1 for bold, 4 for underline, 5 for
              blink,  7 for inverse, 39 for default foreground color, 30 to 37
              for foreground colors, 90 to 97  for  16-color  mode  foreground
              colors,  38;5;0  to  38;5;255  for  88-color and 256-color modes
              foreground colors, 49 for default background color, 40 to 47 for
              background colors, 100  to  107  for  16-color  mode  background
              colors,  and 48;5;0 to 48;5;255 for 88-color and 256-color modes
              background colors.
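
       As a hedged illustration, the sketch below overrides ms= so that
       matched text is bold green instead of bold red, and then adds the ne
       capability.  It assumes GNU grep; --color=always is used so that the
       escape sequences are emitted even when the output is captured.

```shell
# Matched text in bold green (01;32) instead of the default bold red.
green=$(printf 'hello world\n' | GREP_COLORS='ms=01;32' grep --color=always 'world')

# Same color, plus the boolean ne capability: no Erase in Line (\33[K) sequences.
no_el=$(printf 'hello world\n' | GREP_COLORS='ms=01;32:ne' grep --color=always 'world')

# od makes the escape sequences visible instead of letting the terminal interpret them.
printf '%s\n' "$green" | od -c
printf '%s\n' "$no_el" | od -c
```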

       LC_ALL, LC_COLLATE, LANG
              These variables specify the locale for the LC_COLLATE  category,
              which  determines the collating sequence used to interpret range
              expressions like [a-z].
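
       Because the meaning of a range depends on the collating sequence, a
       script that needs plain ASCII ranges can set the locale for a single
       command.  A minimal sketch, assuming the C locale is what the script
       wants:

```shell
# In the C locale, [a-z] covers exactly the ASCII lowercase letters.
lower=$(printf '%s\n' 'abc' 'ABC' | LC_ALL=C grep '[a-z]')

printf '%s\n' "$lower"   # prints: abc
```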

       LC_ALL, LC_CTYPE, LANG
              These variables specify the locale for  the  LC_CTYPE  category,
              which  determines the type of characters, e.g., which characters
              are whitespace.  This category  also  determines  the  character
              encoding,  that  is, whether text is encoded in UTF-8, ASCII, or
              some other encoding.  In the C or POSIX locale,  all  characters
              are  encoded  as  a  single  byte  and  every  byte  is  a valid
              character.
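
       The last point can be sketched as follows.  The byte 0xE9 (octal 351)
       is not valid UTF-8 on its own, but in the C locale it is an ordinary
       single-byte character that "." can match.  The sample input is made
       up.

```shell
# "caf" followed by the single byte 0xE9, then a newline.
# -c prints only the count, so the raw byte is never written to the terminal.
count=$(printf 'caf\351\n' | LC_ALL=C grep -c '^caf.$')

printf '%s\n' "$count"   # prints: 1
```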

       LC_ALL, LC_MESSAGES, LANG
              These variables specify the locale for the LC_MESSAGES category,
              which determines the language that grep uses for messages.   The
              default C locale uses American English messages.

       POSIXLY_CORRECT
              If  set, grep behaves as POSIX requires; otherwise, grep behaves
              more like other GNU programs.  POSIX requires that options  that
              follow  file  names  must  be treated as file names; by default,
              such options are permuted to the front of the operand  list  and
              are  treated as options.  Also, POSIX requires that unrecognized
              options be diagnosed as “illegal”, but since they are not really
              against the law the default is to diagnose them as “invalid”.
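
       The effect on option ordering can be sketched as below.  The temporary
       directory and the file name demo.txt are hypothetical; GNU grep and
       mktemp are assumed to be available.

```shell
# Work in a scratch directory so that no file named "-i" can exist.
tmpdir=$(mktemp -d)
printf 'Hello\n' > "$tmpdir/demo.txt"

# Default behavior: the trailing -i is moved to the front and acts as an option.
default_out=$(cd "$tmpdir" && grep hello demo.txt -i)

# With POSIXLY_CORRECT set, "-i" after the file name is treated as a file name.
# The search is then case-sensitive and the file "-i" does not exist.
posix_status=0
(cd "$tmpdir" && POSIXLY_CORRECT=1 grep hello demo.txt -i >/dev/null 2>&1) || posix_status=$?

rm -r "$tmpdir"

printf 'default_out=%s posix_status=%s\n' "$default_out" "$posix_status"
# prints: default_out=Hello posix_status=2
```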

NOTES
       This man page is maintained only fitfully; the  full  documentation  is
       often more up-to-date.

COPYRIGHT
       Copyright 1998-2000, 2002, 2005-2023 Free Software Foundation, Inc.

       This is free software; see the source for copying conditions.  There is
       NO  warranty;  not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.

BUGS
   Reporting Bugs
       Email bug reports to the bug-reporting address ⟨bug-grep@gnu.org⟩.   An
       email  archive  ⟨https://lists.gnu.org/mailman/listinfo/bug-grep⟩ and a
       bug  tracker   ⟨https://debbugs.gnu.org/cgi/pkgreport.cgi?package=grep⟩
       are available.

   Known Bugs
       Large  repetition  counts  in the {n,m} construct may cause grep to use
       lots of memory.  In addition, certain other obscure regular expressions
       require exponential time and space, and may cause grep to  run  out  of
       memory.

       Back-references are very slow, and may require exponential time.

EXAMPLE
       The  following  example  outputs  the location and contents of any line
       containing “f” and ending in “.c”, within all files in the current
       directory whose names contain “g” and end in “.h”.  The -n option outputs
       line  numbers,  the  --  argument treats expansions of “*g*.h” starting
       with “-” as file names not options, and the empty file /dev/null causes
       file names to be output even if only one file name happens to be of the
       form “*g*.h”.

         $ grep -n -- 'f.*\.c$' *g*.h /dev/null
         argmatch.h:1:/* definitions and prototypes for argmatch.c

       The only line that matches is line 1 of argmatch.h.  Note that the
       regular expression syntax used in the pattern differs from the globbing
       syntax that the shell uses to match file names.

SEE ALSO
   Regular Manual Pages
       awk(1),  cmp(1),  diff(1), find(1), perl(1), sed(1), sort(1), xargs(1),
       read(2),  pcre2(3),   pcre2syntax(3),   pcre2pattern(3),   terminfo(5),
       glob(7), regex(7)

   Full Documentation
       A complete manual ⟨https://www.gnu.org/software/grep/manual/⟩ is
       available.  If the info and grep programs are properly installed at your
       site, the command

              info grep

       should give you access to the complete manual.

GNU grep 3.11                     2019-12-29                           GREP(1)
Article last updated: 2026-03-06 09:52:31; created: 2026-03-06 09:52:31