What are the default regexps for `git diff --word-diff`?

Time: 2015-05-24 21:12:49

Tags: git git-diff word-diff

git diff has an option, --word-diff-regex=<...>, that defines what counts as a word. There are special default values for some languages (as mentioned in man 5 gitattributes). But what are they? The docs don't describe them, and I looked through the git sources and couldn't find them either.

Any ideas?

EDIT: I'm on git 1.9.1, but I'll accept answers for any version.

2 answers:

Answer 0 (score: 3)

The sources contain the default word regexes in the userdiff.c file. The PATTERNS and IPATTERN macros take the base word regex as their third parameter and append "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+". This ensures that every non-whitespace character that isn't part of a larger word is treated as a word by itself and, assuming UTF-8, that multi-byte characters are not split up. For example, in:

PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
         "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+"),

the word regex is "\\\\[a-zA-Z@]+|\\\\.|[a-zA-Z0-9\x80-\xff]+|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+".

In this case, the |[\xc0-\xff][\x80-\xbf]+ happens not to have any benefit, as everything covered by [\xc0-\xff][\x80-\xbf]+ is already covered by [a-zA-Z0-9\x80-\xff]+, but it doesn't cause any harm either.

Answer 1 (score: 1)

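A list of the predefined diff drivers (all of which have predefined word-diff regexes) is given in the docs for .gitattributes. They further state that "you still need to enable this with the attribute mechanism, via .gitattributes".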

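So, to activate the tex pattern shown in hvd's answer for all *.tex files, you can issue the following command in your project root (omit the quotes under Windows):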

echo '*.tex diff=tex' >> .gitattributes