Regex Supplement

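Zero-width assertions (lookahead and lookbehind), their negative forms, backreferences, and balancing groups are advanced regular expression features for more complex matching needs. This post describes what each one does and how to use it in Python.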

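1. Zero-width assertions (lookahead)

Positive lookahead

Matches a position that is immediately followed by the given pattern, without including that pattern in the result.

Syntax:

(?=pattern)

Example:

Extract every word that is immediately followed by a digit: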

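import re

text = "apple 123, banana456, cherry 789."
# Match a run of letters immediately followed by a digit.
# [A-Za-z]+ is used instead of \w+ because \w also matches digits,
# which would let the match swallow part of the number.
pattern = r'\b[A-Za-z]+(?=\d)'
matches = re.findall(pattern, text)
print(matches)  # Output: ['banana']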
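
To make the "zero-width" part concrete, here is a minimal sketch showing that the lookahead is checked but never becomes part of the match. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample string, chosen only for illustration.
css = "width: 100px; height: 50em; margin: 8px"

# \d+ consumes the digits; (?=px) only checks that "px" follows.
px_values = re.findall(r'\d+(?=px)', css)
print(px_values)  # ['100', '8']

# The reported span covers the digits only, not the "px" suffix.
m = re.search(r'\d+(?=px)', css)
print(m.group(), m.span())  # 100 (7, 10)
```

Because the lookahead consumes nothing, the text it inspected is still available to the next match attempt.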

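Negative lookahead

Matches a position that is not followed by the given pattern.

Syntax:

(?!pattern)

The exclamation mark signals negation.

Example:

Extract every word that is not followed by a digit: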

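import re

text = "apple 123, banana456, cherry 789."
# Match a run of letters that is not followed by a letter or a digit.
# Excluding letters in the lookahead stops the engine from backtracking
# to a shorter match such as "banan".
pattern = r'\b[A-Za-z]+(?![A-Za-z\d])'
matches = re.findall(pattern, text)
print(matches)  # Output: ['apple', 'cherry']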
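
A common use of negative lookahead is to reject whole strings that match some condition. The sketch below assumes a hypothetical list of file names and keeps those that do not end in ".tmp".

```python
import re

# Hypothetical file names, used only for illustration.
names = ["report.txt", "cache.tmp", "notes.md", "draft.tmp"]

# (?!.*\.tmp$) is tested at the start of the string and fails
# for any name that ends in ".tmp".
not_tmp = re.compile(r'^(?!.*\.tmp$).+$')

kept = [name for name in names if not_tmp.match(name)]
print(kept)  # ['report.txt', 'notes.md']
```

Placing the lookahead right after ^ means the condition is evaluated once against the whole string rather than at every position.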

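2. Lookbehind assertions

Positive lookbehind

Matches a position that is immediately preceded by the given pattern.

Syntax:

(?<=pattern)

Example:

Here the condition is an "@" immediately before the match: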

import re

# Placeholder addresses for illustration
text = "Contact us at support@example.com or sales@company.com."
# Match the word characters immediately after "@"
pattern = r'(?<=@)\w+'
matches = re.findall(pattern, text)
print(matches)  # Output: ['example', 'company']
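
One practical detail is that the standard re module requires a lookbehind pattern to have a fixed width. The following sketch, using a made-up price string, shows a working fixed-width lookbehind and what happens when the width can vary.

```python
import re

# Hypothetical sample string, chosen only for illustration.
text = "Total: $42.50, discount: $5, shipping: 7"

# (?<=\$) requires a dollar sign right before the number
# but does not include it in the match.
prices = re.findall(r'(?<=\$)\d+(?:\.\d+)?', text)
print(prices)  # ['42.50', '5']

# A lookbehind whose length can vary (here \$+) is rejected by re.
try:
    re.compile(r'(?<=\$+)\d+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```

The trailing 7 is skipped because no dollar sign precedes it.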

Negative lookbehind

Matches a position that is not immediately preceded by the given pattern.

Syntax:

(?<!pattern)

Example:

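import re

# Placeholder addresses for illustration
text = "Contact us at support@example.com or sales@company.com."
# Match words that are not immediately preceded by "@"
pattern = r'(?<!@)\b\w+'
matches = re.findall(pattern, text)
# "com" is included because it follows ".", not "@"
print(matches)  # Output: ['Contact', 'us', 'at', 'support', 'com', 'or', 'sales', 'com']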
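
As a further sketch of negative lookbehind, the example below picks out numbers that are not prices. The sentence is invented for illustration; the \b keeps the engine from matching the tail of a number such as the "5" in "$15".

```python
import re

# Hypothetical sample string, chosen only for illustration.
text = "Order 66 cost $15 plus 3 items"

# (?<!\$) rejects a number that starts right after a dollar sign;
# \b ensures the match starts at the beginning of the number.
non_prices = re.findall(r'(?<!\$)\b\d+', text)
print(non_prices)  # ['66', '3']
```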

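3. Backreferences

A backreference refers to text captured by an earlier group in the same regular expression, written as \1, \2, and so on.

Example:

Match repeated words in a text: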

import re

text = "I saw a dog and a cat, but I didn't see the dog dog."
# Match a word that is immediately repeated; \1 must equal the text captured by group 1
pattern = r'\b(\w+)\b\s+\1\b'
matches = re.findall(pattern, text)
print(matches)  # Output: ['dog']
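
Backreferences are also useful in substitutions. This is a minimal sketch, with an invented sentence, that uses a named group, the named backreference (?P=word) inside the pattern, and \g<word> in the replacement to collapse doubled words.

```python
import re

# Hypothetical sample string, chosen only for illustration.
text = "This is is a test test of the the parser."

# (?P<word>...) names the group; (?P=word) must match the same text again.
repeated = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

# In the replacement string, \g<word> inserts the captured word once.
cleaned = repeated.sub(r'\g<word>', text)
print(cleaned)  # This is a test of the parser.
```

Named groups behave like numbered ones here; they mainly make longer patterns easier to read.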

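4. Balancing groups

Balancing groups are mainly used to match nested structures such as paired parentheses. They are a .NET regex feature; Python's standard re module supports neither balancing groups nor recursion. The third-party regex module supports recursive patterns, which cover the same use case, and a parsing library such as pyparsing is another option.

Example: matching nested parentheses with regex: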

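import regex  # third-party: pip install regex

text = "(a(b(c)d)e)f(g(h)i)"
# Match balanced, possibly nested, parentheses
pattern = r'\((?>[^()]+|(?R))*\)'
matches = regex.findall(pattern, text)
# findall returns non-overlapping matches, so only the outermost groups are reported;
# regex.findall(pattern, text, overlapped=True) also reports the inner groups
print(matches)  # Output: ['(a(b(c)d)e)', '(g(h)i)']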

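Explanation:

(?R) recursively invokes the entire regular expression.
(?>...) is an atomic group: once it has matched, the engine does not backtrack into it, which avoids needless retries on unbalanced input.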
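
If installing the regex module is not an option, the same top-level groups can be found with the standard library alone. The sketch below uses a different technique, a depth counter rather than regex recursion; the function name top_level_groups is hypothetical.

```python
def top_level_groups(text):
    """Return every top-level balanced (...) group found in text."""
    groups = []
    depth = 0      # current nesting depth
    start = None   # index of the "(" that opened the current top-level group
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')':
            if depth == 0:
                continue  # stray ")" with no opener; ignored in this sketch
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups

print(top_level_groups("(a(b(c)d)e)f(g(h)i)"))  # ['(a(b(c)d)e)', '(g(h)i)']
```

Groups that are opened but never closed are silently dropped, which may or may not be what a real parser wants.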

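Combined exercise: parsing complex data

Task:

Extract every parenthesized group whose content does not contain a given keyword (for example, "skip").

Example code: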

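import re

text = "Match (this), (skip this), and (not skip)."
# Match parenthesized groups that do not contain "skip".
# [^)]* keeps both the lookahead and the match inside the current pair of parentheses.
pattern = r'\((?![^)]*skip)[^)]*\)'
matches = re.findall(pattern, text)
print(matches)  # Output: ['(this)']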
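
The same idea can be wrapped in a small helper that takes the keyword as a parameter. This is only a sketch: the function name groups_without and the sample sentence are hypothetical, and it assumes the parentheses are not nested.

```python
import re

def groups_without(text, keyword):
    """Return the contents of (...) groups that do not contain keyword."""
    # re.escape keeps any regex metacharacters in keyword literal.
    pattern = rf'\((?![^)]*{re.escape(keyword)})([^)]*)\)'
    # With one capturing group, findall returns just the captured contents.
    return re.findall(pattern, text)

print(groups_without("Match (this), (skip this), and (keep me).", "skip"))
# ['this', 'keep me']
```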

Task: validate that a string is one balanced group of nested parentheses

For example:

Input "(a(b)c)" is valid.
Input "a(b(c)d" is invalid.

Example code:

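import regex  # third-party: pip install regex

def is_valid_parentheses(s):
    # Group 1 is one balanced pair; (?1) recurses into that group only.
    # (?R) would also recurse into the ^ and $ anchors, which cannot match
    # in the middle of the string, so nested pairs would never validate.
    pattern = r'^(\((?>[^()]+|(?1))*\))$'
    return bool(regex.match(pattern, s))

print(is_valid_parentheses("(a(b)c)d"))   # False
print(is_valid_parentheses("(a(b)c)d)"))  # False
print(is_valid_parentheses("(a(b)c)"))    # True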
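
When the goal is only to check that parentheses are balanced, and text outside them should be allowed, a plain counter is enough. The sketch below is a standard-library alternative to the recursive pattern, not the same technique; is_balanced is a hypothetical name.

```python
def is_balanced(s):
    """Return True if every "(" in s has a matching ")" in the right order."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                return False  # a ")" appeared before its "("
    return depth == 0  # any unclosed "(" leaves depth above zero

print(is_balanced("(a(b)c)d"))   # True
print(is_balanced("a(b(c)d"))    # False
print(is_balanced("(a(b)c)d)"))  # False
```

Unlike the anchored pattern, this check accepts characters before, between, and after the groups.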

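Summary

Zero-width assertions: check whether the text before or after a position satisfies a condition without consuming it.
Backreferences: refer to previously captured text; especially useful for repeated words and other repeated structure.
Balancing groups: match complex nested structures (in Python, via recursion in the regex module).

Working through these scenarios is a good way to get comfortable with these features and apply them to data processing, file handling, and complex matching tasks.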
