Regex Supplement

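Zero-width assertions (lookahead and lookbehind), negative assertions, backreferences, and balancing groups are advanced regular-expression features that handle more complex matching needs. Below is what each one does and how to use it in Python.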

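1. Zero-width lookahead assertions

Positive lookahead

Matches a position that is immediately followed by the given pattern, without including that pattern in the result.

Syntax:

`(?=pattern)`

Example:

Extract every word that is immediately followed by a digit: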

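import re

text = "apple 123, banana456, cherry 789."
# Match a run of letters that is immediately followed by a digit.
# [A-Za-z]+ is used instead of \w+ because \w also matches digits, which
# would let the engine backtrack and return fragments such as 'banana45'.
pattern = r'\b[A-Za-z]+(?=\d)'
matches = re.findall(pattern, text)
print(matches)  # Output: ['banana']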
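
Because a lookahead consumes nothing, several of them can be stacked at the same position to check independent conditions. The sketch below illustrates this with a made-up password rule; the rule itself and the name `is_strong` are assumptions for illustration, not part of the original example.

```python
import re

# Hypothetical rule set: at least 8 characters,
# with at least one digit and at least one uppercase letter.
STRONG = re.compile(r'^(?=.*\d)(?=.*[A-Z]).{8,}$')

def is_strong(candidate):
    """Return True when every lookahead condition holds for the candidate."""
    return STRONG.match(candidate) is not None

print(is_strong("Passw0rdX"))  # True
print(is_strong("password1"))  # False (no uppercase letter)
print(is_strong("Ab1"))        # False (too short)
```

Each `(?=...)` is evaluated from the start of the string, so the order of the conditions does not matter.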

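Negative lookahead

Matches a position that is not immediately followed by the given pattern.

Syntax:

`(?!pattern)`

The exclamation mark signals negation.

Example:

Extract every word that is not immediately followed by a digit: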

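import re

text = "apple 123, banana456, cherry 789."
# Match a run of letters that is not immediately followed by a digit.
# The lookahead also rejects a following letter; otherwise the engine could
# backtrack to a shorter prefix such as 'banan' and still report a match.
pattern = r'\b[A-Za-z]+(?![A-Za-z\d])'
matches = re.findall(pattern, text)
print(matches)  # Output: ['apple', 'cherry']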
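
A common use of negative lookahead is to reject whole strings that match an unwanted form. The following is a minimal sketch; the file names are invented for illustration.

```python
import re

# Hypothetical file names, used only for illustration.
names = ["report.csv", "cache.tmp", "notes.txt", "upload.tmp"]

# (?!.*\.tmp$) is checked at the start of the string and rejects
# any name that ends with ".tmp"; \S+ then consumes the whole name.
keep = re.compile(r'^(?!.*\.tmp$)\S+$')

kept = [name for name in names if keep.match(name)]
print(kept)  # ['report.csv', 'notes.txt']
```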

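2. Zero-width lookbehind assertions

Positive lookbehind

Matches a position that is immediately preceded by the given pattern.

Syntax:

`(?<=pattern)`

Example:

Use "@" as the condition and extract the word that follows it: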

import re

text = "Contact us at support@example.com or sales@company.com."
# Match the word that comes right after "@"
pattern = r'(?<=@)\w+'
matches = re.findall(pattern, text)
print(matches)  # Output: ['example', 'company']
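
Lookbehind is handy when a prefix marks the value you want but should not be part of it. Here is a small sketch with an invented sentence; note that Python's `re` module only accepts lookbehind patterns of fixed width.

```python
import re

# Hypothetical sample text.
text = "Coffee costs $4.50, tea costs $3, and a refill is free."

# (?<=\$) requires a literal "$" right before the number
# but leaves the "$" out of the match.
prices = re.findall(r'(?<=\$)\d+(?:\.\d+)?', text)
print(prices)  # ['4.50', '3']
```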

Negative lookbehind

Matches a position that is not immediately preceded by the given pattern.

Syntax:

`(?<!pattern)`

Example:

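import re

text = "Contact us at support@example.com or sales@company.com."
# Match words that are not immediately preceded by "@".
# The top-level domain 'com' follows '.', not '@', so it is matched too.
pattern = r'(?<!@)\b\w+'
matches = re.findall(pattern, text)
print(matches)  # Output: ['Contact', 'us', 'at', 'support', 'com', 'or', 'sales', 'com']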
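
Negative lookbehind can also separate plain numbers from prefixed ones. The sketch below uses an invented sentence and keeps only numbers that are not dollar amounts.

```python
import re

# Hypothetical sample text.
text = "Order 66 shipped 3 items for $120 and $45."

# (?<![\d$]) rejects a digit run that starts right after "$".
# It also rejects a start right after another digit; without that,
# the tail '20' of '$120' would be reported as a match.
counts = re.findall(r'(?<![\d$])\d+', text)
print(counts)  # ['66', '3']
```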

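3. Backreferences

A backreference refers, inside the regular expression, to text captured by an earlier group. It is written \1, \2, and so on.

Example:

Find repeated words in a text: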

import re

text = "I saw a dog and a cat, but I didn't see the dog dog."
# Match a word that is immediately repeated
pattern = r'\b(\w+)\b\s+\1\b'
matches = re.findall(pattern, text)
print(matches)  # Output: ['dog']
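
Backreferences also work with named groups and in replacement strings. As a minimal sketch (the sample sentence is invented), the following collapses repeated words into a single occurrence.

```python
import re

# Hypothetical sample text.
text = "This is is a test test of the the parser."

# (?P<word>...) names the group; (?P=word) is the named backreference.
pattern = re.compile(r'\b(?P<word>\w+)(?:\s+(?P=word)\b)+')

# In the replacement string, \g<word> inserts the captured word once.
cleaned = pattern.sub(r'\g<word>', text)
print(cleaned)  # This is a test of the parser.
```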

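4. Balancing groups

Balancing groups are used to match nested structures such as paired parentheses. The term comes from .NET regular expressions; Python's standard-library re module supports neither balancing groups nor recursion. The third-party regex module offers recursive patterns that cover the same use case, and a parsing library such as pyparsing is another option.

Example: matching nested parentheses with regex: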

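import regex

text = "(a(b(c)d)e)f(g(h)i)"
# Match balanced, possibly nested parentheses.
# findall returns non-overlapping matches, so only the outermost groups are listed.
pattern = r'\((?>[^\(\)]+|(?R))*\)'
matches = regex.findall(pattern, text)
print(matches)  # Output: ['(a(b(c)d)e)', '(g(h)i)']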

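Explanation:

(?R) recursively calls the entire regular expression.
(?>...) is an atomic group: once it has matched, the engine will not backtrack into it, which keeps failed matches on unbalanced input from becoming slow.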
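
When the regex module is not available, the same nesting can be tracked with an explicit stack. The sketch below is an alternative technique using only the standard library, not a regex feature; the function name `nested_groups` is made up for illustration. Unlike a non-overlapping regex search, it reports inner groups as well as outer ones.

```python
def nested_groups(text):
    """Return every balanced parenthesized group, ordered by opening position."""
    stack = []  # indices of "(" still waiting for their ")"
    spans = []  # (start, end) pairs of completed groups
    for index, char in enumerate(text):
        if char == "(":
            stack.append(index)
        elif char == ")" and stack:
            # Close the most recently opened group.
            spans.append((stack.pop(), index + 1))
    return [text[start:end] for start, end in sorted(spans)]

print(nested_groups("(a(b(c)d)e)f(g(h)i)"))
# ['(a(b(c)d)e)', '(b(c)d)', '(c)', '(g(h)i)', '(h)']
```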

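Combined exercise: parsing more complex data

Task:

Extract every parenthesized group whose content does not contain a given keyword (for example "skip").

Example code: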

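import re

text = "Match (this), (skip this), and (not skip)."
# Match a parenthesized group whose content does not contain "skip".
# [^()]* keeps the lookahead inside the current pair of parentheses;
# with .* it would scan to the end of the line and reject every group.
pattern = r'\((?![^()]*skip)[^()]*\)'
matches = re.findall(pattern, text)
print(matches)  # Output: ['(this)']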
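
If the keyword comes from a variable, it is safer to escape it before building the pattern. This is a small sketch of that idea; the helper name `groups_without` is hypothetical.

```python
import re

def groups_without(text, keyword):
    """Return parenthesized groups whose content does not contain keyword."""
    # re.escape keeps characters such as "." or "+" in the keyword literal.
    pattern = r'\((?![^()]*' + re.escape(keyword) + r')[^()]*\)'
    return re.findall(pattern, text)

sample = "Match (this), (skip this), and (not skip)."
print(groups_without(sample, "skip"))  # ['(this)']
print(groups_without(sample, "this"))  # ['(not skip)']
```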

Task: validate that a string is one correctly nested parenthesized group

For example:

Input "(a(b)c)" is valid.
Input "a(b(c)d" is invalid.

Example code:

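import regex

def is_valid_parentheses(s):
    # Group 1 matches one balanced group; (?1) recurses into group 1 only.
    # (?R) would also re-run the ^ and $ anchors, so nested groups could never match.
    pattern = r'(\((?>[^()]+|(?1))*\))'
    # fullmatch requires the whole string to be a single balanced group
    return bool(regex.fullmatch(pattern, s))

print(is_valid_parentheses("(a(b)c)d"))  # False
print(is_valid_parentheses("(a(b)c)d)"))  # False
print(is_valid_parentheses("(a(b)c)"))    # True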
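
The regex validator accepts only a string that is itself one parenthesized group. If text outside the parentheses should be allowed, a simple depth counter may be enough. The following sketch uses only the standard library; `is_balanced` is a made-up name, and this is an alternative technique rather than a regex feature.

```python
def is_balanced(s):
    """Return True when every "(" has a matching ")" in the right order."""
    depth = 0
    for char in s:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                # A ")" appeared before its "(".
                return False
    return depth == 0

print(is_balanced("(a(b)c)d"))   # True
print(is_balanced("a(b(c)d"))    # False
print(is_balanced("(a(b)c)d)"))  # False
```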

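Summary

Zero-width assertions: check whether the text before or after a position meets a condition without consuming it.
Backreferences: reuse text captured earlier in the pattern; well suited to detecting repetition and matching structured data.
Balancing groups: match complex nested structures (in Python this requires the regex module's recursive patterns).

Working through these scenarios is a good way to get familiar with these features and apply them to data processing, file handling, and complex matching tasks.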

Author

JunBin Liang

Published on

February 4, 2025
