递归boost :: xpressive使用太多内存

时间:2018-06-01 01:40:46

标签: c++ regex boost boost-regex boost-xpressive

嗨boost :: xpressive用户,

我尝试使用boost :: xpressive解析某些决策树时出现堆栈溢出错误。它似乎适用于一定大小的树木,但是在大的树木上失败了。树木,其中'大'似乎意味着大约3000个节点,并且具有gdb的堆栈变为133979帧深。我认为我需要以某种方式优化正则表达式,但在任何地方都有*。所以我不确定从哪里开始。

#include <boost/regex.hpp>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>

using namespace boost::xpressive;
using namespace regex_constants;


sregex integral_number;
sregex floating_point_number;

sregex bid;
sregex ask;
sregex side;
sregex value_on_market_limit_ratio_gt;
sregex value_on_market_delta_ratio_gt;

sregex stdevs_from_mean_auction_time_gt;
sregex no_orders_on_opposite_side;
sregex is_pushing_price;
sregex is_desired;

sregex predicate, leaf, branch, tree;

integral_number = sregex_compiler().compile("[-+]?[0-9]+");
floating_point_number = sregex_compiler().compile("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?");
stdevs_from_mean_auction_time_gt = "StdevsFromMeanAuctionTimeGT(" >> floating_point_number >> ")";
side = sregex_compiler().compile("def::BID|def::ASK");
value_on_market_limit_ratio_gt = "ValueOnMarketLimitRatioGT<" >> side >> ">(" >> floating_point_number >> ")";
value_on_market_delta_ratio_gt = "ValueOnMarketDeltaRatioGT(" >> floating_point_number >> ")";
no_orders_on_opposite_side = sregex_compiler().compile("NoOrdersOnOppositeSide");
is_pushing_price = sregex_compiler().compile("IsPushingPrice");
is_desired = sregex_compiler().compile("IsDesired");
predicate = value_on_market_limit_ratio_gt | value_on_market_delta_ratio_gt | stdevs_from_mean_auction_time_gt | no_orders_on_opposite_side | is_pushing_price | is_desired;
leaf = sregex_compiler().compile("SEARCH_TO_MAX|AMEND_TO_AVAILABLE|AMEND_TO_AVAILABLE_MINUS_RECENT_ORDER_SIZE|AMEND_TO_CURRENT_MINUS_RECENT_ORDER_SIZE|SEARCH_BY_RECENT_ORDER_SIZE|PULL|DO_NOTHING");
branch = "Branch(" >> predicate >> "," >> by_ref(tree) >> "," >> by_ref(tree) >> ")";
tree = leaf | branch;

smatch what;
regex_match(s, what, tree)

这里,s未定义,因为它是一个75000个字符的字符串,不适合该问题。如何修改这些表达式以使匹配在更小的空间内执行?

1 个答案:

答案 0 :(得分:4)

我找到了解决这个问题的方法,将分支的定义更改为

branch = "Branch(" >> keep(predicate) >> "," >> keep(by_ref(tree)) >> "," >> keep(by_ref(tree)) >> ")";

为了限制回溯,从而限制内存使用。