PHP - Issues with ungreedy match


In PHP, I'm matching the text at http://pastebin.com/pfjegqpd against the following regex:

preg_match('#(.*(?s))(particella |particelle |p\.|part\.|p |part |mappale |mapp\.|mapp |n\.|\*) *(\d+[\d /\p{Pd}]*)($|.{0,20}(?s)(graffati|particella |particelle |p\.|.*part\.|p |part |mappale |mapp\.|mapp |n\.|subalterno |subalterni |sub\.|s\.|sub |s |\bcat\b|\bcategoria\b|\brendita\b|\bvani\b|\bconsistenza\b|\br\.c\.\b))#i', $txt, $matches, PREG_OFFSET_CAPTURE, $offset);

With $offset = 944, I'm getting the following output in $matches.

I expected it to match 1184, but it matches 4 instead. I tried (?sU) with no luck.

$matches = array(6) {
  [0]=>
  array(2) {
    [0]=>
    string(59) "* 1184 sub.702, vioolo san vincenzo n.4, piano t, categoria"
    [1]=>
    int(1226)
  }
  [1]=>
  array(2) {
    [0]=>
    string(36) "* 1184 sub.702, vioolo san vincenzo "
    [1]=>
    int(1226)
  }
  [2]=>
  array(2) {
    [0]=>
    string(2) "n."
    [1]=>
    int(1262)
  }
  [3]=>
  array(2) {
    [0]=>
    string(1) "4"
    [1]=>
    int(1264)
  }
  [4]=>
  array(2) {
    [0]=>
    string(20) ", piano t, categoria"
    [1]=>
    int(1265)
  }
  [5]=>
  array(2) {
    [0]=>
    string(9) "categoria"
    [1]=>
    int(1276)
  }
}
$offset = int(944)

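Turning my comment into an answer: the point is that there are greedy subpatterns in the pattern: .* and .{0,20}. They should be turned into lazy subpatterns, since otherwise the captured texts will hold only 1 symbol (the left greedy subpattern "gobbles" as much as it can, and does not let the next group capture more than 1 symbol, since those groups require at least 1 symbol). Here the greedy .* runs as far as it can and backtracks only until the rest of the pattern fits, so the last candidate ("n.4") is found instead of the first one ("* 1184").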
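The difference can be reproduced on a much smaller case. The snippet below is only a sketch: the sample string and the cut-down pattern are made up for illustration and are not the pastebin text or the full regex from the question.

```php
<?php
// Hypothetical sample text: the marker "*" comes before the marker "n.".
$txt = 'foglio 12 * 1184 sub.702, via roma n.4, categoria';

// Greedy prefix: (.*) first consumes the whole line, then backtracks,
// so the LAST marker followed by digits is the one that matches.
preg_match('~(.*)(n\.|\*) *(\d+)~', $txt, $greedy);

// Lazy prefix: (.*?) grows one character at a time,
// so the FIRST marker followed by digits is the one that matches.
preg_match('~(.*?)(n\.|\*) *(\d+)~', $txt, $lazy);

echo $greedy[3], "\n"; // 4
echo $lazy[3], "\n";   // 1184
```

Only the quantifier on the leading group changes between the two calls, which is the same change applied to the full pattern.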

See the IDEONE demo. Use:

$re = '~(.*?(?s))(particella |particelle |p\.|part\.|p |part |mappale |mapp\.|mapp |n\.|\*) *(\d+[\d /\p{Pd}]*)($|.{0,20}?(?s)(graffati|particella |particelle |p\.|.*part\.|p |part |mappale |mapp\.|mapp |n\.|subalterno |subalterni |sub\.|s\.|sub |s |\bcat\b|\bcategoria\b|\brendita\b|\bvani\b|\bconsistenza\b|\br\.c\.\b))~';

Since the pattern is fragile, it can be optimized a bit, and the literal spaces can be replaced with \s everywhere, since the intent is to match whitespace in those places:

(?s)(.*?)(particell[ea]\s+|p(?:art)?[.\s]+|mapp(?:(?:ale)?\s+|\.)|n\.|\*)\s*(\d+[\d\s/\p{Pd}]*)($|.{0,20}?(graffati|particell[ae]\s+|p(?:art)?[.\s]+|mapp(?:(?:ale)?\s+|\.)|n\.|subaltern[oi]\s+|s(?:ub)?[.\s]+|\bcat(?:egoria)?\b|\brendita\b|\bvani\b|\bconsistenza\b|\br\.c\.\b))

See the regex demo and the IDEONE demo.
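To try the optimized pattern without the demos, a minimal sketch follows. It assumes the input is just the fragment visible in the question's dump rather than the full pastebin text, so the search starts at offset 0 instead of 944, and the ~ delimiters are an arbitrary choice. Append the i flag if the real text is mixed-case, as in the question.

```php
<?php
// Fragment copied from the var_dump in the question (not the full input).
$txt = '* 1184 sub.702, vioolo san vincenzo n.4, piano t, categoria';

$re = '~(?s)(.*?)(particell[ea]\s+|p(?:art)?[.\s]+|mapp(?:(?:ale)?\s+|\.)|n\.|\*)'
    . '\s*(\d+[\d\s/\p{Pd}]*)'
    . '($|.{0,20}?(graffati|particell[ae]\s+|p(?:art)?[.\s]+|mapp(?:(?:ale)?\s+|\.)|n\.'
    . '|subaltern[oi]\s+|s(?:ub)?[.\s]+|\bcat(?:egoria)?\b|\brendita\b|\bvani\b'
    . '|\bconsistenza\b|\br\.c\.\b))~';

// With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
if (preg_match($re, $txt, $m, PREG_OFFSET_CAPTURE, 0)) {
    // Group 3 may carry trailing whitespace because \s is in its character class.
    echo trim($m[3][0]), ' at offset ', $m[3][1], "\n"; // 1184 at offset 2
}
```

Because the leading group is lazy, the first marker ("*") wins, and the lazy .{0,20}? stops at the nearest following keyword ("sub.") rather than reaching ahead to "categoria".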

