Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMILAR TO does not return result when invalid pattern is used (with two adjacent special characters that should be escaped but aren't) [CORE5931] #6188

Closed
firebird-automations opened this issue Oct 1, 2018 · 9 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @pavel-zotov

Attachments:
difference-on-patterns-that-leads-to-POOR-SIMILAR-TO.png
firebird_1820.20181001_012003.stack_trc.txt.7z
firebird_1820.20181001_012034.stack_trc.txt.7z

(yes, i know about tickets CORE3858, CORE4893 & CORE5664 that also relate to SIMILAR TO performance; perhaps this new sample can help to solve them so i decided to create separate ticket).

Consider following command:

set count on;
set list on;
select 1 from rdb$database where
'
group by
f01,
f02,
f03,
f04,
f05,
f06,
f07,
f08,
f09,
f10,
f11,
f12,
f13,
'
similar to
'%group[[:WHITESPACE:]]+by[[:WHITESPACE:]]+([[:ALNUM:]])+([[:WHITESPACE:]]*,[[:WHITESPACE:]]*[[:ALNUM:]]+){12,}[[:WHITESPACE:]]*%'
;

It will issue result instantly:

CONSTANT 1

Records affected: 1

Now let's change pattern by adding characters "|" and "_" (PIPE and UNDERSCORE) after 1st ALNUM, but with missed escaping (and this means that pattern will be INVALID):

select 1 from rdb$database where
'
group by
f01,
f02,
f03,
f04,
f05,
f06,
f07,
f08,
f09,
f10,
f11,
f12,
f13
'
similar to
'%group[[:WHITESPACE:]]+by[[:WHITESPACE:]]+([[:ALNUM:]]|_)+([[:WHITESPACE:]]*,[[:WHITESPACE:]]*[[:ALNUM:]]+){12,}[[:WHITESPACE:]]*%'
;

This statement will HANGS on 2.5, 3.0 and 4.0.
Difference between patterns - see in attached .pgn
Two stack traces (for WI-V3.0.4.33054 ) are also attached.

NB: This trouble with SIMILAR TO seems to be CRITICAL: no new attachment can be established while engine "does something inside itself" while working with parsing such statement!

Commits: 28e1874

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

Attachment: difference-on-patterns-that-leads-to-POOR-SIMILAR-TO.png [ 13304 ]

Attachment: firebird_1820.20181001_012003.stack_trc.txt.7z [ 13305 ]

Attachment: firebird_1820.20181001_012034.stack_trc.txt.7z [ 13306 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

PPS.

Links for two full memory dumps (relates to attached stack traces for WI-V3.0.4.33054):

https://drive.google.com/open?id=1zGWHp0Eakl1pL4SlULqOQk6UvZyhsSV6
https://drive.google.com/open?id=1cs1Ny6tUAjpZW7YncADbL_xM07RHSPt-

And this is link to FB build WI-V3.0.4.33054 with corresponding PDB files:

https://drive.google.com/open?id=1vI-hOn9-OzB33xkfJuKoc49G6zIFvrmm

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

description: (yes, i know about tickets CORE3858, CORE4893 & CORE5664 that also relate to SIMILAR TO performance; perhaps this new sample can help to solve them so i decided to create separate ticket).

Consider following command:

set count on;
set list on;
select 1 from rdb$database where
'
group by
f01,
f02,
f03,
f04,
f05,
f06,
f07,
f08,
f09,
f10,
f11,
f12,
f13,
'
similar to
'%group[[:WHITESPACE:]]+by[[:WHITESPACE:]]+([[:ALNUM:]])+([[:WHITESPACE:]]*,[[:WHITESPACE:]]*[[:ALNUM:]]+){12,}[[:WHITESPACE:]]*%'
;

It will issue result instantly:

CONSTANT 1

Records affected: 1

Now let change pattern by adding characters "|" and "_" (PIPE and UNDERSCORE) after 1st ALNUM, but with missed escaping (and this means that pattern will be INVALID):

select 1 from rdb$database where
'
group by
f01,
f02,
f03,
f04,
f05,
f06,
f07,
f08,
f09,
f10,
f11,
f12,
f13
'
similar to
'%group[[:WHITESPACE:]]+by[[:WHITESPACE:]]+([[:ALNUM:]]|_)+([[:WHITESPACE:]]*,[[:WHITESPACE:]]*[[:ALNUM:]]+){12,}[[:WHITESPACE:]]*%'
;

This statement will HANGS on 2.5, 3.0 and 4.0.
Difference between patterns - see in attached .pgn
Two stack traces (for WI-V3.0.4.33054 ) are also attached.

NB: This trouble with SIMILAR TO seems to be CRITICAL: no new attachment can be established while engine "does something inside itself" while working with parsing such statement!

=>

(yes, i know about tickets CORE3858, CORE4893 & CORE5664 that also relate to SIMILAR TO performance; perhaps this new sample can help to solve them so i decided to create separate ticket).

Consider following command:

set count on;
set list on;
select 1 from rdb$database where
'
group by
f01,
f02,
f03,
f04,
f05,
f06,
f07,
f08,
f09,
f10,
f11,
f12,
f13,
'
similar to
'%group[[:WHITESPACE:]]+by[[:WHITESPACE:]]+([[:ALNUM:]])+([[:WHITESPACE:]]*,[[:WHITESPACE:]]*[[:ALNUM:]]+){12,}[[:WHITESPACE:]]*%'
;

It will issue result instantly:

CONSTANT 1

Records affected: 1

Now let's change pattern by adding characters "|" and "_" (PIPE and UNDERSCORE) after 1st ALNUM, but with missed escaping (and this means that pattern will be INVALID):

select 1 from rdb$database where
'
group by
f01,
f02,
f03,
f04,
f05,
f06,
f07,
f08,
f09,
f10,
f11,
f12,
f13
'
similar to
'%group[[:WHITESPACE:]]+by[[:WHITESPACE:]]+([[:ALNUM:]]|_)+([[:WHITESPACE:]]*,[[:WHITESPACE:]]*[[:ALNUM:]]+){12,}[[:WHITESPACE:]]*%'
;

This statement will HANGS on 2.5, 3.0 and 4.0.
Difference between patterns - see in attached .pgn
Two stack traces (for WI-V3.0.4.33054 ) are also attached.

NB: This trouble with SIMILAR TO seems to be CRITICAL: no new attachment can be established while engine "does something inside itself" while working with parsing such statement!

@firebird-automations
Copy link
Collaborator Author

Commented by: @mrotteveel

That is not an invalid pattern (or at least: it is a syntactically correct pattern).

That this syntactically correct pattern then leads to a undecidable pattern (where the regex engine is probably stuck backtracking) is in itself an undecidable problem (in this case you probably can, but for more complex patterns, you can't).

@firebird-automations
Copy link
Collaborator Author

Commented by: @mrotteveel

Specifically ([[:ALNUM:]]|_)+

Breaks down to:

<regular expression> = <regular term> = <regular factor> = ([[:ALNUM:]]|_)+ (ignoring quantifier in next steps)
<regular primary> = ([[:ALNUM:]]|_)
<regular expression> = [[:ALNUM:]]|_ (splits up in a|b)
a. <regular term> = <regular factor> = <regular primary> <character class> = [[:ALNUM:]]
a. <member> = <predefined class> = [:ALNUM:]
b. <regular term> = <regular factor> = <regular primary> <character class> = _

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Beta 2 [ 10888 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Done successfully

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment