Tcl Source Code

View Ticket
Login
Ticket UUID: 1810264
Title: O(N^2) compile time for some regexps
Type: Bug Version: obsolete: 8.4.19
Submitter: gwad Created on: 2007-10-09 16:18:22
Subsystem: 43. Regexp Assigned To: dkf
Priority: 8 Severity:
Status: Closed Last Modified: 2008-05-27 05:12:58
Resolution: Fixed Closed By: hobbs
    Closed on: 2008-05-26 22:12:58
Description:
Hi Tcl team,


Recently a colleague (Tavis Ormandy) and I uncovered some bugs in Postgresql's regular expression engine.  It turns out that they're using code derived from Tcl.  With Tom Lane's help, we have some tentative patches, but first the bugs :-)


The out of band read in the UCS-4-enabled build -- [CVE-2007-4769]
(the numeric value must be > INT_MAX and <= UINT_MAX)

  select text (E'\\' || '3161573148') ~ (E'\\' || '3161573148');

  regexp {\3161573148} {\3161573148};

(The code location applies to postgres
Backreferences over INT_MAX slip through the check for a single-digit due to
a signedness error in regcomp.c's lexescape() near line 948.  v->numsubexp
is a signed integer and c is unsigned.  The cast to a signed int avoids
warnings but means that any c > INT_MAX will pass the test against the
default nsubexp value of 10.

This mistake allows indexing into v->subs in regcomp.c's parsebranch() near
line 948 to be controlled.  However, this is just a NULL test.  If the value
given indexes out of valid memory, a segmentation fault will occur and the
child process will die.


An infinite loop -- [CVE-2007-4772]\
  regexp {($|^)*} {x}

Tom Lane provided an analysis:
"""
 The problem seems to occur when
an NFA state has a regular ^-constraint as an outgoing arc, plus both
^-constraint and $-constraint outgoing arcs that loop back to itself.
The latter two should be destroyed as useless, but the code didn't get
rid of the $-constraint soon enough and instead transferred it to the
new state that was generated as part of pushing the regular ^-constraint
back to this state's predecessor states.  This then re-created a
situation that needed another iteration of pullback().

The proposed fix deals with this by changing the order of operations:
useless arcs are destroyed in a separate pass before we do the pull()
calls.

I made a similar change in pushfwd(), which has effectively identical
logic.  This part of the patch is not needed to fix the particular
test cases we have, and I'm not certain it can be a problem once
pullback() has gotten rid of ^-constraints.  But it seems like a good
idea to keep the logic in sync, and maybe there are cases where it's
necessary for pushfwd() to protect itself.

The patches pass the Postgres and Tcl regression tests, but I'd sure
like to have someone who understands this regex code look at them.
In particular I'm not entirely sure whether it's OK to flush *all*
non-PLAIN circular arcs.
"""


An expensive regular expression:

regexp {(x{200}){200}$y} {x}


Again with excellent analysis from Tom Lane:
"""
This is not as slow as your original (about 2m rather than 15m on
my machine), but it illustrates the basic issue: the doubly nested
iteration generates 200*200 = 40000 NFA states, all of which are
later found to be dead because the trailing "$y" cannot possibly
match anything, and then when we try to clean them up we spend
O(N^2) time at it because they're all on the same colorchain.

AFAICS the only use of the colorchain data structure is in okcolors(),
which could probably be rewritten to do one iteration over all the NFA's
states/arcs.  So I'm tempted to propose just getting rid of the
colorchain lists.  But I'd *really* want to talk to someone who knows
this code better, first.  There might be cases where the colorchains
are essential for performance.
"""


It'd be great if we could chat about this more, but I wasn't sure the best place to email.  I know that Tom Lane would appreciate any additional insight before patching postgresql.  Tom currently has a few patches explicitly for postgres, but they appear that they'll be pretty portable.  Here are the relevant addresses:

Will Drewry <[email protected]> 
Tavis Ormandy <[email protected]>
Tom Lane <[email protected]>
Postgresql Security <[email protected]>


Thanks!
will
User Comments: hobbs added on 2008-05-27 05:12:58:
Logged In: YES 
user_id=72656
Originator: NO

reduced to use {100}{100} for 8.4

hobbs added on 2008-05-27 03:53:11:
Logged In: YES 
user_id=72656
Originator: NO

reg-33.14's commit to 8.4 segfaults.  Should the test be removed, or the core corrected?

dkf added on 2007-12-18 18:25:33:
Logged In: YES 
user_id=79902
Originator: NO

Backported to 8.4 branch.

An enormous number of thanks to all three of you, especially for putting up with us, the issue tracker we use, and for producing the patches. We now explicitly check each of those cases to ensure that they do not cause problems.

dkf added on 2007-12-18 17:54:26:
Logged In: YES 
user_id=79902
Originator: NO

Those nail it. Excellent! HEAD fixed; will backport.

gwad added on 2007-12-18 03:17:09:

File Added - 258928: tcl8.5b3-state-limit.patch

Logged In: YES 
user_id=1591064
Originator: YES

Tom and I have both created different patches for the expensive expression.  Tom's patch modifies the color chains to be doubly linked making the clean-up time O(N) instead of O(N^2).  My patch changes the actual behavior -- it uses a compile-time-specified (REG_MAX_STATES) maximum number of total NFA states (at a given time) and enforces this during nfa construction.

Also, it would be greatly appreciated if this bug could be made private (again) for a couple of weeks.  That's when more distros/etc will be ready to patch their users.

My patch against tcl8.5b3 is attached, and Tom's patch is here:

Index: src/include/regex/regguts.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/regex/regguts.h,v
retrieving revision 1.5
diff -c -r1.5 regguts.h
*** src/include/regex/regguts.h 15 Oct 2005 02:49:46 -0000      1.5
--- src/include/regex/regguts.h 15 Dec 2007 03:56:44 -0000
***************
*** 272,277 ****
--- 272,278 ----
 #define  freechain     outchain
       struct arc *inchain;            /* *to's ins chain */
       struct arc *colorchain;         /* color's arc chain */
+       struct arc *colorchain_rev;     /* back-link in color's arc chain */
 };

 struct arcbatch
Index: src/backend/regex/regc_color.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/regex/regc_color.c,v
retrieving revision 1.7
diff -c -r1.7 regc_color.c
*** src/backend/regex/regc_color.c      15 Nov 2007 21:14:37 -0000      1.7
--- src/backend/regex/regc_color.c      15 Dec 2007 03:56:43 -0000
***************
*** 569,580 ****
                       while ((a = cd->arcs) != NULL)
                       {
                               assert(a->co == co);
!                               /* uncolorchain(cm, a); */
!                               cd->arcs = a->colorchain;
                               a->co = sco;
!                               /* colorchain(cm, a); */
!                               a->colorchain = scd->arcs;
!                               scd->arcs = a;
                       }
                       freecolor(cm, co);
               }
--- 569,577 ----
                       while ((a = cd->arcs) != NULL)
                       {
                               assert(a->co == co);
!                               uncolorchain(cm, a);
                               a->co = sco;
!                               colorchain(cm, a);
                       }
                       freecolor(cm, co);
               }
***************
*** 604,610 ****
--- 601,610 ----
 {
       struct colordesc *cd = &cm->cd[a->co];

+       if (cd->arcs)
+               cd->arcs->colorchain_rev = a;
       a->colorchain = cd->arcs;
+       a->colorchain_rev = NULL;
       cd->arcs = a;
 }

***************
*** 616,634 ****
                        struct arc * a)
 {
       struct colordesc *cd = &cm->cd[a->co];
!       struct arc *aa;

!       aa = cd->arcs;
!       if (aa == a)                            /* easy case */
               cd->arcs = a->colorchain;
       else
       {
!               for (; aa != NULL && aa->colorchain != a; aa = aa->colorchain)
!                       continue;
!               assert(aa != NULL);
               aa->colorchain = a->colorchain;
       }
       a->colorchain = NULL;           /* paranoia */
 }

 /*
--- 616,637 ----
                        struct arc * a)
 {
       struct colordesc *cd = &cm->cd[a->co];
!       struct arc *aa = a->colorchain_rev;

!       if (aa == NULL)
!       {
!               assert(cd->arcs == a);
               cd->arcs = a->colorchain;
+       }
       else
       {
!               assert(aa->colorchain == a);
               aa->colorchain = a->colorchain;
       }
+       if (a->colorchain)
+               a->colorchain->colorchain_rev = aa;
       a->colorchain = NULL;           /* paranoia */
+       a->colorchain_rev = NULL;
 }

 /*

Index: src/backend/regex/regc_nfa.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/regex/regc_nfa.c,v
retrieving revision 1.4
diff -c -r1.4 regc_nfa.c
*** src/backend/regex/regc_nfa.c        15 Oct 2005 02:49:24 -0000      1.4
--- src/backend/regex/regc_nfa.c        15 Dec 2007 03:56:43 -0000
***************
*** 633,638 ****
--- 633,640 ----
       for (a = s->outs; a != NULL && !NISERR(); a = a->outchain)
       {
               duptraverse(nfa, a->to, (struct state *) NULL);
+               if (NISERR())
+                       break;
               assert(a->to->tmp != NULL);
               cparc(nfa, a, s->tmp, a->to->tmp);
       }

*** regguts.h.orig      Sat Nov 26 21:34:41 2005
--- regguts.h   Fri Dec 14 23:07:05 2007
***************
*** 290,295 ****
--- 290,296 ----
 #     define  freechain       outchain
       struct arc *inchain;    /* *to's ins chain */
       struct arc *colorchain; /* color's arc chain */
+       struct arc *colorchain_rev;     /* back-link in color's arc chain */
 };

 struct arcbatch {             /* for bulk allocation of arcs */
*** regc_color.c.orig   Wed Oct 20 22:16:21 1999
--- regc_color.c        Fri Dec 14 23:07:22 2007
***************
*** 552,563 ****
                       scd->sub = NOSUB;
                       while ((a = cd->arcs) != NULL) {
                               assert(a->co == co);
!                               /* uncolorchain(cm, a); */
!                               cd->arcs = a->colorchain;
                               a->co = sco;
!                               /* colorchain(cm, a); */
!                               a->colorchain = scd->arcs;
!                               scd->arcs = a;
                       }
                       freecolor(cm, co);
               } else {
--- 552,560 ----
                       scd->sub = NOSUB;
                       while ((a = cd->arcs) != NULL) {
                               assert(a->co == co);
!                               uncolorchain(cm, a);
                               a->co = sco;
!                               colorchain(cm, a);
                       }
                       freecolor(cm, co);
               } else {
***************
*** 586,592 ****
--- 583,592 ----
 {
       struct colordesc *cd = &cm->cd[a->co];

+       if (cd->arcs)
+               cd->arcs->colorchain_rev = a;
       a->colorchain = cd->arcs;
+       a->colorchain_rev = NULL;
       cd->arcs = a;
 }

***************
*** 600,617 ****
 struct arc *a;
 {
       struct colordesc *cd = &cm->cd[a->co];
!       struct arc *aa;

!       aa = cd->arcs;
!       if (aa == a)            /* easy case */
               cd->arcs = a->colorchain;
!       else {
!               for (; aa != NULL && aa->colorchain != a; aa = aa->colorchain)
!                       continue;
!               assert(aa != NULL);
               aa->colorchain = a->colorchain;
       }
       a->colorchain = NULL;   /* paranoia */
 }

 /*
--- 600,618 ----
 struct arc *a;
 {
       struct colordesc *cd = &cm->cd[a->co];
!       struct arc *aa = a->colorchain_rev;

!       if (aa == NULL) {
!               assert(cd->arcs == a);
               cd->arcs = a->colorchain;
!       } else {
!               assert(aa->colorchain == a);
               aa->colorchain = a->colorchain;
       }
+       if (a->colorchain)
+               a->colorchain->colorchain_rev = aa;
       a->colorchain = NULL;   /* paranoia */
+       a->colorchain_rev = NULL;
 }

 /*

*** regc_nfa.c.orig     Wed Aug  4 21:16:57 1999
--- regc_nfa.c  Fri Dec 14 23:08:30 2007
***************
*** 651,656 ****
--- 651,658 ----

       for (a = s->outs; a != NULL && !NISERR(); a = a->outchain) {
               duptraverse(nfa, a->to, (struct state *)NULL);
+               if (NISERR())
+                       break;
               assert(a->to->tmp != NULL);
               cparc(nfa, a, s->tmp, a->to->tmp);
       }

File Added: tcl8.5b3-state-limit.patch

dkf added on 2007-12-02 03:49:23:
Logged In: YES 
user_id=79902
Originator: NO

Opening now that fixes for critical parts are done.

gwad added on 2007-11-27 00:55:40:
Logged In: YES 
user_id=1591064
Originator: YES

I don't seem to have access to bug 1810038, but I've pointed Tom to it and to the repository where it looks like some fixes have been submitted.

thanks!
will

dgp added on 2007-11-16 05:05:39:
Logged In: YES 
user_id=80530
Originator: NO

dgp added on 2007-11-16 05:03:08:
Logged In: YES 
user_id=80530
Originator: NO


My patches are attached to
Tcl Bug 1810038 if Tom
wants to compare two independently
derived solutions.

dgp added on 2007-11-16 05:01:53:
Logged In: YES 
user_id=80530
Originator: NO

First two (crash and hang)
are now fixed.  Last issue
is an optimization request.
Worth pursuing, but a lower
priority.

gwad added on 2007-11-16 04:15:57:
Logged In: YES 
user_id=1591064
Originator: YES

If you drop Tom an email, he has some patches to share/discuss.  His patches haven't appeared here because, in general, only I can access this "private" bug.  Please drop him an email - he's indicated that he's very happy to share them, but he wants to confirm some of his ideas first.

If this doesn't work for you, I can ask him for what he has. I was hoping that we'd have some follow up by email rather than via this limited tracker. (as is, I think this bug has been mirrored to your mailing lists/nabble which was more disclosure than expected :-/ )

Let me know if you'd like me to attach what he has here - it just makes it harder to discuss it.

cheers,
will

dgp added on 2007-11-16 04:07:41:
Logged In: YES 
user_id=80530
Originator: NO


Ok, Having dug deep here, I can
confirm Tom Lane's diagnosis.
It's a shame his alleged patches
have not appeared here.

dkf added on 2007-10-30 17:30:04:
Logged In: YES 
user_id=79902
Originator: NO

Backported. Issue is still prio 9; can only drop once the second part is also fixed and backported. (Third issue is bad, but not *that* bad...)

dgp added on 2007-10-30 05:20:30:
Logged In: YES 
user_id=80530
Originator: NO


Please backport the fix for 8.4.17 as well.

Thanks!

dgp added on 2007-10-30 04:24:23:
Logged In: YES 
user_id=80530
Originator: NO


first issue fixed;

second issue duplicate of 1810038?

And 3rd issue looks more like
a feature request than a bug?
Still important but better handled
in another ticket?

dkf added on 2007-10-27 21:00:03:
Logged In: YES 
user_id=79902
Originator: NO

No patches?

Fixed first issue though; I can understand that "locally". :-)

Attachments: