Ticket UUID: | 578363 | |||
Title: | [:xdigit:] makes RE to behave strange | |||
Type: | Bug | Version: | None | |
Submitter: | pvgoran | Created on: | 2002-07-07 14:31:56 | |
Subsystem: | 43. Regexp | Assigned To: | dkf | |
Priority: | 6 | Severity: | ||
Status: | Closed | Last Modified: | 2002-07-29 17:56:50 | |
Resolution: | Fixed | Closed By: | dkf | |
Closed on: | 2002-07-29 10:56:50 | |||
Description: |
Tcl Version: 8.4a3
Platform: Windows

Code sample:

    set str {2:::DebugWin32}
    set re {([[:xdigit:]])([[:space:]]*)}
    puts "[regexp $re $str match xdigit spaces]"
    puts "match=$match"
    puts "xdigit=$xdigit"
    puts "spaces=$spaces"

This gives:

    1
    match=2:::DebugWin32
    xdigit=2
    spaces=:::DebugWin32

Observed behaviour: "spaces=:::DebugWin32"
Desired behaviour: "spaces="

Comment: It looks like the [[:xdigit:]] bracket expression causes the [[:space:]] bracket expression to match any character. If [[:xdigit:]] is replaced, for example, by [[:digit:]], or [[:space:]] is replaced by [[:alpha:]], everything works correctly. (Initially, I noticed this problem with \s instead of [[:space:]].)
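For comparison, the pattern and string from the code sample can be run through a different engine. The sketch below uses the POSIX `regex.h` API in C, not Tcl's regexp code, so it only illustrates what the two bracket expressions are expected to capture. `capture_group` is a hypothetical helper name introduced for this example.

```c
#include <regex.h>
#include <string.h>

/*
 * Match `pattern` (POSIX extended syntax) against `subject` and copy
 * capture group `group` (0 = whole match) into `out`.
 * Returns 0 on success, -1 if the pattern does not compile, nothing
 * matches, the group is unset, or `out` is too small.
 */
static int
capture_group(const char *pattern, const char *subject, size_t group,
              char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[3];            /* whole match plus two groups */
    int rc = -1;

    if (group >= 3 || outsize == 0) {
        return -1;
    }
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return -1;
    }
    if (regexec(&re, subject, 3, m, 0) == 0 && m[group].rm_so != -1) {
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len < outsize) {
            memcpy(out, subject + m[group].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Called with the pattern `([[:xdigit:]])([[:space:]]*)` and the string `2:::DebugWin32`, group 1 should hold the single hex digit and group 2 should be empty, which corresponds to the desired behaviour "spaces=".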
User Comments: |
dkf added on 2002-07-29 17:44:56:
Logged In: YES user_id=79902

Reviewing your second pair of patches, I've decided to go instead with specifying the number of ranges as 3 because hex-digits are understood to only be done using the standard western digit characters (plus the six alphas in both cases, of course.) Unless there's a good reason for matching the number characters used in other alphabet systems, but then there'll also be a need for a locale-specific version of 'A-F', yes? :^)

dkf added on 2002-07-29 17:06:12:
Logged In: YES user_id=79902

You've found the fault in the RE engine? I'm impressed; that code is non-trivial. Do you want to become a maintainer of this section? (For future reference, single patches rooted at the top of the CVS tree are easiest to work with by far.)

I'll now be able to have a look at fixing this problem (with my general wherever-it's-needed maintainer hat on.)

pvgoran added on 2002-07-28 23:45:01:
File Added - 27909: regc_locale.c.diff-2

pvgoran added on 2002-07-28 23:43:39:
File Added - 27908: regc_locale.c.diff-1

pvgoran added on 2002-07-28 23:42:09:
File Added - 27907: regc_cvec.c.diff

Logged In: YES user_id=383758

Yes, I definitely had to attach the files, since inserting them into the comment text gives very strange formatting. Is this a bug in SourceForge.net software, or is it caused by my Opera browser? :)

pvgoran added on 2002-07-28 23:31:28:
Logged In: YES user_id=383758

This bug is caused by an error in generic/regc_cvec.c.

Patch for: File "generic/regc_cvec.c", Branch "MAIN", Revision 1.4

    --- regc_cvec.c  Sun Jul 28 22:34:17 2002
    +++ regc_cvec.c.new  Sun Jul 28 23:15:34 2002
    @@ -50,7 +50,7 @@
         cv = (struct cvec *)MALLOC(n);
         if (cv == NULL)
             return NULL;
    -    cv->chrspace = nc;
    +    cv->chrspace = nchrs;
         cv->chrs = (chr *)&cv->mcces[nmcces];  /* chrs just after MCCE ptrs */
         cv->mccespace = nmcces;
         cv->ranges = cv->chrs + nchrs + nmcces*(MAXMCCE+1);

It's strange that such a serious error has not been noticed yet. I also found an inconsistency in generic/regc_locale.c.
The existing code should work without problems, but it is not correct. It can be fixed in two ways.

The first one:

Patch for: File "generic/regc_locale.c", Branch "MAIN", Revision 1.8

    --- regc_locale.c  Sun Jul 28 22:33:28 2002
    +++ regc_locale.c.new-1  Sun Jul 28 22:38:02 2002
    @@ -842,7 +842,10 @@
         case CC_XDIGIT:
             cv = getcvec(v, 0, NUM_DIGIT_RANGE+2, 0);
             if (cv) {
    -            addrange(cv, '0', '9');
    +            for (i = 0; i < NUM_DIGIT_RANGE; i++) {
    +                addrange(cv, digitRangeTable[i].start,
    +                        digitRangeTable[i].end);
    +            }
                 addrange(cv, 'a', 'f');
                 addrange(cv, 'A', 'F');
             }

The second one:

Patch for: File "generic/regc_locale.c", Branch "MAIN", Revision 1.8

    --- regc_locale.c  Sun Jul 28 22:33:28 2002
    +++ regc_locale.c.new-2  Sun Jul 28 23:15:03 2002
    @@ -840,7 +840,7 @@
             }
             break;
         case CC_XDIGIT:
    -        cv = getcvec(v, 0, NUM_DIGIT_RANGE+2, 0);
    +        cv = getcvec(v, 0, 3, 0);
             if (cv) {
                 addrange(cv, '0', '9');
                 addrange(cv, 'a', 'f');

The first way is IMO preferable.

P.S. Maybe it would have been better to attach three diff files, instead of inserting them in the text?

dkf added on 2002-07-08 16:49:59:
Logged In: YES user_id=79902

Strange indeed! If only the RE engine was less cryptic...

Suggested workaround: replace [[:xdigit:]] with [0-9a-fA-F] which works.
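The one-line regc_cvec.c patch in this ticket changes which capacity is recorded for the character area of a `cvec`. The following is a simplified model in C, not the Tcl source: `struct cvec_model`, `new_model`, `can_reuse` and `chars_overlap_ranges` are hypothetical names, MCCEs are left out, and the reuse check is an assumption about how a caller such as `getcvec` decides whether an existing vector is large enough. Under those assumptions, it sketches how recording the total slot count (`nc`) instead of `nchrs` could let a later request for characters be served from a vector whose character area is too small, so that characters spill into the range area.

```c
#include <stdlib.h>

typedef int chr;                /* stand-in for the engine's chr type */

/* Simplified model: chars and range pairs share one allocation. */
struct cvec_model {
    int chrspace;               /* recorded capacity for chars */
    chr *chrs;                  /* start of the char area */
    int rangespace;             /* recorded capacity for ranges */
    chr *ranges;                /* range pairs, placed right after the chars */
    chr buf[];                  /* backing storage for both areas */
};

/* Allocate room for nchrs chars and nranges (start, end) pairs.
 * buggy != 0 records the total slot count as the char capacity,
 * mirroring "chrspace = nc"; buggy == 0 mirrors "chrspace = nchrs". */
static struct cvec_model *
new_model(int nchrs, int nranges, int buggy)
{
    size_t nc = (size_t)nchrs + (size_t)nranges * 2;
    struct cvec_model *cv = malloc(sizeof *cv + nc * sizeof(chr));

    if (cv == NULL) {
        return NULL;
    }
    cv->chrspace = buggy ? (int)nc : nchrs;
    cv->chrs = cv->buf;
    cv->ranges = cv->buf + nchrs;
    cv->rangespace = nranges;
    return cv;
}

/* Assumed reuse rule: keep the vector if the recorded capacities suffice. */
static int
can_reuse(const struct cvec_model *cv, int nchrs, int nranges)
{
    return nchrs <= cv->chrspace && nranges <= cv->rangespace;
}

/* Would storing nchrs chars run past the char area into the ranges? */
static int
chars_overlap_ranges(const struct cvec_model *cv, int nchrs)
{
    return (cv->ranges - cv->chrs) < nchrs;
}
```

With a vector sized for 0 characters and 3 ranges (the shape requested in the CC_XDIGIT case after the fix), the buggy variant accepts a later request for 4 characters and 1 range (hypothetical numbers), even though those characters would overlap the range area. The fixed variant rejects the request, which would force a fresh allocation.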