Tcl Source Code

Ticket UUID: 219233
Title: string match oddity when - is the last char
Type: Bug Version: obsolete: 8.3
Submitter: nobody Created on: 2000-10-26 05:03:56
Subsystem: 18. Commands M-Z Assigned To: dkf
Priority: 7 High Severity: Minor
Status: Open Last Modified: 2018-05-07 22:03:22
Resolution: Remind Closed By: nobody
    Closed on:
Description:
OriginalBugID: 4438 Bug
Version: 8.3
SubmitDate: '2000-03-21'
LastModified: '2000-04-03'
Severity: MED
Status: Assigned
Submitter: techsupp
ChangedBy: hobbs
OS: Windows 95
OSVersion: OSR2
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'


Name:
Keith Lea

ReproducibleScript:
% string match {[a-z0-9_/-]} \\
1
% string match {[a-z0-9_/]} \\
0



It's accidentally interpreting "/-]" as "/-]]", taking the last ] both as the
end of the range and as the end of the bracket expression.
-- 04/03/2000 hobbs
This needs to be fixed in Tcl_String(Case)Match in tclUtil.c,
but wait until 8.4 just in case someone was counting on the
previous perverse behavior. 
-- 04/03/2000 hobbs
User Comments: sebres added on 2018-05-07 22:03:22:
If the behavior below is acceptable to you, I can backport it from my own branches (I assume that's relatively easy):

  % string match {[.-B} A
  invalid match pattern: brackets [] not balanced

Otherwise, I would like to know exactly how it should be "fixed" (after the possible refactoring)...

For example, another variant of the "fix" could be:

  % string match {[.-B} A
  0
  % string match {[.-B} \[.-A
  0
  % string match {[.-B} \[.-B
  1

That is, with unbalanced brackets the pattern {[.-B} would be equivalent to {\[.-B}.

Anyway, the current implementation is definitely wrong:
  % string match {[.-B} A
  1
should result either in an error or in 0.

dgp added on 2018-05-07 16:21:29:
If someone wants to pursue a "refactor first, then fix" strategy,
I can accept that.

dgp added on 2018-05-07 16:19:56:

"Wait until 9.0" is no longer a blocker.

This should be fixed at least on the trunk, if not
on earlier branches too.

gneumann added on 2010-11-18 16:52:04:
I was apparently hit by the same problem.

% string match {[-.]} -
1
% string match {[.-]} -
0 
# even worse
% string match {[.-]} A
1 

This is certainly unexpected behavior and should at least be documented.

dkf added on 2007-06-14 21:50:12:
Logged In: YES 
user_id=79902
Originator: NO

Comments below indicate "wait until 9.0"; that strategy came mainly from Jeff.

matzek added on 2007-06-14 21:00:55:
Logged In: YES 
user_id=330806
Originator: NO

I forgot to mention that the example below is taken from a recent Tcl interpreter (8.4.14)...

matzek added on 2007-06-14 20:56:20:
Logged In: YES 
user_id=330806
Originator: NO

Hi!

I'd like to add another example to the list:

% string match {*[.-]*} "2.1-Beta"
0
% string match {*[-.]*} "2.1-Beta"
1
% string match {*[.-]*} "Beta-2.1"
1
% string match {*[-.]*} "Beta-2.1"
1

If this is not going to be fixed soon, I suggest at least documenting this behavior. It could be enough to just add the hyphen to the list of characters that need to be escaped...

kind regards
-- Matthias Kraft

dkf added on 2003-11-24 23:17:10:
Logged In: YES 
user_id=79902

Backslashes aren't special inside the square-bracket term,
so the first match doesn't match what you are expecting and
the second match isn't looking for what you think it is:

% set str \\
\
% string match {[\[]} $str
1
% set str {\]}
\]
% string match {[\]]} $str
1

The glob-matching engine used by [string match] isn't very
smart...  :^(  Consider using regular expressions instead.

lupylucke added on 2003-11-23 15:20:03:
Logged In: YES 
user_id=915599

I encountered a problem with `string match' too. I suppose
it is, in the end, the same bug: it is possible to match a [ in
a character set by quoting it with \. This should work with ]
too, but it doesn't!

% string match {[\[]} {[}
1
% string match {[\]]} {]}
0

dkf added on 2001-03-17 03:57:56:
Logged In: YES 
user_id=79902

This behaviour won't be changed before 9.0

Any fixes that *are* done must be applied to code in both tclUtil.c and tclUtf.c

dkf added on 2001-02-23 17:24:41:
See Patch #103932 https://sourceforge.net/patch/?func=detailpatch&patch_id=103932&group_id=10894

dkf added on 2001-02-15 23:32:37:
Apparently, the way [string match] handles syntactically invalid patterns is by failing to match anything at all.  It's not entirely clear to me that this is an optimal strategy...

dkf added on 2000-11-24 20:18:50:
Improved detection of bug:
  % string match \[a-] ]
  1
  % string match \[a-]x ]x
  0

dkf added on 2000-11-24 18:35:30:
Hmm.  The problem seems to be that the first pattern is actually malformed by the rules of [string match], but there is no way to indicate this.  I suppose the correct way of dealing with this is to decide that we were not really matching a range after all, but that's not very good at all.  Either that, or we state that a malformed pattern matches nothing at all.

Hmm.  On successful matching of a range, should we really back up a character at the unexpected end of string, or should we fail at that point?