Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Add two new (undocumented) flags to the Tcl_ExternalToUtf() interface. TCL_ENCODING_NO_TERMINATE rejects the default behavior of appending a terminating NUL byte to the produced Utf output. This permits use of all of the dstLen bytes provided, and simplifies the buffer size calculations demanded from callers. Perhaps some callers need or appreciate this default behavior, but for Tcl's own main use of encodings - conversions within I/O - this just gets in the way. TCL_ENCODING_CHAR_LIMIT lets the caller set a limit on the number of chars to be output to be enforced by the encoding routines themselves. Without this, callers have to check after the fact for going beyond limits and make multiple encoding calls in a trial and error approach. Full compatibility is supported. No defaults are changed, and the flags have their effect even if an encoding driver has not been written to support these flags (but greater efficiency is enjoyed if they do!). All of Tcl's own encoding drivers are updated to support this. Other encoding drivers may exist somewhere, but I cannot point to any. A TIP to document this and make it officially supported may come in time. |
---|---|
Downloads: | Tarball | ZIP archive | SQL archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
7cefe002e742bc53de10591f3c3b5d36 |
User & Date: | dgp 2014-12-23 18:39:35 |
Context
2015-01-02
| ||
22:42 | Now that we're using TCL_ENCODING_NO_TERMINATE - be careful about acting on the contents of dst -- t... check-in: bd7e81c0fb user: dgp tags: trunk | |
2014-12-23
| ||
18:44 | merge trunk check-in: 526ce54517 user: dgp tags: novem | |
18:39 | Add two new (undocumented) flags to the Tcl_ExternalToUtf() interface. TCL_ENCODING_NO_TERMINATE rej... check-in: 7cefe002e7 user: dgp tags: trunk | |
17:53 | Support TCL_ENCODING_CHAR_LIMIT in TableToUtfProc and EscapeToUtfProc drivers. Closed-Leaf check-in: 14b0ab81bb user: dgp tags: dgp-encoding-flags | |
02:41 | Use more suitable variable name pushers. check-in: d9a4be6a30 user: dgp tags: trunk | |
Changes
Changes to generic/tcl.h.
︙ | ︙ | |||
2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 | * immediately upon encountering an invalid byte * sequence or a source character that has no * mapping in the target encoding. If clear, then * the converter will skip the problem, * substituting one or more "close" characters in * the destination buffer and then continue to * convert the source. */ #define TCL_ENCODING_START 0x01 #define TCL_ENCODING_END 0x02 #define TCL_ENCODING_STOPONERROR 0x04 /* * The following definitions are the error codes returned by the conversion * routines: * * TCL_OK - All characters were converted. * TCL_CONVERT_NOSPACE - The output buffer would not have been large | > > > > > > > > > > > > > > > > > | 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 | * immediately upon encountering an invalid byte * sequence or a source character that has no * mapping in the target encoding. If clear, then * the converter will skip the problem, * substituting one or more "close" characters in * the destination buffer and then continue to * convert the source. * TCL_ENCODING_NO_TERMINATE - If set, Tcl_ExternalToUtf will not append a * terminating NUL byte. Knowing that it will * not need space to do so, it will fill all * dstLen bytes with encoded UTF-8 content, as * other circumstances permit. If clear, the * default behavior is to reserve a byte in * the dst space for NUL termination, and to * append the NUL byte. * TCL_ENCODING_CHAR_LIMIT - If set and dstCharsPtr is not NULL, then * Tcl_ExternalToUtf takes the initial value * of *dstCharsPtr is taken as a limit of the * maximum number of chars to produce in the * encoded UTF-8 content. Otherwise, the * number of chars produced is controlled only * by other limiting factors. */ #define TCL_ENCODING_START 0x01 #define TCL_ENCODING_END 0x02 #define TCL_ENCODING_STOPONERROR 0x04 #define TCL_ENCODING_NO_TERMINATE 0x08 #define TCL_ENCODING_CHAR_LIMIT 0x10 /* * The following definitions are the error codes returned by the conversion * routines: * * TCL_OK - All characters were converted. * TCL_CONVERT_NOSPACE - The output buffer would not have been large |
︙ | ︙ |
Changes to generic/tclEncoding.c.
︙ | ︙ | |||
1202 1203 1204 1205 1206 1207 1208 | * stored in the output buffer as a result of * the conversion. */ int *dstCharsPtr) /* Filled with the number of characters that * correspond to the bytes stored in the * output buffer. */ { const Encoding *encodingPtr; | | > > > | 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 | * stored in the output buffer as a result of * the conversion. */ int *dstCharsPtr) /* Filled with the number of characters that * correspond to the bytes stored in the * output buffer. */ { const Encoding *encodingPtr; int result, srcRead, dstWrote, dstChars = 0; int noTerminate = flags & TCL_ENCODING_NO_TERMINATE; int charLimited = (flags & TCL_ENCODING_CHAR_LIMIT) && dstCharsPtr; int maxChars = INT_MAX; Tcl_EncodingState state; if (encoding == NULL) { encoding = systemEncoding; } encodingPtr = (Encoding *) encoding; |
︙ | ︙ | |||
1227 1228 1229 1230 1231 1232 1233 1234 1235 | srcReadPtr = &srcRead; } if (dstWrotePtr == NULL) { dstWrotePtr = &dstWrote; } if (dstCharsPtr == NULL) { dstCharsPtr = &dstChars; } | > > > > | | | | > | | > > > > > | | | > > > > > > > > > > | > | 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 | srcReadPtr = &srcRead; } if (dstWrotePtr == NULL) { dstWrotePtr = &dstWrote; } if (dstCharsPtr == NULL) { dstCharsPtr = &dstChars; flags &= ~TCL_ENCODING_CHAR_LIMIT; } else if (charLimited) { maxChars = *dstCharsPtr; } if (!noTerminate) { /* * If there are any null characters in the middle of the buffer, * they will converted to the UTF-8 null character (\xC080). To get * the actual \0 at the end of the destination buffer, we need to * append it manually. First make room for it... */ dstLen--; } do { int savedFlags = flags; Tcl_EncodingState savedState = *statePtr; result = encodingPtr->toUtfProc(encodingPtr->clientData, src, srcLen, flags, statePtr, dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr); if (*dstCharsPtr <= maxChars) { break; } dstLen = Tcl_UtfAtIndex(dst, maxChars) - 1 - dst + TCL_UTF_MAX; flags = savedFlags; *statePtr = savedState; } while (1); if (!noTerminate) { /* ...and then append it */ dst[*dstWrotePtr] = '\0'; } return result; } /* *------------------------------------------------------------------------- * |
︙ | ︙ | |||
2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 | { int result; result = TCL_OK; dstLen -= TCL_UTF_MAX - 1; if (dstLen < 0) { dstLen = 0; } if (srcLen > dstLen) { srcLen = dstLen; result = TCL_CONVERT_NOSPACE; } *srcReadPtr = srcLen; | > > > | 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 | { int result; result = TCL_OK; dstLen -= TCL_UTF_MAX - 1; if (dstLen < 0) { dstLen = 0; } if ((flags & TCL_ENCODING_CHAR_LIMIT) && srcLen > *dstCharsPtr) { srcLen = *dstCharsPtr; } if (srcLen > dstLen) { srcLen = dstLen; result = TCL_CONVERT_NOSPACE; } *srcReadPtr = srcLen; |
︙ | ︙ | |||
2263 2264 2265 2266 2267 2268 2269 | * output buffer. */ int pureNullMode) /* Convert embedded nulls from internal * representation to real null-bytes or vice * versa. */ { const char *srcStart, *srcEnd, *srcClose; const char *dstStart, *dstEnd; | | > > > | | 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 | * output buffer. */ int pureNullMode) /* Convert embedded nulls from internal * representation to real null-bytes or vice * versa. */ { const char *srcStart, *srcEnd, *srcClose; const char *dstStart, *dstEnd; int result, numChars, charLimit = INT_MAX; Tcl_UniChar ch; result = TCL_OK; srcStart = src; srcEnd = src + srcLen; srcClose = srcEnd; if ((flags & TCL_ENCODING_END) == 0) { srcClose -= TCL_UTF_MAX; } if (flags & TCL_ENCODING_CHAR_LIMIT) { charLimit = *dstCharsPtr; } dstStart = dst; dstEnd = dst + dstLen - TCL_UTF_MAX; for (numChars = 0; src < srcEnd && numChars <= charLimit; numChars++) { if ((src > srcClose) && (!Tcl_UtfCharComplete(src, srcEnd - src))) { /* * If there is more string to follow, this will ensure that the * last UTF-8 character in the source buffer hasn't been cut off. */ result = TCL_CONVERT_MULTIBYTE; |
︙ | ︙ | |||
2374 2375 2376 2377 2378 2379 2380 | * the conversion. */ int *dstCharsPtr) /* Filled with the number of characters that * correspond to the bytes stored in the * output buffer. */ { const char *srcStart, *srcEnd; const char *dstEnd, *dstStart; | | > > > | | 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 | * the conversion. */ int *dstCharsPtr) /* Filled with the number of characters that * correspond to the bytes stored in the * output buffer. */ { const char *srcStart, *srcEnd; const char *dstEnd, *dstStart; int result, numChars, charLimit = INT_MAX; Tcl_UniChar ch; if (flags & TCL_ENCODING_CHAR_LIMIT) { charLimit = *dstCharsPtr; } result = TCL_OK; if ((srcLen % sizeof(Tcl_UniChar)) != 0) { result = TCL_CONVERT_MULTIBYTE; srcLen /= sizeof(Tcl_UniChar); srcLen *= sizeof(Tcl_UniChar); } srcStart = src; srcEnd = src + srcLen; dstStart = dst; dstEnd = dst + dstLen - TCL_UTF_MAX; for (numChars = 0; src < srcEnd && numChars <= charLimit; numChars++) { if (dst > dstEnd) { result = TCL_CONVERT_NOSPACE; break; } /* * Special case for 1-byte utf chars for speed. Make sure we work with |
︙ | ︙ | |||
2558 2559 2560 2561 2562 2563 2564 | * the conversion. */ int *dstCharsPtr) /* Filled with the number of characters that * correspond to the bytes stored in the * output buffer. */ { const char *srcStart, *srcEnd; const char *dstEnd, *dstStart, *prefixBytes; | | > > > | | 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 | * the conversion. */ int *dstCharsPtr) /* Filled with the number of characters that * correspond to the bytes stored in the * output buffer. */ { const char *srcStart, *srcEnd; const char *dstEnd, *dstStart, *prefixBytes; int result, byte, numChars, charLimit = INT_MAX; Tcl_UniChar ch; const unsigned short *const *toUnicode; const unsigned short *pageZero; TableEncodingData *dataPtr = clientData; if (flags & TCL_ENCODING_CHAR_LIMIT) { charLimit = *dstCharsPtr; } srcStart = src; srcEnd = src + srcLen; dstStart = dst; dstEnd = dst + dstLen - TCL_UTF_MAX; toUnicode = (const unsigned short *const *) dataPtr->toUnicode; prefixBytes = dataPtr->prefixBytes; pageZero = toUnicode[0]; result = TCL_OK; for (numChars = 0; src < srcEnd && numChars <= charLimit; numChars++) { if (dst > dstEnd) { result = TCL_CONVERT_NOSPACE; break; } byte = *((unsigned char *) src); if (prefixBytes[byte]) { src++; |
︙ | ︙ | |||
2789 2790 2791 2792 2793 2794 2795 | * the conversion. */ int *dstCharsPtr) /* Filled with the number of characters that * correspond to the bytes stored in the * output buffer. */ { const char *srcStart, *srcEnd; const char *dstEnd, *dstStart; | | > > > | | 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 | * the conversion. */ int *dstCharsPtr) /* Filled with the number of characters that * correspond to the bytes stored in the * output buffer. */ { const char *srcStart, *srcEnd; const char *dstEnd, *dstStart; int result, numChars, charLimit = INT_MAX; if (flags & TCL_ENCODING_CHAR_LIMIT) { charLimit = *dstCharsPtr; } srcStart = src; srcEnd = src + srcLen; dstStart = dst; dstEnd = dst + dstLen - TCL_UTF_MAX; result = TCL_OK; for (numChars = 0; src < srcEnd && numChars <= charLimit; numChars++) { Tcl_UniChar ch; if (dst > dstEnd) { result = TCL_CONVERT_NOSPACE; break; } ch = (Tcl_UniChar) *((unsigned char *) src); |
︙ | ︙ | |||
3014 3015 3016 3017 3018 3019 3020 | * correspond to the bytes stored in the * output buffer. */ { EscapeEncodingData *dataPtr = clientData; const char *prefixBytes, *tablePrefixBytes, *srcStart, *srcEnd; const unsigned short *const *tableToUnicode; const Encoding *encodingPtr; | | > > > | | 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 | * correspond to the bytes stored in the * output buffer. */ { EscapeEncodingData *dataPtr = clientData; const char *prefixBytes, *tablePrefixBytes, *srcStart, *srcEnd; const unsigned short *const *tableToUnicode; const Encoding *encodingPtr; int state, result, numChars, charLimit = INT_MAX; const char *dstStart, *dstEnd; if (flags & TCL_ENCODING_CHAR_LIMIT) { charLimit = *dstCharsPtr; } result = TCL_OK; tablePrefixBytes = NULL; /* lint. */ tableToUnicode = NULL; /* lint. */ prefixBytes = dataPtr->prefixBytes; encodingPtr = NULL; srcStart = src; srcEnd = src + srcLen; dstStart = dst; dstEnd = dst + dstLen - TCL_UTF_MAX; state = PTR2INT(*statePtr); if (flags & TCL_ENCODING_START) { state = 0; } for (numChars = 0; src < srcEnd && numChars <= charLimit; ) { int byte, hi, lo, ch; if (dst > dstEnd) { result = TCL_CONVERT_NOSPACE; break; } byte = *((unsigned char *) src); |
︙ | ︙ |
Changes to generic/tclIO.c.
︙ | ︙ | |||
4574 4575 4576 4577 4578 4579 4580 | if (GotFlag(statePtr, INPUT_SAW_CR)) { ResetFlag(statePtr, INPUT_SAW_CR); if ((eol < dstEnd) && (*eol == '\n')) { /* * Skip the raw bytes that make up the '\n'. */ | | | > | < | 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 | if (GotFlag(statePtr, INPUT_SAW_CR)) { ResetFlag(statePtr, INPUT_SAW_CR); if ((eol < dstEnd) && (*eol == '\n')) { /* * Skip the raw bytes that make up the '\n'. */ char tmp[TCL_UTF_MAX]; int rawRead; bufPtr = gs.bufPtr; Tcl_ExternalToUtf(NULL, gs.encoding, RemovePoint(bufPtr), gs.rawRead, statePtr->inputEncodingFlags | TCL_ENCODING_NO_TERMINATE, &gs.state, tmp, TCL_UTF_MAX, &rawRead, NULL, NULL); bufPtr->nextRemoved += rawRead; gs.rawRead -= rawRead; gs.bytesWrote--; gs.charsWrote--; memmove(dst, dst + 1, (size_t) (dstEnd - dst)); dstEnd--; } |
︙ | ︙ | |||
4682 4683 4684 4685 4686 4687 4688 | bufPtr = gs.bufPtr; if (bufPtr == NULL) { Tcl_Panic("Tcl_GetsObj: gotEOL reached with bufPtr==NULL"); } statePtr->inputEncodingState = gs.state; Tcl_ExternalToUtf(NULL, gs.encoding, RemovePoint(bufPtr), gs.rawRead, | > | | | 4682 4683 4684 4685 4686 4687 4688 4689 4690 4691 4692 4693 4694 4695 4696 4697 4698 | bufPtr = gs.bufPtr; if (bufPtr == NULL) { Tcl_Panic("Tcl_GetsObj: gotEOL reached with bufPtr==NULL"); } statePtr->inputEncodingState = gs.state; Tcl_ExternalToUtf(NULL, gs.encoding, RemovePoint(bufPtr), gs.rawRead, statePtr->inputEncodingFlags | TCL_ENCODING_NO_TERMINATE, &statePtr->inputEncodingState, dst, eol - dst + skip + TCL_UTF_MAX - 1, &gs.rawRead, NULL, &gs.charsWrote); bufPtr->nextRemoved += gs.rawRead; /* * Recycle all the emptied buffers. */ |
︙ | ︙ | |||
5215 5216 5217 5218 5219 5220 5221 | } spaceLeft = length - offset; dst = objPtr->bytes + offset; *gsPtr->dstPtr = dst; } gsPtr->state = statePtr->inputEncodingState; result = Tcl_ExternalToUtf(NULL, gsPtr->encoding, raw, rawLen, | | | | | 5216 5217 5218 5219 5220 5221 5222 5223 5224 5225 5226 5227 5228 5229 5230 5231 5232 | } spaceLeft = length - offset; dst = objPtr->bytes + offset; *gsPtr->dstPtr = dst; } gsPtr->state = statePtr->inputEncodingState; result = Tcl_ExternalToUtf(NULL, gsPtr->encoding, raw, rawLen, statePtr->inputEncodingFlags | TCL_ENCODING_NO_TERMINATE, &statePtr->inputEncodingState, dst, spaceLeft, &gsPtr->rawRead, &gsPtr->bytesWrote, &gsPtr->charsWrote); /* * Make sure that if we go through 'gets', that we reset the * TCL_ENCODING_START flag still. [Bug #523988] */ statePtr->inputEncodingFlags &= ~TCL_ENCODING_START; |
︙ | ︙ | |||
5924 5925 5926 5927 5928 5929 5930 | Tcl_Encoding encoding = statePtr->encoding? statePtr->encoding : GetBinaryEncoding(); Tcl_EncodingState savedState = statePtr->inputEncodingState; ChannelBuffer *bufPtr = statePtr->inQueueHead; int savedIEFlags = statePtr->inputEncodingFlags; int savedFlags = statePtr->flags; char *dst, *src = RemovePoint(bufPtr); | | | | | < | > > > > > > | | | | | 5925 5926 5927 5928 5929 5930 5931 5932 5933 5934 5935 5936 5937 5938 5939 5940 5941 5942 5943 5944 5945 5946 5947 5948 5949 5950 5951 5952 5953 5954 5955 5956 5957 5958 5959 5960 5961 5962 5963 5964 5965 5966 5967 5968 5969 5970 5971 5972 5973 5974 5975 5976 5977 5978 5979 5980 5981 5982 5983 5984 5985 5986 5987 5988 5989 5990 5991 5992 5993 5994 5995 5996 5997 5998 5999 6000 6001 6002 6003 | Tcl_Encoding encoding = statePtr->encoding? statePtr->encoding : GetBinaryEncoding(); Tcl_EncodingState savedState = statePtr->inputEncodingState; ChannelBuffer *bufPtr = statePtr->inQueueHead; int savedIEFlags = statePtr->inputEncodingFlags; int savedFlags = statePtr->flags; char *dst, *src = RemovePoint(bufPtr); int numBytes, srcLen = BytesLeft(bufPtr); /* * One src byte can yield at most one character. So when the * number of src bytes we plan to read is less than the limit on * character count to be read, clearly we will remain within that * limit, and we can use the value of "srcLen" as a tighter limit * for sizing receiving buffers. */ int toRead = ((charsToRead<0)||(charsToRead > srcLen)) ? srcLen : charsToRead; /* * 'factor' is how much we guess that the bytes in the source buffer will * expand when converted to UTF-8 chars. This guess comes from analyzing * how many characters were produced by the previous pass. */ int factor = *factorPtr; int dstLimit = TCL_UTF_MAX - 1 + toRead * factor / UTF_EXPANSION_FACTOR; (void) TclGetStringFromObj(objPtr, &numBytes); Tcl_AppendToObj(objPtr, NULL, dstLimit); if (toRead == srcLen) { unsigned int size; dst = TclGetStringStorage(objPtr, &size) + numBytes; dstLimit = size - numBytes; } else { dst = TclGetString(objPtr) + numBytes; } /* * This routine is burdened with satisfying several constraints. * It cannot append more than 'charsToRead` chars onto objPtr. * This is measured after encoding and translation transformations * are completed. There is no precise number of src bytes that can * be associated with the limit. Yet, when we are done, we must know * precisely the number of src bytes that were consumed to produce * the appended chars, so that all subsequent bytes are left in * the buffers for future read operations. * * The consequence is that we have no choice but to implement a * "trial and error" approach, where in general we may need to * perform transformations and copies multiple times to achieve * a consistent set of results. This takes the shape of a loop. */ while (1) { int dstDecoded, dstRead, dstWrote, srcRead, numChars, code; int flags = statePtr->inputEncodingFlags | TCL_ENCODING_NO_TERMINATE; if (charsToRead > 0) { flags |= TCL_ENCODING_CHAR_LIMIT; numChars = charsToRead; } /* * Perform the encoding transformation. Read no more than * srcLen bytes, write no more than dstLimit bytes. */ code = Tcl_ExternalToUtf(NULL, encoding, src, srcLen, flags & (bufPtr->nextPtr ? ~0 : ~TCL_ENCODING_END), &statePtr->inputEncodingState, dst, dstLimit, &srcRead, &dstDecoded, &numChars); /* * Perform the translation transformation in place. Read no more * than the dstDecoded bytes the encoding transformation actually * produced. Capture the number of bytes written in dstWrote. * Capture the number of bytes actually consumed in dstRead. */ |
︙ | ︙ | |||
6046 6047 6048 6049 6050 6051 6052 | /* * There are chars leading the buffer before the eof * char. Adjust the dstLimit so we go back and read * only those and do not encounter the eof char this * time. */ | | | 6052 6053 6054 6055 6056 6057 6058 6059 6060 6061 6062 6063 6064 6065 6066 | /* * There are chars leading the buffer before the eof * char. Adjust the dstLimit so we go back and read * only those and do not encounter the eof char this * time. */ dstLimit = dstRead - 1 + TCL_UTF_MAX; statePtr->flags = savedFlags; statePtr->inputEncodingFlags = savedIEFlags; statePtr->inputEncodingState = savedState; continue; } } |
︙ | ︙ | |||
6072 6073 6074 6075 6076 6077 6078 | /* * There are chars we can read before we hit the bare cr. * Go back with a smaller dstLimit so we get them in the * next pass, compute a matching srcRead, and don't end * up back here in this call. */ | | | | > | | < | 6078 6079 6080 6081 6082 6083 6084 6085 6086 6087 6088 6089 6090 6091 6092 6093 6094 6095 6096 6097 6098 6099 6100 6101 6102 6103 6104 6105 6106 6107 6108 6109 6110 6111 6112 6113 6114 6115 6116 6117 6118 6119 6120 6121 6122 6123 6124 6125 6126 6127 6128 6129 6130 6131 6132 6133 6134 6135 | /* * There are chars we can read before we hit the bare cr. * Go back with a smaller dstLimit so we get them in the * next pass, compute a matching srcRead, and don't end * up back here in this call. */ dstLimit = dstRead - 1 + TCL_UTF_MAX; statePtr->flags = savedFlags; statePtr->inputEncodingFlags = savedIEFlags; statePtr->inputEncodingState = savedState; continue; } assert(dstWrote == 0); assert(dstRead == 0); /* * We decoded only the bare cr, and we cannot read a * translated char from that alone. We have to know what's * next. So why do we only have the one decoded char? */ if (code != TCL_OK) { char buffer[TCL_UTF_MAX + 1]; int read, decoded, count; /* * Didn't get everything the buffer could offer */ statePtr->flags = savedFlags; statePtr->inputEncodingFlags = savedIEFlags; statePtr->inputEncodingState = savedState; Tcl_ExternalToUtf(NULL, encoding, src, srcLen, (statePtr->inputEncodingFlags | TCL_ENCODING_NO_TERMINATE) & (bufPtr->nextPtr ? ~0 : ~TCL_ENCODING_END), &statePtr->inputEncodingState, buffer, TCL_UTF_MAX + 1, &read, &decoded, &count); if (count == 2) { if (buffer[1] == '\n') { /* \r\n translate to \n */ dst[0] = '\n'; bufPtr->nextRemoved += read; } else { dst[0] = '\r'; bufPtr->nextRemoved += srcRead; } statePtr->inputEncodingFlags &= ~TCL_ENCODING_START; Tcl_SetObjLength(objPtr, numBytes + 1); return 1; } } else if (statePtr->flags & CHANNEL_EOF) { |
︙ | ︙ | |||
6156 6157 6158 6159 6160 6161 6162 6163 6164 6165 6166 6167 6168 | */ numChars -= (dstRead - dstWrote); if (charsToRead > 0 && numChars > charsToRead) { /* * We read more chars than allowed. Reset limits to * prevent that and try again. Don't forget the extra * padding of TCL_UTF_MAX bytes demanded by the * Tcl_ExternalToUtf() call! */ | > > | | 6162 6163 6164 6165 6166 6167 6168 6169 6170 6171 6172 6173 6174 6175 6176 6177 6178 6179 6180 6181 6182 6183 6184 | */ numChars -= (dstRead - dstWrote); if (charsToRead > 0 && numChars > charsToRead) { /* * TODO: This cannot happen anymore. * * We read more chars than allowed. Reset limits to * prevent that and try again. Don't forget the extra * padding of TCL_UTF_MAX bytes demanded by the * Tcl_ExternalToUtf() call! */ dstLimit = Tcl_UtfAtIndex(dst, charsToRead) - 1 + TCL_UTF_MAX - dst; statePtr->flags = savedFlags; statePtr->inputEncodingFlags = savedIEFlags; statePtr->inputEncodingState = savedState; continue; } if (dstWrote == 0) { |
︙ | ︙ |