Marpa

Timeline

63 check-ins using file slif-literal.tcl or slif/literal.tcl version 9afbcbfde5

2018-03-20
05:45
rtC's mixed use of both `char` and `unsigned char` in various APIs and code, plus the promotion to `int`, broke lexing beyond ASCII, even when restricted to the BMP, because bytes > 127 showed up as negative values. Fixed by changing all uses of `char` to `unsigned char`. Further changed the extraction of semantic values:
- Lexeme length is now counted in characters, not bytes.
- Similarly, lexeme end is now counted in characters from the start.
- Input is now counted in both bytes and characters, for a proper lexeme start.
Character counting in C strings was pulled from tclUtf.c. Error messages now use the new character offsets, plus byte offsets for partially read characters. check-in: b0d7fa6f75 user: aku tags: trunk
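A minimal C sketch of the two points above. The helper names are hypothetical and this is not rtC's code; the counting loop is a simplified stand-in for the routines taken from tclUtf.c and assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* widen_via_signed mimics plain char on platforms where char is signed
   (implementation-defined, common on x86): a byte above 127 such as 0xC3
   comes out negative (-61 with GCC/Clang), which breaks any comparison or
   table lookup expecting 0..255. widen_via_unsigned always yields 0..255. */
static int widen_via_signed(const char *p)   { return (signed char)*p; }
static int widen_via_unsigned(const char *p) { return (unsigned char)*p; }

/* Count characters in a NUL-terminated UTF-8 string by counting every byte
   that is not a continuation byte (bit pattern 10xxxxxx).
   Assumes well-formed input; it does not validate. */
static size_t utf8_char_count(const unsigned char *s) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if ((*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For `"h\xc3\xa9llo"` (h, é, l, l, o) the byte length is 6 while `utf8_char_count` reports 5, which is the byte/character distinction the lexeme offsets now track.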
2018-03-16
23:44
Specials done, make them available to the main line. check-in: 754111fe8e user: aku tags: trunk
23:43
Implemented special semantic action `::first`. Added support in the Tcl and rtC runtimes and in the generators for these runtimes, plus a test. Closed-Leaf check-in: ef22cbb99b user: aku tags: specials
23:24
Updated the work on specials with bug fixes for the issues found in it. check-in: 8918fc0839 user: aku tags: specials
23:22
Added a test triggering the rtC code path where a new lexer starts out with the parser exhausted, i.e. with nothing acceptable. Fixed the missing close of the earleme for that case, and the missing handling of the `lexer exhausted` error from libmarpa. The missing close also caused miscommunication between lexer and gate, ultimately crashing the latter with a symbol id outside of the byte range. The gate change is only a tweak for better tracing, i.e. printing the acceptables before working with them, not after. check-in: 64775966bd user: aku tags: trunk
06:13
Added a test triggering the code path for the generation of semantic actions for grammars which have multiple actions across their G1 symbols. Fixed a variable name typo in that code path. check-in: 2f455e21d2 user: aku tags: trunk
2018-03-15
07:04
Added description of the various files found in the test grammars, and their relationships. check-in: eaefc0b630 user: aku tags: specials
2018-03-14
23:00
Implemented special semantic action `::array`. This one was trivial: it maps to the array action `values`. The mapping is done in the semantics; the backends do not see the special. Started on special semantic action `::first`: tests only, no implementation yet. check-in: 7626d0194b user: aku tags: specials
2018-03-02
06:23
unicode. Fixed a bad condition for handling the final element in negate-class, after the main loop. Triggered when the next-to-final element ends just before UNI_MAX. Test added, demonstrating the bug and the fix. check-in: 546018b243 user: aku tags: trunk
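The tail handling after the loop is the easy part to get wrong. A minimal C sketch of complementing a charclass given as sorted, non-overlapping code-point ranges; the `cp_range` type, the function name, and passing the upper bound as `max` (standing in for UNI_MAX) are hypothetical, not the actual negate-class code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive code-point range [lo, hi]. */
typedef struct { uint32_t lo, hi; } cp_range;

/* Complement sorted, non-overlapping ranges within [0, max], e.g.
   max = 0x10FFFF (full Unicode) or 0xFFFF (BMP only).
   out must have room for n + 1 ranges; returns the number written. */
static size_t negate_ranges(const cp_range *in, size_t n, uint32_t max,
                            cp_range *out) {
    assert(out != NULL && (in != NULL || n == 0));
    size_t k = 0;
    uint32_t next = 0;  /* first code point not yet accounted for */
    for (size_t i = 0; i < n; i++) {
        if (in[i].lo > next) {          /* gap before this range */
            out[k].lo = next;
            out[k].hi = in[i].lo - 1;
            k++;
        }
        if (in[i].hi >= max)            /* class reaches max: no tail */
            return k;
        next = in[i].hi + 1;
    }
    /* Final element, after the main loop: everything from next up to max.
       This must also fire when the last range ends at max - 1, leaving
       exactly the single code point max. */
    out[k].lo = next;
    out[k].hi = max;
    return k + 1;
}
```

With the input `{[0, 0x10FFFE]}` and `max = 0x10FFFF`, the result is the single range `[0x10FFFF, 0x10FFFF]`, the same shape of input the commit names as the trigger.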
2018-02-19
05:55
marpa::gen::runtime::c: Fixed mishandling of zero-length chunks, which generated bad C syntax. Triggered by a grammar without any `:discard` clauses. The fix prevents the insertion of discard chunks if there are none. Furthermore, the low-level ChunkedArray code now errors out for zero-length chunks, to catch possible future problems. Reviewed all uses and noted that none are zero-length now. Added a test demonstrating the possibility. check-in: 14698e1f84 user: aku tags: trunk
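The commit does not say what the bad C syntax looked like. As an assumed example, an empty initializer list `{ }` is not valid ISO C before C23. A minimal sketch of an emitter that refuses zero-length chunks up front; `emit_chunk` and its output format are hypothetical, not the ChunkedArray API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h> /* strcmp, for checking the emitted text */

/* Write "static const int NAME[] = { v0, v1, ... };" into buf.
   Returns the text length, or -1 for a zero-length chunk (which would
   produce an empty initializer list) or if buf is too small. */
static int emit_chunk(char *buf, size_t size, const char *name,
                      const int *vals, size_t n) {
    assert(buf != NULL && name != NULL);
    if (n == 0)
        return -1;  /* error out instead of generating bad C */
    int w = snprintf(buf, size, "static const int %s[] = {", name);
    if (w < 0 || (size_t)w >= size) return -1;
    size_t used = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        w = snprintf(buf + used, size - used, "%s%d", i ? ", " : " ", vals[i]);
        if (w < 0 || (size_t)w >= size - used) return -1;
        used += (size_t)w;
    }
    w = snprintf(buf + used, size - used, " };");
    if (w < 0 || (size_t)w >= size - used) return -1;
    used += (size_t)w;
    return (int)used;
}
```

Failing loudly at the lowest level, as the commit describes, turns a silent syntax error in generated code into an immediate error in the generator.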
2018-02-18
02:37
Started a rework of the unicode layer. The overall plan is to remove the distinction between BMP and full in this layer and move it into the generators, with some support in the middle, i.e. in literal handling and in the transform from codepoints to byte sequences. Optimization: moved the commands `2utf`, `mode`, and `max` into the C layer. check-in: b481c06f7c user: aku tags: reunification
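The transform from codepoints to byte sequences mentioned above is, at its core, UTF-8 encoding. A generic C sketch follows; it is not the C implementation of `2utf`, and the function name is hypothetical. It encodes the full range and leaves any BMP limit to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[0..3]. Returns the number of
   bytes written (1..4), or 0 for values above 0x10FFFF. Surrogates
   (U+D800..U+DFFF) are encoded like any other value here; whether to
   allow them is left to the caller. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    assert(out != NULL);
    if (cp < 0x80) {                     /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                    /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                  /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+20AC encodes as `E2 82 AC` and U+1F600 as `F0 9F 98 80`.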
2018-02-03
00:51
Merged the fix for RT-C mishandling the `proper` flag into the branch where it was found. Updated test results to match. Marked a number of tests touching on unicode/utf handling as known bugs, to be addressed once the general utf handling trouble is closer to solved. Still to address: all the `i_` tests. check-in: 9ed3031891 user: aku tags: language-json
00:28
Fixed a mismatch between the Tcl and C runtimes. The issue was in the C runtime: it did not properly convert the boolean `proper` into the flag taken by `marpa_g_sequence_new()`. Conversion added, test cases added. check-in: b86ffae080 user: aku tags: trunk
00:27
Re-enable full set of lang/result checks check-in: 94232b7386 user: aku tags: trunk
2018-02-02
23:19
Fix bad phrasing in comment check-in: 90ce04b75a user: aku tags: trunk
19:50
Added a script to run a fixed demo, from grammar to parser to its use, plus example JSON files to use as input. check-in: f82c10dea8 user: aku tags: language-json
18:23
Continued testsuite work. Fixed definition of JSON `whitespace`. Regenerated parsers. Updated parse failure results to match. check-in: eb17740e4d user: aku tags: language-json
2018-02-01
23:56
Continued testsuite work.
- Added the n_* cases (must reject), and a first round of results. Reorganized the input/ and result/ directories to separate the various groups better (y|n|i, c|tcl, ...).
- A number of failures to reject input:
  - 4x grammar error: \f is not whitespace.
  - 10x input accepted which should not be (c (bad) vs tcl (ok)).
  - process vs process-file differences in rt-c (encoding differences?). check-in: 20cb6ea243 user: aku tags: language-json
20:37
Pull rt-c bug fix into the branch which exposed the issue. check-in: 2945ca6cd1 user: aku tags: language-json
20:36
Testsuite work:
- Cleaned up the support code, removed unused procedures.
- Ensure that files are read with the proper encoding before being fed into the string-based `process` (see `fgetc` decl and use).
- Allow setting of runtime-specific constraints.
- Set the __known bug__ constraint for eight y_* tests where rt-c currently diverges from rt-tcl (1).
(1) These are all in the unicode/utf-8 handling, which differs between the available runtimes.
  * rt-tcl operates on chars and defers to Tcl's parsing of utf-8 sequences.
  * rt-c, on the other hand, operates on bytes, does its own utf-8 parsing, and is stricter (invalid sequences are a parse error).
I have to see if I can define a char class (:invalid:) containing the invalid sequences. Using it would allow me to either accept or discard them (depending on context). Similarly, I might have to allow the class of surrogates (:Cs:) as acceptable characters, and as sequences for the characters past the BMP. That would allow such characters even in a Marpa limited to the BMP. These are all issues in the MarpaTcl core, however, and not specific to JSON. JSON just exposed them. check-in: 7ab2e4bf63 user: aku tags: language-json
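rt-c's stricter behaviour can be pictured as a well-formedness check on each byte sequence. A minimal C sketch following the well-formed UTF-8 byte sequence table of the Unicode Standard (Table 3-7); the function name is hypothetical and this is not the rt-c parser. Note that it rejects encoded surrogates (`ED A0..BF ..`), which is why allowing (:Cs:) would need explicit support.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length (1..4) of the well-formed UTF-8 sequence at s[0..n),
   or 0 if the bytes there are invalid (overlong, surrogate, above
   U+10FFFF, stray continuation byte, or truncated). */
static size_t utf8_valid_seq(const unsigned char *s, size_t n) {
    assert(s != NULL || n == 0);
    if (n == 0) return 0;
    unsigned char b0 = s[0];
    if (b0 < 0x80) return 1;                        /* ASCII */
    size_t len;
    unsigned char lo = 0x80, hi = 0xBF;             /* bounds for s[1] */
    if (b0 >= 0xC2 && b0 <= 0xDF)      len = 2;
    else if (b0 == 0xE0)             { len = 3; lo = 0xA0; } /* no overlongs */
    else if (b0 >= 0xE1 && b0 <= 0xEC) len = 3;
    else if (b0 == 0xED)             { len = 3; hi = 0x9F; } /* no surrogates */
    else if (b0 >= 0xEE && b0 <= 0xEF) len = 3;
    else if (b0 == 0xF0)             { len = 4; lo = 0x90; } /* no overlongs */
    else if (b0 >= 0xF1 && b0 <= 0xF3) len = 4;
    else if (b0 == 0xF4)             { len = 4; hi = 0x8F; } /* <= U+10FFFF */
    else return 0;  /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
    if (n < len) return 0;                          /* truncated */
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;   /* must be 10xxxxxx */
    return len;
}
```

A lenient reader would instead substitute or pass through such bytes; a char class like the (:invalid:) idea above would make that choice part of the grammar rather than the runtime.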
20:17
Added tool similar to `od`, to decode and display utf8 sequences in the input (file, stdin). check-in: 669660f659 user: aku tags: trunk
20:15
Changed gate-to-lexer flush signaling from an in-band `(byte) -1` to a separate function. This removes any possibility that a `(byte) -1` from actual input causes a bogus flush. Added a debug function allowing INBOUND to properly print a batch of input bytes.
Fixed a crash of the RT-C where the loop searching for the end of the lexeme tried to pop a byte from the empty lexeme, triggering an underflow assert. This may happen when `lexer_complete` is called for an empty-valued lexeme, i.e. when the GATE rejects the first byte after the end of a lexeme as invalid and signals a flush before any byte was entered into the lexer at all.
Note that this does not necessarily indicate a mismatch. The current set of acceptable lexemes may contain some which allow an empty value, and we have to keep recognizing them. After that, the new context may have made the invalid byte valid. So we only skip the attempt to make an empty value even emptier.
The deeper issue is that for LATM-mode symbols the earley-set id does not match the length of the lexeme, due to the zero-width ACS guards in front, causing an additional round through the loop before a mismatch can be declared. The concrete trigger was the `string` and `lstring` symbols in the JSON grammar, for input `[""]`. check-in: 0de21b2314 user: aku tags: trunk
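A minimal C sketch of the two fixes above: bytes travel as `unsigned char` with flushing as its own entry point, so `0xFF` in the input is always data, and popping from an empty lexeme is guarded instead of underflowing. The `lexer_t` struct and the `lexer_enter`/`lexer_flush`/`lexer_pop` names are hypothetical, not the rtC structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer state, for illustration only. */
typedef struct {
    unsigned char buf[64];
    size_t len;    /* bytes in the current lexeme */
    int flushes;   /* number of flush requests seen */
} lexer_t;

/* Data path: any value 0..255, including 0xFF, is just a byte.
   With an in-band sentinel, a 0xFF read through a signed char would
   compare equal to -1 and trigger a bogus flush. */
static void lexer_enter(lexer_t *lx, unsigned char byte) {
    assert(lx->len < sizeof lx->buf);
    lx->buf[lx->len++] = byte;
}

/* Control path: flushing is a separate call, never encoded as a byte. */
static void lexer_flush(lexer_t *lx) {
    lx->flushes++;
    lx->len = 0;
}

/* Guarded pop: returns 0 on an empty lexeme instead of underflowing. */
static int lexer_pop(lexer_t *lx, unsigned char *out) {
    if (lx->len == 0)
        return 0;
    *out = lx->buf[--lx->len];
    return 1;
}
```

Separating the control signal from the data channel means no input value needs to be reserved, which is the property the commit relies on.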
2018-01-31
21:16
Pull the Tcl lexer fix over into the branch where the issue was found. check-in: d411dda199 user: aku tags: language-json
21:09
Fixed a typo in the spec of escaped characters in strings. Fixed the definition of `control` characters for JSON. Updated the results to match the tweaked grammar. For the Tcl runtime all tests pass except a few showing mishandling of numeric lexemes; a fix for that is waiting on trunk. RT-C is still crashing. check-in: 6dfddb13e8 user: aku tags: language-json
20:58
Fixed the Tcl runtime (lexer component) mishandling lexemes interpretable as Tcl numbers. By going through `expr`, a lexeme which looks like a number can be shimmered and may change its string rep when printed. Example: for JSON, the lexeme `1E-2` became `0.01`. check-in: 2a442c3255 user: aku tags: trunk
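Tcl's internals differ, but the same effect can be reproduced in C as an analogy: once a lexeme is converted to a number and printed back, its original spelling is gone. The function name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strcmp, for checking the result */

/* Convert a numeric lexeme to a double and print it back with %g.
   The value survives, but the spelling of the lexeme does not. */
static void reformat_number(const char *lexeme, char *out, size_t size) {
    assert(lexeme != NULL && out != NULL && size > 0);
    double v = strtod(lexeme, NULL);
    snprintf(out, size, "%g", v);
}
```

Here `1E-2` comes back as `0.01`, matching the JSON example. One way to avoid this, assumed here rather than taken from the fix, is to hand the lexeme text through unchanged and never route it through a numeric conversion.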
00:51
The JSON testsuite is becoming more functional. Of the 95 must-accept inputs, only 10 fail. Some unexpected parse failures with bogus inputs. These are, in part:
- possibly due to reading the input with the wrong encoding (need utf-8?);
- due to unexpected numeric reformatting reaching the AST (1E-2 vs 0.01).
One crash in the RT-C to investigate. Tweaked the grammar a bit to have proper symbols for the constants, and to separate G1 and L0 better. check-in: dda6670b00 user: aku tags: language-json
2018-01-30
23:25
Pulled fix for Tcl code generator issue into the branch where it was discovered. check-in: 5f8cb41c75 user: aku tags: language-json
23:19
Fixed an issue in the core code generator for parsers and lexers using the Tcl-based runtime. A bug in package `char` (see `char quote tcl`) caused the generation of bogus Tcl charclass regexes from the internal data when non-ASCII characters in [:control:] are involved. The generator now works around the issue. check-in: 65b1517840 user: aku tags: trunk
21:02
Added the first larger grammar example outside of the SLIF meta grammar: JSON. Known issues at this point:
* Due to apparent trouble with Kettle (`build.tcl test` seems to ignore `--include-dir`), the testsuite is not yet functional. A basic test via `tools/trial` works, however.
* The generated Tcl parser is bogus. The main character class for string characters (`plain`) contains a bad range which Tcl's `regexp` rejects during parser construction. The C-based parser is ok, however, modulo lurking unknowns. check-in: 5199afa673 user: aku tags: language-json
10:17
Fix oops, forgot to add test output for the slif meta grammar. check-in: 466c1ebc4d user: aku tags: trunk
10:16
Added a formatter producing a SLIF grammar from a grammar container. Note that this is not fully round-trip at the moment (the special @LEX symbols cannot be read back, as they violate identifier syntax). It is also sub-optimal with regard to LATM flags, g1 actions, etc. These are shown as attributes of each rule instead of making use of defaults to reduce duplication. It should, however, be good enough to serve as a debugging aid. check-in: 3bfc0de63c user: aku tags: trunk
2018-01-29
19:28
Extended the set of formatters producing code initializing a grammar container (GC). Renamed the existing GC formatter to `gc-compact`. Added two formatters to generate non-compact human-readable code, using reduction rules for Tcl and C. check-in: 8d77fed34b user: aku tags: trunk
2017-10-17
16:30
README tweaks check-in: d2d1b00d53 user: aku tags: trunk
16:22
Updated the README to match the current organization of the (code in the) repository. check-in: f45f21924c user: aku tags: trunk
03:18
Merged fixes on flush behaviour to mainline. check-in: 62d99b6274 user: aku tags: trunk
03:13
Fixed the demo grammar (wrong start symbol), then demonstrated the fix in the Tcl runtime versus its absence in the C runtime. Then fixed the C runtime flush behaviour. Further fixed mishandling of lexeme value and length in the presence of redo. Closed-Leaf check-in: a78dda3a4d user: aku tags: flush-fix
2017-10-16
23:17
Demonstrate the multi-flush bug. Fixed an RT-C issue where the actual lastchar was lost/overwritten by redo, messing up the generated error message. check-in: 886eb6bb40 user: aku tags: trunk
22:24
And back check-in: ce762c6d5a user: aku tags: trunk
22:20
Pull trunk. Closed-Leaf check-in: f32641a83d user: aku tags: runtime-tests
22:12
Pull in the fix for L0 discard past G1 end, updated tests, fixed a few more things in the Tcl runtime (too-early destruction of the parser-level recognizer prevented generation of a proper error message for a non-discard token after G1 end). check-in: bbff87f317 user: aku tags: runtime-tests
21:17
Added tests demonstrating bad behaviour when exhausting a parser while still having input (discards and not). check-in: e288571010 user: aku tags: runtime-tests
20:30
Added the foundation for testing the runtime with arbitrary grammar/input pairs, and high-level test drivers for the Tcl and C runtimes. check-in: e7ab54549b user: aku tags: runtime-tests
2017-10-15
16:55
Use the OSX fixes. They were done as separate branches as a reminder to check the behaviour when back on Linux. check-in: 97bbaff3f9 user: aku tags: trunk
16:54
Silence compiler complaint on OSX. Leaf check-in: 09b264fb4a user: aku tags: osx-complaints
16:53
Added return after assert to silence compiler complaint (OSX). check-in: 12ad722f66 user: aku tags: osx-complaints
16:50
Fixed problems in the handling of a charclass as a set of code points and ranges. Range validation was incomplete, allowing bad input to cause a crash. Fixed, and tests added. Added tracing as well, plus more notes on when certain code paths will be reached. check-in: ac18987fd3 user: aku tags: trunk
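What "complete" range validation checks is not spelled out in the commit. A minimal C sketch under the assumption that a range is valid when it is not reversed and stays within the code-point limit; the type, function name, and the `max` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive code-point range [lo, hi]; a single code point has lo == hi. */
typedef struct { uint32_t lo, hi; } cp_range;

/* Return 1 if every range is non-reversed and lies within [0, max],
   else 0. Rejecting bad ranges up front lets later stages rely on them
   instead of crashing on bad input. */
static int charclass_ranges_valid(const cp_range *r, size_t n, uint32_t max) {
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (r[i].lo > r[i].hi) return 0;  /* reversed range */
        if (r[i].hi > max)     return 0;  /* beyond the code-point limit */
    }
    return 1;
}
```

Unsigned code points make a separate `lo >= 0` check unnecessary; a signed representation would need one.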
04:46
Moving critcl after tcl solves an OSX issue with the install dependency order. Still to check whether this breaks Linux. Leaf check-in: 243e280f60 user: aku tags: build-order-trouble
2017-10-12
06:59
Tcl runtime: fixed a flush issue where a partial flush and redo need recognition. check-in: f26d4f328e user: aku tags: flush-fix
2017-10-11
05:28
Mark recognizer cons/dest points better check-in: 08e6e9634d user: aku tags: trunk
2017-10-06
22:01
Equivalent changes in the C runtime.
1. The C runtime already intertwined tree extraction, valuation, and hand-over, which the previous commit added to the Tcl runtime.
2. Fixed the same issue with possible L0 discards after G1 end.
3. Fixed bad assertions in symset and byteset, exposed by 2. check-in: 32c320340a user: aku tags: trunk
20:12
Reworked parser completion handling. No longer pull all possible parse trees and save them in memory. Instead, evaluate each tree immediately after extraction and pass the resulting SV to the outer backend. Further, a bug fix: tell the lexer about the expected terminals (none), so that it can still handle any L0 discards which may occur after the G1 end symbol. I.e. while we are not expecting the G1 token stream to continue, the L0 byte stream may still have input to process. TODO: Add test cases for this situation, both where only the expected discards occur, and where unexpected actual G1 tokens are present. check-in: 8c6bdade0a user: aku tags: trunk
19:36
Fix in Tcl runtime tracing. Bring necessary variable into scope. This was forgotten when placing various operations into their own methods for clarity. check-in: bbe2253bdb user: aku tags: trunk
19:33
Debugging enhancement, show actual semantic values in valuation steps. check-in: 4f1c755959 user: aku tags: trunk
19:31
Sliced the big tangle of a single package into several packages, each containing only related code. check-in: 5fde5977d2 user: aku tags: trunk
2017-10-05
21:51
Fix package meta data typo. Closed-Leaf check-in: d38f475f67 user: aku tags: slice
21:39
Reworked the naming of the generator packages and associated namespaces. Searching for plugins, i.e. more generators, is now simpler (no special cases to exclude). check-in: 912cadf759 user: aku tags: slice
18:59
Updated marpa-gen to new sliced setup, and filled `list-plugins` in marpa::export::config. Next up, look into renaming packages for nicer structure. Start with exporters. check-in: 649487dd0c user: aku tags: slice
08:06
Heal fork, complete. check-in: 444c10e2e4 user: aku tags: slice
08:05
Heal fork Closed-Leaf check-in: 2175b86257 user: aku tags: slice-2
08:04
Split the remaining pieces into three packages:
- C runtime
- builtin parser (C runtime)
- Low-level C wrapper for Tcl runtime foundation
Updated tests to work again. More reshuffling. check-in: 77883b0ffd user: aku tags: slice-2
03:34
Fix missing requirements in the internal tool to re-create the builtin parser. check-in: ae36822717 user: aku tags: slice
2017-10-04
23:49
Took Tcl runtime out of the tangle. Left tangled are the low-level C wrapper and the C runtime. Some shuffling of parts. Note: Needs Kettle commit [kettle:c0f0b90c04] (kt::local* addition, scan fix, @owns fix) to work. check-in: 30d4d13ed3 user: aku tags: slice
22:14
Detangled precedence rewriting and, mostly, the exporters. There are still places using an exporter where we need only part of it (gc formatting). The structure does not make for nice format/plugin discovery either. check-in: 4843e825d1 user: aku tags: slice