Marpa


Timeline



191 check-ins using file slif-precedence.tcl version 867ac842b1

2018-07-19
19:39
Ignore a few more work things in the src area. check-in: dea206a3fa user: aku tags: parse-events-rtc
07:08
Created input indexing per doc.1/INDEX_3.md. Integrated indexing with inbound. Demo tests now pass. Still to do: Passing in alternate symbols and sem values. check-in: 3d4c0b8e3a user: aku tags: parse-events-rtc
2018-07-18
20:49
Demonstrate location mis-tracking by rtC when moving in input with multi-byte characters from a parse event handler (non-ascii-b). Show general ok tracking without moves by the user (non-ascii-a). check-in: 0f203e8f47 user: aku tags: parse-events-rtc
18:59
Continued fill-in for before/after events. `Inbound`, `gate`, and `lexer` rewritten to match rt-Tcl behaviour with regard to stepping through the input, rewind, and flush. __Attention__ `Inbound` location tracking still handles only ASCII correctly, not multi-byte UTF-8. Still mulling over possible index structures that enable quick movement in the presence of multi-byte characters, with low memory overhead for very uniform input. Changed the `lexer` field `m_sv` to an int stack: the component tracks ids, with the associated SVs already remembered in the `store`. This forced changes to the implementation of the facade, and to the API between `lexer` and `parser` (changed signature and implementation of `marpatcl_rtc_parser_enter`). Fixed a forgotten difference between discard and other events (ACS symbols vs G1 symbols) in the C generator code and in the id/symbol conversion done by the facade. check-in: dc070a04f8 user: aku tags: parse-events-rtc
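One possible index structure for the problem described above, as a sketch only: the design actually adopted is in doc.1/INDEX_3.md, which is not reproduced here. This assumes a sparse table holding the byte offset of every 64th character, so a lookup walks at most 63 characters; all-ASCII input, where character and byte positions coincide, needs no table at all. The names (`char_index`, `STRIDE`, ...) and the stride value are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sparse index from character positions to byte offsets
 * in a UTF-8 buffer: one checkpoint every STRIDE characters. */
#define STRIDE 64

typedef struct {
    const unsigned char *buf; /* UTF-8 input */
    size_t len;               /* input length in bytes */
    size_t nchars;            /* input length in characters */
    size_t *marks;            /* marks[k] = byte offset of char k*STRIDE; NULL if pure ASCII */
    size_t nmarks;
} char_index;

/* Length of a UTF-8 sequence, from its lead byte (assumes valid input). */
static size_t utf8_seqlen(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

static int char_index_build(char_index *ix, const unsigned char *buf, size_t len) {
    size_t off = 0, chars = 0;
    ix->buf = buf;
    ix->len = len;
    ix->nmarks = 0;
    ix->marks = malloc((len / STRIDE + 1) * sizeof *ix->marks);
    if (ix->marks == NULL) return -1;
    while (off < len) {
        if (chars % STRIDE == 0) ix->marks[ix->nmarks++] = off;
        off += utf8_seqlen(buf[off]);
        chars++;
    }
    ix->nchars = chars;
    if (chars == len) {          /* every character is one byte: drop the table */
        free(ix->marks);
        ix->marks = NULL;
        ix->nmarks = 0;
    }
    return 0;
}

/* Byte offset of character position c; positions at or past the end map to len. */
static size_t char_index_offset(const char_index *ix, size_t c) {
    if (c >= ix->nchars) return ix->len;
    if (ix->marks == NULL) return c;         /* pure ASCII: identity mapping */
    size_t off = ix->marks[c / STRIDE];      /* nearest checkpoint at or before c */
    for (size_t k = c % STRIDE; k > 0; k--)  /* walk the remaining characters */
        off += utf8_seqlen(ix->buf[off]);
    return off;
}

static void char_index_free(char_index *ix) {
    free(ix->marks);
    ix->marks = NULL;
}
```

The stride trades memory for lookup cost: doubling it halves the table and doubles the worst-case walk.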
05:05
Fix comment typo check-in: 2a46239e90 user: aku tags: parse-events-rtc
2018-07-17
16:51
Started on before/after events. Will not compile, see all the XXX markers, just a checkpoint. check-in: 91ad806001 user: aku tags: parse-events-rtc
05:29
Filled in most of the facade. Still missing are the parts relevant to before/after events. Location accessors implemented, correct only for ASCII; UTF-8 support is still to do. Moved the API functions into inbound and lexer, with pieces in the pedesc class. Dropped the separate pedesc header and C sources. Optimized moveto to add the delta to the positions before delivery to the engine. The lexer now manages all the new fields for the match state (initialization, update, reset). Lexeme data for SVs now comes out of the new fields and accessors. Event testing now records any error we may receive from the match facade. check-in: dd2dfa9d07 user: aku tags: parse-events-rtc
2018-07-13
23:47
Attention: This commit will likely not even compile. It is saved scratch state of work on the rtc lexer match state to complete the parse event facade. Reworking the lexer state internals for cached access to the information, and the ability to modify it. This will affect the lexer/parser interface, namely the transmission of found symbols and semantic values. check-in: 96dc913d9e user: andreask tags: parse-events-rtc
22:27
Added tracing of the new event matching and reporting code. Tweaked tracing of the EH functions (separate stream). Added detection and reporting of discard events. Fixed generated lexer event entries; the engine operates with the ACS symbols. Updated tests. Discard events are detected and reported. The testsuite fails because the PE descriptor facade is not completely filled out yet, and the incomplete methods have signatures diverging from the expected ones. While the Tcl errors thrown by the event recorder callback used in the testsuite are ignored during execution, they are properly seen in the narrative tracing, proving that the discard callbacks work. To do: fill out structure and facade before implementing the before/after callbacks. check-in: 91aa29f8f2 user: andreask tags: parse-events-rtc
18:06
Linked the PE descriptor facade into the lexer/parser templates. Removed scratch notes from the facade. Updated tests. check-in: c20d03a3ef user: andreask tags: parse-events-rtc
2018-07-12
21:35
Extended the facade with set/get for the class rtc variable needed by the constructor check-in: 9e2ca3d617 user: andreask tags: parse-events-rtc
20:58
Started work on generic parse event descriptor access. Using a critcl::class as facade to the structures, ensemblified methods. Requires critcl::class 1.1 to disable tcl-api, generate c-api. check-in: 84fa7104f2 user: andreask tags: parse-events-rtc
20:01
Fixed typo in comment check-in: 915f151c40 user: andreask tags: parse-events-rtc
05:43
Fixed [f5e6063aeb] memory smash. Miscounted references to the `self` argument of parse event callbacks. Updated tests with all the generator changes. rtc-runtime tests: L0 parse event cases failing as expected, as event generation is not done yet. However, we are now at the point where we can start on adding this in. Most of the foundations are now present. Notably still missing are the parse event descriptor structures and their linkage to the lexer state. check-in: 66339c7f20 user: aku tags: parse-events-rtc
2018-07-11
22:52
Filled in the marpatcl_rtc_eh_... functions and structures. Fixed bogus declaration syntax for the generated event structures. __Attention__: Segfault in core. Likely caused by the new functions, structures, and their use. Update: Fixed with [66339c7f20]. check-in: f5e6063aeb user: andreask tags: parse-events-rtc
20:28
Follow up to lex-only refactoring, updated tests. check-in: 64a75ee092 user: andreask tags: parse-events-rtc
19:58
Ripped the general structures and code for lex-only token/value handling out of the clex template and placed them into the marpa::runtime::c package for sharing. As part of that, the result and event callbacks from RTC now have separate client data information. Note: the not-yet-defined `eh` structures and functions already referenced by the event handling code are the analogous shared pieces for event handling. check-in: fce9c19274 user: andreask tags: parse-events-rtc
19:47
Always initialize variable check-in: f3f39059c2 user: andreask tags: parse-events-rtc
19:46
Added notes about uninitialized memory to the two set implementations. check-in: 8202ba0482 user: andreask tags: parse-events-rtc
19:43
Memory smash fixed. SV ref miscount in the lexer in lex-only mode when one SV is re-used for multiple tokens in the same location. Win for the (SEM_)REF_DEBUG functionality coming out of the mem-limit, memory-cleanup branches. Valgrind was no help. check-in: 1d1b5cb7d8 user: andreask tags: parse-events-rtc
04:36
Fixed missing cleanup of callback field, caused a bogus 2nd destroy on object destruction. Tweaked comment in test support code. check-in: afeb1c4ea6 user: aku tags: parse-events-rtc
00:07
Filling in the C-level data structures and API changes implied by the template changes for clex and cparse generators. ATTENTION: Manually modified the builtin parsers (slif, literals) to match the changed structures and function signatures. ATTENTION: Memory smash somewhere in the zeta-rtc-lexer tests. (Fixed with commit [1d1b5cb7d8]) check-in: 3df1d4cea4 user: andreask tags: parse-events-rtc
2018-07-10
23:13
Show i-gen critcl command in the log. Added main tclsh include to the set of include paths to search. check-in: e274d2eda6 user: aku tags: parse-events-rtc
22:21
Converted the eof/enter callbacks of clex to critcl::callback. check-in: c230bc5be1 user: andreask tags: parse-events-rtc
21:13
Extended asset management to allow more than one asset. Generate C event datastructures, lex & parse. Updated and extended tests. check-in: c7e33a1f26 user: andreask tags: parse-events-rtc
2018-06-26
18:20
Make rewind more robust. Tweak post-event input movement. Added event scratch docs check-in: 22e19935f3 user: andreask tags: parse-events
2018-04-20
21:28
inbound: Absolute movement, tweaked for consistency. Extended with optional delta arguments. gate: Extended with forwards to the input location accessor and modifiers. This provides parse event handlers with the ability to move in the input. lexer: Moved redo to before we handle parse events. This enables parse event handlers to modify the location without interference from the system itself. Modified pre-lexeme event generation to move input to the start of the lexeme. Fixed the pe-fill method, which forgot to set the flag for the incremental rebuild of the symbol/sv tables. parse descriptor: See gate. Further fixed accessor setup. Extended view to report input location. testsuite: Report input location. Move input location to the end of the lexeme, needed now that pre-lexeme events have the location at the start. Updated expected results. check-in: faebf8fba4 user: aku tags: parse-events
04:33
Pulled memory fixes into the feature branch implementing parse events. check-in: 07ca2e2be0 user: aku tags: parse-events
2018-04-19
19:47
Update main line with the accumulated memory fixes. check-in: dcf340cd32 user: aku tags: trunk
19:44
Merged series of fixes for memory issues (memory leaks, management mismatches, ...). Together with the push of memory-intensive tests into child processes done here, the testsuite should now be properly constrained again with regard to memory usage. Updated tests. Closed-Leaf check-in: dc34e1afbb user: aku tags: mem-limit
18:18
Added test against unbound parser memory usage in parser-core. Found and fixed SV ref-count mismanagement (RCM) in the lex-only code path of the RTC. Added narrative tracing to the code path. Cleanup of lexer-core testsuite with regard to memory debugging. Found and fixed Tcl_Obj* RCM in the lex-only critcl template. Added narrative tracing to the template. Closed-Leaf check-in: a29c4613b0 user: aku tags: memory-cleanup
2018-04-18
23:41
Do not mix malloc and Tcl allocation routines, even when the malloc is hidden, as here in `strdup`. The code used to allocate strings with strdup, then release them with ckfree/Tcl_Free. This messed up the memory management internals to the point of memory smashes. Fixed; now using our own implementation of strdup based on the allocation macros from environment.h. This ensures that alloc and free match. Thank you, valgrind. check-in: 714d438603 user: aku tags: memory-cleanup
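The rule above (allocation and release must go through the same allocator) can be sketched as follows. The contents of environment.h are not shown here, so `ALLOC_N`, `FREE`, and `my_strdup` are hypothetical names. The macros map to `malloc`/`free` only to keep the sketch self-contained; in a Tcl build they would presumably wrap `ckalloc`/`ckfree` instead.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the allocation macros in environment.h.
 * Whatever they expand to, every allocation and every release must
 * go through this same pair. */
#define ALLOC_N(type, n) ((type *) malloc(sizeof(type) * (n)))
#define FREE(p)          free(p)

/* strdup replacement: the result must be released with FREE(),
 * never with a different allocator's free function. */
static char *my_strdup(const char *s) {
    size_t n = strlen(s) + 1;        /* include the terminating NUL */
    char *copy = ALLOC_N(char, n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}
```

The library `strdup` always allocates with `malloc`, which is why its result cannot safely be handed to `ckfree`/`Tcl_Free`.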
03:08
Moved the main parts of `test/support/gen.tcl` into `bin/i-gen`. This new internal generator application uses the public `marpa-gen` as the underlying workhorse. The support code now invokes the internal generator instead of doing everything itself. With the operation confined to a child process, the memory taken by the test process is limited to the generated package itself instead of all the packages needed to perform the generation. Further modified lexer-core to place the set of tests for each variant grammar/lexer-action into a child process as well. This ensures that the memory needed by each variant grammar is limited to that child process, instead of accumulating in the controlling test process. __Attention__: This change requires a Kettle with support for `kt::sub`, added to Kettle with commit [ef384673c5] (2018-04-18 02:28:17). check-in: ddc1e67640 user: aku tags: mem-limit
2018-04-17
20:36
Moved the main parts of `test/support/gen.tcl` into `bin/i-gen`. This new internal generator application uses the public `marpa-gen` as the underlying workhorse. The support code now invokes the internal generator instead of doing everything itself. With the operation confined to a child process, the memory taken by the test process is limited to the generated package, instead of also keeping all the overhead of the generation. Note, this does not help with the test suites based on lexer-core, as that suite still loads/sources ten different lexer packages into the same process. Handling this requires more work, i.e. pushing the individual tests into their own child processes. check-in: ad4f1d4287 user: aku tags: mem-limit
2018-04-11
23:33
Extended the narrative tracing in `sem_tcl.c` to track refcounts. Found and fixed the cause for the orphaned Tcl_Obj*'s. The function `marpatcl_rtc_sv_astcl` did an extraneous refcount increment on the conversion result (SV tree to Tcl_Obj* tree). Parser operation based on RTC now does not leak anything anymore. check-in: 302f47227a user: aku tags: memory-cleanup
07:43
Added code (sem_debug.c) to track SV allocation and release, and dump orphan SVs at the end. All SVs were orphaned, nothing released. Tracked to a bad guard condition in function `marpatcl_rtc_sv_unref`. As the code checks the refcount before decrementing it, both 0 and 1 must trigger destruction, not just 0. Fixed. All SV structures are now properly released at the end (parser instance destruction). Still orphaned things left; these, however, are Tcl_Obj's. check-in: 0d6e6c1dbe user: aku tags: memory-cleanup
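A minimal sketch of the corrected guard, using a hypothetical, stripped-down `sv` type rather than the real RTC structures (`sv_new`, `sv_ref`, `sv_unref`, and the `destroyed` counter are illustrative names). Because the count is inspected before it is decremented, a value holding its last reference (count 1) has to be destroyed on this call; testing only for 0 would leave every value that was ever referenced orphaned, which is the symptom described above.

```c
#include <stdlib.h>

/* Hypothetical, simplified semantic value: just a reference count. */
typedef struct sv {
    int refcount;
} sv;

static int destroyed = 0;   /* counts destructions, for demonstration only */

static void sv_destroy(sv *v) { destroyed++; free(v); }

static sv *sv_new(void) {
    sv *v = malloc(sizeof *v);
    if (v != NULL) v->refcount = 0;
    return v;
}

static void sv_ref(sv *v) { v->refcount++; }

/* Guard checks the count *before* decrementing: 1 means this call drops
 * the last reference, 0 means the value was never referenced. Both destroy. */
static void sv_unref(sv *v) {
    if (v->refcount <= 1) {
        sv_destroy(v);
        return;
    }
    v->refcount--;
}
```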
2018-04-10
04:55
container: Fixed leakage of priority and priority alternate objects. Test support extended with various diagnostic tools. Note, it looks like the RTC and glue into Tcl are also leaking like mad, especially around the semantic values and the (partial) ASTs we are constructing. Continue investigation and fixing. Started a new branch for this. check-in: c3ff9015b6 user: aku tags: memory-cleanup
2018-04-06
06:20
inbound, gate, lexer: Added a back-link from gate to inbound, analogous to the gate/lexer and lexer/parser connections. Dropped history management from the gate and changed its redo method to simply rewind the input instead of re-entering the tail end of the history. This is the first use of the new cursor movement methods added to inbound. Under the old regime, using foreach and steadily marching forward in the input, all re-processing was done by remembering and recursively re-entering characters as needed, with additional loops at the various stages of the pipeline (`gate`). With the while-based cursor, on the other hand, we have only one (nested) processing loop (`inbound`), and all movement is handled by it. We cannot recurse, and we cannot have a loop in `gate`. The nested loop in `inbound` is required to handle the case where we reach eof and the later stages then tell us `not yet`, i.e. bounce us back. The inner loop is the main processor, and the outer loop restarts it until eof actually succeeds. In `gate` the flush signalling changed. In `lexer`, eof signalling to the `parser` is prevented when the lexer bounces the input away from eof. check-in: 065653b213 user: aku tags: parse-events
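The nested loop described above can be sketched as follows, rendered in C for illustration only; all names (`inbound`, `inbound_moveto`, `inbound_moveby`, the recorder handlers) are hypothetical. Handlers may move the cursor. The end-of-input handler returns zero for "not yet" and is assumed to move the cursor back before doing so; otherwise the outer loop would spin forever.

```c
#include <stddef.h>

typedef struct inbound inbound;
typedef void (*char_fn)(inbound *in, char c);
typedef int  (*eof_fn)(inbound *in);   /* nonzero: eof accepted */

struct inbound {
    const char *text;
    size_t len;
    size_t loc;        /* cursor: index of the next character to process */
    char_fn on_char;
    eof_fn  on_eof;
    void *client;
};

/* Absolute and relative cursor movement, clamped to [0, len]. */
static void inbound_moveto(inbound *in, size_t pos) {
    in->loc = pos > in->len ? in->len : pos;
}
static void inbound_moveby(inbound *in, long delta) {
    long pos = (long) in->loc + delta;
    inbound_moveto(in, pos < 0 ? 0 : (size_t) pos);
}

static void inbound_process(inbound *in) {
    for (;;) {                              /* outer: restart until eof sticks */
        while (in->loc < in->len) {         /* inner: the main processor */
            char c = in->text[in->loc++];   /* advance first, so handlers can move */
            in->on_char(in, c);
        }
        if (in->on_eof(in)) break;          /* eof accepted: done */
        /* eof rejected ("not yet"): the handler moved the cursor back */
    }
}

/* Demo handlers: record characters; reject the first eof by rewinding one char. */
typedef struct { char out[16]; size_t n; int eof_calls; } recorder;

static void rec_char(inbound *in, char c) {
    recorder *r = in->client;
    if (r->n + 1 < sizeof r->out) { r->out[r->n++] = c; r->out[r->n] = '\0'; }
}
static int rec_eof(inbound *in) {
    recorder *r = in->client;
    if (r->eof_calls++ == 0) { inbound_moveby(in, -1); return 0; }
    return 1;
}
```

Run on the input "ab", the first eof is bounced back one character, so the recorder sees "abb" and eof is signalled twice. No recursion and no secondary loop in a later stage is needed.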
04:25
Pull gate readability changes into parse-event feature support. check-in: cc42a78587 user: aku tags: parse-events
2018-04-05
08:29
runtime-tcl, gate: Moved a few code blocks into their own methods to make their now-caller more readable. check-in: 0dff9cc32b user: aku tags: trunk
2018-04-04
17:57
inbound: Converted the `foreach`-loop processing the physical input stream into a `while`. The location information now is a cursor into the input, instead of a dependent variable. This allows us to move backwards in the input, or ahead, as we see fit. Added movement methods (absolute, relative) for the cursor. No stream expansion yet, nor virtual streams. check-in: 9c1fa89ac4 user: aku tags: parse-events
16:58
Added generation of pre- and post-lexeme events. Extended the testsuite to demonstrate them. Note, the pre-lexeme trigger location does not match Marpa::R2 yet. check-in: 7693f8601c user: aku tags: parse-events
06:44
Reworked the lexer somewhat. Moved match state into a nested object for easier access. Exposed to parse event handlers via a limiting facade. Outside entrypoint is parser method 'match', an ensemble. Moved to single event handler call bundling all relevant events. Internally also used to simplify GetSemanticValue. Updated discard event test. check-in: 97d4a5397c user: aku tags: parse-events
2018-04-03
20:42
Pull recent fixes into the language work check-in: da745b6e55 user: aku tags: parse-events
20:37
Fixed bug in the semantics' handling of :lexeme. Do not exclude the symbol from LATM fixup if the :lexeme adverbs do _not specify_ latm information. Facepalm. Found playing with lexeme events, and GC missing the crucial latm information, breaking generated test parsers. Updated tests to correct results. check-in: e5e442db2a user: aku tags: trunk
17:47
Tests: Remove a leftover break from debugging something, wrongly committed. check-in: 7c7353ad24 user: andreask tags: trunk
04:42
Get latest docs to work with. check-in: 1c73889373 user: aku tags: parse-events
04:41
Extended the Tcl runtime with basic support for events via callback. Set/unset/query, forwarding from the inner objects. Definition and storage of event maps. Proper pre-processing of such maps for the lexer, not yet for the parser. Added generation of discard events. Extended the testsuite to demonstrate the basic infrastructure, and discard events. Removal of trailing spaces. check-in: 0f5c6931e2 user: aku tags: parse-events
2018-03-30
23:42
Docs: More small fixes Leaf check-in: d89aeafc1a user: aku tags: docs
23:38
Get doc fixes check-in: d41168399d user: aku tags: trunk
23:37
Docs: Typo fixes, phrasing fixes. check-in: 2a1acc9132 user: aku tags: docs
23:24
Make recent documentation work official. check-in: 607761c40d user: aku tags: trunk
23:23
Docs work - Moved architecture from intro to dev guide. - Made intro a dispatcher to other documents based on readers' goals. - Added reference for marpagen. - Added placeholder for SLIF, referencing the upstream Marpa::R2 documentation. check-in: 9f10a242a4 user: aku tags: docs
2018-03-29
23:14
Docs: Completed addition of feedback sections. Added audience/target information sections. check-in: fdf9132f0d user: aku tags: docs
22:36
Docs: Added changes, license documents, libmarpa requisite for installer, factored welcome message, added feedback in parts check-in: 2db7ae6cd3 user: aku tags: docs
21:46
Updated docs work with trunk changes. check-in: 9ad599cfa5 user: aku tags: docs
05:34
Started implementation of parse events. rt-Tcl first. Implemented generation from container, with fixes to have access to the stored G1 events. Extended testsuite to show event information, and updated older tests. Some whitespace corrections (removal of trailing spaces). Some tracing tag fixes. check-in: f99351071e user: aku tags: parse-events
2018-03-27
19:58
Grammar edits: - Tweaked some of the formatting. - Main change: Redid the spine of the document structure. Moved the nullability around, enabling use of sequence rules. The price is paid by the paragraphs, which have their separators in the AST (the separator of a sequence cannot be masked/hidden from the AST). Explicit recursion for multiple separators in sequence, however, allows hiding that in a single separator. Still, AST nesting is significantly reduced. Further doctoring is something for full custom command actions, or for the semantics taking the AST. Regenerated parsers. Updated test suite to match. check-in: 1a2a8af59c user: aku tags: language-doctools
07:17
Grammar: - Edited to provide the keywords with proper lexeme symbols instead of the ugly automatic names. - Reworked command definitions to enforce space after a command lead-in. - Added lots of custom actions (::first), to simplify the returned AST. Regenerated parsers. Went over test suite again. Removed the fail cases from Tcllib, which are completely bogus in places against the stricter specification. Reworked the ok cases and added the first proper ok results. Still thinking about the main spine of man pages and how to express it nicely. The current explicit recursive structure nests a bit deep. check-in: de0a504dfd user: aku tags: language-doctools
2018-03-26
21:35
Created parsers from draft. Made test suite functional. First runs, all results bad (different error messages on failure, different type of AST structure). check-in: d46d0a963a user: aku tags: language-doctools
21:08
Documentation, installation guide: Added instructions for setting up `libmarpa`. check-in: 254c53ee1f user: andreask tags: trunk
2018-03-25
05:15
Added untested draft grammar. check-in: ecdb604ac6 user: aku tags: language-doctools
02:53
Added test-vectors used by Tcllib. check-in: 7e961f36fa user: aku tags: language-doctools
2018-03-24
23:33
Pull alias support into the example. check-in: 26bb8496d5 user: aku tags: language-doctools
23:31
Activated alias support in main. check-in: d60e61a1ce user: aku tags: trunk
23:30
marpa::unicode - Added alias handling. Updated testsuite. Further: - Reworked the table generator, more separate phases, less intertwined operations. Split across several files now, with each a set of related commands to manage part of the data structures. - BMP/SMP are directly generated as aliases where possible - Fixed issue with long-form category names for aliases. Tcl has its own definition of `control` (cc+cf+co). - Updated boot parsers. Closed-Leaf check-in: 85df278cb8 user: aku tags: cc-aliases
04:23
marpa::unicode - Added alias handling. Updated testsuite. Further: - Tweaked generator output. - Fixed issues with missing :bmp/:smp information for some aliases. check-in: cb06be0e04 user: aku tags: cc-aliases
2018-03-23
23:59
Table generator: Convert `:bmp` and `:smp` CCs identical to their origin into aliases, reducing storage requirements. Output not usable anymore until the unicode layer gets extended to recognize and handle aliases. check-in: cc6426a1cc user: andreask tags: cc-aliases
23:47
Introduced char class aliases into the output of the table generator. check-in: 7771b3d5ad user: andreask tags: cc-aliases
05:31
Start on 2nd big example, doctools, of Tcllib check-in: ccb57223f3 user: aku tags: language-doctools
2018-03-22
03:48
Language example `JSON` is now official. check-in: c4414a4012 user: aku tags: trunk
03:46
Phase 1 documentation now official. check-in: 17146d5f3e user: aku tags: trunk
03:46
Intro and dev guide completed. Doc phase I ok. check-in: 0698f503c3 user: aku tags: docs
2018-03-21
22:12
Started on proper documentation. Basic guides, some placeholders, no package docs yet. check-in: 45c56e540d user: andreask tags: docs
06:55
The known bug was due to a subtle difference in the two parsers. The Tcl-based parser accepted standalone surrogates, the C-based one did not. This came down to a guard condition in the ASBR compiler, which excluded any surrogates found in the input CC from the result. A design bug, not an implementation bug. Removing the guard fixes the issues with the json parser. The first attempt at the fix, adding the standalone surrogates explicitly to the grammar, ran into the same guard, albeit in a different manner: the explicit range became an empty literal during reduction, got removed, and then the still-existing reference to it caused the generator backend to throw an error. Regenerated the C-based parser, and updated all test results to match the changes in the rule numbering. Closed-Leaf check-in: a72fb8f4d2 user: aku tags: language-json
06:54
Merged design fix to the json experiment. check-in: cc8068dba7 user: aku tags: language-json
06:53
Fixed a design issue: surrogate handling. The low-level unicode layer used a guard to prevent the addition of the surrogate codepoints when creating an ASBR from a CC. Thus, for any CC including one or more surrogates, the resulting ASBR actually represented the CC minus the surrogates. The reasoning was roughly that we are working with characters at the high level, and while we have a 1:1 mapping to codepoints for most characters, this is not true for the surrogates, each of which is only half of a character. And the reducer targeting the C runtime based on Tcl ensures that characters in the SMP are properly converted into surrogate pairs. Working on the JSON parser has now driven home that there are situations where we want to accept standalone surrogates at the high level, and also that the low-level removal was a bad idea as well. The latter because a negated char class handled by Tcl does accept the standalone surrogate code points, whereas the ASBR for C is mangled to reject them. The result is a very unwanted difference in the behaviour of what should be equivalent parsers. So, lots of writing for a very small change code-wise: the removal of the guard mentioned above, and undoing the removal of the surrogates as a named character class. Further, brought the bugfix from commit [546018b243] into the `unicode_ops.tcl` used by the table generator. Same issue, and forgotten when the initial fix was made and committed. Updated tests to match results due to rule renumbering and CC changes. check-in: 1695c17f13 user: aku tags: trunk
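The effect of the removed guard on a single codepoint range can be sketched as follows. `range`, `range_filter`, and the two-range output are hypothetical and far simpler than the actual CC-to-ASBR compilation. With the guard active, a class spanning the surrogate block silently loses U+D800..U+DFFF, and a range made only of surrogates vanishes entirely, which matches the empty literal seen in the first fix attempt on the json branch.

```c
#include <stddef.h>

typedef struct { unsigned lo, hi; } range;   /* inclusive codepoint range */

#define SURR_LO 0xD800u
#define SURR_HI 0xDFFFu

/* Copy [lo, hi] into out (capacity 2), optionally cutting out the surrogate
 * block. Returns the number of ranges written. drop_surrogates != 0 mimics
 * the old guard: the result is the class minus U+D800..U+DFFF. */
static size_t range_filter(unsigned lo, unsigned hi, int drop_surrogates, range out[2]) {
    size_t n = 0;
    if (!drop_surrogates || hi < SURR_LO || lo > SURR_HI) {
        out[n++] = (range){lo, hi};           /* untouched */
        return n;
    }
    if (lo < SURR_LO) out[n++] = (range){lo, SURR_LO - 1};
    if (hi > SURR_HI) out[n++] = (range){SURR_HI + 1, hi};
    return n;   /* 0 if the range lay entirely inside the surrogate block */
}
```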
2018-03-20
19:13
Continued work on the json test suite. Processed all the i_* cases. A single known bug to investigate for rtC. check-in: 84467cbcf6 user: aku tags: language-json
05:57
The unicode work is good enough to solve the known issues with the json parser. Make it officially available to trunk. check-in: 81404d4d77 user: aku tags: trunk
05:55
Unicode work good enough to solve the known issues with the json parser. Integrated. check-in: b333341f7c user: aku tags: language-json
05:50
Known bugs all settled. Test results updated for the changes in the rtC counting (characters, not just bytes). Closed-Leaf check-in: 3bf1d3e8b3 user: aku tags: json-unify
05:47
Update the json/unicode mix branch with the latest fixes on unicode. check-in: 2c4895d98b user: aku tags: json-unify
05:46
Update unicode work with the latest fixes on trunk. Closed-Leaf check-in: 78ab6a1f68 user: aku tags: reunification
05:45
rtC's mixed use of both `char` and `unsigned char` in various APIs and code, plus interaction with `int`, breaks lexing when attempting to go beyond ASCII, even when restricted to the BMP, due to bytes > 127 showing up as negative. Fixed by changing all uses of `char` to `unsigned char`. Further changed extraction of semantic values. - Lexeme length is now counted in characters, not bytes. - Similarly, lexeme end is now counted in characters from the start. - Input is now counted in both bytes and characters, for a proper lexeme start. Character counting in C strings was pulled from tclUtf.c. Error messages now use the new char offsets, plus byte offsets for partially read characters. check-in: b0d7fa6f75 user: aku tags: trunk
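Two points from this commit in a small C sketch: keeping bytes as `unsigned char` so that values above 127 stay positive, and counting characters rather than bytes. The counting function is a simple stand-in based on the UTF-8 rule that continuation bytes have the form `10xxxxxx`; it is not the code taken from tclUtf.c, and both function names are hypothetical.

```c
#include <stddef.h>

/* With plain `char`, where it is signed, a byte such as 0xC3 reads as -61,
 * so a test like `b >= 0xC0` can never succeed. Taking `unsigned char`
 * keeps every byte in 0..255. */
static int is_multibyte_lead(unsigned char b) {
    return b >= 0xC0;
}

/* Character count of a UTF-8 byte string: every byte that is not a
 * continuation byte (10xxxxxx) starts a new character. */
static size_t utf8_char_count(const unsigned char *s, size_t nbytes) {
    size_t chars = 0;
    for (size_t i = 0; i < nbytes; i++)
        if ((s[i] & 0xC0) != 0x80) chars++;
    return chars;
}
```

For "aé€" (6 bytes) this yields 3 characters, the kind of difference between byte and character offsets that the lexeme start/length fields above now account for.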
05:42
Some cleanup in the core testsuite. check-in: 65b182f33e user: aku tags: json-unify
05:40
Added textual decodings of the knownBug y_* inputs, for readability. check-in: ca443ec660 user: aku tags: json-unify
2018-03-16
23:44
Specials done, make them available to the main line. check-in: 754111fe8e user: aku tags: trunk
23:43
Implemented special semantic action `::first`. Added support in the tcl and rtc runtimes, and the generators for these runtimes. Plus test. Closed-Leaf check-in: ef22cbb99b user: aku tags: specials
23:24
Updated work on specials with bugfixes for issues found with it. check-in: 8918fc0839 user: aku tags: specials
23:22
Added test triggering rtC code path where a new lexer starts out with the parser exhausted, i.e. nothing acceptable. Fixed missing closing of the earleme for that case, and missing handling of the `lexer exhausted` error from libmarpa. The missing closing operation also caused miscommunication between lexer and gate, ultimately crashing the latter with a symbol id outside of the byte range. The gate change is only a tweak to get better tracing, i.e. print the acceptables before working with them, not after. check-in: 64775966bd user: aku tags: trunk
06:13
Added test triggering the code path for the generation of semantic actions for grammars which have multiple actions across their G1 symbols. Fixed variable name typo in that code path. check-in: 2f455e21d2 user: aku tags: trunk
2018-03-15
07:04
Added description of the various files found in the test grammars, and their relationships. check-in: eaefc0b630 user: aku tags: specials
2018-03-14
23:00
Implemented special semantic action `::array`. This one was trivial, maps to array action `values`. Mapping is done in the semantics, the backends do not see the special. Started on special semantic action `::first`. Tests, no implementation yet. check-in: 7626d0194b user: aku tags: specials
2018-03-13
17:38
Updated the json tests. The only tests directly affected by the merge were a number of expected-negatives where the error message did not match anymore, due to changed symbol ids coming out of the generator. Furthermore, the `process-file` changes reduced the set of known bugs, eliminating all expected-negatives from it (process-file sees the same data as process now, making the error messages the same for the two methods). We still have issues with various expected-positives that still error. These are suspected to require changes to the json grammar. check-in: d4f0f54320 user: aku tags: json-unify
17:30
Bring the unicode work into the json experiment for eval. check-in: 3358e68801 user: aku tags: json-unify
15:01
Fix the `process-file` method of the rtC backends. `Tcl_Read` does not do encoding processing. Replaced with `Tcl_ReadChars` which does. Plus attendant changes to handle the different signature. Now `process-file` is equivalent to `process` in that it sees CESU-8, and MUTF-8. check-in: e43eff4a15 user: aku tags: reunification
2018-03-12
21:06
Updated the rtC backends (lexer, parser) to use `utf-8` as the encoding for `process-file`, and updated all places affected by this (test results, bootstrap and literal parser). check-in: 59665cfdcf user: aku tags: reunification
20:33
Updated the literal parser. The updated bootstrap parser was already pushed in the previous commit. Passes the entire testsuite. We are now mostly (*) ready to go back to branch `languages-json` and evaluate if the extended unicode support helps the json parser. (*) Have to fix the `process-file` method in RTC (encoding = `utf-8` = Tcl's internal encoding, like for the `process` method). check-in: edd9f06929 user: aku tags: reunification
19:59
literal::parse - Add code to handle compat `try`. Oops. Updated the bootstrap parser as well, already. check-in: 8fc12e03e6 user: aku tags: reunification
19:58
runtime::tcl - Move code for compat `try` into main package entry check-in: d503ddfb50 user: aku tags: reunification
17:11
Merged completed literal work into the general unicode work. check-in: 47e7cce082 user: aku tags: reunification
16:34
Tweaked the API between reducer core and callbacks, to handle symbol creation for the custom tags. Finalized the integration of the new reducers into the generator backends. Extended Tcl backend to refactor codepoint ranges used in ASSRs, like the rtC backend does for byte ranges. Updated tests. Mostly; see the note that follows. Note: Boot parser has not been switched to the extended grammar yet (allowing for unicode references in the SMP). Closed-Leaf check-in: 7d263e069d user: aku tags: relit
2018-03-10
07:44
Brought case-expansion perf work into literal work. Tests for literal norm and reduce pass. Time spent on them is now a minute, versus 45 minutes originally. Running tests often is useful again, instead of a pain. check-in: b5a818af28 user: aku tags: relit
07:39
Moved up in the stack, to `unicode::unfold`, and `unicode::fold/c`. These are the charclass and string ops based on `data::fold` and `data::fold/c`. Added the workhorse function `marpatcl_scr_unfold` to `c/scr.[ch]`. Tests pass, and time to run them went down to a second. Target reached. Closed-Leaf check-in: 5321722a73 user: aku tags: refold
05:13
Reworked the table generator to put the case-folding information into simple C-level arrays of integers, plus an accessor function. Replaced the low-level `data::fold/c` and `data::fold` procedures with C implementations. Extended the tests a bit and updated them to match minor changes (error message, and the fact that the generator has the latest tweaks from the checkout of `relit`). The tests now take about 15 seconds. check-in: 696e160cbe user: aku tags: refold
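A sketch of what "simple C-level arrays of integers, plus an accessor function" could look like. The layout (parallel sorted arrays searched by binary search) and the names `fold_cp`, `fold_class`, `fold_class_of` are assumptions, not the generator's actual output. The tiny table uses real Unicode fold relationships (A/a, and K/k/U+212A KELVIN SIGN) purely as sample data.

```c
#include <stddef.h>

/* Hypothetical excerpt of a generated table, sorted by codepoint:
 * fold_class[i] is the index of the case-folding class of fold_cp[i]. */
static const int fold_cp[]    = { 0x41, 0x4B, 0x61, 0x6B, 0x212A };
static const int fold_class[] = {    0,    1,    0,    1,      1 };
#define FOLD_N (sizeof fold_cp / sizeof fold_cp[0])

/* Accessor: class index for a codepoint, or -1 if it has no case variants.
 * Lower-bound binary search over the sorted codepoint array. */
static int fold_class_of(int cp) {
    size_t lo = 0, hi = FOLD_N;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fold_cp[mid] < cp) lo = mid + 1;
        else hi = mid;
    }
    return (lo < FOLD_N && fold_cp[lo] == cp) ? fold_class[lo] : -1;
}
```

Two codepoints are case-equivalent exactly when they map to the same class index, which is the query that case-expansion of a character class repeats for every member.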
05:07
Added tests demonstrating the slow speed of case-expansion operations and related, first seen on branch `relit` with the much expanded set of literals to cover normalization and reduction. Each of the three .test files takes about fifteen minutes, with the majority taken by case-expansion of large character classes. The tests here take about 29 seconds. This is the beginning of branch `refold` to move the case-folding information and the operations on it into the C level. check-in: 3ceee77a53 user: aku tags: refold
2018-03-06
22:28
Continue previous commit ... check-in: 01ba5c2fe1 user: aku tags: relit
22:27
Rewrote the reductor for C engines linked to Tcl (mutf-8, cesu-8) based on the new framework. Redid the testing, now using the same cases as for normalization. Extended the test cases to cover the custom tags as well. check-in: 6c97ea1a57 user: aku tags: relit
2018-03-05
20:22
Rewrote the reductor for Tcl engines based on the new framework. Redid the testing, now using the same cases as for normalization. check-in: 9a39768f89 user: aku tags: relit
20:20
Moved high-level methods for reduction of grammars into the reductor state class. See the previous commit for the tests. check-in: 5146df36f3 user: aku tags: relit
20:18
Updated reductor state tests somewhat. Noted the parts not covered yet. check-in: dec03b44b2 user: aku tags: relit
20:17
Reworked testing of normalization using the new way of specifying cases and results. Systematic set of cases covering a large set of possible inputs. Fixed issues in normalization uncovered by the new set of cases. Modified normalization to pass literals using unknown/custom type tags through as-is. check-in: 845a016aa9 user: aku tags: relit
20:14
Added another way of specifying large amounts of test cases and related results (files, one line per case/result, except comments and empty lines) check-in: 0a1ccc69b3 user: aku tags: relit
20:11
Extended `negate-class` to allow limiting negation to SMP. No tests yet. check-in: fb0f7497c5 user: aku tags: relit
20:10
Removed leftover unused procs. Extended `data::cc` to handle `%foo` as the case-unfolded form of `foo`. No tests for this yet. check-in: 30928789a2 user: aku tags: relit
20:09
Make the area limits available to scripts. Tests added. check-in: 4c80a213e5 user: aku tags: relit
20:07
Standardized on `SMP` as the shorthand for all codepoints beyond the BMP. Updated generator. check-in: 9354659bfb user: aku tags: relit
2018-03-02
06:33
Merged bugfix from unicode redo into literal redo check-in: e549aa5b94 user: aku tags: relit
06:30
Merged bugfix from trunk into the unicode redo. check-in: d437c9ded0 user: aku tags: reunification
06:23
unicode. Fixed bad condition for handling the final element in negate-class, after the main loop. Triggered by the next-to-final element ending just before the UNI_MAX. Test added, demonstrating bug and fix. check-in: 546018b243 user: aku tags: trunk
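A sketch of class negation over sorted, inclusive codepoint ranges, showing where an off-by-one in the post-loop step for the final element can hide. This is an assumed reconstruction for illustration, not the actual `negate-class` code; `UNI_MAX` is taken to be U+10FFFF and the `range` type is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define UNI_MAX 0x10FFFF  /* assumed: highest Unicode codepoint */

typedef struct { int first, last; } range;  /* inclusive bounds */

/* Negate a sorted, non-overlapping list of ranges over [0, UNI_MAX].
 * Writes the complement into out (capacity n + 1 suffices) and returns
 * the number of ranges written. */
size_t negate_class(const range *in, size_t n, range *out) {
    size_t k = 0;
    int next = 0;  /* first codepoint not yet covered */
    for (size_t i = 0; i < n; i++) {
        assert(in[i].first >= next && in[i].first <= in[i].last);
        if (in[i].first > next) {  /* gap before this range */
            out[k].first = next;
            out[k].last = in[i].first - 1;
            k++;
        }
        next = in[i].last + 1;
    }
    /* Final element, after the main loop. The test must be <=, not <:
     * when the last input range ends at UNI_MAX - 1 the remaining gap is
     * the single codepoint UNI_MAX, which a strict comparison drops. */
    if (next <= UNI_MAX) {
        out[k].first = next;
        out[k].last = UNI_MAX;
        k++;
    }
    return k;
}
```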
2018-02-28
19:52
Merged accumulated unicode rework to literal rework. check-in: d0a032f526 user: aku tags: relit
19:51
Tool tweak. For each named class FOO generate additional named classes `FOO:bmp` and `FOO:high` which are limited to the parts of FOO inside and outside the BMP. Note: Empty classes are not generated at all. Example: `adlam:bmp` does not exist because it would be empty. In turn `adlam:high` == `adlam`. Note 2: The `:` in the names of the new classes prevents their use from SLIF. This may change in the future, as it might be sensible to give languages access to the limited classes. check-in: 975b4f5f90 user: aku tags: reunification
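A sketch of the split behind `FOO:bmp` and `FOO:high`, assuming a class is a sorted list of inclusive codepoint ranges; a range straddling U+FFFF is cut in two. The type and function names are hypothetical. Feeding in the Adlam block (U+1E900..U+1E95F) yields an empty BMP part, matching the `adlam` example.

```c
#include <stddef.h>

#define BMP_MAX 0xFFFF

typedef struct { int first, last; } cprange;  /* inclusive bounds */

/* Split a sorted list of codepoint ranges into the parts inside the BMP
 * and the parts outside it. Each output needs capacity n. Either count
 * may come out as zero, i.e. an empty class that would not be generated. */
void split_bmp(const cprange *in, size_t n,
               cprange *bmp, size_t *nbmp,
               cprange *high, size_t *nhigh) {
    *nbmp = 0;
    *nhigh = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].last <= BMP_MAX) {
            bmp[(*nbmp)++] = in[i];
        } else if (in[i].first > BMP_MAX) {
            high[(*nhigh)++] = in[i];
        } else {  /* straddles the boundary: cut it in two */
            bmp[(*nbmp)++] = (cprange){ in[i].first, BMP_MAX };
            high[(*nhigh)++] = (cprange){ BMP_MAX + 1, in[i].last };
        }
    }
}
```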
2018-02-27
23:47
SCR data structure tweak. While we keep allocating the ranges as part of the main struct, the definitions now allow for separate allocation (looking ahead to char classes stored in constant/static C structures). check-in: 9a2f1aa29a user: aku tags: reunification
23:32
Oops. Fixed tracing support broken by [bd70f9d2de] (intro of separate .c/.h files). check-in: b19a1b0d5f user: aku tags: reunification
21:48
Implemented ASSR, an ASBR equivalent for char classes based on surrogate pairs (for codepoints outside the BMP). Added tests. check-in: be0e4cf1a2 user: aku tags: reunification
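For reference, the surrogate-pair arithmetic an ASSR rests on, plus a sketch of how one codepoint range outside the BMP might be expressed as alternatives of (high-surrogate range)(low-surrogate range), analogous to how an ASBR splits ranges into byte sequences. All names are hypothetical, and the sketch does not merge or optimize alternatives the way a real implementation may.

```c
#include <assert.h>

/* Split a codepoint outside the BMP (U+10000..U+10FFFF) into its UTF-16
 * surrogate pair. */
void to_surrogates(unsigned cp, unsigned *hi, unsigned *lo) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                /* 20 significant bits remain */
    *hi = 0xD800 + (cp >> 10);    /* top 10 bits -> D800..DBFF */
    *lo = 0xDC00 + (cp & 0x3FF);  /* low 10 bits -> DC00..DFFF */
}

/* One alternative: any high surrogate in [hi_first, hi_last] followed by
 * any low surrogate in [lo_first, lo_last]. */
typedef struct { unsigned hi_first, hi_last, lo_first, lo_last; } sr_alt;

/* Express [first, last] as at most 3 alternatives; returns the count. */
int range_to_assr(unsigned first, unsigned last, sr_alt *out) {
    unsigned h1, l1, h2, l2;
    int k = 0;
    assert(first <= last);
    to_surrogates(first, &h1, &l1);
    to_surrogates(last, &h2, &l2);
    if (h1 == h2) {  /* same high surrogate: a single alternative */
        out[k++] = (sr_alt){ h1, h1, l1, l2 };
        return k;
    }
    out[k++] = (sr_alt){ h1, h1, l1, 0xDFFF };  /* partial head */
    if (h1 + 1 <= h2 - 1)                        /* full middle */
        out[k++] = (sr_alt){ h1 + 1, h2 - 1, 0xDC00, 0xDFFF };
    out[k++] = (sr_alt){ h2, h2, 0xDC00, l2 };  /* partial tail */
    return k;
}
```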
17:41
Added `2char` to the low-level unicode support. Needed for the reworking of literal handling, see branch `relit`. Todo: `2ascr`, i.e. ASBR equivalent for char classes based on surrogate pairs. Todo: Intersections of named char classes with BMP and outside. Todo: Alias mechanism to save on storage space. check-in: 6c41cde6ea user: aku tags: reunification
2018-02-26
23:34
Moved main loop for reduction into the rstate class, with changes. The new main loop's API to reduction is a callback instead of a set of rule names. Redid the DSL commands of `reducer` as methods of `rstate`. Moved old public API to internal (renamed methods). Updated existing tests for the now internal methods. Added tests for the new public API. TODO: Reimplement the reducer rules as proper callbacks for the new state class. check-in: 35be2317de user: aku tags: relit
2018-02-23
06:58
Reworked internals of the normalizer. Split the big switch into a series of nicely contained procedures, each handling the type they are named for. Dispatch is dynamic on the type tag of the literal. Split the remainder (reducer, rstate, parser) out into their own packages. Updated users. Now ready to look into alternate implementations. check-in: f29be57f3c user: aku tags: relit
05:23
Split the remainder (reducer, rstate, parser) out into their own packages. Updated users. Now ready to look into alternate implementations. check-in: eae1cbc1d8 user: aku tags: relit
2018-02-22
23:35
Split utilities and normalization into their own packages. check-in: 682c4b1bb8 user: aku tags: relit
22:00
Fixed oops. Added the forgotten new test files. check-in: 0ac7c21442 user: aku tags: relit
21:55
Split the big literal package into several smaller, more focused pieces. Started with slicing the testsuite. check-in: 1165c36601 user: aku tags: relit
06:09
Reworked the internals of the unicode layer. Strong split into C files and the C/Tcl glue. Exposed the codepoint validation as a custom critcl type, and used it in 2utf. Extended 2utf and 2asbr with an optional flag argument to control the coding (mutf, cesu, tcl is both). Because the new argument is optional (and last), the existing call sites will not fail, and operate in full mode. Updated the layer's test suite demonstrating the mutf and cesu coding. Added more tests for invalid arguments. check-in: bd70f9d2de user: aku tags: reunification
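A sketch of what the optional coding flag could control, under the assumption (read from the wording above, not from the code) that `mutf` means encoding U+0000 as the two bytes C0 80, `cesu` means encoding codepoints outside the BMP as two 3-byte surrogate sequences, and `tcl` combines both. The flag names and `encode_cp` are hypothetical; with no flag set the function emits standard UTF-8.

```c
#include <assert.h>

#define CODE_MUTF 1                        /* hypothetical: U+0000 as C0 80 */
#define CODE_CESU 2                        /* hypothetical: non-BMP as surrogates */
#define CODE_TCL  (CODE_MUTF | CODE_CESU)  /* "tcl is both" */

/* Standard UTF-8 encoding of cp; returns the byte count (1..4). */
static int put_utf8(unsigned cp, unsigned char *buf) {
    if (cp < 0x80) { buf[0] = cp; return 1; }
    if (cp < 0x800) {
        buf[0] = 0xC0 | (cp >> 6);
        buf[1] = 0x80 | (cp & 0x3F);
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = 0xE0 | (cp >> 12);
        buf[1] = 0x80 | ((cp >> 6) & 0x3F);
        buf[2] = 0x80 | (cp & 0x3F);
        return 3;
    }
    buf[0] = 0xF0 | (cp >> 18);
    buf[1] = 0x80 | ((cp >> 12) & 0x3F);
    buf[2] = 0x80 | ((cp >> 6) & 0x3F);
    buf[3] = 0x80 | (cp & 0x3F);
    return 4;
}

/* Encode cp into buf (at least 6 bytes) per flags; returns the byte count. */
int encode_cp(unsigned cp, int flags, unsigned char *buf) {
    assert(cp <= 0x10FFFF);
    if (cp == 0 && (flags & CODE_MUTF)) {
        buf[0] = 0xC0;  /* overlong NUL keeps encoded strings free of 0x00 */
        buf[1] = 0x80;
        return 2;
    }
    if (cp >= 0x10000 && (flags & CODE_CESU)) {
        unsigned v = cp - 0x10000;  /* encode each surrogate as 3 bytes */
        int n = put_utf8(0xD800 + (v >> 10), buf);
        return n + put_utf8(0xDC00 + (v & 0x3FF), buf + n);
    }
    return put_utf8(cp, buf);
}
```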
2018-02-20
18:40
Added more literals outside of the BMP to test cases. check-in: 8d53644d07 user: aku tags: reunification
07:35
Created a Marpa parser for the string and CC lexemes. This parser handles all the various forms of character escapes. Plus a semantics backend which generates the internal literal representation directly from the AST. The above replaced the entire existing literal processor (parse, decode, unescape, type, tags). This was needed because Tcl (especially `subst` as the core of the old `unescape`) was/is not able to handle the full set of unicode codepoints (at this time). Updated the bootstrap slif parser to handle the extended escapes too (\u hex x5/x6, \U hex x8). Updated tests, although not all. With this commit the entire input side is now able to handle the full set of unicode, with suitable escape sequences for characters outside of the BMP as well. check-in: ac7dc5acdc user: aku tags: reunification
2018-02-19
05:57
Merge C code generator fix into unicode work. check-in: eac4dc279c user: aku tags: reunification
05:55
marpa::gen::runtime::c Fixed mishandling of zero-length chunks, which generated bad C syntax. Triggered by a grammar without any `:discard` clauses. The fix prevents insertion of discard chunks if there are none. Furthermore, the low-level ChunkedArray code now errors out on zero-length chunks, to catch possible future problems. Reviewed all uses, and noted that none are zero-length now. Added a test demonstrating the possibility. check-in: 14698e1f84 user: aku tags: trunk
2018-02-18
10:27
Continued rework of the unicode layer. See first commit in the branch for the plan. Reworked the big tables of test cases in `literal.test` and moved their setup into separate files (`tests/cases/...`). check-in: a69c32992a user: aku tags: reunification
09:03
Continued rework of the unicode layer. See first commit in the branch for the plan. Created a wrapper around the foreach loops to make the tables of test cases look a bit nicer, and more semantic. check-in: 60f9856faa user: aku tags: reunification
05:05
Continued rework of the unicode layer. See first commit in the branch for the plan. Removed full vs bmp from the `unidata` tool and generated tables. The tool now always generates tables covering the full unicode range. Updated some tests, but not all. The known test failures are in the various generators and the middleware, due to the CC differences coming from full coverage. Fixing them now does not make sense, because we will have to clean them up again after the introduction of MUTF-8 and CESU-8 support into the middle layers. We will clean these up after that is all done. check-in: 9db9661dae user: aku tags: reunification
03:38
Continued rework of the unicode layer. See first commit in the branch for the plan. Dropped ASBR and grammar generation for the named classes from the `unidata` tool. With ASBR creation in the C level of the main Marpa package it is fast enough to not require caching. This also removes the cache of byte ranges shared among the classes. Remember also that the C code generator backends do their own automatic sharing of byte ranges and refactoring for sharing. check-in: 604b92a271 user: aku tags: reunification
02:37
Start a rework of the unicode layer. The overall plan is to remove the distinction between bmp and full in this layer, and move it into the generators, with some support in the middle, i.e. in literal handling and the transform from codepoints to byte sequences. Optimization: Moved the commands `2utf`, `mode`, and `max` into the C layer. check-in: b481c06f7c user: aku tags: reunification
2018-02-03
00:51
Merged fix for the issue of RT-C mishandling the `proper`-flag into the branch where it was found. Updated test results to match. Marked a number of tests touching on unicode/utf handling as known bugs. Address them once the general utf handling trouble is better resolved. Still to address: All the `i_` tests. check-in: 9ed3031891 user: aku tags: language-json
00:28
Fixed mismatch between Tcl and C runtimes. Issue in the C runtime. Forgot to properly convert a boolean `proper` into the flag taken by `marpa_g_sequence_new()`. Conversion added, test cases added. check-in: b86ffae080 user: aku tags: trunk
00:27
Re-enable full set of lang/result checks check-in: 94232b7386 user: aku tags: trunk
2018-02-02
23:19
Fix bad phrasing in comment check-in: 90ce04b75a user: aku tags: trunk
19:50
Added a script to run a fixed demo, from grammar to parser to its use. Plus example json files to use as input. check-in: f82c10dea8 user: aku tags: language-json
18:23
Continued testsuite work. Fixed definition of JSON `whitespace`. Regenerated parsers. Updated parse failure results to match. check-in: eb17740e4d user: aku tags: language-json
2018-02-01
23:56
Continued testsuite work - Added the n_* cases (must reject), and 1st round of results. Reorganized the input/ and result/ directories to separate the various groups better (y|n|i, c|tcl, ...) - A number of failures to reject input. - 4x grammar error: \f is not whitespace. - 10x input accepted which should not be (c (bad) vs tcl (ok)) - process vs process-file differences in rt-c (encoding differences?) check-in: 20cb6ea243 user: aku tags: language-json
20:37
Pull rt-c bug fix into the branch which exposed the issue. check-in: 2945ca6cd1 user: aku tags: language-json
20:36
Testsuite work - Clean up of the support code, removed unused procedures. - Ensure that files are read with the proper encoding before being fed into the string 'process' (See `fgetc` decl and use). - Allow setting of constraints, runtime-specific - Set __known bug__ constraint for eight y_* tests where rt-c currently diverges from rt-tcl (1). (1) These are all in the unicode/utf-8 handling, which differs between the available runtimes. * rt-tcl operates on chars and defers to Tcl's parsing of utf-8 sequences. * rt-c OTOH operates on bytes, does its own utf-8 parsing, and is more strict (invalid sequences are a parse error). I have to see if I can define a char class (:invalid:) to contain the invalid sequences. Using that would allow me to either accept or discard them (depending on context). Similarly I might have to allow the class of surrogates (:Cs:), as acceptable characters, and as sequences for the characters past BMP. That would allow such characters even in Marpa limited to BMP. These are all things in the MarpaTcl core however, and not something specific to JSON. JSON just exposed the issues here. check-in: 7ab2e4bf63 user: aku tags: language-json
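To illustrate what "more strict" means on the rt-c side, here is a sketch of a strict single-sequence UTF-8 decoder. It is not the rt-c code, and the function name is hypothetical. Note that it rejects both the overlong C0 80 and encoded surrogates, which is why allowing `:Cs:` comes up above.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence strictly. On success, stores the codepoint in
 * *cp and returns the sequence length (1..4). Returns -1 for invalid
 * input: stray continuation bytes, truncation, overlong forms, surrogates
 * (U+D800..U+DFFF) and values above U+10FFFF. */
int utf8_decode_strict(const unsigned char *s, size_t n, unsigned *cp) {
    static const unsigned min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    unsigned c;
    int len;
    if (n == 0) return -1;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else return -1;                   /* continuation byte or 0xF8..0xFF */
    if (n < (size_t)len) return -1;   /* truncated sequence */
    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;  /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[len]) return -1;        /* overlong form */
    if (c >= 0xD800 && c <= 0xDFFF) return -1;  /* surrogate */
    if (c > 0x10FFFF) return -1;                /* beyond Unicode */
    *cp = c;
    return len;
}
```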
20:17
Added tool similar to `od`, to decode and display utf8 sequences in the input (file, stdin). check-in: 669660f659 user: aku tags: trunk
20:15
Changed gate to lexer flush signaling from in-band `(byte) -1` to a separate function. This removes any possibility that a `(byte) -1` from actual input causes a bogus flush. Added debug function allowing INBOUND to properly print a batch of input bytes. Fixed a crash of the RT-C where the loop searching for the end of the lexeme tried to pop a byte from the empty lexeme, triggering an underflow assert. This may happen when `lexer_complete` is called for an empty-valued lexeme. I.e. when the GATE rejects the first byte after the end of a lexeme as invalid and signals a flush before any byte was entered into the lexer at all. Note that this does not necessarily indicate a mismatch. The current set of acceptable lexemes may contain some which allow an empty value. We have to keep recognizing them. And after that the new context may have caused the invalid byte to be valid. So we only skip the attempt of making an empty value even emptier. The deeper issue is that for LATM-mode symbols the earley-set id does not match the length of the lexeme due to the zero-width ACS guards in front; causing an additional round through the loop before it can declare mismatch. The concrete example which triggered the issue are the `string` and `lstring` symbols in the JSON grammar, for input `[""]`. check-in: 0de21b2314 user: aku tags: trunk
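A small sketch of why an in-band `(byte) -1` flush marker is fragile: `(unsigned char)-1` is 0xFF, so a data byte with that value is indistinguishable from the signal, while a separate flush function cannot collide with any data. The `lexer_sink` type and all functions here are hypothetical stand-ins for the gate/lexer interface, reduced to counters.

```c
#include <stddef.h>

/* Hypothetical lexer state: counters only, to show the signalling. */
typedef struct { size_t bytes; int flushes; } lexer_sink;

void lexer_flush(lexer_sink *lx) { lx->flushes++; lx->bytes = 0; }

/* In-band signalling: the byte value (unsigned char)-1, i.e. 0xFF, doubles
 * as the "flush" marker, so a real 0xFF data byte triggers a bogus flush. */
void lexer_enter_inband(lexer_sink *lx, unsigned char b) {
    if (b == (unsigned char)-1) lexer_flush(lx);
    else lx->bytes++;
}

/* Out-of-band signalling: every byte is data; flushing is a separate call. */
void lexer_enter(lexer_sink *lx, unsigned char b) { (void)b; lx->bytes++; }

/* Feed a buffer and then signal exactly one flush, with each scheme.
 * Both return the number of flushes the lexer actually saw. */
int feed_inband(const unsigned char *in, size_t n) {
    lexer_sink lx = { 0, 0 };
    for (size_t i = 0; i < n; i++) lexer_enter_inband(&lx, in[i]);
    lexer_enter_inband(&lx, (unsigned char)-1);
    return lx.flushes;
}

int feed_separate(const unsigned char *in, size_t n) {
    lexer_sink lx = { 0, 0 };
    for (size_t i = 0; i < n; i++) lexer_enter(&lx, in[i]);
    lexer_flush(&lx);
    return lx.flushes;
}
```

With input `a 0xFF b`, the in-band scheme reports two flushes where only one was signalled; the separate function reports one.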
2018-01-31
21:16
Pull the Tcl lexer fix over into the branch where the issue was found. check-in: d411dda199 user: aku tags: language-json
21:09
Fixed typo in the spec of escaped characters in strings. Fixed definition of `control` characters for JSON. Updated the results to match the tweaked grammar. For the Tcl runtime all tests pass except a few showing mishandling of numeric lexemes. A fix for that is waiting on trunk. RT-C still crashing. check-in: 6dfddb13e8 user: aku tags: language-json
20:58
Fix mishandling of lexemes interpretable as Tcl numbers by the Tcl runtime (lexer component). By going through `expr` a lexeme which looks like a number can be shimmered and may change its string rep when printed. Example: For JSON the lexeme `1E-2` became `0.01`. check-in: 2a442c3255 user: aku tags: trunk
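The same effect can be reproduced in C as an analogy (this is not Tcl's `expr` or its number formatting): once a lexeme is parsed into a double and the text is regenerated from the value, `1E-2` comes back as `0.01`. The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Regenerate a numeric-looking lexeme from its parsed double value. */
void numeric_roundtrip(const char *lexeme, char *out, size_t size) {
    double v = strtod(lexeme, NULL);
    snprintf(out, size, "%g", v);
}

/* Returns 1 if the round trip reproduces the lexeme text exactly. */
int roundtrip_preserves(const char *lexeme) {
    char buf[64];
    numeric_roundtrip(lexeme, buf, sizeof buf);
    return strcmp(buf, lexeme) == 0;
}
```

Keeping the original lexeme text, instead of regenerating it from a numeric value, avoids this change of representation.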
00:51
The json testsuite is becoming more functional. Of the must-accept inputs only 10 of 95 fail. Some unexpected parse failures with bogus inputs. These are in part - Possibly due to reading of input with the wrong encoding (Need utf-8?). - Unexpected numeric reformatting reaching the AST (1E-2 vs 0.01) One crash in the RT-C to investigate. Tweaked the grammar a bit to have proper symbols for the constants, and to separate G1 and L0 better. check-in: dda6670b00 user: aku tags: language-json
2018-01-30
23:25
Pulled fix for Tcl code generator issue into the branch where it was discovered. check-in: 5f8cb41c75 user: aku tags: language-json
23:19
Fix issue in the core code generator for parsers and lexers using the Tcl-based runtime. A bug in package `char` (See `char quote tcl`) caused the generation of bogus Tcl charclass regexes from the internal data, when non-ASCII characters in [:control:] are involved. The generator now works around the issue. check-in: 65b1517840 user: aku tags: trunk
21:02
Added the first larger grammar example outside of the SLIF meta grammar: JSON. Known issues at this point: * Due to apparent trouble with Kettle (`build.tcl test` seems to ignore `--include-dir`) the testsuite is not yet functional. A basic test via `tools/trial` however works. * The generated Tcl parser is bogus. The main character class for string characters (`plain`) is bogus, it contains a bad range which is rejected by Tcl's `regexp` during parser construction. The C-based parser is ok however, modulo lurking unknowns. check-in: 5199afa673 user: aku tags: language-json
10:17
Fix oops, forgot to add test output for the slif meta grammar. check-in: 466c1ebc4d user: aku tags: trunk
10:16
Added formatter producing a SLIF grammar from a grammar container. Note, this is not fully round-trip at the moment (The special @LEX symbols cannot be read back, as they violate identifier syntax). It is also sub-optimal with regard to LATM flags, g1 actions, etc. These are shown as attributes of each rule instead of making use of defaults to reduce duplication. It should however be good enough to serve as a debugging aid. check-in: 3bfc0de63c user: aku tags: trunk
2018-01-29
19:28
Extended the set of formatters producing code initializing a grammar container (GC). Renamed the existing GC formatter to `gc-compact`. Added two formatters to generate non-compact human-readable code, using reduction rules for Tcl and C. check-in: 8d77fed34b user: aku tags: trunk
2017-10-17
16:30
README tweaks check-in: d2d1b00d53 user: aku tags: trunk
16:22
Updated the README to match the current organization of the (code in the) repository. check-in: f45f21924c user: aku tags: trunk
03:18
Merged fixes on flush behaviour to mainline. check-in: 62d99b6274 user: aku tags: trunk
03:13
Fixed demo grammar (wrong start symbol), then demonstrated the fixed vs. unfixed behaviour in the Tcl and C runtimes. Then fixed the C runtime flush behaviour. Further fixed mishandling of lexeme value and length in the presence of redo. Closed-Leaf check-in: a78dda3a4d user: aku tags: flush-fix
2017-10-16
23:17
Demonstrate the multi-flush bug. Fixed RT-C issue with actual lastchar lost/overwritten by redo, messing up the error message generated. check-in: 886eb6bb40 user: aku tags: trunk
22:24
And back check-in: ce762c6d5a user: aku tags: trunk
22:20
Pull trunk. Closed-Leaf check-in: f32641a83d user: aku tags: runtime-tests
22:12
Pull in the fix for L0 discard past G1 end, updated tests, fixed a few more things in the Tcl runtime (Premature destruction of the parser-level recognizer prevented generation of a proper error message for a non-discard token after G1 end). check-in: bbff87f317 user: aku tags: runtime-tests
21:17
Added tests demonstrating bad behaviour when exhausting a parser while still having input (discards and not). check-in: e288571010 user: aku tags: runtime-tests
20:30
Added foundation of testing the runtime with arbitrary grammar/input pairs, and highlevel test drivers for the Tcl and C runtimes. check-in: e7ab54549b user: aku tags: runtime-tests
2017-10-15
16:55
Use OSX fixes. They were done as separate branches to remember to check behaviour when back on linux. check-in: 97bbaff3f9 user: aku tags: trunk
16:54
Silence compiler complaint on OSX. Leaf check-in: 09b264fb4a user: aku tags: osx-complaints
16:53
Added return after assert to silence compiler complaint (OSX). check-in: 12ad722f66 user: aku tags: osx-complaints
16:50
Fixed problems in the handling of charclass as set of code-points and -ranges. Range validation was incomplete, allowing bad input to crash. Fixed, and tests added. Tracing as well, plus more notes when certain code paths will be reached. check-in: ac18987fd3 user: aku tags: trunk
04:46
Moving critcl after tcl solves OSX issue with install dependency order. Check if this breaks linux. Leaf check-in: 243e280f60 user: aku tags: build-order-trouble
2017-10-12
06:59
Tcl runtime. Fix flush issue where partial flush and redo needs recognition check-in: f26d4f328e user: aku tags: flush-fix
2017-10-11
05:28
Mark recognizer cons/dest points better check-in: 08e6e9634d user: aku tags: trunk
2017-10-06
22:01
Equivalent changes in the C runtime. 1. The C runtime already intertwined tree extraction, valuation and hand-over, which was added to the Tcl runtime in the previous commit. 2. Fixed same issue with possible L0 discards after G1 end. 3. Fixed bad assertions in symset, byteset, exposed by 2. check-in: 32c320340a user: aku tags: trunk
20:12
Reworked parser completion handling. Do not pull and save all possible parse trees into memory anymore. Instead eval each tree immediately after extraction and pass the resulting SV to the outer backend. Further a bug fix, tell the lexer about expected terminals (none), so that it can still handle any L0 discards which may occur after the G1 end symbol. I.e. while we are not expecting the G1 token stream to continue the L0 byte stream may still have input to process. TODO: Have to add test cases for this situation, both where only the expected discards occur, and where unexpected actual G1 tokens are present. check-in: 8c6bdade0a user: aku tags: trunk
19:36
Fix in Tcl runtime tracing. Bring necessary variable into scope. This was forgotten when placing various operations into their own methods for clarity. check-in: bbe2253bdb user: aku tags: trunk
19:33
Debugging enhancement, show actual semantic values in valuation steps. check-in: 4f1c755959 user: aku tags: trunk
19:31
Sliced the big tangle of a single package into several packages, each containing just related code. check-in: 5fde5977d2 user: aku tags: trunk
2017-10-05
21:51
Fix package meta data typo. Closed-Leaf check-in: d38f475f67 user: aku tags: slice
21:39
Reworked naming of the generator packages, and associated namespaces. Searching for plugins, i.e. more generators, is now simpler (no special cases to exclude). check-in: 912cadf759 user: aku tags: slice
18:59
Updated marpa-gen to new sliced setup, and filled `list-plugins` in marpa::export::config. Next up, look into renaming packages for nicer structure. Start with exporters. check-in: 649487dd0c user: aku tags: slice
08:06
Heal fork, complete. check-in: 444c10e2e4 user: aku tags: slice
08:05
Heal fork Closed-Leaf check-in: 2175b86257 user: aku tags: slice-2
08:04
Split the remaining pieces into three packages: - C runtime - builtin parser (C runtime) - Low-level C wrapper for Tcl runtime foundation Updated tests to work again. More reshuffling. check-in: 77883b0ffd user: aku tags: slice-2