TIP 111: Dictionary Values and Manipulators

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2017 Conference, Houston/TX, US, Oct 16-20
Send your abstracts to tclconference@googlegroups.com
by Aug 21.
Author:         Donal K. Fellows <donal.k.fellows@man.ac.uk>
Author:         David S. Cargo <dcargo@marix.com>
State:          Final
Type:           Project
Vote:           Done
Created:        05-Oct-2002
Post-History:   
Tcl-Version:    8.5

Abstract

This TIP proposes adding a standard value format (and supporting commands) to Tcl that implements a value-to-value mapping, just as Tcl's list values can be regarded as implementing a number-to-value mapping.

Rationale

What is a dictionary? It is a translation from arbitrary values to arbitrary values, often also known as an associative map. Many computer languages, especially higher-level ones, have them as part of the language or the standard library. It would be nice to have them in Tcl too.

Now, I realise that Tcl already contains arrays which provide dictionary functionality, but they are not quite the same thing. Tcl's arrays are collections of variables indexable by name, and not collections of values. This has some far-reaching implications; it is possible to set traces on individual elements of the array, but it is not possible to pass the array by value. However, one of the main concerns is the sheer cost of arrays in terms of memory space; aside from the hash table used as the core of the implementation (and the representations of the keys and values, of course) there is a substantial overhead for each array to support traces on the array as a whole, plus a similar overhead per element that stems from the fact that elements are variables in their own right. By contrast, a dictionary value should be a lot more frugal.

Value Syntax and Semantics

Naturally, it is desirable for dictionary values to have human-readable forms that are similar to those that currently exist. I propose using key value key value ... form with list-style quoting for keys and values that contain characters that are significant to Tcl, which should be immediately familiar to users of the array get and array set commands. No special interpretation will be placed on the amount of whitespace separating keys and values, just as with lists (indeed, any list with an even number of elements can be regarded as a dictionary.) For example, the following value represents a mapping from selected languages to a possible program to invoke to compile them:

	C gcc C++ g++ FORTRAN f77 Java javac

Empty dictionaries are those that contain no mappings from keys to values. Any representation of an empty list will also be a representation of an empty dictionary. There is no upper bound on the number of items that a dictionary may hold.

It should be specially noted that dictionary values have copy-on-write semantics just like lists. This means that if I hand a dictionary value into a procedure as an argument, and that procedure updates the variable containing that value, the value as seen by the caller will not have changed. This is in complete contrast with arrays which cannot (currently) be passed by value other than through using array get to convert the array to a list form and array set to convert back again.

This specification does not state what order the keys and values are listed in. That depends on the implementation.

Command Syntax and Semantics

I propose that all operations that work with dictionary values (where not done through adaptations of existing commands) will go through the dict command. The alternatives are "array" which is already in use, "dictionary" which is rather long for what I believe will be a fairly commonly used command, "alist" (association list) which is probably too easy to confuse with existing commands, and "map" which is probably better reserved for future use as something for applying an operation to a list (or other collection of values), "hash" (which is perhaps too common), and "table" (which is used for this type of data structure in the Icon programming language).

Most subcommands operate on either a dictionary value (exists, for, get, info, keys, remove, replace, size, and values), or on a variable containing a dictionary value (append, incr, lappend, set, and unset).

Proposed subcommands:

dict create: Make a dictionary.

dict create ?key1 value1 key2 value2 ...?

This will create a new dictionary from the given keys and values and return it as the result. The command will take an even number of arbitrary strings (or other objects, naturally) and will use the first, third, fifth, etc. as keys and the second, fourth, sixth, etc. as values. From the point of view of string representations, this command will behave the same as the list command with an even number of arguments. There is no restriction on the possible representations of keys or values. It is legal to call this command with no arguments at all, which creates an empty dictionary.

dict get: Get value for given key.

dict get dictionaryValue ?key ...?

Given a dictionary value (first argument) and a key (second argument), this will retrieve the value for that key. Where several keys are supplied, the behaviour of the command shall be as if the result of dict get dictVal key was passed as the first argument to **dict get_ with the remaining arguments as second (and possibly subsequent) arguments. This facilitates lookups in nested dictionaries. For example, the following two commands are equivalent:

	dict get $dict foo bar spong
	dict get [dict get [dict get $dict foo] bar] spong

If no keys are provided, dict would return a list containing pairs of elements in a manner similar to array get. That is, the first element of each pair would be the key and the second element would be the value for that key.

It is an error to attempt to retrieve a value for a key that is not present in the dictionary.

dict replace: Create a new dictionary that is a copy of an old one except with some values different or some extra key/value pairs added.

dict replace dictionaryValue ?key value ...?

This is very much the analogue of lreplace, taking a dictionary value as its first argument and then a list of key/value pairs. The result of the command is a new dictionary value that is a copy of the supplied dictionary other than that whenever a key is one of those supplied to this command, the returned dictionary will map that key to the associated value. It is legal for this command to be called with no key/value pairs, but illegal for this command to be called with a key but no value.

dict remove: Create a new dictionary that is a copy of an old one except without the key/value mappings whose keys are listed.

dict remove dictionaryValue ?key key ...?

This operation does what dict replace can't do; removes keys and values. The result of the command is a new dictionary value that does not contain mappings for any of the keys listed; it is not an error if either there are no keys listed, or if any of the listed keys does not exist in the supplied dictionary.

dict set: Set value for given key in a dictionary in a variable.

dict set dictionaryVar key ?key ...? value

This operation takes the name of a variable containing a dictionary value and places an updated dictionary value in that variable containing a mapping from the given key to the given value. In a manner analogous to lset, where multiple keys are present, they do indexing into nested dictionaries.

dict unset: Remove association for given key in a dictionary in a variable.

dict unset dictionaryVar key ?key ...?

This operation takes the name of a variable containing a dictionary value and places an updated dictionary value in that variable that does not contain a mapping for the given key. Where multiple keys are present, this describes a path through nested dictionaries to the mapping to remove. At least one key must be specified.

dict keys: List all keys (with optional criteria matching) in dictionary.

dict keys dictionaryValue ?globPattern?

Return a list of all keys in the given dictionary value. If a pattern is supplied, only those keys that match it (according to the rules of string match) will be returned. The returned keys will be in an arbitrary implementation-specific order.

dict values: List all values (with optional criteria matching) in the dictionary.

dict values dictionaryValue ?globPattern?

Return a list of all values in the given dictionary value. If a pattern is supplied, only those values that match it (according to the rules of string match) will be returned. The returned keys will be in an arbitrary implementation-specific order, though where no pattern is supplied the ith key returned by dict keys will be the key for the ith value returned by dict values applied to the same dictionary value.

dict for: Iterate across all key/value mappings in the dictionary.

dict for {keyVar valueVar} dictionaryValue body

This takes three arguments, the first a pair of variable names (for the key and value respectively of each mapping in the dictionary), the second the dictionary value to iterate across, and the third a script to be evaluated for each mapping with the key and value variables set appropriately (in the manner of foreach.) The result of the command is an empty string. If any evaluation of the body generates a TCL_BREAK result, no further pairs from the dictionary will be iterated over and the dict for command will terminate successfully immediately. If any evaluation of the body generates a TCL_CONTINUE result, this shall be treated exactly like a normal TCL_OK result.

dict filter: Create a new dictionary from an old one containing just a selection of key/value pairs.

dict filter dictionaryValue key globPattern

dict filter dictionaryValue value globPattern

dict filter dictionaryValue script {keyVar valueVar} script

This takes a dictionary value and returns a new dictionary that contains just those key/value pairs that match the specified rule. Three rules are outlined above. The key rule only matches those key/value pairs whose keys match the given glob-style pattern. The value rule only matches those key/value pairs whose values match the given glob-style pattern. The script rule tests for matching by assigning the key to the keyVar and the value to the valueVar, and then evaluating the given script which should return a boolean value (with the key/value pair only being included in the result of the dict filter when a true value is returned.)

dict append: Append a string to the value for a particular key in the dictionary.

dict append dictionaryVar key ?string ...?

This appends the given string (or strings) to the value that the given key maps to in the dictionary value contained in the given variable, writing the resulting dictionary value back to that variable. Non-existent keys are treated as if they map to an empty string.

dict incr: Increment the value for a particular key in the dictionary.

dict incr dictionaryVar key ?increment?

This adds the given increment value (an integer that defaults to 1 if not specified) to the value that the given key maps to in the dictionary value contained in the given variable, writing the resulting dictionary value back to that variable. Non-existent keys are treated as if they map to 0. It is an error to increment a value for an existing key if that value is not an integer.

dict lappend: Append an item to the list-value for a particular key in the dictionary.

dict lappend dictionaryVar key ?item ...?

This appends the given items to the list value that the given key maps to in the dictionary value contained in the given variable, writing the resulting dictionary value back to that variable. Non-existent keys are treated as if they map to an empty list, and it is legal for there to be no items to append to the list. It is an error for the value that the key maps to to not be representable as a list.

dict exists: Test whether a mapping exists for a key.

dict exists dictionaryValue key ?key ...?

This returns a boolean value indicating whether the given key (or path of keys through a set of nested dictionaries) exists in the given dictionary value. This returns a true value exactly when dict get on that path will succeed.

dict size: Get the number of key/value mappings in a dictionary.

dict size dictionaryValue

This returns the size of the dictionary, which will be exactly half the value that llength dictionaryValue would return. It is an error to apply this command to a non-dictionary value.

dict info: Get implementation-specific information about the dictionary value.

dict info dictionaryValue

This returns information (intended for display to people) about the given dictionary though the format of this data is dependent on the implementation of the dictionary. For dictionaries that are implemented by hash tables, it is expected that this will return the string produced by Tcl_HashStats().

Other Related Changes

There are a few other commands that change:

Naturally, dictionary handling will form its own maintenance area. [16] and [24] will be updated as necessary.

C API

There will be a new public structure and a few new public functions to allow C-level access to dictionary values:

The new structure (called Tcl_DictSearch) will be there to allow for searches (i.e. traversals) of a dictionary. This TIP does not specify the fields of the structure; the declaration is just to allow for allocation of these structures on the C stack.

Many public API functions are capable of generating error messages; these generally indicate some type-conversion failure. Sharing constraint violations (where applicable) cause panics as they indicate basic programming errors which should not be causable by scripts. The public API functions are:

Tcl_Obj *Tcl_NewDictObj(void);

Tcl_Obj *Tcl_DbNewDictObj(CONST char *file, int line);

These functions (in non-debug and debug versions) create a new dictionary object and return it.

int Tcl_DictObjPut(Tcl_Interp *interp, Tcl_Obj *dictPtr, Tcl_Obj *keyPtr, Tcl_Obj *valuePtr);

This function inserts a new key/value pair into a dictionary, or updates a key/value pair already in the dictionary. The dictionary object must be unshared. Note that both the key and value objects will have their reference count increased. The return value is a normal TCL_OK/TCL_ERROR result, with the interp for error reporting.

int Tcl_DictObjGet(Tcl_Interp *interp, Tcl_Obj *dictPtr, Tcl_Obj *keyPtr, Tcl_Obj **valuePtrPtr);

This function looks up the value for a key in a dictionary. The variable pointed to by the valuePtrPtr argument is updated to contain a reference to the value, or a NULL if there is no mapping for the key in the dictionary. No reference counts are manipulated by this function. The return value is a normal TCL_OK/TCL_ERROR result, with the interp for error reporting.

int Tcl_DictObjRemove(Tcl_Interp *interp, Tcl_Obj *dictPtr, Tcl_Obj *keyPtr);

This function removes the key/value pair with the given key from the dictionary. It is not an error if the key is not present in the dictionary. The dictionary must be unshared. The return value is a normal TCL_OK/TCL_ERROR result, with the interp for error reporting.

int Tcl_DictObjSize(Tcl_Interp *interp, Tcl_Obj *dictPtr, int *sizePtr);

This function updates the integer variable pointed to by sizePtr with the number of key/value pairs in the dictionary. The return value is a normal TCL_OK/TCL_ERROR result, with the interp for error reporting.

int Tcl_DictObjFirst(Tcl_Interp *interp, Tcl_Obj *dictPtr, Tcl_DictSearch *searchPtr, Tcl_Obj **keyPtrPtr, Tcl_Obj **valuePtrPtr, int *donePtr);

This function starts a search of (i.e. iteration over) the given dictionary, using the structure pointed to by searchPtr as context. The return value is a normal TCL_OK/TCL_ERROR result, with the interp for error reporting. Three variables are updated to indicate what was found; keyPtrPtr is used for reporting the key of a key/value pair and valuePtrPtr is used for reporting the corresponding value. Finally, donePtr is used for indicating whether the search has found all the values; if the variable it points to is set to 1 there are no key/value pairs in the dictionary (i.e. the variables pointed to by keyPtrPtr and valuePtrPtr were not updated), but if the variable is set to 0, a key/value pair was found and Tcl_DictObjNext() should be called to discover whether that was the last value or if there are further ones in the dictionary. Note that if this function indicates that the search is not done but the calling code wishes to extract no further values from the dictionary, Tcl_DictObjDone() must be called to release the internal locks on the representation of the value.

void Tcl_DictObjNext(Tcl_DictSearch *searchPtr, Tcl_Obj **keyPtrPtr, Tcl_Obj **valuePtrPtr, int *donePtr);

This function gets the next key/value pair from the search of a dictionary, using the search referenced by searchPtr. The meaning of the keyPtrPtr, valuePtrPtr and donePtr variables is much the same as in Tcl_DictObjFirst(), along with the restriction that if the function indicates that the search is not done but the calling code wishes to extract no further values from the dictionary, Tcl_DictObjDone() must be called to release the internal locks on the representation of the value.

void Tcl_DictObjDone(Tcl_DictSearch *searchPtr);

This function terminates a search of a dictionary before all the values in the dictionary have been iterated over, releasing the internal locks on the dictionary representation.

int Tcl_DictObjPutKeyList(Tcl_Interp *interp, Tcl_Obj *dictPtr, int keyc, Tcl_Obj *CONST *keyv, Tcl_Obj *valuePtr);

This function is a variant on Tcl_DictObjPut() that takes a list of keys so as to work with nested dictionaries.

int Tcl_DictObjRemoveKeyList(Tcl_Interp *interp, Tcl_Obj *dictPtr, int keyc, Tcl_Obj *CONST *keyv);

This function is a variant on Tcl_DictObjRemove() that takes a list of keys so as to work with nested dictionaries.

Examples

Counting the number of unique words in a file and the number of times each word occurs:

set f [open someFile.txt]
set contents [read $f]
close $f
foreach word [regexp -all -inline {\w+} $contents] {
   dict incr count $word
}
puts "There are [dict size $count] unique words."
foreach word [lsort -dictionary [dict keys $count]] {
   puts "${word}: [dict get $count $word] occurrences"
}

A localisable string toupper implementation:

set capital [dict create C [dict create]]
foreach c {abcdefghijklmnopqrstuvwxyz} {
   dict set capital C $c [string toupper $c]
}
dict set capital en [dict get $capital C]
# ... and so on for other supported languages ...
set upperCase [string map [dict get $capital $env(LANG)] $string

Copyright

This document has been placed in the public domain.


Appendices

These appendices are not formally part of the proposal and exist merely to help understanding.

~ Implementation Notes

Implement using hash tables (of course.) Need efficient ways to convert to/from lists, perhaps making lists know what's going on underneath the covers?

~ Future Directions

Alternate implementations of mappings, like trees or disk-backed databases?

History