list(n) 1.2.2 struct "Tcl Data Structures"

NAME

list - Procedures for manipulating lists

SYNOPSIS

package require Tcl 8.0
package require struct ?1.3?

::struct::list longestCommonSubsequence sequence1 sequence2 ?maxOccurs?
::struct::list longestCommonSubsequence2 sequence1 sequence2 ?maxOccurs?
::struct::list lcsInvert lcsData len1 len2
::struct::list lcsInvert2 lcs1 lcs2 len1 len2
::struct::list lcsInvertMerge lcsData len1 len2
::struct::list lcsInvertMerge2 lcs1 lcs2 len1 len2
::struct::list reverse sequence
::struct::list assign sequence ?varname?...
::struct::list flatten ?-full? ?--? sequence
::struct::list map sequence cmdprefix
::struct::list fold sequence initialvalue cmdprefix
::struct::list iota n
::struct::list equal a b
::struct::list repeat value size...

DESCRIPTION

The ::struct::list namespace contains several useful commands for processing Tcl lists. Generally speaking, they implement algorithms more complex or specialized than the ones provided by Tcl itself.

It exports only a single command, struct::list. All functionality provided here can be reached through a subcommand of this command.

COMMANDS

::struct::list longestCommonSubsequence sequence1 sequence2 ?maxOccurs?
Returns the longest common subsequence of elements in the two lists sequence1 and sequence2. If the maxOccurs parameter is provided, the common subsequence is restricted to elements that occur no more than maxOccurs times in sequence2.

The return value is a list of two lists of equal length. The first sublist is of indices into sequence1, and the second sublist is of indices into sequence2. Each corresponding pair of indices corresponds to equal elements in the sequences; the sequence returned is the longest possible.
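
The shape of this return value can be illustrated with a small checker. The following is a minimal sketch in plain Tcl (assuming Tcl 8.4 or later); lcsElements is a hypothetical helper that is not part of this package, and the index lists are the ones used in the lcsInvert example on this page.

```tcl
# Hypothetical helper, not part of struct::list: walk the two index
# lists of an LCS result in parallel, verify that each pair of indices
# selects equal elements, and return the common elements.
proc lcsElements {sequence1 sequence2 lcsData} {
    foreach {indices1 indices2} $lcsData break
    if {[llength $indices1] != [llength $indices2]} {
        error "index lists differ in length"
    }
    set common {}
    foreach i $indices1 j $indices2 {
        set x [lindex $sequence1 $i]
        set y [lindex $sequence2 $j]
        if {$x ne $y} {
            error "elements at indices $i and $j differ"
        }
        lappend common $x
    }
    return $common
}

set sequence1 {a b r a c a d a b r a}
set sequence2 {b r i c a b r a c}
# Index lists taken from the lcsInvert example of this page.
set lcsData {{1 2 4 5 8 9 10} {0 1 3 4 5 6 7}}
puts [lcsElements $sequence1 $sequence2 $lcsData]
```

For the data shown, the helper prints the common elements b r c a b r a.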

::struct::list longestCommonSubsequence2 sequence1 sequence2 ?maxOccurs?
Returns an approximation to the longest common subsequence of elements in the two lists sequence1 and sequence2. If the maxOccurs parameter is omitted, the subsequence computed is exactly the longest common subsequence; otherwise, the longest common subsequence is approximated by first determining the longest common subsequence of only those elements that occur no more than maxOccurs times in sequence2, and then using that result to align the two lists, determining the longest common subsequences of the sublists between each pair of adjacent matched elements.

As with longestCommonSubsequence, the return value is a list of two lists of equal length. The first sublist is of indices into sequence1, and the second sublist is of indices into sequence2. Each corresponding pair of indices corresponds to equal elements in the sequences. The sequence approximates the longest common subsequence.

::struct::list lcsInvert lcsData len1 len2
This command takes a description of a longest common subsequence (lcsData), inverts it, and returns the result. Inversion here means that, whereas the input describes which parts of the two sequences are identical, the output describes the differences instead.

For the result to be fully defined, the lengths of the two sequences must be known; they are specified through len1 and len2.

The result is a list where each element describes one chunk of the differences between the two sequences. This description is a list containing three elements, a type and two pairs of indices into sequence1 and sequence2 respectively, in this order. The type can be one of three values:

added
Describes an addition. I.e. items which are missing in sequence1 can be found in sequence2. The pair of indices into sequence1 describes where the added range had been expected to be in sequence1. The first index refers to the item just before the added range, and the second index refers to the item just after the added range. The pair of indices into sequence2 describes the range of items which has been added to it. The first index refers to the first item in the range, and the second index refers to the last item in the range.

deleted
Describes a deletion. I.e. items which are in sequence1 are missing from sequence2. The pair of indices into sequence1 describes the range of items which has been deleted. The first index refers to the first item in the range, and the second index refers to the last item in the range. The pair of indices into sequence2 describes where the deleted range had been expected to be in sequence2. The first index refers to the item just before the deleted range, and the second index refers to the item just after the deleted range.

changed
Describes a general change. I.e. a range of items in sequence1 has been replaced by a different range of items in sequence2. The pair of indices into sequence1 describes the range of items which has been replaced. The first index refers to the first item in the range, and the second index refers to the last item in the range. The pair of indices into sequence2 describes the range of items replacing the original range. Again the first index refers to the first item in the range, and the second index refers to the last item in the range.


 
    sequence 1 = {a b r a c a d a b r a}
    lcs 1      =   {1 2   4 5     8 9 10}
    lcs 2      =   {0 1   3 4     5 6 7}
    sequence 2 =   {b r i c a     b r a c}

    Inversion  = {{deleted  {0  0} {-1 0}}
                  {changed  {3  3}  {2 2}}
                  {deleted  {6  7}  {4 5}}
                  {added   {10 11}  {8 8}}}

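Notes:

An index of -1 in a deleted chunk refers to the position just before the first element of the second sequence.

An index equal to the length of the first sequence in an added chunk refers to the position just behind the end of that sequence.
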

::struct::list lcsInvert2 lcs1 lcs2 len1 len2
Similar to lcsInvert. Instead of directly taking the result of a call to longestCommonSubsequence this subcommand expects the indices for the two sequences in two separate lists.

::struct::list lcsInvertMerge lcsData len1 len2
Similar to lcsInvert. It returns essentially the same structure as that command, except that it may contain chunks of type unchanged too.

These new chunks describe the parts which are unchanged between the two sequences. This means that the result of this command describes both the changed and unchanged parts of the two sequences in one structure.

 
    sequence 1 = {a b r a c a d a b r a}
    lcs 1      =   {1 2   4 5     8 9 10}
    lcs 2      =   {0 1   3 4     5 6 7}
    sequence 2 =   {b r i c a     b r a c}

    Inversion/Merge  = {{deleted   {0  0} {-1 0}}
                        {unchanged {1  2}  {0 1}}
                        {changed   {3  3}  {2 2}}
                        {unchanged {4  5}  {3 4}}
                        {deleted   {6  7}  {4 5}}
                        {unchanged {8 10}  {5 7}}
                        {added    {10 11}  {8 8}}}
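
Because the merged structure covers both sequences completely, it can be turned into a diff-like listing. The following is a minimal sketch in plain Tcl (assuming Tcl 8.4 or later); renderChunks is a hypothetical helper that is not part of this package, and the chunk list is copied from the example above.

```tcl
# Hypothetical helper, not part of struct::list: render merged chunks
# as lines tagged "  " (unchanged), "- " (only in sequence1) or
# "+ " (only in sequence2).
proc renderChunks {sequence1 sequence2 chunks} {
    set lines {}
    foreach chunk $chunks {
        foreach {type range1 range2} $chunk break
        foreach {s1 e1} $range1 break
        foreach {s2 e2} $range2 break
        switch -- $type {
            unchanged {
                foreach item [lrange $sequence1 $s1 $e1] {lappend lines "  $item"}
            }
            deleted {
                # Only the range in sequence1 denotes items here.
                foreach item [lrange $sequence1 $s1 $e1] {lappend lines "- $item"}
            }
            added {
                # Only the range in sequence2 denotes items here.
                foreach item [lrange $sequence2 $s2 $e2] {lappend lines "+ $item"}
            }
            changed {
                foreach item [lrange $sequence1 $s1 $e1] {lappend lines "- $item"}
                foreach item [lrange $sequence2 $s2 $e2] {lappend lines "+ $item"}
            }
            default {
                error "unknown chunk type \"$type\""
            }
        }
    }
    return [join $lines \n]
}

set sequence1 {a b r a c a d a b r a}
set sequence2 {b r i c a b r a c}
set chunks {
    {deleted   {0  0} {-1 0}}
    {unchanged {1  2}  {0 1}}
    {changed   {3  3}  {2 2}}
    {unchanged {4  5}  {3 4}}
    {deleted   {6  7}  {4 5}}
    {unchanged {8 10}  {5 7}}
    {added    {10 11}  {8 8}}
}
puts [renderChunks $sequence1 $sequence2 $chunks]
```

Reading the unchanged and "-" lines in order gives back sequence1, and reading the unchanged and "+" lines gives back sequence2, which is a quick consistency check for a chunk list.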



::struct::list lcsInvertMerge2 lcs1 lcs2 len1 len2
Similar to lcsInvertMerge. Instead of directly taking the result of a call to longestCommonSubsequence this subcommand expects the indices for the two sequences in two separate lists.

::struct::list reverse sequence
The subcommand takes a single sequence as argument and returns a new sequence containing the elements of the input sequence in reverse order.

::struct::list assign sequence ?varname?...
The subcommand assigns the first n elements of the input sequence to the zero or more variables whose names were listed after the sequence, where n is the number of specified variables.

If there are more variables specified than there are elements in the sequence the empty string will be assigned to the superfluous variables.

If there are more elements in the sequence than variable names specified, the subcommand returns a list containing the unassigned elements. Otherwise an empty list is returned.

 
    tclsh> ::struct::list assign {a b c d e} foo bar
    c d e
    tclsh> set foo
    a
    tclsh> set bar
    b



::struct::list flatten ?-full? ?--? sequence
The subcommand takes a single sequence and returns a new sequence in which one level of nesting has been removed from the input sequence. In other words, the sublists in the input sequence are replaced by their elements.

The subcommand will remove any nesting it finds if the option -full is specified.

 
    tclsh> ::struct::list flatten {1 2 3 {4 5} {6 7} {{8 9}} 10}
    1 2 3 4 5 6 7 {8 9} 10
    tclsh> ::struct::list flatten -full {1 2 3 {4 5} {6 7} {{8 9}} 10}
    1 2 3 4 5 6 7 8 9 10



::struct::list map sequence cmdprefix
The subcommand takes a sequence to operate on and a command prefix (cmdprefix) specifying an operation, applies the command prefix to each element of the sequence and returns a sequence consisting of the results of that application.

The command prefix will be evaluated with a single word appended to it. The evaluation takes place in the context of the caller of the subcommand.

 
    tclsh> # squaring all elements in a list

    tclsh> proc sqr {x} {expr {$x*$x}}
    tclsh> ::struct::list map {1 2 3 4 5} sqr
    1 4 9 16 25

    tclsh> # Retrieving the second column from a matrix
    tclsh> # given as list of lists.

    tclsh> proc projection {n list} {::lindex $list $n}
    tclsh> ::struct::list map {{a b c} {1 2 3} {d f g}} {projection 1}
    b 2 f
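
The evaluation rule described above (one word appended to the command prefix, evaluated in the caller's context) can be sketched in plain Tcl. This is an illustration of the semantics only, not the package's implementation; mapSketch is a hypothetical name, and sqr and projection are the procedures from the example above.

```tcl
# Hypothetical sketch of the map semantics, not the library code.
proc mapSketch {sequence cmdprefix} {
    set result {}
    foreach item $sequence {
        # Append the element as exactly one word, then evaluate the
        # resulting command in the context of the caller.
        lappend result [uplevel 1 [linsert $cmdprefix end $item]]
    }
    return $result
}

proc sqr {x} {expr {$x*$x}}
proc projection {n list} {lindex $list $n}

puts [mapSketch {1 2 3 4 5} sqr]
puts [mapSketch {{a b c} {1 2 3} {d f g}} {projection 1}]
```

Using linsert rather than string concatenation keeps an element that contains spaces, such as {a b c}, together as a single argument.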



::struct::list fold sequence initialvalue cmdprefix
The subcommand takes a sequence to operate on, an arbitrary string initial value and a command prefix (cmdprefix) specifying an operation.

The command prefix will be evaluated with two words appended to it. The second of these words will always be an element of the sequence. The evaluation takes place in the context of the caller of the subcommand.

The subcommand reduces the sequence to a single value through repeated application of the command prefix and returns that value. The reduction proceeds as follows:

1
Application of the command to the initial value and the first element of the list.

2
Application of the command to the result of the last call and the second element of the list.

...
i
Application of the command to the result of the last call and the i'th element of the list.

...
end
Application of the command to the result of the last call and the last element of the list. The result of this call is returned as the result of the subcommand.


 
    tclsh> # summing the elements in a list.
    tclsh> proc + {a b} {expr {$a + $b}}
    tclsh> ::struct::list fold {1 2 3 4 5} 0 +
    15
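
The reduction steps listed above can be sketched in plain Tcl. This illustrates the semantics only and is not the package's implementation; foldSketch and prepend are hypothetical names. The sketch passes the running result as the first appended word and the list element as the second.

```tcl
# Hypothetical sketch of the fold semantics, not the library code.
proc foldSketch {sequence initialvalue cmdprefix} {
    set result $initialvalue
    foreach item $sequence {
        # Two words are appended: the result so far, then the element.
        set result [uplevel 1 [linsert $cmdprefix end $result $item]]
    }
    return $result
}

# Hypothetical operation: put each element in front of the accumulator.
proc prepend {acc item} {linsert $acc 0 $item}

puts [foldSketch {a b c} {} prepend]
```

Folding with prepend over {a b c} and an empty initial value yields c b a, which makes the order of the two appended words visible. For an empty sequence the initial value is returned unchanged.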



::struct::list iota n
The subcommand returns a list containing the integer numbers in the range [0,n). The element at index i of the list contains the number i.

For "n == 0" an empty list will be returned.
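
The described result is equivalent to the following plain Tcl loop. This is a minimal sketch of the semantics, not the package's implementation; iotaSketch is a hypothetical name.

```tcl
# Hypothetical sketch of the iota semantics, not the library code.
proc iotaSketch {n} {
    set result {}
    for {set i 0} {$i < $n} {incr i} {
        # The element at index i is the number i.
        lappend result $i
    }
    return $result
}

puts [iotaSketch 5]
```

The call above prints 0 1 2 3 4, and iotaSketch 0 returns an empty list.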

::struct::list equal a b
The subcommand compares the two lists a and b for equality. In other words, they have to be of the same length and have to contain the same elements in the same order. If an element is a list the same definition of equality applies recursively.

A boolean value will be returned as the result of the command. This value will be true if the two lists are equal, and false otherwise.

::struct::list repeat value size...
The subcommand creates a (nested) list containing the value in all positions. The exact size and degree of nesting is determined by the size arguments, all of which have to be integer numbers greater than or equal to zero.

A single size argument that is a list of more than one element is treated as if each of its elements had been given as a separate size argument.

If only one size argument is present, the returned list is not nested, has length size, and contains value in all positions. If more than one size argument is present, the returned list is nested and has the length specified by the last size argument. Each element of that list is the result of repeat for the same arguments, but with the last size value removed.

An empty list will be returned if no size arguments are present.

 
    tclsh> ::struct::list repeat 0 3 4
    {0 0 0} {0 0 0} {0 0 0} {0 0 0}
    tclsh> ::struct::list repeat 0 {3 4}
    {0 0 0} {0 0 0} {0 0 0} {0 0 0}
    tclsh> ::struct::list repeat 0 {3 4 5}
    {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}}
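
The recursive definition given above can be sketched in plain Tcl. This is an illustration of the described semantics, not the package's implementation; repeatSketch is a hypothetical name, and the sketch assumes Tcl 8.4 or later and does not validate the size arguments.

```tcl
# Hypothetical sketch of the repeat semantics, not the library code.
proc repeatSketch {value args} {
    # A single size argument holding a list is treated as if its
    # elements had been given as separate size arguments.
    if {[llength $args] == 1} {
        set args [lindex $args 0]
    }
    # No size arguments: empty list.
    if {[llength $args] == 0} {
        return {}
    }
    # The last size gives the length of the outermost list ...
    set outer [lindex $args end]
    # ... and the remaining sizes describe each of its elements.
    set inner [lrange $args 0 end-1]
    set result {}
    for {set i 0} {$i < $outer} {incr i} {
        if {[llength $inner] == 0} {
            lappend result $value
        } else {
            lappend result [repeatSketch $value $inner]
        }
    }
    return $result
}

puts [repeatSketch 0 3 4]
puts [llength [repeatSketch 0 {3 4 5}]]
```

The first call prints four sublists of three zeros each; the second prints 5, the length given by the last size argument.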

LONGEST COMMON SUBSEQUENCE AND FILE COMPARISON

The longestCommonSubsequence subcommand forms the core of a flexible system for doing differential comparisons of files, similar to the capability offered by the Unix command diff. While this procedure is quite rapid for many tasks of file comparison, its performance degrades severely if sequence2 contains many equal elements (as, for instance, when using this procedure to compare two files, a quarter of whose lines are blank). This drawback is intrinsic to the algorithm used (see REFERENCES for details).

One approach to dealing with the performance problem that is sometimes effective in practice is arbitrarily to exclude elements that appear more than a certain number of times. This number is provided as the maxOccurs parameter. If frequent lines are excluded in this manner, they will not appear in the common subsequence that is computed; the result will be the longest common subsequence of infrequent elements. The procedure longestCommonSubsequence2 implements this heuristic. It functions as a wrapper around longestCommonSubsequence; it computes the longest common subsequence of infrequent elements, and then subdivides the subsequences that lie between the matches to approximate the true longest common subsequence.
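
The filtering idea behind maxOccurs can be sketched in plain Tcl. This shows only the selection of infrequent elements; how the package applies the limit internally is not shown here, and infrequentIndices is a hypothetical helper that is not part of this package.

```tcl
# Hypothetical helper, not part of struct::list: return the indices of
# those elements that occur no more than maxOccurs times in sequence.
proc infrequentIndices {sequence maxOccurs} {
    # First pass: count the occurrences of each element.
    foreach item $sequence {
        if {[info exists count($item)]} {
            incr count($item)
        } else {
            set count($item) 1
        }
    }
    # Second pass: keep the positions of the infrequent elements.
    set keep {}
    set i 0
    foreach item $sequence {
        if {$count($item) <= $maxOccurs} {
            lappend keep $i
        }
        incr i
    }
    return $keep
}

# A "file" in which every other line is blank.
set lines {{} foo {} bar {} baz {}}
puts [infrequentIndices $lines 2]
```

With a limit of 2, the four blank lines are excluded and only the indices 1 3 5 remain as candidates for matching.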

REFERENCES

J. W. Hunt and M. D. McIlroy, "An algorithm for differential file comparison," Comp. Sci. Tech. Rep. #41, Bell Telephone Laboratories (1976). Available on the Web at the second author's personal site: http://www.cs.dartmouth.edu/~doug/

KEYWORDS

assign, common, comparison, diff, differential, equal, equality, flatten, folding, list, longest common subsequence, map, reduce, repeating, repetition, reverse, subsequence

COPYRIGHT

Copyright © 2003 by Kevin B. Kenny. All rights reserved