Array uniq, without list translation.

marsd · Apr 12, 2002

For one reason or another I always need to uniquely
sort an array, by string value. Using lists is fine, but it gets messy, and doesn't really allow you to become very complex with the stored data.
I know tclx has keyed lists and that 8.4 has
improvements, but has anyone else ever figured out how
to pass an array to a function and return it sorted in
base tcl?

I came up with this, but I feel it is very flawed.
If anyone has a better idea, even with the value
passing from different stack levels or a way to rename
the global array in the calling scope to the original array name that would be great.

proc sort_array {arr {lvl "#0"}} {
global loc_arr
upvar $lvl $arr loc

set orig [subst -novariables $arr]
array set loc_arr {}
set x 0
foreach elem [array names loc] {
set loc_arr([incr x]) $loc($elem)
}

for {set i 1} {$i <= [array size loc_arr]} {incr i} {
for {set m 1} {$m <= [array size loc_arr]} {incr m} {
if {[string compare $loc_arr($i) $loc_arr($m)] == 0} {
set t $loc_arr($i)
set loc_arr($m) "" ; set loc_arr($i) ""
set loc_arr($i) $t
puts "Element $i val to $t."
}
}
}
foreach el [array names loc_arr] {
if {[string length $loc_arr($el)] < 1} {
unset loc_arr($el)
}
}
uplevel $lvl "array unset $orig"
return
}

HAVE A NICE WEEKEND.

ulis · Apr 12, 2002

I'm not sure to understand what you want.
Tcl arrays do not preserve the order in which elements are added.
So I cheated and put a sorted list in the array() element (this with an empty key).
Here are a key_sorting proc and a value_sorting proc (a bit more complex).

Code:

array set array1 {
  one   01
  two   02
  three 03
  four  04
  five  05
  six   06
  seven 07
  eight 08
  nine  09
  ten   10
}

array set array2 {
  eight 08
  five  05
  four  04
  nine  09
  one   01
  seven 07
  six   06
  ten   10
  three 03
  two   02
}

proc key_sort {array_name} {
  upvar 1 $array_name array
  set array() [lsort [array names array]]
}

proc getn {array_name n} {
  upvar 1 $array_name array
  set key [lindex $array() $n]
  return [list $key $array($key)]
}

proc print {array_name} {
  upvar 1 $array_name array
  set max [llength $array()]
  for {set i 0} {$i < $max} { incr i }   {
    puts [join [getn array $i] \t]
  }
}

proc value_sort {array_name} {
  upvar 1 $array_name array
  foreach key [array names array]   { set values($array($key)) $key }
  key_sort values
  set max [llength $values()]
  for {set i 0} {$i < $max} { incr i }   {
    set list [getn values $i]
    lappend array() [lindex $list 1]
  }
}

catch { console show }
# entered in order/unsorted
puts &quot;--------&quot;
puts &quot;unsorted&quot;
puts &quot;--------&quot;
foreach key [array names array1] { puts &quot;$key\t$array1($key)&quot; }
# key sorting
key_sort array1
# printing
puts &quot;-----------&quot;
puts &quot;key sorting&quot;
puts &quot;-----------&quot;
print array1
# value sorting
value_sort array2
# printing
puts &quot;-------------&quot;
puts &quot;value sorting&quot;
puts &quot;-------------&quot;
print array2

ulis

marsd · Apr 13, 2002

There is at least the opportunity to see a better programming style here

But it seems to me there should be a more
direct way of uniquely sorting an array "as an array", ala C, awk, etc..without making it a list first.

Maybe it's just not practical, but it bothers me that
I have to switch from array format to list format to
uniquely sort an array.

With the function I wrote above I succeed in
keeping the array format throughout, but have to
unset the original and "replace" it with the sorted
loc_arr array.
Before sort:

parray bogus
bogus(34) = red
bogus(blu3) = green
bogus(blue) = green
bogus(mauve) = red
bogus(oran4) = blue
bogus(orange) = blue
bogus(pur1) = mauve
bogus(purpl4) = red
bogus(purple) = mauve
bogus(re1) = green
bogus(re2) = blue
bogus(re3) = green
bogus(red) = blue

after sort:

marsd · Apr 13, 2002

Thanks Ulis,
Maybe it's just not practical, but it bothers me that
I have to switch from array format to list format to
uniquely sort an array.

With the function I wrote above I succeed in
keeping the array format throughout, but have to
unset the original and "replace" it with the sorted
loc_arr array.

Before sort:
parray bogus
bogus(34) = red
bogus(blu3) = green
bogus(blue) = green
bogus(mauve) = red
bogus(oran4) = blue
bogus(orange) = blue
bogus(pur1) = mauve
bogus(purpl4) = red
bogus(purple) = mauve
bogus(re1) = green
bogus(re2) = blue
bogus(re3) = green
bogus(red) = blue

after sort:
parray bogus
"bogus" isn't an array

parray loc_arr
loc_arr(1) = green
loc_arr(10) = mauve
loc_arr(3) = blue
loc_arr(8) = red

My question is basically this:
I wonder if it would be possible to "rename" the loc_arr
array from the proc at the callers stack level?
That way the switch would be transparent after the
uplevel $lvl "array unset $orig"

Is this feasible?

ulis · Apr 13, 2002

Sorry but I don't know how to rename a var with Tcl.

ulis

marsd · Apr 13, 2002

Thanks anyway. I'll research it and post it if I
find one.

AviaTraining · Apr 18, 2002

I'm sorry, marsd, but I must be particulatly dense today. I can't for the life of me figure out what you're trying to do.

I guess a lot of it is that I've been steeped in the Tcl way of doing things for so long. In Tcl, an array is inherently an unordered data structure. (Internally, it's implemented as a hash table. It's equivalent to Perl's "hash" and similar to Python's "dictionary.&quot

The best you can do is to assign keys that you can iterate over in order. But the actual data is still stored in "random" order internally. So, when you asked about "a direct way of uniquely sorting an array 'as an array,' ala C, awk, etc...", I'm still not sure what you're asking for. (In fact, I'm really confused by your comparison to awk, becuase awk arrays are associative arrays, just like Tcl's.) And I couldn't even figure out what you were trying to accomplish with the example you gave above. I must definitely need some more coffee this morning...

By the way, a bit of trivia... Did you know that arrays weren't implemented in the original versions of Tcl? They were a feature introduced in the TclX extension, and eventually incorporated into the Tcl core. - Ken Jones, President
Avia Training and Consulting

http://www.avia-training.com

866-TCL-HELP (866-825-4357) US Toll free
415-643-8692 Voice
415-643-8697 Fax

marsd · Apr 18, 2002

"By the way, a bit of trivia... Did you know that arrays weren't implemented in the original versions of Tcl? They were a feature introduced in the TclX extension, and eventually incorporated into the Tcl core"

Answer:
Yep, I knew that one: I was reading Cliff Flynt's
"Tcl/Tk for Real Programmers" the other night.

Subject:
All I wanted avia was a *nix style "uniq" function
that was able to return the "pointed(upvar)" array
variable back to the caller. This is difficult I under
stand from my research.
I did not want to translate the array into a list and
back into an array again to do this: even if the over
head is minimal, it shouldn't be required IMO.

The only way I could figure to do this was load the
pointed array into a local array with ordered numerical
indexes
{
set i 1
foreach el [array names array] {
incr i
set loc_arr($i) $array($el)
}
}

"uniq" the created array
for {set m 1} {$m <= [array size loc_arr]} {incr m} {
for {set i 1} {$i <= [array size loc_arr]} {incr i} {
if {[string compare $loc_arra($m) $loc_arr($i)]} {
unset loc_arr($i)
(or set loc_arr($i) "&quot

}
}
}

This kludge is fine with me since:
" an array is inherently an unordered data structure. (Internally, it's implemented as a hash table. It's equivalent to Perl's "hash" and similar to Python's "dictionary.&quot

"

To end the cycle of kludgery what was necessary is to
1) unset the original array in the calling scope
set orig [subst -novariables $arra]
uplevel #0 "array unset $orig"

2) "return" the sorted replacement array:
global loc_arr

What I would like is to be able to "globally alias"
the loc_arr procedure as the now defunct array parent
within the proc so there actually would be some
transparency to the operation.
I guess I could look at the tcl_api for some of this
but much better "real" programmers than I have
considered this before and it still isn't done so
I assume there are some major difficulties.

I know it is not "practical" to do things this way in
tcl at present, but eventually I think there are going
to be a lot of calls for this type of behavior.

As always thanks for your help, and it is good to see
that you are back with authoritative answers.

AviaTraining · Apr 21, 2002

Welllllll... Let me get on my podium for a while to talk about Tcl arrays. The following comments aren't intended for you personally, marsd. I don't know the overall structure of your application, and so can't comment as to whether lists or arrays (or more complex data structures provided by a Tcl extension) are a more appropriate choice. But I'd like for anyone who happens to be reading this thread to understand why one would choose lists vs. arrays to manage data in a Tcl application.

Let's start by reiterating that all languages have their strengths and weaknesses, and arrays are one of Tcl's weaknesses. But their primary weakness is that they aren't "first class" objects in Tcl -- you can't pass the value of an array around in Tcl like you can strings, lists, etc., but have to resort to upvar tricks to do pass-by-reference. Life would be easier if we could pass the value of an array to a procedure when we wanted.

But the more troublesome issue with Tcl arrays, in my opinion, is the name. When people hear "array," they immediate think of C-style, integer-indexed arrays, and try to use Tcl arrays in the same way. I think that if we had named these data structures "hashes" or "dictionaries," people wouldn't be so quick to fall into that trap. But once people latch onto that word "array," they start using these data structures in situations where a Tcl list is much better suited. But the primary purpose of Tcl arrays is to map arbitrary-string keys to arbitrary-string values.

What's the problem with using a Tcl array as an integer-indexed array? Nothing as such, but it's far less efficient than using a Tcl list. In C (and other languages that use true integer-indexed arrays), the values stored in an array are of a fixed and equal size. (Even in the case of an array of "strings," which turns out to be an array of character pointers to the actual string values.) Thus, when you use "a(327)" in a C program, the application simply has to multiply 327 by the fixed element size and add that to the address of the beginning of the array. You can even use a pointer into the array and increment or decrement the pointer to get to different array elements.

In contrast, every array access in Tcl is actually an access into a hash table. When you use a reference like "a(327)" in Tcl, it has to compute the hash of the string "327", then use that reference into the hash table (and then check to see if there were multiple entries stored at that hash location, and if so, pick out the right one). It's even worse if you're iterating through an array with a for loop, because when you do your incr of your index value, Tcl has to convert the index to a number, but when you then do your array reference, Tcl has to convert the index to a string (because Tcl uses strings for its hash values). Thus, you've got your loop index constantly "shimmering" between an integer and a string representation, adding to the overhead.

On the other hand, Tcl lists are represented internally as simple, C-style arrays of pointers to the element values. Thus, when you say lindex $list 327, Tcl can take the memory pointer to the list structure, add an offset of 327 times the size of an element pointer, and immediately grab the value of the list element.

As a demonstration of the efficiency of a Tcl list vs. a Tcl array, let's do some timing experiments on accessing random elements from the two different structures. I ran the following on my laptop, which is still using Tcl version 8.3.2 (I really should upgrade to 8.3.4):

[tt]% for {set i 0} {$i < 100000} {incr i} {
lappend testlist $i
set testarray($i) $i
}
% [ignore]proc test1 {iter} {
global testlist
for {set i 0} {$i < $iter} {incr i} {
set index [expr { int(rand()*100000) }]
set dummy [/ignore][ignore][lindex $testlist $index][/ignore]
}
}
% [ignore]proc test2 {iter} {
global testarray
for {set i 0} {$i < $iter} {incr i} {
set index [expr { int(rand()*100000) }][/ignore]
set dummy $testarray($index)
}
}
% time {test1 1000000}
5498000 microseconds per iteration
% time {test2 1000000}
6799000 microseconds per iteration[/tt]

As you can see, each array element access was approximately 1.3 microseconds longer than each corresponding list element access.

Okay, enough of this soapbox. Let's look a bit more at your situation, marsd. Once again, Tcl lists are superior for doing Unix-style "uniq" operations. Especially since Tcl version 8.3, when we added the lsort -unique option, which removes duplicates from the sorted result. But let's assume that we need to create a unique procedure that weeds out elements in an array that have duplicate values.

In regards to your question, it is possible for a Tcl procedure to delete, modify, or create variables in the calling scope using upvar. Once you've created your alias with upvar, any action you perform on the alias is also performed on the variable in the calling scope. If you unset an alias, the corresponding variable is deleted. And if you assign a value to an alias when the variable doesn't exist, Tcl creates the variable for you in the calling scope. As an example, let's create a simple procedure called swap that modifies an existing array variable, swapping each array element's key and value:

[tt]% [ignore]proc swap {var} {
upvar 1 $var x
foreach {key value} [array get x] {
set newx($value) $key
}
unset x
array set x [array get newx]
}[/ignore]
% array set test {1 a 2 b 3 c}
% parray test
test(1) = a
test(2) = b
test(3) = c
% swap test
% parray test
test(a) = 1
test(b) = 2
test(c) = 3[/tt]

So, we can apply the same technique to your uniq procedure. But let's see how we can make that more efficient.

Currently, you've got nested for loops doing explicit comparisons of all the array element values. As coded, it's an order N^2 algorithm, which is going to get bogged down with large arrays. I know you don't like array/list trickery, marsd, but consider that every time you use a foreach loop to process an array, you're doing a conversion to a list with the array names command. If you really wanted to maintain the data in array format only, you'd use the array startsearch, array anymore, array nextelement, and array donesearch commands to actually walk the array's hash table. (Don't feel bad. I don't think I've ever seen these array commands used to walk an array in any production code I've seen. It's so much easier to do the foreach/array names trick.) So, let me show you a technique for doing a "uniq" on an array with some minimal list trickery, just to walk the array.

We're going to take advantage of the fact that all array elements must have unique keys. If you try to set the value of an array element and the key already exists, Tcl just overwrites the existing element with the new value. So, I'm going to perform a "swap" of our input array's keys and values. Thus, whenever a value in our original array is repeated, we'll end up overwriting the element in our new, swapped array. This has the effect of eleminating all duplicate values. Then, I'll perform another "swap" of the keys and values so that when I return our new, double-swapped array, the data is in the same format as originally. Here it is:

[tt]% [ignore]proc uniq {arr} {
upvar 1 $arr x
foreach {key value} [array get x] {
set newx($value) $key
}
foreach {key value} [array get newx] {
set newx2($value) $key
}
unset x
array set x [array get newx2]
}[/ignore]
% array set dup {a 1 b 2 c 3 d 1 e 1 f 2 g 3 h 2 i 1}
% uniq dup
% parray dup
dup(a) = 1
dup(b) = 2
dup(c) = 3[/tt]

Of course, it was pure chance that by removing the duplicates, we retained the first three keys. We just as easily could have ended up with "dup(i) = 1" instead of "dup(a) = 1".

And just to demonstrate once more how lists will be more efficient than arrays in a situation like this, lets compare the uniq procedure to the lsort -unique command. In the following example, I perform the same initialization sequence in each timing experiment so that differences in the setup of the runs aren't a factor:

[tt]% [ignore]for {set i 0} {$i < 1000} {incr i} {
set val [expr { int(rand()*100) } ]
lappend origlist $val
set origarray($i) $val
}[/ignore]
% [ignore]time {
set vallist $origlist
array set valarray [array get origarray]
uniq valarray
} 1000[/ignore]
10816 microseconds per iteration
% [ignore]time {
set vallist $origlist
array set valarray [array get origarray]
lsort -unique $vallist
} 1000[/ignore]
6149 microseconds per iteration[/tt]

If we then time our setup code, we can then subtract that from each case to find out how much time was consumed by the actual "unique" operations:

[tt]% [ignore]time {
set vallist $origlist
array set valarray [array get origarray]
} 1000[/ignore]
4677 microseconds per iteration[/tt]

Thus, our array version took 10816-4677=6139 microseconds to do the uniq, whereas our list version took 6149-4677=1472 microseconds.

As to your comments: "As always thanks for your help, and it is good to see that you are back with authoritative answers.

"

My pleasure. It's a good way for me to exercise my Tcl knowledge, and to explore areas that I might not otherwise on my own. Of course, those who actually pay for my knowledge get top priority, which is why I was absent from this forum for a while. And while I enjoy helping people out here, I must say that I need to be "absent with pay" more frequently so that I can make my mortgage. Let's hope that the economy continues to pick up. (And I will happily accept any training or consulting referrals that people might have...) As for the "authoritative answers" part, if I'd known that people were putting that much stock in my opinions, I would have put a bit more thought into my replies to make sure they were correct.

- Ken Jones, President
Avia Training and Consulting

http://www.avia-training.com

866-TCL-HELP (866-825-4357) US Toll free
415-643-8692 Voice
415-643-8697 Fax

marsd · Apr 21, 2002

I've only been coding (on/off) for 2 years avia, so picking your brain is excellent exercise! Tcl was a
third language, picked up because of expect.
I have coded C and studied it's structure more than
others so my prejudice towards arrays is hopefully understandable.
With your info though I see that when using tcl it
is to my advantage to use lists for most operations
when the application is amenable.

As far as the unset goes

Braindeath occurred, I know this. Why it didn't occur to me just unset and recreate is beyond me.
I WAS working with traces earlier in the day so maybe
that is a valid excuse, they are a little trickier to work with inside local scopes.

Thanks Much
MMD

AviaTraining · Apr 21, 2002

Oh, your "prejudice" towards arrays is quite understandable. We all like to use the tools we're most comfortable with. And it's not to say that arrays are the wrong tool for a particular application. But when I hear people complaining that a certain operation is difficult in Tcl, it usually turns out that there's a simpler way of doing it in Tcl that isn't necessarily "obvious" to someone coming in from a different language. So I view my role as often being that of showing people the easier ways to accomplish their goals (rather than hitting them over the head with Brent's Practical Programming in Tcl and Tk book while screaming, "You're doing it wrong!&quot

. Like I said earlier, depending on what you're trying to accomplish in the rest of your application, it might very well be better to use arrays overall, but with the unfortunate side effect of some of your operations (like the "uniq&quot

being more complex than they'd otherwise be.

I'll also be among the first to admit that "$a(327)" is a much more "natural" thing to write than lindex $a 327. It's unfortunate that Tcl doesn't have some shortcut syntax to accomplish the same result. Given the extra verbosity (and decreased clarity), it's no surprise to me that people end up chosing arrays over lists much of the time. The Blt extension does have a nice feature called vectors, that have some of the advantages of list storage with array-like syntax. However, Blt vectors are limited to real number storage only, which limits their utility. There have also been some interesting experiments using variable traces to provide array-like access to list-managed data. But I suspect that the extra overhead of the variable traces would negate any efficiencies provided by the list storage.

Speaking of variable traces, you do have to be careful with them when doing manipulation like the swap and uniq procedures I described above. If you delete a variable (even through its alias, like I did in those procedures), you trigger the variable's unset trace, and all of the traces on it are removed once the unset trace is fired. So, another approach might be to unset all of the array's elements, rather than unsetting the entire array, before setting the new value in the procedure.

And I agree, working with variable traces inside of a procedure is tricky business. It can screw up your thinking for the rest of the week if you're not careful.

Anyway, it's Sunday. Shouldn't we both be doing something else besides Tcl coding? - Ken Jones, President
Avia Training and Consulting

http://www.avia-training.com

866-TCL-HELP (866-825-4357) US Toll free
415-643-8692 Voice
415-643-8697 Fax

Guest_imported · May 7, 2002

Is there any way to return an array from a procedure. As far as I can see 'upvar' only works for passing variables from higher up the stack from calling functions.

I've seen one example which first converts the array to a string then uses CLI_parse to convert it back to an array. Is there a better way?

Thanks in advance.

N Blundell
Lancaster University
England

Guest_imported · May 7, 2002

Returning an array from a procedure,

I managed to sort this one. Just before I gave up I realised that if it was possible to return a list from a procedure then I could do this;

# Here's my function belonging to some object.
SomeObject getArray {

# Convert the array to a list and return it.
return [array get myArray]
}

Then in the calling routine I did this.

# Call the proc to get the array (actually a list)
set arrayList [$obj getArray]

# Convert the list back to an array.
array set myArray $arrayList

I couldn't find an example of this anywhere, hope it helps somebody.

Nick Blundell
Lancaster University
England

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Array uniq, without list translation.

marsd

IS-IT--Management

ulis

Programmer

marsd

IS-IT--Management

marsd

IS-IT--Management

ulis

Programmer

marsd

IS-IT--Management

AviaTraining

Instructor

marsd

IS-IT--Management

AviaTraining

Instructor

marsd

IS-IT--Management

AviaTraining

Instructor

Guest_imported

New member

Guest_imported

New member

Similar threads

Part and Inventory Search

Sponsor