« Paper of the Week | Main | Reading Cluto Files in Matlab (Part 1) »

Sorting two arrays simultaneously

Suppose, for the sake of this article, that you have two arrays in C++ with the same number of element.  For example,

int a1[] = {5, 8, 9, 1, 4, 3, 2};
double v[] = {3., 4., 5., 2., 1., 9., 8.};

Now, for reasons that perhaps only I care about.  I want to sort the array a1, and permute v in according to the same permutation. 

// a1 = {1,  2,  3,  4,  5,  8,  9}
// v =  {2., 8., 9., 1., 3., 4., 5.,}

(For those who care and who know what I'm talking about.  I have a compressed sparse row matrix represented in the AIJ format and I want to sort the element of each row in increasing order so I can do O(log n) binary search to determine if an element exists.  However, as always, the problem is more generic than the instance I care to solve.)

To wit, I wish to sort one array and "take the other" along for a ride.

This problem has a few trivial solutions:

1.  Sort the array a1 implicitly by sorting a permutation vector that indexes into a1.  Then permute a1, and v by this array.

2.  Write a custom sorting routine that does the operation for this special case.

3.  Try to shoehorn this problem into an existing sorting array.

In terms of performance, the fastest solution is probably 2.  The second fastest is probably 1.  Finally, 3. is likely the slowest.

Solution 3, however, has two huge advantages.  First, I don't have to write my own sorting routine.  This is important as writing a general purpose sort is somewhat non-trivial.  Also, it is fairly likely that the input to the sort will be nearly sorted so I can't use a general purpose quicksort routine which has O(n^2) performance on a sorted array.  The second advtange is that it does not require extra memory as solution 1 does.  In fact, solution 1 requires quite a bit of extra memory.  While it is a tractable amount, it is nonetheless superfluous. 

Thus, I decided to look at solution 3.  Between STL and Boost, there are quite robust and generic C++ sorting libraries.  How hard could this be?  Maybe 20, 30 minutes of work?

Quickly, I realized how such thoughts were hopelessly naive. 

In a nutshell, the requirements are:

1.  Use the C++ STL sorting routine (which has good O(n log n) worst case performance).

2.  Do not create extra memory for the sort.

Thankfully, I did not impose any sort of "performance" requirement on myself.  In fact, many of the arrays will be rather small; so performance for large arrays is not quite so important. 

First, C++ STL sort does not work on boost's zip_iterators, which would have been the natural solution. In fact, there seem to be a number of debates on this matter; and more generally on the requirements of iterators, zip_iterators, and the STL Sort function.

http://user.it.uu.se/~krister/Research/iterators.pdf
http://lists.boost.org/Archives/boost/2004/07/68758.php
http://web.onetel.com/~anthony_w/cplusplus/pair_iterators.pdf
http://cplusplus.anthonyw.cjb.net/

Second, the fundamental problem is that "pairs" of array references do not behave like they should for things to work nicely.  That is, there are no clean generalizations of a set of pointers.  The best that exists is the boost::tuple class with a set of reference.  However, that class fails because references and values are quite strange and do not behave quite like you think.   

Instead, I simply decided to abuse the notation of an iterator and write something that works.

This involved writing, effectively, a non-conforming iterator where the reference of the value type is not the same as the reference type.

Here is the code for anyone who cares.  (This is a slightly modified version from my code, so it may not compiler, but the fixes should be trivial.)

template <class SortIter, class PermuteIter>
struct sort_permute_iter_helper_type
{
    typedef boost::tuple<
        typename std::iterator_traits<SortIter>::value_type,
        typename std::iterator_traits<PermuteIter>::value_type >
        value_type;

    typedef boost::tuple<
        typename std::iterator_traits<SortIter>::value_type&,
        typename std::iterator_traits<PermuteIter>::value_type& >
        ref_type;
};
 
template <class SortIter, class PermuteIter>
class sort_permute_iter
    : public boost::iterator_facade<
        sort_permute_iter<SortIter, PermuteIter>,
        typename sort_permute_iter_helper_type<
           SortIter, PermuteIter>::value_type,
        std::random_access_iterator_tag,
        typename sort_permute_iter_helper_type<
            SortIter, PermuteIter>::ref_type,
        typename std::iterator_traits<SortIter>::difference_type
    >
{
public:
    sort_permute_iter()
    {}

    sort_permute_iter(SortIter ci, PermuteIter vi)
        : _ci(ci), _vi(vi)
    {
    }

    SortIter _ci;
    PermuteIter _vi;


private:
    friend class boost::iterator_core_access;

    void increment()
    {
        ++_ci; ++_vi;
    }

    void decrement()
    {
        --_ci; --_vi;
    }

    bool equal(sort_permute_iter const& other) const
    {
        return (_ci == other._ci);
    }

    typename
        sort_permute_iter_helper_type<
            SortIter, PermuteIter>::ref_type dereference() const
    {
        return (typename
            sort_permute_iter_helper_type<
                ColIter, PermuteIter>::ref_type(*_ci, *_vi));
    }

    void advance(difference_type n)
    {
        _ci += n;
        _vi += n;
    }

    difference_type distance_to(sort_permute_iter const& other) const
    {
        return ( other._ci - _ci);
    }
};


template <class SortIter, class PermuteIter>
struct sort_permute_iter_compare
    hehe: public std::binary_function<
    typename sort_permute_iter_helper_type<
        SortIter, PermuteIter>::value_type,
    typename sort_permute_iter_helper_type<
        SortIter, PermuteIter>::value_type,
    bool>
{
    typedef
        typename sort_permute_iter_helper_type<
            SortIter, PermuteIter>::value_type T;
    bool operator()(const  T& t1, const T& t2)
    {
        return (boost::get<0>(t1) < boost::get<0>(t2));
    }
};

template <class SortIter, class PermuteIter>
sort_permute_iter<SortIter, PermuteIter>
make_sort_permute_iter(SortIter ci, PermuteIter vi)
{
    return sort_permute_iter<SortIter, PermuteIter>(ci, vi);
};

Finally, then, we can write the following sort routine for our two arrays and get the correct result:

std::sort(make_sort_permute_iter(&a1[0],&v[0]),
          make_sort_permute_iter(&a1[0]+7,&v[0]+7),
          sort_permute_iter_compare());

Wow.  That was a lot of work.  I hope someone finds it useful.

TrackBack

TrackBack URL for this entry:
https://www.stanford.edu/~dgleich/cgi-bin/mt/mt-tb.cgi/16

Comments (3)

Andreas Fabri:

It's rather disappointing that the STL algorithms don't work with the zip_iterator.

Did you check your iterators with other STL algorithms than std::sort? For example std::random_shuffle (which needs a specialization of iter_swap for the zip_iterator in order to work correctly).

andreas

I didn't check this with anything but sort. That's was the only one I needed to work :-).

eley:

thank you,it is so useful

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

About

This page contains a single entry from the blog posted on March 24, 2006 9:27 PM.

The previous post in this blog was Paper of the Week.

The next post in this blog is Reading Cluto Files in Matlab (Part 1).

Many more can be found on the main index page or by looking through the archives.

Powered by
Movable Type 3.34