Beamer makes it somewhat difficult to accommodate this system if you have backup slides in your presentation. You know, those things you are supposed to have prepared in case you get one of the "hard" questions? Anyway, these slides aren't part of the standard presentation, so they shouldn't count toward the total slide number --- but they need to be there in your presentation file.

I had previously faced this issue and realized that you can just store and set the frame numbers in beamer at arbitrary points in the presentation.

% All your regular slides
% After your last numbered slide
\appendix
\newcounter{finalframe}
\setcounter{finalframe}{\value{framenumber}}
% Backup frames
\setcounter{framenumber}{\value{finalframe}}
\end{document}

However, Andrew wanted to go beyond just skipping backup slides and "uncount" the outline slides that appear at the beginning of a section, as well as the title slide. He proposed using the trick above with the following adjustments.

1. Add \addtocounter{framenumber}{-1} to the \AtBeginSubsection frame.
2. Add \setcounter{framenumber}{0} or \setcounter{framenumber}{1} to the \titlepage frame and/or the \tableofcontents frame, depending on taste.
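A minimal sketch of these adjustments (the frame contents here are placeholders; only the counter commands come from the trick above):

```latex
\AtBeginSubsection[]{%
  \begin{frame}{Outline}
    \tableofcontents[currentsection,currentsubsection]
  \end{frame}%
  \addtocounter{framenumber}{-1}% uncount the outline frame
}

% Title frame, uncounted by resetting the counter afterward
\begin{frame}[plain]
  \titlepage
\end{frame}
\setcounter{framenumber}{0}
```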

There you have it, all the tricks about frame counters in beamer that we know!

I suspect there is some way to accomplish the "backup" slides with a modification of the \appendix command and the \AtEndDocument directive. My brief attempt at making these changes ended in failure and frustration. Failure because it didn't work. Frustration because I forgot to write down that it didn't work and didn't remember it the next time I wanted to use this feature. (In fact, I really wish that beamer modified the \appendix command to implement precisely this feature. What else would an appendix do in a presentation?)

This question is easy to answer with a bit of Googling. A Dartmouth professor has precisely the required data --- though only for a single year.

http://www.dartmouth.edu/~chance/teaching_aids/data/birthday.txt

I used R to make a quick display of the data.

Cutting and pasting this into R produces the following output for me.

No, that isn't a data problem. There really are two groups of birthdays.

While looking for the overall date of birth data, I discovered another file from the CDC that explains the effect.

http://www.cdc.gov/nchs/data/statab/t941x16.pdf

The data in that file show many fewer births on weekends compared with weekdays. This effect is precisely what we see in the plot, which R helps us validate.

This analysis was good enough for my own personal edification. There is still a bit of work left to make these claims statistically valid, but that isn't my point here.

*Update*: It turns out Live Writer decided to change the quote setting and insert gnarly "curly" quotes instead of the standard straight quotes. Ick, now I have to fix all these ugly quotes. I'm not sure what isn't working now.

The paper tackles one problem that irritates me --- aggregating online ratings --- and uses a solution I've previously considered and wanted to investigate --- profiling the raters. (It's always nice to discover you don't have to do ALL the work yourself.)

The problem is illustrated by the following snapshot from Amazon.

With only three ratings, no product merits a 5-star rating. To Amazon's credit, they clearly note there are only three ratings aggregated into a single 5-star rating. They also show a histogram of the individual ratings elsewhere on the product page.

A simple fix would be to apply well-known statistical confidence intervals to the ratings and use the lower bound. The most naive fix would be to add a pseudo-count for each possible rating: with three 5-star ratings, the score becomes (5+5+5+1+2+3+4+5)/8 = 3.75. Evan Miller proposes a more sophisticated technique based on Wilson's score in a recent blog post.

Ho and Quinn do something different. They propose a model that incorporates the raters' behavior on the site, so that not all ratings are treated equally. The types of behavior captured are

- Uncritical --- easy to please,
- Non-discriminating --- useless, and
- Discriminating --- a "critic."

Go read the paper for the details of the model. The punch line is that the aggregate rating depends on *all* ratings submitted to the site.

After looking at lots of rating data from Netflix, LAUNCHcast, and even the smaller datasets from HelloMovies, incorporating this information seems critical to generate useful aggregate ratings.

One issue with their technique is the lack of transparency. As a user, I would *really* have to trust your site to take the ratings seriously. The Amazon approach with the histogram allows me to evaluate the data myself, though I lack the critical context that the Ho/Quinn ratings provide. A second issue is computation. I did not fully check their paper, but I don't think the model is trivial to fit, nor does it look like fitting could be done in real time.

These are just theoretical issues. They have an R package: Ratings, so go give it a try.

# Download Clp-1.9.0.tgz
wget http://www.coin-or.org/download/source/Clp/Clp-1.9.0.tgz
# untar
tar xzvf Clp-1.9.0.tgz
# configure and make
cd Clp-1.9.0
./configure
make
make install
# get mexclp
wget http://control.ee.ethz.ch/~joloef/mexclp.zip
unzip mexclp.zip
# compile in matlab
matlab -nojvm -nodesktop -nodisplay
ldf=['LDFLAGS="\$LDFLAGS -Wl,--rpath -Wl,' pwd '/lib"'];
mex(ldf,'-Iinclude/coin', '-Llib', '-lClp', '-lCoinUtils', 'mexclp.cpp')

After compiling (hopefully successfully!)

>> clp([],[1 2 1],[1 1 1],-1,[],[])
ans = 0 -1 0

For Matlab 2008b on Ubuntu 8.10, I found it necessary to install g++-4.1, set up mex with g++-4.1, and run

./configure CXX=g++-4.1
make
make install

before everything would work.

The arev package is a good compromise: it has a nice I glyph. Continue reading for some screenshots.

Going forward, my goal is a post a week.

So don't bother with kqemu under Ubuntu 8.04 --- just use kvm.

I believe in making presentations fun and wanted to use the Google-colored version of the term. After some googling, I found the RGB colors and painstakingly converted them to floating-point values to use with the LaTeX xcolor package. To make my life easier, I encoded everything into the following command.

\newcommand{\Google}{{%
\color[rgb]{0.2000,0.3922,0.7647}G%
\color[rgb]{0.9529,0.0980,0.0118}o%
\color[rgb]{0.9686,0.8431,0.1529}o%
\color[rgb]{0.2000,0.3922,0.7647}g%
\color[rgb]{0.2667,0.7686,0.0235}l%
\color[rgb]{0.9529,0.0980,0.0118}e%
}}

Later, I learned that I could have used the integer RGB values with the color package directly. Oh well --- it would have saved a little time.

Personally, I don't have a 64-bit installation of a Windows environment (XP, Server, or Vista), which always made testing a little difficult. A while back, the ICME sysadmin (a really huge help!) set up a machine with a 64-bit copy of XP to test for one person. I got the code compiled and everything tested. (Unfortunately, I couldn't reproduce the problem he identified.)

In theory, I thought it would work quite simply. I compile a 64-bit libmbgl, and they compile the .mexw64 files on their system.

Life is never that simple.

Today, let’s see a Matlab solution to reading and writing CLUTO data files. CLUTO is a clustering toolkit by George Karypis at U of Minn. There are FOUR possible input files CLUTO might see.

- Dense Graph
- Sparse Graph
- Dense Matrix
- Sparse Matrix

The difference between the dense and sparse files is simply a matter of header information. Let’s describe a few CLUTO files.

Suppose we have a 5-node line graph:

v1 <-> v2 <-> v3 <-> v4 <-> v5

In dense graph format, this graph is

5
0 1 0 0 0
1 0 1 0 0
0 1 0 1 0
0 0 1 0 1
0 0 0 1 0

In sparse graph format, this graph is

5 8
2 1
1 1 3 1
2 1 4 1
3 1 5 1
4 1

To wit, the dense graph format is merely an explicit specification of the graph's adjacency matrix, preceded by a single line giving the number of vertices. The sparse graph format is a sparse adjacency representation. It is somewhat strange in that it uses 1-based column indices and implicit rows.

More formally, the sparse adjacency structure has 1 line of header information:

<number of vertices> <number of edges*2>

and the *i+1*th line of the file contains

adj_1 weight_1 adj_2 weight_2 … adj_d weight_d

where adj_j is the *j*th adjacent vertex, weight_j is the weight of that edge, and *d* is the degree of the *i*th vertex. The input must be symmetric.

The sparse and dense matrix file formats are similar. The difference is that the matrices involved need not be square, which changes the header.

- dense matrix header:

<number of rows> <number of columns>

- sparse matrix header:

<number of rows> <number of columns> <number of nonzeros>

The dense matrix format is just a row-by-row listing of the elements of the matrix. The sparse matrix format is a sparse row-by-row listing. In the sparse matrix format, the *i+1*th line of the file contains information about the non-zeros in row *i*.

column_1 value_1 column_2 value_2 … column_d value_d

Coming next, we’ll see how to read these files in Matlab using a combination of mex files and scripts.

int a1[] = {5, 8, 9, 1, 4, 3, 2};
double v[] = {3., 4., 5., 2., 1., 9., 8.};

Now, for reasons that perhaps only I care about, I want to sort the array a1 and permute v according to the same permutation.

// a1 = {1, 2, 3, 4, 5, 8, 9}
// v  = {2., 8., 9., 1., 3., 4., 5.}

(For those who care and who know what I'm talking about: I have a compressed sparse row matrix represented in the AIJ format, and I want to sort the elements of each row in increasing order so I can do an O(log n) binary search to determine if an element exists. However, as always, the problem is more generic than the instance I care to solve.)

To wit, I wish to sort one array and "take the other" along for a ride.

This problem has a few trivial solutions:

1. Sort the array a1 implicitly by sorting a permutation vector that indexes into a1. Then permute a1 and v by this vector.

2. Write a custom sorting routine that does the operation for this special case.

3. Try to shoehorn this problem into an existing sorting routine.

In terms of performance, the fastest solution is probably 2, the second fastest is probably 1, and 3 is likely the slowest.

Solution 3, however, has two huge advantages. First, I don't have to write my own sorting routine. This is important, as writing a general-purpose sort is somewhat non-trivial. Also, it is fairly likely that the input to the sort will be nearly sorted, so I can't use a naive quicksort routine, which has O(n^2) performance on a sorted array. The second advantage is that it does not require extra memory as solution 1 does. In fact, solution 1 requires quite a bit of extra memory. While it is a tractable amount, it is nonetheless superfluous.

Thus, I decided to look at solution 3. Between STL and Boost, there are quite robust and generic C++ sorting libraries. How hard could this be? Maybe 20, 30 minutes of work?

Quickly, I realized how hopelessly naive such thoughts were.

In a nutshell, the requirements are:

1. Use the C++ STL sorting routine (which has good O(n log n) worst case performance).

2. Do not allocate extra memory for the sort.

Thankfully, I did not impose any sort of "performance" requirement on myself. In fact, many of the arrays will be rather small; so performance for large arrays is not quite so important.

First, C++ STL sort does not work on Boost's zip_iterators, which would have been the natural solution. In fact, there seem to be a number of debates on this matter, and more generally on the requirements of iterators, zip_iterators, and the STL sort function.

http://user.it.uu.se/~krister/Research/iterators.pdf

http://lists.boost.org/Archives/boost/2004/07/68758.php

http://web.onetel.com/~anthony_w/cplusplus/pair_iterators.pdf

http://cplusplus.anthonyw.cjb.net/

Second, the fundamental problem is that "pairs" of array references do not behave as they should for things to work nicely. That is, there is no clean generalization of a pair of pointers. The best that exists is the boost::tuple class with a set of references. However, that class fails because references and values are quite strange and do not behave quite like you think.

Instead, I simply decided to abuse the notion of an iterator and write something that works.

This involved writing, effectively, a non-conforming iterator where the reference of the value type is not the same as the reference type.

Here is the code for anyone who cares. (This is a slightly modified version of my code, so it may not compile, but the fixes should be trivial.)

#include <functional>
#include <iterator>
#include <boost/iterator/iterator_facade.hpp>
#include <boost/tuple/tuple.hpp>

template <class SortIter, class PermuteIter>
struct sort_permute_iter_helper_type
{
    typedef boost::tuple<
        typename std::iterator_traits<SortIter>::value_type,
        typename std::iterator_traits<PermuteIter>::value_type >
    value_type;

    typedef boost::tuple<
        typename std::iterator_traits<SortIter>::value_type&,
        typename std::iterator_traits<PermuteIter>::value_type& >
    ref_type;
};

template <class SortIter, class PermuteIter>
class sort_permute_iter
    : public boost::iterator_facade<
          sort_permute_iter<SortIter, PermuteIter>,
          typename sort_permute_iter_helper_type<
              SortIter, PermuteIter>::value_type,
          std::random_access_iterator_tag,
          typename sort_permute_iter_helper_type<
              SortIter, PermuteIter>::ref_type,
          typename std::iterator_traits<SortIter>::difference_type >
{
public:
    // make the difference_type visible for advance/distance_to
    typedef typename std::iterator_traits<SortIter>::difference_type
        difference_type;

    sort_permute_iter()
    {}

    sort_permute_iter(SortIter ci, PermuteIter vi)
        : _ci(ci), _vi(vi)
    {}

    SortIter _ci;
    PermuteIter _vi;

private:
    friend class boost::iterator_core_access;

    void increment()
    {
        ++_ci; ++_vi;
    }

    void decrement()
    {
        --_ci; --_vi;
    }

    bool equal(sort_permute_iter const& other) const
    {
        return _ci == other._ci;
    }

    typename sort_permute_iter_helper_type<
        SortIter, PermuteIter>::ref_type
    dereference() const
    {
        return typename sort_permute_iter_helper_type<
            SortIter, PermuteIter>::ref_type(*_ci, *_vi);
    }

    void advance(difference_type n)
    {
        _ci += n;
        _vi += n;
    }

    difference_type distance_to(sort_permute_iter const& other) const
    {
        return other._ci - _ci;
    }
};

template <class SortIter, class PermuteIter>
struct sort_permute_iter_compare
    : public std::binary_function<
          typename sort_permute_iter_helper_type<
              SortIter, PermuteIter>::value_type,
          typename sort_permute_iter_helper_type<
              SortIter, PermuteIter>::value_type,
          bool >
{
    typedef typename sort_permute_iter_helper_type<
        SortIter, PermuteIter>::value_type T;

    bool operator()(const T& t1, const T& t2) const
    {
        return boost::get<0>(t1) < boost::get<0>(t2);
    }
};

template <class SortIter, class PermuteIter>
sort_permute_iter<SortIter, PermuteIter>
make_sort_permute_iter(SortIter ci, PermuteIter vi)
{
    return sort_permute_iter<SortIter, PermuteIter>(ci, vi);
}

Finally, then, we can write the following sort routine for our two arrays and get the correct result:

std::sort(make_sort_permute_iter(&a1[0], &v[0]),
          make_sort_permute_iter(&a1[0] + 7, &v[0] + 7),
          sort_permute_iter_compare<int*, double*>());

Wow. That was a lot of work. I hope someone finds it useful.

This paper addresses an important aspect of quantum computing. The authors show that designing a quantum circuit is equivalent to minimizing the difference between two linear operators. Further, the "cost" of the circuit (in number of gates) is proportional to the minimum geodesic (shortest path) distance between the trivial operator and the desired quantum operator.

Intriguingly, the authors state that we can use their results to find the initial point from which to begin a search for the quantum circuit; but (and this is an important but) we do not know the initial velocity along the shortest path. In all likelihood, this problem will be resolved in the future. Nevertheless, the statement harks back to the uncertainty principle. Suppose a similar result is true for these Riemannian spaces: we cannot know both the initial search position and the initial velocity. Again, probably my naïveté.

Still fantastically cool... enjoy reading!
