Basic Question
A research participant sees some sequence of dots. Later they are asked to say what order they think the dots appeared in. We want a meaningful way to say how accurate they are.
Setup
The dots are numbered 1, 2, ..., n
The participant does not see the numbers
The dots always appear in numerical order, cycling from dot n back to dot 1
The participant sees several cycles
The participant's response can be encoded as a reordering of the numbers 1, ..., n
Question
Quantify the accuracy of responses
Say what fraction of the n! possible responses are at the same or greater accuracy as a given response
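To make the second point concrete, here is a minimal brute-force sketch in Python (the function names and the convention that lower scores mean higher accuracy are my assumptions; a concrete scoring function is discussed later in these notes):

```python
from itertools import permutations
from math import factorial

def accuracy_fraction(response, score):
    """Fraction of all n! possible responses that score at least as well
    as `response` (taking lower = more accurate), under a caller-supplied
    scoring function `score`."""
    n = len(response)
    s = score(tuple(response))
    as_good = sum(1 for p in permutations(range(1, n + 1)) if score(p) <= s)
    return as_good / factorial(n)
```

Brute force like this is only feasible for small n; the rotation trick noted under "Cyclic rearrangement" below cuts the work by a factor of n.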
Desiderata
Participant responses are scored by a real number
Cyclic rearrangements of a response should have the same score
Responses which have more dots in sequence should have better scores than those which have fewer dots in sequence.
Technicalities
Permutations
A sequence of numbers which has no repeats and includes all numbers in a range from 1 to n is called a permutation. It can be thought of as a rearrangement of the standard sequence (1,2,3,...,n).
Metrics
A metric is a way of assigning distance. It is a function of two variables which returns a number. There are a few other axioms to require, and they ensure that the function gives a meaningful notion of "distance" between things.
Mainly interested in invariant metrics, i.e., metrics d such that d(zx, zy) = d(x, y) for all permutations x, y, z
Implies that the metric is independent of how the dots are numbered
Might imply that the metric is independent of cyclic rearrangement (see below). I need to think about that.
For an invariant metric d, d(x,y) = d(e,x^{-1}y), where e is the identity permutation.
So we just need w(x) = d(e, x); this is called a weight function
We have a number of different ideas for what metrics make sense to use. In fact, we could use all of them, and give some kind of composite score.
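As a concrete check of the reduction from an invariant metric to its weight function, here is a small sketch (representing permutations as tuples with position i holding a[i] is my convention, and Hamming distance is used only as one example of an invariant metric):

```python
def compose(z, x):
    """(z o x)(i) = z(x(i)); permutations are tuples of 1..n."""
    return tuple(z[xi - 1] for xi in x)

def inverse(x):
    """Inverse permutation: maps x(i) back to i."""
    inv = [0] * len(x)
    for pos, val in enumerate(x, start=1):
        inv[val - 1] = pos
    return tuple(inv)

def hamming(x, y):
    """Number of positions where x and y disagree (a left-invariant metric)."""
    return sum(1 for a, b in zip(x, y) if a != b)

def hamming_weight(x):
    """w(x) = d(e, x): positions where x differs from the identity."""
    return sum(1 for pos, val in enumerate(x, start=1) if val != pos)

# Check d(x, y) == w(x^{-1} y) on an example:
x, y = (2, 4, 1, 3), (1, 3, 2, 4)
assert hamming(x, y) == hamming_weight(compose(inverse(x), y))
```

The same reduction works for any invariant metric, so each candidate metric only needs its weight function implemented.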
Cyclic rearrangement
We want to score permutations the same if they differ by a cyclic rearrangement. One easy way to achieve this is, given any permutation, "rotate" it until the first term in the sequence is 1. (Start with 1, and go in the order specified by the sequence. When you get to the end, start back at the beginning until you get to the term before 1.) This can be done uniquely for every permutation.
For example, (6,2,3,1,4,5,7) would rotate to (1,4,5,7,6,2,3).
Note: Suppose n = 4. Consider the permutation given by (2,4,1,3). This would be rotated to (1,3,2,4). In both cases, there is exactly 1 correct transition, from 4 back to 1. We should be sure to count those transitions in our ranking.
Another Note: If we want to do some counting over the set of all permutations, we can speed up by only considering those which are already rotated to start with 1, and multiplying by the number of rotations (n).
Another: Some weight functions are already independent of rotation in this sense. Permutation weight and transition weight, for example.
Another: If a weight function is not independent of rotation, it could happen that a different rotation has a smaller weight. For example, the Hamming weight of (1,7,2,3,4,5,6) is 6, but (7,2,3,4,5,6,1) has Hamming weight 2.
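Here is a minimal sketch of the rotation described at the start of this subsection (the tuple representation and function name are mine):

```python
def rotate_to_start_at_1(a):
    """Cyclically rotate a permutation so that its first entry is 1.

    Read the sequence starting from wherever 1 sits, wrapping around at
    the end; this gives a unique representative of each rotation class.
    """
    k = a.index(1)                      # position of the value 1
    return tuple(a[k:]) + tuple(a[:k])

# Examples from the notes above:
assert rotate_to_start_at_1((6, 2, 3, 1, 4, 5, 7)) == (1, 4, 5, 7, 6, 2, 3)
assert rotate_to_start_at_1((2, 4, 1, 3)) == (1, 3, 2, 4)
```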
Some metrics on permutations
Chris's idea
Here is my understanding of the metric that Chris described. I call it the "transition metric" because it counts how many correct transitions are in the sequence.
Given a permutation a = (a[1], a[2], ..., a[n]), count the number of indices i such that a[i+1] is not equal to a[i] + 1, reading both indices and values cyclically: for i = n take i+1 = 1, and treat the value n + 1 as 1 (so the transition from n to 1 counts as correct).
This is similar in spirit to the Kendall tau below, but not quite the same.
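A sketch of this transition weight as I understand it (indices and values both wrap around, as in the note about (2,4,1,3) above; the function name is mine):

```python
def transition_weight(a):
    """Number of cyclic positions i where a[i+1] is NOT the cyclic successor of a[i].

    Both the index and the value wrap around: position n is followed by
    position 1, and the value n is followed by the value 1.
    """
    n = len(a)
    wrong = 0
    for i in range(n):
        nxt = a[(i + 1) % n]        # index wraps: after position n comes position 1
        if nxt != a[i] % n + 1:     # value wraps: after n comes 1
            wrong += 1
    return wrong

# Examples from the note above: each has exactly one correct transition (4 -> 1),
# so the weight is n - 1 = 3 in both cases.
assert transition_weight((2, 4, 1, 3)) == 3
assert transition_weight((1, 3, 2, 4)) == 3
```

The identity permutation (and any rotation of it) gets weight 0, and rotating a response does not change the weight, consistent with the note above that the transition weight is already rotation-independent.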
Standard metrics on permutations
There are (at least) 3 interesting problem domains where this kind of question comes up. Each one has a couple of standard metrics, and a whole host of modified versions for variants of the basic problems. Since none of these exactly match our setup, I propose that we just start with these.
Comparing different rankings (e.g. my top 10 vs. yours). Each ranking is a permutation of some given list, and metrics quantify the difference between them. Standard metrics here are:
Spearman footrule, a.k.a. Taxicab, a.k.a. ell_1
Kendall tau
Signal processing (Filtering noise, detecting typos)
Hamming distance
Transposition distance, a.k.a. Cayley distance
Text changes (linguistics; DNA mutations in biology)
Ulam distance
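For reference, here are minimal sketches of these standard distances, each written as a weight w(x) = d(e, x) against the identity (my own straightforward implementations, not library code; the Ulam weight uses the usual longest-increasing-subsequence characterization):

```python
from bisect import bisect_left

def spearman_footrule(a):
    """Sum of displacements |a[i] - i| (the ell_1 distance from the identity)."""
    return sum(abs(val - pos) for pos, val in enumerate(a, start=1))

def kendall_tau(a):
    """Number of pairs appearing out of order (inversions relative to the identity)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])

def hamming_weight(a):
    """Number of positions not holding the 'correct' value."""
    return sum(1 for pos, val in enumerate(a, start=1) if val != pos)

def cayley(a):
    """Minimum number of transpositions to sort a: n minus the number of cycles."""
    n = len(a)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = a[i] - 1
    return n - cycles

def ulam(a):
    """n minus the length of the longest increasing subsequence."""
    tails = []
    for val in a:
        k = bisect_left(tails, val)
        if k == len(tails):
            tails.append(val)
        else:
            tails[k] = val
    return len(a) - len(tails)
```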