Basic Question
A research participant sees some sequence of dots. Later they are asked to say what order they think the dots appeared in. We want a meaningful way to say how accurate they are.
Setup
The dots are numbered 1, 2, ..., n
The participant does not see the numbers
The dots always appear in numerical order, cycling from dot n back to dot 1
The participant sees several cycles
The participant's response can be encoded as a reordering of the numbers 1, ..., n
Question
Quantify the accuracy of responses
Say what fraction of the n! possible responses are at the same or greater accuracy as a given response
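To make the second point concrete, here is a minimal brute-force sketch in Python (the function names and the convention that lower scores mean higher accuracy are my assumptions; a concrete scoring function is discussed later in these notes):

```python
from itertools import permutations
from math import factorial

def accuracy_fraction(response, score):
    """Fraction of all n! possible responses that score at least as well
    as `response` (taking lower = more accurate), under a caller-supplied
    scoring function `score`."""
    n = len(response)
    s = score(tuple(response))
    as_good = sum(1 for p in permutations(range(1, n + 1)) if score(p) <= s)
    return as_good / factorial(n)
```

Brute force like this is only feasible for small n; the rotation trick noted under "Cyclic rearrangement" below cuts the work by a factor of n.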
Desiderata
Participant responses are scored by a real number
Cyclic rearrangements of a response should have the same score
Responses which have more dots in sequence should have better scores than those which have fewer dots in sequence.
Technicalities
Permutations
A sequence of numbers which has no repeats and includes all numbers in a range from 1 to n is called a permutation. It can be thought of as a rearrangement of the standard sequence (1,2,3,...,n).
Metrics
A metric is a way of assigning distance. It is a function of two variables which returns a number. There are a few other axioms to require, and they ensure that the function gives a meaningful notion of "distance" between things.
Mainly interested in invariant metrics, i.e., metrics d such that d(zx, zy) = d(x, y) for all permutations x, y, z
Implies that the metric is independent of how the dots are numbered
Might imply that the metric is independent of cyclic rearrangement (see below). I need to think about that.
For an invariant metric d, d(x,y) = d(e,x^{-1}y), where e is the identity permutation.
So we just need w(x) = d(e, x); this is called a weight function
We have a number of different ideas for what metrics make sense to use. In fact, we could use all of them, and give some kind of composite score.
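As a concrete check of the reduction from an invariant metric to its weight function, here is a small sketch (representing permutations as tuples with position i holding a[i] is my convention, and Hamming distance is used only as one example of an invariant metric):

```python
def compose(z, x):
    """(z o x)(i) = z(x(i)); permutations are tuples of 1..n."""
    return tuple(z[xi - 1] for xi in x)

def inverse(x):
    """Inverse permutation: maps x(i) back to i."""
    inv = [0] * len(x)
    for pos, val in enumerate(x, start=1):
        inv[val - 1] = pos
    return tuple(inv)

def hamming(x, y):
    """Number of positions where x and y disagree (a left-invariant metric)."""
    return sum(1 for a, b in zip(x, y) if a != b)

def hamming_weight(x):
    """w(x) = d(e, x): positions where x differs from the identity."""
    return sum(1 for pos, val in enumerate(x, start=1) if val != pos)

# Check d(x, y) == w(x^{-1} y) on an example:
x, y = (2, 4, 1, 3), (1, 3, 2, 4)
assert hamming(x, y) == hamming_weight(compose(inverse(x), y))
```

The same reduction works for any invariant metric, so each candidate metric only needs its weight function implemented.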
Cyclic rearrangement
We want to score permutations the same if they differ by a cyclic rearrangement. One easy way to achieve this is, given any permutation, "rotate" it until the first term in the sequence is 1. (Start with 1, and go in the order specified by the sequence. When you get to the end, start back at the beginning until you get to the term before 1.) This can be done uniquely for every permutation.
For example, (6,2,3,1,4,5,7) would rotate to (1,4,5,7,6,2,3).
Note: Suppose n = 4. Consider the permutation given by (2,4,1,3). This would be rotated to (1,3,2,4). In both cases, there is exactly 1 correct transition, from 4 back to 1. We should be sure to count those transitions in our ranking.
Another Note: If we want to do some counting over the set of all permutations, we can speed up by only considering those which are already rotated to start with 1, and multiplying by the number of rotations (n).
Another: Some weight functions are already independent of rotation in this sense. Permutation weight and transition weight, for example.
Another: If a weight function is not independent of rotation, it could happen that a different rotation has a smaller weight. For example, the Hamming weight of (1,7,2,3,4,5,6) is 6, but (7,2,3,4,5,6,1) has Hamming weight 2.
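Here is a minimal sketch of the rotation described at the start of this subsection (the tuple representation and function name are mine):

```python
def rotate_to_start_at_1(a):
    """Cyclically rotate a permutation so that its first entry is 1.

    Read the sequence starting from wherever 1 sits, wrapping around at
    the end; this gives a unique representative of each rotation class.
    """
    k = a.index(1)                      # position of the value 1
    return tuple(a[k:]) + tuple(a[:k])

# Examples from the notes above:
assert rotate_to_start_at_1((6, 2, 3, 1, 4, 5, 7)) == (1, 4, 5, 7, 6, 2, 3)
assert rotate_to_start_at_1((2, 4, 1, 3)) == (1, 3, 2, 4)
```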
Some metrics on permutations
Chris's idea
Here is my understanding of the metric that Chris described. I call it the "transition metric" because it counts how many correct transitions are in the sequence.
Given a permutation a = (a[1], a[2], ..., a[n]), count the number of indices i such that a[i+1] is not equal to a[i] + 1, reading both indices and values cyclically: for i = n take i+1 = 1, and treat the value n + 1 as 1 (so the transition from n to 1 counts as correct).
This is similar in spirit to the Kendall tau below, but not quite the same.
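A sketch of this transition weight as I understand it (indices and values both wrap around, as in the note about (2,4,1,3) above; the function name is mine):

```python
def transition_weight(a):
    """Number of cyclic positions i where a[i+1] is NOT the cyclic successor of a[i].

    Both the index and the value wrap around: position n is followed by
    position 1, and the value n is followed by the value 1.
    """
    n = len(a)
    wrong = 0
    for i in range(n):
        nxt = a[(i + 1) % n]        # index wraps: after position n comes position 1
        if nxt != a[i] % n + 1:     # value wraps: after n comes 1
            wrong += 1
    return wrong

# Examples from the note above: each has exactly one correct transition (4 -> 1),
# so the weight is n - 1 = 3 in both cases.
assert transition_weight((2, 4, 1, 3)) == 3
assert transition_weight((1, 3, 2, 4)) == 3
```

The identity permutation (and any rotation of it) gets weight 0, and rotating a response does not change the weight, consistent with the note above that the transition weight is already rotation-independent.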
Standard metrics on permutations
There are (at least) 3 interesting problem domains where this kind of question comes up. Each one has a couple of standard metrics, and a whole host of modified versions for variants of the basic problems. Since none of these exactly match our setup, I propose that we just start with these.
Comparing different rankings (e.g. my top 10 vs. yours). Each ranking is a permutation of some given list, and metrics quantify the difference between them. Standard metrics here are:
Spearman footrule, a.k.a. Taxicab, a.k.a. ell_1
Kendall tau
Signal processing (Filtering noise, detecting typos)
Hamming distance
Transposition distance, a.k.a. Cayley distance
Text changes (linguistics; DNA mutations in biology)
Ulam distance
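For reference, here are minimal sketches of these standard distances, each written as a weight w(x) = d(e, x) against the identity (my own straightforward implementations, not library code; the Ulam weight uses the usual longest-increasing-subsequence characterization):

```python
from bisect import bisect_left

def spearman_footrule(a):
    """Sum of displacements |a[i] - i| (the ell_1 distance from the identity)."""
    return sum(abs(val - pos) for pos, val in enumerate(a, start=1))

def kendall_tau(a):
    """Number of pairs appearing out of order (inversions relative to the identity)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])

def hamming_weight(a):
    """Number of positions not holding the 'correct' value."""
    return sum(1 for pos, val in enumerate(a, start=1) if val != pos)

def cayley(a):
    """Minimum number of transpositions to sort a: n minus the number of cycles."""
    n = len(a)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = a[i] - 1
    return n - cycles

def ulam(a):
    """n minus the length of the longest increasing subsequence."""
    tails = []
    for val in a:
        k = bisect_left(tails, val)
        if k == len(tails):
            tails.append(val)
        else:
            tails[k] = val
    return len(a) - len(tails)
```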