The Algorithm Design Manual (3rd) Exercise Solution 'E4-37'

Table of Contents

Introduction
Repository
Exercise 4-37
- Question
- Solution
  - Subproblem (a)
  - Subproblem (b)
    - Solution1
    - Solution2

Introduction

This series is my solutions for algorithm exercises in'The Algorithm Design Manual' 3rd edition.
It's my solutions, not model answers, so if you find some mistakes or more sophisticated solutions, please post in comment area below.

Repository

Here

Exercise 4-37

Question

Consider the problem of sorting a sequence of n 0’s and 1’s using comparisons. For each comparison of two values x and y, the algorithm learns which of $x<y$ , $x = y$ , or $x>y$ holds.

(a) Give an algorithm to sort in $n − 1$ comparisons in the worst case. Show that your algorithm is optimal.
(b) Give an algorithm to sort in $\frac{2}{3}n$ comparisons in the average case (assuming each of the n inputs is 0 or 1 with equal probability). Show that your algorithm is optimal.

Solution

Subproblem (a)

We partition the sequence for index range $[0, n-2]$ , comparing each element with 0. If it is equal to zero, move to left.

During the comparisons, we keep the position where at first '1' appears in the sequence. Let the position be pos-f1.

After the comparisons, swap the element in the last and that in the pos-f1. Swapping requires no comparison.

So, the number of comparisons is $n-1$ .

Subproblem (b)

There are two solutions as long as I came to mind.

Solution1

Outline

In the first traversal, we compare each pair of two adjacent bits in the sequence.
After the traversal, if there are some pairs that the two bits are not equal, we move the larger one to '1's group, and the smaller one to '0's group.

Then we run the same traversal to remaining elements, but this time we treat each pair of the first traversal as one bit, because they have the same bit value.

We continue this until there are no bit or just one bit. If there are one bit (actually, this may mass of bits of the same bit value), we compare that with '0', and according the result we move the bit either to '0's group or '1's.

Finally, we concatenate '0's group and '1's group.

Cost Analysis

Since each of the n inputs is 0 or 1 with equal probability, the number of the pairs that the two bits are not equal is the same as the remaining pairs. And before the next traversal, the remaining pairs are transformed to one bit. So in one traversal, the total number is reduced to $\frac{1}{4}$ .

Thus, the total number of comparisons is represented as the geometric series as following:

\begin{align} &\frac{1}{2}n+\frac{1}{2}n*\frac{1}{4}+\frac{1}{2}n*(\frac{1}{4})^2+...\\ =&\frac{1}{2}n\sum_{i=1}^{\lceil\lg{n}\rceil}(\frac{1}{4})^{i-1}\\ \simeq&\frac{2}{3}n \quad \text{(by sum of geometric series)} \end{align}

Code

#include <algorithm>
#include <cassert>
#include <iostream>
#include <vector>
using namespace std;

void sort_seq(vector<int>& v, int& cnt_comparison) {
    cnt_comparison = 0;
    int cnt_0{0};
    int cnt_1{0};

    // repeat while there are at least 2 elements to compare
    size_t ir_open{v.size()};
    int weight{1};
    for (; 1 < ir_open; weight <<= 1) {
        size_t new_iro{0};
        // traverse and do pairwise comparisons
        //// among the comparison, construct the new array for the next
        //// traversal
        for (size_t i = 0; i < ir_open - 1; i += 2) {
            ++cnt_comparison;
            if (v[i] == v[i + 1]) {
                v[new_iro++] = v[i];
            } else {
                cnt_0 += weight;
                cnt_1 += weight;
            }
        }
        if (ir_open & 1) {
            ++cnt_comparison;
            if (!v[ir_open - 1])
                cnt_0 += weight;
            else
                cnt_1 += weight;
        }
        ir_open = new_iro;
    }
    if (ir_open == 1) {
        ++cnt_comparison;
        if (!v[0])
            cnt_0 += weight;
        else
            cnt_1 += weight;
    }
    // reconstruct the array by the count of 0 and 1
    fill_n(v.begin(), cnt_0, 0);
    fill(v.begin() + cnt_0, v.end(), 1);
}

int main(int argc, char const* argv[]) {
    vector<int> v;
    int cnt_comparison;

    v = {0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0}));
    v = {1};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{1}));

    v = {0, 1};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 1}));
    v = {1, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 1}));

    v = {0, 0, 1};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 1}));
    v = {0, 1, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 1}));
    v = {1, 0, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 1}));

    v = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1}));
    assert(cnt_comparison == 6);
    v = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}));
    assert(cnt_comparison == 11);

    v.resize(1000);
    auto it{v.begin()};
    fill(it, it + 500, 0);
    it += 500;
    fill(it, it + 500, 1);
    vector cp_v{v};
    random_shuffle(v.begin(), v.end());
    sort_seq(v, cnt_comparison);
    assert(v == cp_v);
    cout << "Comparison No1| Count: " << cnt_comparison
         << ", Propertion:" << (double)cnt_comparison / 1000 << endl;
    v.resize(10000);
    it = v.begin();
    fill(it, it + 5000, 0);
    it += 5000;
    fill(it, it + 5000, 1);
    cp_v = v;
    random_shuffle(v.begin(), v.end());
    sort_seq(v, cnt_comparison);
    assert(v == cp_v);
    cout << "Comparison No2| Count: " << cnt_comparison
         << ", Propertion:" << (double)cnt_comparison / 10000 << endl;

    return 0;
}

Computed Comparison Counts

Comparison No1| Count: 663, Propertion:0.663
Comparison No2| Count: 6648, Propertion:0.6648

Solution2

Outline

We use "pool" of elements and check whether the pool is empty or not.
If the pool is frequently empty, we can reduce comparison because we simply put the element dealed with that time.

Algorithm

Let P be pool for elements.
Loop until (n-1) elements examined.
During the loop, we trace the count of '0' and '1'.
    Take a element from the rest of the elements.
    If p is empty, put it in P.
    Else
        compare the value of the element with any of the elements in the P.
        If the former is larger
            Increment the count of '1' and increase the count of '0' by the total amount of elements in P.
        Else if the latter is larger
            Increment the count of '0' and increase the count of '1' by the total amount of elements in P.
        Else put it in P.
If P is not empty, compare any of elements in it with 0, and increase the count of '0' or that of '1' according to the result.

Then, reconstruct the sequence by both counts.

The last element is placed between the last '0' and the first '1', so we don't have to compare and clarify which value the last element is.

The number of comparison is:

The best case: $\lfloor\frac{n}{2}\rfloor$
The worst case: $n-1$

Cost Analysis

Let p_m be a probability that the pool is empty just before examining the m-th element.

To think the relation between $p_m$ and $p_{m+2}$ , we think the transition of the state of the pool from the time just before examining m-th element to the time just before examining (m+2)-th element.

From Figure 1, we can formulate the relation:

\begin{align} p_{m+2}& =\frac{1}{2}p_m+(\frac{1}{2})^2*(1-p_m) \\ &=\frac{1}{4}p_m+\frac{1}{4} \\ \iff p_{m+2}-\frac{1}{3}&=\frac{1}{4}(p_m-\frac{1}{3}) \end{align}

Then, initial states are:

\begin{equation} p_0=1, p_1=0 \end{equation}

So, even and odd sequence's relations are:

\begin{equation} p_{2m}=\frac{1}{3}+\frac{2}{3}(\frac{1}{4})^m \end{equation}

\begin{equation} p_{2m+1}=\frac{1}{3}-\frac{1}{3}(\frac{1}{4})^m \end{equation}

If the pool is not empty, 1 comparison is required for each element. So Let $C_{av}$ be the total count of comparisons in the average case, then $C_{av}$ is:

\begin{align} C_{av}&=\sum_{i=0}^{\lfloor\frac{n}{2}\rfloor}(1-p_{2i})+\sum_{i=0}^{\lfloor\frac{n}{2}\rfloor}(1-p_{2i+1}) \\ &=\sum_{i=0}^{\lfloor\frac{n}{2}\rfloor}(\frac{2}{3}-\frac{2}{3}(\frac{1}{4})^i)+\sum_{i=0}^{\lfloor\frac{n}{2}\rfloor}(\frac{2}{3}+\frac{1}{3}(\frac{1}{4})^i) \\ &=\frac{2}{3}\left\lfloor\frac{n}{2}\right\rfloor-\frac{1}{3}\sum_{i=0}^{\lfloor\frac{n}{2}\rfloor}(\frac{1}{4})^i \end{align}

The second term in the equation above is asymtotically constant.
So, $C_{av}\simeq\frac{2}{3}n$ .

Code

#include <algorithm>
#include <cassert>
#include <iostream>
#include <vector>
using namespace std;

struct Pool {
    int val;
    size_t cnt{0};
};

void sort_seq(vector<int>& v, int& cnt_comparison) {
    cnt_comparison = 0;

    // examine data
    size_t cnt_zero{0};
    size_t cnt_one{0};
    Pool p;
    for (size_t i{0}; i < v.size() - 1; ++i) {
        if (!p.cnt) {
            p.val = v[i];
            p.cnt++;
        } else {
            ++cnt_comparison;
            if (p.val < v[i]) {
                cnt_zero += p.cnt;
                ++cnt_one;
                p.cnt = 0;
            } else if (v[i] < p.val) {
                ++cnt_zero;
                cnt_one += p.cnt;
                p.cnt = 0;
            } else
                p.cnt++;
        }
    }

    // clean pool
    if (p.cnt) {
        ++cnt_comparison;
        if (!p.val)
            cnt_zero += p.cnt;
        else
            cnt_one += p.cnt;
    }

    // reconstruct
    auto it{v.begin()};
    fill(it, it + cnt_zero, 0);
    it += cnt_zero;
    *it++ = v.back();
    fill(it, it + cnt_one, 1);
}

int main(int argc, char const* argv[]) {
    vector<int> v;
    int cnt_comparison;

    v = {0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0}));
    v = {1};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{1}));

    v = {0, 1};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 1}));
    v = {1, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 1}));

    v = {0, 0, 1};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 1}));
    v = {0, 1, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 1}));
    v = {1, 0, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 1}));

    v = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1}));
    assert(cnt_comparison == 5);
    v = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    sort_seq(v, cnt_comparison);
    assert(v == (vector<int>{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}));
    assert(cnt_comparison == 10);

    v.resize(1000);
    auto it{v.begin()};
    fill(it, it + 500, 0);
    it += 500;
    fill(it, it + 500, 1);
    vector cp_v{v};
    random_shuffle(v.begin(), v.end());
    sort_seq(v, cnt_comparison);
    assert(v == cp_v);
    cout << "Comparison No1| Count: " << cnt_comparison
         << ", Propertion:" << (double)cnt_comparison / 1000 << endl;
    v.resize(10000);
    it = v.begin();
    fill(it, it + 5000, 0);
    it += 5000;
    fill(it, it + 5000, 1);
    cp_v = v;
    random_shuffle(v.begin(), v.end());
    sort_seq(v, cnt_comparison);
    assert(v == cp_v);
    cout << "Comparison No2| Count: " << cnt_comparison
         << ", Propertion:" << (double)cnt_comparison / 10000 << endl;

    return 0;
}

Computed Comparison Counts

Comparison No1| Count: 663, Propertion:0.663
Comparison No2| Count: 6631, Propertion:0.6631