The Algorithm Design Manual (3rd) Exercise Solution 'E4-8'

Table of Contents

Introduction
Repository
Exercise 4-8

Introduction

This series is my solutions for algorithm exercises in'The Algorithm Design Manual' 3rd edition.
It's my solutions, not model answers, so if you find some mistakes or more sophisticated solutions, please post in comment area below.

Repository

Here

Exercise 4-8

Question

Outline a reasonable method of solving each of the following problems. Give the order of the worst-case complexity of your methods.

(a) You are given a pile of thousands of telephone bills and thousands of checks sent in to pay the bills. Find out who did not pay.

(b) You are given a printed list containing the title, author, call number, and publisher of all the books in a school library and another list of thirty publishers. Find out how many of the books in the library were published by each company.

(c) You are given all the book checkout cards used in the campus library during the past year, each of which contains the name of the person who took out the book. Determine how many distinct people checked out at least one book.

Solution

Subproblem_(a)

Algorithm:

Sort both billsand checks by their recipient names.
Loop while there is at least one bill in workspace.
    Set x as the name of the bill at the top of the stack.
    If (x corresponds to the name of the check at the top of the stack of check) then
        Remove both the bill and the check from the workspace as correctly processed.
    Else then
        Record the name.
        Remove the bill from the workspace.
The names we've recorded are the ones we're searching.

Next, we think the complexity.
Suppose the number of bills is n, and the number of checks is m.
Apparently, n>m.
So, the complexity is

\begin{equation} O(nlogn)+O(mlogm)+O(n)=O(nlogn) \end{equation}

Subproblem_(b)

Algorithm:

Loop while there is at least one book not labeled as checked.
    Pick one book and see the publisher.
    Count up the count of the books the publisher published.
    Label the book as checked.
Then we have the number of books published by each publisher.

The complexity for counting-up the publisher of the book is O(30).
So, the complexity is $O(30n)=O(n)$ .

Subproblem_(c)

Outline

For simplicity, suppose all names follow name convention of "First_name Last_name" (separated by one space)
First, we distribute the names of cards into hash table with hashing using two heading letter of first name and that of last name.
Then, sort each bucket and record distinct names in it.

Complexity:

\begin{align} O((n/(26**4)*log(n/(26**4)))*26**4)&= O(nlog(n/456976))\\ &= O(nlogn) \end{align}

The complexity is the same as the case if we don't distribute into buckets. But the time required for sorting is smaller, so whole performance is better.

Code

// For simplicity, suppose all names follow name convention of
//  "First_name Last_name" (separated by one space)
// First, we distribute the names of cards into hash table
//  with hashing using two heading letter of first name and that of last name
// Then, sort each bucket and record distinct names in it.

// Complexity: O((n/(26**4)*log(n/(26**4)))*26**4)
//  = O(nlog(n/456976))
//  = O(nlogn)
//  The complexity is the same as the case if we don't distribute into buckets.
//  But the time required for sorting is smaller, so whole performance is
//      better.

#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using namespace std;

constexpr char DELIM{' '};
vector<string> identify_distinct_names(const vector<string>& names) {
    unordered_map<string, vector<string>> buckets;
    for (const auto& nm : names) {
        auto p_dl{nm.find(DELIM)};
        string key{nm.substr(0, min((size_t)2, p_dl)) + nm.substr(p_dl + 1, 2)};
        buckets[key].push_back(nm);
    }

    vector<string> dist_nms;
    for (auto& [k, v] : buckets) {
        sort(v.begin(), v.end());
        string_view prev{""};
        for (const auto& nm : v) {
            if (nm != prev) {
                dist_nms.push_back(nm);
                prev = nm;
            }
        }
    }
    return dist_nms;
}

int main(int argc, char const* argv[]) {
    auto result{identify_distinct_names(
        {"Abb Cdd", "Abc Cdd", "Abb Cde", "Abb Cdd", "Abc Cdd", "Abc Cde",
         "Add Cbb", "Add Cbb", "Add Cbb", "Add Cbb", "Adc Cbe", "Adc Cbb",
         "Adc Cbe", "Adc Cbe", "Add Cbe", "Add Cbb", "Eff Ghh", "Eff Ghh",
         "Efi Ghh", "Efi Ghh", "Eff Ghj", "Eff Ghj", "Rff Ghh", "RRR KKK"})};
    assert(result.size() == 13);
    sort(result.begin(), result.end());
    assert(result[0] == "Abb Cdd");
    assert(result[1] == "Abb Cde");
    assert(result[2] == "Abc Cdd");
    assert(result[3] == "Abc Cde");
    assert(result[4] == "Adc Cbb");
    assert(result[5] == "Adc Cbe");
    assert(result[6] == "Add Cbb");
    assert(result[7] == "Add Cbe");
    assert(result[8] == "Eff Ghh");
    assert(result[9] == "Eff Ghj");
    assert(result[10] == "Efi Ghh");
    assert(result[11] == "RRR KKK");
    assert(result[12] == "Rff Ghh");

    return 0;
}