Sunday, July 26, 2020

Leetcode 1531. String Compression II

https://leetcode.com/problems/string-compression-ii/description/
Notes:

One way to solve this problem is that: for each char, we can either delete it or keep it. If we try all the possibilities within the limits, then we will find the optimal answer. But the complexity is factorial (2^N, where N is the length of the string).

This method can be optimized by dynamic programming (dp). For dp, the key is to find a set of parameters that can describe a state completely.

To this question, we can use 4 parameters:
index:  the position in the string
k:  the number of char that can be deleted
last:  the last char
ct:  the count of the last char.

Then the transition formula can be written as:
if we delete a char:
    dp[index][k][last][ct] = dp[index+1][k-1][last][ct];
if we keep it:
there are two sub cases:
    1): when s[index] != last char
        dp[index][k][last][ct] = 1 + dp[index+1][k][s[index]][1];
    2): when s[index] == last char
    there are another 4 sub cases:
        a): if the ct == 1
        based on the question description, when the count is 1, it will not be written in the compressed string. But now the count will becomes 2, so it will be written into the compressed string, thus
            dp[index][k][last][ct] = 1 + dp[index+1][k][last][2];
        b): if the ct == 9
        the new count will be 10, thus
            dp[index][k][last][ct] = 1 + dp[index+1][k][last][10];
        c): if the ct == 99
            dp[index][k][last][ct] = 1 + dp[index+1][k][last][100];
        d): others, the count will be increased by 1, but there is no change on the length of the compressed string
            dp[index][k][last][ct] = dp[index+1][k][last][ct+1];

for each step:  we need to get the minimum of all these possible choices.

Further optimization can be done to reduce the size of the dp array. For example, we do not have to set the count (ct) range up to 100, since when ct in the range from 10 to 99, there is no change on the compressed string. So we can handle the case when s.size() == 100 separately. Then the ct can be set in the range of 10 (since 10 to 99 contributes then same length to the compressed string).

Another optimization is that: we can group some sub-cases together.

A top-down dfs + memo can be employed to this dp method.

See the code below:

class Solution {
public:
    int getLengthOfOptimalCompression(string s, int k) {
        // special case: otherwise needs too much memory for the memo:  27 x 4 M vs 2.7 x 4 M
        this->s = s;
        if(k==0) {
            int ct = 1;
            for(int i=1; i<s.size(); ++i) {
                if(s[i] == s[i-1]) ++ct;
            }
            if(ct == 100) return 4;
        }
        std::memset(memo, -1, sizeof memo);
        return dfs(0, k, 26, 0);
    }

private:
    int memo[101][101][27][11];
    string s;
    int dfs(int id, int k, int last, int ct) {
        if(k<0) return 99999;
        if(id == s.size()) return 0;
        if(memo[id][k][last][ct] != -1) return memo[id][k][last][ct];
        // remove s[id]
        int res = dfs(id+1, k-1, last, ct);
        // use s[id]
        if(s[id] - 'a' != last) {
            res = min(res, 1 + dfs(id+1, k, s[id] - 'a', 1));
        }
        else {
            int t = (ct == 1 || ct == 9) ? 1 : 0;
            res = min(res, t + dfs(id+1, k, last, min(10, ct + 1)));
        }
        return memo[id][k][last][ct] = res;
    }
};

No comments:

Post a Comment