Monday, September 7, 2020

Leetcode 730: Count Different Palindromic Subsequences

 https://leetcode.com/problems/count-different-palindromic-subsequences/description/



Notes:

[update]:    3-dimensional dp

The two dimensional dp  (see the second method) is not easy to understand, a better way (in rationalization) is to use a three dimensional dp.

There is one condition that has not been used: there are only 4 types of letter.

So we can define a dp[i][j][k], meaning the unique number of palindrome ending with letter (k + 'a') in the range of [i, j].  Since it is palindrome, so it should start with (k+'a') as well.

Then the total number of unique palindromes in the range of [i, j] should be the Sum (dp[i][j][k]) over all k values.

Now let us look at the transition formula:

1): if(i>j) dp[i][j][k] = 0;
2): if(i==j && s[i] == k + 'a')  dp[i][j][k] = 1;
3): if (s[i] != s[j])  dp[i][j][k] = dp[i+1][j][k] + dp[i][j-1][k] - dp[i+1][j-1][k];
4): if (s[i] == s[j])  dp[i][j][k] = 2 + Sum (dp[i+1][j-1][a]), where a can be all letters. 
     (note: 2 for single "s[i]" and double "s[i]s[j]")

See the code below:

class Solution {
public:
    int countPalindromicSubsequences(string S) {
        int n = S.size(), mod = 1e9+7;
       // vector<vector<vector<long>>> dp(n, vector<vector<long>>(n, vector<long>(4, 0)));
        for(int i=0; i<n; ++i) dp[i][i][S[i]-'a'] = 1;
        
        for(int i=n-2; i>=0; --i) {
            for(int j=i+1; j<n; ++j) {
                for(int k=0; k<4; ++k) {
                    if(S[i] != S[j] || S[i] != (k + 'a')) {
                        dp[i][j][k] = (dp[i][j][k] + dp[i+1][j][k] + dp[i][j-1][k] - dp[i+1][j-1][k] + mod)%mod;
                    }
                    else {
                        dp[i][j][k] = (dp[i][j][k] + 2)%mod;
                        for(int t=0; t<4; ++t) {
                            dp[i][j][k] = (dp[i][j][k] + dp[i+1][j-1][t])%mod;
                        }
                    }
                }
            }
        }
        long res = 0;
        for(int i=0; i<4; ++i) {
            res = (res + dp[0][n-1][i])%mod;
        }
        return res;
    }
private:
    int dp[1000][1000][4];
};


 2-dimensional dp

This question is not easy to solve. Even though it may indicate using dp, but it is not easy to derive the correct state transition formula. 

One of the difficulties is how to avoid duplicated counting. If we distinguish the palindromes by their indexes in the string rather than themselves, then this question becomes relative easier. So let start with this first.

number of palindromes distinguished by their indexes in the string

Let use dp[i][j] to represent the number of palindromes in the range [i, j] with i <= j. Then,

1): if s[i] != s[j], dp[i][j] = dp[i+1][j] + dp[i][j-1] - dp[i+1][j-1];
2): if s[i]== s[j], we have to choices: a): using s[i] and s[j]; b): not using s[i] and s[j]

Apparently, b) is the same as the case in 1). And in a), all the palindrome have to start with s[i] and end with s[j]. Thus, there are dp[i+1][j-1] + 1 in total. The 1 means "s[i]s[j]".

So the total palindromes in case 2) is:  number in a) + number in b)
       dp[i+1][j-1] + 1 + dp[i+1][j] + dp[i][j-1] - dp[i+1][j-1] = dp[i+1][j] + dp[i][j-1] +1.

But this is not what this question asks about! In this question, distinguished palindromes are defined by their letters not by their indexes in the string. So we need to remove the duplicates.

number of palindromes distinguished by their letters

To start, let us define two kinds of dps: one is dp[i][j], another one is bp[i][j].

dp[i][j]: means the number of unique palindromes in the range [i, j] not necessary starting with the ith letter and ending with the jth letter;

bp[i][j]: means the number of unique palindromes in the range [i, j] starting with the ith letter and ending with the jth letter;

Then let re-look at the two situations we had analyzed above:

1): when s[i] != s[j], then dp[i][j] = dp[i+1][j] + dp[i][j-1] - dp[i+1][j-1]
2): when s[i] == s[j], we still have two cases: a): using s[i] and s[j]; b) not using s[i] and s[j]

Based on our destinations above, a) is bp[i][j] = dp[i+1][j-1] + 1, and b) is dp[i+1][j] + dp[i][j-1] - (duplicated part). If there are still duplicates between a) and b), we still need to remove them.

There are another three different situations for 2). 
(i) when there is no more s[i] letter in the range of [i+1, j-1]
     in this case, the count in b) is:
         dp[i+1][j] = dp[i+1][j-1] + 1   (1 for s[i])
         dp[i][j-1] = dp[i+1][j-1] + 1    (1 for s[j]) 
     Thus, the duplicated part is  dp[i+1][j-1] + 1   (since s[i] and s[j] are the same).
     So the total count for b) is  dp[i+1][j-1] + 1.
    
      Now let check the overlap between a) and b). In a), palindrome has to start with s[i] and end with s[i] as well, so there is no overlap between them. 
      Thus, the overall count is:  union( a + b)
      union(bq[i][j]   +  dp[i+1][j-1] + 1)  = dp[i+1][j-1] + 1 + dp[i+1][j-1] + 1 
       = 2*dp[i+1][j-1] + 2

(ii) there is only one more s[i] letter in the range [i+1, j-1], let say it is s[k], where i<k<j.
      then the count for b) is:
       dp[i+1][j] = dp[i+1][j-1] + bp[k][j],   the single "s[i]" has been included in dp[i+1][j-1], s[i] == s[k]
       dp[i][j-1] = dp[i+1][j-1] + bp[i][k],    the single "s[i]" has been included in dp[i+1][j-1], s[i] == s[k]

      Apparently, one overlap part between dp[i+1][j] and dp[i][j-1] is dp[i+1][j-1], so we just take one of them in the final counting. 

      But how to deal with bp[k][j] and bp[i][k]? They may have overlap as well!

      It seems very complicated, but actually is not. Since we have to remove the duplicates between 1) and 2) as well, so we just look at 1) first. In 1): the total counting (before removing duplicates with 2))  is bp[i][j]. It is easy to tell that both bp[k][j] and bp[i][k] are subsets of bp[i][j]. Thus, both of them can be removed directly!
      Then, the overall count becomes:  union( a + b)
      
       union(bq[i][j]   +  dp[i+1][j-1] + bp[k][j] + bp[i][k]) = dp[i+1][j-1] + 1 + dp[i+1][j-1] 
       = 2 * dp[i+1][j-1] + 1

(iii) where there are more than 1 s[i] letter in the range of [i+1, j-1]. 
        Let say, s[l] and s[r], i<l<r<j, and
        s[l] is the first letter equals to s[i] from the left in the range of [i+1, j-1];
        s[r] is the first letter equals to s[i] from the right in the range of [i+1, j-1];
       then the count for b) is:
       dp[i+1][j] = dp[i+1][j-1] + bp[l][j],   both single "s[i]" and "s[i]s[j]" have been included in dp[i+1][j-1],  due to that s[l] == s[r] == s[i];
       dp[i][j-1] = dp[i+1][j-1] + bp[i][r],   both single "s[i]" and "s[i]s[j]" have been included in dp[i+1][j-1],  due to that s[l] == s[r] == s[i].

        Similar to (ii), both bp[l][j] and bp[i][r] can be removed directly, since both of them are the subsets to bp[i][j]. For b) only one dp[i+1][j-1] remaining. Is there more overlap between a) and b)?

         The answer is yes. 
         dp[i][j-1] should include bp[l][r], and bp[i][j] should also include bp[l][r].

         Thus, we need to remove one bp[l][r] by - bp[l][r] = -(dp[l+1][r-1] + 1);

         Then the overall union is:  union( a + b)

        union(bq[i][j]   +  dp[i+1][j-1] + bp[l][j] + bp[i][r]) = union(bq[i][j] +  dp[i+1][j-1])
        = union (dp[i+1][j-1] + 1 + dp[i+1][j-1] - bq[l][j] - 1) =
        = 2 * dp[i+1][j-1]  - dp[l+1][r-1]


In summary, the station transition formula are:

1): if s[i] != s[j], dp[i][j] = dp[i+1][j] + dp[i][j-1] - dp[i+1][j-1];
2): if s[i] == s[j],
      a): if there is no more letter s[i] in [i+1, j-1]
            dp[i][j] = 2 * dp[i+1][j-1] + 2;
      b): if these is only one letter s[i] in [i+1, j-1]
            dp[i][j] = 2 * dp[i+1][j-1] + 1;
      c): if there are 2 or more letter s[i] in [i+1, j-1]
           dp[i][j] = 2 * dp[i+1][j-1]  - dp[l+1][r-1]. (for the meaning of l and r, see the above analysis)

See the code below:

class Solution {
public:
    int countPalindromicSubsequences(string S) {
        int n = S.size(), mod = 1e9+7;
        vector<vector<long>> dp(n+1, vector<long>(n, 0));
        for(int i=0; i<n; ++i) dp[i][i] = 1;
        for(int i=n-2; i>=0; --i) {
            for(int j=i+1; j<n; ++j) {
                if(S[i] != S[j]) dp[i][j] = (dp[i+1][j] + dp[i][j-1] - dp[i+1][j-1] + mod)%mod;
                else {
                    int left = i + 1, right = j-1;
                    while(left<n && S[left] != S[i]) ++left;
                    while(right>=0 && S[right] != S[i]) --right;
                    if(left>right) dp[i][j] = (dp[i+1][j-1]*2 + 2)%mod;
                    else if(left == right) dp[i][j] = (dp[i+1][j-1]*2 + 1)%mod;
                    else dp[i][j] = (dp[i+1][j-1]*2 - dp[left+1][right-1] + mod)%mod;
                }
            }
        }
        return dp[0][n-1];
    }
};

No comments:

Post a Comment