Longest increasing subsequence


Solution 1

I just stumbled upon this problem and came up with this Python 3 implementation:

def subsequence(seq):
    if not seq:
        return seq

    M = [None] * len(seq)    # offset by 1 (j -> j-1)
    P = [None] * len(seq)

    # Since we have at least one element in our list, we can start by
    # knowing that there's at least an increasing subsequence of length one:
    # the first element.
    L = 1
    M[0] = 0

    # Looping over the sequence starting from the second element
    for i in range(1, len(seq)):
        # Binary search: we want the largest j <= L
        #  such that seq[M[j-1]] < seq[i] (default j = 0),
        #  hence we want the lower bound at the end of the search process.
        lower = 0
        upper = L

        # Since the binary search will not look at the upper bound value,
        # we'll have to check that manually
        if seq[M[upper-1]] < seq[i]:
            j = upper

        else:
            # actual binary search loop
            while upper - lower > 1:
                mid = (upper + lower) // 2
                if seq[M[mid-1]] < seq[i]:
                    lower = mid
                else:
                    upper = mid

            j = lower    # this will also set the default value to 0

        # No predecessor when j == 0 (made explicit instead of relying on M[-1])
        P[i] = M[j-1] if j > 0 else None

        if j == L or seq[i] < seq[M[j]]:
            M[j] = i
            L = max(L, j+1)

    # Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
    result = []
    pos = M[L-1]
    for _ in range(L):
        result.append(seq[pos])
        pos = P[pos]

    return result[::-1]    # reversing

Since it took me some time to understand how the algorithm works I was a little verbose with comments, and I'll also add a quick explanation:

  • seq is the input sequence.
  • L is a number: it gets updated while looping over the sequence and it marks the length of the longest increasing subsequence found up to that moment.
  • M is a list. M[j-1] will point to an index of seq that holds the smallest value that could be used (at the end) to build an increasing subsequence of length j.
  • P is a list. P[i] will point to M[j-1], where i is the index of seq and j is the value found by the binary search. In a few words, it tells which is the previous element of the subsequence. P is used to build the result at the end.

How the algorithm works:

  1. Handle the special case of an empty sequence.
  2. Start with a subsequence of 1 element.
  3. Loop over the input sequence with index i.
  4. With a binary search find the largest j such that seq[M[j-1]] < seq[i].
  5. Update P, M and L.
  6. Traceback the result and return it reversed.

Note: The only differences from the Wikipedia algorithm are the offset of 1 in the M list, and that X is here called seq. I also tested it with a slightly improved version of the unit test shown in Eric Gustavson's answer, and it passed all tests.


Example:

seq = [30, 10, 20, 50, 40, 80, 60]

       0    1   2   3   4   5   6   <-- indexes

At the end we'll have:

M = [1, 2, 4, 6, None, None, None]
P = [None, None, 1, 2, 2, 4, 4]
result = [10, 20, 40, 60]

As you'll see, P is pretty straightforward. We have to look at it from the end, so it tells that before 60 there's 40, before 80 there's 40, before 40 there's 20, before 50 there's 20 and before 20 there's 10, stop.
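
To make the role of P concrete, here is a minimal sketch that walks the predecessor links for the example tables shown above. The helper name follow_predecessors is hypothetical (it is not part of the solution); it assumes M, P and L have the values listed in the example.

```python
def follow_predecessors(seq, M, P, L):
    """Rebuild the subsequence from the M and P tables of the example (hypothetical helper)."""
    result = []
    pos = M[L - 1]          # index of the last element of the longest subsequence
    while pos is not None:  # P holds None for elements that have no predecessor
        result.append(seq[pos])
        pos = P[pos]
    return result[::-1]     # links were followed backwards, so reverse


seq = [30, 10, 20, 50, 40, 80, 60]
M = [1, 2, 4, 6, None, None, None]
P = [None, None, 1, 2, 2, 4, 4]
print(follow_predecessors(seq, M, P, 4))  # [10, 20, 40, 60]
```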

The complicated part is on M. At the beginning M was [0, None, None, ...] since the last element of the subsequence of length 1 (hence position 0 in M) was at the index 0: 30.

At this point we'll start looping on seq and look at 10. Since 10 is less than 30, M will be updated:

if j == L or seq[i] < seq[M[j]]:
    M[j] = i

So now M looks like: [1, None, None, ...]. This is a good thing, because 10 has a better chance of creating a longer increasing subsequence. (The new 1 is the index of 10)

Now it's the turn of 20. With 10 and 20 we have subsequence of length 2 (index 1 in M), so M will be: [1, 2, None, ...]. (The new 2 is the index of 20)

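Now it's the turn of 50. With 10, 20 and 50 we have a subsequence of length 3 (index 2 in M), so M will be: [1, 2, 3, None, ...]. (The new 3 is the index of 50)

Now it's the turn of 40. 40 cannot extend the subsequence ending in 50, but 10, 20 and 40 also form a subsequence of length 3 with a smaller last element, so 40 replaces 50 at index 2 in M: [1, 2, 4, None, ...]. (The new 4 is the index of 40)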

And so on...
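
To follow the remaining steps without a debugger, the sketch below records M after every element. It is not the function above: tail_snapshots is a hypothetical helper that uses bisect_left from the standard library instead of the hand-written binary search, and it assumes the input has no repeated values (as in this example), so the two approaches should agree.

```python
from bisect import bisect_left


def tail_snapshots(seq):
    """Return the list M (indices of the smallest tails) after each element of seq."""
    M = []
    snapshots = []
    for i, value in enumerate(seq):
        # Values currently stored as tails; the copy is O(len(M)), kept for readability.
        tail_values = [seq[k] for k in M]
        j = bisect_left(tail_values, value)
        if j == len(M):
            M.append(i)   # value extends the longest subsequence found so far
        else:
            M[j] = i      # value becomes a smaller tail for length j + 1
        snapshots.append(list(M))
    return snapshots


for snapshot in tail_snapshots([30, 10, 20, 50, 40, 80, 60]):
    print(snapshot)
# [0]
# [1]
# [1, 2]
# [1, 2, 3]
# [1, 2, 4]
# [1, 2, 4, 5]
# [1, 2, 4, 6]
```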

For a complete walk through the code you can copy and paste it here :)

Solution 2

Here is how to simply find the longest increasing/decreasing subsequence in Mathematica:

 LIS[list_] := LongestCommonSequence[Sort[list], list];
 input={0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15};
 LIS[input]
 -1*LIS[-1*input]

Output:

{0, 2, 6, 9, 11, 15}
{12, 10, 9, 5, 3}

Mathematica also has a LongestIncreasingSubsequence function in the Combinatorica` library. If you do not have Mathematica you can query WolframAlpha.
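
For readers without Mathematica, the same idea (the longest increasing subsequence is a longest common subsequence of the input and its sorted copy) can be sketched in Python. This is an illustrative O(n^2) dynamic program, not what Mathematica does internally; the name lis_via_lcs is hypothetical, and duplicates are removed from the sorted copy on the assumption that a strictly increasing result is wanted.

```python
def lis_via_lcs(seq):
    """Longest strictly increasing subsequence via LCS with the sorted, deduplicated input."""
    target = sorted(set(seq))
    n, m = len(seq), len(target)
    # dp[i][j] = length of the LCS of seq[i:] and target[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if seq[i] == target[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the top-left corner to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < n and j < m:
        if seq[i] == target[j]:
            out.append(seq[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


data = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
print(len(lis_via_lcs(data)))  # 6
```

Several subsequences of length 6 exist for this input, so the exact elements returned depend on how ties are broken in the walk.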

C++ O(nlogn) solution

There's also an O(n log n) solution based on some observations. Let $A_{i,j}$ be the smallest possible tail out of all increasing subsequences of length $j$ using elements $a_1, a_2, \ldots, a_i$. Observe that, for any particular $i$, $A_{i,1} < A_{i,2} < \cdots < A_{i,j}$. This suggests that if we want the longest subsequence that ends with $a_{i+1}$, we only need to look for a $j$ such that $A_{i,j} < a_{i+1} \le A_{i,j+1}$ and the length will be $j+1$. Notice that in this case, $A_{i+1,j+1}$ will be equal to $a_{i+1}$, and all $A_{i+1,k}$ will be equal to $A_{i,k}$ for $k \ne j+1$. Furthermore, there is at most one difference between the set $A_i$ and the set $A_{i+1}$, which is caused by this search. Since $A$ is always ordered in increasing order, and the operation does not change this ordering, we can do a binary search for every single $a_1, a_2, \ldots, a_n$.
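
Before the full C++ version, the observation can be sketched in a few lines of Python that only compute the length. The list tails plays the role of the current row of A, and bisect_left finds the position of the single entry that changes; the function name lis_length is an assumption made for this illustration.

```python
from bisect import bisect_left


def lis_length(a):
    """Length of the longest strictly increasing subsequence of a."""
    tails = []  # tails[j] = smallest tail of an increasing subsequence of length j + 1
    for x in a:
        j = bisect_left(tails, x)  # first position whose tail is >= x
        if j == len(tails):
            tails.append(x)        # x extends the longest subsequence so far
        else:
            tails[j] = x           # x is a smaller tail for length j + 1
    return len(tails)


print(lis_length([1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7]))  # 6
```

Only one entry of tails changes per element and the list stays sorted, which is why a binary search is valid at every step.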

Implementation C++ (O(nlogn) algorithm)

#include <vector>
using namespace std;

/* Finds longest strictly increasing subsequence. O(n log k) algorithm. */
void find_lis(vector<int> &a, vector<int> &b)
{
  vector<int> p(a.size());
  int u, v;

  if (a.empty()) return;

  b.push_back(0);

  for (size_t i = 1; i < a.size(); i++) {
      if (a[b.back()] < a[i]) {
          p[i] = b.back();
          b.push_back(i);
          continue;
      }

      for (u = 0, v = b.size()-1; u < v;) {
          int c = (u + v) / 2;
          if (a[b[c]] < a[i]) u=c+1; else v=c;
      }

      if (a[i] < a[b[u]]) {
          if (u > 0) p[i] = b[u-1];
          b[u] = i;
      }   
  }

  for (u = b.size(), v = b.back(); u--; v = p[v]) b[u] = v;
}

/* Example of usage: */
#include <cstdio>
int main()
{
  int a[] = { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 };
  vector<int> seq(a, a+sizeof(a)/sizeof(a[0]));
  vector<int> lis;
  find_lis(seq, lis);

  for (size_t i = 0; i < lis.size(); i++)
      printf("%d ", seq[lis[i]]);
  printf("\n");

  return 0;
}

Source: link

I rewrote the C++ implementation in Java a while ago and can confirm it works. The Python counterpart of vector is list. If you want to test it yourself, here is a link to an online compiler with the example implementation loaded: link

Example data is: { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 } and answer: 1 3 4 5 6 7.

Solution 3

Here is a pretty general solution that:

  • runs in O(n log n) time,
  • handles increasing, nondecreasing, decreasing and nonincreasing subsequences,
  • works with any sequence objects, including list, numpy.array, str and more,
  • supports lists of objects and custom comparison methods through the key parameter that works like the one in the builtin sorted function,
  • can return the elements of the subsequence or their indices.

The code:

from bisect import bisect_left, bisect_right
from functools import cmp_to_key

def longest_subsequence(seq, mode='strictly', order='increasing',
                        key=None, index=False):

  bisect = bisect_left if mode.startswith('strict') else bisect_right

  # compute keys for comparison just once
  rank = seq if key is None else map(key, seq)
  if order == 'decreasing':
    rank = map(cmp_to_key(lambda x,y: 1 if x<y else 0 if x==y else -1), rank)
  rank = list(rank)

  if not rank: return []

  lastoflength = [0] # end position of subsequence with given length
  predecessor = [None] # penultimate element of l.i.s. ending at given position

  for i in range(1, len(seq)):
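    # seq[i] can extend a subsequence that ends with a lesser (or equal) element
    # (note: rebuilding this list costs O(len(lastoflength)) per step; on
    # Python 3.10+, bisect(lastoflength, rank[i], key=rank.__getitem__) avoids it)
    j = bisect([rank[k] for k in lastoflength], rank[i])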
    # update existing subsequence of length j or extend the longest
    try: lastoflength[j] = i
    except IndexError: lastoflength.append(i)
    # remember element before seq[i] in the subsequence
    predecessor.append(lastoflength[j-1] if j > 0 else None)

  # trace indices [p^n(i), ..., p(p(i)), p(i), i], where n=len(lastoflength)-1
  # (iterative, so very long subsequences cannot hit the recursion limit)
  indices = []
  i = lastoflength[-1]
  while i is not None:
    indices.append(i)
    i = predecessor[i]
  indices.reverse()

  return indices if index else [seq[i] for i in indices]

I wrote a docstring for the function that I didn't paste above in order to show off the code:

"""
Return the longest increasing subsequence of `seq`.

Parameters
----------
seq : sequence object
  Can be any sequence, like `str`, `list`, `numpy.array`.
mode : {'strict', 'strictly', 'weak', 'weakly'}, optional
  If set to 'strict', the subsequence will contain unique elements.
  Using 'weak' an element can be repeated many times.
  Modes ending in -ly serve as a convenience to use with `order` parameter,
  because `longest_subsequence(seq, 'weakly', 'increasing')` reads better.
  The default is 'strictly'.
order : {'increasing', 'decreasing'}, optional
  By default return the longest increasing subsequence, but it is possible
  to return the longest decreasing subsequence as well.
key : function, optional
  Specifies a function of one argument that is used to extract a comparison
  key from each list element (e.g., `str.lower`, `lambda x: x[0]`).
  The default value is `None` (compare the elements directly).
index : bool, optional
  If set to `True`, return the indices of the subsequence, otherwise return
  the elements. Default is `False`.

Returns
-------
elements : list, optional
  A `list` of elements of the longest subsequence.
  Returned by default and when `index` is set to `False`.
indices : list, optional
  A `list` of indices pointing to elements in the longest subsequence.
  Returned when `index` is set to `True`.
"""

Some examples:

>>> seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]

>>> longest_subsequence(seq)
[0, 2, 6, 9, 11, 15]

>>> longest_subsequence(seq, order='decreasing')
[12, 10, 9, 5, 3]

>>> txt = ("Given an input sequence, what is the best way to find the longest"
               " (not necessarily continuous) non-decreasing subsequence.")

>>> ''.join(longest_subsequence(txt))
' ,abdegilnorsu'

>>> ''.join(longest_subsequence(txt, 'weak'))
'              ceilnnnnrsssu'

>>> ''.join(longest_subsequence(txt, 'weakly', 'decreasing'))
'vuutttttttssronnnnngeee.'

>>> dates = [
...   ('2015-02-03', 'name1'),
...   ('2015-02-04', 'nameg'),
...   ('2015-02-04', 'name5'),
...   ('2015-02-05', 'nameh'),
...   ('1929-03-12', 'name4'),
...   ('2023-07-01', 'name7'),
...   ('2015-02-07', 'name0'),
...   ('2015-02-08', 'nameh'),
...   ('2015-02-15', 'namex'),
...   ('2015-02-09', 'namew'),
...   ('1980-12-23', 'name2'),
...   ('2015-02-12', 'namen'),
...   ('2015-02-13', 'named'),
... ]

>>> longest_subsequence(dates, 'weak')

[('2015-02-03', 'name1'),
 ('2015-02-04', 'name5'),
 ('2015-02-05', 'nameh'),
 ('2015-02-07', 'name0'),
 ('2015-02-08', 'nameh'),
 ('2015-02-09', 'namew'),
 ('2015-02-12', 'namen'),
 ('2015-02-13', 'named')]

>>> from operator import itemgetter

>>> longest_subsequence(dates, 'weak', key=itemgetter(0))

[('2015-02-03', 'name1'),
 ('2015-02-04', 'nameg'),
 ('2015-02-04', 'name5'),
 ('2015-02-05', 'nameh'),
 ('2015-02-07', 'name0'),
 ('2015-02-08', 'nameh'),
 ('2015-02-09', 'namew'),
 ('2015-02-12', 'namen'),
 ('2015-02-13', 'named')]

>>> indices = set(longest_subsequence(dates, key=itemgetter(0), index=True))

>>> [e for i,e in enumerate(dates) if i not in indices]

[('2015-02-04', 'nameg'),
 ('1929-03-12', 'name4'),
 ('2023-07-01', 'name7'),
 ('2015-02-15', 'namex'),
 ('1980-12-23', 'name2')]

This answer was in part inspired by the question over at Code Review and in part by a question asking about "out of sequence" values.
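
As a side note on the mode parameter, the difference between strict and weak ordering comes down to which bisect variant is used. The sketch below is a reduced, length-only illustration of that choice, not the function above; the name longest_length and its strict flag are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def longest_length(seq, strict=True):
    """Length of the longest increasing (strict) or nondecreasing (weak) subsequence."""
    # bisect_left places an equal value ON the existing tail (replacing it),
    # bisect_right places it AFTER the existing tail (extending the run).
    find = bisect_left if strict else bisect_right
    tails = []
    for x in seq:
        j = find(tails, x)
        if j == len(tails):
            tails.append(x)
        else:
            tails[j] = x
    return len(tails)


print(longest_length([1, 2, 2, 3]))                # 3
print(longest_length([1, 2, 2, 3], strict=False))  # 4
```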

Solution 4

    int[] a = {1,3,2,4,5,4,6,7};
    StringBuilder s1 = new StringBuilder();
    for(int i : a){
     s1.append(i);
    }       
    StringBuilder s2 = new StringBuilder();
    int count = findSubstring(s1.toString(), s2);       
    System.out.println(s2.reverse());

public static int findSubstring(String str1, StringBuilder s2){     
    StringBuilder s1 = new StringBuilder(str1);
    if(s1.length() == 0){
        return 0;
    }
    if(s2.length() == 0){
        s2.append(s1.charAt(s1.length()-1));
        findSubstring(s1.deleteCharAt(s1.length()-1).toString(), s2);           
    } else if(s1.charAt(s1.length()-1) < s2.charAt(s2.length()-1)){ 
        char c = s1.charAt(s1.length()-1);
        return 1 + findSubstring(s1.deleteCharAt(s1.length()-1).toString(), s2.append(c));
    }
    else{
        char c = s1.charAt(s1.length()-1);
        StringBuilder s3 = new StringBuilder();
        for(int i=0; i < s2.length(); i++){
            if(s2.charAt(i) > c){
                s3.append(s2.charAt(i));
            }
        }
        s3.append(c);
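        // Delete the last character only once: calling deleteCharAt twice on
        // the same builder would drop two characters for the second branch.
        String rest = s1.deleteCharAt(s1.length()-1).toString();
        return Math.max(findSubstring(rest, s2), findSubstring(rest, s3));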
    }       
    return 0;
}

Solution 5

Here is some Python code with tests which implements the algorithm running in O(n*log(n)). I found this on the Wikipedia talk page about the longest increasing subsequence.

import unittest


def LongestIncreasingSubsequence(X):
    """
    Find and return longest increasing subsequence of X.
    If multiple increasing subsequences exist, the one that ends
    with the smallest value is preferred, and if multiple
    occurrences of that value can end the sequence, then the
    earliest occurrence is preferred.
    """
    n = len(X)
    X = [None] + list(X)  # Pad sequence so that it starts at X[1] (list() also accepts ranges)
    M = [None]*(n+1)  # Allocate arrays for M and P
    P = [None]*(n+1)
    L = 0
    for i in range(1,n+1):
        if L == 0 or X[M[1]] >= X[i]:
            # there is no j s.t. X[M[j]] < X[i]
            j = 0
        else:
            # binary search for the largest j s.t. X[M[j]] < X[i]
            lo = 1      # largest value known to be <= j
            hi = L+1    # smallest value known to be > j
            while lo < hi - 1:
                mid = (lo + hi)//2
                if X[M[mid]] < X[i]:
                    lo = mid
                else:
                    hi = mid
            j = lo

        P[i] = M[j]
        if j == L or X[i] < X[M[j+1]]:
            M[j+1] = i
            L = max(L,j+1)

    # Backtrack to find the optimal sequence in reverse order
    output = []
    pos = M[L]
    while L > 0:
        output.append(X[pos])
        pos = P[pos]
        L -= 1

    output.reverse()
    return output

# Try small lists and check that the correct subsequences are generated.

class LISTest(unittest.TestCase):
    def testLIS(self):
        self.assertEqual(LongestIncreasingSubsequence([]),[])
        self.assertEqual(LongestIncreasingSubsequence(list(range(10,0,-1))),[1])
        self.assertEqual(LongestIncreasingSubsequence(list(range(10))),list(range(10)))
        self.assertEqual(LongestIncreasingSubsequence(\
            [3,1,4,1,5,9,2,6,5,3,5,8,9,7,9]), [1,2,3,5,8,9])

unittest.main()
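
When testing a fast implementation like the one above, it can help to compare its result length against a slow but obviously correct reference. The following is a minimal sketch of the classic O(n^2) dynamic program; the name lis_length_quadratic is hypothetical and it only returns the length, not the subsequence.

```python
def lis_length_quadratic(X):
    """Reference O(n^2) length of the longest strictly increasing subsequence."""
    X = list(X)
    # best[i] = length of the longest increasing subsequence ending at X[i]
    best = [1] * len(X)
    for i in range(len(X)):
        for k in range(i):
            if X[k] < X[i]:
                best[i] = max(best[i], best[k] + 1)
    return max(best, default=0)


print(lis_length_quadratic([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]))  # 6
```

A test could then assert that the length of the list returned by the fast function equals this value for a batch of random inputs.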
Author by

Jungle Hunter

I just started using my account here on Stack Overflow yesterday (July 21, 2010).

Updated on October 07, 2021

Comments

  • Jungle Hunter
    Jungle Hunter over 2 years

    Given an input sequence, what is the best way to find the longest (not necessarily continuous) increasing subsequence?

    [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]  # input
    
    [1, 9, 13, 15]  # an example of an increasing subsequence (not the longest)
    
    [0, 2, 6, 9, 13, 15]  # longest increasing subsequence (not a unique answer)
    [0, 2, 6, 9, 11, 15]  # another possible solution
    

    I'm looking for the best algorithm. If there is code, Python would be nice, but anything is alright.