Difference between Closed and open Sequential Pattern Mining Algorithms

10,208

Solution 1

Glad that you are using my SPMF software.

The support of a sequential pattern is the number of sequences that contains the sequential pattern.

A frequent sequential pattern is a pattern that appears in at least "minsup" sequences of a sequence database, where minsup is a parameter set by the user.

A frequent closed sequential pattern is a frequent sequential pattern such that it is not included in another sequential pattern having exactly the same support.

Algorithms such as PrefixSpan finds frequent sequential patterns. Algorithms such as BIDE+ finds frequent closed sequential patterns. BIDE+ is usually much faster than PrefixSpan because it uses pruning techniques to avoid generating all sequential patterns. Moreover, the set of closed patterns is usually much smaller than the set of sequential patterns so BIDE+ is also more memory efficient.

Another important thing to know is that closed sequential patterns are a compact and lossless representation of all sequential patterns. This means that the set of closed sequential patterns is usually much smaller but it is lossless, which means that it allows recovering the full set of sequential patterns (no information is loss), which is very convenient.

I can give you a simple example.

Let's consider 4 sequences:

a  b  c  d  e
a  b  d
b  e  a  
b  c  d  e

Let's say that minsup = 2.

b c is a frequent sequential patterns because it appears in two sequences (it has a support of 2). b c is not a closed sequential patterns because it is contained in a larger sequential pattern b c d having the same support.

b c d has a support of 2. It is also not a closed sequential pattern because it is contained in a larger sequential pattern b c d e having the same support. b c d e is a closed sequential pattern because there it is not included in any other sequential pattern having the same support.

By the way, you can also check my survey about sequential pattern mining. It gives a good introduction about this topic and the different algorithms.

Solution 2

Check out this chapter on Frequent Itemsets and Frequent item sets Mining & Association Rules

Solution 3

Google for "closed frequent itemsets". There will be plenty of pages explaining this, as will any data mining book (look for the APRIORI algorithm).

"Closed" says that there is no larger itemset with the same support. There can be larger itemsets, but they must have lower support.

For most use cases it is either sufficient to look at maximal or at closed itemsets only.

Share:
10,208

Related videos on Youtube

leon
Author by

leon

Updated on October 03, 2022

Comments

  • leon
    leon over 1 year

    I want to use some algorithms to mine my log data.

    I found a pattern mining framework on: http://www.philippe-fournier-viger.com/spmf/index.php?link=algorithms.php

    I have tried several algorithms, the BIDE+ algorithm performs the best.

    The BIDE+ algorithm is for mining frequent closed sequential patterns from a sequence database.

    Can someone explain the definition about "closed" sequential patterns and open ones?

  • R Claven
    R Claven almost 9 years
    This is going to help w my dissertation. Seriously. Thank you!
  • LancelotHolmes
    LancelotHolmes over 6 years
    First of all,thanks for your survey and SPMF,and here your explanation is quite clear,but the example may be not quite appropriate since the pattern b c appears in three sequences(1,3,4), so a minor change may be better.
  • Phil
    Phil over 6 years
    @LancelotHolmes Thanks for the comment. Yes, you are right. Fixed that error. Glad you like the survey and SPMF :-)
  • linello
    linello about 4 years
    Great explanation of the closed-open concept.
  • Joe Huang
    Joe Huang almost 4 years
    Hi, what about sequential generator patterns? what's generator? I can't find any details about it by google.