Sequential Pattern Extraction from Server Access Logs Using the PLPC-Tree Algorithm

This contribution concerns mining frequent sequential patterns from click-stream data, an application that contends with many challenges, such as limited memory for unbounded data streams. Existing work on mining frequent patterns in data streams mostly targets non-sequential patterns, while sequential approaches such as the WAP-tree mine frequent sequences by recursively reconstructing intermediate trees, starting with suffix sequences and ending with prefix sequences. This paper proposes an algorithm that uses the PLPC-tree data structure to handle the complexity of mining frequent sequential patterns in data streams by entirely eliminating the repeated reconstruction of intermediate WAP-trees during mining. The proposed algorithm constructs the tree while finding the frequent individual events, then builds the frequent header node links of the original WAP-tree in an ordered fashion, and uses the position code of each node to identify ancestor/descendant relationships between nodes of the tree. It then finds each frequent sequential pattern through a progressive prefix sequence search, starting with the first event of its prefix subsequence. Experiments show a performance gain over the WAP-tree technique.
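The abstract does not give the exact PLPC-tree position-code scheme or mining procedure, so the following is only a minimal sketch under stated assumptions. It assumes a PLWAP-style binary position code (leftmost child: parent code + "1"; any other child: its left sibling's code + "0"), under which node a is an ancestor of node b exactly when code(a) + "1" is a prefix of code(b). It builds the tree while counting frequent events, links each event's nodes in pre-order, and grows each pattern from its first event by selecting the first occurrences of the next event below the current prefix nodes, so no intermediate tree is rebuilt. The names `Node`, `build_tree`, `header_links`, `mine` and `is_ancestor` are hypothetical, and supports are absolute sequence counts.

```python
from collections import defaultdict


class Node:
    def __init__(self, event, parent, code):
        self.event = event      # event label (None for the root)
        self.count = 0          # number of sequences passing through this node
        self.parent = parent
        self.code = code        # assumed PLWAP-style binary position code
        self.children = []      # children in insertion (left-to-right) order


def is_ancestor(a, b):
    # Assumed rule: a is a strict ancestor of b iff code(a) + "1" prefixes code(b).
    return b.code.startswith(a.code + "1")


def build_tree(sequences, min_sup):
    # Support of an event = number of sequences that contain it at least once.
    support = defaultdict(int)
    for seq in sequences:
        for e in set(seq):
            support[e] += 1
    frequent = {e for e, s in support.items() if s >= min_sup}
    root = Node(None, None, "")
    for seq in sequences:
        node = root
        for e in (x for x in seq if x in frequent):  # drop infrequent events
            child = next((c for c in node.children if c.event == e), None)
            if child is None:
                if node.children:   # right sibling: left sibling's code + "0"
                    code = node.children[-1].code + "0"
                else:               # leftmost child: parent's code + "1"
                    code = node.code + "1"
                child = Node(e, node, code)
                node.children.append(child)
            child.count += 1
            node = child
    return root, frequent


def header_links(root):
    # Link every node of each event in pre-order (the "ordered" header links).
    links = defaultdict(list)
    stack = list(reversed(root.children))
    while stack:
        n = stack.pop()
        links[n.event].append(n)
        stack.extend(reversed(n.children))
    return links


def mine(root, links, frequent, min_sup):
    patterns = {}

    def grow(prefix, roots):
        for e in sorted(frequent):
            chosen = []
            # Pre-order scan: keep only the first e-node on each branch below a root.
            for n in links[e]:
                if any(is_ancestor(r, n) for r in roots) and not any(
                    is_ancestor(c, n) for c in chosen
                ):
                    chosen.append(n)
            sup = sum(n.count for n in chosen)
            if sup >= min_sup:
                pattern = prefix + (e,)
                patterns[pattern] = sup
                grow(pattern, chosen)  # extend the prefix from these nodes

    grow((), [root])
    return patterns
```

For example, with `min_sup=3` the sequences `abdac`, `eaebcac`, `babfaec` and `afbacfc` keep only `a`, `b` and `c` as frequent events, and `('a', 'b')` is reported with support 4. The ancestor test is a linear scan over the current prefix nodes, which keeps the sketch short; a real implementation would likely exploit the pre-order of the header links more aggressively.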
