An Efficient Parallel FP-Growth Algorithm for Big Data Association Rule Mining

Main Article Content

John Michael Smith

Abstract

The FP-Growth algorithm is a proficient method for correlation analysis, notable for its ability to operate without generating candidate sets and requiring only two database scans. In practical applications, it can represent transaction data within a compressed FP-Tree in memory. This approach addresses the limitations of the Apriori algorithm by decreasing the number of database scans and reducing the candidate sets. However, when applied to large datasets, the FP-Growth algorithm can lead to an excessively large tree structure, straining memory capacity.To address this issue, this paper introduces a parallel FP-Growth algorithm tailored for a big data framework, known as the MRFPG algorithm. This algorithm incorporates load balancing and utilizes intermediate agents to distribute transaction data fragments across a computer cluster. Each node then initiates a map-reduce process to filter out non-frequent items and identify all frequent patterns. Compared to the traditional FP-Growth algorithm, the MRFPG algorithm significantly reduces I/O overhead. Experimental results demonstrate that the MRFPG algorithm is both efficient and rapid.

Article Details

How to Cite
Smith, J. M. (2024). An Efficient Parallel FP-Growth Algorithm for Big Data Association Rule Mining. Journal of Computer Science and Software Applications, 4(1), 1–8. https://doi.org/10.5281/jcssa.v4i1.57
Section
Articles

References

Han J W, Micheline K. Concept and technology of data mining[M]. Beijing: Machinery Industry Press, 2013.

Cui Yan, ZQ Bao. Summary of association rules [J]. Computer application research, 2016, 33(02):33-37.

Zeng,Y,Yin S,Liu J,Zhang M. Research of improved FP-growth algorithm in association rules mining[J]. Scientific Programming, 2015, 46(10),281-285.

Yan Yun, YX Luo. Improvement of FP-Tree algorithm [J]. Computer engineering and design, 2010,31(07):1506-1510.

Rathi,Sheetal,Dhote,C.A. Parallel implementation of FP growth algorithm on XML data using multiple GPU[J]. Advances in Intelligent Systems and Computing, 2015.8, 339,581-589.

Tsai,Chih-Fong. Big data mining with parallel computing:A comparison of distributed and MapReduce methodologies[J]. Journal of Systems and Software, 2016,122,83-92.

DEAN J,GHEMAWAT S. MapReduce:simplified data processing on large clusters[J]. Communications Of The ACM, 2008,51(1):107-113.

SH Li, ZW Lv, DY Che. Maximum frequent itemset mining algorithm based on ordered FP- tree[J]. Northeast Normal University newspaper (NATURAL SCIENCE), 2016,48(2):65-69.

[Pang-NingT, VipinK, Michael S. Introduction to Data Mining[M]. Beijing:People's post and Telecommunications Press, 2011.

LJ Zhou, X Wang. Research on association rules algorithm in cloud environment[J]. Computer engineering and design, 2014,35(02):499-503.

Similar Articles

1 2 3 4 > >> 

You may also start an advanced similarity search for this article.