An Efficient Parallel FP-Growth Algorithm for Big Data Association Rule Mining
Main Article Content
Abstract
The FP-Growth algorithm is a proficient method for correlation analysis, notable for its ability to operate without generating candidate sets and requiring only two database scans. In practical applications, it can represent transaction data within a compressed FP-Tree in memory. This approach addresses the limitations of the Apriori algorithm by decreasing the number of database scans and reducing the candidate sets. However, when applied to large datasets, the FP-Growth algorithm can lead to an excessively large tree structure, straining memory capacity.To address this issue, this paper introduces a parallel FP-Growth algorithm tailored for a big data framework, known as the MRFPG algorithm. This algorithm incorporates load balancing and utilizes intermediate agents to distribute transaction data fragments across a computer cluster. Each node then initiates a map-reduce process to filter out non-frequent items and identify all frequent patterns. Compared to the traditional FP-Growth algorithm, the MRFPG algorithm significantly reduces I/O overhead. Experimental results demonstrate that the MRFPG algorithm is both efficient and rapid.
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.
Mind forge Academia also operates under the Creative Commons Licence CC-BY 4.0. This allows for copy and redistribute the material in any medium or format for any purpose, even commercially. The premise is that you must provide appropriate citation information.
References
Han J W, Micheline K. Concept and technology of data mining[M]. Beijing: Machinery Industry Press, 2013.
Cui Yan, ZQ Bao. Summary of association rules [J]. Computer application research, 2016, 33(02):33-37.
Zeng,Y,Yin S,Liu J,Zhang M. Research of improved FP-growth algorithm in association rules mining[J]. Scientific Programming, 2015, 46(10),281-285.
Yan Yun, YX Luo. Improvement of FP-Tree algorithm [J]. Computer engineering and design, 2010,31(07):1506-1510.
Rathi,Sheetal,Dhote,C.A. Parallel implementation of FP growth algorithm on XML data using multiple GPU[J]. Advances in Intelligent Systems and Computing, 2015.8, 339,581-589.
Tsai,Chih-Fong. Big data mining with parallel computing:A comparison of distributed and MapReduce methodologies[J]. Journal of Systems and Software, 2016,122,83-92.
DEAN J,GHEMAWAT S. MapReduce:simplified data processing on large clusters[J]. Communications Of The ACM, 2008,51(1):107-113.
SH Li, ZW Lv, DY Che. Maximum frequent itemset mining algorithm based on ordered FP- tree[J]. Northeast Normal University newspaper (NATURAL SCIENCE), 2016,48(2):65-69.
[Pang-NingT, VipinK, Michael S. Introduction to Data Mining[M]. Beijing:People's post and Telecommunications Press, 2011.
LJ Zhou, X Wang. Research on association rules algorithm in cloud environment[J]. Computer engineering and design, 2014,35(02):499-503.