AN Yu, CHEN Gui-fen, LI Jing. Research on Soybean Pre-Micro RNA Prediction Model Based on Recursive Feature Elimination and Random Forest Fusion Algorithm[J]. Soybean Science, 2020, 39(03): 401-405. [doi:10.11861/j.issn.1000-9841.2020.03.0401]
Research on Soybean Pre-Micro RNA Prediction Model Based on Recursive Feature Elimination and Random Forest Fusion Algorithm
- Title:
- Research on Soybean Pre-Micro RNA Prediction Model Based on Recursive Feature Elimination and Random Forest Fusion Algorithm
- Keywords:
- Soybean; Pre-microRNA; Recursive feature elimination (RFE); Random forest (RF); Prediction model
- Document code:
- A
- Abstract:
- As research on the regulatory roles of RNA genes in soybean deepens, using data mining to predict soybean precursor microRNAs (pre-microRNAs) has become an important direction in the field. Conventional random forest (RF) models recognize pre-microRNAs with relatively low accuracy. To address this, the study proposes and builds a soybean pre-microRNA prediction model that fuses recursive feature elimination (RFE) with RF. First, RFE selects the optimal feature subset of soybean pre-microRNA sequences. Next, an RF classifier is trained on the selected features to build the prediction model. Finally, ten-fold cross-validation is used to compare the predictions of the fused RFE-RF model with those of a standalone RF model and a support vector machine (SVM) classifier. The fused model reached an accuracy of 84.62%, which is 17.02% higher than the SVM model and 14.58% higher than the RF model alone. The method offers a new approach to predicting pre-microRNA genes in soybean.
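The three steps in the abstract (RFE feature selection, RF model building, ten-fold cross-validated comparison against RF and SVM) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic stand-ins generated with `make_classification`, and the feature count, number of selected features (`N_SELECTED`), tree count, elimination step, and SVM kernel are all assumed values that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the pre-microRNA feature matrix and labels.
# Assumption: the paper's real sequence features are not reproduced here.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=6, n_redundant=4,
    random_state=0,
)

N_SELECTED = 8  # hypothetical size of the feature subset kept by RFE


def make_rf():
    """Return a fresh random forest with assumed hyperparameters."""
    return RandomForestClassifier(n_estimators=50, random_state=0)


def make_rfe():
    """RFE ranked by random forest importances, dropping 2 features per round."""
    return RFE(estimator=make_rf(), n_features_to_select=N_SELECTED, step=2)


models = {
    # Fused model: RFE picks the subset, then an RF is trained on it.
    "RFE-RF": make_pipeline(make_rfe(), make_rf()),
    # Baselines named in the abstract.
    "RF": make_rf(),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Ten-fold cross-validation, with the same folds for every model.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    for name, model in models.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy = {fold_scores.mean():.4f}")

# Inspect which features RFE keeps when fitted on all samples.
selector = make_rfe().fit(X, y)
selected = [i for i, keep in enumerate(selector.support_) if keep]
print("selected feature indices:", selected)
```

Placing RFE inside the pipeline means features are re-selected within each training fold, so the held-out fold never influences the selection. Whether the original study selected features per fold or once on the full dataset is not stated in the abstract, and accuracies on this synthetic data say nothing about the reported 84.62%.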