International Journal of Distributed and Parallel Systems (IJDPS) Vol.8, No.3, May 2017
[21]S. Guha, R. Hafen, J. Rounds, J. Xia, J. Li, B. Xi, W. S. Cleveland: “Large Complex Data: Divide and
Recombine (D&R) with RHIPE”, Stat, 2012, to appear.
[22]Sergei V Chekanov: Scientific data analysis using Jython Scripting and Java. Springer Science &
Business Media, 2010.
[23]Dierk Koenig, Andrew Glover, Paul King, Guillaume Laforge, Jon Skeet: Groovy in action. Manning,
2007.
[24]J Uma Mahesh, R Madan Mohan: “Real World Application of Big Data in Data Mining Tools”.
[25]Aditya B Patel, Manashvi Birla, Ushma Nair: “Addressing big data problem using Hadoop and Map
Reduce”, Engineering (NUiCONE), 2012 Nirma University International Conference on, pp. 1-5,
2012.
[26]Brian Babcock, Mayur Datar, Rajeev Motwani: “Sampling from a moving window over streaming
data”, Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, pp.
633-634, 2002.
[27]Nele Verbiest, Enislay Ramentol, Chris Cornelis, Francisco Herrera: “Preprocessing noisy
imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection”, Applied Soft
Computing, pp. 511-517, 2014.
[28]Guohua Liang, Chengqi Zhang: “An efficient and simple under-sampling technique for imbalanced
time series classification”, Proceedings of the 21st ACM international conference on Information and
knowledge management, pp. 2339-2342, 2012.
[29]Wei Pan, Kun She, Pengyuan Wei, Kai Zeng: “Nearest Neighbor Condensation Based on Fuzzy Rough
Set for Classification”, in Rough Sets and Knowledge Technology. Springer, 2014.
[30]Yufeng Ding, Jeffrey S Simonoff: “An investigation of missing data methods for classification trees
applied to binary response data”, Journal of Machine Learning Research, pp. 131-170, 2010.
[31]Qiang Yang, Charles Ling, Xiaoyong Chai, Rong Pan: “Test-cost sensitive classification on data with
missing values”, IEEE Transactions on Knowledge and Data Engineering, pp. 626-638, 2006.
[32]Senzhang Wang, Zhoujun Li, Wenhan Chao, Qinghua Cao: “Applying adaptive over-sampling
technique based on data density and cost-sensitive SVM to imbalanced learning”, Neural Networks
(IJCNN), The 2012 International Joint Conference on, pp. 1-8, 2012.
[33]Dragan Gamberger, Nada Lavrač, Sašo Džeroski: “Noise elimination in inductive concept learning: A
case study in medical diagnosis”, International Workshop on Algorithmic Learning Theory, pp. 199-212,
1996.
[34]Dragan Gamberger, Nada Lavrac, Ciril Groselj: “Experiments with noise filtering in a medical
domain”, ICML, pp. 143-151, 1999.
[35]Sofie Verbaeten, Anneleen Van Assche: “Ensemble methods for noise elimination in classification
problems”, International Workshop on Multiple Classifier Systems, pp. 317-325, 2003.
[36]Rehan Akbani, Stephen Kwek, Nathalie Japkowicz: “Applying support vector machines to
imbalanced datasets”, European conference on machine learning, pp. 39-50, 2004.
[37]Yuchun Tang, Yan-Qing Zhang, Nitesh V Chawla, Sven Krasser: “SVMs modeling for highly
imbalanced classification”, IEEE Transactions on Systems, Man, and Cybernetics, Part B
(Cybernetics), pp. 281-288, 2009.
[38]Konstantinos Veropoulos, Colin Campbell, Nello Cristianini, et al.: “Controlling the sensitivity of
support vector machines”, Proceedings of the international joint conference on AI, pp. 55-60, 1999.
[39]Markus Hofmann, Ralf Klinkenberg: RapidMiner: Data mining use cases and business analytics
applications. CRC Press, 2013.
[40]Roland Bouman, Jos Van Dongen: Pentaho solutions: business intelligence and data warehousing
with Pentaho and MySQL. Wiley Publishing, 2009.
[41]Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, Geoffrey
Fox: “Twister: a runtime for iterative mapreduce”, Proceedings of the 19th ACM International
Symposium on High Performance Distributed Computing, pp. 810-818, 2010.
[42]Sangwon Seo, Edward J Yoon, Jaehong Kim, Seongwook Jin, Jin-Soo Kim, Seungryoul Maeng:
“Hama: An efficient matrix computation with the mapreduce framework”, Cloud Computing
Technology and Science (CloudCom), 2010 IEEE Second International Conference on, pp. 721-726,
2010.
[43]Yucheng Low, Joseph E Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E Guestrin, Joseph
Hellerstein: “Graphlab: A new framework for parallel machine learning”, arXiv preprint
arXiv:1408.2041, 2014.