COMAND: Context-Aware Machine Learning Middleware for Real-Time Distributed Streaming Big Data Analysis
Emerging applications and instrumented systems produce massive volumes of streaming data every day. This streaming big data must be processed and analyzed quickly and accurately to extract useful information for timely decision making. Although substantial work has been proposed for streaming data processing and analysis (e.g., SAMOA and Jubatus), many shortcomings remain unaddressed. InfoBeyond proposes the Context-Aware Machine Learning Middleware for Real-Time Distributed Streaming Big Data Analysis (COMAND) to address the challenges of distributed analysis of massive streaming data. COMAND includes machine learning (ML)-based algorithms designed to ease its implementation and integration with existing infrastructures and algorithms. It is realized through a two-stage operational architecture for efficient and accurate processing of massive streaming data. We will first design and develop a middleware architecture that provides efficient and scalable processing in distributed environments. Next, we will analyze and develop three key algorithms for the middleware: Distributed Streaming Data Clustering (DISDC), Distributed Optimal Context-Aware Data Classification (DOCDC), and Real-time Publish/Subscribe Data Model Update (RPSDMU). The system is wrapped in APIs and adapters that facilitate the implementation and integration of COMAND with existing systems.
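The abstract names DISDC and RPSDMU but does not describe their internals. As a rough illustration of the general pattern such components commonly follow, the sketch below assumes that each node summarizes its own stream with sequential (online) k-means, that a coordinator merges the weighted local centroids using farthest-first seeding plus a few weighted Lloyd iterations, and that the merged model is pushed to subscribers through an in-process publish/subscribe channel. All names (`LocalStreamClusterer`, `merge_summaries`, `ModelBroker`) are hypothetical. This is not COMAND's actual design, and neither context awareness nor DOCDC is modeled.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Summary = Tuple[Point, int]  # (centroid, number of points it absorbed)


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class LocalStreamClusterer:
    """Hypothetical per-node summarizer: sequential (MacQueen-style) k-means."""

    def __init__(self, k: int):
        self.k = k
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, x: Point) -> None:
        # The first k points seed the centroids directly.
        if len(self.centroids) < self.k:
            self.centroids.append(list(x))
            self.counts.append(1)
            return
        # Assign to the nearest centroid and move it toward x (running mean).
        j = min(range(self.k), key=lambda i: _dist2(self.centroids[i], x))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]
        c = self.centroids[j]
        for d in range(len(c)):
            c[d] += eta * (x[d] - c[d])

    def summary(self) -> List[Summary]:
        return [(tuple(c), n) for c, n in zip(self.centroids, self.counts)]


def merge_summaries(summaries: List[List[Summary]], k: int, iters: int = 10) -> List[Point]:
    """Hypothetical coordinator step: weighted k-means over all local centroids."""
    pts = [s for node in summaries for s in node]
    if not pts:
        return []
    k = min(k, len(pts))
    # Farthest-first seeding, starting from the heaviest summary.
    centers = [list(max(pts, key=lambda p: p[1])[0])]
    while len(centers) < k:
        far = max(pts, key=lambda p: min(_dist2(c, p[0]) for c in centers))
        centers.append(list(far[0]))
    dim = len(centers[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in centers]
        weights = [0] * len(centers)
        for c, w in pts:
            j = min(range(len(centers)), key=lambda i: _dist2(centers[i], c))
            weights[j] += w
            for d, v in enumerate(c):
                sums[j][d] += w * v
        for j in range(len(centers)):
            if weights[j]:
                centers[j] = [s / weights[j] for s in sums[j]]
    return [tuple(c) for c in centers]


class ModelBroker:
    """Hypothetical in-process publish/subscribe channel for model updates."""

    def __init__(self):
        self._subs: List[Callable[[int, List[Point]], None]] = []
        self.version = 0

    def subscribe(self, callback: Callable[[int, List[Point]], None]) -> None:
        self._subs.append(callback)

    def publish(self, model: List[Point]) -> None:
        # Each publication gets a new version number, delivered to every subscriber.
        self.version += 1
        for cb in self._subs:
            cb(self.version, model)


if __name__ == "__main__":
    random.seed(0)
    nodes = [LocalStreamClusterer(k=2) for _ in range(3)]
    for node in nodes:
        for i in range(200):
            base = (0.0, 0.0) if i % 2 == 0 else (10.0, 10.0)
            node.update(tuple(b + random.gauss(0, 0.5) for b in base))
    broker = ModelBroker()
    broker.subscribe(lambda v, m: print(f"model v{v}:", [tuple(round(x, 2) for x in c) for c in m]))
    broker.publish(merge_summaries([n.summary() for n in nodes], k=2))
```

Shipping only (centroid, count) summaries instead of raw points keeps coordinator traffic proportional to k per node rather than to stream volume. The trade-off is that local k-means seeded from the first k points can converge poorly when the stream arrives in an unfavorable order.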