MRS builds a reliable, secure, and easy-to-use O&M platform and provides storage and analysis capabilities for massive data, helping enterprises meet their data storage and processing demands. It enables you to quickly build a Hadoop, Hive, or Spark cluster to analyze massive data; an HBase cluster for millisecond-level queries on massive data; or a Kafka and Storm cluster to process large-scale data streams in real time. MRS supports the following scenarios:
- Analysis and computing of massive data: log analysis, clickstream analysis, genomics, and recommender systems.
Users can import the data to be analyzed from OBS or HDFS and submit an analysis job. The following job types are supported:
MapReduce job: Hadoop is supported. MapReduce is used to implement parallel computing of large data sets.
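The map and reduce phases behind such a job can be sketched in plain Python (no Hadoop dependency; the word-count task and function names are illustrative only). On a real cluster, each phase would run in parallel across data splits:

```python
# Minimal stand-in for the two phases of a word-count MapReduce job;
# illustrative only, not the MRS or Hadoop API.
from collections import defaultdict

def map_phase(lines):
    """Emit (word, 1) pairs, as a mapper would for each input split."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    """Sum the counts per word, as the reducer would after shuffling."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

result = reduce_phase(map_phase(["big data", "big cluster"]))
# result == {"big": 2, "data": 1, "cluster": 1}
```

On Hadoop, the framework handles the shuffle step between the two phases, grouping all pairs with the same key before they reach the reducer.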
Spark/Spark SQL job: Spark is supported. Spark provides analysis, mining, and iterative in-memory computing capabilities. Additionally, Spark SQL enables data to be queried and analyzed using SQL statements.
HQL job: Hive is supported. Hive is a data warehouse framework built on Hadoop. It stores structured data that is queried using HQL, an SQL-like language. Hive converts HQL statements into MapReduce or Spark tasks for querying and analyzing massive data stored in clusters.
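As a sketch of what an HQL job might look like, the statements below define an external table over data imported to HDFS and run an aggregate query; the table name, columns, and HDFS path are hypothetical, not part of MRS itself:

```sql
-- Hypothetical example: table, columns, and path are illustrative only.
CREATE EXTERNAL TABLE access_log (
  ip  STRING,
  url STRING,
  ts  TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/access_log';

-- Hive converts this query into MapReduce or Spark tasks.
SELECT url, COUNT(*) AS hits
FROM access_log
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```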
- Quasi-real-time processing of massive data: risk management, CDR (call detail record) queries, and so on.
HBase: HBase is a column-oriented distributed storage system that features high reliability, performance, and scalability. It is designed to overcome the limitations of relational databases in processing massive data and to provide quasi-real-time access to large tables.
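Millisecond access in HBase depends heavily on row-key design, because reads are keyed lookups and range scans over lexicographically sorted rows. One common design pattern, sketched below in plain Python as an assumption (it is not an MRS or HBase API), salts the key to spread writes across regions and reverses the timestamp so a user's newest row sorts first:

```python
# Sketch of a salted, reverse-timestamp row key; a common HBase
# schema pattern, hypothetical here. HBase sorts rows as byte strings,
# so the reversed timestamp makes the newest event sort lowest.
MAX_TS = 10**13  # assumed upper bound on epoch milliseconds

def row_key(user_id: str, ts_millis: int, buckets: int = 16) -> str:
    salt = hash(user_id) % buckets    # spreads writes across regions
    reverse_ts = MAX_TS - ts_millis   # newest event sorts first
    return f"{salt:02d}|{user_id}|{reverse_ts:013d}"

k_new = row_key("user42", 1_700_000_000_000)
k_old = row_key("user42", 1_600_000_000_000)
# Within one user's rows, the newer event sorts before the older one.
assert k_new < k_old
```

A scan starting at the prefix `NN|user42|` then returns that user's most recent events first, which is what quasi-real-time lookups over large tables typically need.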
In the scenarios described above, there are two ways to access the cluster. One is through native interfaces: user applications can call these interfaces when they are deployed in the same subnet as the cluster. The other is through the MRS console, where users can submit jobs and manage files on HDFS.
- Real-time processing of massive data streams: a distributed real-time computing framework analyzes and computes massive data streams, such as click streams and IoT data streams, in real time.
Users can write data streams to message queues such as Kafka. Storm obtains real-time messages from Kafka and performs high-throughput, low-latency computing, analysis, querying, and aggregation in real time, then exports the processed results to HDFS, OBS, or other databases, or pushes the result data directly to the user interface.
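The consume-process-export loop described above can be sketched without a live Kafka or Storm cluster. In the simulation below, a generator stands in for a Kafka topic, the rolling-count step stands in for a Storm bolt, and a list stands in for the HDFS/OBS/database sink; none of these names are MRS APIs:

```python
# Pure-Python simulation of the Kafka -> Storm -> sink pipeline;
# illustrative only, not the Kafka or Storm client API.
from collections import Counter

def kafka_topic(messages):
    """Simulated message queue: yields click events one at a time."""
    yield from messages

def storm_bolt(stream):
    """Simulated bolt: emits a running per-page click count."""
    counts = Counter()
    for page in stream:
        counts[page] += 1
        yield page, counts[page]

# The sink stands in for HDFS, OBS, a database, or a UI push.
sink = list(storm_bolt(kafka_topic(["/home", "/cart", "/home"])))
# sink == [("/home", 1), ("/cart", 1), ("/home", 2)]
```

In a real deployment, the same shape holds, but the queue, bolt, and sink each run as distributed components, so the stream is processed with high throughput and low latency rather than one message at a time in a single process.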