
A record of six major technological changes in China’s big data


Collecting the essence of the "Hadoop in China Cloud Computing Conference" and the "CSDN Big Data Technology Conference", the China Big Data Technology Conference (BDTC) has grown into the de facto top technology event for the industry in China. From a 60-person Hadoop salon in 2008 to today's technology feast for thousands of attendees, each China Big Data Technology Conference, as a professional exchange platform of great practical value, has faithfully recorded the technical hot spots in the field of big data, accumulated practical experience from industry, and witnessed the development and evolution of the entire big data ecosystem.

From December 12 to 14, 2014, the 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014) will open at the Crowne Plaza Beijing New Yunnan Hotel. The three-day conference aims to promote the application of big data technology in industry, and plans to set up theme forums and industry summits such as "Big Data Infrastructure", "Big Data Ecosystem", "Big Data Technology", "Big Data Application", "Big Data Internet Finance Technology" and "Intelligent Information Processing". The 2014 Second CCF Big Data Academic Conference, organized by the China Computer Federation, undertaken by the CCF Big Data Expert Committee, and co-organized by Nanjing University and Fudan University, will be held at the same time and will share its keynote reports with the technology conference.

The conference will invite nearly 100 top experts and front-line practitioners in the field of big data technology, at home and abroad, to discuss in depth the latest progress of open source software such as Hadoop, YARN, Spark, Tez, HBase, Kafka, and OceanBase; the development trends of NoSQL/NewSQL, in-memory computing, stream computing, and graph computing; the OpenStack ecosystem's thinking on big data computing needs; and the latest industry applications of visualization, machine learning/deep learning, business intelligence, and data analysis under big data, sharing technical characteristics and practical experience from real production systems.

In the run-up to the conference, we have reviewed the highlights of previous conferences to record the development of China's big data technology field, and we look ahead to the upcoming BDTC 2014 in light of the current state of the ecosystem:

Following the tracks: the six major technological changes of big data

Alongside the growth of the Big Data Technology Conference, we have experienced the arrival of the era of big data technology and applications in China, and witnessed the development and evolution of the entire big data ecosystem:

1. Distribution of computing resources—from grid computing to cloud computing. Looking back at previous BDTC conferences, it is not hard to see that since 2009 the organization and scheduling of resources has gradually shifted from cross-domain distributed grid computing to locally distributed cloud computing. Today, cloud computing has become the platform of choice for provisioning big data resources.

2. Data storage changes—HDFS and NoSQL emerged as the times require. As data formats become ever more diverse, traditional relational storage can no longer meet the needs of new-era applications. New technologies such as HDFS and NoSQL have emerged and become indispensable parts of many large-scale application architectures; together with the growth of customized computers/servers, they have also become some of the hottest technologies in the big data ecosystem.
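To make the contrast with relational storage concrete, here is a minimal sketch (not a real NoSQL engine) of the schemaless key-value model that systems like those mentioned above popularized: each record can carry a different set of fields, so no schema migration is needed when the data format changes. All names here are illustrative.

```python
class KeyValueStore:
    """Toy in-memory key-value store with NoSQL-style flexible values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


store = KeyValueStore()
# Each record may have a different shape -- no fixed table schema required.
store.put("user:1", {"name": "Alice", "email": "alice@example.com"})
store.put("user:2", {"name": "Bob", "tags": ["hadoop", "spark"]})

print(store.get("user:2")["tags"])  # different shape, same store
```

A relational table would force both records into one fixed column set; the key-value model trades that rigidity (and relational queries) for flexibility and easy horizontal partitioning by key.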

3. The computing model changes - Hadoop computing becomes mainstream. To support its search service better and more cheaply, Google created MapReduce and GFS. Inspired by Google's papers, former Yahoo! engineer Doug Cutting launched Hadoop, a software ecosystem entirely different from the high-performance computing model: it moves computation closer to the data. Born with a distinguished pedigree, Hadoop has become the Apache Foundation's hottest open source project and is recognized as the de facto standard for big data processing. Hadoop provides massive data processing capability in a distributed environment at low cost, so Hadoop technology discussions and practice sharing have always been among the most eye-catching features of every China Big Data Technology Conference.
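The programming model behind Hadoop can be sketched in a few lines of plain Python (no cluster required): a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The classic word-count example:

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each group (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


docs = ["big data big ideas", "data center data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["data"])  # "data" appears 3 times across both documents
```

In a real Hadoop job, the map and reduce functions run on the nodes holding the data blocks, and the shuffle moves grouped pairs across the network, which is exactly the "move computation closer to the data" idea described above.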

4. Introduction of stream computing - meeting applications' low-latency data processing needs. As business needs expand, big data has gradually moved beyond offline batch processing. Stream processing frameworks such as Storm, together with the messaging system Kafka, which fully demonstrate real-time performance, scalability, fault tolerance, and flexibility, have given the old message-middleware technology new life and have been a highlight of previous BDTCs.
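The low-latency idea can be illustrated with a hedged sketch (this is not Storm or Kafka code, just the underlying pattern): events are processed one at a time as they arrive, with a running count maintained per fixed-size tumbling window, instead of waiting for a complete batch. The window size is an assumption for this example.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # assumed tumbling-window size for this illustration


def window_of(timestamp):
    """Map an event timestamp to the start of its tumbling window."""
    return timestamp - (timestamp % WINDOW_SECONDS)


def process_stream(events):
    """events: iterable of (timestamp, key); returns counts per (window, key)."""
    counts = defaultdict(int)
    for ts, key in events:  # one event at a time -- no waiting for a batch
        counts[(window_of(ts), key)] += 1
    return counts


stream = [(1, "click"), (3, "view"), (9, "click"), (12, "click")]
counts = process_stream(stream)
print(counts[(0, "click")])  # two clicks fell in window [0, 10)
```

A batch system would only report these counts after the whole dataset was collected; a stream processor keeps them continuously up to date, which is what "low latency" means in practice.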

5. In-memory computing takes shape—upstart Spark dares to challenge the veterans. Spark originated from the cluster computing platform of AMPLab at the University of California, Berkeley. Based on in-memory computing, it started from multi-iteration batch processing and is compatible with multiple computing paradigms such as data warehousing, stream processing, and graph computing, a rare all-rounder. In just four years, Spark has grown into a top-level project of the Apache Software Foundation with 30 committers. Its users include IBM, Amazon, Yahoo!, Sohu, Baidu, Alibaba, Tencent, and many other well-known companies, and it encompasses related projects such as Spark SQL, Spark Streaming, MLlib, and GraphX. There is no doubt that Spark has found its footing.
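The core idea behind Spark's advantage on multi-iteration workloads can be shown with a plain-Python sketch (this is not the Spark API): the dataset is loaded once, kept in memory, and reused across iterations, rather than being re-read from disk on every pass as in classic MapReduce pipelines.

```python
def load_dataset():
    """Stand-in for an expensive disk/HDFS read (illustrative only)."""
    return list(range(1, 6))


def iterate(data, passes):
    """Repeatedly transform the cached in-memory dataset."""
    for _ in range(passes):
        data = [x * 2 for x in data]  # each pass reuses data already in memory
    return data


cached = load_dataset()          # read from storage exactly once, then cache
result = iterate(cached, passes=3)
print(result)                    # each element doubled three times
```

Iterative algorithms such as machine learning training loops make many passes over the same data, so avoiding a disk round-trip per pass is precisely where an in-memory engine pays off.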

6. Evolution of relational database technology—NewSQL rewrites database history. Research and development of relational database systems has never stopped, with continuous progress in horizontal scalability, high availability, and high performance. For online analytical processing (OLAP), the most urgent practical demand is for MPP (Massively Parallel Processing) databases, including MPP databases' adoption of new technologies from the big data field such as multi-replica and columnar storage. Databases oriented toward online transaction processing (OLTP) are evolving toward high performance, with high throughput and low latency as their goals; technical trends include fully in-memory and lock-free designs.
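The columnar-storage technique mentioned above can be demonstrated with a small sketch: for an OLAP aggregate that touches a single attribute, a column-oriented layout scans only that attribute's array, while a row store must walk every full row. The table contents are made up for illustration.

```python
# Row-oriented layout: one tuple per record.
rows = [
    ("2014-12-12", "Beijing",  120),
    ("2014-12-13", "Beijing",   95),
    ("2014-12-14", "Shanghai",  80),
]

# Column-oriented layout: one array per attribute.
columns = {
    "date":   ["2014-12-12", "2014-12-13", "2014-12-14"],
    "city":   ["Beijing", "Beijing", "Shanghai"],
    "amount": [120, 95, 80],
}

# SELECT SUM(amount): the column store reads only the "amount" array,
# while the row store must touch every full row to reach field index 2.
total_row_store = sum(r[2] for r in rows)
total_col_store = sum(columns["amount"])
print(total_col_store)  # same answer from both layouts
```

On disk the difference is I/O, not just loop shape: a columnar scan reads one densely packed, highly compressible column instead of every row, which is why MPP analytical databases adopted the layout.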

Setting sail: the development of the big data ecosystem in 2014

Time flies, and in the blink of an eye the 2014 China Big Data Technology Conference will be held as scheduled. With technology advancing day by day, what insights can we expect from BDTC 2014? Here we might as well look at the current technology trends:

1. With MapReduce in decline, can YARN/Tez achieve greater success? For Hadoop, 2014 has been an exciting year: EMC, Microsoft, Intel, Teradata, Cisco, and many other giants have increased their investment in Hadoop. Yet the year has not been easy for many organizations. Given MapReduce's shortcomings in real-time processing and organizations' need for a more general big data processing platform, the move to Hadoop 2.0 has become imperative. What challenges will organizations encounter during the transition? How can they better exploit the new capabilities brought by YARN? What major changes lie ahead for Hadoop? To address these questions, BDTC 2014 has invited top international Hadoop experts, including Uma Maheswara Rao G (Apache Hadoop committer and Project Management Committee member), Yi Liu (Apache Hadoop committer), and Bikas Saha (PMC member of Apache Hadoop and Tez), for face-to-face discussion.

2. Times have changed, and the future of stream computing frameworks such as Storm and Kafka is uncertain. If the slowness of MapReduce created opportunities for many stream computing frameworks, then as the components of the Hadoop ecosystem mature and Spark becomes easier to use, what awaits these frameworks? The nearly one hundred practice-sharing sessions at BDTC 2014 offer a chance to find out, or to talk with the experts face to face.

3. Spark: subversion or supplement? Its compatibility with the Hadoop ecosystem has enabled Spark to develop rapidly.

However, according to the results recently announced by Sort Benchmark, in sorting massive (100 TB) offline data, Spark used less than one-tenth as many machines as the previous champion Hadoop and completed the same sort in only one-third of the time. Clearly, Spark is no longer limited to real-time computing; its goal is a general big data processing platform, and the decision to terminate Shark in favor of Spark SQL may be a sign of this. So, when Spark grows more mature and supports offline computing more natively, who will take the crown of the standard open source big data processing platform? We look forward to finding out together.

4. At the infrastructure layer, what can we use to improve the network? Today, the network has become a focus for many big data processing platforms. For example, to overcome network bottlenecks, Spark replaced its original NIO-based network module with a new one based on Netty, improving the utilization of network bandwidth. So how can we overcome network bottlenecks at the infrastructure layer? How much performance can more efficient network hardware, such as InfiniBand, deliver directly? A smarter network that adaptively adjusts to the data-transfer demands of each computation stage, such as the split/merge phases, could improve not only speed but also utilization. At BDTC 2014, we can learn valuable experience from talks on InfiniBand/RDMA technologies and applications, as well as several SDN case studies.

5. The soul of data mining—machine learning. In recent years, competition for machine learning talent has become fierce. Google, IBM, Microsoft, Baidu, Alibaba, and Tencent are investing ever more in the field, spanning chip design, system architecture (heterogeneous computing), software systems, model algorithms, and deep applications. Big data marks the arrival of a new era, and petabytes of data put people atop a mountain of gold; but without intelligent algorithms, the soul of machine learning, extracting that value would be no more than an illusion. This conference has also prepared several machine learning sessions awaiting your participation.
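As a toy illustration of the "intelligent algorithms" point, here is batch gradient descent fitting a one-parameter linear model y = w·x on a tiny synthetic dataset. This is a teaching sketch, not any specific system presented at the conference; the learning rate and iteration count are assumptions.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by the true rule y = 2x

w = 0.0    # initial guess for the weight
lr = 0.05  # learning rate (assumed for this example)

for _ in range(200):
    # Gradient of the mean squared error 0.5 * mean((w*x - y)^2) w.r.t. w.
    grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # step against the gradient

print(round(w, 3))  # converges to the true weight 2.0
```

The same loop, scaled to billions of examples and millions of parameters, is what turns a "mountain of gold" of raw data into usable value, and it is exactly the multi-pass workload that in-memory platforms like Spark's MLlib are built for.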

In addition to the technology sharing, the 2014 Second CCF Big Data Academic Conference will be held concurrently and will share its keynote reports with the technology conference. There we will also be able to hear many of the latest research results from academia.
