Nature selected 10 computer code projects that changed science.
In 2019, the Event Horizon Telescope team gave the world its first look at a black hole. However, the image of a glowing, ring-shaped object that the researchers released is not a conventional photograph; it was computed. Using data captured by radio telescopes in the United States, Mexico, Chile, Spain and Antarctica, the researchers applied mathematical transformations and synthesized the now-iconic picture. The team also released the programming code used to accomplish this feat alongside the articles documenting the discovery, so that other researchers could build on the analysis.
This pattern is becoming more and more common. From astronomy to zoology, computers are behind every major scientific discovery of modern times. Michael Levitt, a computational biologist at Stanford University in California, shared the 2013 Nobel Prize in Chemistry with two other researchers for "the development of multiscale models for complex chemical systems". He points out that today's laptops have about 10,000 times the memory and clock speed of the computer built in his laboratory in 1967, when he began the work that later earned the prize. "We do have considerable computing power today," he said. "The problem is that we still need to think."
Without software that can tackle research problems, and researchers who know how to write and use it, a computer is useless, no matter how powerful. Scientific research today is fundamentally tied to software, which has penetrated every aspect of research work. Recently, Nature looked behind the scenes at the key pieces of code that have changed scientific research over the past few decades and listed 10 of them.
This CDC 3600 computer was delivered to the National Center for Atmospheric Research in Boulder, Colorado, in 1963; researchers programmed it with the help of a Fortran compiler.
Pioneer of Language: Fortran Compiler (1957)
The first modern computers were not easy to operate. Programming was literally done by hand, by connecting banks of circuits with wires. Later, machine language and assembly language appeared, allowing users to program computers in code. However, both languages required a deep understanding of the computer's architecture, which put them out of reach of many scientists.
This changed in the 1950s with the development of symbolic languages, especially Fortran, the "formula translation" language developed by John Backus and his IBM team in San Jose, California. With Fortran, users could program with human-readable instructions, such as x = 3 + 5. A compiler then converted these instructions into fast, efficient machine code.
The process was still not easy. Early programmers used punched cards to input code, and a complex simulation might require tens of thousands of them. Nevertheless, Syukuro Manabe, a climatologist at Princeton University in New Jersey, points out that Fortran made programming accessible to researchers who were not computer scientists. "This is the first time that we can program the computer ourselves," he said. He and his colleagues used the language to develop one of the earliest successful climate models.
Now in its eighth decade, Fortran is still widely used in climate modeling, fluid dynamics, computational chemistry and other disciplines that involve complex linear algebra and need powerful computers to crunch numbers quickly. The code it produces is fast, and there are still many programmers who know how to write it. Vintage Fortran code bases remain active in laboratories and on supercomputers all over the world. "Previous programmers knew what they were doing," said Frank Giraldo, an applied mathematician and climate modeler at the Naval Postgraduate School in Monterey, California. "They were very mindful of memory, because they had so little of it."
Signal processor: fast Fourier transform (1965)
When radio astronomers scan the sky, they capture a cacophony of complex signals that change with time. To understand the nature of these radio waves, they need to see what the signals look like as a function of frequency. A mathematical process called the Fourier transform lets researchers do this, but it is inefficient: for a data set of size N, it requires N^2 calculations.
In 1965, American mathematicians James Cooley and John Tukey came up with a method to speed up this process. Using recursion (a programming approach that solves a problem by repeatedly decomposing it into similar subproblems), the fast Fourier transform (FFT) reduces the computation of a Fourier transform to N log2(N) steps. The advantage grows as N increases: for 1,000 points, the speedup is about 100-fold; for 1 million points, it is about 50,000-fold.
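The difference between the two operation counts can be illustrated with a short sketch. The Python code below is an illustration only, not Cooley and Tukey's original implementation: `naive_dft`, `fft_recursive` and `speedup` are hypothetical names, the recursive version is the textbook radix-2 variant and assumes the input length is a power of two, and `speedup` simply evaluates the ratio N^2 / (N log2 N).

```python
import cmath
import math


def naive_dft(x):
    """Direct Fourier transform: N outputs, each a sum over N inputs (N^2 work)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_recursive(x):
    """Textbook radix-2 FFT (assumption: len(x) is a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    # Split into two half-size subproblems and solve each recursively.
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def speedup(n):
    """Ratio of N^2 operations to N * log2(N) operations."""
    return n * n / (n * math.log2(n))


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]  # made-up sample data
slow = naive_dft(signal)
fast = fft_recursive(signal)
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print(max_diff)                   # only floating-point rounding error
print(round(speedup(1000)))       # roughly 100
print(round(speedup(1_000_000)))  # roughly 50,000
```

Both functions return the same spectrum; only the amount of work differs. Production code would normally call an optimized library rather than a recursive Python function.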
This "discovery" was actually a rediscovery: the German mathematician Carl Friedrich Gauss worked it out in 1805 but never published it. Cooley and Tukey did. Their work opened up applications of the Fourier transform in digital signal processing, image analysis, structural biology and other fields, and it is regarded as one of the major events in applied mathematics and engineering. The FFT has been implemented in code many times. One popular implementation is FFTW, whose name stands for the "Fastest Fourier Transform in the West".
Paul Adams, director of the Molecular Biophysics and Integrated Bioimaging Division at Lawrence Berkeley National Laboratory in California, recalled that when he refined the structure of the bacterial protein GroEL in 1995, the calculation took "many hours or even days", even with the FFT and a supercomputer. "If I try to do this without FFT, I don't know how to do it in reality," he said. "It would have taken forever."
Molecular Cataloguing: Biological Database (1965)
Databases are an indispensable part of today's scientific research, so it is easy to forget that they are also driven by software. In the past few decades, the scale of database resources has expanded rapidly, affecting many fields, but perhaps no field has changed more dramatically than biology.
The Protein Data Bank holds more than 170,000 molecular structures, including this bacterial "expressome", which combines the processes of RNA and protein synthesis.
Today's huge genome and protein databases stem from the work of American physical chemist Margaret Dayhoff, a pioneer of bioinformatics. In the early 1960s, as biologists worked to determine the amino acid sequences of proteins, Dayhoff began collating this information to find clues about the evolutionary relationships between species. In 1965, she and three co-authors published the Atlas of Protein Sequence and Structure, describing the sequences, structures and similarities of the 65 proteins known at that time. Historian Bruno Strasser wrote in 2010 that this was the first data collection not tied to a specific research question. It encoded its data on punched cards, which made it possible to expand and search the database.
Other "computerized" biological databases followed. The Protein Data Bank went into use in 1971 and now records more than 170,000 macromolecular structures in detail. Russell Doolittle, an evolutionary biologist at the University of California, San Diego, created another protein database, named Newat, in 1981. In 1982, the National Institutes of Health (NIH) worked with several institutions to establish GenBank, an open-access DNA sequence database.
These resources proved their worth in July 1983. Teams led by protein biochemist Michael Waterfield of the Imperial Cancer Research Fund in London and by Doolittle independently reported a similarity between the sequence of a particular human growth factor and that of a protein from a virus that causes cancer in monkeys. The observation suggested a mechanism for virus-induced tumors: by imitating a growth factor, the virus can induce uncontrolled cell growth. James Ostell, former director of the National Center for Biotechnology Information (NCBI), said: "This result has inspired some biologists who are not interested in computers and statistics: we can learn about cancer by comparing sequences."
Ostell also said that this discovery marked the "arrival of objective biology". In addition to designing experiments to test specific hypotheses, researchers could mine public data sets for connections that those who actually collected the data might never have thought of. This power increases dramatically when different data sets are linked together. NCBI programmers achieved this in 1991 with Entrez, a tool that lets researchers freely search and move between DNA, protein and literature records.
Forecast leader: general circulation model (1969)
At the end of World War II, computer pioneer John von Neumann began turning computers that had been used a few years earlier to calculate ballistic trajectories and weapon designs to the problem of weather prediction. Before that, "weather forecast is only empirical", that is, it relied on experience and intuition to predict what would happen next. In contrast, von Neumann's team "tries to make numerical weather prediction according to the laws of physics".
Venkatramani Balaji, head of the Modeling Systems Division at the Geophysical Fluid Dynamics Laboratory of the National Oceanic and Atmospheric Administration (NOAA) in Princeton, New Jersey, said that the equations had been known for decades, but early meteorologists could not solve them in practice. Doing so means entering the current conditions, calculating how they will change over a short time, and repeating the step again and again. The process was so time-consuming that the mathematics could not be completed before the weather itself arrived. In 1922, the mathematician Lewis Fry Richardson spent several months calculating a 6-hour forecast for Munich, Germany. According to one history, his results were "extremely inaccurate", including predictions that were "impossible under any known land conditions". Computers made the problem tractable.
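The "enter the current conditions, advance a short time, repeat" procedure can be sketched in a few lines. The example below is a toy illustration, not a weather model: it applies the simplest time-stepping scheme (forward Euler) to a single made-up cooling equation, dT/dt = -k (T - T_env). The function names, the constants `k` and `t_env`, and the starting temperature are all assumptions chosen for illustration.

```python
import math


def step(temp, dt, k=0.1, t_env=10.0):
    """Advance one short time step: new state = old state + rate of change * dt."""
    return temp + dt * (-k * (temp - t_env))


def forecast(initial_temp, hours, dt=0.01):
    """Repeat the short step until the forecast horizon is reached."""
    temp = initial_temp
    for _ in range(round(hours / dt)):
        temp = step(temp, dt)
    return temp


# Six-hour "forecast" from a hypothetical starting temperature of 20 degrees.
predicted = forecast(initial_temp=20.0, hours=6.0)
# This toy equation has a known exact solution, so the stepping can be checked.
exact = 10.0 + (20.0 - 10.0) * math.exp(-0.1 * 6.0)
print(predicted, exact)
```

Even this one-variable example needs 600 steps for a six-hour horizon. A real forecast repeats a far more expensive step for every grid cell, which is why the calculation was impractical by hand.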
In the late 1940s, von Neumann set up a weather forecasting group at the Institute for Advanced Study in Princeton. In 1955, a second team, the Geophysical Fluid Dynamics Laboratory, began work on what von Neumann called the "infinite forecast", namely climate modeling.
Syukuro Manabe joined the climate modeling team in 1958 and began working on atmospheric models. His colleague Kirk Bryan worked on models of the ocean. In 1969, they successfully combined the two, creating what Nature in 2006 called a "milestone" in scientific computing.
Today's models can divide the Earth's surface into squares of 25 × 25 kilometers and the atmosphere into dozens of layers. In contrast, Manabe and Bryan's combined ocean-atmosphere model used squares 500 kilometers on a side and nine atmospheric levels, and covered only one sixth of the globe. Even so, Venkatramani Balaji said that "this model is very good", enabling the research team to predict the impact of rising carbon dioxide levels by computer for the first time.
Number cruncher: BLAS (1979)
Scientific computing usually involves relatively simple mathematical operations on vectors and matrices; there are just a great many of them. In the 1970s, however, there was no universally agreed set of computing tools for performing these operations. As a result, programmers working in science spent their time devising efficient code for basic mathematics instead of focusing on scientific problems.
The Cray-1 supercomputer at Lawrence Livermore National Laboratory in California. Before the BLAS programming tool appeared in 1979, researchers working on the Cray-1 had no linear algebra standard.
The programming world needed a standard, and in 1979 it got one: the Basic Linear Algebra Subprograms (BLAS). BLAS is an application programming interface (API) standard that specifies how numerical libraries expose basic linear algebra operations, such as vector or matrix multiplication. The standard continued to evolve until 1990 and defines dozens of basic routines for vector mathematics and, later, matrix mathematics.
Jack Dongarra, a computer scientist at the University of Tennessee and a member of the BLAS development team, said that, in effect, BLAS reduced matrix and vector mathematics to basic units of computation as fundamental as addition and subtraction.
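As a rough illustration of what such basic units compute, the sketch below gives plain-Python reference versions of three operations named after well-known BLAS routines: AXPY (vector-vector), GEMV (matrix-vector) and GEMM (matrix-matrix). This is not the BLAS interface itself. Real routines take additional arguments such as dimensions and strides, update their output in place, and are heavily optimized; the list-based signatures here are assumptions made for readability.

```python
def axpy(alpha, x, y):
    """Vector-vector operation: return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(alpha, a, x, beta, y):
    """Matrix-vector operation: return alpha * (A x) + beta * y."""
    return [
        alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
        for row, yi in zip(a, y)
    ]


def gemm(alpha, a, b, beta, c):
    """Matrix-matrix operation: return alpha * (A B) + beta * C."""
    cols_b = list(zip(*b))  # columns of B, so each entry is a row-by-column dot product
    return [
        [
            alpha * sum(aik * bkj for aik, bkj in zip(row, col)) + beta * cij
            for col, cij in zip(cols_b, c_row)
        ]
        for row, c_row in zip(a, c)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.0, 1.0], [1.0, 0.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(axpy(2.0, [1.0, 2.0], [10.0, 20.0]))        # [12.0, 24.0]
print(gemv(1.0, a, [1.0, 1.0], 0.0, [0.0, 0.0]))  # [3.0, 7.0]
print(gemm(1.0, a, b, 0.0, zeros))                # [[2.0, 1.0], [4.0, 3.0]]
```

Because the names and semantics are fixed by the standard, code written against them can be linked to whichever optimized implementation a given machine provides.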
Robert van de Geijn, a computer scientist at the University of Texas at Austin, pointed out that BLAS "may be the most important interface defined for scientific computing". Besides providing standardized names for commonly used functions, it lets researchers be sure that BLAS-based code will work in the same way on any computer. The standard also enables computer manufacturers to optimize their BLAS implementations so that they run quickly on their own hardware.
More than 40 years on, BLAS represents the core of the scientific computing stack, that is, the code that makes scientific software run. Lorena Barba, a mechanical and aerospace engineer at George Washington University in the United States, called it "the machinery inside five layers of code". Jack Dongarra said, "It provides the basic structure for our calculation."
Microscopy essential: NIH Image (1987)
In the early 1980s, programmer Wayne Rasband was working in a brain-imaging laboratory at the National Institutes of Health in Bethesda, Maryland. The laboratory had a scanner that could digitize X-ray films, but no way to display or analyze the results on its computer. So Rasband wrote a program to do it.
The program was designed specifically for a PDP-11 minicomputer costing US$150,000, a rack-mounted machine that was clearly not suited to personal use. Then, in 1987, Apple released the Macintosh II, a friendlier and more affordable option. Rasband said: "In my opinion, this is obviously a better laboratory image analysis system." He ported the software to the new platform, renamed it, and seeded an image-analysis ecosystem.
NIH Image and its descendants enable researchers to view and quantify almost any image on any computer. The software family includes ImageJ, a Java-based version that Rasband wrote for Windows and Linux users, and Fiji, a distribution of ImageJ that bundles key plug-ins, developed by Pavel Tomancak's team at the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden, Germany. "ImageJ is undoubtedly the most basic tool we have," said Beth Cimini, a computational biologist on the imaging platform of the Broad Institute (founded by MIT and Harvard University). "I have never talked to a biologist who has used a microscope but never used ImageJ or Fiji."
Rasband said that part of the reason may be that the tools are free. But Kevin Eliceiri, a biomedical engineer at the University of Wisconsin-Madison whose team has led ImageJ development since Rasband retired, points out that another reason is that users can easily customize the tool to their own needs. ImageJ has a deceptively simple, minimalist user interface that has remained basically unchanged since the 1990s. Yet thanks to its built-in macro recorder (which lets users save a workflow by recording a sequence of mouse clicks and menu selections), extensive file-format compatibility and flexible plug-in architecture, the tool is almost infinitely extensible. Curtis Rueden, the team's programming lead, said that "hundreds of people" have contributed plug-ins to ImageJ. These additions greatly expand researchers' toolsets, with functions such as tracking objects in videos or automatically identifying cells.
Kevin Eliceiri said: "The purpose of this program is not to be the be-all and end-all, but to serve the users' goals. Unlike Photoshop and other programs, ImageJ can be anything you want."
Sequence searcher: BLAST (1990)
Perhaps nothing signals cultural relevance better than a software name becoming a verb. For search, people think of Google; for genetics, researchers immediately think of BLAST.
Evolutionary changes are etched into molecular sequences as substitutions, deletions, gaps and rearrangements. By finding similarities between sequences, especially between proteins, researchers can discover evolutionary relationships and gain a deeper understanding of gene function. The challenge is to do so quickly and accurately across rapidly expanding databases of molecular information.
Margaret Dayhoff provided a key advance in 1978. She designed the "point accepted mutation" (PAM) matrix, which lets researchers score the relatedness of two protein sequences not only by how similar they are, but also by their evolutionary distance.
In 1985, William Pearson of the University of Virginia and David Lipman of NCBI introduced FASTP, an algorithm that combines the Dayhoff matrix with fast search capability.
A few years later, Lipman, together with Warren Gish and Stephen Altschul of NCBI, Webb Miller of Pennsylvania State University and Gene Myers of the University of Arizona, developed a more powerful refinement: BLAST (Basic Local Alignment Search Tool). Published in 1990, BLAST combines the search speed needed to handle rapidly growing databases with the ability to pick up matches that are more distant in evolutionary terms. The tool can also estimate how likely it is that those matches occurred by chance.
Altschul said that the results came back very quickly. "You can enter the search content, have a sip of coffee, and the search is finished." More importantly, BLAST was easy to use. In an era when databases were updated by post, Warren Gish set up an e-mail system and later a web-based architecture that allowed users to run searches remotely on NCBI computers, ensuring that the results were always up to date.
Sean Eddy, a computational biologist at Harvard University, said that BLAST gave the then-emerging field of genome biology a revolutionary tool: a way to work out the possible functions of unknown genes from the genes they are related to. It also gave sequencing laboratories all over the world a new verb. "This is one of many examples of nouns becoming verbs," Eddy said. "You'll say you're going to BLAST your sequence."
Preprint platform: arXiv.org (1991)
In the late 1980s, high-energy physicists routinely sent copies of their manuscripts to colleagues for comment, but only to a select few. Physicist Paul Ginsparg wrote in 2017: "People at the lower position in the food chain depend on the achievements of front-line researchers, while aspiring researchers in non-elite institutions are often outside the privileged circle."
In 1991, Ginsparg, then working at Los Alamos National Laboratory in New Mexico, wrote an e-mail autoresponder in the hope of leveling the playing field. Subscribers received a daily list of preprints, each associated with an article identifier. With a single e-mail, users anywhere in the world could submit or retrieve papers from the laboratory's computer system, get a list of new papers, or search by author or title.
Ginsparg's plan was to keep papers for three months and to limit the content to high-energy physics. But a colleague persuaded him to keep the articles indefinitely. He said: "At that moment, it changed from a bulletin board to an archive." Papers then began to flood in from many other fields. In 1993, Ginsparg migrated the system to the World Wide Web, and in 1998 he gave it the name it still uses today: arXiv.org.
Nearly 30 years after its founding, arXiv holds about 1.8 million preprints, all available free of charge. It receives more than 15,000 submissions and some 30 million downloads every month. Ten years ago, marking the 20th anniversary of arXiv, the editors of Nature Photonics wrote: "It is not difficult to see why arXiv's service is so popular. This system allows researchers to display their work quickly and conveniently, while avoiding the trouble and time cost when submitting traditional peer-reviewed journals."
The success of arXiv has also spurred a boom in similar preprint sites in biology, medicine, sociology and other disciplines. Its influence can be seen in the tens of thousands of preprints about COVID-19 published to date. "It's nice to see that a method that was considered heretical 30 years ago outside the field of particle physics is now generally regarded as obvious and natural," Ginsparg said. "In this sense, it is like a successful research project."
Data explorer: IPython Notebook (2011)
In 2001, Fernando Pérez was a graduate student "in search of procrastination" when he decided to take on a core component of Python.
Python is an interpreted language, which means that programs are executed line by line. Programmers can use a computational call-and-response tool called a read-evaluate-print loop (REPL): they type code into it, and the interpreter executes it. A REPL allows rapid exploration and iteration, but Pérez points out that Python's REPL was not built for scientific work. For example, it did not let users conveniently preload code modules or keep data visualizations open. So Pérez wrote his own version.
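The read-evaluate-print cycle can be sketched in a few lines of Python. The code below is a simplified illustration, not how the standard Python REPL or IPython is implemented: `run_repl` is a hypothetical helper that reads its "typed" lines from a list so the example runs non-interactively, and its `preload` argument is an assumed stand-in for the idea of preloading modules before a session starts.

```python
import math


def run_repl(lines, preload=None):
    """Feed each line to the interpreter and collect what would be printed."""
    namespace = dict(preload or {})         # names available before the session starts
    outputs = []
    for line in lines:                      # read
        try:
            result = eval(line, namespace)  # evaluate an expression
        except SyntaxError:
            exec(line, namespace)           # statements such as assignments
            continue
        if result is not None:
            outputs.append(repr(result))    # print
    return outputs                          # the loop ends when input runs out


session = run_repl(
    ["x = 3 + 5", "x * 2", "sqrt(x + 1)"],
    preload={"sqrt": math.sqrt},
)
print(session)  # ['16', '3.0']
```

Each line sees the names defined by earlier lines, which is what makes step-by-step exploration possible.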
The result was IPython, an "interactive" Python interpreter that Pérez introduced in December 2001, with just 259 lines of code. Ten years later, Pérez, working with physicist Brian Granger and mathematician Evan Patterson, migrated the tool to the web browser and launched the IPython Notebook, which started a revolution in data science.
Like other computational notebooks, IPython Notebook combines code, results, graphics and text in a single document. But unlike other such projects, IPython Notebook is open source and invites contributions from a large developer community. It also supports Python, a very popular language among scientists. In 2014, IPython evolved into Jupyter, which supports about 100 languages and lets users explore data on remote supercomputers as easily as on their own laptops.
Nature wrote in 2018: "Jupyter has actually become a standard for data scientists." At that time there were 2.5 million Jupyter notebooks on the GitHub code-sharing platform; today the number is nearly 10 million, including notebooks that document the discovery of gravitational waves in 2016 and the imaging of a black hole in 2019. Pérez said: "We have made a small contribution to these projects, which is very worthwhile."
Fast learner: AlexNet (2012)
There are two types of artificial intelligence. One uses codified rules; the other lets the computer "learn" by emulating the neural structure of the brain. Geoffrey Hinton, a computer scientist at the University of Toronto, Canada, said that for decades artificial intelligence researchers dismissed the latter as "nonsense". In 2012, however, his graduate students Alex Krizhevsky and Ilya Sutskever proved otherwise.
In the annual ImageNet competition, researchers are asked to train an artificial intelligence on a database of 1 million images of everyday objects and then test the resulting algorithm on a separate image set. Hinton said that the best algorithms at that time misclassified about a quarter of the images. Krizhevsky and Sutskever's AlexNet, a "deep learning" algorithm based on neural networks, reduced the error rate to 16%. Hinton said: "We basically halved the error rate, or almost halved it."
Hinton also pointed out that the team's success in 2012 reflected the combination of a large enough training data set, excellent programming and the newly emerging power of graphics processing units, processors originally designed to accelerate computer video performance. "Suddenly, we can speed up [the algorithm] by 30 times," he said. "Or, we can learn as much as 30 times the data."
The real algorithmic breakthrough actually came three years earlier, when Hinton's lab created a neural network that could recognize speech more accurately than conventional artificial intelligence that had been refined for decades. "Just a little better," Hinton said, "but it already shows something."
These successes heralded the rise of deep learning in laboratory research, clinical medicine and other fields. Thanks to deep learning, mobile phones can understand voice queries, and image-analysis tools can readily identify cells in micrographs. This is why AlexNet takes its place among the tools that have fundamentally changed science and the world. (Ren Tian)