… There has always been some form of machine learning in materials science. Thermochemical software packages such as FactSage could predict the phase diagrams of select compositions with some accuracy. Properties such as refractive indices, dielectric constants and yield strengths have been predicted using models fitted to available training data. Refractive index, for example, can be predicted with over 90% accuracy from just the chemical composition of the material. These results have been limited in scope, however, and cannot address the big-picture questions that we have been pursuing so far.
Materials scientists have also attempted to predict material properties through computational modelling, which uses extensive input data to simulate material behaviour over selected length and time scales. These methods range from molecular dynamics, which models atoms and molecules, to finite element software, which models the movement of bridges when elephants pass over them.
These models run on complicated physics-driven algorithms that usually end in numerically solving some differential equation. Machine learning and AI-driven materials development build on these modelling attempts or exceed them by training a machine to learn materials’ behaviour from existing data. The machine might know some physics, but there is no ‘if this, then that’ type of logic. The AI learns how something behaves by observing a collection of similar items under somewhat similar circumstances. The true power of a materials AI would then be when the machine learns how something behaves by observing something dissimilar under dissimilar circumstances, using its knowledge of physics and chemistry.
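As a toy illustration of learning from "similar items under similar circumstances", the sketch below predicts a property of an unseen composition from its nearest known neighbours. It is a minimal sketch and not the method of any work cited here: the training values are invented for illustration, and the names TRAINING and predict are hypothetical.

```python
import math

# Hypothetical training set: (fractions of three components, property value).
# The numbers are made up for illustration and are not measured data.
TRAINING = [
    ((1.0, 0.0, 0.0), 1.46),
    ((0.0, 1.0, 0.0), 1.77),
    ((0.0, 0.0, 1.0), 2.60),
    ((0.5, 0.5, 0.0), 1.61),
    ((0.5, 0.0, 0.5), 2.02),
    ((0.0, 0.5, 0.5), 2.18),
]


def predict(composition, training=TRAINING, k=3):
    """Inverse-distance-weighted average over the k most similar known materials."""
    # Rank known materials by Euclidean distance in composition space.
    nearest = sorted((math.dist(composition, x), y) for x, y in training)[:k]
    # A composition already in the training set is returned directly.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


if __name__ == "__main__":
    print(predict((0.4, 0.4, 0.2)))
```

Note that there is no physics in this model at all: it can only interpolate between materials it has already seen, which is exactly the limitation described above.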
A gene for a material?
In an early work, a team at Duke University created a ‘fingerprint’ for crystals (a visual representation of selected physical and chemical properties of a material) and used this to predict new superconductors. The algorithm worked well on test data, and some of the predicted compositions have indeed been shown to be superconducting.
The idea of creating a unique representation, a gene, for any given material is a persistent one in materials science today. The Materials Genome Initiative, set up by the US government in 2011, attempts to encourage AI-driven development of materials in both academia and industry. No suitable gene has been identified so far, and given how complicated materials are, it is not certain that one will ever be found.
For example, lead zirconate titanate (PZT) is a very common piezoelectric, a ceramic that converts electrical energy to mechanical energy and vice versa. However, mixing lead, zirconium and titanium in the correct ratio does not always result in the same material with the same properties. The performance of PZT is affected by the grain size, grain shape, the presence of impurities, the presence of voids, domain shape and domain size, clamping effects from a substrate, the size of the material, the atmosphere in which it was sintered and its thermal and electrical history, to name a few.
How does one find a representation that captures all of this? Even if there is a representation, how do we know how much data to include?
The war of databases
Currently, there are no answers to these questions. In particular, there is no consensus on the minimum amount of data required for a given property prediction. Further, large machine-readable property databases are mostly absent in this field. The data collected through decades of experimentation and theory are locked in texts and journal articles. Therefore, the first part of the AI effort is driven by data platform initiatives.
Two of the most well-known are the Materials Project and the AFLOW library. Both are computational databases of material properties calculated through density functional theory (DFT). The Materials Project has over a million materials in its database, with band structures, piezoelectric and elastic properties, and more. The database is growing fast and has over 50,000 users at the time of writing, mostly from the computational materials community.
While a necessary first step, computed (theoretically calculated) data for a material are not quite the same as the final empirically measured material property, for the reasons explained above. In addition to theoretical databases, we also need an empirical database, one that contains a curated list of materials and their experimental properties. A fully comprehensive database is not available yet, though initiatives such as Matmatch are a step in that direction.
Some initiatives rely on natural language processing of scientific texts, with tools such as ChemDataExtractor that can identify chemical formulas and property relations in text. This has been used to automatically extract the magnetic properties of selected inorganics from collections of text. Other approaches, especially those related to material synthesis, combine text mining with synthetic data generation, for example by using variational autoencoders.
The extraction of semantic property-processing-performance relations nevertheless remains in its infancy and might be the single biggest bottleneck on the way to a materials AI.
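To give a feel for what extracting a property relation from text involves, here is a deliberately naive sketch based on a regular expression. It does not use ChemDataExtractor or its API; the pattern, the function name extract_curie_temperatures and the example sentences are all assumptions written for this illustration, and real tools handle far more linguistic variation.

```python
import re

# Toy pattern: something that looks like a chemical formula (two or more
# element-like tokens), followed within the same sentence by
# "Curie temperature of <number> K".
# Caveat: acronyms such as "DFT" also look like formulas to this pattern.
FORMULA = r"(?P<formula>(?:[A-Z][a-z]?\d*){2,})"
PATTERN = re.compile(
    FORMULA + r"[^.]*?Curie temperature of\s+(?P<value>\d+(?:\.\d+)?)\s*K\b"
)


def extract_curie_temperatures(text):
    """Return (formula, temperature in kelvin) pairs found in the text."""
    return [
        (m.group("formula"), float(m.group("value")))
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    # Example sentences written for this sketch, not quoted from a paper.
    sample = (
        "Fe3O4 has a Curie temperature of 858 K. "
        "EuO orders ferromagnetically, with a Curie temperature of 69 K."
    )
    for formula, kelvin in extract_curie_temperatures(sample):
        print(formula, kelvin)
```

Even this small example hints at why the problem is hard: the pattern breaks as soon as an author writes the temperature in degrees Celsius, uses "Tc", or puts the value in a table.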
Deep learning in materials
Deep learning, when applied to images, is automating labour-intensive tasks such as the identification of defects in electron microscopy images and the reading of X-ray diffraction patterns to label phases. A group at the Oak Ridge National Laboratory has demonstrated how a convolutional neural network (CNN) could be trained to identify vacancies in a transmission electron microscopy image. Other groups have demonstrated how CNNs can pick up subtle features in images, such as particle size distributions and grain orientations, to classify materials accurately based on their microstructures.
At another level, deep learning is simplifying density functional theory calculations that usually require supercomputers. Work at the University of California, Irvine has applied deep learning to approximate the density functionals used to calculate the distribution of electrons in substances. Such DFT calculations are often the best way to model materials and are widely used in many branches of physics and chemistry.
The automation of science
In yet another direction, we are beginning to see automated high-throughput experimentation reach materials science. These are automated systems that perform thousands of experiments at a time, followed by characterisation and measurement. This allows one to scan a phase space quickly and efficiently.
For example, the High Throughput Experimental Database at the National Renewable Energy Laboratory contains over 1,307 sample libraries with over 60,000 thin film samples prepared by co-sputtering metals. The database contains structural, electrical and optical information on these materials and is accessible to the public.
Other works in this direction include the development of an Autonomous Research System (ARES) to grow carbon nanotubes at controlled rates, while other groups have applied this approach to organic syntheses as well as to the development of nickel-titanium-based shape-memory alloys. These methods vastly outperform human labour and are bound to change the PhD process of the future, in that most grad students might be replaced by robots!
Finally, as machines and algorithms take over laboratories and thought processes, we are reckoning with materials all over again. What is a material? What do we know about materials, and what don’t we? Is it possible that a neural network can identify a higher-dimensional material property, a 100-dimensional behemoth that has no analogue in our minds? New ontologies of materials science are being built for robots and humans.
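Scanning a phase space can be pictured as enumerating every composition on a regular grid and handing each one to the robot. The sketch below does this for a generic three-component system; the function name ternary_grid and the choice of a uniform grid are assumptions made for illustration and do not describe how any particular facility plans its sample libraries.

```python
def ternary_grid(steps):
    """Yield (a, b, c) fractions with a + b + c == 1 on a regular grid."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j  # the third fraction is fixed by the other two
            yield (i / steps, j / steps, k / steps)


if __name__ == "__main__":
    # A 10% step gives (steps + 1) * (steps + 2) / 2 candidate compositions.
    compositions = list(ternary_grid(10))
    print(len(compositions))
```

The number of grid points grows quickly with finer steps and with each added component, which is why automation matters for this kind of search.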
We will explore these and other topics in depth in the coming articles. Until then, we keep asking ourselves, where is vibranium?
 Factsage.com. Factsage. [cited 2019 13 January];
 Shannon, R.C.L., Barbara; Shannon, RD; Downs, Robert T; Fischer, Reinhard XD, Refractive indices of minerals and synthetic compounds. American Mineralogist, 2017. 101: p. 1906-1914.
 Isayev, O.O., Corey; Toher, Cormac; Gossett, Eric; Curtarolo, Stefano; Tropsha, Alexander, Universal fragment descriptors for predicting properties of inorganic crystals. Nature Communications, 2017. 8.
 Materials Genome Initiative. 2008 [cited 2019 13 January];
 Jain, A.P.O., Shyue; Hautier, Geoffroy; Chen, Wei; Richard, William Davidson; Dacek, Stephen; Cholia, Shreyas; Gunter, Dan; Skinner, David; Ceder, Gerbrand; Persson, Kristin A., Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 2013. 1.
 Curtarolo, S.S., Wahyu; Hart, Gus L.W.; Jahnatek, Michal; Chepulskii, Roman V.; Taylor, Richard H.; Wang, Shidong; Xue, Junkai; Yang, Kesong; Levy, Ohad; Mehl, Michael; Stokes, Harold; Demchenko, Denis; Morgan, Dane, AFLOW: An automatic framework for high-throughput materials discovery. Computational Materials Science, 2012. 58: p. 218-229.
 Swain, M.C.C., Jacqueline M, ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature. Journal of Chemical Information and Modeling, 2016. 56(10): p. 1894-1906.
 Court, C.J.C., Jacqueline M. , Auto-generated materials database of Curie and Néel temperatures via semi-supervised relationship extraction. Nature Scientific Data, 2018. 5.
 Edward Kim, K.H., Stefanie Jegelka & Elsa Olivetti Virtual screening of inorganic materials synthesis parameters with deep learning. NPJ Computational Materials, 2017. 3.
 Ziatdinov, M.M., Artem;Kalinin, Sergei V. , Learning surface molecular structures via machine vision. NPJ Computational Materials, 2017.
 Butler, K.T.D., Daniel W; Cartwright, Hugh ; Isayev, Olexandr; Walsh, Aron, Machine learning for molecular and materials science. Nature, 2018. 559: p. 547-555.
 Burke, K.H., Jacob; Baker, Thomas, Can exact conditions improve machine-learned density functionals? The Journal of Chemical Physics, 2018. 148(24): p. 241743.
 Sun, W.B., Christopher ; Arca, Elisabetta; Bauers, Sage; Matthews, Bethany; Orvañano, Bernardo ; Chen, Bor-Rong ; Toney, Michael F.; Schelhas, Laura T.; Tumas, William ; Tate, Janet; Zakutayev, Andriy ; Lany, Stephan; Holder, Aaron; Ceder, Gerbrand. A Map of the Inorganic Ternary Metal Nitrides. in MRS Fall 2018. 2018. Boston: MRS.
 Nikolaev, P.H., Daylond; Webber, Frederick; Rao, Rahul ; Decker, Kevin; Krein, Michael; Poleski, Jason ; Barto, Rick ; Maruyama, Benji, Autonomy in materials research: a case study in carbon nanotube growth. NPJ Computational Materials, 2016. 2(2).
 Hood Heath, J.P., ME; Lin, B., Systems biology and new technologies enable predictive and preventative medicine. Science, 2004. 640(3).
 Ross D. King, Jem Rowland, Stephen G. Oliver, Michael Young, Wayne Aubrey, Emma Byrne, Maria Liakata, Magdalena Markham, Pinar Pir, Larisa N. Soldatova, Andrew Sparkes, Kenneth E. Whelan, Amanda Clare, The Automation of Science. Science, 2009. 324(5923).
 Aspuru-Guzik, A.P., Kristin Materials Acceleration Platform: Accelerating Advanced Energy Materials Discovery by Integrating High-Throughput Methods and Artificial Intelligence., in Mission Innovation: Innovation Challenge 6, A.P. Aspuru-Guzik, Kristin Editor. 2018, Harvard University: Boston