TY - JOUR
T1 - Investigating the Physics of Tokamak Global Stability with Interpretable Machine Learning Tools
AU - Murari, Andrea
AU - Contributors, JET
A2 - Avotiņa, Līga
A2 - Baumane, Larisa
A2 - Čonka, Dāvis
A2 - Haļitovs, Mihails
A2 - Igaune, Ieva
A2 - Jansons, Juris
A2 - Ķizāne, Gunta
A2 - Kovaldins, Ričards
A2 - Leščinskis, Andris
A2 - Leščinskis, Broņislavs
A2 - Pajuste, Elīna
A2 - Vītiņš, Aigars
A2 - Zariņš, Artūrs
A2 - Zariņš, Roberts
N1 - Publisher Copyright: © 2020 by the authors.
PY - 2020/10/1
Y1 - 2020/10/1
N2 - The inadequacies of basic physics models for disruption prediction have induced the community to rely increasingly on data mining tools. Over the last decade, machine learning predictors have been shown to achieve much better performance than that obtained with manually identified thresholds or empirical descriptions of the plasma stability limits. The main criticisms of these techniques therefore focus on two distinct but interrelated issues: poor "physics fidelity" and limited interpretability. Insufficient "physics fidelity" refers to the fact that the mathematical models of most data mining tools do not reflect the physics of the underlying phenomena. Moreover, they implement a black box approach to learning, which results in very poor interpretability of their outputs. To overcome or at least mitigate these limitations, a general methodology has been devised and tested, with the objective of combining the predictive capability of machine learning tools with the expression of the operational boundary in terms of traditional equations more suited to understanding the underlying physics. The proposed approach relies on the application of machine learning classifiers (such as Support Vector Machines or Classification Trees) and Symbolic Regression via Genetic Programming directly to experimental databases. The results are very encouraging. The equations obtained for the boundary between the safe and disruptive regions of the operational space perform almost as well as the machine learning classifiers, which are based on completely independent learning techniques. Moreover, these models possess significantly better predictive power than traditional representations, such as the Hugill or the beta limit. More importantly, they are realistic and intuitive mathematical formulas, which are well suited to supporting theoretical understanding and to benchmarking empirical models. They can also be deployed easily and efficiently in real-time feedback systems.
KW - Classification and regression trees (CART)
KW - Data-driven theory
KW - Disruptions
KW - Ensemble of classifiers
KW - Knowledge discovery
KW - Prediction
KW - Support vector machines
KW - Symbolic regression
UR - https://www.mdpi.com/2076-3417/10/19/6683
UR - https://www.scopus.com/pages/publications/85092538137
U2 - 10.3390/APP10196683
DO - 10.3390/APP10196683
M3 - Article
SN - 2076-3417
VL - 10
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 19
M1 - 6683
ER -