University researchers develop new type of interpretable neural networks

Researchers from the Massachusetts Institute of Technology, the California Institute of Technology, and Northeastern University have developed a new type of neural network: the Kolmogorov-Arnold network (KAN). KAN models outperform larger perceptron-based models on physics modeling tasks and provide more interpretable visualizations.

KANs were inspired by the Kolmogorov-Arnold representation theorem, which states that any continuous function of multiple variables can be written as a finite composition of single-variable functions and addition. Whereas today's neural networks are built from perceptrons, each of which learns a set of weights, forms a linear combination of its inputs, and passes the result through a fixed activation function, a KAN learns an activation function for each input and sums the outputs of these functions (a minimal code sketch appears after the quote below). The researchers compared the performance of KANs with traditional multilayer perceptron (MLP) neural networks on several modeling problems from physics and mathematics and found that KANs achieved higher accuracy with fewer parameters; in some cases, 100x higher accuracy with 100x fewer parameters. The researchers also showed that visualizing a KAN's activation functions helped users discover symbolic formulas representing the physical process being modeled. According to the research team:

The reason large language models are so transformative is that they are useful to anyone who can speak natural language. The language of science is functions. KANs are made up of interpretable functions, so when a human user works with a KAN, it is like communicating with it in the language of functions.
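To make the contrast with the perceptron concrete, here is a minimal, illustrative sketch, not the authors' implementation: a standard perceptron layer applies a fixed nonlinearity to a learned linear combination of its inputs, while a KAN layer carries a learnable univariate function on every input edge (parameterized below as a piecewise-linear spline on a fixed grid) and simply sums those edge functions at each node.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_layer(x, W, b):
    """Standard perceptron layer: a fixed activation (tanh here) of a learned linear map."""
    return np.tanh(x @ W + b)

class KANLayer:
    """Each edge (i, j) carries its own learnable 1-D function phi_ij,
    parameterized here as a piecewise-linear spline on a fixed knot grid."""
    def __init__(self, in_dim, out_dim, grid_size=8, x_range=(-2.0, 2.0)):
        self.grid = np.linspace(*x_range, grid_size)            # spline knots
        # control-point values of phi_ij at the knots: shape (in, out, grid)
        self.coef = 0.1 * rng.standard_normal((in_dim, out_dim, grid_size))

    def __call__(self, x):
        # x: (batch, in_dim) -> out: (batch, out_dim)
        batch, in_dim = x.shape
        out = np.zeros((batch, self.coef.shape[1]))
        for i in range(in_dim):
            for j in range(self.coef.shape[1]):
                # evaluate the learned univariate function phi_ij at x[:, i]
                out[:, j] += np.interp(x[:, i], self.grid, self.coef[i, j])
        return out   # the node only sums the edge functions; no extra weights

x = rng.standard_normal((4, 3))                                       # batch of 4, three inputs
print(mlp_layer(x, rng.standard_normal((3, 2)), np.zeros(2)).shape)   # (4, 2)
print(KANLayer(3, 2)(x).shape)                                        # (4, 2)
```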

KANs have a structure similar to MLPs, but instead of learning a weight for each input, they learn a spline function for each input. Because of their layered structure, the research team showed that KANs can not only learn features in the data but "also optimize these learned features with great accuracy thanks to the splines." The team also showed that KANs follow scaling laws as MLPs do, with accuracy improving as the number of parameters grows, and they found that they could increase the number of parameters of an already-trained KAN, and thus its accuracy, "by simply making its spline grids finer."
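The grid-refinement idea can be sketched in the same illustrative setting, building on the hypothetical KANLayer above: resample each learned univariate function on a denser knot grid, which adds parameters while initially leaving the represented function unchanged, after which training can continue with the extra capacity.

```python
import numpy as np

def refine_grid(layer, new_grid_size):
    """Illustrative grid refinement for the KANLayer sketch above (an assumption,
    not the authors' code): re-evaluate each piecewise-linear phi_ij on a denser
    knot grid so the layer gains parameters without changing its current output."""
    new_grid = np.linspace(layer.grid[0], layer.grid[-1], new_grid_size)
    in_dim, out_dim, _ = layer.coef.shape
    new_coef = np.empty((in_dim, out_dim, new_grid_size))
    for i in range(in_dim):
        for j in range(out_dim):
            # sample the current phi_ij at the new, finer set of knots
            new_coef[i, j] = np.interp(new_grid, layer.grid, layer.coef[i, j])
    layer.grid, layer.coef = new_grid, new_coef
    return layer
```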

The researchers have created an interface that lets human users interpret and manipulate a KAN. The visualization fades out activation functions with small magnitude, allowing users to focus on the important ones. Users can simplify the KAN by pruning away unimportant nodes, and they can inspect the spline functions and replace them with symbolic forms such as trigonometric or logarithmic functions where appropriate.
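The sketch below shows roughly how that workflow looks with the published pykan package, based on its README at the time of the paper's release; the exact method names and signatures have changed across versions (for example, training was exposed as train() before fit()), so treat the specifics as assumptions and consult the repository for the current API.

```python
# Hedged sketch of the pykan workflow; method names may differ by release.
import torch
from kan import KAN
from kan.utils import create_dataset

# Toy target from the paper's hello-world example: f(x1, x2) = exp(sin(pi*x1) + x2^2)
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)          # synthetic regression data

model = KAN(width=[2, 5, 1], grid=5, k=3)     # 2 inputs, 5 hidden nodes, 1 output
model.fit(dataset, opt="LBFGS", steps=20)     # exposed as train() in early releases

model.plot()                      # draw the learned activation function on each edge
model = model.prune()             # remove nodes and edges with small activations
model.auto_symbolic()             # snap each spline to a best-matching symbolic form
print(model.symbolic_formula())   # recovered closed-form expression
```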

In a Hacker News discussion about KANs, one user reported their own experience comparing KANs with traditional neural networks (NNs):

My main takeaway was that KANs are very difficult to train compared to NNs. It is usually possible to get the loss per parameter roughly to the level of NNs, but this requires a lot of hyperparameter tuning and additional tricks in the KAN architecture. In comparison, vanilla NNs were much easier to train and performed well under a much wider set of conditions. Some people commented that we put in an incredible amount of effort to get really good at training NNs efficiently, and that many of the things in ML libraries (optimizers like Adam for example) are specifically designed and optimized for NNs. For this reason, it’s not really a good apples to apples comparison.

The KAN source code is available on GitHub.
