Can we represent the nonlinear layers in neural networks as linear operations?
Progress
- We developed layer scaling as a mechanism to "stretch" the steps of a dynamical system into smaller increments.
- Building on layer scaling, we used time-delay embedding as the observable fed into a dynamic mode decomposition (DMD) procedure.
- We individually replaced layers in two classifier networks (MNIST and YinYang) with varying success in preserving accuracy.
- In the case of the YinYang network, we could visualize how the DMD replacements affected the final decision boundaries of the network.
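The delay-embedding-plus-DMD step above can be sketched in a few lines of NumPy. This is a minimal illustration of the general technique, not the implementation used for the networks described here; the function names and the toy linear system are ours, and the rank parameter `r` is an assumption:

```python
import numpy as np

def delay_embed(X, d):
    """Time-delay embedding: stack d consecutive snapshots per column.

    X: (n, m) snapshot matrix with m time steps.
    Returns: (n*d, m-d+1) embedded observable matrix.
    """
    n, m = X.shape
    return np.vstack([X[:, i:m - d + 1 + i] for i in range(d)])

def dmd_operator(X, Y, r):
    """Exact DMD: fit a rank-r linear operator A with Y ~ A X."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    return Y @ Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T

# Toy demo: snapshots from a known 2-D linear system,
# recovered as a linear operator on the delay-embedded observables.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.2],
                   [0.0, 0.8]])
x = rng.normal(size=2)
snaps = [x]
for _ in range(20):
    x = A_true @ x
    snaps.append(x)
Z = np.column_stack(snaps)      # (2, 21) raw snapshot matrix
H = delay_embed(Z, 3)           # (6, 19) delay-embedded observables
A = dmd_operator(H[:, :-1], H[:, 1:], r=2)
err = np.linalg.norm(H[:, 1:] - A @ H[:, :-1])
```

Replacing a nonlinear layer then amounts to applying the fitted linear operator `A` to the (embedded) activations in place of the original layer.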