Can we represent the nonlinear layers in neural networks as linear operations?
Progress
- We developed layer scaling as a mechanism to "stretch" the steps of a dynamical system into smaller increments.
- Building on layer scaling, we used time-delay embedding as the observable fed into a dynamic mode decomposition (DMD) procedure.
- We individually replaced layers in two classifier networks (MNIST and YinYang) with varying success in preserving accuracy.
- In the case of the YinYang network, we could visualize how the DMD replacements affected the final decision boundaries of the network.
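The delay-embedding-plus-DMD step above can be sketched in a few lines of NumPy. This is a minimal illustration of the general technique, not the implementation used for the networks described here; the function names and the toy linear system are ours, and the rank parameter `r` is an assumption:

```python
import numpy as np

def delay_embed(X, d):
    """Time-delay embedding: stack d consecutive snapshots per column.

    X: (n, m) snapshot matrix with m time steps.
    Returns: (n*d, m-d+1) embedded observable matrix.
    """
    n, m = X.shape
    return np.vstack([X[:, i:m - d + 1 + i] for i in range(d)])

def dmd_operator(X, Y, r):
    """Exact DMD: fit a rank-r linear operator A with Y ~ A X."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    return Y @ Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T

# Toy demo: snapshots from a known 2-D linear system,
# recovered as a linear operator on the delay-embedded observables.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.2],
                   [0.0, 0.8]])
x = rng.normal(size=2)
snaps = [x]
for _ in range(20):
    x = A_true @ x
    snaps.append(x)
Z = np.column_stack(snaps)      # (2, 21) raw snapshot matrix
H = delay_embed(Z, 3)           # (6, 19) delay-embedded observables
A = dmd_operator(H[:, :-1], H[:, 1:], r=2)
err = np.linalg.norm(H[:, 1:] - A @ H[:, :-1])
```

Replacing a nonlinear layer then amounts to applying the fitted linear operator `A` to the (embedded) activations in place of the original layer.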