The path forward requires sustained collaboration between researchers in
interpretability, efficiency, and AI safety. Only through addressing the
fundamental challenges of knowledge extraction and representation can we
realize the promise of hybrid architectures that combine the best aspects of
large and small AI systems.