Guo M H, Liu Z N, Mu T J, et al. Beyond self-attention: External
attention using two linear layers for visual tasks[J]. arXiv preprint
arXiv:2105.02358, 2021.
Rempe D, Birdal T, Zhao Y, et al. Caspr: Learning canonical
spatiotemporal point cloud representations[J]. Advances in neural
information processing systems, 2020, 33: 13688-13701.
Liu J, Xu D. GeometryMotion-Net: A strong two-stream baseline for 3D
action recognition[J]. IEEE Transactions on Circuits and Systems for
Video Technology, 2021, 31(12): 4711-4721.
Fan H, Yang Y, Kankanhalli M. Point 4D transformer networks for
spatio-temporal modeling in point cloud videos[C]//Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021:
14204-14213.
Plizzari C, Cannici M, Matteucci M. Skeleton-based action
recognition via spatial and temporal transformer networks[J]. Computer
Vision and Image Understanding, 2021, 208: 103219.