UNeXt: MLP-based Rapid Medical Image Segmentation Network
Jeya Maria Jose
Vishal M. Patel
Johns Hopkins University


UNet and its recent extensions such as TransUNet have been the leading medical image segmentation methods in recent years. However, these networks cannot be effectively adopted for rapid image segmentation in point-of-care applications because they are parameter-heavy, computationally complex, and slow to use. To this end, we propose UNeXt, a convolutional multilayer perceptron (MLP) based network for image segmentation. We design UNeXt with an early convolutional stage and an MLP stage in the latent stage. We propose a tokenized MLP block in which we efficiently tokenize and project the convolutional features and use MLPs to model the representation. To further boost performance, we propose shifting the channels of the inputs while feeding them into the MLPs so as to focus on learning local dependencies. Using tokenized MLPs in the latent space reduces the number of parameters and the computational complexity while still yielding a better representation to help segmentation. The network also includes skip connections between various levels of the encoder and decoder. We test UNeXt on multiple medical image segmentation datasets and show that we reduce the number of parameters by 72x, decrease the computational complexity by 68x, and improve the inference speed by 10x, while also obtaining better segmentation performance than state-of-the-art medical image segmentation architectures.


As medical imaging solutions become more applicable at the point of care, it is important to focus on making deep networks lightweight and fast while also being efficient. For example, point-of-care ultrasound (POCUS) devices are one such setting, and phone-camera-based images are also being used to detect and diagnose skin conditions; both need lightweight networks with high inference speed.


We propose UNeXt, a convolutional and MLP-based network. We still follow the 5-layer deep encoder-decoder architecture of UNet with skip connections, but change the design of each block. UNeXt has two stages: a convolutional stage followed by an MLP stage. We use convolutional blocks with fewer filters in the initial and final blocks of the network. In the bottleneck, we use a novel Tokenized MLP (TokMLP) block, which keeps computation low while still modeling a good representation. The tokenized MLP block projects the convolutional features into abstract tokens and then uses MLPs to learn meaningful information for segmentation. We also introduce a shifting operation in the MLPs to extract local information corresponding to different axial shifts. Because the tokenized features have fewer dimensions and MLPs are less complicated than convolution, self-attention, and transformers, we are able to reduce the number of parameters and the computational complexity significantly while maintaining good performance.
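The shifting operation can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the number of channel groups, the offsets (-2 to 2), the zero-filling of vacated cells, and the names `shift_plane` and `axial_shift` are all assumptions made for illustration. The idea shown is that channels are split into groups and each group is displaced by a different amount along one spatial axis, so an MLP applied afterwards at a single location sees values from neighbouring locations.

```python
from typing import List, Sequence

Plane = List[List[float]]  # one channel: H rows of W values


def shift_plane(plane: Plane, offset: int, axis: str) -> Plane:
    """Shift one channel by `offset` cells along 'h' or 'w'.

    Cells that have no source value after the shift are filled with 0.0
    (an assumption; other padding schemes are possible).
    """
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Source position that lands on (i, j) after the shift.
            si, sj = (i - offset, j) if axis == "h" else (i, j - offset)
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = plane[si][sj]
    return out


def axial_shift(
    features: List[Plane],
    axis: str,
    offsets: Sequence[int] = (-2, -1, 0, 1, 2),  # hypothetical choice
) -> List[Plane]:
    """Split channels into len(offsets) contiguous groups and shift each
    group by its own offset along the given axis ('h' or 'w')."""
    group_size = -(-len(features) // len(offsets))  # ceiling division
    return [
        shift_plane(channel, offsets[c // group_size], axis)
        for c, channel in enumerate(features)
    ]


# Five channels, each a single row [1, 2, 3, 4, 5], shifted along the width.
features = [[[1.0, 2.0, 3.0, 4.0, 5.0]] for _ in range(5)]
shifted = axial_shift(features, axis="w")
for channel in shifted:
    print(channel[0])
```

Applying the shift once along the width and once along the height, each followed by an MLP over the tokens, would give the two axial views the text describes; how the two are combined is left out of this sketch.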


We plot comparison charts of F1 score vs. GFLOPs, F1 score vs. inference time, and F1 score vs. number of parameters. The F1 score used here corresponds to the ISIC dataset. It can be clearly seen from the charts that UNeXt and TransUNet are the best-performing methods in terms of segmentation performance. However, UNeXt clearly outperforms all the other networks in terms of computational complexity, inference time, and number of parameters, which are all important characteristics to consider for point-of-care imaging applications.
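For readers unfamiliar with the metric, the F1 score of a binary segmentation mask can be sketched as follows. The page does not spell out the exact evaluation protocol, so this uses the standard definition F1 = 2TP / (2TP + FP + FN); the function name `f1_score` and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
from typing import List

Mask = List[List[int]]  # binary mask: H rows of W values in {0, 1}


def f1_score(pred: Mask, target: Mask) -> float:
    """F1 score between a predicted and a ground-truth binary mask of the same size."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # foreground predicted and present
            elif p:
                fp += 1  # foreground predicted but absent
            elif t:
                fn += 1  # foreground present but missed
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
print(f1_score(pred, target))  # 1 TP, 1 FP, 1 FN -> 0.5
```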

Paper and Supplementary Material

Tech Report
(hosted on arXiv)

[Bibtex]
@article{valanarasu2022unext,
  title={UNeXt: MLP-based Rapid Medical Image Segmentation Network},
  author={Valanarasu, Jeya Maria Jose and Patel, Vishal M},
  journal={arXiv preprint arXiv:2203.04967},
  year={2022}
}

