
Sequence-based imitation learning for surgical robot operations / Furnari, G.; Secchi, C.; Ferraguti, F.. - In: ARTIFICIAL INTELLIGENCE SURGERY. - ISSN 2771-0408. - 5:1(2025), pp. 103-115. [10.20517/ais.2024.32]

Sequence-based imitation learning for surgical robot operations

Furnari, G.; Secchi, C.; Ferraguti, F.
2025

Abstract

Aim: This paper aims to advance autonomous surgical operations through imitation learning from video demonstrations. Methods: We propose two main contributions. (1) We introduce a new training dataset composed of video demonstrations of kidney tumor removal executed in a virtual environment, together with the kinematic data of the robot tools. (2) We employ an imitation learning architecture that combines a vision transformer (ViT), which encodes the frames extracted from the videos, with a long short-term memory (LSTM) network, which processes surgical motion sequences through a sliding-window mechanism. The model takes video frames and prior poses as input and predicts the poses of both robotic arms. A self-generating sequence approach was implemented: each predicted pose becomes the latest element of the sequence and is fed back, together with the current video frame, as input for the next prediction. This architecture was chosen to model the sequential nature of surgical operations effectively. Results: The model achieved promising results, with an average position error of 0.5 cm, and correctly executed 70% of the test tasks. This highlights the efficacy of the sequence-based approach in capturing and predicting surgical trajectories. Conclusion: Our study supports the viability of imitation learning for acquiring task execution policies in surgical robotics. The sequence-based model, combining ViT and LSTM architectures, successfully handles surgical trajectories.
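The self-generating sequence approach described in the abstract can be sketched as a simple autoregressive rollout. This is a minimal illustration only, not the authors' implementation: `predict_pose` is a hypothetical stand-in for the ViT+LSTM model, and the window length, pose dimensionality, and frame representation are assumed values chosen for the example.

```python
from collections import deque

WINDOW = 5      # hypothetical sliding-window length
POSE_DIM = 7    # e.g., position (3) + orientation quaternion (4) for one arm

def predict_pose(frame, pose_window):
    """Stand-in for the ViT+LSTM model: averages the pose window and
    nudges it toward a frame-derived target. In the real model the
    frame would be encoded by the ViT and the window by the LSTM."""
    avg = [sum(p[i] for p in pose_window) / len(pose_window)
           for i in range(POSE_DIM)]
    return [0.9 * a + 0.1 * t for a, t in zip(avg, frame)]

def rollout(frames, initial_poses):
    """Autoregressive execution: each predicted pose becomes the newest
    element of the sliding window used for the next prediction."""
    window = deque(initial_poses, maxlen=WINDOW)
    trajectory = []
    for frame in frames:
        pose = predict_pose(frame, window)
        window.append(pose)   # self-generating sequence step
        trajectory.append(pose)
    return trajectory

# Toy usage: a constant frame-derived target; the rollout drifts toward it.
frames = [[1.0] * POSE_DIM for _ in range(20)]
init = [[0.0] * POSE_DIM for _ in range(WINDOW)]
traj = rollout(frames, init)
```

The key point of the sketch is the feedback loop: the model never sees ground-truth poses at execution time, only its own past predictions together with the current video frame.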
Year: 2025
Volume: 5
Issue: 1
Pages: 103-115
Files in this product:
ais4032_down.pdf — Open access — Type: VOR (version published by the publisher) — License: Creative Commons — 816.09 kB, Adobe PDF

Creative Commons License
Metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.
In case of copyright infringement, contact Iris Support.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1383171
Citations
  • PMC: not available
  • Scopus: 2
  • Web of Science: 2