Juan Gutiérrez

Email: juan.gutierrez [at] upm [dot] es

I am a Ph.D. student in Computer Vision at UPM, supervised by Dr. José Luis Blanco Murillo, and have been a member of the GAPS research group since September 2022.

Research

I'm interested in the principles of self-supervised representation learning for image and video foundation models, aiming to construct latent spaces that robustly encode semantics and spatiotemporal dynamics. I leverage the geometry of the resulting feature manifold to analyze data structure and propagate sparse relational supervision. Some papers are highlighted.

PANC: Prior-Aware Normalized Cut via Anchor-Augmented Token Graphs
Juan Gutiérrez, Víctor Gutiérrez, José Luis Blanco
Preprint, 2026
arXiv | cite

A weakly supervised spectral segmentation method that injects a few annotated tokens into a token–affinity graph to bias normalized-cut clustering for stable, controllable object masks.

An Evaluation of Hybrid Annotation Workflows on High-Ambiguity Spatiotemporal Video Footage
Juan Gutiérrez, Víctor Gutiérrez-García, Ángel Mora-Sánchez, Silvia Rodríguez-Jiménez, José Luis Blanco-Murillo
Preprint, 2026
arXiv | bibtex

Benchmarking assisted annotation workflows that use a fine-tuned vision-language model for video.

Open-Source System for Multilingual Translation and Cloned Speech Synthesis
Mateo Cámara, Juan Gutiérrez, María Pilar Daza, José Luis Blanco
Forum Acusticum / Euronoise 2025, Málaga, Spain
arXiv | bibtex

An open-source pipeline combining speech recognition, LLM-based translation, and voice-cloning TTS for real-time multilingual communication.

AI-Boosted Video Annotation: Exploring Pre-Labeling with Cross-Modalities
Juan Gutiérrez, Ángel Mora Sánchez, Silvia Rodríguez Jiménez, José Luis Blanco
Distributed Computing and Artificial Intelligence (DCAI), 2024 (Springer LNCS, 2025)
springer | bibtex

Leveraging pre-trained cross-modal models within the Human-in-the-Loop paradigm to efficiently pre-annotate large-scale video datasets.

A Study on the Development of a Video Annotation Support System Using an Image- and Text-Agnostic Model
Juan Gutiérrez
Master's Thesis, UPM, 2023
thesis | bibtex

Developed a CLIP-based human-in-the-loop system for efficient video annotation via keyframe selection, semantic retrieval, and automatic label propagation.

Miscellanea

Teaching

- Design of Communications Systems and Equipment (M.S. in Signal Theory and Communications, 1st year, UPM)
- Computing and Visualization Tools (B.S. in Telecommunication Engineering, 2nd year, UPM)

Thesis Supervision

B.S.
- Development of a proof of concept for AI-supported video annotation
   Fernando Castell Miñón
- Development and evaluation of a video summary generation system using multimodal models
   Andrés Velasco Sánchez
M.S.
- Analysis of crossmodal representation and understanding of actions using CLIP
   Imanol Torres Inchaurza
- Development of a mass search tool for digital assets based on cross-modal representations
   Pablo Regodón Cerezo

Talks

- Text-Based Video Retrieval through Hierarchical Content Representation (Great Talks @ Teleco, 2025, Madrid, Spain)
- Using Agnostic Models on Image and Text to Support Video Annotation (AIAI 2023, León, Spain)

Short Papers

Data Integration and Analytics of Cone Crusher Responses
G. Asbjörnsson, M. Evertsson, P. Plaza, S. Rodríguez-Jiménez, J. Gavilanes, J. Cortón-González, Juan Gutiérrez, J. L. Blanco, J. E. Ortiz
14th International Comminution Symposium (Comminution '25), Cape Town, 2025
