CReLeRI: Explainable, Concept-centric, Representation, Learning, Reasoning, and Interaction Video Analysis System

Existing video analysis models often lack explainability, perform poorly on long videos, and frequently hallucinate. Commercial solutions are closed-source and costly. We introduce CReLeRI, an open-source system for action detection in untrimmed videos (https://github.com/michaelperez023/creleri-video). CReLeRI segments videos using scene and action transitions, detects actions and their arguments, and grounds them in 3D space to improve interpretability and reduce hallucinations. The system promotes transparency and trust in AI-driven analysis of complex, real-world videos. A demonstration video is also available (https://youtu.be/XDCue9EYNTU).

Michael Perez
Rohith Venkatakrishnan
Jaime Ruiz

As well as: Yichi Yang, Yuheng Zha, Enze Ma, Danish Tamboli, Haodi Ma, Reza Shahriari, Vyom Pathak, Dzmitry Kasinets, Daisy (Zhe) Wang, Eric D. Ragan, Zhiting Hu, Eric Xing, & Jun-Yan Zhu

Michael Francis Perez, Yichi Yang, Yuheng Zha, Enze Ma, Danish Tamboli, Haodi Ma, Reza Shahriari, Vyom Pathak, Dzmitry Kasinets, Rohith Venkatakrishnan, Daisy (Zhe) Wang, Jaime Ruiz, Eric D. Ragan, Zhiting Hu, Eric Xing, and Jun-Yan Zhu. 2025. CReLeRI: Explainable, Concept-centric, Representation, Learning, Reasoning, and Interaction Video Analysis System. In Proceedings of the 33rd ACM International Conference on Multimedia (MM ’25). Association for Computing Machinery, New York, NY, USA, 13528–13530. https://doi.org/10.1145/3746027.3754479

@inproceedings{10.1145/3746027.3754479,
author = {Perez, Michael Francis and Yang, Yichi and Zha, Yuheng and Ma, Enze and Tamboli, Danish and Ma, Haodi and Shahriari, Reza and Pathak, Vyom and Kasinets, Dzmitry and Venkatakrishnan, Rohith and Wang, Daisy (Zhe) and Ruiz, Jaime and Ragan, Eric D. and Hu, Zhiting and Xing, Eric and Zhu, Jun-Yan},
title = {CReLeRI: Explainable, Concept-centric, Representation, Learning, Reasoning, and Interaction Video Analysis System},
year = {2025},
isbn = {9798400720352},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3746027.3754479},
doi = {10.1145/3746027.3754479},
abstract = {Existing video analysis models often lack explainability, perform poorly on long videos, and frequently hallucinate. Commercial solutions are closed-source and costly. We introduce CReLeRI, an open-source system for action detection in untrimmed videos. CReLeRI segments videos using scene and action transitions, detects actions and their arguments and grounds them in 3D space to improve interpretability and reduce hallucinations. The system promotes transparency and trust in AI-driven analysis of complex, real-world videos. A demonstration video is also available.},
booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia},
pages = {13528--13530},
numpages = {3},
keywords = {grounding, human-centered computing, interpretability, large language models, multimedia interaction, object detection, video action detection, vision-language models},
location = {Dublin, Ireland},
series = {MM '25}
}

Conference Paper PDF (2.54 MB)

Related Funding: Concept-centric Representation, Learning, Reasoning, and Interaction (CReLeRI)