Transcripción de audio a texto sesiones municipales en planetaria

Ruiz Melendres, Jaime Andrés

dc.contributor.advisor	Gómez Gómez, Jorge Eliecer
dc.contributor.author	Ruiz Melendres, Jaime Andrés
dc.contributor.jury	Gómez Gómez, Jorge
dc.date.accessioned	2024-07-23T17:25:07Z
dc.date.available	2024-07-23T17:25:07Z
dc.date.issued	2024-07-11
dc.description.abstract	El documento aborda la problemática de la transcripción manual de sesiones municipales en Planeta Rica. Se investiga sobre el uso de herramientas de código abierto para automatizar la transcripción de audio a texto en estas sesiones, con el objetivo de mejorar la eficiencia y la precisión de este proceso. Se destaca la importancia de la integración de modelos en el sistema para abordar diferentes aspectos y mejorar la calidad de las transcripciones. En este sentido, se mencionan dos modelos de inteligencia artificial: Whisper de OpenAI y Spleeter de Deezer. Whisper es un modelo de reconocimiento de voz de propósito general. Por otro lado, Spleeter es una herramienta de separación de pistas de audio que utiliza modelos previamente entrenados para separar las voces de cualquier pista de audio. Además, se desarrolla una arquitectura que permite la integración de estos modelos de forma automática. Esta arquitectura se basa en el uso de Python para el manejo de los modelos de inteligencia artificial, mientras que el backend de la aplicación se desarrolla con Go y el frontend con Next.js/React. Esto permitió la automatización de las transcripciones de las sesiones del concejo municipal de Planeta Rica, mejorando la eficiencia y la precisión del proceso.	spa
dc.description.abstract	The document addresses the problem of manually transcribing municipal sessions in Planeta Rica. It investigates the use of open-source tools to automate audio-to-text transcription of these sessions, with the aim of improving the efficiency and accuracy of the process. It emphasizes the importance of integrating several models into the system to address different aspects and enhance transcription quality. In this regard, two artificial intelligence models are mentioned: OpenAI’s Whisper and Deezer’s Spleeter. Whisper is a general-purpose speech recognition model. Spleeter, on the other hand, is an audio track separation tool that uses pre-trained models to separate voices from any audio track. Furthermore, an architecture is developed to integrate these models automatically. This architecture uses Python to manage the artificial intelligence models, while the application’s backend is developed in Go and the frontend with Next.js/React. This made it possible to automate the transcription of Planeta Rica’s municipal council sessions, improving both the efficiency and the accuracy of the process.	eng
dc.description.degreelevel	Pregrado
dc.description.degreename	Ingeniero(a) de Sistemas
dc.description.modality	Trabajos de Investigación y/o Extensión
dc.description.tableofcontents	1 INTRODUCCIÓN	spa
dc.description.tableofcontents	2 DESCRIPCIÓN Y FORMULACIÓN DEL PROBLEMA	spa
dc.description.tableofcontents	3 JUSTIFICACIÓN	spa
dc.description.tableofcontents	4 OBJETIVOS	spa
dc.description.tableofcontents	4.1. Objetivo General	spa
dc.description.tableofcontents	4.2. Objetivos Específicos	spa
dc.description.tableofcontents	5 ESTADO DEL ARTE	spa
dc.description.tableofcontents	6 Marco Teórico	spa
dc.description.tableofcontents	6.1. La inteligencia artificial (IA)	spa
dc.description.tableofcontents	6.1.1. ¿Qué es?	spa
dc.description.tableofcontents	6.2. Aplicación de Modelos Transformers	spa
dc.description.tableofcontents	6.2.1. ¿Qué es?	spa
dc.description.tableofcontents	6.3. Aplicación de Redes Neuronales Convolucionales	spa
dc.description.tableofcontents	6.3.1. ¿Qué son?	spa
dc.description.tableofcontents	6.4. Aplicación de Procesamiento de Audio	spa
dc.description.tableofcontents	6.4.1. ¿Qué es?	spa
dc.description.tableofcontents	6.5. Conversión de Audio a Texto	spa
dc.description.tableofcontents	6.5.1. ¿Qué es?	spa
dc.description.tableofcontents	6.6. Herramienta de separación de pistas Deezer/Spleeter. [17]	spa
dc.description.tableofcontents	6.7. Herramienta de transcripción de audio a texto Whisper	spa
dc.description.tableofcontents	6.8. Servicios web	spa
dc.description.tableofcontents	7 Metodología	spa
dc.description.tableofcontents	7.1. Fases de desarrollo	spa
dc.description.tableofcontents	7.2. Proceso de la investigación	spa
dc.description.tableofcontents	7.2.1. Fase I: Estudio, análisis e interpretación de estudios previos	spa
dc.description.tableofcontents	7.2.2. Fase II: Modelado de arquitectura del sistema	spa
dc.description.tableofcontents	7.2.3. Fase III: Implementación de prototipo	spa
dc.description.tableofcontents	7.2.4. Fase IV: Evaluación del prototipo	spa
dc.description.tableofcontents	7.3. FLUJO DE INFORMACIÓN DEL PROYECTO	spa
dc.description.tableofcontents	8 DISEÑO ARQUITECTÓNICO	spa
dc.description.tableofcontents	8.1. Requerimientos funcionales	spa
dc.description.tableofcontents	8.1.1. Sistema de inicio de sesión	spa
dc.description.tableofcontents	8.1.2. Visualización de listado de transcripciones realizadas	spa
dc.description.tableofcontents	8.1.3. Botón para creación de transcripción	spa
dc.description.tableofcontents	8.1.4. Formulario de creación de transcripción	spa
dc.description.tableofcontents	8.1.5. Sistema de división de archivos de audio	spa
dc.description.tableofcontents	8.2. DIAGRAMAS DE CASO DE USO	spa
dc.description.tableofcontents	8.2.1. Casos de uso general	spa
dc.description.tableofcontents	8.3. Diagramas de secuencia	spa
dc.description.tableofcontents	8.3.1. Creación de transcripción	spa
dc.description.tableofcontents	8.4. Diagramas de estado	spa
dc.description.tableofcontents	8.5. Diseño de la base de datos	spa
dc.description.tableofcontents	9 Análisis y Resultados	spa
dc.description.tableofcontents	9.1. Velocidad de separación de pistas con Deezer/Spleeter	spa
dc.description.tableofcontents	9.2. Transcripción de audio a texto con Whisper	spa
dc.description.tableofcontents	10 Conclusiones	spa
dc.format.mimetype	application/pdf
dc.identifier.instname	Universidad de Córdoba
dc.identifier.reponame	Repositorio Universidad de Córdoba
dc.identifier.repourl	https://repositorio.unicordoba.edu.co/home
dc.identifier.uri	https://repositorio.unicordoba.edu.co/handle/ucordoba/8428
dc.language.iso	spa
dc.publisher	Universidad de Córdoba
dc.publisher.faculty	Facultad de Ingeniería
dc.publisher.place	Montería, Córdoba, Colombia
dc.publisher.program	Ingeniería de Sistemas
dc.relation.references	Concepto 97471 de 2020 Departamento Administrativo de la Función Pública - Gestor Normativo. (n.d.). Función Pública.
dc.relation.references	Shelley. (2023, September 19). ¿Cuánto cuesta la Inteligencia Artificial en 2022? Developers.
dc.relation.references	The rise of Speech AI: a Game-Changer in the Tech world. (n.d.). Nasscom | the Official Community of Indian IT Industry.
dc.relation.references	Papers with Code - Contrastive Audio-Language Learning for Music. (2022, August 25).
dc.relation.references	Flach, Peter (2012) Machine Learning: The Art and Science of Algorithms that Make Sense of Data.
dc.relation.references	MultiComp Lab. (2017, October 4). Multimodal Machine Learning | MultiComp. MultiComp | MultiComp Lab’s Mission Is to Build the Algorithms and Computational Foundation to Understand the Interdependence Between Human Verbal, Visual, and Vocal Behaviors Expressed During Social Communicative Interactions.
dc.relation.references	C. Chen, D. Han, and J. Wang, “Multimodal Encoder-Decoder Attention Networks for Visual Question Answering,” IEEE Access, pp. 1–1, 2 2020
dc.relation.references	Merritt, R. (2022, April 19). ¿Qué Es un Modelo Transformer? | Blog de NVIDIA. Blog Oficial De NVIDIA Latino América.
dc.relation.references	Papers with Code - VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. (2021, April 22).
dc.relation.references	¿Qué son las redes neuronales convolucionales? | IBM. (n.d.).
dc.relation.references	Chan, W. (2015, August 5). Listen, attend and spell. arXiv.org.
dc.relation.references	Amodei, D. (2015, December 8). Deep Speech 2: End-to-End speech recognition in English and Mandarin. arXiv.org.
dc.relation.references	Llisterri, J. (n.d.). Las unidades de síntesis.
dc.relation.references	Olivier M. Emorine and Pierre M. Martin. 1988. The MULTIVOC text-to-speech system. In Proceedings of the Second Conference on Applied Natural Language Processing (ANLC ’88). Association for Computational Linguistics, USA, 115–120.
dc.relation.references	MHTTS: Fast Multi-head Text-to-speech For Spontaneous Speech With Imperfect Transcription. (2022, October 1). IEEE Conference Publication | IEEE Xplore.
dc.relation.references	Van Den Oord, A. (2016, September 12). WaveNet: a generative model for raw audio. arXiv.org.
dc.relation.references	Hennequin, R., Khlif, A., Voituret, F., & Moussallam, M. (2020). Spleeter: a fast and efficient music source separation tool with pre-trained models. Journal of Open Source Software, 5(50), 2154.
dc.relation.references	A, L. a. C., & Ancy, C. A. (2021). Research on DNN Methods in Music Source Separation Tools with emphasis to Spleeter. International Research Journal on Advanced Science Hub, 3(Special Issue 6S), 24–28.
dc.relation.references	Ronneberger, O. (2015, May 18). U-NET: Convolutional Networks for Biomedical Image Segmentation. arXiv.org.
dc.relation.references	Radford, A. (2022, December 6). Robust speech recognition via Large-Scale Weak Supervision. arXiv.org.
dc.relation.references	OpenAi About. (n.d.).
dc.relation.references	Python documentation. (n.d.).
dc.relation.references	Maldeadora. (2018). Qué es Frontend y Backend: características, diferencias y ejemplos. Platzi.
dc.relation.references	Documentation - The Go Programming Language. (n.d.).
dc.relation.references	Docs. (n.d.). Next.js.
dc.relation.references	Smith, J. (2005). Fundamentals of Audio and Music Engineering: Part 1 Musical Sound & Electronics. Coursera. https://www.coursera.org/learn/audio
dc.relation.references	Johnson, M., & Smith, A. (2010). Audio-to-Text Conversion: A Comprehensive Review. Journal of Speech and Audio Processing, 25(3), 123-140.
dc.rights	Copyright Universidad de Córdoba, 2024
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.rights.coar	http://purl.org/coar/access_right/c_abf2
dc.rights.license	Atribución-NoComercial-SinDerivadas 4.0 Internacional (CC BY-NC-ND 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.keywords	Artificial intelligence	eng
dc.subject.keywords	Audio-to-text	eng
dc.subject.keywords	Transcription	eng
dc.subject.keywords	Sessions	eng
dc.subject.keywords	Whisper	eng
dc.subject.keywords	Spleeter	eng
dc.subject.proposal	Inteligencia artificial	spa
dc.subject.proposal	Audio-to-text	spa
dc.subject.proposal	Transcripción	spa
dc.subject.proposal	Sesiones	spa
dc.subject.proposal	Whisper	spa
dc.subject.proposal	Spleeter	spa
dc.title	Transcripción de audio a texto sesiones municipales en planetaria	spa
dc.type	Trabajo de grado - Pregrado
dc.type.coar	http://purl.org/coar/resource_type/c_7a1f
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.driver	info:eu-repo/semantics/bachelorThesis
dc.type.version	info:eu-repo/semantics/acceptedVersion
dspace.entity.type	Publication

Files

Original bundle

Name: RuizMelendresJaimeAndres.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format

Name: Autorización Publicación.pdf
Size: 347.69 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 15.18 KB
Format: Item-specific license agreed upon to submission

Collections

F.E.A. Trabajos de Investigación y/o Extensión