Zain Hazzouri

Berlin

Overview

6 years of professional experience

Work History

Working Student - AI Research Assistant

German Research Center for Artificial Intelligence
06.2024 - Current

Contributed to AI research projects focusing on health data processing and de-identification of sensitive information using Large Language Models (LLMs) and BERT.

• Built end-to-end data pipelines for preprocessing, training, and evaluation of de-identification models.

• Developed an interactive web interface using Streamlit to facilitate model interaction and result visualization.

• Leveraged the organization’s High-Performance Computing (HPC) internal cluster for model training, fine-tuning, and experimentation on large-scale datasets.

• Containerized workflows and deployments using SquashFS (.sqsh) for efficient portability and resource management.

• Ensured seamless integration of LLMs with the pipelines, optimizing both performance and scalability.

• Integrated LangChain to build efficient and modular retrieval-augmented systems for improving de-identification tasks and enhancing model response capabilities.

Working Student

Helping GmbH
03.2023 - 05.2024

• Provided support in maintaining and managing IT systems to ensure seamless operations.

• Assisted in troubleshooting system-level issues, enhancing performance and reliability.

• Gained hands-on experience with cloud infrastructure and automation processes.

• Contributed to writing scripts in Python and Bash for system automation, monitoring, and data processing tasks.

Machine Learning Software Engineer

Fraunhofer FOKUS
05.2021 - 02.2023

Designed and implemented a patented semantic search solution utilizing advanced Natural Language Processing (NLP) techniques.

• Deployed the solution as a full-stack microservices-based web application integrated into a robust CI/CD pipeline on a private cloud infrastructure, ensuring scalability and reliability.

• Developed and managed a complete ML workflow for training, testing, and inference across more than 16 AI models, optimized for cloud deployment.

• Leveraged MLflow for model tracking, versioning, and monitoring throughout the development lifecycle.

• Integrated advanced NLP libraries such as Gensim, NLTK, and spaCy for semantic search and model development.

• Implemented APIs and backend services using FastAPI and Docker to ensure high performance and containerized deployments.

• Enhanced search performance with Elasticsearch and optimized data storage solutions using MongoDB.

• Developed user-friendly interfaces and interactive features using JavaScript, improving usability for end-users.

• Conducted experiment tracking and model optimization with Weights & Biases (W&B) for performance tuning and reproducibility.

Intern - Embedded Systems

Denver Technologies
05.2019 - 04.2020

• Contributed to the development of embedded systems using LiDAR technology and TPU programming.

• Utilized C programming to optimize performance and resource management in embedded devices.

• Collaborated with a multidisciplinary team to enhance sensor integration and system stability.

Education

Master of Science - Computer Science

Technical University Berlin
Berlin
12.2024

Bachelor - Computer Engineering

Technical University Berlin
Berlin
02.2024

Skills

    Project Management: Jira, Miro, Confluence

    Programming Languages: Python, Java, JavaScript, C, C++, C#

    Industry AI Knowledge: Generative AI Applications, Agentic Systems and LangChain, LLM Fine-tuning and Prompt Engineering, Intent-aware Conversational AIs (NLP, NLU Pipelines), Synthetic Data Generation, Semantic Search Solutions

    MLOps & Deployment: CI/CD Pipelines, Kubernetes (Orchestration), Monitoring (Prometheus, Grafana), MLflow, Kubeflow, KServe

    Cloud Computing: Google Cloud Platform (GCP), AWS

    Vector Databases: FAISS, ChromaDB

    Relational Databases: PostgreSQL

    Database Management: Liquibase

    Backend Development: Flask, Django, Spring Boot

    Data Science Tools: Scikit-learn, NumPy, Pandas

    Natural Language Processing: spaCy

Projects

Master Thesis: Exploring U-Net-Based Architectures and Explainable AI Techniques for Speech-Music Classification 12/2024 | TU Berlin

• Developed and evaluated U-Net-based architectures (U-Net, Attention U-Net, R2-U-Net, and R2-Attention-U-Net) for efficient speech-music classification tasks.

• Used MFCCs, LFCCs, and other transformations as input features to improve audio classification performance.

• Integrated Explainable AI (XAI) techniques such as Grad-CAM to interpret model decisions and provide insights into feature importance.

• Optimized model performance using PyTorch and conducted rigorous experimentation on large-scale audio datasets.



Chat AI for Pipeline Configuration 02/2024 | Shell, TU Berlin

• Developed an AI chatbot similar to ChatGPT for automating ETL pipeline configuration.

• Used Retrieval-Augmented Generation (RAG) methodology to provide accurate and contextual responses.

• Integrated the OpenAI API and LangChain for conversational capabilities.

• Utilized ChromaDB to create a highly efficient content store for real-time data retrieval.


Multi-Scale Speech-Music Classification 02/2023 | TU Berlin

• Designed and implemented a deep learning model to classify speech and music efficiently.

• Extracted audio features using MFCCs (Mel-Frequency Cepstral Coefficients).

• Compared and evaluated four U-Net model variations: U-Net, Attention U-Net, R2-U-Net, and R2-Attention-U-Net.

• Improved classification accuracy through model optimization and hyperparameter tuning.


Twitter Data Analysis for Medical Trends 2022 | TU Berlin & Accenture

• Built a web scraper to collect tweets related to medical science topics over a 1-year period for trend identification and analysis.

• Applied sentiment analysis and TF-IDF (Term Frequency-Inverse Document Frequency) techniques to analyze and rank the most popular and relevant terms in medical discussions.

• Processed and cleaned large-scale datasets using Python, Pandas, and NLTK, ensuring data consistency and reliability.

• Visualized insights with Seaborn and Plotly, highlighting trends and providing actionable insights into emerging medical topics.

• Conducted data-driven evaluations to support decision-making processes in collaboration with TU Berlin and Accenture, combining academic research with industry relevance.


Data-to-Text Generation for Task-Oriented Dialog Systems 2022 | TU Berlin

• Fine-tuned advanced NLP models, including GPT-2, T5, and BERT, to generate human-like responses.

• Designed a system capable of generating accurate and context-aware textual outputs for task-oriented dialog applications.

• Evaluated model performance using metrics like BLEU, ROUGE, and perplexity scores.


Image Inpainting 2021 | TU Berlin

• Implemented a U-Net Convolutional Neural Network for reconstructing missing parts of images.

• The model efficiently filled white gaps within images by learning spatial structures and textures.

• Improved model performance through optimization of loss functions and training with high-resolution image datasets.

Languages

Arabic: Intermediate (B1)
German: Proficient (C2)
English: Proficient (C2)
