Course Outline
Introduction to Multimodal AI and Ollama
- Overview of multimodal learning
- Key challenges in vision-language integration
- Capabilities and architecture of Ollama
Setting Up the Ollama Environment
- Installing and configuring Ollama
- Working with local model deployment
- Integrating Ollama with Python and Jupyter
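To give a feel for the Python integration topic above, here is a minimal sketch that calls a locally running Ollama server through the official ollama Python package; the model name "llama3" and the prompt are placeholder assumptions, not part of the course material.

```python
# Minimal sketch of calling a locally running Ollama server from Python.
# Assumes `pip install ollama` and that a model such as "llama3" has been
# pulled with `ollama pull llama3`; adjust the model name to your setup.
import ollama

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Summarise what multimodal AI means in one sentence."}],
)
print(response["message"]["content"])
```

The same call works inside a Jupyter notebook cell, since the package simply talks to the local Ollama HTTP endpoint.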
Working with Multimodal Inputs
- Text and image integration (see the example after this list)
- Incorporating audio and structured data
- Designing preprocessing pipelines
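As an illustration of text and image integration, the sketch below sends an image alongside a text prompt to a vision-capable model through Ollama; the "llava" model tag and the image path are placeholder assumptions.

```python
# Sketch: combining a text prompt with an image in a single request.
# Assumes `pip install ollama`, a running Ollama instance, and a vision
# model pulled locally (e.g. `ollama pull llava`); "photo.jpg" is a placeholder.
import ollama

response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "Describe this image and list any text you can read in it.",
            "images": ["photo.jpg"],  # local file path; raw bytes are also accepted
        }
    ],
)
print(response["message"]["content"])
```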
Document Understanding Applications
- Extracting structured information from PDFs and images
- Combining OCR with language models (illustrated below)
- Building intelligent document analysis workflows
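To illustrate combining OCR with a language model, the sketch below extracts raw text from a scanned page with Tesseract and asks a local model to return structured fields; pytesseract, Pillow, the file name, and the field list are assumptions rather than course-mandated choices.

```python
# Sketch: OCR a scanned document page, then ask a local LLM to pull out
# structured fields. Assumes `pip install ollama pytesseract pillow` and a
# Tesseract binary on PATH; "invoice.png" and the field list are placeholders.
import ollama
import pytesseract
from PIL import Image

raw_text = pytesseract.image_to_string(Image.open("invoice.png"))

prompt = (
    "Extract the invoice number, date, and total amount from the text below. "
    "Reply as JSON.\n\n" + raw_text
)
response = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
print(response["message"]["content"])
```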
Visual Question Answering (VQA)
- Setting up VQA datasets and benchmarks
- Training and evaluating multimodal models
- Building interactive VQA applications
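As a sketch of an interactive VQA application, the loop below repeatedly asks the user for questions about a single image and answers them with a local vision model; the model tag and image path are placeholders.

```python
# Sketch: a minimal interactive visual question answering loop.
# Assumes a running Ollama instance with a vision model (e.g. llava) pulled;
# "scene.jpg" is a placeholder image path.
import ollama

IMAGE_PATH = "scene.jpg"

while True:
    question = input("Ask about the image (blank to quit): ").strip()
    if not question:
        break
    response = ollama.chat(
        model="llava",
        messages=[{"role": "user", "content": question, "images": [IMAGE_PATH]}],
    )
    print(response["message"]["content"])
```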
Designing Multimodal Agents
- Principles of agent design with multimodal reasoning
- Combining perception, language, and action (see the sketch after this list)
- Deploying agents for real-world use cases
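To illustrate combining perception, language, and action, the toy sketch below runs one perceive-reason-act step: the model looks at an image, picks an action from a fixed vocabulary, and a dispatcher executes it. The action names and dispatch function are purely illustrative assumptions.

```python
# Toy sketch of one perceive -> reason -> act step for a multimodal agent.
# Assumes a running Ollama instance with a vision model pulled; the action
# vocabulary and the act() handler below are illustrative placeholders.
import ollama

ACTIONS = {"describe", "count_people", "flag_for_review"}

def act(action: str) -> None:
    # Placeholder dispatch; a real agent would call tools or external APIs here.
    print(f"Executing action: {action}")

frame = "camera_frame.jpg"  # placeholder image path

decision = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "Look at the image and reply with exactly one of: "
            + ", ".join(sorted(ACTIONS)),
            "images": [frame],
        }
    ],
)

choice = decision["message"]["content"].strip().lower()
act(choice if choice in ACTIONS else "flag_for_review")
```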
Advanced Integration and Optimization
- Fine-tuning multimodal models with Ollama
- Optimizing inference performance (example below)
- Scalability and deployment considerations
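For the inference optimization topic, here is a sketch of request-level tuning that passes runtime options and a keep_alive value to Ollama so the model stays loaded between calls; the specific values are illustrative assumptions, not tuned recommendations.

```python
# Sketch: request-level inference tuning with Ollama.
# `options` maps to Ollama runtime parameters; `keep_alive` keeps the model
# resident in memory between calls so repeated requests skip reload time.
# The values below are illustrative, not tuned recommendations.
import ollama

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Give three tips for faster local inference."}],
    options={
        "num_ctx": 2048,      # smaller context window -> lower memory use
        "temperature": 0.2,   # lower temperature for more deterministic output
        "num_predict": 256,   # cap the number of generated tokens
    },
    keep_alive="10m",         # keep the model loaded for 10 minutes
)
print(response["message"]["content"])
```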
Summary and Next Steps
Requirements
- Strong understanding of machine learning concepts
- Experience with deep learning frameworks such as PyTorch or TensorFlow
- Familiarity with natural language processing and computer vision
Audience
- Machine learning engineers
- AI researchers
- Product developers integrating vision and text workflows
21 Hours