MongoDB Blog · February 17, 2026

Building a Mood-Based Movie Recommendation Engine with Vector Search

This article details the architecture and implementation of a mood-based movie recommendation engine, leveraging AI embeddings and vector search. It outlines the integration of Hugging Face models for embedding generation with MongoDB Atlas Vector Search for scalable storage and similarity queries, demonstrating a practical application of semantic search in a real-world system.

Introduction to Mood-Based Recommendation Systems

Traditional recommendation systems often rely on explicit data such as genres, actors, or user ratings. This article introduces a more nuanced approach: mood-based semantic search. Instead of requiring exact keyword matches, the system interprets the *intent* and *emotional state* behind a user's query and matches it against movie plot descriptions, which yields more relevant and empathetic recommendations. This capability is powered by AI embeddings: numerical vectors in a high-dimensional space in which texts with similar meaning lie close together.
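
"Close together" is measured with a similarity score, and this system uses cosine similarity. The sketch below computes that score in plain Python. Its three 3-dimensional vectors and their labels are invented for illustration only; real model embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", invented for illustration only.
query = [0.9, 0.1, 0.2]           # stand-in for a query like "something uplifting"
feel_good_plot = [0.8, 0.2, 0.1]  # stand-in for a feel-good comedy's plot summary
bleak_plot = [0.1, 0.9, 0.7]      # stand-in for a bleak drama's plot summary

print(f"{cosine_similarity(query, feel_good_plot):.3f}")
print(f"{cosine_similarity(query, bleak_plot):.3f}")
```

The score depends only on direction, not length, so a long and a short plot summary can still be judged similar in meaning.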

Core Architecture and Components

The recommendation engine's architecture is built around three primary components, showcasing a common pattern for AI-driven search systems:

  • Embedding Model (voyage-4-nano via Hugging Face Hub): Transforms raw text (user queries and movie plot summaries) into numerical vector embeddings. The article uses `Sentence Transformers` to simplify this step, because the library handles tokenization, pooling, normalization, and prompt formatting automatically. It also applies asymmetric encoding, using different prompts for queries and for documents, to improve retrieval quality.
  • Dataset (MongoDB/embedded_movies from Hugging Face Datasets): A collection of over 1500 movies with plot summaries, used as the knowledge base for recommendations.
  • Vector Database (MongoDB Atlas Vector Search): Stores the generated embeddings and efficiently performs similarity searches (cosine similarity in this case) to find movies whose plot embeddings are closest to a user's mood query embedding. It also supports filtering by metadata like genres and year.
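
When embeddings are normalized to unit length, as the list above notes Sentence Transformers can do, cosine similarity reduces to a plain dot product. The following is a minimal in-memory sketch of the retrieval step, not how Atlas works internally. It normalizes a few plot vectors, ranks them against a query vector by dot product, and keeps the top k. The `Movie` type, the titles, and the vectors are hypothetical.

```python
import heapq
import math
from typing import NamedTuple


class Movie(NamedTuple):
    title: str
    plot_embedding: list[float]


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vector: list[float], movies: list[Movie], k: int = 3) -> list[tuple[float, str]]:
    """Exact (brute-force) top-k search: score every movie, keep the k highest."""
    q = normalize(query_vector)
    scored = ((dot(q, normalize(m.plot_embedding)), m.title) for m in movies)
    return heapq.nlargest(k, scored)


# Hypothetical catalogue with invented 3-dimensional plot embeddings.
catalogue = [
    Movie("Hypothetical feel-good comedy", [0.8, 0.2, 0.1]),
    Movie("Hypothetical bleak drama", [0.1, 0.9, 0.7]),
    Movie("Hypothetical road-trip adventure", [0.7, 0.1, 0.5]),
]
mood_query = [0.9, 0.1, 0.2]  # stand-in for an embedded mood query

for score, title in top_k(mood_query, catalogue, k=2):
    print(f"{score:.3f}  {title}")
```

This linear scan compares the query against every document. Atlas Vector Search instead uses an approximate nearest-neighbour index, which is what keeps queries fast as the collection grows.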

Embedding Dimensions and Trade-offs

A key architectural decision is the embedding dimension. Although `voyage-4-nano` can produce embeddings of up to 2048 dimensions, the tutorial deliberately truncates them to 1024. This balances semantic quality against storage cost and vector search latency, keeps ranking behavior stable, and stays practical to deploy. Being able to choose a different dimension (e.g., 512 for faster queries, 2048 for maximum quality) is an important design lever for vector search systems.
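
Both sides of this trade-off can be sketched in a few lines. The sketch assumes the model's embeddings are Matryoshka-style, meaning a prefix of the vector is itself a usable embedding; that property is what makes truncating 2048 dimensions to 1024 viable. The input vector here is synthetic, not real model output. After slicing, the vector is re-normalized to unit length, which matters if vectors are later compared with a dot product. The byte counts cover only the raw float32 payload for about 1,500 vectors, roughly the dataset size cited above. Index structures and document encoding add to this, and a plain BSON array of doubles uses more than 4 bytes per component.

```python
import math

FLOAT32_BYTES = 4  # bytes per component when a vector is stored as float32


def truncate_and_normalize(vector: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError(f"dim must be between 1 and {len(vector)}")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def raw_vector_bytes(num_docs: int, dim: int) -> int:
    """Raw float32 payload for num_docs vectors (excludes index and document overhead)."""
    return num_docs * dim * FLOAT32_BYTES


full = [math.sin(i + 1) for i in range(2048)]  # synthetic stand-in for a 2048-dim output
short = truncate_and_normalize(full, 1024)
print(len(short), round(math.sqrt(sum(x * x for x in short)), 6))

for dim in (512, 1024, 2048):
    print(dim, raw_vector_bytes(1500, dim))
```

Halving the dimension halves the raw payload and the work per similarity computation. The open question is only how much ranking quality the shorter prefix gives up, which is what experimenting with 512, 1024, and 2048 measures.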

Vector Search Indexing

The article demonstrates how to create a vector search index in MongoDB Atlas, which is what makes similarity queries fast. The index definition specifies the vector field (`plot_embedding`), its number of dimensions (which must match the stored embeddings), and the similarity metric (cosine). It also declares `genres` and `year` as filter fields, which enables hybrid search: vector similarity combined with traditional metadata filtering.
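
The index described above could look like the following Atlas Vector Search definition, built here as a Python dict. The field paths (`plot_embedding`, `genres`, `year`), the 1024 dimensions, and cosine similarity come from the article. The helper function is hypothetical, and the tutorial's exact definition may differ.

```python
import json


def build_vector_index_definition(num_dimensions: int = 1024) -> dict:
    """Hypothetical helper: one vector field plus two filter fields."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "plot_embedding",
                "numDimensions": num_dimensions,  # must equal the stored embedding length
                "similarity": "cosine",
            },
            # Filter fields may be referenced in the `filter` clause of $vectorSearch.
            {"type": "filter", "path": "genres"},
            {"type": "filter", "path": "year"},
        ]
    }


print(json.dumps(build_vector_index_definition(), indent=2))
```

With a recent PyMongo release, a definition like this can be submitted with `collection.create_search_index(SearchIndexModel(definition=..., name="vector_index", type="vectorSearch"))`, where `vector_index` is a hypothetical index name.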

System Design Insight: Hybrid Search

Combining vector similarity search with traditional attribute filtering (e.g., genre, year) is a powerful hybrid search pattern. It allows systems to leverage the nuanced understanding of semantic search while still providing users with conventional filtering options, leading to more precise and satisfying results.
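
In MongoDB Atlas, this pattern is expressed as a `$vectorSearch` aggregation stage with a `filter` clause. That clause may only reference fields indexed as filter fields, here `genres` and `year`. The sketch below builds such a pipeline as plain Python data. Several details are assumptions for illustration, not the tutorial's code: the helper's name and parameters, the index name `vector_index`, the `title` projection, and the heuristic of ten candidates per returned result.

```python
from typing import Iterable, Optional


def build_mood_search_pipeline(
    query_vector: list[float],
    genres: Optional[Iterable[str]] = None,
    min_year: Optional[int] = None,
    limit: int = 10,
    index_name: str = "vector_index",  # hypothetical index name
) -> list[dict]:
    """Hypothetical helper: a $vectorSearch pipeline that pre-filters on genres and year."""
    conditions = []
    if genres:
        # Matches documents whose `genres` array contains any of the given values.
        conditions.append({"genres": {"$in": list(genres)}})
    if min_year is not None:
        conditions.append({"year": {"$gte": min_year}})

    stage = {
        "index": index_name,
        "path": "plot_embedding",
        "queryVector": list(query_vector),
        "numCandidates": limit * 10,  # assumed heuristic: consider more candidates than returned
        "limit": limit,
    }
    if conditions:
        stage["filter"] = conditions[0] if len(conditions) == 1 else {"$and": conditions}

    return [
        {"$vectorSearch": stage},
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "genres": 1,
                "year": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Hypothetical query: a placeholder 1024-dim vector, comedies from 2000 onward.
pipeline = build_mood_search_pipeline([0.1] * 1024, genres=["Comedy"], min_year=2000, limit=5)
print(pipeline[0]["$vectorSearch"]["filter"])
```

The returned list could be passed to PyMongo's `collection.aggregate(...)`. Because the filter is applied inside the vector search stage, the query can still return up to `limit` matching movies. A `$match` stage placed after ranking would instead discard results and could leave fewer.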
