Building a Multi-Model AI System for Location-Based Content Generation

In today's rapidly evolving AI landscape, creating systems that can leverage multiple large language models (LLMs) while integrating real-time data has become a powerful approach for developing context-aware applications. In this post, I'll walk through a system I built that orchestrates various AI models to generate location-specific content with a focus on weather forecasts and local information.

At its core, this project implements what I call "model orchestration" - dynamically routing content generation requests to different AI providers based on content type, availability, and specific use case requirements. The system leverages three major AI platforms:

OpenAI's GPT models
Anthropic's Claude
Google's Gemini

Rather than being locked into a single AI provider, this approach provides both redundancy and the ability to leverage the unique strengths of each model.

// Model orchestration: choose a provider for this request.
// 'news' requests use 7, which matches neither branch below and so always
// falls through to Claude; all other requests pick 1, 2, or 3 at random.
// (Math.floor(...) + 1 is used because Math.ceil(Math.random() * 3) can yield 0.)
const providerIndex = type === 'news' ? 7 : Math.floor(Math.random() * 3) + 1;

// Based on the selection, route to the appropriate AI provider
if (providerIndex === 1) {
  // Use Google's Gemini
  const model = genAI.getGenerativeModel({
    model: 'gemini-2.0-flash-lite',
    generationConfig: { temperature: 0.75 },
  });
  const result = await model.generateContent(prompt);
  // Process and return result.response.text()
} else if (providerIndex === 2) {
  // Use OpenAI's models
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
    temperature: comedic ? 1 : 0.85,
  });
  // Process and return response
} else {
  // Use Anthropic's Claude
  const msg = await anthropic.messages.create({
    model: 'claude-3-haiku-20240307',
    max_tokens: 100, // hard cap on the length of the reply
    temperature: comedic ? 1 : 0.85,
    messages: [{ role: 'user', content: [{ type: 'text', text: prompt }] }],
  });
  // Process and return response
}
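
The branching above picks one provider per request. To also get the redundancy mentioned earlier, one option is to try providers in order and fall back when a call fails. The sketch below is an assumption about how that could look, not the system's actual code: `Provider` and `generateWithFallback` are hypothetical names, and each real provider would wrap its own SDK call inside `generate`.

```typescript
// Hypothetical wrapper shape; not part of any provider SDK.
type Provider = {
  name: string;
  generate: (prompt: string) => Promise<string>;
};

// Try each provider in order and return the first successful result,
// along with the name of the provider that produced it.
async function generateWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<{ text: string; source: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.generate(prompt);
      return { text, source: provider.name };
    } catch (err) {
      // Record the failure and move on to the next provider.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

Shuffling the `providers` array before each call would keep the random distribution across models while still falling back on failure.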

Retrieval-Augmented Generation (RAG) Implementation

A key innovation in this system is how it implements RAG patterns to enhance the quality and relevance of AI-generated content. Unlike traditional RAG systems that might use vector databases, this implementation:

Retrieves structured weather data and location information
Processes and formats this data for AI consumption
Augments prompts with this contextual information
Generates natural language responses that incorporate the retrieved data

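Because the prompt carries current, location-specific data, the response is grounded in real conditions rather than in whatever the model happens to remember, which makes it more accurate and more personal than the model's unaided output.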
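
The four steps listed above can be sketched as a small pipeline. This is a minimal illustration under assumptions: the `HourlyWeather` fields and the function names are hypothetical, and the stubbed fetch stands in for whatever weather API the system actually calls.

```typescript
// Hypothetical shape of one hourly record; real weather APIs will differ.
interface HourlyWeather {
  hour: number; // 0-23, local time
  tempC: number;
  humidityPct: number;
  precipMm: number;
}

// Step 1: retrieve. Stubbed with fixed data; a real version would call a weather API.
async function fetchHourlyWeather(_location: string): Promise<HourlyWeather[]> {
  return [
    { hour: 14, tempC: 21.5, humidityPct: 40, precipMm: 0 },
    { hour: 15, tempC: 22.0, humidityPct: 38, precipMm: 0.2 },
  ];
}

// Step 2: format the structured data as compact JSON for the model.
function formatForPrompt(hours: HourlyWeather[]): string {
  return JSON.stringify(hours);
}

// Step 3: augment the request with the retrieved context.
function buildAugmentedPrompt(location: string, request: string, context: string): string {
  return [
    `Location: ${location}`,
    `Hourly weather data (JSON): ${context}`,
    `Task: ${request}`,
    'Only use the weather data provided above.',
  ].join('\n');
}

// Steps 1-3 combined. Step 4 would send the returned prompt to the selected model.
async function buildWeatherPrompt(location: string, request: string): Promise<string> {
  const hours = await fetchHourlyWeather(location);
  return buildAugmentedPrompt(location, request, formatForPrompt(hours));
}
```

No embedding or similarity search is involved here: the "retrieval" is a direct lookup keyed by location and time, which is enough when the relevant data is already structured.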

Advanced Prompt Engineering

The system implements sophisticated prompt engineering techniques to ensure consistent, high-quality outputs across different AI providers. Each content type has a dedicated prompt template that includes:

Specific formatting requirements
Length constraints
Tone guidance (comedic vs. straightforward)
Context-specific instructions

// Builds the prompt for today's forecast. The free variables (textLength,
// hoursLeftInDay, location, comedic, clientDateTime, todayHourData,
// tomorrowHourData, weatherKeys, rules) come from the enclosing scope.
const getPromptTodayWeather = (): string => {
  // Default request: a short forecast covering the rest of the day.
  let request = `Give me a ${textLength || 80} word weather forecast for the next ${hoursLeftInDay} hours.`;

  // Late in the day, ask for the remaining hours plus tomorrow instead.
  if (hoursLeftInDay < 5) {
    request = `Give me an hourly weather forecast for the remaining ${hoursLeftInDay} hours today and also a weather forecast for tomorrow, with both days incorporating known weather averages for ${location}.`;
  }

  // Tone instruction; each variant already ends with a period.
  const tone = comedic
    ? `In your most punny, sarcastic style and tone, report an exaggerated and humorous weather forecast for ${location}, while making as many weather-related puns as possible.`
    : `In your most cheerful, upbeat, optimistic, punny style and tone, report a weather forecast for ${location}, while making as many weather-related puns as possible.`;

  // Only mention tomorrow when there is data for it ('[]' is the serialized empty array).
  const tomorrowSection =
    tomorrowHourData !== '[]' ? ` Tomorrow: ${tomorrowHourData}.` : '';

  return `
    ${tone} Use the given Weather data, Weather definitions, and RULES below to help describe the current weather data.

    It is currently ${clientDateTime}. The hourly weather data is in an array of hourly JSON objects: Today: ${todayHourData}.${tomorrowSection} The weather data can be defined using the weather definitions in JSON Format here: ${JSON.stringify(weatherKeys)}.

    RULES:
    ${request}
    ${rules}
  `;
};

Real-Time Data Enrichment

The system doesn't rely solely on AI knowledge; it actively enriches requests with:

Weather data with detailed parameters like temperature, humidity, precipitation, and more
Location coordinates and display information
Time-aware context (time of day, hours left in day)
RSS feeds for local news content

This real-time data enrichment allows the AI models to generate responses that are both current and contextually relevant.
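
As an illustration of the time-aware context listed above, both values can be derived from the client's local time. The helpers below are a hypothetical sketch: `hoursLeftInDay` mirrors the variable name used in the prompt template, but how the original computes it is not shown, and the time-of-day boundaries are arbitrary assumptions.

```typescript
type TimeOfDay = 'morning' | 'afternoon' | 'evening' | 'night';

// Hours remaining until local midnight, counting the current partial hour.
// Assumes `now` already reflects the client's local time zone.
function hoursLeftInDay(now: Date): number {
  return 24 - now.getHours();
}

// Coarse label for the current hour; the cut-offs are assumptions.
function timeOfDay(now: Date): TimeOfDay {
  const h = now.getHours();
  if (h >= 5 && h < 12) return 'morning';
  if (h >= 12 && h < 17) return 'afternoon';
  if (h >= 17 && h < 21) return 'evening';
  return 'night';
}
```

With these definitions, a request made at 8:30 pm local time has 4 hours left in the day, which would trigger the "remaining hours plus tomorrow" branch of the prompt template.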

Context-Aware Content Generation

One of the most powerful aspects of this system is its context awareness. The content generation adapts based on:

Time of day and how many hours remain
Location-specific information
Weather conditions
User preferences for content style
Required content length

This creates a highly personalized experience that can adapt to changing conditions.

Cross-Model Response Normalization

An interesting challenge when working with multiple AI providers is that each one returns its response in a different shape. The system normalizes these by extracting the generated text from each provider's response object and appending a tag that identifies the source model:

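// Response normalization with source attribution.
// Only one of these runs per request, depending on the provider that was called.
const anthropicResponse = `${msg.content[0].text} -anthropic`;
// or (content can be null, so fall back to an empty string)
const openAiResponse = `${completion.choices[0].message?.content ?? ''} -openai`;
// or (result is the value returned by model.generateContent(prompt))
const geminiResponse = `${result.response.text()} -gemini`;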
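
The three extractions above can be folded into one function so callers never touch provider-specific shapes. This is a sketch under assumptions: `ProviderResult`, `NormalizedResponse`, and `normalizeResponse` are hypothetical names, and the input types are trimmed-down stand-ins for the SDK response objects, not their full definitions.

```typescript
// Minimal stand-ins for each provider's response shape (assumed, not the SDK types).
type ProviderResult =
  | { source: 'anthropic'; msg: { content: Array<{ type: string; text?: string }> } }
  | { source: 'openai'; completion: { choices: Array<{ message?: { content?: string | null } }> } }
  | { source: 'gemini'; result: { response: { text: () => string } } };

interface NormalizedResponse {
  text: string;
  source: ProviderResult['source'];
}

// Pull the generated text out of whichever shape was returned.
function extractText(r: ProviderResult): string {
  switch (r.source) {
    case 'anthropic':
      return r.msg.content[0]?.text ?? '';
    case 'openai':
      return r.completion.choices[0]?.message?.content ?? '';
    case 'gemini':
      return r.result.response.text();
  }
}

// Return trimmed text plus the source as a separate field.
function normalizeResponse(r: ProviderResult): NormalizedResponse {
  return { text: extractText(r).trim(), source: r.source };
}
```

Keeping the source in its own field, rather than as a suffix on the text, lets the caller decide whether to display the attribution, log it, or drop it.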

Practical Applications

This multi-model AI system has several practical applications:

Personalized weather forecasts with natural language descriptions
Local area guides with relevant cultural, historical, or recreational information
Local news aggregation filtered by relevance and recency
Travel recommendations based on weather conditions and local attractions

Conclusion

Building systems that combine multiple AI models with real-time data enrichment represents the next evolution in practical AI applications. By implementing techniques like retrieval-augmented generation, prompt engineering, and model orchestration, we can create experiences that are more personalized, accurate, and useful than what any single AI model could provide on its own.

This approach demonstrates how developers can move beyond simple API calls to create sophisticated AI systems that leverage the best capabilities of multiple providers while compensating for their individual limitations.

The Architecture: Multi-Modal LLM Orchestration