Geo-Narrator: Your Personal AI Tour Guide

Geo-Narrator is an immersive, AI-powered web application that transforms your device into a personal tour guide. Discover the stories behind any location, listen to AI-generated audio guides in English or Hindi, and hold a real-time voice conversation with an AI expert to ask questions and learn more.

Key Features

The application is split into two primary experiences: Discover and Converse.

🗺️ Discover View

Location-Based Discovery: Find unique, interesting, and "hidden gem" locations by searching for a city or landmark, or by using your device's current GPS location.
AI-Curated Suggestions: Leverages the Gemini API with Google Search and Maps grounding to provide relevant, up-to-date, and contextually aware place recommendations.
Rich Place Details: Each suggested location comes with a name, a catchy one-liner, a detailed description, and a category.
AI-Powered Audio Guides: Generate and listen to an audio narration of any place's description.
- Multi-Language Support: Choose between English narration in a natural-sounding Indian accent and a Hindi translation.
- Powered by Gemini TTS: Utilizes the gemini-2.5-flash-preview-tts model for high-quality text-to-speech generation.
Verifiable Sources: Place suggestions are backed by grounding sources from Google Search and Maps, which are displayed as links for further exploration.
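As a rough illustration of how grounding sources could be turned into display links, the sketch below flattens grounding chunks into a de-duplicated list. The GroundingChunk and SourceLink types and the extractSources name are hypothetical; they model only the web and maps fields of a grounded response, and the app's actual code may differ.

```typescript
// Simplified local types (assumptions for this sketch, not the SDK's full
// definitions). A grounded response lists chunks from Search and/or Maps.
interface GroundingChunk {
  web?: { uri?: string; title?: string };
  maps?: { uri?: string; title?: string };
}

interface SourceLink {
  kind: "search" | "maps";
  uri: string;
  title: string;
}

// Flatten grounding chunks into a de-duplicated list of links for display.
function extractSources(chunks: GroundingChunk[] | undefined): SourceLink[] {
  const sources: SourceLink[] = [];
  for (const chunk of chunks ?? []) {
    const kind = chunk.web ? "search" : chunk.maps ? "maps" : undefined;
    const ref = chunk.web ?? chunk.maps;
    const uri = ref?.uri;
    // Skip chunks without a usable link and links that were already collected.
    if (!kind || !ref || !uri || sources.some((s) => s.uri === uri)) continue;
    // Fall back to the URI when the chunk carries no title.
    sources.push({ kind, uri, title: ref.title ?? uri });
  }
  return sources;
}
```

With the @google/genai SDK, the chunks would typically be read from the first candidate's groundingMetadata.groundingChunks; passing undefined simply yields an empty list.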

🎙️ Converse View

Live Voice Conversation: Engage in a seamless, real-time voice chat with an AI tour guide. Ask follow-up questions, request more details, or explore related topics conversationally.
Low-Latency Interaction: Built with the Gemini Live API (gemini-2.5-flash-native-audio-preview-09-2025) for a fluid and natural conversational experience.
Real-Time Transcription: The entire conversation between you and the AI is transcribed and displayed live in a familiar chat interface.
Intelligent Interruption: You can interrupt the AI mid-response, just as in a natural conversation, which makes the interaction feel more human.
Context-Aware AI: The AI is given a system instruction to act as a helpful and curious travel guide, ensuring its responses are engaging and relevant.
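The live transcript and interruption handling described above can be sketched as a small reducer over incoming server messages. This is a minimal sketch, not the app's actual implementation: ServerMessage is a simplified local shape that assumes the Live API delivers partial transcripts plus turnComplete and interrupted flags, and ChatTurn, TranscriptState, and applyMessage are hypothetical names.

```typescript
// Simplified local shape of a Live API server message (an assumption for this
// sketch; the SDK's own message type carries more fields).
interface ServerMessage {
  serverContent?: {
    inputTranscription?: { text?: string };
    outputTranscription?: { text?: string };
    turnComplete?: boolean;
    interrupted?: boolean;
  };
}

interface ChatTurn {
  speaker: "user" | "ai";
  text: string;
  interrupted?: boolean;
}

interface TranscriptState {
  turns: ChatTurn[];   // finished turns, oldest first
  pendingUser: string; // partial transcript of the current user speech
  pendingAi: string;   // partial transcript of the current AI speech
}

const emptyTranscript: TranscriptState = { turns: [], pendingUser: "", pendingAi: "" };

// Fold one server message into the transcript without mutating the old state.
function applyMessage(state: TranscriptState, msg: ServerMessage): TranscriptState {
  const content = msg.serverContent;
  if (!content) return state;

  let pendingUser = state.pendingUser + (content.inputTranscription?.text ?? "");
  let pendingAi = state.pendingAi + (content.outputTranscription?.text ?? "");
  let turns = state.turns;

  // A finished or interrupted turn flushes both buffers into the chat history.
  if (content.turnComplete || content.interrupted) {
    const flushed: ChatTurn[] = [];
    if (pendingUser.trim()) flushed.push({ speaker: "user", text: pendingUser.trim() });
    if (pendingAi.trim()) {
      flushed.push({ speaker: "ai", text: pendingAi.trim(), interrupted: !!content.interrupted });
    }
    turns = turns.concat(flushed);
    pendingUser = "";
    pendingAi = "";
  }
  return { turns, pendingUser, pendingAi };
}
```

In a React component this function could serve as the body of a state updater called from the session's message callback. On an interrupted message the app would presumably also stop any queued audio playback, which this sketch leaves out.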

How It Works

Technology Stack

Frontend: React, TypeScript, Tailwind CSS
AI & Core Logic:
- Google Gen AI SDK (@google/genai): The client library used for all Gemini API calls.
- gemini-2.5-flash: Used for generating location suggestions, backed by googleSearch and googleMaps grounding tools for factual, up-to-date information.
- gemini-2.5-flash-preview-tts: Powers the text-to-speech functionality for audio guides.
- gemini-2.5-flash-native-audio-preview-09-2025: Enables the low-latency, real-time audio conversation in the Converse view.
Browser APIs:
- Geolocation API: To fetch the user's current location.
- Web Audio API & getUserMedia: For capturing microphone input and playing back generated audio.
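Bridging the Web Audio API and the Gemini audio models mostly comes down to sample-format conversion. The Web Audio API works with 32-bit floats in the range [-1, 1], while the Gemini Live API documentation describes raw 16-bit PCM (16 kHz for input and 24 kHz for output, as far as we know). A minimal sketch of that conversion follows; the function names are hypothetical, and resampling and Base64 encoding are left out.

```typescript
// Convert Web Audio samples (floats in [-1, 1]) to 16-bit signed PCM,
// for example before streaming microphone input.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // The 16-bit range is asymmetric: [-32768, 32767].
    pcm[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return pcm;
}

// Convert 16-bit signed PCM back to floats, for example before copying
// generated audio into an AudioBuffer for playback.
function pcm16ToFloat(pcm: Int16Array): Float32Array {
  const samples = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    samples[i] = pcm[i] / 0x8000;
  }
  return samples;
}
```

Clamping before scaling matters because microphone processing can occasionally produce samples slightly outside [-1, 1], which would otherwise overflow the 16-bit range.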

Architecture Overview

The application is a single-page application (SPA) built with React.

App.tsx: The main component that handles navigation between the DiscoverView and ConversationView.
services/geminiService.ts: This file abstracts all interactions with the Gemini API.
- getNearbyPlaces(): Constructs a detailed prompt for the Gemini model, requests JSON output, and uses grounding tools to find locations. It parses the response and its sources.
- getTextToSpeech(): Sends text to the TTS model and returns the Base64-encoded audio data.
- startLiveConversation(): Initializes a new Gemini Live session, setting up callbacks to handle audio streaming, transcription, and connection lifecycle events.
components/DiscoverView.tsx: Manages the state for searching, displaying places, and handling audio guide playback.
components/ConversationView.tsx: Manages the entire lifecycle of the live conversation, from requesting microphone permissions to streaming audio to and from the Gemini API and rendering transcripts.
utils/audioUtils.ts: Contains helper functions for encoding and decoding audio data to and from the formats required by the Gemini Live API and the Web Audio API.
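The request-and-parse flow described for getNearbyPlaces() might look roughly like the sketch below. It is written under stated assumptions rather than taken from the app: the prompt wording, the Place field names, and the buildPlacesRequest and parsePlaces helpers are hypothetical, and the request shape mirrors what we understand the @google/genai generateContent call to accept. The parser extracts JSON from free text because the prompt, not a structured-output setting, is what asks for JSON here.

```typescript
interface LatLng { latitude: number; longitude: number }

// Hypothetical field names for the details listed under "Rich Place Details".
interface Place { name: string; oneLiner: string; description: string; category: string }

interface PlacesRequest {
  model: string;
  contents: string;
  config: {
    tools: Array<{ googleSearch?: {}; googleMaps?: {} }>;
    toolConfig?: { retrievalConfig: { latLng: LatLng } };
  };
}

// Build the argument object for a grounded generateContent call.
function buildPlacesRequest(query: string, coords?: LatLng): PlacesRequest {
  const request: PlacesRequest = {
    model: "gemini-2.5-flash",
    contents:
      `Suggest up to five interesting, lesser-known places near "${query}". ` +
      "Reply with only a JSON array of objects with the string fields " +
      "name, oneLiner, description and category.",
    config: { tools: [{ googleSearch: {} }, { googleMaps: {} }] },
  };
  // Pass GPS coordinates to the Maps tool only when the user shared them.
  if (coords) {
    request.config.toolConfig = { retrievalConfig: { latLng: coords } };
  }
  return request;
}

// Pull the place list out of the model's text reply; returns [] on any failure.
function parsePlaces(responseText: string): Place[] {
  // Models often wrap JSON in a Markdown fence; keep only the outermost array.
  const start = responseText.indexOf("[");
  const end = responseText.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(responseText.slice(start, end + 1));
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  const places: Place[] = [];
  for (const item of parsed) {
    // Require the two essential fields; default the optional ones to "".
    if (item && typeof item.name === "string" && typeof item.description === "string") {
      places.push({
        name: item.name,
        oneLiner: String(item.oneLiner ?? ""),
        description: item.description,
        category: String(item.category ?? ""),
      });
    }
  }
  return places;
}
```

In the service, the request object would presumably be passed to ai.models.generateContent() and the reply's text to parsePlaces(). Returning an empty list on malformed output lets the view show a "no results" state instead of crashing. Note that the bracket search is a simplification and would misfire if the reply contained square brackets outside the JSON array.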

Setup and Running Locally

Prerequisites

A modern web browser (e.g., Chrome, Firefox, Safari).
A valid Google Gemini API key.

Running the App

API Key: The application reads the Gemini API key from process.env.API_KEY. Browsers do not define process.env, so the hosting environment or build tool must inject this value.
Serve Files: Serve the contents of the project directory, including index.html, with a simple static file server.
Permissions: Your browser will prompt you to grant Geolocation access (for "Use My Current Location") and Microphone access (for the "Converse with AI" feature). Each feature works only after its permission is granted. Browsers expose both APIs only in secure contexts, so serve the app over HTTPS or from localhost.
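Because a missing API key otherwise surfaces only as a failed request at runtime, it can help to check for it at startup. The helper below is a hypothetical sketch of such a check, not part of the app's documented code.

```typescript
// Return a usable API key or fail fast with an actionable message.
// `env` is any object carrying the injected variables (hypothetical helper).
function requireApiKey(env: { [name: string]: string | undefined }): string {
  const key = (env["API_KEY"] ?? "").trim();
  if (key === "") {
    throw new Error(
      "API_KEY is not set. Provide it through the hosting environment or build tool."
    );
  }
  return key;
}
```

A call such as requireApiKey({ API_KEY: process.env.API_KEY }) keeps the literal process.env.API_KEY expression in the source, which is the form that build tools typically replace at build time.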