Hybrid Search: Combining Vector and Full-text Search

Modern search applications need to understand both semantic meaning and exact keyword matches. Hybrid search combines the best of both worlds: vector search for semantic understanding and full-text search for precise keyword matching.

What is Hybrid Search?

Hybrid search merges two powerful search techniques:

Vector Search: Understands semantic meaning and context using embeddings
Full-text Search: Provides precise keyword matching and traditional relevance scoring
Fusion Methods: Intelligently combines results from both approaches

This approach ensures you never miss relevant results, whether users search with natural language or specific keywords.

TiDB Hybrid Search Demo

Prerequisites

Before starting, ensure you have:

Python 3.10+
TiDB database instance (Create a free TiDB Serverless Cluster)
OpenAI API key (Get your API key)

Note: Full-text search is currently available in specific regions:

TiDB Cloud Starter: Frankfurt (eu-central-1), Singapore (ap-southeast-1)

How to run

Step 1: Clone the repository to local

git clone https://github.com/pingcap/pytidb.git
cd pytidb/examples/hybrid_search/

Step 2: Install the required packages and set up the environment

python -m venv .venv
source .venv/bin/activate
pip install -r reqs.txt

Step 3: Set up environment to connect to TiDB

Go to TiDB Cloud console and get the connection parameters, then set up the environment variable like this:

cat > .env <<EOF
TIDB_HOST={gateway-region}.prod.aws.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME={prefix}.root
TIDB_PASSWORD={password}
TIDB_DATABASE=hybrid_search_demo
OPENAI_API_KEY=your-openai-api-key
EOF

Step 4: Run the Streamlit app

streamlit run app.py

Step 5: Open your browser and visit http://localhost:8501

How It Works

💡 Source Code: You can find the complete source code for this example on GitHub. This working example includes all the necessary files to get you started with hybrid search in minutes.

1. Schema Definition

Define a table schema with both vector and full-text fields using pytidb.schema.TableModel:

# Define table schema
class Document(TableModel):
    __tablename__ = "documents"
    __table_args__ = {"extend_existing": True}

    id: int = Field(primary_key=True)
    text: str = FullTextField()
    text_vec: list[float] = embed_fn.VectorField(
        source_field="text",
    )
    meta: dict = Field(sa_type=JSON)

2. Embedding Function

Configure automatic embedding generation with OpenAI embeddings - vectors are automatically generated when inserting text.

3. Hybrid Search with Fusion

Use the search API to combine vector and full-text search with fusion methods:

# Perform hybrid search with different search types
def hybrid_search(query_text, search_type="hybrid", limit=5):
    query = (
        table.search(query_text, search_type=search_type)
        .distance_threshold(0.8)
        .fusion(method="rrf")  # Reciprocal Rank Fusion
        .limit(limit)
    )

    return query.to_pandas()

# Example: Vector search only
vector_results = hybrid_search("database performance", search_type="vector")

# Example: Full-text search only
fulltext_results = hybrid_search("TiDB distributed", search_type="fulltext")

# Example: Hybrid search (combines both)
hybrid_results = hybrid_search("TiDB performance optimization", search_type="hybrid")

4. Results

Get ranked results with combined scores from both vector similarity and text matching, displayed in an interactive Streamlit interface.

Related Resources

Source Code: View on GitHub
TiDB Vector Documentation: Vector Data Types
Full-text Search: Full-text Search in TiDB
Hands-on Lab: Build Hybrid Search Apps Using Jupyter Notebook (60 min)

Tutorial: Hybrid Search - Combining Vector and Full-text Search