---
title: Migration
sort: 130
section-id: operations
keywords: migration, import, Postgres, Pinecone, Weaviate, data migration, ETL
description: Migrating data to NeuralDB from PostgreSQL, Pinecone, Weaviate, and other sources
language: en
---

# Migration

This guide covers migrating data into NeuralDB from common sources: PostgreSQL (with or without pgvector), Pinecone, and Weaviate.

## From PostgreSQL (without vectors)

If you are migrating a standard PostgreSQL database to NeuralDB, the simplest path is a logical dump and restore:

```bash
# 1. Dump from source Postgres
pg_dump \
  -h source-host \
  -U source-user \
  -d source-database \
  --format=custom \
  --compress=9 \
  > source-backup.dump

# 2. Create the target database in NeuralDB
psql -h neuraldb-host -U neuraldb -c "CREATE DATABASE myapp;"

# 3. Restore into NeuralDB
pg_restore \
  -h neuraldb-host \
  -U neuraldb \
  -d myapp \
  --jobs=8 \
  --no-owner \
  source-backup.dump
```

### Adding Vector Columns Post-Migration

After restoring the schema and data, add vector columns and generate embeddings:

```sql
-- Add the vector column
ALTER TABLE documents ADD COLUMN embedding VECTOR(1536);

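-- Create the index. On large tables, consider running this after the backfill
-- below instead: in pgvector-style engines, building an HNSW index once over
-- loaded data is usually faster than updating it on every batch.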
CREATE INDEX CONCURRENTLY documents_embedding_idx
ON documents USING hnsw (embedding vector_cosine_ops);
```

Then backfill embeddings in batches:

```python
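import os

import openai
from neuraldb import NeuralDB

# Connection string is read from the environment, as in the other examples
client = NeuralDB(os.environ["NEURALDB_URL"])
openai_client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment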

BATCH_SIZE = 100

while True:
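    # Skip rows with nothing to embed; otherwise they would be re-selected
    # on every pass and the loop would never finish.
    rows = client.query("""
        SELECT id, content FROM documents
        WHERE embedding IS NULL
          AND content IS NOT NULL
          AND content <> ''
        LIMIT %s
    """, [BATCH_SIZE])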

    if not rows:
        break

    texts = [row['content'] for row in rows]
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )

    # Pair each embedding with its row ID (results come back in input order)
    updates = [
        (item.embedding, row['id'])
        for item, row in zip(response.data, rows)
    ]

    client.executemany(
        "UPDATE documents SET embedding = %s WHERE id = %s",
        updates
    )
    print(f"Backfilled {len(rows)} rows")
```

## From PostgreSQL + pgvector

pgvector uses the same `VECTOR` type as NeuralDB, so migration is a direct dump and restore. The only adjustments are excluding the pgvector extension from the dump and rebuilding the HNSW indexes afterwards.

```bash
# Dump, excluding the pgvector extension (NeuralDB has native vector support).
# Note: --exclude-extension requires pg_dump from PostgreSQL 17 or later.
pg_dump \
  -h source-host -U source-user -d source-db \
  --format=custom \
  --exclude-extension=vector \
  > pgvector-backup.dump

pg_restore \
  -h neuraldb-host -U neuraldb -d myapp \
  --jobs=8 \
  pgvector-backup.dump
```

### Re-create HNSW Indexes

The built HNSW index structures from pgvector are not transferred. Drop any index definitions that the restore carried over, then recreate the indexes in NeuralDB:

```sql
-- Drop pgvector-created indexes
DROP INDEX IF EXISTS documents_embedding_idx;

-- Create NeuralDB HNSW index (same syntax, better performance)
CREATE INDEX CONCURRENTLY documents_embedding_idx
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```

## From Pinecone

Pinecone stores vectors with metadata. Export using the Pinecone SDK and ingest into NeuralDB:

```python
import os

import pinecone
from neuraldb import NeuralDB, BulkIngestor

# Source: Pinecone
pc = pinecone.Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("my-index")

# Target: NeuralDB
client = NeuralDB(os.environ["NEURALDB_URL"])

# Create target table
client.execute("""
    CREATE TABLE IF NOT EXISTS pinecone_migration (
        id TEXT PRIMARY KEY,
        embedding VECTOR(1536),
        metadata JSONB,
        migrated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
    )
""")

client.execute("""
    CREATE INDEX IF NOT EXISTS pinecone_migration_emb_idx
    ON pinecone_migration USING hnsw (embedding vector_cosine_ops)
""")

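# Paginate through all Pinecone vectors.
# paginate_pinecone_ids is a helper you supply (not part of the Pinecone SDK);
# it should yield lists of vector IDs.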
ingestor = BulkIngestor(client, table="pinecone_migration", batch_size=500)

with ingestor as ing:
    for ids_batch in paginate_pinecone_ids(index, batch_size=1000):
        fetch_response = index.fetch(ids=ids_batch)

        for vector_id, vector_data in fetch_response.vectors.items():
            ing.add({
                "id": vector_id,
                "embedding": vector_data.values,
                "metadata": vector_data.metadata or {}
            })

print(f"Migrated {ingestor.total_inserted} vectors")
```
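
The example above calls `paginate_pinecone_ids`, which is neither a Pinecone SDK function nor a NeuralDB one. One possible sketch follows. It assumes the index object exposes the SDK's `list()` generator, which yields pages of vector IDs (this is believed to be limited to serverless indexes); the `namespace` parameter and the re-batching logic are illustrative assumptions.

```python
from typing import Iterator, List, Optional


def paginate_pinecone_ids(index, batch_size: int = 1000,
                          namespace: Optional[str] = None) -> Iterator[List[str]]:
    """Yield lists of at most `batch_size` vector IDs from a Pinecone index.

    Assumption: `index.list()` is a generator that yields pages (lists) of IDs.
    Pages are re-grouped so each yielded batch has exactly `batch_size` IDs,
    except possibly the last one.
    """
    kwargs = {"namespace": namespace} if namespace is not None else {}
    buffer: List[str] = []
    for page in index.list(**kwargs):
        buffer.extend(page)
        # Emit full batches as soon as enough IDs have accumulated
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    # Emit whatever is left over after the last page
    if buffer:
        yield buffer
```

If fetch requests with large ID lists are rejected, lowering `batch_size` is the first thing to try.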

### Mapping Pinecone Metadata to Columns

Flatten commonly-queried metadata fields into dedicated columns for better query performance:

```python
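# Keep metadata JSONB as the source and derive typed columns
# for common filter fields so they can be indexed directly.
# Caveat: stock PostgreSQL requires generation expressions to be immutable,
# and its text-to-date cast is not; if NeuralDB enforces the same rule,
# populate created_date at ingest time instead.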
client.execute("""
    ALTER TABLE pinecone_migration
    ADD COLUMN IF NOT EXISTS category TEXT GENERATED ALWAYS AS (metadata->>'category') STORED,
    ADD COLUMN IF NOT EXISTS created_date DATE GENERATED ALWAYS AS ((metadata->>'date')::DATE) STORED;

    CREATE INDEX ON pinecone_migration (category);
    CREATE INDEX ON pinecone_migration (created_date);
""")
```
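
As an alternative to generated columns, the typed values can be derived in Python at ingest time, with `category` and `created_date` declared as plain columns. This also lets malformed dates be handled explicitly. The sketch below is illustrative only: `flatten_metadata` is a hypothetical helper, and it assumes the `category` and `date` metadata keys used above, with dates in ISO 8601 form.

```python
from datetime import date
from typing import Any, Dict, Optional


def flatten_metadata(metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Extract typed column values from a Pinecone metadata dict (hypothetical helper)."""
    metadata = metadata or {}
    category = metadata.get("category")
    raw_date = metadata.get("date")

    created_date: Optional[date] = None
    if isinstance(raw_date, str):
        try:
            # Keep only the YYYY-MM-DD prefix so full timestamps also parse
            created_date = date.fromisoformat(raw_date[:10])
        except ValueError:
            created_date = None  # unparseable dates are stored as NULL

    return {
        "category": str(category) if category is not None else None,
        "created_date": created_date,
    }
```

The returned dict can be merged into the record passed to `ing.add(...)`, for example `{"id": vector_id, "embedding": values, "metadata": md, **flatten_metadata(md)}`.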

## From Weaviate

Export Weaviate data using the Weaviate client SDK:

```python
import os

import weaviate
from neuraldb import NeuralDB, BulkIngestor

weaviate_client = weaviate.connect_to_local()
neuraldb_client = NeuralDB(os.environ["NEURALDB_URL"])

collection = weaviate_client.collections.get("Document")

# Create target schema
neuraldb_client.execute("""
    CREATE TABLE weaviate_documents (
        id UUID PRIMARY KEY,
        content TEXT,
        category TEXT,
        source TEXT,
        embedding VECTOR(1536)
    );
    CREATE INDEX ON weaviate_documents USING hnsw (embedding vector_cosine_ops);
""")

ingestor = BulkIngestor(neuraldb_client, table="weaviate_documents", batch_size=500)

with ingestor as ing:
    for item in collection.iterator(include_vector=True):
        ing.add({
            "id": str(item.uuid),
            "content": item.properties.get("content", ""),
            "category": item.properties.get("category"),
            "source": item.properties.get("source"),
            "embedding": item.vector.get("default"),
        })

weaviate_client.close()
print(f"Migrated {ingestor.total_inserted} objects")
```

## Verifying Migration

After any migration, verify data integrity:

```sql
-- Row count (compare against the same count on the source)
SELECT COUNT(*) FROM documents;

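-- Sample vector similarity (results should match the same query on the source).
-- ORDER BY id makes the reference row deterministic on both sides.
SELECT id, content, 1 - (embedding <=> (SELECT embedding FROM documents ORDER BY id LIMIT 1)) AS sim
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents ORDER BY id LIMIT 1)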
LIMIT 5;

-- Check for null embeddings
SELECT COUNT(*) FROM documents WHERE embedding IS NULL;

-- Index health
SELECT index_name, hnsw_in_memory, estimated_recall
FROM neuraldb_stat_vector_indexes;
```
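
The sample check can also be scripted once vectors have been fetched from both systems. The following is a minimal sketch, not a NeuralDB API: `compare_samples` and `cosine_similarity` are hypothetical helpers that take vectors already loaded into dicts keyed by ID and report IDs that are missing from the target or whose cosine similarity to the source vector falls below a tolerance. How the samples are fetched is left to the caller.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def compare_samples(
    source: Dict[str, Sequence[float]],
    target: Dict[str, Optional[Sequence[float]]],
    min_similarity: float = 0.9999,
) -> Dict[str, List[str]]:
    """Report sampled IDs that are missing or differ between source and target."""
    # IDs absent from the target, or present with a NULL embedding
    missing = [vid for vid in source if target.get(vid) is None]
    # IDs present on both sides whose vectors no longer point the same way
    mismatched = [
        vid for vid, vec in source.items()
        if target.get(vid) is not None
        and cosine_similarity(vec, target[vid]) < min_similarity
    ]
    return {"missing": missing, "mismatched": mismatched}
```

Cosine similarity ignores vector magnitude, which matches the `vector_cosine_ops` indexes used in this guide. Compare raw values as well if exact equality matters.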