Turn existing PostgreSQL databases into AI-searchable Pinecone vector knowledge bases without manually defining every table and column.
This template discovers your PostgreSQL schema, selects text-rich tables, builds readable embedding documents from rows, generates Google Gemini embeddings, upserts vectors into Pinecone, and tracks per-table sync state for incremental indexing.
It also includes a companion semantic search webhook workflow that embeds a user query, queries Pinecone, and returns source row metadata for RAG and AI assistant use cases.
information_schema.gemini-embedding-001.The main workflow runs through these stages:
The workflow inspects PostgreSQL metadata, groups columns by table, and chooses indexable tables based on:
The workflow excludes sensitive columns before documents are built. By default, it filters column names containing:
passwordpasswdpwdtokensecretapi_keyapikeyaccess_keyrefresh_tokenreset_tokenotppinsessioncookiehashsaltprivate_keyclient_secretYou can add more column names in the excludedColumns setting.
Important: review your database schema and compliance requirements before enabling indexing in production. This workflow provides safe defaults, but it cannot understand every business-specific sensitive field.
The workflow treats these as text-rich embedding candidates:
textvarcharcharacter varyingcharcharacterjsonjsonbWhen includeScalarContextColumns is enabled, the document also includes useful scalar context from:
Create or select a PostgreSQL credential in n8n with read-only access.
Recommended permissions:
GRANT USAGE ON SCHEMA public TO your_n8n_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO your_n8n_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO your_n8n_user;
For production, prefer a dedicated read-only user and restrict schemas with the workflow allowedSchemas setting.
1536 dimensions from Gemini so it can use a Pinecone index configured for 1536-dimensional vectors.your-index-abc123.svc.us-east-1-aws.pinecone.io.Api-KeypineconeIndexHost in the workflow configuration.The workflow uses Pinecone namespaces in this format:
postgres-{schema}.{table}
Example:
postgres-public.tickets
Create an n8n HTTP Header Auth credential:
x-goog-api-keyThe default embedding model is:
gemini-embedding-001
This workflow requests geminiOutputDimensionality = 1536. Pinecone index dimensions must match the vectors you upsert.
n8n has native AI/vector nodes, including embedding and Pinecone vector store nodes. This template uses HTTP Request nodes for Gemini and Pinecone because the workflow needs precise control over production indexing behavior:
postgres::{schema}::{table}::{primary_key_value}.The native nodes are excellent for standard document ingestion and retrieval flows. For this auto-discovery template, HTTP requests make the workflow more predictable and easier to customize for database indexing.
Open the Workflow Configuration node and set:
| Field | Description |
|---|---|
allowedSchemas |
Comma-separated schemas to include. Default: public. Leave empty to allow all non-system schemas. |
allowedTables |
Optional comma-separated table names or schema.table names to include. |
excludedTables |
Tables to skip. Supports table names or schema.table. |
excludedColumns |
Additional columns to exclude. |
maxRowsPerTable |
Maximum rows fetched per table per run. Default: 10 for safe first runs. |
maxDocumentsPerRun |
Hard cap on documents sent to Gemini in one execution. Default: 10 so first tests finish quickly. |
fullSyncTablesWithoutTimestamp |
If true, tables without timestamp columns receive limited full syncs. |
skipTablesWithoutPrimaryKey |
If true, tables without primary keys are skipped. |
includeScalarContextColumns |
Adds scalar values such as status, price, priority, dates, and UUIDs to documents. |
maxDocumentCharacters |
Truncates very large documents before embedding. Default: 2000. |
geminiEmbeddingModel |
Gemini embedding model. Default: gemini-embedding-001. |
geminiOutputDimensionality |
Output vector size requested from Gemini. Default: 1536. |
geminiBatchSize |
Number of documents sent to Gemini per embeddings request. Default: 1 for safe first runs. |
geminiSmokeTestMode |
Sends only one document to Gemini. Default: true. Disable after one-document embedding works. |
pineconeIndexHost |
Your Pinecone index host. |
pineconeNamespacePrefix |
Namespace prefix. Default: postgres-. |
pineconeBatchSize |
Number of vectors per Pinecone upsert request. |
Each PostgreSQL row becomes a readable document for embedding:
Database: postgres
Schema: public
Table: tickets
Id: 123
Updated At: 2026-05-17T12:00:00.000Z
Title: Payment failed for user
Description: Customer payment failed after webhook timeout
Status: open
Priority: high
The goal is to preserve business meaning instead of embedding raw JSON alone.
Each vector is upserted like this:
{
"id": "postgres::public::tickets::123",
"values": [0.0123, -0.0456],
"metadata": {
"source": "postgres",
"schema": "public",
"table": "tickets",
"primary_key": "id",
"primary_key_value": "123",
"updated_at": "2026-05-17T12:00:00.000Z",
"text_columns": "title,description",
"document_columns": "id,updated_at,title,description,status,priority"
}
}
Vector ID format:
postgres::{schema}::{table}::{primary_key_value}
workflow.json into n8n.Generate Gemini Embeddings.Upsert Vectors to Pinecone.Workflow Configuration.allowedSchemas, pineconeIndexHost, and any table or column filters.maxRowsPerTable for the first test.Return Sync Summary output.Then import semantic-search-workflow.json and configure its Gemini and Pinecone credentials.
Check:
allowedSchemas matches your schema.skipTablesWithoutPrimaryKey.fullSyncTablesWithoutTimestamp.Check:
x-goog-api-key.gemini-embedding-001.If you receive status 429, Gemini is rate limiting the workflow or the Google AI Studio project does not have enough available quota. n8n may show a generic batching/rate-limit message for both cases.
First run with smoke-test settings:
geminiBatchSize: 1geminiSmokeTestMode: truemaxRowsPerTable: 1maxDocumentsPerRun: 1maxDocumentCharacters: 1000The Generate Gemini Embeddings HTTP Request node should also have Options > Batching enabled with:
15000 or higherThis spaces out the HTTP requests after the documents have already been grouped into embedding payloads.
If the workflow still returns 429 with smoke-test mode enabled, check your Google AI Studio project:
gemini-embedding-001.Gemini embeddings are documented here: https://ai.google.dev/api/embeddings.
This usually means the workflow is processing more documents than expected while also spacing requests out to avoid rate limits. For example, 100 documents with a 10 second batch interval can take more than 16 minutes before retries.
For a quick smoke test, use:
maxRowsPerTable: 1maxDocumentsPerRun: 1geminiBatchSize: 1Generate Gemini Embeddings > Options > Batching > Batch Interval: 1000After the smoke test succeeds, increase the limits gradually.
Check:
Api-Key.pineconeIndexHost is the host only, without https://.1024.Check:
lastSync is not newer than your test row.Lower maxRowsPerTable, restrict allowedTables, or keep fullSyncTablesWithoutTimestamp disabled.
maxRowsPerTable such as 1 or 10.allowedSchemas and allowedTables to limit the first run.pineconeBatchSize between 25 and 100.Costs come primarily from:
To control costs:
maxDocumentCharacters.Common customizations:
Filter Indexable Tables.Return Sync Summary.updated_at, modified_at, or last_modified columns.geminiOutputDimensionality.