OrochiDB Documentation

Learn how to use OrochiDB's PostgreSQL HTAP features, including automatic sharding, time-series optimization, columnar storage, and more.

Quick Start Guide

Get started with OrochiDB on HowlerOps in just a few steps.

1. Create an Account

Sign up for a free HowlerOps account at howlerops.com/register. No credit card required.

2. Create Your First Cluster

From the dashboard, click "New Cluster" and configure:

  • Choose a name for your cluster
  • Select your cloud provider (AWS, GCP, or Azure)
  • Pick a region close to your users
  • Choose your cluster tier (Free, Standard, or Professional)

3. Connect to Your Database

Use the connection string from your cluster dashboard to connect with any PostgreSQL client (replace username, password, and your-cluster-host with your own values):

# Using psql
psql "postgresql://username:password@your-cluster-host:5432/dbname"

// Using Node.js
const { Pool } = require('pg');
const pool = new Pool({
  connectionString: 'postgresql://username:password@your-cluster-host:5432/dbname'
});

# Using Python
import psycopg2
conn = psycopg2.connect("postgresql://username:password@your-cluster-host:5432/dbname")

Automatic Sharding

OrochiDB automatically distributes your data across multiple nodes using hash-based sharding. This allows you to scale horizontally without changing your application code.

-- Create a distributed table with automatic sharding
SELECT create_distributed_table('orders', 'customer_id');

-- Or specify the number of shards
SELECT create_distributed_table(
  'orders',
  'customer_id',
  shard_count => 32
);

-- Create a reference table (replicated to all nodes)
SELECT create_reference_table('countries');

Best Practices

Choose your distribution column wisely: Pick a column with high cardinality that's frequently used in WHERE clauses and JOINs.

Use reference tables for lookup data: Small tables that are frequently joined should be reference tables.

Co-locate related tables: Tables that are frequently joined should use the same distribution column.
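
If OrochiDB follows the common Citus-style colocate_with option (an assumption; check your version's function reference for the exact parameter name), co-location can be requested explicitly when distributing a table:

-- Hypothetical: distribute order_items on the same column as orders
-- so that joins on customer_id stay local to each shard
SELECT create_distributed_table(
  'order_items',
  'customer_id',
  colocate_with => 'orders'
);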

Time-Series Optimization

OrochiDB automatically partitions time-series data into chunks for optimal query performance and data lifecycle management.

-- Create a regular table
CREATE TABLE sensor_data (
  time        TIMESTAMPTZ NOT NULL,
  sensor_id   INTEGER,
  temperature DOUBLE PRECISION,
  humidity    DOUBLE PRECISION
);

-- Convert to hypertable with 1-day chunks
SELECT create_hypertable(
  'sensor_data',
  'time',
  chunk_time_interval => INTERVAL '1 day'
);

-- Create distributed hypertable
SELECT create_distributed_hypertable(
  'sensor_data',
  'time',
  'sensor_id',
  chunk_time_interval => INTERVAL '1 day'
);
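
Because chunks are bounded by time, queries that filter on the time column only touch the chunks covering that window. A typical rollup over recent data is plain SQL against the sensor_data table defined above:

-- Hourly average temperature per sensor for the last 24 hours;
-- only the chunks overlapping that window are scanned
SELECT sensor_id,
       date_trunc('hour', time) AS hour,
       avg(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY sensor_id, hour
ORDER BY sensor_id, hour;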

Columnar Storage

Store analytical data in column-oriented format with advanced compression for 10x+ faster analytics queries and up to 90% storage savings.

-- Convert existing table to columnar storage
ALTER TABLE analytics_events
SET ACCESS METHOD columnar;

-- Create new columnar table
CREATE TABLE events (
  id BIGINT,
  event_type TEXT,
  payload JSONB,
  created_at TIMESTAMPTZ
) USING columnar;

-- Set compression options
ALTER TABLE events SET (
  columnar.compression = zstd,
  columnar.stripe_row_limit = 150000,
  columnar.chunk_group_row_limit = 10000
);
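
Columnar storage pays off most when queries scan many rows but reference only a few columns, since columns the query never touches can be skipped. For example, against the events table above:

-- Aggregate over event_type and created_at only;
-- the (typically large) payload column is not read
SELECT event_type, count(*) AS event_count
FROM events
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY event_type
ORDER BY event_count DESC;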

Tiered Storage

Automatically move data between storage tiers based on age and access patterns. Reduce costs while maintaining query access to all your data.

-- Configure tiered storage policy
SELECT set_tiered_storage_policy(
  'events',
  hot_to_warm_after => INTERVAL '7 days',
  warm_to_cold_after => INTERVAL '30 days',
  cold_to_frozen_after => INTERVAL '90 days'
);

-- Hot tier: NVMe SSD (fastest, most expensive)
-- Warm tier: SSD (balanced performance/cost)
-- Cold tier: HDD (cost-effective storage)
-- Frozen tier: S3/Object storage (lowest cost)

-- View tier distribution
SELECT * FROM orochi_tier_stats('events');

-- Query across all tiers transparently
SELECT * FROM events
WHERE created_at > NOW() - INTERVAL '1 year';

Vector Search & AI

Build AI-powered applications with native vector storage and SIMD-optimized similarity search operations.

-- Create table with vector column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT,
  category TEXT,
  embedding vector(1536)  -- OpenAI ada-002 dimensions
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Insert embeddings from your AI model
INSERT INTO documents (title, content, embedding)
VALUES ('My Document', 'Content here...', '[0.1, 0.2, ...]');

-- Find similar documents; query_embedding stands for a vector value
-- supplied by your application (for example, a bind parameter cast to vector)
SELECT title, content,
       1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;

-- Hybrid search with filters
SELECT * FROM documents
WHERE category = 'tech'
ORDER BY embedding <=> query_embedding
LIMIT 10;

Change Data Capture (CDC)

Stream database changes to external systems in real-time. Perfect for event-driven architectures, data lakes, and search indexes.

-- Create CDC subscription
SELECT create_cdc_subscription(
  'orders_to_kafka',
  source_tables => ARRAY['orders', 'order_items'],
  destination => 'kafka://broker:9092/orders-topic',
  format => 'avro'
);

-- Configure CDC options
ALTER SUBSCRIPTION orders_to_kafka SET (
  include_transaction_ids = true,
  include_timestamps = true,
  batch_size = 1000
);

-- Monitor CDC lag
SELECT * FROM cdc_subscription_stats('orders_to_kafka');

-- Pause/resume subscription
SELECT pause_cdc_subscription('orders_to_kafka');
SELECT resume_cdc_subscription('orders_to_kafka');

Data Pipelines

Ingest data from Kafka, S3, and other sources directly into your database with built-in transformation capabilities.

-- Create Kafka ingestion pipeline
SELECT create_pipeline(
  'kafka_events',
  source => 'kafka://broker:9092/events',
  target_table => 'events',
  format => 'json',
  transform => $$
    SELECT
      data->>'id' AS id,
      data->>'type' AS event_type,
      (data->>'timestamp')::timestamptz AS created_at
    FROM source_data
  $$
);

-- Create S3 ingestion pipeline
SELECT create_pipeline(
  's3_logs',
  source => 's3://my-bucket/logs/*.parquet',
  target_table => 'access_logs',
  format => 'parquet',
  schedule => 'every 1 hour'
);

-- Monitor pipeline status
SELECT * FROM pipeline_stats();

Security & Access Control

Enterprise-grade security with encryption, role-based access control, and audit logging.
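
Because OrochiDB is PostgreSQL-compatible, roles and privileges can be managed with standard SQL. A minimal sketch of a read-only role (the role name and schema below are illustrative):

-- Create a read-only role and grant access to the public schema
CREATE ROLE analyst_readonly LOGIN PASSWORD 'change-me';
GRANT USAGE ON SCHEMA public TO analyst_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO analyst_readonly;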

Encryption

  • TLS 1.3 for data in transit
  • AES-256 encryption at rest
  • Customer-managed encryption keys (Enterprise)
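
Because OrochiDB speaks the PostgreSQL protocol, you can confirm that the in-transit encryption listed above is active for your session using the standard pg_stat_ssl view:

-- Check whether this connection uses TLS, and which version and cipher
SELECT ssl, version, cipher
FROM pg_stat_ssl
WHERE pid = pg_backend_pid();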

Authentication

  • Database username/password
  • JWT token authentication
  • SAML/SSO integration (Enterprise)
  • IP allowlisting

Compliance

  • SOC 2 Type II certified
  • GDPR compliant
  • HIPAA compliant (Enterprise)
  • Full audit logging

Next Steps