ClickHouse Setup

ClickHouse is the analytical database Statalog uses to store and query pageview data. Its columnar storage and vectorised query execution make it extremely fast for the aggregation queries that power your analytics dashboard.

1. Install ClickHouse (Ubuntu / Debian)

Add the official ClickHouse apt repository and install. (apt-key is deprecated on recent Debian and Ubuntu releases; if it is unavailable on your system, use the keyring-based signed-by setup from the official ClickHouse installation docs instead.)

sudo apt-get install -y apt-transport-https ca-certificates dirmngr gnupg

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 \
  --recv 8919F6BD2B48D754

echo "deb https://packages.clickhouse.com/deb stable main" \
  | sudo tee /etc/apt/sources.list.d/clickhouse.list

sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client

2. Start and enable the service

sudo systemctl enable clickhouse-server
sudo systemctl start clickhouse-server
sudo systemctl status clickhouse-server

ClickHouse listens on two ports by default: 9000 (native TCP protocol, used by clickhouse-client) and 8123 (HTTP interface, used by Statalog).
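A quick way to confirm both interfaces are up (the fallback messages are just for convenience on a machine where the server is not yet running):

```shell
# HTTP interface: the /ping endpoint answers "Ok." when the server is healthy
curl -s http://127.0.0.1:8123/ping || echo "HTTP interface (8123) not reachable"

# Native TCP interface: clickhouse-client connects on port 9000 by default
clickhouse-client --query "SELECT version()" || echo "native interface (9000) not reachable"
```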

3. Create a database and user

Connect to ClickHouse as the default user:

clickhouse-client --user default --password ''

Create a dedicated database and user:

CREATE DATABASE statalog;

CREATE USER statalog IDENTIFIED BY 'your-secure-password';

GRANT ALL ON statalog.* TO statalog;

Exit clickhouse-client with exit or Ctrl+D.
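To verify the new account works, reconnect as statalog (substitute your actual password):

```shell
# currentUser() prints "statalog" when authentication succeeds
clickhouse-client --user statalog --password 'your-secure-password' \
  --query "SELECT currentUser()" || echo "login as statalog failed"
```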

4. Configure firewall / binding

ClickHouse's HTTP interface (port 8123) should not be publicly accessible. Recent ClickHouse packages bind to localhost only by default (the <listen_host> entries in the shipped config are commented out), but older configurations may bind to all interfaces, so it is worth verifying. To restrict the server explicitly to localhost, edit /etc/clickhouse-server/config.xml and set:

<listen_host>127.0.0.1</listen_host>

Restart ClickHouse after any config change:

sudo systemctl restart clickhouse-server
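After restarting, you can check which addresses the server is actually bound to (ss ships with iproute2 on modern distributions):

```shell
# Expect 127.0.0.1:8123 and 127.0.0.1:9000 in the Local Address column
ss -ltn 2>/dev/null | grep -E ':(8123|9000)' || echo "no listener found on 8123/9000"
```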

5. Configure Statalog's .env

Add the following to your .env:

STATALOG_CLICKHOUSE_HOST=127.0.0.1
STATALOG_CLICKHOUSE_PORT=8123
STATALOG_CLICKHOUSE_DB=statalog
STATALOG_CLICKHOUSE_USER=statalog
STATALOG_CLICKHOUSE_PASSWORD=your-secure-password
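Since Statalog talks to ClickHouse over the HTTP interface, you can simulate its connection with curl. ClickHouse accepts credentials via the X-ClickHouse-User and X-ClickHouse-Key headers; the command prints 1 when the host, port, and credentials in your .env are correct:

```shell
curl -s 'http://127.0.0.1:8123/' \
  -H 'X-ClickHouse-User: statalog' \
  -H 'X-ClickHouse-Key: your-secure-password' \
  --data-binary 'SELECT 1' || echo "ClickHouse HTTP interface not reachable"
```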

6. Run ClickHouse migrations

With ClickHouse running and the .env configured, run:

php artisan clickhouse:migrate

This creates all required tables. The command is idempotent — running it multiple times is safe.

Tables created

pageviews: one row per pageview (URL, referrer, country, device type, browser, OS, session hash, timestamp)
custom_events: one row per custom event fired via statalog('event', ...) (event name, properties JSON, URL, timestamp)
js_errors: JavaScript errors captured by the error tracking module (message, stack, URL, browser, timestamp)
heatmap_clicks: click coordinates recorded for heatmap generation (x%, y%, element selector, URL, viewport dimensions)
heatmap_scrolls: scroll depth recordings (max_scroll_pct, URL, session hash, timestamp)

All tables use the MergeTree engine with a primary key on (site_id, timestamp) for efficient time-range queries.
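You can inspect the generated schema to confirm the engine and ordering key for any of these tables; the migration defines the authoritative column list:

```sql
-- Prints the full DDL, including ENGINE = MergeTree and the ORDER BY key
SHOW CREATE TABLE statalog.pageviews;
```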

Production configuration

For a production server handling significant traffic, add these settings to /etc/clickhouse-server/users.xml in a settings profile (or in a separate file under /etc/clickhouse-server/users.d/), and reference the profile from the statalog user definition with <profile>statalog_profile</profile>:

<profiles>
  <statalog_profile>
    <max_memory_usage>4000000000</max_memory_usage>        <!-- 4 GB per query -->
    <max_threads>4</max_threads>
    <max_execution_time>30</max_execution_time>            <!-- 30-second query timeout -->
    <readonly>0</readonly>
  </statalog_profile>
</profiles>

Also consider setting max_connections in config.xml to match your expected concurrency. For a single-site installation, the defaults are generally sufficient.
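The max_connections setting lives at the top level of config.xml; the value below is illustrative, not a recommendation:

```xml
<!-- in /etc/clickhouse-server/config.xml -->
<max_connections>4096</max_connections>
```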

ClickHouse compresses data on disk automatically using LZ4 by default. You can expect approximately 10x compression compared to raw CSV — a site with 10 million pageviews will typically use 200–400 MB of disk space, not gigabytes.
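You can measure the actual compression ratio of your installation from the system.parts table:

```sql
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes))   AS on_disk,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio
FROM system.parts
WHERE database = 'statalog' AND active
GROUP BY table;
```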

Supervisor queue worker for analytics ingestion

Pageviews are written to ClickHouse asynchronously via the queue. Make sure your Supervisor worker is running:

supervisorctl status statalog-worker:*

If workers are not running, pageviews queued in Redis will not be flushed to ClickHouse and your dashboard will show stale data. See the Supervisor configuration in Installation step 10.
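If the dashboard looks stale, check whether jobs are piling up in Redis. Laravel's Redis queue driver keeps pending jobs in a list named queues:&lt;queue&gt;; the queue name "default" below is an assumption, so substitute whatever your queue configuration actually uses:

```shell
# A steadily growing number here means workers are not keeping up (or not running)
redis-cli llen queues:default || echo "redis-cli unavailable or Redis not running"
```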

Verifying data is flowing

After installing the tracking snippet on your website and generating a few pageviews, query ClickHouse directly to confirm rows are being inserted:

SELECT
    count()               AS total_pageviews,
    uniq(session_hash)    AS unique_sessions,
    min(timestamp)        AS first_seen,
    max(timestamp)        AS last_seen
FROM statalog.pageviews
WHERE site_id = 'ST-A1B2C3';

You can run this query via clickhouse-client:

clickhouse-client \
  --user statalog \
  --password your-secure-password \
  --database statalog \
  --query "SELECT count() FROM pageviews WHERE site_id = 'ST-A1B2C3'"

Troubleshooting

Cannot connect to ClickHouse (connection refused on port 8123)
Check that ClickHouse is running (systemctl status clickhouse-server) and listening on the correct interface. If you bound it to 127.0.0.1, connections from other hosts will be refused — ensure Statalog's PHP process and ClickHouse are on the same machine or adjust the bind address.

Queries are slow
Confirm that the (site_id, timestamp) primary key is in place by running SHOW CREATE TABLE pageviews. If you are running very long date ranges (multiple years) on a large dataset, ClickHouse may need more memory — increase max_memory_usage in the user profile. For multi-server setups, consider ClickHouse replication and sharding.

High disk usage
ClickHouse's compression is excellent, but very high-volume sites generate significant data over time. You can set a TTL to automatically delete old pageview data:

ALTER TABLE pageviews
MODIFY TTL toDate(timestamp) + INTERVAL 2 YEAR;

This retains two years of data and automatically purges older rows during ClickHouse's background merge process.

Migration command fails with "table already exists"
The clickhouse:migrate command is designed to be idempotent and uses CREATE TABLE IF NOT EXISTS. If you see errors, check the ClickHouse server log at /var/log/clickhouse-server/clickhouse-server.err.log for details.