# Elasticsearch Infrastructure

This directory contains the Elasticsearch setup for full-text search on the Kubernetes cluster, deployed with the ECK (Elastic Cloud on Kubernetes) operator.

## Architecture

- **ECK Operator**: Production-grade Elasticsearch deployment on Kubernetes
- **Single-node cluster**: Sized for the current 2-node Kubernetes cluster (can be scaled later)
- **Security enabled**: X-Pack security with a custom role and user for Mastodon
- **Longhorn storage**: Distributed storage with 2-replica redundancy
- **Self-signed certificates**: TLS for internal cluster communication

## Components

### **Core Components**

- `namespace.yaml`: Elasticsearch system namespace
- `repository.yaml`: Elastic Helm repository
- `operator.yaml`: ECK operator deployment
- `cluster.yaml`: Elasticsearch and Kibana cluster configuration
- Storage uses the existing `longhorn-retain` storage class, with backup labels on the PVCs

### **Security Components**

- `secret.yaml`: SOPS-encrypted credentials for the Elasticsearch admin and the Mastodon user
- `security-setup.yaml`: Job that creates the Mastodon role and user after cluster deployment

### **Monitoring Components**

- `monitoring.yaml`: ServiceMonitor for OpenObserve integration, plus an optional Kibana ingress
- Built-in metrics: Elasticsearch Prometheus exporter

## Services Created

ECK automatically creates these services:

- `elasticsearch-es-http`: HTTPS API access (port 9200)
- `elasticsearch-es-transport`: Internal cluster transport (port 9300)
- `kibana-kb-http`: Kibana web UI (port 5601), an optional management interface

## Connection Information

### For Applications (Mastodon)

Applications should connect using these connection parameters:

**Elasticsearch Connection:**

```yaml
host: elasticsearch-es-http.elasticsearch-system.svc.cluster.local
port: 9200
scheme: https  # ECK uses HTTPS with self-signed certificates
user: mastodon
password:      # retrieve as described under "Getting Credentials"
```

### Getting Credentials

The Elasticsearch credentials are stored in SOPS-encrypted secrets. Once deployed, they can be read from the cluster:

```bash
# Get the admin password (auto-generated by ECK)
kubectl get secret elasticsearch-es-elastic-user -n elasticsearch-system -o jsonpath="{.data.elastic}" | base64 -d

# Get the Mastodon user password (set during security setup)
kubectl get secret elasticsearch-credentials -n elasticsearch-system -o jsonpath="{.data.password}" | base64 -d
```

## Deployment Steps

### 1. Encrypt Secrets

Before deploying, encrypt the secrets with SOPS:

```bash
# Edit and encrypt the Elasticsearch credentials
sops manifests/infrastructure/elasticsearch/secret.yaml

# Edit and encrypt the Mastodon Elasticsearch credentials
sops manifests/applications/mastodon/elasticsearch-secret.yaml
```

### 2. Deploy Infrastructure

Flux deploys the infrastructure automatically once you commit and push:

```bash
git add manifests/infrastructure/elasticsearch/
git add manifests/cluster/flux-system/elasticsearch.yaml
git add manifests/cluster/flux-system/kustomization.yaml
git commit -m "Add Elasticsearch infrastructure for Mastodon search"
git push
```

### 3. Wait for Deployment

```bash
# Monitor ECK operator deployment
kubectl get pods -n elasticsearch-system -w

# Monitor Elasticsearch cluster startup
kubectl get elasticsearch -n elasticsearch-system -w

# Check cluster health
kubectl get elasticsearch elasticsearch -n elasticsearch-system -o yaml
```

### 4. Verify Security Setup

```bash
# Check if security setup job completed successfully
kubectl get jobs -n elasticsearch-system

# Verify Mastodon user was created
kubectl logs -n elasticsearch-system job/elasticsearch-security-setup
```

### 5. Update Mastodon

After Elasticsearch is running, deploy the updated Mastodon configuration:

```bash
git add manifests/applications/mastodon/
git commit -m "Enable Elasticsearch in Mastodon"
git push
```

### 6. Populate Search Indices

Once Mastodon is running with Elasticsearch enabled, populate the search indices:

```bash
# Get a Mastodon web pod
MASTODON_POD=$(kubectl get pods -n mastodon-application -l app.kubernetes.io/component=web -o jsonpath='{.items[0].metadata.name}')

# Run the search deployment command
kubectl exec -n mastodon-application "$MASTODON_POD" -- bin/tootctl search deploy
```

## Configuration Details

### Elasticsearch Configuration

- **Version**: 7.17.27 (latest 7.x compatible with Mastodon)
- **Preset**: `single_node_cluster` (optimized for single-node deployment)
- **Memory**: 2GB heap size (50% of the 4GB container limit)
- **Storage**: 50GB persistent volume on the existing `longhorn-retain` storage class
- **Security**: X-Pack security enabled with custom roles

### Security Configuration

Following the [Mastodon Elasticsearch documentation](https://docs.joinmastodon.org/admin/elasticsearch/), the setup includes:

- **Custom Role**: `mastodon_full_access` with minimal required permissions
- **Dedicated User**: `mastodon` with the custom role
- **TLS Encryption**: All connections use HTTPS with self-signed certificates

### Performance Configuration

- **JVM Settings**: Tuned for the cluster's resource constraints
- **Discovery**: Single-node discovery (can be changed for multi-node scaling)
- **Memory**: Conservative settings for 2-node cluster compatibility
- **Storage**: Tuned for SSD performance with appropriate disk watermarks

## Mastodon Integration

### Search Features Enabled

Once configured, Mastodon will provide full-text search for:

- Public statuses from accounts that opted into search results
- User's own statuses
- User's mentions, favourites, and bookmarks
- Account information (display names, usernames, bios)

### Search Index Deployment

The `tootctl search deploy` command will create these indices:

- `accounts_index`: User accounts and profiles
- `statuses_index`: User's own statuses, mentions, favourites, bookmarks
- `public_statuses_index`: Public searchable content
- `tags_index`: Hashtag search

## Monitoring Integration

### OpenObserve Metrics

Elasticsearch metrics are automatically collected and sent to OpenObserve:

- **Cluster Health**: Node status, cluster state, allocation
- **Performance**: Query latency, indexing rate, search performance
- **Storage**: Disk usage, index sizes, shard distribution
- **JVM**: Memory usage, garbage collection, heap statistics

### Kibana Management UI

An optional Kibana web interface is available at `https://kibana.keyboardvagabond.com` for:

- Index management and monitoring
- Query development and testing
- Cluster configuration and troubleshooting
- Visual dashboards for Elasticsearch data

## Scaling Considerations

### Current Setup

- **Single-node cluster**: Sized for the current 2-node Kubernetes cluster
- **50GB storage**: Sufficient for small-to-medium Mastodon instances
- **2GB heap**: Conservative memory allocation

### Future Scaling

When adding more Kubernetes nodes:

1. Update `discovery.type` from `single-node` to `zen` in the cluster configuration
2. Increase `nodeSets.count` to 2 or 3 for high availability
3. Change `ES_PRESET` to `small_cluster` in the Mastodon configuration
4. Consider increasing storage and memory allocations

## Troubleshooting

### Common Issues

**Elasticsearch pods pending:**

- Check storage class and PVC creation
- Verify Longhorn is healthy and has available space

**Security setup job failing:**

- Check Elasticsearch cluster health
- Verify admin credentials are available
- Review job logs for API errors

**Mastodon search not working:**

- Verify Elasticsearch credentials in the Mastodon secret
- Check network connectivity between namespaces
- Ensure search indices are created with `tootctl search deploy`

### Useful Commands

```bash
# Check Elasticsearch cluster status
kubectl get elasticsearch -n elasticsearch-system

# View Elasticsearch logs
kubectl logs -n elasticsearch-system -l elasticsearch.k8s.elastic.co/cluster-name=elasticsearch

# Check security setup
kubectl describe job elasticsearch-security-setup -n elasticsearch-system

# Test connectivity from Mastodon
# (without credentials, an HTTP 401 response still confirms the service is reachable)
kubectl exec -n mastodon-application deployment/mastodon-web -- curl -k https://elasticsearch-es-http.elasticsearch-system.svc.cluster.local:9200/_cluster/health
```

## Backup Integration

### S3 Backup Strategy

- **Longhorn Integration**: Elasticsearch volumes are automatically backed up to Backblaze B2
- **Volume Labels**: `backup.longhorn.io/enable: "true"` enables automatic S3 backup
- **Backup Frequency**: Follows the existing Longhorn backup schedule

### Index Backup

For additional protection, consider periodic index snapshots:

```bash
# Repository and snapshot management needs cluster-level privileges, which the
# minimal Mastodon role is not expected to have, so these calls use the
# `elastic` admin user. ES_PASSWORD must hold the admin password.

# Create snapshot repository (one-time setup)
curl -k -u "elastic:$ES_PASSWORD" -X PUT "https://elasticsearch-es-http.elasticsearch-system.svc.cluster.local:9200/_snapshot/s3_repository" -H 'Content-Type: application/json' -d '
{
  "type": "s3",
  "settings": {
    "bucket": "longhorn-backup-bucket",
    "region": "eu-central-003",
    "endpoint": ""
  }
}'

# Create manual snapshot
curl -k -u "elastic:$ES_PASSWORD" -X PUT "https://elasticsearch-es-http.elasticsearch-system.svc.cluster.local:9200/_snapshot/s3_repository/snapshot_1"
```
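
A fixed name such as `snapshot_1` can only be used once, because Elasticsearch rejects a snapshot whose name already exists in the repository. For periodic snapshots, one option is to derive the name from the current time. The following is a minimal sketch, not part of the deployed manifests: the helper names `snapshot_url` and `create_snapshot` and the `ES_URL`, `ES_REPO`, `ES_USER`, and `ES_PASSWORD` variables are assumptions introduced for illustration, and it presumes the `s3_repository` repository has already been registered.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and environment variables are illustrative assumptions.

ES_URL="${ES_URL:-https://elasticsearch-es-http.elasticsearch-system.svc.cluster.local:9200}"
ES_REPO="${ES_REPO:-s3_repository}"

# Build the snapshot URL for a given timestamp suffix.
# Snapshot names must be lowercase, so the suffix uses digits and a hyphen only.
snapshot_url() {
  local stamp="$1"
  printf '%s/_snapshot/%s/snapshot_%s' "$ES_URL" "$ES_REPO" "$stamp"
}

# Create a snapshot named after the current UTC time and block until it finishes.
# Expects ES_USER and ES_PASSWORD to be set to an account allowed to manage snapshots.
create_snapshot() {
  local stamp
  stamp="$(date -u +%Y%m%d-%H%M%S)"
  curl -k --fail -u "${ES_USER}:${ES_PASSWORD}" -X PUT \
    "$(snapshot_url "$stamp")?wait_for_completion=true"
}
```

After sourcing the script somewhere that can reach the service, calling `create_snapshot` with `ES_USER` and `ES_PASSWORD` exported takes one snapshot; `wait_for_completion=true` makes the request return only once the snapshot has finished, which suits a scheduled job that should fail visibly.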