Compare commits


4 Commits

| SHA | Message | Date |
| --- | --- | --- |
| 91e6e2e502 | more redaction | 2025-12-24 14:39:47 +01:00 |
| 74324d5a1b | add source code and readme | 2025-12-24 14:35:17 +01:00 |
| 7c92e1e610 | idea gitignore | 2025-12-24 14:34:28 +01:00 |
| f3fb0514d2 | add gitignore | 2025-12-24 14:34:11 +01:00 |


This is a portion of the keyboard vagabond source that I'm open to sharing, based…
This is something that I made using online guides such as https://datavirke.dk/posts/bare-metal-kubernetes-part-1-talos-on-hetzner/ along with Cursor for help. There are some things that aren't ideal but work, which I will try to outline. Frankly, things here may be more complicated than necessary, so I'm not confident in saying that anyone should use this as a reference; it's more to show work that I've done. I ran into quite a few unexpected issues, which I'll document to the best of my memory, in the hope that it may help someone.
## Background
This is a 3-node ARM VPS cluster running bare-metal Kubernetes and hosting various fediverse software applications. My provider is not Hetzner, so not everything in the guide applies here. If you do use the guide, do NOT change your local domain from `cluster.local` to `local.your-domain`. It caused so many headaches that I eventually went back and restarted the process without that change. It wound up causing me a lot of issues around OpenObserve, and there are a lot of things in there that are aliased incorrectly, but I now have dashboards working and don't want to change it. Don't use my OpenObserve as a reference for your project - it's a bit of a mess.
I chose to go with the 10 vCPU / 16GB RAM nodes for around 11 Euros each. I probably should have gone up to 15 Euros for the 24GB RAM nodes, but for now the 16GB nodes are doing fine.
The cluster runs Authentik, but I was unfortunately not able to run it for as ma…
A minimalist blog. This one uses a local sqlite3 database, so it only runs one instance. It was one of the first real apps that I installed, before Cloud Native Postgres was set up, and I still debate whether that was a good enough choice. At one point I almost lost the blogs in a disaster-recovery incident (self-inflicted, of course) because I forgot to add the Longhorn attributes to the volume claim declaration, so I thought it was backed up to S3 when it wasn't.
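One plausible reading of that incident is a claim that never pinned the Longhorn storage class; this is a hypothetical sketch with placeholder names and sizes, not the actual manifest from this repo:

```yaml
# Hypothetical PVC sketch (placeholder name/size). Omitting storageClassName
# lets the claim fall back to the cluster default class, which Longhorn's
# S3 backup jobs never cover.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: blog-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn   # pin to Longhorn so the volume is backed up
  resources:
    requests:
      storage: 5Gi
```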
- **Bookwyrm, Pixelfed, Piefed**
These all have their own custom builds that pull source code and create different images for workers and web projects. I don't mind the workers being more resource constrained, as they will catch up eventually and have horizontal scaling set at pretty high thresholds if they really need it, but that's rare. I definitely imagine that the docker builds can be cleaner and would always appreciate review. One of my concerns with the images was on the final size, which is around 300MB-400MBish for each application.
These all have their own custom builds that pull source code and create different images for workers and web projects. I don't mind the workers being more resource constrained, as they will catch up eventually and have horizontal scaling set at pretty high thresholds if they really need it, but that's rare. I definitely image that the docker builds can be cleaner and would always appreciate review. One of my concerns with the images was on the final size, which is around 300MB-400MBish for each application.
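As a hedged illustration of keeping such worker/web images in that size range, a multi-stage build along these lines is a common pattern (the paths, Python version, and `gunicorn` entrypoint are assumptions, not this repo's actual Dockerfiles):

```dockerfile
# Illustrative multi-stage build for a Python fediverse app.
# Stage 1 installs dependencies into an isolated prefix...
FROM python:3.12-slim AS build
WORKDIR /src
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# ...stage 2 copies only the installed packages plus app code, so build
# tooling never reaches the final image.
FROM python:3.12-slim
WORKDIR /app
COPY --from=build /install /usr/local
COPY . .
# The same image can serve as worker or web by overriding the command.
CMD ["gunicorn", "app:application"]
```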
- **Infrastructure - FluxCD**
FluxCD is used for continuous delivery and maintaining state. I use it instead of ArgoCD because that's what the guide used. The same goes for OpenObserve, though it also has a smaller resource footprint than Grafana, which was important to me since I wanted to keep certain resource usage lower. SOPS is used for encryption, again because the guide used it, but I've checked in enough unencrypted secrets to source control that I eventually want to self-host a secret manager. That's in the back of my mind as a nice-to-have.
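For context, wiring SOPS into Flux is a small addition to a `Kustomization`; this is a minimal sketch assuming an Age key stored in a `sops-age` secret (all names here are placeholders):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops      # Flux decrypts SOPS-encrypted manifests in-cluster
    secretRef:
      name: sops-age    # placeholder: secret holding the Age private key
```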
One thing to note is that Piefed has performance optimizations to use for CDN c…
## Database
The database is a specific image of PostgreSQL with the GIS extension. What's odd here is that the default Postgres image does not include the GIS extension, and the main PostGIS image repository doesn't officially support the ARM architecture. I managed to find one on version 16 and am using that for now. I am doing my own build based off of it and have it in the back of my mind to eventually upgrade to a newer version. Bear this in mind if you go ARM.
Cloud Native PG is what I use for the database. There is one main (write) database and two read replicas, with node anti-affinity so that there's only one instance per node. They are currently allowed up to 4GB of RAM but typically use 1.5-1.7GB. Metrics report that the buffer cache is hit nearly 100% of the time. Once more users show up I'll re-evaluate the resource allocations or see if I need to add a larger node. Some of the apps, like Mastodon, are pretty good about using read-replica connection strings - that can help with spreading the load and scaling horizontally rather than vertically.
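A hedged sketch of what that topology looks like as a CloudNativePG `Cluster` (the image name and memory sizes are placeholders; the affinity fields are from the CNPG v1 API):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-main
spec:
  instances: 3                                # one primary plus two read replicas
  imageName: ghcr.io/example/postgis-arm:16   # placeholder for a custom PostGIS ARM build
  affinity:
    enablePodAntiAffinity: true               # spread instances across nodes...
    topologyKey: kubernetes.io/hostname       # ...one per node
  resources:
    requests:
      memory: 2Gi
    limits:
      memory: 4Gi
```

CNPG auto-creates per-role services, so apps that support read replicas can point read traffic at `pg-main-ro` while writes go to `pg-main-rw`.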
## Strange Things - Python app configmaps
The apps that run on Python tend to use .env files for settings management. I was trying to come up with some way to reconcile the stateless nature of Kubernetes with the stateful nature of .env files, and settled on having the configmap, secrets and all, encrypted and copied to the file system by a script if there is no .env there already. The benefit is that I have a baseline copy of the config that can be managed automatically; the downside is that it's a reference that needs to be maintained and can make things a bit weird. I'm not sure if this is the best approach. But that's why you'll find some configmaps that contain secrets and are encrypted in their entirety.
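The copy-if-absent behavior can be sketched as a tiny POSIX function (the mount paths are assumptions; in practice something like this would run as an initContainer or entrypoint wrapper before the app starts):

```shell
#!/bin/sh
# Seed the app's .env from the (SOPS-decrypted) configmap, but only if no
# .env exists yet, so changes made after first boot are not clobbered.
seed_env() {
  seed="$1"    # file projected from the configmap (assumed mount path)
  target="$2"  # path the app reads its settings from, e.g. on a PVC
  if [ ! -f "$target" ]; then
    cp "$seed" "$target"
  fi
}
```

On first run the baseline config lands on the volume; on restarts the existing file wins.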
OpenObserve became very bloated in its configurations and I believe that at the…
There are a lot of documentation files in the source. Many of these are just as much for humans as they are for the AI agents. The .cursor directory is mainly for the AI, to preserve some context about the project and provide examples of how things are done. Typically, each application has its own README or other documentation based on some issue that I ran into. Most of it is reference for me rather than for a person trying to do an implementation, so take it for what it is.
## AI Usage
AI was used extensively in the process and has been quite good at doing templatey things once I got a general pattern set up. Indexing documentation sites (why can't we download the docs??) and downloading source code was very helpful for the agents. However, I am also aware that some things are probably too complicated or not quite optimized in the builds, and that a more experienced person could probably do better. It is still an open question in my mind whether the AI tools saved time or not. On one hand, they have been very fast at debugging issues and executing kubectl commands; that alone would have saved me a ton. On the other hand, doing it by hand I may have wound up with something simpler. I think it's a mixture of both, because there were certainly some things that the agent found quickly that would have taken me far longer.
I'm still using the various agents provided by Cursor (I can't use the highest-end ones all the time because I'm on the $20/month plan). I learned a lot about using Cursor rules, indexing documentation, etc., to help the agent out rather than relying on its implicit knowledge.