OK, I took a slightly different approach. Since I know which error in RabbitMQ's log file is associated with things coming to a stop on fedia.io, I installed swatchdog and set it up to watch for that word (which, by the way, is "timeout"). I created a script that stops all the messengers, then stops php-fpm, KeyDB, and RabbitMQ. It then starts RabbitMQ, KeyDB, and php-fpm, in that order. Finally, it restarts the messengers.
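A rough Python sketch of the trigger-and-restart logic described above. This is not the actual script: it assumes the services are managed by systemd, the messenger unit names (`messenger-worker@1`, `messenger-worker@2`) are hypothetical, and the plain substring check only approximates what a swatchdog `watchfor` rule does. If the messengers run under a process manager such as supervisor instead, the commands would need to change. By default it only prints the commands (a dry run).

```python
import subprocess

# Hypothetical unit names; the real service names on fedia.io are not given in the post.
MESSENGER_UNITS = ["messenger-worker@1", "messenger-worker@2"]
# Backends in the order they are stopped (php-fpm, then KeyDB, then RabbitMQ).
BACKEND_STOP_ORDER = ["php-fpm", "keydb", "rabbitmq-server"]

TRIGGER = "timeout"


def line_matches(line: str, trigger: str = TRIGGER) -> bool:
    """Case-insensitive substring match on a log line (a stand-in for a swatchdog regex)."""
    return trigger.lower() in line.lower()


def restart_plan(messengers, backends):
    """Return the ordered (action, unit) steps: stop messengers, stop backends,
    start backends in reverse stop order, then start the messengers again."""
    steps = [("stop", unit) for unit in messengers]
    steps += [("stop", unit) for unit in backends]
    # Reverse order brings up RabbitMQ first, then KeyDB, then php-fpm.
    steps += [("start", unit) for unit in reversed(backends)]
    steps += [("start", unit) for unit in messengers]
    return steps


def run_plan(steps, dry_run=True):
    """Print the systemctl commands, or run them (stopping on the first failure)."""
    for action, unit in steps:
        cmd = ["systemctl", action, unit]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    sample = "example rabbitmq log line mentioning a timeout"
    if line_matches(sample):
        run_plan(restart_plan(MESSENGER_UNITS, BACKEND_STOP_ORDER))
```

Stopping the messengers first means no worker is mid-message when its broker or cache disappears, and starting them last means they reconnect only after RabbitMQ and KeyDB are back up.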
I'll be surprised if it works the first time, so it may crash again, but I'll be watching.
The cause is unclear to me, but the processes that manage incoming and outgoing federation were "stuck" until I manually intervened. It's running now. I'm adding an item to my to-do list for after my job ends in 10 days: implement some sort of detection, and hopefully an automated restart, for when this happens again.
Interesting choice over there to close the issue so quickly rather than asking for more info, although you didn't give them much to go on.
I wonder what the resolution was for the previous problem that frequently caused this sort of thing. Was the error handling improved so that we might reasonably expect processing to keep going when it hits something it doesn't like, or was it just a quick fix for the one specific thing that happened to be breaking it at the time? Did that one make it to the GitHub tracker?
That GitHub tracker is for problems with the Mbin code, not with instances running Mbin software. The problem Fedia experienced is very likely caused by a bug of some sort in Mbin (or something Mbin depends on), but we don’t know for sure what it is yet. Once I’m unemployed in a week, I hope to have more time to debug issues like this and get them resolved.
Are we still having trouble? A thread I posted 45 minutes ago hasn't shown up yet on the instance I posted it to. The "Open original URL" menu item on the post goes to the Fedia page instead of LemmyWorld. A comment I posted 2 hours ago also doesn't seem to be appearing on the original instance.
@jerry How long should a message take to propagate? I made a few posts over the last couple of days and they seem to have propagated correctly, but I just made a post on a thread from mander.xyz about 30 minutes ago and it hasn't yet appeared on the host instance.
Edit: Never mind, I was just too impatient. They're there now.
I am not 100% sure. Fedia.io is running on a beast of a server, and as long as it’s working correctly, it should be able to deliver messages instantly. That doesn’t mean the receiving servers can consume and render them that fast, though.
Not entirely. It looks like the RabbitMQ issue was only affecting one of the queues (“deliver”), though I would have expected it to affect things like microblog posts too. All I can say for certain is that the instance was operating in a very unhealthy state.
It looks like the queue will take several hours to flush, but it’s working.
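For a rough sense of why a backlog can take hours to clear: a queue only shrinks at its net rate (messages consumed minus new messages arriving), so even a fast consumer drains slowly while new activity keeps coming in. A minimal Python sketch, assuming constant rates; the rates and backlog used below are invented for illustration and are not fedia.io's real figures.

```python
from typing import Optional


def hours_to_drain(backlog: int, consume_per_s: float, publish_per_s: float) -> Optional[float]:
    """Estimate hours until a queue empties, assuming constant consume/publish rates.

    Returns 0.0 if the queue is already empty, and None if it is not shrinking
    (new messages arrive at least as fast as they are consumed).
    """
    if backlog <= 0:
        return 0.0
    net_per_s = consume_per_s - publish_per_s
    if net_per_s <= 0:
        return None
    return backlog / net_per_s / 3600
```

For example, a hypothetical backlog of 36,000 messages consumed at 3 per second while 2 per second keep arriving drains at a net 1 per second: 36,000 / 1 / 3,600 = 10 hours.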
OK, I'm glad it's not just me. A few days ago, it took about 6 hours to reach Lemmy instances, and a post I made yesterday didn't federate at all. I was able to get my most recent post to appear on kbin.melroy.org by searching for its fedia.io URL there. I don't think Lemmy has a similar mechanism, though.
Do you know whether any users on kbin.melroy.org subscribe to your magazine? If not, that would explain why your post did not show up until you searched for it.
I've been noticing a problem too. I've posted on threads from lemmy.world, lemmy.zip, and lemmy.ml, and when I check the threads on their home sites, my posts do not appear.
Just tried to view one of your recent posts from microfedi via Sharkey: impossible. On the other hand, fedia.io threads posted to Lemmy that don't appear on Lemmy instances do seem to be viewable when I use Sharkey (both link and thread types).
I'll paste the Fedia and original links to the last comments I've made that don't seem to be federating. The comments are visible on Fedia but not on the foreign instances. Reviewing these, the problem appears to affect all magazines and all foreign instances, since these are all the comments I've made in the last day or so.
But this one isn't. It is quite recent, and I don't know the intervals for sending comments off to other instances, so that could be why. Edit, about 20 minutes later: this one has also federated.
Outbound federation was indeed broken. I fixed it, but the server had a huge backlog to work through, which took about 12 hours to complete. I just checked, and everything appears to be working OK. I am going to create some automation that will detect and alert on this (or other issues) if it happens in the future.
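One possible shape for that kind of alerting, sketched in Python against the RabbitMQ management HTTP API (`GET /api/queues/{vhost}/{name}`, whose JSON includes `messages`, `consumers`, and `name`). Assumptions: the management plugin is enabled on its default port 15672, the queue is literally named "deliver" in the default vhost "/" (the actual queue name Mbin uses may differ), and the credentials and threshold are placeholders. This is a sketch, not what fedia.io runs.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_queue(host, queue, user, password, vhost="/", port=15672):
    """Fetch one queue's stats from the RabbitMQ management HTTP API."""
    # The vhost "/" must be URL-encoded as %2F in the path.
    path = "/api/queues/{}/{}".format(
        urllib.parse.quote(vhost, safe=""), urllib.parse.quote(queue, safe="")
    )
    req = urllib.request.Request("http://{}:{}{}".format(host, port, path))
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def check_backlog(queue_info, max_messages):
    """Return an alert message if the queue looks unhealthy, otherwise None.

    Flags two conditions: no consumers attached (workers gone or stuck),
    or more than max_messages waiting.
    """
    name = queue_info.get("name", "?")
    depth = queue_info.get("messages", 0)
    if queue_info.get("consumers", 0) == 0:
        return "queue {} has no consumers ({} messages waiting)".format(name, depth)
    if depth > max_messages:
        return "queue {} backlog is {} messages (limit {})".format(name, depth, max_messages)
    return None


if __name__ == "__main__":
    # Placeholder host and credentials; run from cron and pipe any output to an alert channel.
    info = fetch_queue("localhost", "deliver", "monitor", "change-me")
    alert = check_backlog(info, max_messages=10000)
    if alert:
        print(alert)
```

Checking for zero consumers as well as raw depth matters here, because a "stuck" worker can leave the queue growing slowly for a long time before any depth threshold trips.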
Excellent! What I saw this morning was that comment federation seemed to be taking between 15 and 20 minutes. As before, I don't know what your intervals are, but I figure you could use that data point to verify/validate as necessary.
There isn't an interval per se; I don't yet know why it's not immediate. It's possible that the delay is on the receiving side. The server that fedia.io runs on is quite substantial, and unless there is some sort of bug, the processing should happen immediately.