Are You Prepared to Go Insane?
This article summarises what I learned from trying to implement the most trivial web push notification system possible, to let subscribers know when a new strip of my S.O.N.A.I.S webcomic has been published.
This is not a neatly structured guide, rather a semi-random collection of facts about implementing notifications using Web Push. You should look elsewhere for tutorials, but be aware that those tutorials will usually leave out the small but important details. This webpage is meant to provide many of those details.
In general, you will want to use some ready-to-use library or framework to add push notifications to your website or web app, that shields you from all the nitty-gritty details discussed here. This article is meant to help those who need to implement those frameworks, or the masochists like me who want to try implementing their own push message system from scratch.
If I have to summarise my experiences in a single meme, it will be this one, borrowed from the StackOverflow answer mentioned below:
This text was written after toiling for a whole month on creating a vaguely usable Web Push implementation, repeatedly moving from one frustration to another. Defying Sauron is peanuts in comparison. As a result, this article is full of sarcasm, bile, and some minor strong language. If you cannot handle that, then don't read it.
My use case is very simple, almost trivial: I draw a web comic ‘S.O.N.A.I.S.’ that normally updates twice a week, but there may be extra strips sometimes, or I may want to take a break. I thought it would be convenient for whoever wants to read this comic to get push notifications when a new strip is available. As far as use cases for push messages are concerned, this is about the simplest possible.
My goal: people can hit a subscribe button, there is nothing user-specific, subscriptions are not linked to logins, everyone is treated the same. Every subscriber will receive the exact same notification when a script on the server publishes a message to a single fixed topic. The same style of notification should always appear, even when the user is watching the main comic page, because I want a simple consistent user experience. When a user clicks/touches/punches the notification, always the same webpage opens. I would prefer the page to open in the user's browser of choice, but begrudgingly I would also find it acceptable if the page would open in an embedded browser inside a dedicated app-like thing. I don't even try to reload the page if they already have it open, because I consider it evil to touch someone's already open browser tabs, hence I always open a new window/tab.
As you can see, my goals do not consist of anything special, anything fancy. How hard could it be? As it turns out: hard. Way too hard.
My general conclusion: implementing web push messages at this time is a surefire road to insanity. It took me an entire month to end up with an implementation that mostly works and which has no vague copy-pasted Cargo Cult I don't understand. Yet, still this implementation sometimes misbehaves for unknown reasons. On some devices, it works flawlessly. On other devices, it self-destructs very often and needs to be reanimated by the user for it to work again for a few more push messages, after which it will again break. I have no clue why, debugging this problem is as good as impossible, and the problem cannot be within my own code because it keeps working on other devices. 🤯
I have little confidence in the system I created, and I had to slap all kinds of disclaimers on the subscription page after verifying that the remaining issues seem fundamentally impossible to fix. The service worker has only 22 kBytes of code, but each line has been sculpted and crafted over the course of a month, making this amongst the most expensive code I have ever written. This was not a satisfying endeavour at all. Luckily it's just a notification system for a comic I draw purely as a hobby. I wouldn't want to use this for anything truly serious—I wouldn't want to have to deal with infuriated customers.
Alternate conclusion: I do not want to do any job that involves writing web applications beyond the occasional tiny, simple thing. I value my sanity too much.
There are proposals to make web push notifications less insane, but for the time being, a lot of hoop-jumping is the norm. Implementing Web Push is like navigating a minefield: something will explode at the least expected moments. And, people have done their utmost best to coerce you into walking straight through this minefield with no opportunity to evade it.
One could try to take cover behind some pre-made framework like Angular, but at some point your push messages will break, and it will be very helpful if your knowledge goes further than a bit of Cargo Cult where you mimicked commands from a YouTube video and deployed magical config files. Also, no framework will protect against the inconsistent ways in which different vendors have implemented service workers, PWAs, push messages and notifications, which leads to certain annoying problems for which no satisfactory solution can possibly be implemented, not even in the best of frameworks or libraries. Arguably the worst offender is Apple, which first demands that you wrap your web app inside a PWA, and then provides a bug-ridden PWA implementation which they already have tried to kill at least once. Apple probably sees PWAs as a nuisance that bypasses their app store, and they only provided an implementation because of peer pressure. Their PWA and Web Push implementation is rife with bugs and annoying limitations, and their motivation to fix these issues is very low.
I decided to use Google's Firebase messaging, a.k.a. FCM, to handle the sending of messages. It has the convenient concept of ‘topics’ that matches my use case.
First, we create an app in the Firebase console. A specific instance of the ‘app’ running in someone's specific browser will then be assigned a token, which can be subscribed to a topic. When sending a message to the topic, the FCM server will then handle the fan-out of sending that message to each subscriber of the topic. It was exactly what I needed, and it all sounds good… in theory.
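To make the topic idea a bit more tangible, here is a minimal sketch of what the body of a data-only topic message could look like when sent through the FCM HTTP v1 API (a POST to the project's messages:send endpoint, authorised with an OAuth2 bearer token). The function name buildTopicMessage, the topic name new-strip and the data fields are made up for illustration; they are not taken from my actual setup.

```javascript
// Builds the JSON body for a data-only FCM message addressed to one topic.
// FCM requires every value in `data` to be a string, hence the conversion.
function buildTopicMessage(topic, data) {
  const stringData = {};
  for (const [key, value] of Object.entries(data)) {
    stringData[key] = String(value);
  }
  return { message: { topic: topic, data: stringData } };
}

// Hypothetical example: announce a new strip to everyone subscribed to the topic.
const body = buildTopicMessage('new-strip', { title: 'New strip', strip: 123 });
console.log(JSON.stringify(body, null, 2));
```

The server script only has to send this single request; the fan-out to every subscribed token happens on the FCM side.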
The first indications of things to come became apparent when I started reading a guide on StackOverflow. The mere length of the answer, its lack of an ‘accepted’ state, and its concluding “One Does Not” meme which I replicated above, were not very encouraging. Still, I felt it was doable and I should be able to make it work within reasonable time. Unfortunately that SO answer proved to be just the tip of the iceberg.
I wanted to get FCM to work with minimal external dependencies. Most guides you will find, not only for push messaging but pretty much anything nowadays, will tell you to deploy utterly bloated framework X and then sprinkle it with some magic configuration, and then it will usually work and in the end you have no idea what you just did. The average web developer nowadays does not mind being dependent on external parties that create fancy libraries which hide all the complexities—I am not one of those people. I like to know how things work, and the fewer dependencies I have to import to get something to work, the happier I am. I have seen way too many things explode in the past due to those external developers creating bugs, or introducing breaking changes, or suddenly killing off the project.
Sure, if I needed to add web push to some existing product backed by a development team, I would choose a framework right away, because going low-level would be insane in that case. I compare it to Kerbal Space Program. When I started playing it, I managed to perform a moon landing mission entirely manually, using rough calculations and wet-finger guesses, and a lot of trial-and-error and dead Kerbals. The satisfaction of finally seeing the capsule touch down after re-entry was immeasurable. But then I installed the MechJeb plug-in to automate a lot of the aspects for subsequent missions, because I wanted things to progress faster. However, all the experience I gained from my manual endeavours allowed me to notice when MechJeb was about to fail horribly, because I knew what it was trying to automate and what it was supposed to be doing and what not.
This very website is a hobby project that I want to keep pure and simple, and the push message use case seemed basic enough that it should be feasible to implement at a low level, with as added bonus that I would also learn how the whole system works and be more confident when delegating things to a framework. In the end, someone, for instance whoever has to implement those fancy frameworks, has to know the dirty details, and boy are they dirty indeed. I thought it could be useful to collect the dirty details I discovered on this journey.
I will now list a big steaming pile of pitfalls I have encountered, starting with things specific to FCM, then moving on to more general facts, and as grand finale, some iOS-specific things.
- […] onBackgroundMessage in your Service Worker. Instead, it will attempt to send a foreground message to that page, even if that page has no JavaScript whatsoever, let alone a messaging.onMessage handler, and even if that page is not within the scope of the SW. The latter is mind-boggling: I find it perfectly reasonable to expect that only pages within the SW's scope can be considered foreground, but the developers of FCM think otherwise.
- […] onMessage handler in Every Single Page Of Your Entire Website; and handle the event in some appropriate way. Have fun! This might be OK if your entire domain is dedicated to your ‘app’ only, but not if your site hosts various loosely related things.
- […] onMessage and onBackgroundMessage; instead add your own push event handler to the service worker, and sniff event.data.json() from the event. You can see whether it comes from FCM by looking for fcmMessageId. Then extract the data attribute as you would in onMessage, and Bob's your uncle. It is unclear whether you would still need to provide dummy onBackgroundMessage and onMessage handlers; maybe not a bad idea, to ensure FCM really knows you expect messages.
- […] getToken must be followed by a check to compare the token to the last known one, as stored in an IndexedDB (or LocalStorage, but only IndexedDB is available to Service Workers). If the token is new or has changed, perform a call to your back-end to register/update the token in your database. If you're building something with Firebase, you could use the data store thingamajig it offers, but anything else will do.
- […] deleteToken is invoked, but apparently this has bugs. Oh what a surprise! These ghost subscriptions can accumulate, causing the same message to arrive multiple times if tokens have been repeatedly deleted before being unsubscribed. (This might be related to this issue on the FB GitHub, although it could also be another bug.)
- […] getToken is invoked. I have also noticed that the token can change when the user merely disables and then re-enables messaging permission for your site, even without doing anything in between. Hence it is pretty much inevitable that existing tokens will be invalidated without any chance of first unsubscribing them from a topic. Once the token is replaced by a new one, you can no longer unsubscribe the old one, because it will be “gone” and cause an INVALID_ARGUMENT or NOT_FOUND response. This makes sense because the token is supposed to be gone, except sometimes it isn't.
- […] : changes when tokens change. If any need arises to destroy this IID entirely, you can do that in firebase-admin, but it will take several days for it to be really gone, and the browser will keep on trying to use the old IID stored in certain IndexedDB instances for the website. I have seen no indications that this database is automatically wiped or reset when deleting the IID in the back-end. In other words, avoid deleting an IID unless it is really necessary. A reason might be that a user wants to keep using your app, but no longer wants to receive notifications from it, yet they still get these ghost notifications. If they insist on this being resolved, then you should try to obtain the user's IID and delete it entirely in the admin console. The user should also clear all your site data and re-initialise the app from a clean slate to be assigned a new IID. In general however, forcibly deleting an IID is to be avoided.
- […] getToken result in actual communication with the FCM servers. This is important to debug problems that only occur in that case: if you want to reproduce a bug that is only seen when FCM goes through the whole token creation workflow, you will need to delete those IndexedDBs to force getToken to phone home again. These IndexedDBs are easy to recognise: their names start with “firebase-”. At the time of this writing, there is a firebase-heartbeat-database, firebase-installations-database, and firebase-messaging-database.
- […] getToken() being invoked often enough. If not, FCM could mark the token as “stale.” Documentation is vague about this, but I have seen a period of 2 months being mentioned as the staleness threshold. Thanks to the caching, one cannot really invoke getToken too often; the IndexedDB will prevent unnecessary traffic. But it can certainly be invoked not often enough.
- […] getToken() every time a push is received, and of course update the token in your back-end if it has changed. As will be explained below, this should be done after or in parallel with invoking showNotification, not before. If you rely purely on this strategy and do not send any message for 2 months, everyone's token will become stale and they will no longer receive messages (unless they open your app within due time and it performs a getToken).
- […] getToken call succeeding. Besides the obvious explanation of Firebase being a big steaming pile of bugs, the local token caching could also have a hand in this. In the most unlucky case, you performed a getToken right at the end of the local cache period, and the FCM back-end did not see any sign of life for this token during that period. If the device is then offline for 10 days, or fails to refresh its token, then perhaps the token expiry period is reached in the back-end. I have no idea what the local caching period is, however. I suspect it might even be variable, based on some heuristics. My advice is therefore to invoke getToken as often as possible.
- […] getToken would help in the situation where there is nothing noteworthy to notify about during a period long enough for tokens to expire. However, as mentioned below, this is a total no-go in Safari, and on other platforms it can also have effects that are annoying for the users. You could try to use Periodic Sync, but forget about this: it also does not work in Safari, and even on other platforms it is designed to stop working exactly in this case where you need it the most. It is an anti-feature. The only thing you can do in this case is to annoy all your users with a dummy message, to trigger their getToken() calls. Isn't this all so much fun?
- […] brave://settings/privacy and explicitly enable “Use Google services for push messaging,” because the Brave developers believe notifications are evil.
- […] getToken, which in such cases will then almost always produce a token that differs from the last one. Of course the getToken cannot be initiated from a push message, because those will be broken. There is no rhyme or reason behind this failure, and due to its random nature and long timespans, trying to debug it is only something an absolute masochist would attempt. I suspect all the annoying energy saving and privacy hooey that is being incorporated in recent Android releases might be to blame for this. I think Google is a big fan of the security/privacy approach illustrated in one of the strips of my comic.
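Two of the points above lend themselves to a small sketch: recognising an FCM payload inside your own push handler by its fcmMessageId, and only bothering the back-end when getToken yields a token that differs from the last known one. Everything named here (extractFcmData, syncToken, the store object with async get/set, the register callback, the fcmToken key, the title and body data fields) is hypothetical and only serves as illustration; in a real service worker the store would be a thin wrapper around IndexedDB.

```javascript
// Returns the custom data of a push payload if it looks like an FCM message
// (it carries an fcmMessageId), otherwise null.
function extractFcmData(payload) {
  if (payload === null || typeof payload !== 'object') return null;
  if (!('fcmMessageId' in payload)) return null;
  return payload.data || {};
}

// Compares a freshly obtained token with the last known one and reports it to
// the back-end only when it is new or has changed. Resolves to true if a
// registration was performed. `store` and `register` are assumptions.
async function syncToken(freshToken, store, register) {
  const lastKnown = await store.get('fcmToken');
  if (lastKnown === freshToken) return false;  // unchanged, nothing to do
  await register(freshToken, lastKnown);       // back-end can replace old by new
  await store.set('fcmToken', freshToken);     // remember only after success
  return true;
}

// Only meaningful inside a service worker; skipped in any other environment.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('push', (event) => {
    let payload = null;
    try {
      payload = event.data ? event.data.json() : null;
    } catch (err) {
      payload = null;  // not JSON, treat as an unexpected message
    }
    const data = extractFcmData(payload) || {};
    // Assumed data fields; show something even if the message is unexpected.
    const title = data.title || 'Notification';
    event.waitUntil(
      self.registration.showNotification(title, { body: data.body || '' })
    );
  });
}
```

Passing the old token along to the back-end is a design choice: it gives the server a chance to drop the stale entry in the same call, although as described above, unsubscribing a token that FCM already considers gone may simply fail.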
On Android devices, you can see what is actually going on with FCM, by invoking the diagnostics page. On phones, this can be done by dialling *#*#426#*#*, although this will likely only work in phone apps distributed by Google.
On Android devices that have no phone hardware and on which no official phone app can be installed, you will need to enable developer mode and connect through ADB. Then this command can be run from the debugging host machine to invoke the same diagnostics tool:
adb shell am start -n com.google.android.gms/.gcm.GcmDiagnostics
- […] waitUntil on the event, with a Promise that only resolves when all the work is done. Yes, you won't escape having to learn how to properly chain Promises.
- […] event.waitUntil(Promise.all([promise1, promise2, etc])).
- […] showNotification.
- […] setTimeout, it will not have any protections against being aborted. Wrap the timeout in a Promise.
- […] setTimeout in a push handler, because it will suspend execution of your worker, greatly increasing the risk that it will not be resumed. This is especially the case when the push notification has woken the device (possibly also activating the screen). The device will then want to go back to low-power mode as soon as possible, and it may deem suspended tasks not worth resuming. You should finish all tasks in a push handler as soon as possible. Only if you're certain that the push handler is running on a desktop machine or something else with no extremely frugal power management, you may consider relying on a timeout, but even in that case it should never be longer than a few seconds.
- […] showNotification, which will show up in the newest Android versions in the status bar and inside the notifications list, must be a silhouette consisting of only white pixels on a transparent background (alpha channel). If the pixels are not white, your icon may end up being displayed in white on white backgrounds, making it invisible. A sensible size for the badge is 96x96 pixels; it is pointless to go above that. Keep the silhouette simple and recognisable.
- […] showNotification, consider all of them as best effort. Every platform treats these differently, some platforms may not show any of them. Put all effort and information in the title and body text, and consider the rest as icing on the cake.
- clients.openWindow() seems to have a mind of its own, and if a user has deployed your page as a PWA, then sometimes the user will be sent to the PWA's start_url instead of the URL given as argument to openWindow. See this StackOverflow question for more details.
- […] Notification.permission stubbornly returns a value of default, which means as much as: “you're totally f*cked.” This value seems to have the same semantics as denied. When going to Chrome's settings while it is in this state, things make no sense. It might show that permission is set to “ask” for all sites, with no particular setting for your domain. In other words, it should allow invoking Notification.requestPermission() and getting the permission dialog, yet it won't. It will act as if the user has explicitly denied permissions for the site, even when they haven't.
- […] url_handlers member in the manifest, which was later on deprecated in favour of the handle_links and scope_extensions members. Neither seems to have any usable support at the time of this writing. I have attempted to set the values in my manifest and it did nothing; I cannot control how browsers handle links on my very own site. The best I could do is detect when pages are inappropriately opened inside the PWA, and then yet again show confusing banners to users, explaining what a goddamn clusterfuck this whole design is, and allowing them to go back to the expected UI page.
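The advice about waitUntil, Promise.all and wrapping timeouts in a Promise can be sketched as follows. This is only an illustration under assumptions, not my actual worker code: the helper names delay and runPushWork, the notification texts and the /push-received URL are all invented. The idea is that the notification is started first, the less important work runs alongside it, and one combined Promise is handed to event.waitUntil.

```javascript
// Wraps setTimeout in a Promise, so the wait can be handed to event.waitUntil().
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Starts `show` right away, then runs the remaining tasks in parallel.
// A failing task is swallowed, so it cannot prevent the others from finishing.
// Both `show` and every entry of `tasks` are functions returning a Promise.
function runPushWork(show, tasks) {
  const shown = show();
  const rest = tasks.map((task) =>
    Promise.resolve().then(task).catch(() => undefined));
  return Promise.all([shown, ...rest]);
}

// Only meaningful inside a service worker; skipped in any other environment.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('push', (event) => {
    event.waitUntil(runPushWork(
      () => self.registration.showNotification('New strip', { body: 'Placeholder text' }),
      [() => fetch('/push-received', { method: 'POST' })]  // hypothetical endpoint
    ));
  });
}
```

If a timeout is truly unavoidable, something like event.waitUntil(delay(2000).then(someTask)) keeps it under the protection of waitUntil, but given the power management issues described above, this should remain an exception and stay limited to a few seconds.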
Now comes the real fun part. Oh yes, it gets even worse.
FCM, or any kind of push messaging for that matter, inside Safari/iOS is a Royal Pain. Yes, even more pain than all the above. But given the market share of this platform, you will want to support it of course, so let's bring on the pain.
- […] getToken from within a service worker requires the worker registration to have a PushManager. This is only supported in fairly recent versions of Safari, see the MDN info page.
- […] getToken calls inside front-end code, but believe me, you don't want to, because it greatly complicates things. Respect your own sanity, and simply tell users of obsolete iOS devices that they will need to pay more Apple tax for a new device.
- […] PushManager in window as an indicator to show a prompt to users that if they are using an iOS device, they need to install the PWA. Don't try to guess whether the user is on an iOS device, most of them will know it themselves. The ones that don't, should not even attempt to dive into the Web Push cesspool.
- […] PushManager inside Service workers in plain browser tabs, but only starting from Ventura. However, there is a big fat bug that will cause the service worker to have no valid PushManager at semi-random moments, for instance when it is first created, even if notification permissions have already been granted.
- […] pushManager property in the serviceWorkerRegistration object, it may be undefined. This breaks getToken().
- […] pushManager at certain moments, like when a push message is received while Safari is closed (yes, somehow the service worker still runs even if Safari is closed, I guess it is never truly closed).
- […] www.gstatic.com. Do not use the minified scripts from cdnjs.cloudflare.com, they are entirely broken in Safari, both in macOS and iOS. You will get an exception “TypeError: t is not a function” when trying to invoke getToken.
- […] getInstalledRelatedApps, and Safari is not one of them.
- […] clients.openWindow in iOS only) will show them inside an embedded browser thing inside the PWA, if they are not in the PWA's scope as defined in its manifest. (Do not confuse the PWA scope with the service worker scope, it is not the same.) If you forgot to properly define this scope, your PWA's UI pages may be replaced with other pages if the user follows links in the UI or opens a notification. Even if it is possible to navigate back to the UI page, at the least the user will be utterly confused and frustrated. You do not want this, hence make sure to properly set up the PWA's scope! Only pages related to the PWA's UI must be within the scope defined in the manifest.
- […] clients.openWindow() in the service worker, for instance inside the worker's notificationclick handler: […] getRegistration().
- […] clients.openWindow() open in the same PWA-embedded browser, they cannot obtain the service worker registration in the same way as described above. Why Apple, why? I expect that if the opened page is within the scope of the SW, it might have access to the SW, but I have not tested this, so don't trust my word for it.
- […] getToken() will fail with something like: FirebaseError: Messaging: A problem occurred while subscribing the user to FCM: Request contains an invalid argument. (messaging/token-subscribe-failed).
- showNotification must get the highest priority: do it as quickly as possible and before anything else that has any risk of failing. Also invoke it even if the incoming message is not as expected. It is arguably better to show a notification “something went wrong,” than ending up in the situation where all future notifications are silently broken because Safari decided to revoke the app's permissions.
- […] getNotifications, and invoke close() calls on them. Denied! First of all, only iOS 17 or newer allows obtaining the list of notifications, but it is of no use: iOS will not honour any close calls on the objects obtained as such. So, the only thing you can do is apologise in advance to your iOS users about the fact that they are likely to start seeing duplicate push messages after a while. Isn't all this just grand? It's as if it has been designed by a total sadist.
- […] close() on notifications, would allow approximating invisible push messages, which as shown above, Apple considers the spawn of the devil.
- […] waitUntil anymore, so make sure you use this where needed.
- […] notificationclick handler by performing a clients.openWindow with its start page, but then you cannot open the page that was supposed to be opened by the notification, because by design it is forbidden to perform more than one openWindow call from a notificationclick handler. You would either need to somehow trigger opening the PWA page when ‘Done’ is pushed (if that is possible at all), or you could also fix the PWA page as described above, and then trigger a banner/toast in it, to prompt the user to open the page that was actually supposed to open. You cannot do this automatically with window.open(), because this also can only be performed ONCE from a user gesture handler.
If I have to condense all the above into one sentence, then it would be: “Web Push in its current state is an abomination, avoid it if you can.” I have littered this page with memes, which seems fitting because Web Push is sort of a meme in itself.
If you cannot avoid having to implement something using Web Push, then even though pretty much every aforementioned point is worth looking at, the most important take home messages are:
- […] getToken call at least once a month to prevent the token from becoming stale. In practice I recommend at least once a week.
- […] waitUntil where needed. If you don't, then you are likely to create something that seems to work, but will fail unpredictably when actually deployed, and it will be a total pain to debug.
- […] setTimeout (unless in very specific circumstances).
The sad thing is that it's not just Web Push that gives me an impression of being a hot mess. My general sentiment is that a lot of software nowadays is sliding down this slippery slope of becoming way too complicated, scattered across multiple vendors who have their own different interpretations of a standard. As a result, implementing something is like navigating a minefield in which the mines spontaneously change places at random moments. Nobody except an ever shrinking elite of geniuses has an over-arching vision; no mere mortal can explain all the intricacies of how certain projects work, or randomly fail to work. It's all frameworks stacked on top of frameworks: goddamn turtles all the way down.
Programming these days is all too often becoming like wizardry. Follow someone's magic book of magical formulas written in JSON and YAML, and utter some incantations in the latest trendy programming language. Once something nears maturity, deprecate it and replace it with something new that is again full of fresh bugs and mysteries. I am not surprised that nearly brand new airplanes basically fall apart mid-flight. It's not just Boeing, it's a sign of an underlying problem that permeates society as a whole. Instead of fixing a flat tyre, we reinvent the whole damn wheel every time and ignore all the associated costs of doing this.
Everything is full of bugs that are often hard to reproduce because they're caused by race conditions and async routines that are a total pain to debug. Often it is not the most correct tutorials and example code that get replicated everywhere, but the ones that were published first. And as someone who holds a master's degree in A.I.: if you believe A.I. will make it all easier and better, let me tell you that this is an utterly naïve idea; in fact things might get worse. We are making A.I. constructs spit out code that is generated using models trained on all this dodgy code written by humans who take shortcuts all the time. I already see this with ChatGPT and friends: it is very adept at producing total garbage with an air of utmost overconfidence, which I guess is why many are so impressed by it. It faithfully mimics typical human behaviour, but only superficially. It lacks the deeper understanding.
If you want to try out my attempt at producing a workable web push notification system, and read my own idiosyncratic web comic as a side effect, head over to S.O.N.A.I.S.