That's actually done already in XMPP (Jabber). Federation, voice calls, E2EE, direct and offline file exchange, no SMS verification and other bs, max privacy and convenience.
That sounds like a perfect match for XMPP / Jabber. You can set up a server on your own domain (e.g. 'Prosody'), and then two XMPP clients that support calls (e.g. 'Conversations' for Android) will be able to initiate voice/video calls, just like in Telegram/WhatsApp/Viber/etc. Moreover, XMPP works like e-mail, so user1@xmpp.org can message/call user2@yourdomain.com (if you allow your server to federate with other servers). Setting it up is not very simple though: you'd also need to set up STUN/TURN properly for direct calls and file transfers.
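To make the e-mail analogy concrete, here is a rough Python sketch of how an address like user@domain/resource splits into parts, and how a server could decide whether a message stays local or has to go over federation. The names `Jid`, `parse_jid` and `route` are made up for illustration and are not from Prosody or any XMPP library. Real servers also normalize and validate addresses far more carefully than this.

```python
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    """Hypothetical container for the three parts of an XMPP address."""
    local: Optional[str]      # "user1" in user1@xmpp.org
    domain: str               # "xmpp.org"
    resource: Optional[str]   # "phone" in user1@xmpp.org/phone


def parse_jid(text: str) -> Jid:
    """Split an address into local part, domain and resource.

    Simplified: the resource is everything after the first "/",
    and the local part is everything before the first "@".
    """
    bare, _, resource = text.partition("/")
    local, sep, domain = bare.partition("@")
    if not sep:
        # No "@" at all, so the whole thing is a domain (a server address).
        local, domain = "", bare
    if not domain:
        raise ValueError(f"address has no domain: {text!r}")
    return Jid(local or None, domain.lower(), resource or None)


def route(sender: str, recipient: str, federation_enabled: bool = True) -> str:
    """Illustrative routing decision, not actual server logic."""
    if parse_jid(sender).domain == parse_jid(recipient).domain:
        return "local"
    # Different domains: only deliverable if server-to-server is allowed.
    return "federated" if federation_enabled else "rejected"


if __name__ == "__main__":
    print(parse_jid("user1@xmpp.org/phone"))
    print(route("user1@xmpp.org", "user2@yourdomain.com"))
    print(route("user1@xmpp.org", "user2@yourdomain.com", federation_enabled=False))
```

With federation switched off, the cross-domain message in the last line is rejected, which is the trade-off mentioned above: a closed server only lets its own users talk to each other.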
Yes, you're right, Matrix too. However, I've tried running servers for both, Synapse for Matrix and Prosody for XMPP, and I should say Matrix felt very sluggish and limited, while Prosody is fast and insanely flexible. In addition, client software for XMPP is more diverse and feature-rich; I'm particularly impressed by Movim (web) and Conversations (Android).
Also, there is a variety of bridges for everything, e.g. Matrix <-> WhatsApp or XMPP <-> Telegram, so one is not limited too much when committing to a certain messaging tech.