How to Implement VoIP Voice Call Using Twilio in iOS

Twilio (/ˈtwɪlioʊ/) is an American cloud communications platform as a service (CPaaS) company based in San Francisco, California. Twilio allows software developers to programmatically make and receive phone calls, send and receive text messages, and perform other communication functions using its web service APIs.

On mobile, device-to-device video calling is built with Twilio's VoIP and Video SDKs.
The Twilio team developed these SDKs on top of the open-source WebRTC library.
To start a video call, we first place a VoIP (Voice over IP) call so that the peer device can be called directly.
The Video SDK supports one-to-one and one-to-many device connections and real-time video chat.
The Twilio dashboard shows the insights and data needed to diagnose and fix issues.

Twilio also provides many SDKs for specific platforms and languages.
The VoIP and video call SDKs are separate. We implemented the VoIP call so that a user is notified when someone wants to start a video call with them.
The Video SDK covers video call functionality only. The VoIP SDK handles placing a call, disconnecting a call, and disconnecting a call before it is accepted, which is why we need VoIP alongside video.
So, to notify one user when another places a video call, we use the VoIP SDK. Once the user picks up the incoming VoIP call, we switch to a video call using the Video SDK provided by Twilio.

General VOIP flow:

Consider two mobile devices, Device A and Device B. Device A requests a VoIP token from the API by submitting its identity, and then registers that token.
To place a call to Device B, Device A uses the voice connect method of the Twilio SDK together with Device B's identity.

As a result, the Twilio server sends a push notification to Device B.
After Device B accepts the call, we convert it into a video call (the video call flow is explained later in the document).

General VIDEO call flow:

When Device A calls over VoIP using ‘makeCall’, Twilio sends a push notification to Device B. Device A connects to a video call room and waits for the peer participant to join.

If Device B accepts the call, we disconnect the VoIP call and then connect Device B to the video call in the same room as Device A.
When the VoIP call is disconnected, the call disconnect listener fires on Device A, which then waits 10 seconds for a participant to join the video call room. If the peer device connects within those 10 seconds, the video call proceeds normally. If Device B does not connect within 10 seconds, the call is disconnected automatically. This also covers the case where the receiver (Device B) rejects the call: the caller's call ends after 10 seconds.
Twilio video calls provide features such as group calls, changing the audio device, camera switching, and mute.
Any unique name works as a room name. Here we build the room name like this:

IdentityA_IdentityB or userNameA_userNameB.
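Both devices have to arrive at the same room name for the same call. A minimal sketch of one way to do that, assuming the caller's identity always comes first (the `roomName(caller:receiver:)` helper and the sample identities are hypothetical, not part of the Twilio SDK):

```swift
import Foundation

/// Builds the shared room name from the two identities.
/// Hypothetical helper: the caller's identity always comes first, so both
/// devices derive the same string for the same call.
func roomName(caller: String, receiver: String) -> String {
    return "\(caller)_\(receiver)"
}

// Device A (identity "alice") calls Device B (identity "bob").
let nameOnCallerSide = roomName(caller: "alice", receiver: "bob")
// Device B learns the caller's identity from the incoming call and passes
// the identities in the same order.
let nameOnReceiverSide = roomName(caller: "alice", receiver: "bob")
print(nameOnCallerSide == nameOnReceiverSide)
```

If identities can themselves contain an underscore, a different separator may be a safer choice.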

Steps to implement VoIP voice call using Twilio in iOS

Install the TwilioVoice framework:

- To install via CocoaPods, add pod 'TwilioVoice', '~> 5.5.0' to the Podfile, run pod install from the project path, and let CocoaPods create the workspace for you. Make sure to use CocoaPods v1.0 or later.

Create a Voice API key:

- Go to https://www.twilio.com/console/voice/settings/api-keys page and create a new API key.
- Save the generated API_KEY and API_KEY_SECRET somewhere safe. We will need them in the next step.

Configure a server to generate an access token to be used in the app:

- The backend team needs to follow the documentation for configuring the server. The configuration requires the Twilio Account SID along with the API_KEY and API_KEY_SECRET from the previous step.

Create a TwiML application:

- A TwiML application identifies a public URL for retrieving TwiML call control instructions.
- When the iOS app makes a call to the Twilio cloud, Twilio makes a webhook request to this URL, your application server responds with generated TwiML, and Twilio executes the instructions you’ve provided.
- To create a TwiML application, go to the TwiML app page. Create a new TwiML application, and use the public URL of your application server’s /makeCall endpoint as the Voice Request URL (If your app server is written in PHP, then you need .php extension at the end).

- Save your TwiML Application configuration, and grab the TwiML Application SID (a long identifier beginning with the characters AP).

Configure Application Server:

- The TwiML Application SID from the previous step needs to be added to the server configuration. The backend developer then needs to restart the server so that it uses the new configuration.
- Now to check whether everything is configured correctly, open up a browser and visit the URL for your application server’s Access Token endpoint: https://{YOUR_SERVER_URL}/accessToken (If your app server is written in PHP, then you need .php extension at the end).
- If everything is configured correctly, you should see a long string of letters and numbers, which is a Twilio Access Token. Your iOS app will use a token like this to connect to Twilio.
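The browser check above can also be scripted. The sketch below builds the endpoint URL and performs a rough sanity check on the response body. It relies on the fact that a Twilio Access Token is a JWT (three dot-separated segments); the `identity` query parameter and both helper names are assumptions for illustration, so adjust them to whatever your server expects.

```swift
import Foundation

/// Builds the URL of the access token endpoint for a given identity.
/// Assumption: the server reads the identity from an `identity` query
/// parameter and the base URL has no trailing slash.
func accessTokenURL(baseURLString: String, identity: String) -> URL? {
    guard var components = URLComponents(string: baseURLString) else { return nil }
    components.path += "/accessToken"
    components.queryItems = [URLQueryItem(name: "identity", value: identity)]
    return components.url
}

/// Rough shape check: a JWT has three non-empty segments separated by dots.
/// This does not verify the signature or the token's contents.
func looksLikeAccessToken(_ body: String) -> Bool {
    let trimmed = body.trimmingCharacters(in: .whitespacesAndNewlines)
    let segments = trimmed.split(separator: ".", omittingEmptySubsequences: false)
    return segments.count == 3 && segments.allSatisfy { !$0.isEmpty }
}

if let url = accessTokenURL(baseURLString: "https://example.com", identity: "alice") {
    print(url.absoluteString)
}
```

A response that fails the shape check (an HTML error page, for example) usually points to a server configuration problem rather than an app problem.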

Run the App:

- For a quick overview of the code and how things work, refer to the Voice Quickstart for Swift. We used this Quickstart project as a reference and reused its classes in the application where we integrated video call functionality.
- Replace baseURLString in the voice class with your application server’s public URL.
- Now build the Quickstart project, leave the text field empty, and press the call button to start a call. You will hear the congratulatory message. To dial another client, some additional steps are needed, which we look at next.

Create VoIP Service Certificate:

- The Programmable Voice SDK uses Apple’s VoIP Services to let your application know when it is receiving an incoming call. If you want your users to receive incoming calls, you’ll need to enable VoIP Services in your application and generate a VoIP Services Certificate.
- To generate a VoIP Services Certificate, go to the Apple Developer portal. You’ll need the following:

  - An Apple Developer membership to be able to create the certificate.
  - An App ID with the “Push Notifications” service enabled.
  - A corresponding Provisioning Profile for your App ID.

- Create an Apple VoIP Services Certificate for this app by navigating to Certificates -> Production and clicking the + on the top right to add the new certificate.
- Choose VoIP Services Certificate from the available options.

Create a Push Credential with your VoIP Service Certificate:

- Once you have generated the VoIP Services Certificate using Keychain Access, you will need to upload it to Twilio so that Twilio can send push notifications to your app on your behalf.
- Export your VoIP Service Certificate as a .p12 file from Keychain Access.

- Extract the certificate and private key from the .p12 file using the openssl command. Use the following commands to get the keys:

  - $> openssl pkcs12 -in PATH_TO_YOUR_P12 -nokeys -out cert.pem -nodes
  - $> openssl pkcs12 -in PATH_TO_YOUR_P12 -nocerts -out key.pem -nodes
  - $> openssl rsa -in key.pem -out key.pem

- Go to the Push Credentials page and create a new Push Credential. Paste the certificate and private key extracted from your certificate. You must paste the keys in as plaintext:

  - For the cert.pem you should paste everything from -----BEGIN CERTIFICATE----- to -----END CERTIFICATE-----.
  - For the key.pem you should paste everything from -----BEGIN RSA PRIVATE KEY----- to -----END RSA PRIVATE KEY-----.

- Remember to check the “Sandbox” option. This is important. The VoIP Service Certificate you generated can be used both in production and with Apple’s sandbox infrastructure. Checking this box tells Twilio to send your pushes to the Apple sandbox infrastructure which is appropriate with your development provisioning profile.
- Once the app is ready for store submission, update the plist with “APS Environment: production” and create another Push Credential with the same VoIP Certificate but without checking the sandbox option.
- This Push Credential SID needs to be added to the server configuration. The Push Credential SID will then be embedded in your access token.
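When pasting cert.pem and key.pem into the Push Credential form, only the text from the BEGIN line through the END line is wanted. The files written by openssl may contain extra lines (such as "Bag Attributes") before the BEGIN marker. The sketch below shows one way to isolate the block; `pemBlock(labeled:in:)` is a hypothetical helper and the sample text is a placeholder, not a real certificate.

```swift
import Foundation

/// Returns the PEM block with the given label, from its BEGIN line through
/// its END line inclusive, or nil if the text has no complete block.
func pemBlock(labeled label: String, in text: String) -> String? {
    let begin = "-----BEGIN \(label)-----"
    let end = "-----END \(label)-----"
    guard let start = text.range(of: begin),
          let stop = text.range(of: end, range: start.upperBound..<text.endIndex) else {
        return nil
    }
    return String(text[start.lowerBound..<stop.upperBound])
}

// Placeholder content standing in for the contents of cert.pem.
let exported = """
Bag Attributes
    friendlyName: Example VoIP Services Certificate
-----BEGIN CERTIFICATE-----
PLACEHOLDERBASE64
-----END CERTIFICATE-----
"""

if let certificate = pemBlock(labeled: "CERTIFICATE", in: exported) {
    print(certificate)
}
```

The label is a parameter because the marker text in key.pem depends on how the key was exported; pass whatever label actually appears in your file.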

Configure Xcode project settings for push notifications:

- On the project’s Capabilities tab, enable “Push Notifications”. In Xcode 8 or earlier, enable both “Voice over IP” and “Audio, AirPlay and Picture in Picture” capabilities in the Background Modes.
- In Xcode 9+, make sure that the “Audio, AirPlay and Picture in Picture” capability is enabled and that the app’s plist contains a “UIBackgroundModes” array with the “audio” and “voip” values.

Receive an incoming call:

- We are now ready to receive incoming calls. Rebuild your app and hit your application server’s /placeCall endpoint: https://{YOUR_SERVER_URL}/placeCall (If your app server is written in PHP, then you need .php extension at the end). This will trigger a Twilio REST API request that will make an inbound call to your mobile app. Once your app accepts the call, you should hear a congratulatory message.

Make client to client call:

- To make client to client calls, you need the application running on two devices.
- To run the application on an additional device, make sure you use a different identity in your access token when registering the new device. Both devices must have different identities so that a call reaches the intended device.
- Twilio generates the access token based on the identity, so each user should have a distinct identity.
- In the Quickstart project, use the text field to specify the identity of the call receiver, then tap the “Call” button to make a call. The TwiML parameters used in the TwilioVoice.connect() method should match the name used on the server. In other words, the name or identity we want to call must be exactly the same as the one registered on the server.

- For an identity to receive calls, its device must register for incoming VoIP push notifications via TwilioVoice.registerWithAccessToken(…). To stop receiving incoming notifications, unregister via TwilioVoice.unregisterWithAccessToken(…).

Steps to implement video call using Twilio in iOS:

Install the TwilioVideo framework:

- To install via CocoaPods, add pod 'TwilioVideo', '~> 3.7' to the Podfile, run pod install from the project path, and let CocoaPods create the workspace for you.

- To install the SDK via Carthage or manually, follow Twilio's installation documentation.
- The iOS SDK supports iOS 11.0 or higher.
- The TwilioVideo.framework is built with Xcode 11. The framework can be consumed with previous versions of Xcode. However, re-compiling Bitcode when exporting for Ad Hoc or Enterprise distribution requires the use of Xcode 11.x.

Xcode Configuration:

- To allow a connection to a Room to be persisted while an application is running in the background, you must select the Audio, AirPlay, and Picture in Picture background mode from the Capabilities project settings page.

Get an API Key:

- API Keys represent credentials to access the Twilio API. They are used for two purposes:

  1. To authenticate to the REST API.
  2. To create and revoke Access Tokens.

- We can also create an API Key from the Twilio Console:

  - Go to the API Keys section under Tools in the Twilio Console.
  - Click on “Create a New API Key”, add a friendly name and save your Key and Secret.

Generate an Access Token:

- For testing purposes you can use the Testing Tools page in the Twilio Console to generate an Access Token. An Access Token is a short-lived credential used to authenticate your client-side application to Twilio.

- In a production application, your back-end server will need to generate an Access Token for every user in your application.

Connect to a Room:

- Call TwilioVideo.connect() to connect to a Room from your iOS application. Once connected, you can send and receive audio and video streams with other Participants who are connected to the Room.

- You must pass the Access Token when connecting to a Room. You may also optionally pass local audio, video or data tracks, to begin sharing pre-created local media with other Participants in the Room upon connecting. You can also pass a room name, which allows you to dynamically specify the name of the Room you wish to join.

- You can also encode the Room name in the Access Token, which will allow the user to connect to only the Room specified in the token.
- You can also pass an ICE transport policy, which allows you to force calls through a TURN relay for testing purposes.
- The name of the Room specifies which Room you wish to join. If a Room by that name does not already exist, it will be created upon connection. If a Room by that name is already active, you’ll be connected to the Room and receive notifications from any other Participants also connected to the same Room. Room names must be unique within an account.
- You can also create a Room using the Rooms REST API. Look at the REST API Rooms resource docs for more details.

Join a Room:

- If you’d like to join a Room you know already exists, you handle that exactly the same way as creating a room: just pass the Room name to the connect method.
- Once in a Room, you’ll receive a room:participantDidConnect: callback for each Participant that successfully joins. Querying the participants getter will return any existing Participants who have already joined the Room.

Setup Local Media:

- You can capture local media from your device’s microphone, camera, or screen as follows:

- In an iOS application, begin capturing audio data by creating a TVILocalAudioTrack, and begin capturing video by creating a TVILocalVideoTrack with an associated TVIVideoCapturer. The iOS Video SDK provides customizable video capturers for both camera and screen capture.

Specify tracks at connect time:

- When the client joins a Room, the client can specify which Tracks they wish to share with other Participants. Imagine we want to share the audio and video Tracks we created earlier.

Working with Remote Participants:

Handle Connected Participants:

- When you join a Room, Participants may already be present. You can check for existing Participants in the roomDidConnect: callback by using the participants getter.

Handle Participant Connection Events:

- We can use the room(_ room: TVIRoom, participantDidConnect participant: TVIRemoteParticipant) method to detect when a participant connects to the room. Similarly, we can use room(_ room: TVIRoom, participantDidDisconnect participant: TVIRemoteParticipant) to detect when a participant disconnects from the room.

Display a Remote Participant’s Video:

- To see the Video Tracks being sent by remote Participants, we need to render them to the screen:

Participating in a Room

Display a Camera Preview:

- The iOS SDK provides a means to render a local camera preview outside the context of an active Room:

Disconnect from a Room:

- You can disconnect from a Room you’re currently participating in. Other Participants will receive a participantDisconnected event.

Server-side control:

- The Programmable Video REST API allows you to control your video applications from your back-end server via HTTP requests. To learn more, check out the Programmable Video REST API docs.

How to start video call after making a VoIP call?

The idea behind switching from VoIP to video is as follows. When the caller places a call, the receiver gets a VoIP call notification.
When the receiver answers the VoIP call, we disconnect it from the receiver side, which also disconnects it on the caller side.
After disconnecting the call on the receiver side, we configure the video call.
On the caller side, a delegate method is called when the receiver disconnects the VoIP call, so we can configure the video call there as well.
If the receiver declines the incoming VoIP call, no video call screen appears on the receiver side, but the caller is still notified that the VoIP call was disconnected. The caller therefore cannot tell whether the receiver answered or declined, because in both cases the VoIP call is disconnected in order to switch to video. So the caller always follows the regular procedure: configure the video call environment and wait 10 seconds to see if anyone joins the room. If somebody joins, the receiver answered the call; otherwise we disconnect the video call.

There are two scenarios:

1. Caller Side
2. Receiver Side

Caller Side Implementation:

- After configuring the VoIP call code, open the video call screen with a camera preview window.

- The remaining video call configuration (creating the audio and video tracks, configuring the access token) is done after the other person picks up the call. This avoids audio session issues that can occur because two libraries (TwilioVoice and TwilioVideo) both use the AudioSession.
- To know when the call is disconnected, TVOCallDelegate provides the method call(_ call: TVOCall, didDisconnectWithError error: Error?). This method is called when the VoIP call disconnects.
- So on the receiver’s end, the VoIP call needs to be disconnected after the call is answered or declined. How the call is disconnected on the receiver side is described later.
- After the call is disconnected, we can configure the access token, the audio and video tracks, and the other caller-side settings described in the video call Quickstart guide and its documentation.
- The roomDidConnect(room: Room) method of RoomDelegate tells us that the user has successfully connected to the room.

- Once the caller enters the room for the video call, they wait 10 seconds to see whether anyone else enters. We can do this with a boolean flag that is set to true when we receive didSubscribeToAudioTrack(audioTrack: RemoteAudioTrack, publication: RemoteAudioTrackPublication, participant: RemoteParticipant) or didSubscribeToVideoTrack(videoTrack: RemoteVideoTrack, publication: RemoteVideoTrackPublication, participant: RemoteParticipant), which means some user has joined the video call. These methods belong to RemoteParticipantDelegate.
- Whenever you want to disconnect the video call, use the self.room?.disconnect() method.
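The 10-second wait on the caller side comes down to one flag and one deadline. The sketch below models only that decision, with no Twilio types involved; `JoinWatchdog` and `CallerDecision` are hypothetical names. In an app, `markRemoteJoined()` would be called from the track subscription callbacks, and the decision would be read when a timer started at room entry fires.

```swift
import Foundation

/// What the caller should do after entering the video room.
enum CallerDecision: Equatable {
    case keepWaiting     // still inside the grace period, nobody joined yet
    case startVideoCall  // a remote track was subscribed, so the receiver answered
    case disconnectRoom  // grace period elapsed with no participant
}

/// Tracks the "someone joined" flag and applies the 10-second rule.
struct JoinWatchdog {
    let gracePeriod: TimeInterval
    private(set) var remoteJoined = false

    init(gracePeriod: TimeInterval = 10) {
        self.gracePeriod = gracePeriod
    }

    /// Call when a remote audio or video track has been subscribed.
    mutating func markRemoteJoined() {
        remoteJoined = true
    }

    /// Decides the next step given the time elapsed since room entry.
    func decision(secondsSinceRoomEntry elapsed: TimeInterval) -> CallerDecision {
        if remoteJoined { return .startVideoCall }
        return elapsed >= gracePeriod ? .disconnectRoom : .keepWaiting
    }
}

var watchdog = JoinWatchdog()
print(watchdog.decision(secondsSinceRoomEntry: 3))
watchdog.markRemoteJoined()
print(watchdog.decision(secondsSinceRoomEntry: 4))
```

Keeping the rule in a small value type like this makes it possible to exercise the timeout without waiting on a real timer.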

Receiver Side Implementation:

- When the receiver gets the VoIP call, they have two options: 1) Answer 2) Decline
- When the receiver answers the call, performAnswerVoiceCall(uuid: UUID, completionHandler: @escaping (Bool) -> Void) is called, so we can set a boolean “call answered” flag to true there. In the same method we set another boolean “call disconnected” flag to false. This flag tracks whether the VoIP call has been disconnected.

- After the user picks up the call, the application comes to the foreground, so observe UIApplication.didBecomeActiveNotification and run the call disconnection code when the app becomes active.
- In the method that is called when the app comes to the foreground, check the following three things:

  - The user must be the receiver. (We can check this with a boolean flag that is true when the user places a call and false otherwise.)
  - The VoIP call must be active, meaning it is not yet disconnected. To check this, use the activeCall object already available in the VoIP Quickstart project. If this object is not nil, the call is not yet disconnected.
  - The call disconnection flag set in performAnswerVoiceCall must be false, meaning the call is not yet disconnected.

- Only when all three conditions are met do we run the call disconnection code by calling performEndCallAction(uuid: activeCall!.uuid).
- Once the call is disconnected successfully, we can start the video call configuration.
- When the user declines the call, the caller receives the call disconnection delegate event, as discussed in the caller scenario.
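The three receiver-side checks can be collapsed into a single guard. The sketch below is a plain Swift model of that check; `ReceiverCallState` and `voipCallToEnd(in:)` are hypothetical names, and the optional UUID stands in for the Quickstart's activeCall object. Returning the UUID instead of force-unwrapping it avoids a crash if the call has already gone away.

```swift
import Foundation

/// Snapshot of the flags inspected when the app becomes active.
struct ReceiverCallState {
    var isCaller: Bool              // true only when this user placed the call
    var activeCallUUID: UUID?       // non-nil while the VoIP call is still active
    var voipCallDisconnected: Bool  // reset to false when the call is answered
}

/// Returns the UUID of the VoIP call that should be ended now,
/// or nil when any of the three conditions is not met.
func voipCallToEnd(in state: ReceiverCallState) -> UUID? {
    guard !state.isCaller, !state.voipCallDisconnected else { return nil }
    return state.activeCallUUID
}

let answered = ReceiverCallState(isCaller: false,
                                 activeCallUUID: UUID(),
                                 voipCallDisconnected: false)
if let uuid = voipCallToEnd(in: answered) {
    // In the app, this is where the end-call action would be requested.
    print("End VoIP call \(uuid) and start video configuration")
}
```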

Extra configuration for video call:

Voice related issue in video call:

- Problem-1: Sometimes, while switching from VoIP to video call, the other participant's voice plays through the receiver (earpiece) instead of the speaker by default, even if you overrode the audio output to the speaker in VoIP while configuring the audio session.
- Solution-1:

  - When a participant connects to the room, we get the callback didSubscribeToAudioTrack(audioTrack: RemoteAudioTrack, publication: RemoteAudioTrackPublication, participant: RemoteParticipant).
  - In this method, we can set the audio configuration. If the current audio output is the built-in receiver, we should override the output audio port to the speaker.

- Problem-2: If wired or wireless headphones are attached before the video call starts, the default audio output port is the headphones. If the headphones are removed while the video call is in progress, the output audio port may fall back to the built-in receiver, when it should normally be the speaker.
- Solution-2:

  - We can be notified when an audio port is attached or detached by observing AVAudioSession.routeChangeNotification. When the user removes the headphones, check the route change reason key to see whether a device was added or removed. If a device was removed, check whether wired or wireless headphones or any other audio port is still attached. If not, override the audio port to the speaker so that the route does not fall back to the built-in receiver.
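Both solutions reduce to the same question: given the current outputs, should audio be forced to the speaker? The sketch below models that decision with simplified stand-in enums (`OutputPort` and `RouteChange` are hypothetical, not AVFoundation types). On a device, the outputs would come from the audio session's current route, the reason from the route change notification, and the override itself would be done with AVAudioSession's overrideOutputAudioPort(.speaker).

```swift
import Foundation

/// Simplified stand-ins for the output ports that matter here.
enum OutputPort { case builtInReceiver, builtInSpeaker, wiredHeadphones, bluetooth }

/// Simplified stand-ins for route change reasons.
enum RouteChange { case deviceAdded, deviceRemoved, other }

/// Solution-1: after subscribing to a remote audio track, force the speaker
/// when the earpiece is the only active output.
func shouldOverrideToSpeaker(currentOutputs: [OutputPort]) -> Bool {
    return !currentOutputs.isEmpty && currentOutputs.allSatisfy { $0 == .builtInReceiver }
}

/// Solution-2: after a route change, force the speaker when a device was
/// removed and no headphones (wired or wireless) remain attached.
func shouldOverrideToSpeaker(after change: RouteChange, currentOutputs: [OutputPort]) -> Bool {
    guard change == .deviceRemoved else { return false }
    let headphonesAttached = currentOutputs.contains { $0 == .wiredHeadphones || $0 == .bluetooth }
    return !headphonesAttached
}

// Headphones were unplugged and the route fell back to the earpiece.
print(shouldOverrideToSpeaker(after: .deviceRemoved, currentOutputs: [.builtInReceiver]))
```

Separating the decision from the audio session calls keeps the rule easy to check in isolation; the real handlers then only translate system values into these inputs.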

Published by

Janki Thaker

5 years ago
