Negotiating Modality in Real-Time Communications

A mechanism for negotiating human language for real-time communication is specified in . The indication of language preference is expressed per media and specified in the SDP attributes 'hlang-send' and 'hlang-recv'. Negotiation of language takes place by the answering party selecting from the language, media and direction alternatives expressed by the offering party. Languages are expressed using language tags as specified in BCP 47 .

When starting a conversation in a media-rich environment, users may have very specific preferences for using one modality (spoken, written or signed) over other possible but less preferred modalities. In traditional call establishment, the answering party is expected to start the conversation with a greeting. In the media-rich environment, the modality and language of this greeting set the expectations for which modality and language to mainly use in the session. Deviation from this initial expectation is usually possible during the session by mutual agreement between the participants, but may be time-consuming and cause uncertainty.

A way for the parties not only to indicate alternative languages and modalities for the communication directions in the session, but also to indicate a preference for specific modalities per direction, provides the opportunity to describe the desired language communication for a session more exactly, while still providing information about less preferred alternatives. This specification extends with a mechanism for indicating modality preference by a condensed notation integrated with the syntax of the language indications of .

The expected application area is wide. Traditionally, the most common modality for real-time interaction is spoken communication. In some settings, e.g. where silence is required, it may be desirable to express a preference for written communication, while still leaving the possibility of traditional spoken communication open by indicating it at a lower preference level. For persons who are fully able to use both sign language and spoken language, but who do not want to force the other party to bring a sign language interpreter into the call, it may be important to indicate the sign language capability at a lower preference level and the spoken language capability at a higher level. Some persons with disabilities may strongly prefer to conduct a written conversation, while still wanting to express that a spoken conversation is possible as a last resort. Many other situations exist in the media-rich communication environment in which the media preference indication is of value for a smooth initiation of a real-time session.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in .

This specification extends the use of the asterisk in the 'hlang-send' and 'hlang-recv' SDP attributes introduced by . In , the asterisk appended at the end of the attribute value indicates a preference to not get the call denied if no languages match. This specification adds the following meaning of the asterisk:

In an offer or answer, a 'hlang-send' or 'hlang-recv' attribute value MAY have an asterisk appended as the final token. An asterisk appended to a value in an offer indicates that the caller has a higher preference for the corresponding modality to be used in the specified direction than for other modalities indicated for that direction without an asterisk. In an answer, the asterisk indicates a modality that the callee prefers to use in the session.

A user may have a clear preference to use one specific modality in a direction, while use of other modalities may be acceptable but lower in preference. This condition MAY be indicated by appending an asterisk as the last parameter in the corresponding 'hlang-' value. Note that the asterisk appended at the end of a 'hlang-' attribute value should also be seen as a preference to not have the call denied even if no indicated languages are in common, as specified in . When negotiating language use for a direction, languages and modalities specified together with the asterisk should be given preference when selecting what to use. If there is no specific preference between modalities in the same direction, this condition should be indicated by appending an asterisk to all or none of the 'hlang-' values for that direction.
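As a non-normative illustration, reading an 'hlang-' attribute line with the optional trailing asterisk could look roughly like the following. This is a minimal Python sketch: the names `HlangValue` and `parse_hlang` are hypothetical, and accepting an asterisk attached directly to the last language tag (for example `en*`) in addition to a separate final token is an assumption made here for tolerance, not something this text specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HlangValue:
    """Hypothetical container for one parsed 'hlang-' attribute."""
    direction: str        # "send" or "recv"
    languages: List[str]  # language tags in the order given
    preferred: bool       # True if the value ends with an asterisk


def parse_hlang(line: str) -> HlangValue:
    """Parse a line such as 'a=hlang-recv:en *' (illustrative only)."""
    line = line.strip()
    if line.startswith("a="):
        line = line[2:]
    name, sep, value = line.partition(":")
    if not sep or name not in ("hlang-send", "hlang-recv"):
        raise ValueError("not an hlang-send/hlang-recv attribute: %r" % line)

    tokens = value.split()
    preferred = False
    if tokens and tokens[-1] == "*":
        # Asterisk as a separate final token.
        preferred = True
        tokens = tokens[:-1]
    elif tokens and tokens[-1].endswith("*"):
        # Assumption: also tolerate 'en*' with no separating space.
        preferred = True
        tokens[-1] = tokens[-1][:-1]

    if not tokens:
        raise ValueError("no language tag in %r" % line)
    return HlangValue(name[len("hlang-"):], tokens, preferred)
```

A caller of this sketch would parse every 'hlang-' line under each media description and keep the `preferred` flag together with the media type, since the preference applies to the modality carried by that media.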

If no 'hlang-' attribute value has an asterisk attached, and thus no modality preference is indicated, this should also be taken as a preference by the caller to get the call denied if no languages are in common between the caller and the callee. A caller with language capabilities in multiple media but no specific modality preferences should attach the asterisk to all 'hlang-' attributes in at least one direction to indicate that the call should not be denied. If there is a preference for denying the call when no languages match, no asterisk should be appended to any 'hlang-' attribute value; it is then not possible to indicate any preferred modality at the same time.
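The two roles of the asterisk described above can be summarized in a small, non-normative Python sketch. The function names `may_proceed_without_match` and `preferred_media`, and the tuple layout used for the input, are hypothetical and chosen only for this illustration.

```python
from typing import List, Tuple

# Hypothetical record: (media type, direction, value ends with asterisk)
Entry = Tuple[str, str, bool]


def may_proceed_without_match(entries: List[Entry]) -> bool:
    """True if any 'hlang-' value carries an asterisk, i.e. the caller
    prefers that the call is not denied when no languages match."""
    return any(starred for _media, _direction, starred in entries)


def preferred_media(entries: List[Entry], direction: str) -> List[str]:
    """Media types preferred for one direction.

    An asterisk on all values or on none of them for a direction means
    there is no preference between modalities, so an empty list is returned.
    """
    in_direction = [(m, s) for m, d, s in entries if d == direction]
    starred = [m for m, s in in_direction if s]
    if not starred or len(starred) == len(in_direction):
        return []
    return starred
```

For example, with an asterisk on both audio values and none on the video values, audio is reported as the preferred modality for each direction and the call may proceed without a language match; with no asterisk anywhere, no preference is reported and the call may be denied.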

- - Interaction with simultaneity indication - -

An offer requesting the following media streams: audio for the caller to send and receive spoken English (most preferred modality), video for the caller to send and receive American Sign Language (less preferred modality), and supplemental text. The offer also requests that the call proceed even if the callee does not support any of the languages. The offer is likely from a hearing person with knowledge of sign language:

   m=text 45020 RTP/AVP 103 104
   m=audio 49250 RTP/AVP 20
   a=hlang-recv:en *
   a=hlang-send:en *
   m=video 51372 RTP/AVP 31 32
   a=hlang-recv:ase
   a=hlang-send:ase

An answer to the above offer, indicating video in which the callee will send and receive American Sign Language, because the callee has no capability for spoken English. The text and audio streams are opened as supplementary streams:

   m=text 45020 RTP/AVP 103 104
   m=audio 49250 RTP/AVP 20
   m=video 51372 RTP/AVP 31 32
   a=hlang-send:ase
   a=hlang-recv:ase

An offer requesting the following media streams: audio for the caller to send using spoken French (most preferred modality) or text for the caller to send written French (less preferred modality), and text for the caller to receive written French. The offer also requests that the call proceed even if the callee does not support any of the languages. Video is supplemental. The offer is likely from a hard-of-hearing person who cannot make use of received spoken language and who prefers speaking French to typing it:

   m=text 45020 RTP/AVP 103 104
   a=hlang-send:fr
   a=hlang-recv:fr
   m=audio 49250 RTP/AVP 20
   a=hlang-send:fr *
   m=video 51372 RTP/AVP 31 32

An answer to the above offer, indicating text in which the callee will send written French, and audio in which the callee is prepared to receive spoken French. The video stream is opened as a supplementary stream:

   m=text 45020 RTP/AVP 103 104
   a=hlang-send:fr
   m=audio 49250 RTP/AVP 20
   a=hlang-recv:fr
   m=video 51372 RTP/AVP 31 32
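One way an answering party might arrive at answers like those above is sketched below in Python, for illustration only. The function name `select_modality` and the data layout are hypothetical. The sketch also assumes that language tags are matched by simple case-insensitive equality, which is a simplification of real language-tag matching, and it handles a single direction at a time.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical record for one direction of an offer:
# (media type, offered language tags, value ends with asterisk)
OfferEntry = Tuple[str, List[str], bool]


def select_modality(
    offer: List[OfferEntry],
    callee: Dict[str, Set[str]],
) -> Optional[Tuple[str, str]]:
    """Pick a (media, language) pair for one direction.

    Entries with an asterisk are tried first. sorted() is stable, so the
    original order is kept within the starred and unstarred groups.
    Returns None when nothing matches.
    """
    ordered = sorted(offer, key=lambda entry: not entry[2])
    for media, languages, _starred in ordered:
        supported = {tag.lower() for tag in callee.get(media, set())}
        for tag in languages:
            if tag.lower() in supported:
                return media, tag
    return None


# The caller's send direction from the first offer above.
offer_send = [("audio", ["en"], True), ("video", ["ase"], False)]

# A callee who can only receive American Sign Language on video.
print(select_modality(offer_send, {"video": {"ase"}}))
```

With a callee able to receive both spoken English and American Sign Language, the same call would select audio, since that entry carries the asterisk. When the function returns None, whether the call proceeds depends on the presence of an asterisk in the offer, as described earlier.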

Thanks to Randall Gellens for providing the background for this extension, and to Brian Rosen and Paul Kyzivat for thorough discussions and guidance.