As the Web continues its evolution into a powerful application platform, more and more of its capabilities risk compromising user privacy unless they are specifically designed to protect it. Privacy must become a core feature of the Web.
This document is aimed at those who develop specifications, whose designs will in turn be read by those who implement and deploy them. It aims to give specification authors enough guidance to consider privacy throughout the design process, and to convey clear messages to those who implement and deploy.
This document does not attempt to define what privacy is (in a Web context); instead, privacy is the sum of what is contained in this document. This may not be what most readers would expect, but privacy is a complicated concept with a rich history spanning many disciplines, and there remains confusion over its meaning.
Whether any individual document warrants a specific privacy considerations section depends on the document's content. In some cases privacy may be covered in the security considerations, or may not be relevant to a specification at all. In other cases, a fruitful discussion of the privacy properties of a system requires several specifications to be evaluated in concert before meaningful guidance can be offered.
As noted above, this document does not attempt to define the term 'privacy' itself; in this we follow the approach taken by [[RFC3552]] as well as [[IAB-PRIVACY-CONSIDERATIONS]]. Several different brief definitions are collected in [[RFC4949]].
This section defines basic terms used in this document, with references to pre-existing definitions as appropriate.
There is a wide range of privacy concerns; at a high level they fall into the following categories.
To mitigate the privacy threats presented above, techniques such as data minimisation, anonymity and pseudonymity, identity confidentiality, user participation, and security protection can be applied.
The starting point for addressing privacy in the design of a protocol, in its implementation, and in its deployment is to acknowledge that there are potential privacy concerns. Documenting these concerns develops a shared understanding, which is the prerequisite for deciding whether there are ways to mitigate them: with an alternative protocol design, with guidance for implementers, or with recommendations for those who deploy the technology.
The communication architecture defines which entities are involved in the data exchange and what information they get to see. Changing the underlying security and privacy foundation typically requires modifications to the entire architecture; consequently, it is wise to consider security as well as privacy early in the design process. With regard to the depth of privacy and security investigations, the goal is not to be exhaustive (such as by producing as much text as possible) but rather to capture the most important aspects. Unfortunately, security and privacy experts are often not available in the early design phase, since there are too few of them. It is therefore important to produce a write-up that allows members of the security and privacy community to quickly grasp the main architectural spirit of the newly designed feature, API, or protocol. It is too easy to get lost in details, and feedback from external reviewers will help ensure that potential concerns are identified early. Basic guidance on producing such high-level write-ups is available in [[RFC4101]].
Since privacy builds on top of security, proper security protection needs to be provided. In addition to the guidance for writing security considerations text that [[RFC3552]] provides to protocol designers, the following questions are relevant.
The guidance given below is not a set of hard rules but aims to make you think about your design. Not all items apply equally to all parties. For example, notice and consent are best realized by those who implement and deploy software rather than by those who purely develop specifications, since many deployments use non-standardized functionality to inform their user base about the privacy properties of their service and to obtain users' consent before the service can be used. In other cases, users are given the opportunity to make consent decisions in real time, during the communication interaction, before sharing data.
Minimisation is a strategy of exposing as little information to other communication partners as is required for a given operation to complete. More specifically, it requires not providing access to more information than was apparent in the user-mediated access, and allowing the user some control over exactly which information is provided.
For example, if the user has provided access to a given file, the object representing it should not make it possible to obtain information about the file's parent directory and its contents, as that is clearly not what the user expected to share.
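The file example above can be sketched in code. This is a minimal, hypothetical illustration (the object shapes and names are invented for this sketch, not drawn from any real file API): the wrapper exposes only what the operation needs, while information about the file's location in the filesystem is deliberately withheld.

```javascript
// Hypothetical sketch: wrap the result of a user-mediated file pick so
// that only the data needed for the operation (name, size, contents)
// is exposed, and nothing about where the file lives on disk.
function toMinimalFileHandle(pickedFile) {
  return Object.freeze({
    name: pickedFile.name,           // e.g. "photo.jpg"
    size: pickedFile.size,           // in bytes
    read: () => pickedFile.contents  // access to the bytes the user granted
    // Deliberately NOT exposed: pickedFile.fullPath, the parent
    // directory, or sibling files.
  });
}

// Simulated result of a user-mediated file picker (illustrative only):
const picked = {
  name: "photo.jpg",
  size: 1024,
  contents: "…bytes…",
  fullPath: "/home/alice/secret-project/photo.jpg"
};

const handle = toMinimalFileHandle(picked);
```

Note that even though the underlying object knows the full path, the handle given to the application has no property from which it could be recovered.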
Basic fingerprinting guidance can be found here. The following questions are relevant for specification authors: Is the designed protocol vulnerable to fingerprinting? If so, how? Can it be redesigned, configured, or deployed to reduce or eliminate this vulnerability? If not, why not?
In the context of data minimisation, it is natural to ask what data is passed between the different parties, how persistent the data items and identifiers are, and whether different protocol runs can be correlated.
For example, the W3C Device APIs Working Group has defined the following requirements in their Device API Privacy Requirements document.
Data minimisation is applicable to specification authors, implementers, and those deploying the final service.
The following questions arise concerning data minimisation:
As an example, consider mouse events. When a page is loaded, the application has no way of knowing whether a mouse is attached, what type of mouse it is (e.g., make and model), what capabilities it exposes, how many are attached, and so on. Only when the user decides to use the mouse — presumably because it is required for interaction — does some of this information become available. And even then, only a minimum of information is exposed: one could not tell, for instance, whether it is a trackpad, and the fact that it may have a right button is only exposed if that button is used. The Gamepad API makes use of this same data minimisation capability. It is impossible for a Web game to know whether the user agent has access to gamepads, how many there are, what their capabilities are, and so forth. It is simply assumed that if the user wishes to interact with the game through the gamepad then she will know when to action it — and actioning it provides the application with all the information that it needs to operate (but no more than that).
The way in which this functionality is supported for the mouse is by providing information about the mouse's behaviour only when certain events take place. The approach is therefore to expose event handling (e.g., triggering on click, move, button press) as the sole interface to the device.
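The event-driven pattern described above can be sketched as follows. This is a simplified, hypothetical model (the `MinimalDevice` class and its methods are invented for this sketch, not a real platform API): the application cannot query the device, and only learns about it inside a handler that fires when the user actually uses it.

```javascript
// Hypothetical sketch of event handling as the sole interface to a
// device: no enumeration or capability query is offered.
class MinimalDevice {
  constructor() { this.handlers = {}; }
  addEventListener(type, fn) {
    if (!this.handlers[type]) this.handlers[type] = [];
    this.handlers[type].push(fn);
  }
  // Called by the "user agent" when the user acts; only at that moment
  // does the event payload (e.g. which button) flow to the application.
  _userAction(type, detail) {
    for (const fn of this.handlers[type] || []) fn(detail);
  }
}

const mouse = new MinimalDevice();
let observedButton = null;
mouse.addEventListener("click", (e) => { observedButton = e.button; });

// Before any user action, the page has learned nothing about the device:
const knownBefore = observedButton; // still null

// The user clicks; only now does information reach the page:
mouse._userAction("click", { button: 2 });
```

The design choice is that information disclosure is gated on a genuine user action, so mere page load reveals nothing.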
For implementers and those who deploy services: is the ability to share data selectively provided? Is a user given the option to report false information, or data with less granularity (fuzzified information)?
Designing privacy features into a protocol or architecture often requires trade-offs. As designers we often make these decisions without giving them much thought. For others, however, this decision-making process is important for judging the value of particular design choices.
Does the protocol make trade-offs between privacy and usability, privacy and efficiency, privacy and implementability, privacy and security, or privacy and other design goals? Capture these trade-offs and the rationale for the design chosen.
Protocols often come with flexible options so that they can be tailored to specific environments. Does the default mode minimize the amount, identifiability, and persistence of the data and identifiers exposed by the protocol? Does the default mode or option maximize the opportunity for user participation? Does it provide the strictest security features of all the modes/options?
If the answer to any of these questions is no, explain why less protective defaults were chosen.
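The defaults question above can be illustrated with a small sketch. This is a hypothetical options object (the option names and the `createSession` function are invented for illustration): the most protective settings are the defaults, so weaker behaviour requires an explicit opt-in rather than privacy requiring an opt-out.

```javascript
// Hypothetical sketch: protocol options whose defaults are the most
// protective mode, so privacy-weakening choices must be explicit.
function createSession(options = {}) {
  const defaults = {
    persistIdentifier: false, // default: fresh identifier per session
    shareDiagnostics: false,  // default: no extra data exposed
    transport: "encrypted",   // default: strictest security mode
  };
  return { ...defaults, ...options };
}

// A caller that does nothing special gets the protective defaults:
const session = createSession();
```

A deployment that needs a weaker mode can still request it, but the specification then has a natural place to require that the deviation be justified.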
Many Web applications collect data and allow it to be shared with other parties through APIs and other protocol mechanisms.
Data collection: What mechanism for obtaining consent does the specification foresee being used before data collection starts? Is a user able to access the data collected about him or her that has implications for the protocol, API, or extension being defined?
Data sharing: What controls or consent mechanisms does the protocol define or require before personal data are shared with other parties (e.g., via the protocol or API)? Does the user have control over their data after it has been shared with other parties, for example regarding secondary use? Are users able to determine what information was shared with other parties (as part of an audit log)?
What recommendations can be given to implementers and those who deploy regarding privacy-friendly collection and sharing of information for the technology being standardized? This refers in particular to the ability to provide additional information about why the sharing is taking place (the purpose), what information is shared, and with whom; also relevant is control over the degree (or granularity) of information sharing. Will the data subject be given enough context to make an informed decision? Is it anticipated that the decision to grant sharing with a particular party be made persistent (i.e., cached) so that the user is not repeatedly asked? If so, for how long is the decision cached, and how can it be revoked? Is there an anticipated way for a user to determine what decisions have been cached? For data that is collected, is the user able to retrieve that data in an electronic format (data portability)? Are standardized data formats (where they exist) used for the data export? What ability to delete previously collected data is given to the user? Is a user given the ability to delete all collected data, or is deletion applied only selectively?
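The caching, expiry, revocation, and inspection questions above can be sketched together. This is a hypothetical implementation pattern (the `ConsentCache` class and its methods are invented for illustration, not drawn from any specification): a sharing decision is remembered for a limited time, can be revoked, and can be listed so the user can audit what has been decided.

```javascript
// Hypothetical sketch: cache a user's sharing decision per party, with
// an expiry after which the user is asked again, plus revocation and
// a way for the user to inspect cached decisions.
class ConsentCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;             // injectable clock, for testing
    this.decisions = new Map(); // party → { granted, expiresAt }
  }
  record(party, granted) {
    this.decisions.set(party, { granted, expiresAt: this.now() + this.ttlMs });
  }
  isGranted(party) {
    const d = this.decisions.get(party);
    if (!d || this.now() > d.expiresAt) return null; // expired: ask again
    return d.granted;
  }
  revoke(party) { this.decisions.delete(party); }   // user withdraws consent
  list() { return [...this.decisions.keys()]; }     // user audits decisions
}

let t = 0; // simulated clock
const cache = new ConsentCache(1000, () => t);
cache.record("ads.example", true);
```

Returning `null` rather than `false` on expiry distinguishes "never asked or expired" from "explicitly denied", which matters for deciding when to re-prompt the user.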
We would like to thank Karl Dubost, David Singer, Robin Wilton, Frank Dawson, and Frederick Hirsch. We are particularly thankful for the support of the W3C Privacy Interest Group (PING) chairs, Christine Runnegar and Tara Whalen.