Privacy Policies in the Age of AI

Surveyor 1 selfie, as enhanced by JPL, dated Aug. 9, 1966

I have been thinking quite a lot about privacy notifications and how they could be made far more useful to the user, or data subject, now that AI is really a thing. I believe that many people would consent to the use of their data if it wasn’t going to come out in strange and unusual ways. Some of the more popular AI engines are known to produce rather strange and unexpected manipulations of user data.

Current Thinking in Online Privacy

Currently, privacy policies are conceived around the framework laid out in Europe in the GDPR (General Data Protection Regulation). This thinking involves discrete elements, including collection, processing, transfer, and storage. While I think several of these concepts are useful, the processing element of the GDPR is an odd concept in computing, since “conducting a statistical analysis” and “correlating with other data collected on the subject” are both processing activities, yet they have radically different implications for the data subject.

Nowhere does this distinction ring louder than with AI. The data put into an AI can be interpreted along many different facets by that AI and surface in its output in many different ways. It all depends on what you are training that AI to do. If you are training an AI to perform some service for other people that has little to do with the origin of the data, that is a far cry from trying to winnow out the interests, preferences, or proclivities of the subjects themselves. Both are recognized uses for AI, and both may use the same data, but they are not equivalent from the data subject’s point of view.

I keep thinking of Zoom’s use of AI for meeting summarization. That application produces a very narrow output: if sections of a meeting happen to map closely to a previous meeting fed into the system, there is a remote chance that data from the first meeting could be mirrored in the summary. However, even if the two coincide perfectly for a portion, the text summarizing the second meeting would never be attributed to, or even acknowledge, the first meeting’s existence. Recipients of the output would not even be aware of their own meeting’s lack of originality.

Relevant Information

Policies that state unambiguously how a subject’s data might exit the AI are more relevant than policies that disclose how data enters the AI. Subjects (and, more importantly, activists) are far more likely to object if they do not understand how particular pieces of information are used.

Most privacy policies express this to some extent to help subjects understand why their data must be used, but that explanation is not the part the rules require. Crafters of policies should adopt disclosure of the “how and why” of use as a matter of course. Yes, this limits developers and other internal functions to those specific uses, but it brings data use more in line with subject expectations.

Although current privacy regulation requires that a subject be notified that their data is being used or transferred to third parties, how that data is actually used is an area that is currently not regulated. The challenge is how to regulate the granularity or specificity of what is reported to the user, particularly in advanced computing, where the internal workings of the code are not entirely understood even by the people who wrote it.

Yet output metrics exist; otherwise the code would not be useful or valuable. These goals or metrics are precisely what should be communicated to the user. Granted, there are certain marketing or advertising goals that organizations are reluctant to disclose. Those are exactly the goals that activists want data subjects to be able to steer around when they give up their data. It is shortchanging data subjects to play hide-the-ball with this type of use.

Third-party use is also very difficult in this context, but we have been through this before with both HIPAA and GDPR equivalency. “Selling data so agencies can better advertise to people” and “sharing data so third parties can support the application” are radically different reasons to share data, yet they are treated equivalently under current privacy thinking. If all data sharing agreements included a usage component, both activists and data subjects would be happier with the current state of privacy.
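To make the idea of a “usage component” concrete, here is a minimal sketch of what a machine-readable version of one might look like. This is purely an illustration of the concept; the field names, categories, and example uses are hypothetical and do not reflect any existing standard, regulation, or vendor’s actual practice.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of a "usage component" attached to a data sharing
# agreement. All field names and example values are illustrative only.

@dataclass
class DataUse:
    recipient: str        # who receives the data
    purpose: str          # what the recipient will do with it
    output: str           # what the subject can expect to come out the other end
    subject_facing: bool  # does the output circle back to profile the subject?

@dataclass
class SharingAgreement:
    data_elements: List[str]
    uses: List[DataUse] = field(default_factory=list)

# Two sharing arrangements that today's policies treat identically,
# but that read very differently once the use is spelled out.
agreement = SharingAgreement(
    data_elements=["meeting transcripts", "email address"],
    uses=[
        DataUse(
            recipient="hosting provider",
            purpose="operate and support the application",
            output="none visible to the subject",
            subject_facing=False,
        ),
        DataUse(
            recipient="advertising network",
            purpose="build interest profiles for targeted ads",
            output="ad-targeting categories tied to the subject",
            subject_facing=True,
        ),
    ],
)

for use in agreement.uses:
    print(f"{use.recipient}: {use.purpose} -> {use.output}")
```

Even something this simple would let a subject (or an activist reading on their behalf) distinguish sharing that merely supports the application from sharing that comes back around to target them.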

Many Attorneys Poorly Equipped

Unfortunately, the status quo does not require the privacy attorney to grapple with even broad-brush nuances of how data is being used. Many attorneys in the privacy space have only a vague notion of what goes into computation and are seldom apprised of the goals of that computation, although that information is always readily available. When separated by organizational structure, drafters of privacy policies must try to wrangle developers into describing what they are doing. Data in -> utility out is deceptively simple, but it is not often discussed in those terms.