Tim Gilbert led an effort that used NLP word vectors, entity recognition, and regular expressions to analyze 70 topics in the privacy policies of companies spanning 183 NAICS sectors. The goals were to find similarities and differences in topic coverage by sector and to identify which paragraphs and sentences in each policy were semantically aligned with each topic.
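A minimal sketch of sentence-to-topic alignment, assuming each sentence and each topic description is represented by the average of its word vectors and compared by cosine similarity. The tiny `WORD_VECTORS` table, the `align_sentences` helper, and the 0.8 threshold are hypothetical stand-ins, not the project's actual code; a real pipeline would load pretrained embeddings and tune thresholds per topic.

```python
import math
import re

# Hypothetical toy word vectors; a real pipeline would load pretrained embeddings.
WORD_VECTORS = {
    "cookie": [0.9, 0.1, 0.0],
    "cookies": [0.9, 0.1, 0.0],
    "track": [0.8, 0.2, 0.1],
    "device": [0.7, 0.3, 0.0],
    "biometric": [0.0, 1.0, 0.1],
    "face": [0.1, 0.8, 0.2],
    "employment": [0.0, 0.1, 0.9],
    "history": [0.2, 0.1, 0.7],
}


def sentence_vector(text):
    """Average the vectors of known words; return None if no word is known."""
    tokens = re.findall(r"[a-z]+", text.lower())
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align_sentences(policy_text, topic_description, threshold=0.8):
    """Return (sentence, score) pairs whose similarity to the topic meets the threshold."""
    topic_vec = sentence_vector(topic_description)
    if topic_vec is None:
        return []
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    results = []
    for sentence in sentences:
        vec = sentence_vector(sentence)
        if vec is None:
            continue
        score = cosine(vec, topic_vec)
        if score >= threshold:
            results.append((sentence, round(score, 3)))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    policy = ("We use cookies to track your device. "
              "We may collect your face geometry as a biometric identifier. "
              "We keep your employment history on file.")
    # Only the cookie sentence clears the threshold for this topic.
    print(align_sentences(policy, "Cookies and device tracking"))
```

Averaging word vectors ignores word order, so in practice it is usually paired with rule-based checks like the regular expressions mentioned above.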
One challenging part of the project was distinguishing between collected data that a company reserved the right to use internally and data that it would share with or sell to other companies.
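One simple way to approach this distinction is a cue-phrase heuristic. The sketch below is hypothetical: the `classify_use` function and its regular expressions are illustrative assumptions, not the project's actual rules. It removes explicitly negated phrases such as "we do not sell", then labels a sentence "external" if it still mentions sharing, selling, disclosing, or transferring data, and "internal" if it contains internal-use cues.

```python
import re

# Hypothetical cue patterns; a real rule set would be larger and tuned on labeled policy text.
NEGATED_SHARING = re.compile(
    r"\b(?:do|does|will)\s+not\s+(?:share|sell|disclose)\b"
    r"|\bnever\s+(?:share|sell|disclose)\b",
    re.IGNORECASE,
)
SHARING_CUES = re.compile(
    r"\b(?:shar(?:e|es|ed|ing)|sell(?:s|ing)?|sold"
    r"|disclos(?:e|es|ed|ing)|transfer(?:s|red|ring)?)\b",
    re.IGNORECASE,
)
INTERNAL_CUES = re.compile(
    r"\b(?:internal(?:ly)?|improve our|our own|within our company)\b",
    re.IGNORECASE,
)


def classify_use(sentence):
    """Label a policy sentence as 'external', 'internal', or 'unknown'."""
    # Drop negated sharing phrases so "we never sell" does not count as selling.
    text = NEGATED_SHARING.sub(" ", sentence)
    if SHARING_CUES.search(text):
        return "external"
    if INTERNAL_CUES.search(text):
        return "internal"
    return "unknown"


if __name__ == "__main__":
    for s in ["We may sell your information to data brokers.",
              "We use your browsing data internally to improve our services.",
              "We never sell your data, but we share it with advertisers."]:
        print(classify_use(s), "|", s)
```

A sentence that only denies sharing (for example "We do not sell your personal information.") comes back as "unknown" rather than "internal", since a denial alone does not say how the data is used.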
Example privacy topics analyzed:
- Kinds of unique identifiers collected
- Biometrics
- Communication logging
- Cookies, cross-origin histories and device tracking
- Interests
- Submitted content ownership
- Employment history
Example external recipients of collected data:
- Law enforcement and government entities
- Advertisers
- Joint ventures
- Parent companies
- Third-party service providers
- Liquidation purchasers
- Credit bureaus
- Fulfillment agencies
- Brokers
- Analytics companies
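Detecting which of these recipients a sentence mentions can be sketched with keyword patterns. The patterns and the `find_recipients` helper below are hypothetical and cover only a few of the categories listed; they use plain regular expressions, whereas the project also used entity recognition, which this sketch does not attempt.

```python
import re

# Hypothetical keyword patterns for a subset of recipient categories.
RECIPIENT_PATTERNS = {
    "Law enforcement and government entities":
        r"\b(?:law enforcement|government (?:agenc(?:y|ies)|entit(?:y|ies))|court orders?|subpoenas?)\b",
    "Advertisers": r"\badvertis(?:er|ers|ing partners?)\b",
    "Credit bureaus": r"\bcredit (?:bureaus?|reporting agenc(?:y|ies))\b",
    "Brokers": r"\b(?:data )?brokers?\b",
    "Analytics companies": r"\banalytics (?:compan(?:y|ies)|providers?|partners?)\b",
    "Parent companies": r"\bparent compan(?:y|ies)\b",
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in RECIPIENT_PATTERNS.items()}


def find_recipients(sentence):
    """Return the sorted names of recipient categories mentioned in a sentence."""
    return sorted(name for name, rx in COMPILED.items() if rx.search(sentence))


if __name__ == "__main__":
    print(find_recipients(
        "We may share your data with advertisers and analytics providers, "
        "and disclose it to law enforcement in response to a subpoena."))
```

Combined with a sharing-versus-internal-use check, a match like this indicates not only that data leaves the company but also which kind of recipient receives it.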