
Text Embeddings Reveal (Almost) As Much As Text

From Papers Read on AI

Length: 27 minutes
Released: Oct 18, 2023
Format: Podcast episode

Description

How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion: reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation: generating text that, when re-embedded, is close to a fixed point in latent space. We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.

2023: John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush



https://arxiv.org/pdf/2310.06816v1.pdf
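To make the iterative inversion idea in the abstract concrete, here is a minimal Python sketch of a "correct and re-embed" loop. It is an illustration only, not the authors' implementation: the functions embed, propose_correction, and invert_embedding are hypothetical placeholders standing in for a real embedding model and the trained inversion model described in the paper.

import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding model (assumption): returns a unit-length dense vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=768)
    return v / np.linalg.norm(v)

def propose_correction(target_emb: np.ndarray, current_text: str,
                       current_emb: np.ndarray) -> str:
    """Placeholder for the trained inversion model, which would generate a revised
    hypothesis conditioned on the target embedding and the current guess."""
    return current_text + " <revised>"  # stand-in only

def invert_embedding(target_emb: np.ndarray, steps: int = 10) -> str:
    """Iteratively refine a text hypothesis so its embedding moves toward the target."""
    hypothesis = ""                    # start from an empty (naive) guess
    best_text, best_sim = hypothesis, -1.0
    for _ in range(steps):
        current_emb = embed(hypothesis)
        sim = float(current_emb @ target_emb)  # cosine similarity of unit vectors
        if sim > best_sim:                     # keep the closest candidate so far
            best_text, best_sim = hypothesis, sim
        hypothesis = propose_correction(target_emb, hypothesis, current_emb)
    return best_text

# Example usage (hypothetical): the attacker only observes the target vector.
target = embed("example sentence containing private details")
print(invert_embedding(target))

The key design point the abstract highlights is the loop itself: rather than decoding the embedding in one shot, each step re-embeds the current hypothesis and uses the discrepancy from the target vector to guide the next correction.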

Titles in the series (100)

Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. Selecting papers by comparative results, citations, and influence, we educate you on the latest research. Consider supporting us at Patreon.com/PapersRead for feedback and ideas.