44 min listen
Text Embeddings Reveal (Almost) As Much As Text
Length:
27 minutes
Released:
Oct 18, 2023
Format:
Podcast episode
Description
How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion, reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation: generating text that, when re-embedded, is close to a fixed point in latent space. We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.
2023: John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush
https://arxiv.org/pdf/2310.06816v1.pdf
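The correct-and-re-embed loop mentioned in the description can be illustrated with a toy sketch. This is not the authors' method: the paper trains a model to propose corrections, whereas the sketch below swaps in a brute-force search over a tiny vocabulary. The embedder `toy_embed`, the vocabulary `VOCAB`, and the function `invert` are all hypothetical names made up for illustration, and the attacker is assumed to know the length of the hidden text.

```python
import hashlib
import math

DIM = 16  # size of the toy embedding (assumption, chosen for illustration)
VOCAB = ["the", "patient", "doctor", "saw", "called", "a", "nurse", "today"]


def token_vector(token, position):
    """Deterministic pseudo-random vector for a (token, position) pair."""
    digest = hashlib.sha256(f"{position}:{token}".encode("utf-8")).digest()
    # Map the first DIM bytes of the hash to floats in [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:DIM]]


def toy_embed(tokens):
    """Hypothetical stand-in for an embedding model: sum of token vectors."""
    total = [0.0] * DIM
    for position, token in enumerate(tokens):
        for k, value in enumerate(token_vector(token, position)):
            total[k] += value
    return total


def distance(u, v):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def invert(target_embedding, length, max_rounds=10):
    """Iteratively correct a hypothesis so its embedding nears the target."""
    hypothesis = [VOCAB[0]] * length  # crude initial guess
    best = distance(toy_embed(hypothesis), target_embedding)
    history = [best]
    for _ in range(max_rounds):
        improved = False
        for position in range(length):
            for candidate in VOCAB:
                # Propose a one-token correction, then re-embed and compare.
                trial = hypothesis[:position] + [candidate] + hypothesis[position + 1:]
                d = distance(toy_embed(trial), target_embedding)
                if d < best:
                    hypothesis, best, improved = trial, d, True
        history.append(best)
        if not improved:
            break  # no single-token correction gets closer
    return hypothesis, history


# Usage: the attacker sees only the embedding and the length of the text.
secret = ["the", "doctor", "saw", "a", "patient"]
guess, history = invert(toy_embed(secret), len(secret))
print(guess, history[-1])
```

The search only ever accepts a correction that moves the re-embedded hypothesis closer to the target, so the distance never increases from one round to the next. It can still stall before reaching the original text, which is one reason a learned corrector is a more plausible choice than exhaustive search on real embedding models.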
Titles in the series (100)
Communicative Agents for Software Development: Software engineering is a domain characterized by intricate decision-making processes, often relying on nuanced intuition and consultation. Recent advancements in deep learning have started to revolutionize software engineering practices through elab... by Papers Read on AI