Text Embeddings Reveal (Almost) As Much As Text

From Papers Read on AI

Length:

27 minutes

Released:

Oct 18, 2023

Format:

Podcast episode

Description

How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion, reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation: generating text that, when reembedded, is close to a fixed point in latent space. We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.
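The correct-and-re-embed loop described above can be illustrated with a minimal sketch. This is not the authors' model: the paper trains a neural corrector against real embedding models, whereas the sketch below uses a greedy token search in its place and a deliberately simple toy embedding that is trivial to invert. The names `VOCAB`, `MAX_LEN`, `embed`, `cosine`, and `invert` are hypothetical, and the text length is assumed to be known in advance.

```python
import math

# Hypothetical toy vocabulary and length limit; not taken from the paper.
VOCAB = ["the", "patient", "john", "smith", "was", "seen", "today"]
MAX_LEN = 5


def embed(tokens):
    """Toy embedding: one block of len(VOCAB) slots per position, one-hot per token.

    A stand-in for a real embedding model, chosen to be deterministic
    and easy to reason about. Real embeddings are far harder to invert.
    """
    vec = [0.0] * (MAX_LEN * len(VOCAB))
    for pos, tok in enumerate(tokens):
        vec[pos * len(VOCAB) + VOCAB.index(tok)] = 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def invert(target, length, max_rounds=10):
    """Iteratively correct a hypothesis so its re-embedding approaches `target`.

    Greedy search replaces the learned correction model (an assumption
    made for illustration only).
    """
    hypothesis = [VOCAB[0]] * length  # naive initial guess
    best = cosine(embed(hypothesis), target)
    for _ in range(max_rounds):
        improved = False
        for pos in range(length):
            for cand in VOCAB:
                trial = hypothesis[:pos] + [cand] + hypothesis[pos + 1:]
                score = cosine(embed(trial), target)  # re-embed and compare
                if score > best:  # keep only strict improvements
                    hypothesis, best, improved = trial, score, True
        if not improved:
            break  # no single-token correction helps any more
    return hypothesis, best


secret = ["patient", "john", "smith", "was", "seen"]
recovered, score = invert(embed(secret), len(secret))
print(recovered, score)
```

Only the target embedding and the length are passed to `invert`; the secret tokens themselves are never seen by the search, which mirrors the threat model in the description at a toy scale.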

2023: John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush

https://arxiv.org/pdf/2310.06816v1.pdf

Titles in the series (100)

Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. Selecting papers by comparative results, citations, and influence, we educate you on the latest research. Consider supporting us on Patreon.com/PapersRead for feedback and ideas.
