# How is information compressed in PDF?

I'm trying to read the Author (as well as the Title and Keywords) field of a PDF file using PHP. I used TCPDF, and the file is parsed into objects, but the field values are not clean: they contain stray characters. Could this be some kind of compression, and if so, how do I get rid of it?

For example, one field is parsed like this, although the actual value contains no such characters:

`��<�?xml version='1.0' encoding='cp1251'?><�stamps><�stamp></�stamp></�stamps>`
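This may not be compression at all. Here is a sketch under one assumption: that the parser hands back the raw bytes of the field. In the PDF format, Info dictionary values such as `/Author`, `/Title` and `/Keywords` are text strings, stored either in PDFDocEncoding or as UTF-16BE prefixed with the byte-order mark `FE FF`. Printed as-is on a UTF-8 page, the BOM bytes (and the `00` high byte of each ASCII character) tend to show up as `�`, which would fit the `��` at the start of the sample above. The function `pdfTextStringToUtf8()` is hypothetical (not part of TCPDF), requires the mbstring extension, and approximates PDFDocEncoding with ISO-8859-1.

```php
<?php

// Hypothetical helper (not part of TCPDF): converts the raw bytes of a
// PDF text string, such as an /Author, /Title or /Keywords value, to UTF-8.
function pdfTextStringToUtf8(string $raw): string
{
    // UTF-16BE text strings start with the byte-order mark FE FF.
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        // Drop the BOM; each ASCII character is stored as 00 followed by
        // the character byte, which would explain the stray symbols.
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }
    // Otherwise the string is in PDFDocEncoding. Assumption: approximate it
    // with ISO-8859-1, which matches it for ASCII but not for every byte.
    return mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

// Example: FE FF 00 4A 00 6F 00 68 00 6E is "John" in UTF-16BE with a BOM.
$author = "\xFE\xFF\x00J\x00o\x00h\x00n";
var_dump(pdfTextStringToUtf8($author)); // string(4) "John"
```

Decoding the value this way, rather than deleting characters with `str_replace()`, also keeps legitimate non-ASCII text intact.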

My current workaround is a crude hack:

```php
$data = str_replace('�', '', $data); // just deletes the junk characters
```