© 2014 Elsevier Ltd. All rights reserved. Unstructured text is a very common data type, yet it remains largely unexplored in the privacy-preserving data mining field. We consider the problem of releasing public information about a set of confidential documents. To that end, we have developed a method to protect a Vector Space Model (VSM) so that it can be made public even when the documents it represents are private. The method is inspired by microaggregation, a popular protection method from statistical disclosure control, and is adapted to work with sparse, high-dimensional data sets.
- Data mining
- Information loss
- Privacy preserving
- Sparse data
- Vector space

DOI: 10.1016/j.cose.2014.11.005
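
To make the idea in the abstract concrete, the following is a minimal sketch of generic fixed-size microaggregation applied to sparse document vectors. It is not the authors' algorithm, which the abstract does not specify. The sparse representation (a dict from term to weight), the use of cosine similarity, the greedy seed-based grouping, and the names `cosine`, `centroid`, and `microaggregate` are all assumptions made for illustration. Each document vector is replaced by the centroid of a group of at least k similar vectors, so that no published vector corresponds to fewer than k documents.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Term-wise mean of a list of sparse vectors; absent terms count as 0."""
    acc = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            acc[t] += w
    n = len(vectors)
    return {t: w / n for t, w in acc.items()}


def microaggregate(docs, k):
    """Replace each vector by the centroid of a group of at least k vectors.

    Greedy heuristic (an assumption, not the paper's method): take the first
    unassigned vector as a seed, group it with its k-1 most similar unassigned
    vectors, and fold the last fewer-than-2k vectors into a single group.
    """
    if k < 1 or len(docs) < k:
        raise ValueError("need at least k documents and k >= 1")
    remaining = list(range(len(docs)))
    protected = [None] * len(docs)
    while remaining:
        if len(remaining) < 2 * k:
            group = list(remaining)  # final group keeps every group size >= k
        else:
            seed = remaining[0]
            ranked = sorted(
                remaining[1:],
                key=lambda i: cosine(docs[seed], docs[i]),
                reverse=True,
            )
            group = [seed] + ranked[: k - 1]
        c = centroid([docs[i] for i in group])
        for i in group:
            protected[i] = c
        remaining = [i for i in remaining if i not in group]
    return protected


# Toy vector space model with hypothetical term weights.
docs = [
    {"privacy": 1.0, "data": 1.0},
    {"privacy": 1.0, "mining": 1.0},
    {"football": 1.0, "goal": 1.0},
    {"football": 1.0, "match": 1.0},
]
protected = microaggregate(docs, k=2)
```

In this toy run the two privacy-related vectors share one centroid and the two football-related vectors share another. Averaging also makes the published vectors denser than the originals, which is one source of the information loss that a method for sparse, high-dimensional data has to keep under control.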