We map social challenges of the tech world using text mining
➀ We identified 6 umbrella topics related to social challenges of internet technologies
We assigned keywords to each topic and used those keywords to retrieve articles
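As a rough illustration of keyword-based selection, the sketch below matches a text against per-topic keyword lists. The topic names and keywords are hypothetical placeholders, not the ones used in the project, and `match_topics` is an illustrative helper; the actual retrieval may have worked differently (for example, by querying each platform directly).

```python
import re

# Hypothetical topic names and keywords, for illustration only.
TOPIC_KEYWORDS = {
    "topic_a": ["data privacy", "surveillance"],
    "topic_b": ["misinformation", "fake news"],
}


def match_topics(text, topic_keywords=TOPIC_KEYWORDS):
    """Return the sorted topics whose keywords occur in the text as whole words or phrases."""
    lowered = text.lower()
    matched = []
    for topic, keywords in topic_keywords.items():
        for keyword in keywords:
            # Word boundaries avoid matching a keyword inside a longer word.
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            if re.search(pattern, lowered):
                matched.append(topic)
                break  # one matching keyword is enough for this topic
    return sorted(matched)
```

An article matching keywords from several topics is returned under all of them; whether the project allowed such overlaps is not stated in the text.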
➁ We use tech articles shared on Twitter, Reddit and Hacker News
We extract article texts and metadata using the Python package Newspaper3k
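A minimal sketch of what extraction with Newspaper3k could look like for a single URL. The helper name `extract_article`, the `article_cls` parameter and the choice of returned fields are assumptions made for illustration; the text does not say which metadata fields were kept.

```python
def extract_article(url, article_cls=None):
    """Download one article and return its text and basic metadata as a dict."""
    if article_cls is None:
        # Imported lazily so the function can also be tried with a stand-in class.
        from newspaper import Article
        article_cls = Article

    article = article_cls(url)
    article.download()  # fetch the HTML
    article.parse()     # extract title, authors, date and body text

    return {
        "url": url,
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date,
        "text": article.text,
    }
```

Calling `extract_article("https://example.com/some-article")` would return a dictionary with the fields above; download failures and paywalled pages would need extra handling in practice.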
➂ Our dataset consists of 111k articles
➃ We cluster the articles based on their similarity
Text data can be treated as high-dimensional vectors. Reducing dimensionality while preserving meaningful clusters is a well-known challenge in text mining. We applied an original combination of algorithms (t-SNE with a single perplexity of 50, and a Gaussian mixture model), which proved effective in producing coherent maps of articles.
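The combination described above can be sketched with scikit-learn as follows. Only t-SNE with perplexity 50 and the Gaussian mixture come from the text; the TF-IDF vectorisation, the truncated SVD step before t-SNE, the helper name `map_articles` and the number of clusters are assumptions made to keep the example self-contained.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture


def map_articles(texts, n_clusters, perplexity=50.0, random_state=0):
    """Project texts to 2D with t-SNE and cluster the 2D points with a Gaussian mixture."""
    # Assumption: TF-IDF is used to turn texts into high-dimensional vectors.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)

    # Assumption: a truncated SVD gives t-SNE a dense, lower-dimensional input.
    n_svd = min(50, tfidf.shape[1] - 1)
    dense = TruncatedSVD(n_components=n_svd, random_state=random_state).fit_transform(tfidf)

    # t-SNE with a single perplexity value; it must be smaller than len(texts).
    coords = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="random",
        random_state=random_state,
    ).fit_transform(dense)

    # Gaussian mixture fitted on the 2D map; each point gets a cluster label.
    labels = GaussianMixture(
        n_components=n_clusters, random_state=random_state
    ).fit_predict(coords)
    return coords, labels
```

Here `coords` holds one 2D point per article and `labels` the cluster index assigned to each point. The number of clusters is left as a parameter because the text does not state how it was chosen.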