Analysis primer
We map social challenges of the tech world using text mining
➀ We identified 6 umbrella topics related to social challenges of internet technologies
We assigned keywords to each topic and used those keywords to retrieve articles
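As a rough illustration of keyword-based retrieval, the sketch below tags an article with every topic whose keywords appear in its text. The topic names and keywords in `TOPIC_KEYWORDS` are hypothetical placeholders, not the actual lists used in this analysis, and whole-phrase matching is an assumption about how retrieval might work.

```python
import re

# Hypothetical topics and keywords, for illustration only.
TOPIC_KEYWORDS = {
    "privacy": ["data privacy", "surveillance"],
    "misinformation": ["fake news", "disinformation"],
}


def match_topics(text, topic_keywords=TOPIC_KEYWORDS):
    """Return the topics whose keywords occur in the text as whole phrases."""
    lowered = text.lower()
    matched = []
    for topic, keywords in topic_keywords.items():
        # \b keeps "surveillance" from matching inside a longer word.
        if any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in keywords):
            matched.append(topic)
    return matched


print(match_topics("New surveillance law raises data privacy concerns"))
```

An article that matches no keyword returns an empty list and would be left out of the dataset under this scheme.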
➁ We use tech articles shared on Twitter, Reddit and Hacker News
We extract article texts and metadata using the Python package Newspaper3k
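A minimal sketch of what extraction with Newspaper3k can look like is shown below. The helper names `fetch_article` and `article_to_record`, and the choice of fields to keep, are assumptions made for illustration. The actual extraction script may differ.

```python
def article_to_record(article, url):
    """Collect the fields of interest from a parsed article into a plain dict."""
    return {
        "url": url,
        "title": article.title,
        "text": article.text,
        "authors": list(article.authors),
        "publish_date": article.publish_date,
    }


def fetch_article(url):
    """Download and parse one article (needs network access and newspaper3k)."""
    from newspaper import Article  # installed with: pip install newspaper3k

    article = Article(url)
    article.download()  # fetch the raw HTML
    article.parse()     # extract title, body text, authors, publish date
    return article_to_record(article, url)
```

Calling `fetch_article` on a shared link returns one record per article. Links that fail to download would need error handling, which is left out here.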
➂ Our dataset consists of 111k articles
➃ We cluster the articles based on their similarity
Text data can be treated as high-dimensional vectors. Reducing their dimensionality while preserving meaningful clusters is a well-known challenge in text mining. We applied an original combination of algorithms (t-SNE with a single perplexity of 50, and a Gaussian mixture model), which proved effective in producing coherent maps of articles.
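The clustering step can be sketched with scikit-learn as follows. Only the t-SNE perplexity of 50 and the use of a Gaussian mixture come from the description above. The TF-IDF vectorization, the truncated SVD pre-reduction, the number of mixture components and the function name `cluster_articles` are assumptions chosen to make the example self-contained.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture


def cluster_articles(texts, n_clusters=6, perplexity=50, seed=0):
    """Embed texts in 2D with t-SNE, then cluster the 2D points with a GMM."""
    # Assumption: represent each article as a TF-IDF vector.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)

    # Assumption: compress the sparse vectors before t-SNE to speed it up.
    n_components = min(50, tfidf.shape[1] - 1)
    reduced = TruncatedSVD(n_components=n_components, random_state=seed).fit_transform(tfidf)

    # t-SNE with a single perplexity value; needs more than `perplexity` texts.
    coords = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(reduced)

    # Fit a Gaussian mixture on the 2D map and assign each article a cluster.
    labels = GaussianMixture(n_components=n_clusters, random_state=seed).fit_predict(coords)
    return coords, labels
```

The returned `coords` give each article's position on the map and `labels` give its cluster. Because t-SNE is stochastic, fixing `seed` keeps the layout reproducible between runs.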
Top domains identified in the articles
The most frequently occurring words in the articles