Introduction

In recent years, social network analysis (SNA) has become a popular research area, driven by the availability of massive and diverse datasets. Early SNA assumed homogeneous networks, in which relationships connect objects of the same type. However, most networks contain more than one type of object, for example relationships between authors based on topic, publication, or conference venue. A network whose connections span different types of objects is called a heterogeneous information network (HIN) [1].

In this article, the Champions League was chosen as the topic of social network analysis because of its characteristics: it contains many objects, networks and…


Motivation

What problem do we want to solve?

About 83% of the SEO respondents have experienced keyword cannibalization, which occurs when multiple pages of a site target the same keywords [1]. In addition, respondents reported that a client or company had used the same key phrase for every article on its site before the respondent took over its SEO, and that articles are sometimes revised in ways that cause keyword cannibalization. Based on this, the problem to solve is as follows:

How can a keyword list be generated automatically from an article?

Strategic Goal?

The following strategies are used to achieve this goal:

  • Extract keywords from an article
  • Generate a keyword list

Data Model

Image for post
Figure 1. The data model, based on the strategic goal

Introduction

Stock market investment is a long-term area of interest for many people. The investment is intended to secure or grow their finances, although it can result in either a profit or a loss. Many factors in the stock market affect the profits and losses of a stock, including news. People read news to understand what is happening and what might happen in the future. News can explain economic conditions and influence people’s sentiment. There is a correlation between news, people’s sentiment, and economic conditions (Damstra & Boukes, 2018).

In the past few years, many research…



Word embedding is one of the most popular language modeling techniques in Natural Language Processing (NLP), in which words or phrases from the vocabulary are mapped to a multi-dimensional vector space. It can capture the context of a word, semantic and syntactic similarity, relations to other words, etc.

The most famous example of word embeddings, and of how they can be added or subtracted, is the word ‘queen’. Adding the vectors associated with the words king and woman while subtracting the vector for man gives a result approximately equal to the vector associated with queen.
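The vector arithmetic described above can be sketched with a few hand-made vectors. This is only a toy illustration: the three-dimensional vectors and the helper names `analogy`, `cosine`, and `most_similar` are invented for this example, whereas real embeddings are learned from text and have many more dimensions.

```python
import math

# Toy vectors invented for illustration; the dimensions loosely stand for
# (royalty, maleness, femaleness). Real embeddings are learned from a corpus.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}

def analogy(a, b, c):
    """Return vector(a) - vector(b) + vector(c), element by element."""
    return [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def most_similar(target, exclude=()):
    """Return the vocabulary word whose vector is closest to target."""
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# The input words are excluded from the candidates, as is conventional.
result = most_similar(analogy("king", "man", "woman"),
                      exclude={"king", "man", "woman"})
print(result)  # prints "queen" for these toy vectors
```

With learned embeddings the result is only approximately equal to the target vector, which is why the nearest neighbour is looked up by cosine similarity rather than by exact equality.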

King - Man + Woman…


Image for post
Image source: https://rare-technologies.com/tag/keyword-extraction/

Since news and article portals are available on the internet in massive numbers, it is almost impossible for users to read and process all of that information at once. However, this is no longer a barrier to getting keywords from many documents, because we can use Natural Language Processing (NLP) tools to extract keywords, such as NLTK, Gensim, fastText, and sklearn (these are Python libraries).

In this article, we will extract keywords from stock market news using the Tf-Idf method from the Python sklearn library. Tf-Idf is a word-weighting scheme intended to reflect how important a word is to a document. There are two…
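To make the weighting idea concrete, here is a minimal plain-Python sketch of the textbook Tf-Idf formula (term frequency multiplied by the log of inverse document frequency). It is a swapped-in illustration, not the sklearn implementation the article uses; sklearn's `TfidfVectorizer` applies smoothing and normalization by default, so its numbers will differ. The headlines and the helper names `tokenize`, `tf_idf`, and `top_keywords` are invented for this example.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def tf_idf(docs):
    """Return one {word: score} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

def top_keywords(doc_scores, k=3):
    """Return the k highest-scoring words of one document."""
    return sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]

# Hypothetical headlines, made up for this sketch.
docs = [
    "Tech stocks rally as markets rise",
    "Oil stocks fall as markets slide",
    "Markets await central bank decision",
]
scores = tf_idf(docs)
print(top_keywords(scores[0]))
```

A word such as "markets", which appears in every headline, gets a score of zero, while words unique to one headline rank highest. That is the behaviour that makes Tf-Idf useful for keyword extraction.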


Sentiment analysis is a text mining process that extracts opinions from text. It is a natural language processing (NLP) task and is widely applied to extract opinions from social media, customer reviews, and survey responses.

Basically, sentiment analysis distinguishes three sentiments: positive, negative, and neutral. Here is an example of sentiment analysis from monkeylearn.com:

Image for post

In this article, sentiment analysis is used to get the sentiment of stock market news. We classify each news item as positive, negative, or neutral.

The tool we use is TextBlob, which is a Python library for processing textual data. TextBlob…
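As a rough sketch of the three-way labelling step, the function below maps a polarity score in the range -1 to 1 onto positive, negative, or neutral. TextBlob exposes such a score as `TextBlob(text).sentiment.polarity`. The function name `label_sentiment`, the 0.05 threshold, and the example headlines and scores are assumptions made for illustration, not values taken from the article.

```python
def label_sentiment(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to one of three sentiment labels.

    The threshold is an assumed cut-off: scores close to zero are
    treated as neutral rather than weakly positive or negative.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

# Hypothetical headlines with hand-picked polarity scores. In practice the
# score could come from TextBlob(headline).sentiment.polarity.
examples = [
    ("Shares surge after strong earnings", 0.6),
    ("Company announces quarterly report date", 0.0),
    ("Stock plunges on weak outlook", -0.5),
]

labels = [label_sentiment(score) for _, score in examples]
for (headline, score), label in zip(examples, labels):
    print(f"{label:>8}  {score:+.2f}  {headline}")
```

Keeping the thresholding separate from the scoring makes it easy to tune how wide the neutral band is without touching the library call.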


Basically, we can use several web scraping tools (e.g. BeautifulSoup, Scrapy, Selenium, etc.) to extract information from Google. For this article, the author uses BeautifulSoup because it is easy to implement. Ultimately, the choice depends on what you know or are comfortable with.

Furthermore, this article explains how to scrape Google and how to deal with Google’s query and request limitations. There are Python code examples for Google scraping, covering both Google News and regular Google Search.
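This excerpt does not show how the article handles request limits, so the following is only one common approach, sketched under assumptions: retry with an exponentially growing delay whenever the server signals too many requests (typically HTTP status 429). The names `RateLimited` and `fetch_with_backoff` are hypothetical, and `fetch` stands for whatever function actually downloads the page and raises `RateLimited` when it is throttled.

```python
import random
import time

class RateLimited(Exception):
    """Raised by the fetch function when the server throttles the request."""

def fetch_with_backoff(fetch, url, max_retries=4, base_delay=1.0,
                       sleep=time.sleep):
    """Call fetch(url), waiting longer after each rate-limited attempt.

    The delay doubles on every retry (1s, 2s, 4s, ...) plus a small random
    jitter so that repeated requests are not sent at a fixed rhythm.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)

# Usage with a stand-in fetch function that is throttled twice, then succeeds.
calls = {"count": 0}

def fake_fetch(url):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimited(url)
    return "<html>ok</html>"

delays = []
page = fetch_with_backoff(fake_fetch, "https://example.com", sleep=delays.append)
print(page, len(delays))
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. In a real scraper the default `time.sleep` would be used.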

Image for post
Google News

Get Stock News from Google News

Google News Scraping Code

Google News is used to search for news from several publishers. …



Having trouble scraping a website that requires you to log in first?

Well, we can use Selenium for that problem. Selenium is mainly used for automated testing and validation of web applications, but it can also be used for scraping, because it can be driven automatically by scripts and works easily with JavaScript, the DOM, and complex HTML tags.

For example, we will try to scrape news from websites that require logging in first, such as www.wsj.com or www.barrons.com.

The first thing we do is install the libraries, including the selenium Python library and the webdriver manager library, and import several selenium functions in your file.

The Libraries

Create your…

Rahman Taufik

Software Engineer | Full Stack Web Developer | Data Scientist
