Social Network Analysis on the UEFA Champions League

Rahman Taufik
Feb 17, 2021

Introduction

In recent years, social network analysis (SNA) has become a popular research area, driven by the availability of massive and diverse datasets. SNA initially assumed homogeneous networks, in which relationships connect objects of the same type. However, most real networks contain more than one type of object; for example, authors can be related through topics, publications or conference venues. Networks that connect different types of objects are called heterogeneous information networks (HIN) [1].

In this article, the UEFA Champions League was chosen as the subject of social network analysis because it contains many types of objects, networks and information. The objective of this article is to analyze both the homogeneous and the heterogeneous information networks that arise from the 2017–2018 UEFA Champions League matches.

The following research questions guide this objective:

  1. Which team, country or player has the most influence on the 2017–2018 UEFA Champions League?
  2. What are the important things that can affect the performance of the players and the team?
  3. What is the community structure shown in the 2017–2018 UEFA Champions League network?

Dataset

The dataset used for this analysis was obtained from Kaggle and UEFA and covers teams, players and matches. The players dataset comes from Kaggle and contains 17,980 records, while the matches and teams data come from UEFA and cover 32 teams and 128 matches.

Design and Implementation

This study takes both a homogeneous and a heterogeneous approach to extracting information from the Champions League network, although the analysis focuses more on the heterogeneous side. In addition, it measures centrality to find the important nodes and applies community detection to identify groups of objects.

A. Homogeneous Network

The homogeneous networks in this study were built from the match dataset. Homogeneous networks are networks in which all nodes have the same function [2]. Two network schemes were built: one based on matches and one based on wins. In both, nodes represent teams and edges represent matches or wins. Both networks are directed; the match network is unweighted, while the win network is weighted.

The network schemes and their visualizations are as follows:

  • TLT-1: This scheme represents the relationship between teams based on matches. In the visualization, teams are shown in black font and matches as lines.
  • TLT-2: This meta-path represents the relationship between teams based on wins. In the visualization, teams are shown in black font and wins as lines; the thicker the line, the more wins.
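The two homogeneous schemes can be sketched from a list of match results. This is a minimal illustration, not the author's code: the fixture list and team names are hypothetical toy data, and the edge directions (home to away for TLT-1, winner to loser for TLT-2) are assumptions, since the article does not say which way the edges point. Draws add no win edge.

```python
from collections import Counter

# Hypothetical toy fixtures (home, away, home_goals, away_goals);
# these are NOT the real 2017-2018 UEFA data.
MATCHES = [
    ("Team A", "Team B", 2, 0),
    ("Team B", "Team A", 1, 1),
    ("Team A", "Team C", 3, 1),
    ("Team C", "Team A", 0, 1),
    ("Team B", "Team C", 2, 1),
]


def match_network(matches):
    """TLT-1 style: directed, unweighted edge home -> away for each fixture."""
    return {(home, away) for home, away, _, _ in matches}


def win_network(matches):
    """TLT-2 style: directed edge winner -> loser, weighted by number of wins.

    Draws add no edge. The winner -> loser direction is an assumption.
    """
    weights = Counter()
    for home, away, home_goals, away_goals in matches:
        if home_goals > away_goals:
            weights[(home, away)] += 1
        elif away_goals > home_goals:
            weights[(away, home)] += 1
    return dict(weights)


if __name__ == "__main__":
    print(sorted(match_network(MATCHES)))
    print(win_network(MATCHES))
```

Storing TLT-1 as a set makes it unweighted by construction, while the counter in TLT-2 turns repeated wins over the same opponent into a heavier edge, which corresponds to the thicker lines in the visualization.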

B. Heterogeneous Network

Heterogeneous networks are networks with two or more classes of nodes, categorized by both function and utility [2]. Several heterogeneous network schemes were built from the players, teams and matches datasets. These networks are undirected because each relationship holds in both directions, and each network is either weighted or unweighted.

Fig. 1 Full scheme of the heterogeneous network

The figure above shows the complete network scheme built from the three datasets. In each dataset, the subject is shown in bold underlined font and the attributes in normal font.

This network scheme was then developed into meta-paths. A meta-path is a path defined on the network scheme: a sequence of relations that links more than one object type and defines a new or existing relationship between objects.
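One common way to turn a meta-path into a concrete network, widely used in the HIN literature, is to multiply the relation matrices along the path: entry (i, j) of the product (the commuting matrix) counts the path instances between object i and object j. The sketch below applies this to a Team-Player-Nation-Player-Team path on hypothetical toy data. Reading the letters of TPNPT this way is an interpretation, and the article does not say that its meta-path networks were built with matrix products.

```python
# Toy heterogeneous network (hypothetical): teams, players and nations.
TEAMS = ["T1", "T2"]
PLAYERS = ["p1", "p2", "p3"]
NATIONS = ["France", "Brazil"]
TEAM_PLAYER = [("T1", "p1"), ("T1", "p2"), ("T2", "p3")]
PLAYER_NATION = [("p1", "France"), ("p2", "Brazil"), ("p3", "France")]


def adjacency(rows, cols, links):
    """0/1 biadjacency matrix between two node types."""
    row_index = {r: i for i, r in enumerate(rows)}
    col_index = {c: j for j, c in enumerate(cols)}
    matrix = [[0] * len(cols) for _ in rows]
    for r, c in links:
        matrix[row_index[r]][col_index[c]] = 1
    return matrix


def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def commuting_matrix(*relations):
    """Multiply the relation matrices along a meta-path, left to right."""
    result = relations[0]
    for relation in relations[1:]:
        result = matmul(result, relation)
    return result


if __name__ == "__main__":
    a_tp = adjacency(TEAMS, PLAYERS, TEAM_PLAYER)
    a_pn = adjacency(PLAYERS, NATIONS, PLAYER_NATION)
    # Team -> Player -> Nation -> Player -> Team
    print(commuting_matrix(a_tp, a_pn, transpose(a_pn), transpose(a_tp)))
    # [[2, 1], [1, 1]]
```

The off-diagonal 1 says that T1 and T2 are linked by exactly one path instance (p1 and p3 share France); the diagonal counts each team's paths back to itself.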

Each meta-path network was visualized separately, and the visualizations differ in type and form. The meta-paths defined on the network scheme, and their visualizations, are as follows:

  • LTFTL: This meta-path represents the relationship between teams and formations based on matches. The visualization has two node types: teams (red font) and formations (black font).
  • LTNTL: This meta-path represents the relationship between teams and nations based on matches. The visualization has two node types: teams (red font) and nations (black font).
  • TPNPT: This meta-path represents the relationship between players and nations based on teams. The visualization has two node types: players (orange circles) and nations (blue squares).
  • TPPoPT: This meta-path represents the relationship between players and positions based on teams. The visualization has two node types: players (orange circles) and positions (blue squares).
  • PoPSPPo: This meta-path represents the relationship between players and skills based on positions. The visualization has two node types: players (orange circles) and skills (blue squares).
  • TLSLT: This meta-path represents the relationship between teams and goal counts based on matches. The visualization has two node types: winning teams (red font) and scores (black font).
  • TLVLT: This meta-path represents the relationship between teams and venues based on the winning team in each match. The visualization has two node types: winning teams (red font) and venues (black font).
  • TLPLT: This meta-path represents the relationship between teams and goalscorers based on matches. The visualization has two node types: goalscorers (orange circles) and the teams that conceded the goals (blue squares).
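Each meta-path visualization above is a two-type network, for example teams linked to formations. The sketch below shows one way to build such a network from flat records, with edge weights counting how often a pair co-occurs. The records are hypothetical, and tagging each node with its type is a design choice made here so that a team and a venue that share a name (as can happen in TLVLT) remain separate nodes.

```python
from collections import Counter

# Hypothetical (team, formation) records, one per match played.
LINEUPS = [
    ("Team A", "4-3-3"),
    ("Team A", "4-3-3"),
    ("Team A", "4-2-3-1"),
    ("Team B", "4-4-2"),
    ("Team B", "4-3-3"),
]


def bipartite_network(records, left_type, right_type, weighted=True):
    """Undirected two-type network stored as {(left_node, right_node): weight}.

    Each node is a (type, name) tuple so that equal names of different
    types never merge into one node.
    """
    counts = Counter(((left_type, left), (right_type, right)) for left, right in records)
    if weighted:
        return dict(counts)
    return {edge: 1 for edge in counts}


def neighbours(network, node):
    """All nodes sharing an edge with `node`, on either side of the edge."""
    found = set()
    for left, right in network:
        if left == node:
            found.add(right)
        elif right == node:
            found.add(left)
    return found


if __name__ == "__main__":
    ltftl = bipartite_network(LINEUPS, "team", "formation")
    print(ltftl[(("team", "Team A"), ("formation", "4-3-3"))])  # 2
    print(neighbours(ltftl, ("formation", "4-3-3")))
```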

Centrality Measurement

Centrality was measured separately for the homogeneous and heterogeneous networks. For each meta-path network, eigenvector centrality and betweenness centrality were used to assess the network, and degree centrality was used to find the important nodes. The important nodes for the homogeneous networks are as follows:

Important nodes by degree centrality:

  • TLT-1: Liverpool (13) | Real Madrid (13) | Bayern Munich (12)
  • TLT-2: Liverpool (13) | Real Madrid (13) | Bayern Munich (12)

The important nodes for the heterogeneous networks are as follows:

Important nodes by degree centrality:

  • LTFTL: 4–3–3 (0.104) | 4–2–3–1 (0.099) | 4–4–2 (0.699)
  • LTNTL: England (0.069) | Italy (0.057) | Spain (0.057)
  • TPNPT: France (0.063) | Brazil (0.061) | Portugal (0.052)
  • TPPoPT: CB (0.066) | GK (0.055) | CM (0.053)
  • PoPSPPo: Speed (0.116) | Dribbling (0.086) | Interceptions (0.067)
  • TLSLT: 1–2 (0.2) | 3–4 (0.154) | 5–6 (0.062)
  • TLVLT: Real Madrid (0.04) | Bayern Munich (0.032) | Chelsea (0.028)
  • TLPLT: Salah (0.0082) | Firmino (0.0082) | Mane (0.0081)
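The homogeneous results above are raw degree counts, while the heterogeneous scores appear to be normalized. A common normalization, used for example by networkx's `degree_centrality`, divides each degree by n - 1, where n is the number of nodes; whether the article used exactly this normalization is an assumption. The sketch below computes both versions and a top-three ranking on a hypothetical player-position edge list. For a directed network such as TLT-1, `degree` here counts in-degree plus out-degree.

```python
from collections import Counter

# Hypothetical player-position edges (TPPoPT-style); not the article's data.
EDGES = [
    ("p1", "CB"), ("p2", "CB"), ("p3", "CB"),
    ("p4", "GK"), ("p5", "GK"),
    ("p6", "CM"),
]


def degree(edges):
    """Number of edges touching each node; direction and weight are ignored."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def degree_centrality(edges):
    """Degree divided by n - 1 (n = nodes that appear in the edge list)."""
    deg = degree(edges)
    n = len(deg)
    if n < 2:
        return {node: 0.0 for node in deg}
    return {node: d / (n - 1) for node, d in deg.items()}


def top_k(scores, k=3):
    """Highest scores first; ties broken alphabetically by node name."""
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))[:k]


if __name__ == "__main__":
    print(top_k(degree(EDGES)))             # raw counts, as in TLT-1 / TLT-2
    print(top_k(degree_centrality(EDGES)))  # normalized, as in the meta-paths
```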

Community Detection

Community detection used the Louvain method and was applied only to the PoPSPPo and TPPoPT meta-paths rather than to all networks. Its purpose was to identify groups of objects in a network, so it was applied only to the networks whose data were representative enough for this task.
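Louvain works by greedily increasing modularity Q, which compares the number of edges inside each community with the number expected at random. The sketch below computes Q and runs a deliberately naive version of Louvain's first (local-moving) phase on a toy graph of two triangles joined by one edge. It recomputes Q from scratch for every candidate move instead of using Louvain's incremental gain formula, and it omits the second phase, in which communities are collapsed into super-nodes and the process repeats; it is an illustration, not a full Louvain implementation. For real networks, networkx (2.8 and later) provides `louvain_communities`.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of an undirected, unweighted graph under a node->label map.

    Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where L_c counts
    edges inside c, d_c is the total degree of c's nodes, m is the edge count.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def local_moving(edges, tol=1e-12):
    """Naive Louvain phase 1: start from singletons, move nodes while Q rises."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = sorted(neighbours)
    community = {node: node for node in nodes}
    improved = True
    while improved:
        improved = False
        for node in nodes:
            original = community[node]
            best_label, best_q = original, modularity(edges, community)
            # Try every neighbouring community; keep the strictly best one.
            for label in sorted({community[n] for n in neighbours[node]} - {original}):
                community[node] = label
                q = modularity(edges, community)
                if q > best_q + tol:
                    best_label, best_q = label, q
            community[node] = best_label
            if best_label != original:
                improved = True
    return community


if __name__ == "__main__":
    # Two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
    toy = [("a", "b"), ("b", "c"), ("a", "c"),
           ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
    parts = local_moving(toy)
    print(parts, modularity(toy, parts))
```

On this toy graph the two triangles come out as the two communities, with Q = 5/14 (about 0.357).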

Fig. 2 Community detection of TPPoPT (a) and PoPSPPo (b)

Figure 2 (a) shows the community detection result for TPPoPT. It groups player positions together with other players, based on the modularity of the relationship density between players in each team. Figure 2 (b) shows the community detection result for PoPSPPo, which groups player skills based on the modularity of the relationship density between players within each position.

Community detection divided the TPPoPT network into five groups:

  • Blue group: RWB, RB, LB, LWB, RM, LM
  • Red group: CB
  • Orange group: GK
  • Green group: CDM, CM, CAM, LW, CF, RW
  • Light blue group: ST

Community detection divided the PoPSPPo network into five groups:

  • Yellow group: Heading, Interception
  • Green group: GK
  • Light blue group: Dribbling, Finishing
  • Blue group: Crossing
  • Red group: Speed

Analysis

The information obtained from this Champions League dataset includes network visualizations, important nodes, centrality measurements and community detection. The results differ by meta-path and are grouped by type (homogeneous or heterogeneous), characteristics (weighted or unweighted, directed or undirected) and density based on centrality measurement (sparse or dense). The following is a summary of the proposed meta-paths:

Degree centrality, eigenvector centrality and betweenness centrality were computed for all networks. Eigenvector and betweenness centrality were used to assess network quality, and degree centrality to identify the important nodes. The average eigenvector centrality in both the homogeneous and the heterogeneous networks was above 0.5, which indicates that both networks contain nodes with strong influence on their neighbors. The average betweenness centrality in both networks was below 0.5, which indicates that relatively few shortest paths pass through any single node, so few nodes act as bridges. Based on these two averages, both the homogeneous and the heterogeneous networks were judged to be of average quality.
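The two network-level measures discussed above can be computed with the standard library alone on an undirected, unweighted graph: eigenvector centrality by power iteration and betweenness centrality with Brandes' algorithm. This is a sketch, not the author's code, and weights and edge directions (as in TLT-2) are ignored. The scaling choices are assumptions worth stating because averages such as "above 0.5" depend on them: here eigenvector scores are rescaled so that the top node scores 1 (networkx instead scales the vector to unit Euclidean length), and betweenness is normalized by (n - 1)(n - 2)/2, the number of pairs of other nodes. Iterating on A + I rather than A keeps the power iteration from oscillating on bipartite graphs such as the meta-path networks.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I), rescaled each step so the maximum is 1."""
    x = {node: 1.0 for node in adj}
    for _ in range(max_iter):
        nxt = {node: x[node] + sum(x[nbr] for nbr in adj[node]) for node in adj}
        peak = max(nxt.values())
        nxt = {node: value / peak for node, value in nxt.items()}
        converged = max(abs(nxt[node] - x[node]) for node in adj) < tol
        x = nxt
        if converged:
            break
    return x


def betweenness_centrality(adj, normalized=True):
    """Brandes' algorithm for an undirected, unweighted graph."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []  # nodes in non-decreasing distance from source
        preds = {node: [] for node in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from source
        sigma[source] = 1
        dist = dict.fromkeys(adj, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies back towards source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    n = len(adj)
    # Each undirected pair was counted from both endpoints, so halve the sum;
    # normalizing then divides by the (n - 1)(n - 2) / 2 pairs of other nodes.
    scale = 0.5
    if normalized and n > 2:
        scale = 0.5 / ((n - 1) * (n - 2) / 2)
    return {node: value * scale for node, value in score.items()}


if __name__ == "__main__":
    path = build_adjacency([("a", "b"), ("b", "c"), ("c", "d")])
    print(eigenvector_centrality(path))
    print(betweenness_centrality(path))  # b and c: 2/3, a and d: 0
```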

We already know the outcome of the 2017–2018 Champions League: Real Madrid won, Liverpool were runners-up and Bayern Munich reached the semi-finals. We still measured centrality because it can reveal other important information about the network. The important nodes of the homogeneous networks represent the best teams, whether based on matches (TLT-1) or wins (TLT-2). In both TLT-1 and TLT-2, the important nodes closely match the actual results, although Liverpool and Real Madrid share the same score of 13. This tie could be caused by network factors such as the number of matches and goals.

On the other hand, the important nodes in the heterogeneous networks are grouped by meta-path, and each meta-path was proposed based on the research objective and the network schemes.

The first network scheme is team, which has the LTFTL and LTNTL meta-paths. The important nodes in LTFTL were the 4–3–3, 4–2–3–1 and 4–4–2 formations, which were used by the winning teams. The important nodes in LTNTL were England, Italy and Spain, whose leagues sent many teams to the 2017–2018 UEFA Champions League.

The second network scheme is player, which has the TPNPT, TPPoPT and PoPSPPo meta-paths. TPNPT represents the relationship between players in all teams based on nationality, and its important nodes were France, Brazil and Portugal. TPPoPT represents the relationship between players in all teams based on position, and its important nodes were CB, GK and CM. PoPSPPo represents the relationship between player positions in all teams based on skills, and its important nodes were speed, dribbling and interceptions.

The third network scheme is league, which has the TLSLT, TLVLT and TLPLT meta-paths. TLSLT represents the relationship between winning teams and scores, and its important nodes were 1–2, 3–4 and 5–6 goals. TLVLT represents the relationship between winning teams and venues, and its important nodes were Real Madrid, Bayern Munich and Chelsea. TLPLT represents the relationship between teams and goalscorers, and its important nodes were Salah, Firmino and Mane.

Community detection was applied only to the TPPoPT and PoPSPPo meta-paths because their data were diverse and dense. In TPPoPT, the objective was to group player positions, because player positions can affect the game. For example, one group (RWB, RB, LB, LWB, RM, LM) covers the flanks defensively, while another group (CDM, CM, CAM, LW, CF, RW) is oriented towards attack. In PoPSPPo, the objective was to group player skills based on position. For example, heading and interception form one group of skills, which many defenders have, while dribbling and finishing form another, which are common among CMs, one of the important positions in the TPPoPT results.

Conclusion

To answer the first question (which team, country or player has the most influence on the 2017–2018 UEFA Champions League), the answers are Real Madrid for team, France for country and Salah for player. First, Real Madrid won the 2017–2018 Champions League, and in this study the homogeneous network also shows Real Madrid as the best team, even though it shares the same score as Liverpool. Second, France was the most influential nation in the 2017–2018 Champions League: the TPNPT network shows that many players come from France. Although the LTNTL network shows England as the nation that sent the most teams, many players in the English league in fact come from France. Last, the most important player was Mohamed Salah, a right winger (RW) for Liverpool, the runners-up. Although the best player in the 2017–2018 Champions League was Cristiano Ronaldo, his goals were concentrated in a few matches in which he scored many times, while Mohamed Salah contributed goals more consistently across matches.

To answer the second question (what are the important things that can affect the performance of the players and the team), the important things for the team can be found through the analysis of LTFTL. The important nodes in LTFTL were the 4–3–3, 4–2–3–1 and 4–4–2 formations. These formations can serve as references because they were used by the best teams, such as Real Madrid with 4–4–3, and Liverpool and Bayern Munich with 4–3–3. The important things for the players can be found through the analysis of PoPSPPo, which can be used as a reference for player skills: speed, dribbling and interceptions. These skills are held by many players in many important positions.

For the third question (what is the community structure shown in the 2017–2018 UEFA Champions League network), the answer can be seen in the TPPoPT and PoPSPPo networks. TPPoPT shows communities of player positions that can affect the match, while PoPSPPo shows communities of player skills that can affect the player positions.

Based on the results and analysis of the social network implementation for the 2017–2018 Champions League, social network analysis can provide a great deal of information, both general and hidden. In addition, networks can be formed not only from one relationship (homogeneous) but also from more than one relationship (heterogeneous).

References

[1] Shi, C., Li, Y., Zhang, J., Sun, Y., & Yu, P. S. (2016). A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, 29(1), 17–37.

[2] https://guides.co/g/the-network-effects-bible/121732
