Scrape a Bank Table with Pandas

We’ll download data tables from a web page directly with pandas. The read_html() function fetches the page and parses its HTML tables for us, so there is no need to write explicit request code or use an API.

Case Study

  • Use Wikipedia to download a table listing the largest banks in the world (the first table on the page ranks them by total assets)
  • Save it in a DataFrame named df
  • View the top 10 rows

Scrape Table

import pandas as pd

# The data is on this Wikipedia page
URL = 'https://en.wikipedia.org/wiki/List_of_largest_banks'

# read tables on that page using pandas read_html()
tables = pd.read_html(URL)

# Extract the first table
df = tables[0]
df.head(10)
   Rank  Bank name                                Total assets (2023) (US$ billion)
0  1     Industrial and Commercial Bank of China  6303.44
1  2     Agricultural Bank of China               5623.12
2  3     China Construction Bank                  5400.28
3  4     Bank of China                            4578.28
4  5     JPMorgan Chase                           3875.39
5  6     Bank of America                          3180.15
6  7     HSBC                                     2919.84
7  8     BNP Paribas                              2867.44
8  9     Mitsubishi UFJ Financial Group           2816.77
9  10    Crédit Agricole                          2736.95
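
Picking a table by position (tables[0]) can break if the page layout changes. As a rough sketch of one alternative, read_html() accepts a match argument that keeps only tables whose text matches a string or regular expression. The example below runs offline on a small inline HTML snippet instead of the live page. The HTML string, the placeholder first table, and the variable names HTML and banks are assumptions made for illustration; the three bank rows are copied from the output above.

```python
from io import StringIO

import pandas as pd

# Small inline HTML standing in for a downloaded page (illustrative only).
# The first table is a made-up placeholder so there is something to skip;
# the bank rows are copied from the output shown earlier.
HTML = """
<table>
  <tr><th>Key</th><th>Value</th></tr>
  <tr><td>placeholder</td><td>0</td></tr>
</table>
<table>
  <tr><th>Rank</th><th>Bank name</th><th>Total assets (2023) (US$ billion)</th></tr>
  <tr><td>1</td><td>Industrial and Commercial Bank of China</td><td>6303.44</td></tr>
  <tr><td>2</td><td>Agricultural Bank of China</td><td>5623.12</td></tr>
  <tr><td>3</td><td>China Construction Bank</td><td>5400.28</td></tr>
</table>
"""

# match= keeps only the tables whose text matches the given string/regex,
# so the placeholder table is dropped from the result.
tables = pd.read_html(StringIO(HTML), match="Total assets")

# read_html() always returns a list, even when only one table matches.
banks = tables[0]
print(banks)
```

The same match argument should work when the first argument is the Wikipedia URL, although the exact header text on the live page may differ from the snippet used here.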