import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.style as style
%matplotlib inline

pd.set_option("display.float_format", lambda x: f"{x:.3g}")

lsoa = pd.read_csv("data/LSOA_data.csv")

# Displaying dataset information
print("A basic summary of the dataframe:\n")
print(lsoa.info())

A basic summary of the dataframe:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33755 entries, 0 to 33754
Data columns (total 23 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   LSOAName    33755 non-null  object
 1   LSOACode    33755 non-null  object
 2   PartOfCode  33755 non-null  object
 3   PartOfName  33755 non-null  object
 4   Total       33755 non-null  int64 
 5   Age4Under   33755 non-null  int64 
 6   Age5to9     33755 non-null  int64 
 7   Age10to14   33755 non-null  int64 
 8   Age15to19   33755 non-null  int64 
 9   Age20to24   33755 non-null  int64 
 10  Age25to29   33755 non-null  int64 
 11  Age30to34   33755 non-null  int64 
 12  Age35to39   33755 non-null  int64 
 13  Age40to44   33755 non-null  int64 
 14  Age45to49   33755 non-null  int64 
 15  Age50to54   33755 non-null  int64 
 16  Age55to59   33755 non-null  int64 
 17  Age60to64   33755 non-null  int64 
 18  Age65to69   33755 non-null  int64 
 19  Age70to74   33755 non-null  int64 
 20  Age75to79   33755 non-null  int64 
 21  Age80to84   33755 non-null  int64 
 22  Age85Over   33755 non-null  int64 
dtypes: int64(19), object(4)
memory usage: 5.9+ MB
None

# Note: `in` on a Series tests the index labels, not the values, so this
# returns False whether or not the code appears in the column
"E06000001" in lsoa["PartOfCode"]

False
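
A toy Series makes the index-versus-values behaviour of `in` easy to see. The sketch below uses made-up codes (the `codes` Series is illustrative and is not the real `PartOfCode` column) and shows two common ways to test the values instead.

```python
import pandas as pd

# Toy Series for illustration only; not the real PartOfCode column
codes = pd.Series(["E06000001", "E09000001", "E07000180"])

# `in` checks the index labels (0, 1, 2), so a value lookup fails
in_index = "E06000001" in codes

# Two ways of testing against the stored values instead
in_values = "E06000001" in codes.values
any_match = codes.eq("E06000001").any()

print(in_index, in_values, any_match)
```

Either of the last two forms could be applied to `lsoa["PartOfCode"]` if the intent is to check whether a given code is present.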

# Printing the first and last 5 rows
print("First 5 rows:\n", lsoa.head(),"\n", "\nLast 5 rows:\n", lsoa.tail())

First 5 rows:
                     LSOAName   LSOACode PartOfCode            PartOfName  \
0        City of London 001A  E01000001  E09000001        City of London   
1        City of London 001B  E01000002  E09000001        City of London   
2        City of London 001C  E01000003  E09000001        City of London   
3        City of London 001E  E01000005  E09000001        City of London   
4  Barking and Dagenham 016A  E01000006  E09000002  Barking and Dagenham   

   Total  Age4Under  Age5to9  Age10to14  Age15to19  Age20to24  ...  Age40to44  \
0   1473         52       34         32         23         90  ...        114   
1   1384         33       24         22         31        100  ...         92   
2   1613         39       32         33         23         96  ...        111   
3   1101         52       45         35         89        118  ...         61   
4   1842        153      127        110        122        124  ...        164   

   Age45to49  Age50to54  Age55to59  Age60to64  Age65to69  Age70to74  \
0        105         89         73         83        119        102   
1         98        122         88         87         76         69   
2        113        155        118        111         86         85   
3         58         87         82         67         35         26   
4        153        121         85         70         66         41   

   Age75to79  Age80to84  Age85Over  
0         57         57         35  
1         59         43         30  
2         50         31         33  
3         17         14         12  
4         18         17         16  

[5 rows x 23 columns] 
 
Last 5 rows:
                        LSOAName   LSOACode PartOfCode           PartOfName  \
33750  Vale of White Horse 014H  E01035758  E07000180  Vale of White Horse   
33751  Vale of White Horse 015G  E01035759  E07000180  Vale of White Horse   
33752  Vale of White Horse 015H  E01035760  E07000180  Vale of White Horse   
33753  Vale of White Horse 015I  E01035761  E07000180  Vale of White Horse   
33754     West Oxfordshire 004H  E01035762  E07000181     West Oxfordshire   

       Total  Age4Under  Age5to9  Age10to14  Age15to19  Age20to24  ...  \
33750   1169         39       45         58         38         63  ...   
33751   1519        116      107         82         66         62  ...   
33752   1610        206      152         97         49         59  ...   
33753   1609        184      121         98         58         62  ...   
33754   1465         74       97         85         69         59  ...   

       Age40to44  Age45to49  Age50to54  Age55to59  Age60to64  Age65to69  \
33750         67         76         81         57         49         57   
33751         91        100        111        106        102         56   
33752        156         62         41         30         24         14   
33753        113         84         66         50         37         50   
33754         79         98        109        113         96        100   

       Age70to74  Age75to79  Age80to84  Age85Over  
33750         70         46         70         77  
33751         75         58         29         32  
33752         15         13         20         33  
33753         40         23         32         29  
33754         94         72         39         49  

[5 rows x 23 columns]

# Summing the Total column
eng_pop = lsoa["Total"].sum()
print("Total population of England:\n" + str(eng_pop))

Total population of England:
56490091

# Automatically selecting all columns that start with "Age"
age_columns = lsoa.filter(regex="^Age").columns

# Calculating proportions
age_sums = lsoa[age_columns].sum()
age_proportions = age_sums / eng_pop

print("Age groups and their overall population proportions for England (to 3 s.f.):\n")
print(age_proportions)

Age groups and their overall population proportions for England (to 3 s.f.):

Age4Under   0.0545
Age5to9     0.0593
Age10to14   0.0604
Age15to19    0.057
Age20to24   0.0604
Age25to29   0.0658
Age30to34     0.07
Age35to39   0.0672
Age40to44   0.0634
Age45to49   0.0638
Age50to54   0.0692
Age55to59   0.0674
Age60to64   0.0576
Age65to69    0.049
Age70to74   0.0495
Age75to79   0.0361
Age80to84   0.0252
Age85Over   0.0243
dtype: float64

# Plotting and formatting the bar graph
style.use("fivethirtyeight")
plt.figure(figsize = (7, 4))
plt.bar(age_columns, age_proportions)

plt.text(-3.3, 0.085, 
         "Proportion of England's population by age range", 
         size = 14, weight = "bold", color = "black")
plt.text(-3.3, 0.079, 
         "From the 2021 UK census", 
         size = 13, color = "black")
plt.text(-3.3, -0.03, 
         "RASHAD MALIK" + " " * 50 + "Source: Office for National Statistics", 
         color = "#f0f0f0", 
         backgroundcolor = "#4d4d4d", 
         fontsize=12)

plt.ylabel("Proportion", fontsize = 10, fontweight = "bold", labelpad = 10)
plt.xlabel("Age ranges", fontsize = 10, fontweight = "bold", labelpad = 10)

plt.xticks(rotation = 65, fontsize = 9)
plt.yticks(fontsize = 10)
plt.minorticks_on()

plt.grid(True, which="both", axis="y")
plt.grid(True, which="minor", axis="y", linestyle=":", linewidth=0.5)

plt.show()

# Defining the LSOA areas that we will observe
lsoa_codes = ["E01005044", "E01020395", "E01009136"]
lsoa_names = ["Bury 026E", "Dorset 024A", "Birmingham 014E"]

# Filtering for the required areas, and setting the index to the name of the area
filtered_lsoa = lsoa[lsoa["LSOACode"].isin(lsoa_codes)]
filtered_lsoa = filtered_lsoa.set_index('LSOAName')

# Calculate proportions for each LSOA
lsoa_proportions = filtered_lsoa[age_columns].div(filtered_lsoa["Total"], axis=0)
lsoa_proportions.loc["England Overall"] = age_proportions

print("The new dataframe with the required proportions (to 3 s.f.):\n")
print(lsoa_proportions)

The new dataframe with the required proportions (to 3 s.f.):

                 Age4Under  Age5to9  Age10to14  Age15to19  Age20to24  \
LSOAName                                                               
Bury 026E             0.11    0.139      0.149     0.0984     0.0423   
Birmingham 014E     0.0603    0.058     0.0517     0.0631     0.0506   
Dorset 024A         0.0115    0.022     0.0257     0.0315     0.0388   
England Overall     0.0545   0.0593     0.0604      0.057     0.0604   

                 Age25to29  Age30to34  Age35to39  Age40to44  Age45to49  \
LSOAName                                                                 
Bury 026E            0.049     0.0622     0.0673     0.0597      0.052   
Birmingham 014E      0.058     0.0722     0.0847     0.0466     0.0506   
Dorset 024A         0.0289     0.0194     0.0194      0.032     0.0393   
England Overall     0.0658       0.07     0.0672     0.0634     0.0638   

                 Age50to54  Age55to59  Age60to64  Age65to69  Age70to74  \
LSOAName                                                                 
Bury 026E           0.0393     0.0275     0.0306     0.0291     0.0184   
Birmingham 014E     0.0557      0.058     0.0637     0.0608     0.0546   
Dorset 024A         0.0446     0.0619     0.0787     0.0866      0.113   
England Overall     0.0692     0.0674     0.0576      0.049     0.0495   

                 Age75to79  Age80to84  Age85Over  
LSOAName                                          
Bury 026E           0.0133    0.00867    0.00408  
Birmingham 014E     0.0472     0.0324     0.0318  
Dorset 024A           0.11      0.102      0.135  
England Overall     0.0361     0.0252     0.0243
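
Dividing a frame by a Series with `axis=0` aligns the Series on the row index, so each row is divided by its own total. A minimal sketch on a made-up two-row frame (the column names and counts here are assumptions for illustration only):

```python
import pandas as pd

# Toy counts for two hypothetical areas
toy = pd.DataFrame(
    {"Age0to4": [10, 30], "Age5to9": [30, 30], "Total": [40, 60]},
    index=["Area A", "Area B"],
)

# axis=0 matches toy["Total"] to rows by index label, giving row-wise shares
shares = toy[["Age0to4", "Age5to9"]].div(toy["Total"], axis=0)
row_sums = shares.sum(axis=1)

print(shares)
print(row_sums)
```

Checking that each row of shares sums to 1 is a quick way to confirm the division ran along the intended axis.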

# Creating 4 subplots
fig, axes = plt.subplots(nrows = 4, ncols = 1, figsize = (7, 10), sharex = True)

# List of LSOAs, and custom colours for the subplots
lsoa_columns = ["England Overall", "Birmingham 014E", "Bury 026E", "Dorset 024A"]
colour_palette = ["Red", "Navy", "Orange"]

# Text and labels for the plot
plt.text(-3, 0.95, 
         "Population proportion comparison by area", 
         size=14, weight="bold", color="black")
plt.text(-3, 0.925, 
         "From the 2021 UK census", 
         size=13, color="black")
plt.text(-3, -0.15, 
         "RASHAD MALIK" + " " * 52 + "Source: Office for National Statistics", 
         color="#f0f0f0", 
         backgroundcolor="#4d4d4d", 
         fontsize=12)

# Iterating over each area to create each subplot
for i, ax in enumerate(axes):
    lsoa_name = lsoa_columns[i]
    if i == 0:
        lsoa_proportions.loc[lsoa_name].plot(kind="bar", ax=ax, stacked=True)
    else:
        lsoa_proportions.loc[lsoa_name].plot(kind="bar", ax=ax, stacked=True, color=colour_palette[i-1])
    
    ax.set_ylabel("Proportion", fontsize=10, fontweight="bold", labelpad=10)
    ax.grid(True, which="both", axis="y", linestyle=":", linewidth=0.5)

    ax.grid(True, which="major", axis="y", linestyle="-", linewidth=1)
    ax.set_ylim([0, 0.2])
    ax.tick_params(axis="y", labelsize=9)
    y_ticks = np.linspace(0, 0.2, 5)
    ax.set_yticks(y_ticks)
    ax.minorticks_on()

    ax.text(-0.13, 0.5, lsoa_name, transform=ax.transAxes, fontsize=10,
            verticalalignment="center", horizontalalignment="left", rotation=90)

# Additional label adjustments
axes[-1].set_xlabel("Age ranges", fontsize=10, fontweight="bold", labelpad=10)
plt.xticks(rotation=70, fontsize=9)
plt.subplots_adjust(hspace=0.15)

plt.show()

# Defining the younger ages
younger_age_columns = ["Age4Under", "Age5to9", "Age10to14", "Age15to19"]

# Initialising the new columns with NaN values
lsoa["YoungerResidents"] = np.nan
lsoa["YoungerResidents"] = lsoa["YoungerResidents"].astype(object)
lsoa["ProportionYounger"] = np.nan
lsoa["ProportionYounger"] = lsoa["ProportionYounger"].astype(object)

# Calculating the proportion of younger residents, and creating a new column
lsoa["YoungerResidents"] = lsoa[younger_age_columns].sum(axis=1)
lsoa["ProportionYounger"] = lsoa["YoungerResidents"] / lsoa["Total"]

# Printing the first and last 5 rows to verify
print("First 5 rows:\n", lsoa[["LSOAName", "Total", "YoungerResidents", "ProportionYounger"]].head(),
      "\n", "\nLast 5 rows:\n", lsoa[["LSOAName", "Total", "YoungerResidents", "ProportionYounger"]].tail())

First 5 rows:
                     LSOAName  Total  YoungerResidents  ProportionYounger
0        City of London 001A   1473               141             0.0957
1        City of London 001B   1384               110             0.0795
2        City of London 001C   1613               127             0.0787
3        City of London 001E   1101               221              0.201
4  Barking and Dagenham 016A   1842               512              0.278 
 
Last 5 rows:
                        LSOAName  Total  YoungerResidents  ProportionYounger
33750  Vale of White Horse 014H   1169               180              0.154
33751  Vale of White Horse 015G   1519               371              0.244
33752  Vale of White Horse 015H   1610               504              0.313
33753  Vale of White Horse 015I   1609               461              0.287
33754     West Oxfordshire 004H   1465               325              0.222

# Defining the older ages
older_age_columns = ["Age65to69", "Age70to74", "Age75to79", "Age80to84", "Age85Over"]

# Initialising the new columns with NaN values
lsoa["OlderResidents"] = np.nan
lsoa["OlderResidents"] = lsoa["OlderResidents"].astype(object)
lsoa["ProportionOlder"] = np.nan
lsoa["ProportionOlder"] = lsoa["ProportionOlder"].astype(object)

# Calculating the proportion of older residents, and creating a new column
lsoa["OlderResidents"] = lsoa[older_age_columns].sum(axis=1)
lsoa["ProportionOlder"] = lsoa["OlderResidents"] / lsoa["Total"]

# Printing the first and last 5 rows to verify
print("First 5 rows:\n", lsoa[["LSOAName", "Total", "OlderResidents", "ProportionOlder"]].head(),
      "\n", "\nLast 5 rows:\n", lsoa[["LSOAName", "Total", "OlderResidents", "ProportionOlder"]].tail())

First 5 rows:
                     LSOAName  Total  OlderResidents  ProportionOlder
0        City of London 001A   1473             370            0.251
1        City of London 001B   1384             277              0.2
2        City of London 001C   1613             285            0.177
3        City of London 001E   1101             104           0.0945
4  Barking and Dagenham 016A   1842             158           0.0858 
 
Last 5 rows:
                        LSOAName  Total  OlderResidents  ProportionOlder
33750  Vale of White Horse 014H   1169             320            0.274
33751  Vale of White Horse 015G   1519             250            0.165
33752  Vale of White Horse 015H   1610              95            0.059
33753  Vale of White Horse 015I   1609             174            0.108
33754     West Oxfordshire 004H   1465             354            0.242

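# Defining lower age boundaries
ages = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

def compute_median_age(row):
    """Estimate the median age from the 5-year age-band counts in `row`.

    Finds the band in which the cumulative share of the population first
    reaches 0.5, then interpolates linearly within it. Every band, including
    the open-ended 85+ band, is treated as 5 years wide.
    """
    totals = row.loc[age_columns].values
    cumul_fract = np.cumsum(totals) / totals.sum()

    # Position of the first band whose cumulative fraction reaches 0.5
    index = (cumul_fract >= 0.5).argmax()

    # Cumulative fraction below the median band (0 if it is the first band)
    prev_fract = cumul_fract[index - 1] if index > 0 else 0

    # Linear interpolation within the median band
    remaining_fract = 0.5 - prev_fract
    fract_increase = cumul_fract[index] - prev_fract
    years_to_add = 5 * (remaining_fract / fract_increase)

    return ages[index] + years_to_add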

# Initialising the new column with NaN values
lsoa["MedianAge"] = np.nan
lsoa["MedianAge"] = lsoa["MedianAge"].astype(object)

# Applying the median age function to the dataframe
lsoa["MedianAge"] = lsoa.apply(compute_median_age, axis=1)

# Printing the first and last 5 rows to verify
print("First 5 rows (median age to 3 s.f.):\n", lsoa[["LSOACode", "LSOAName", "MedianAge"]].head(),
      "\n", "\nLast 5 rows (median age to 3 s.f.):\n", lsoa[["LSOACode", "LSOAName", "MedianAge"]].tail())

First 5 rows (median age to 3 s.f.):
     LSOACode                   LSOAName  MedianAge
0  E01000001        City of London 001A       44.3
1  E01000002        City of London 001B       43.9
2  E01000003        City of London 001C       43.9
3  E01000005        City of London 001E       34.9
4  E01000006  Barking and Dagenham 016A       34.5 
 
Last 5 rows (median age to 3 s.f.):
         LSOACode                  LSOAName  MedianAge
33750  E01035758  Vale of White Horse 014H       44.9
33751  E01035759  Vale of White Horse 015G         40
33752  E01035760  Vale of White Horse 015H       31.7
33753  E01035761  Vale of White Horse 015I         33
33754  E01035762     West Oxfordshire 004H       46.9
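
To make the interpolation inside `compute_median_age` concrete, here is a small standalone sketch on a made-up three-band population. The helper name `interpolated_median` and the toy counts are assumptions for illustration. It follows the same idea (find the band where the cumulative share first reaches 0.5, then interpolate linearly within it) but takes the band width as a parameter.

```python
import numpy as np

def interpolated_median(counts, lower_bounds, width=5):
    """Estimate a median from banded counts by linear interpolation.

    Hypothetical helper: assumes every band is `width` years wide and
    that ages are spread evenly within each band.
    """
    counts = np.asarray(counts, dtype=float)
    cumul = np.cumsum(counts) / counts.sum()
    idx = int((cumul >= 0.5).argmax())          # first band reaching 50%
    prev = cumul[idx - 1] if idx > 0 else 0.0   # share below that band
    return lower_bounds[idx] + width * (0.5 - prev) / (cumul[idx] - prev)

# Toy data: 25 people aged 0-4, 50 aged 5-9, 25 aged 10-14
toy_median = interpolated_median([25, 50, 25], [0, 5, 10])
print(toy_median)
```

Here a quarter of the population lies below the 5-9 band, so the median sits halfway through that band, at about 7.5.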

print("Summary statistics (to 3 s.f.) for the distributions of the total population,\nthe younger and older proportions, and the median age for all LSOAs:\n")
print(lsoa[["Total", "ProportionYounger", "ProportionOlder", "MedianAge"]].describe())

Summary statistics (to 3 s.f.) for the distributions of the total population,
the younger and older proportions, and the median age for all LSOAs:

         Total  ProportionYounger  ProportionOlder  MedianAge
count 3.38e+04           3.38e+04         3.38e+04   3.38e+04
mean  1.67e+03              0.228            0.189         42
std        353             0.0553           0.0855       7.78
min        999             0.0181         0.000736       14.5
25%   1.44e+03              0.192            0.124       36.1
50%   1.61e+03              0.223            0.183       41.5
75%   1.84e+03              0.259            0.247       47.8
max    9.9e+03              0.629            0.655       71.9

# Plotting the histogram
plt.figure(figsize=(7, 4))
plt.hist(lsoa["Total"], bins=500, alpha=0.7)

plt.text(-400, 1150, 
         "English LSOA population distribution (bins = 500)", 
         size = 14, weight = "bold", color = "black")
plt.text(-400, 1080, 
         "From the 2021 UK census", 
         size = 13, color = "black")
plt.text(-400, -230, 
         "RASHAD MALIK" + " " * 47 + "Source: Office for National Statistics", 
         color = "#f0f0f0", 
         backgroundcolor = "#4d4d4d", 
         fontsize=12)

plt.ylabel("Frequency", fontsize = 10, fontweight = "bold", labelpad = 0)
plt.xlabel("Total (Population of the LSOA)", fontsize = 10, fontweight = "bold", labelpad = 10)

plt.xticks(fontsize = 10)
y_ticks = np.arange(0, 1100, 100)
plt.yticks(y_ticks, fontsize = 10)

plt.minorticks_on()

plt.grid(True, which="both", axis="both")
plt.grid(True, which="minor", axis="both", linestyle=":", linewidth=0.5)

plt.show()

print(lsoa[["LSOACode", "LSOAName", "Total"]].nlargest(10, "Total"))

        LSOACode            LSOAName  Total
27075  E01028521         Oxford 008A   9900
33508  E01035514      Cambridge 005H   8226
32531  E01034493  County Durham 030H   6466
12706  E01013378           York 023B   6219
23829  E01025105      Lancaster 019A   5746
33510  E01035516      Cambridge 007I   5737
31697  E01033617     Birmingham 050F   5671
33303  E01035309     Canterbury 012H   5447
33502  E01035508     Nottingham 041F   4950
16174  E01017032     Portsmouth 027C   4913

# Plotting the two subplots
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 5.5))

plt.text(-0.985, 530, 
         "Median age distribution of England, and comparing the younger to older age proportions (bins = 500)", 
         size = 14, weight = "bold", color = "black")
plt.text(-0.985, 506, 
         "From the 2021 UK census", 
         size = 13, color = "black")
plt.text(-0.985, -80, 
         "RASHAD MALIK" + " " * 165 + "Source: Office for National Statistics", 
         color = "#f0f0f0", 
         backgroundcolor = "#4d4d4d", 
         fontsize=12)

# Formatting first subplot
p1 = lsoa["MedianAge"]
axes[0].hist(p1, bins=500, alpha=0.7)
axes[0].set_title("Distribution of the median age across LSOAs in England", fontsize=11, pad=15)
axes[0].set_ylabel("Frequency", fontsize = 10, fontweight = "bold", labelpad = 10)
axes[0].set_xlabel("Median Age", fontsize = 10, fontweight = "bold", labelpad = 10)
axes[0].set_ylim(0, 450)
axes[0].set_xlim(0, 80)
axes[0].tick_params(axis="both", labelsize=10)
axes[0].minorticks_on()
axes[0].grid(True, which="both", axis="both")
axes[0].grid(True, which="minor", axis="both", linestyle=":", linewidth=0.5)

# Formatting second subplot
numeric_columns = ["ProportionYounger", "ProportionOlder"]
p2 = lsoa[numeric_columns]
p2.plot(kind="hist", bins=500, alpha=0.5, ax=axes[1], color = ["purple", "orange"])
axes[1].set_title("Distribution of the age proportions across LSOAs in England", fontsize=11, pad=15)
axes[1].set_xlabel("Proportion", fontsize = 10, fontweight = "bold", labelpad = 10)
axes[1].set_ylabel("")
axes[1].set_ylim(0, 450)
axes[1].tick_params(axis="both", labelsize=10)
axes[1].minorticks_on()
axes[1].grid(True, which="both", axis="both")
axes[1].grid(True, which="minor", axis="both", linestyle=":", linewidth=0.5)

plt.show()

calculated_england_median_age = compute_median_age(lsoa.loc[:, age_columns].sum())
fiftieth_percentile = lsoa.loc[:, "MedianAge"].median()

print("Median age for the whole of England (to 3 s.f.):\n", f"{calculated_england_median_age:.3g}",
      "\n50th percentile value of the distribution of median ages by LSOA (to 3 s.f.):\n", f"{fiftieth_percentile:.3g}")

Median age for the whole of England (to 3 s.f.):
 40.4 
50th percentile value of the distribution of median ages by LSOA (to 3 s.f.):
 41.5
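
The England-wide median and the median of the LSOA medians need not agree: the first pools every resident, while the second gives each LSOA one vote regardless of its size or age spread. A toy sketch with made-up ages (the `areas` dictionary is purely illustrative) shows how far apart the two can be:

```python
from statistics import median

# Made-up ages for three tiny "areas" (illustrative only)
areas = {
    "A": [20, 21, 22, 23, 24, 25, 26],  # larger and younger
    "B": [60, 70, 80],
    "C": [65, 75, 85],
}

# Median over all residents pooled together
pooled = median(age for ages in areas.values() for age in ages)

# Median of the per-area medians, each area counted once
median_of_medians = median(median(ages) for ages in areas.values())

print(pooled, median_of_medians)
```

In this toy case the pooled median is 26 and the median of medians is 70. The gap in the real data is much smaller because LSOAs are far more alike than these three areas.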

print("The 10 LSOAs with the lowest median ages:")
print(lsoa.loc[:, ["LSOAName", "LSOACode", "MedianAge"]].sort_values("MedianAge").head(10),"\n\n")
print("The 10 LSOAs with the highest median ages:")
print(lsoa.loc[:, ["LSOAName", "LSOACode", "MedianAge"]].sort_values("MedianAge").tail(10))

The 10 LSOAs with the lowest median ages:
                                LSOAName   LSOACode  MedianAge
5308                        Salford 031C  E01005614       14.5
32930                    Birmingham 079G  E01034936       18.9
13655  Bath and North East Somerset 012A  E01014380       18.9
12706                          York 023B  E01013378         19
32958                     Harrogate 008F  E01034964         19
7478                      Sheffield 038C  E01007862         19
9167                       Coventry 042C  E01009671       19.1
13971                       Bristol 015F  E01014714       19.3
32666                        Exeter 001G  E01034628       19.3
32442                     Liverpool 042H  E01034404       19.4 


The 10 LSOAs with the highest median ages:
                                       LSOAName   LSOACode  MedianAge
20038                               Rother 010B  E01021106       68.2
18876                           East Devon 017A  E01019894         69
20074                              Wealden 018A  E01021142       69.2
20416                         Castle Point 009C  E01021495       69.5
32810         King's Lynn and West Norfolk 017K  E01034772       69.9
19316  Bournemouth, Christchurch and Poole 019A  E01020349         70
18895                           East Devon 020B  E01019913       70.2
14629  Bournemouth, Christchurch and Poole 048B  E01015403       70.9
19867                           Eastbourne 012B  E01020932       71.5
18937                           East Devon 012B  E01019957       71.9

# Initialising the new column with NaN values
lsoa["AreaType"] = np.nan
lsoa["AreaType"] = lsoa["AreaType"].astype(object)

# Applying the area types to a new column based on the first three characters of PartOfCode
lsoa.loc[lsoa["PartOfCode"].str[:3] == "E06", "AreaType"] = "Unitary Authorities"
lsoa.loc[lsoa["PartOfCode"].str[:3] == "E07", "AreaType"] = "Non-metropolitan Districts"
lsoa.loc[lsoa["PartOfCode"].str[:3] == "E08", "AreaType"] = "Metropolitan Districts"
lsoa.loc[lsoa["PartOfCode"].str[:3] == "E09", "AreaType"] = "London Borough"

# Printing the first and last 5 rows to verify
print("First 5 rows:\n", lsoa[["LSOACode", "LSOAName", "AreaType"]].head(),
      "\n", "\nLast 5 rows:\n", lsoa[["LSOACode", "LSOAName", "AreaType"]].tail())

First 5 rows:
     LSOACode                   LSOAName        AreaType
0  E01000001        City of London 001A  London Borough
1  E01000002        City of London 001B  London Borough
2  E01000003        City of London 001C  London Borough
3  E01000005        City of London 001E  London Borough
4  E01000006  Barking and Dagenham 016A  London Borough 
 
Last 5 rows:
         LSOACode                  LSOAName                    AreaType
33750  E01035758  Vale of White Horse 014H  Non-metropolitan Districts
33751  E01035759  Vale of White Horse 015G  Non-metropolitan Districts
33752  E01035760  Vale of White Horse 015H  Non-metropolitan Districts
33753  E01035761  Vale of White Horse 015I  Non-metropolitan Districts
33754  E01035762     West Oxfordshire 004H  Non-metropolitan Districts

gbyAreaType = lsoa.groupby(by = "AreaType", dropna=False)

# Printing the area type counts
gbyAreaType["Total"].count()

AreaType
London Borough                 4994
Metropolitan Districts         7335
Non-metropolitan Districts    12599
Unitary Authorities            8827
Name: Total, dtype: int64
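
The four `.loc` assignments used to build `AreaType` can also be expressed as a single dictionary lookup on the three-character prefix. A minimal sketch on a toy frame (the name `prefix_to_type` and the sample rows are assumptions for illustration; the labels are the ones used above):

```python
import pandas as pd

# Prefix-to-label mapping, using the same labels as the notebook
prefix_to_type = {
    "E06": "Unitary Authorities",
    "E07": "Non-metropolitan Districts",
    "E08": "Metropolitan Districts",
    "E09": "London Borough",
}

# Toy rows; the last code has a prefix that is not in the mapping
toy = pd.DataFrame(
    {"PartOfCode": ["E09000001", "E07000180", "E06000001", "W06000001"]}
)

# Unmapped prefixes come back as NaN, which makes gaps easy to spot
toy["AreaType"] = toy["PartOfCode"].str[:3].map(prefix_to_type)
print(toy)
```

One design point in favour of this form is that any unexpected prefix shows up as a missing value rather than being silently skipped.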

area_types = ["Unitary Authorities", "Non-metropolitan Districts", "Metropolitan Districts", "London Borough"]

# Creating 4 subplots, one for each area type
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Titles and additional labels
plt.text(-58, 487, 
         "Median age distributions across area types in England (bins = 250)", 
         size=14, weight="bold", color="black")
plt.text(-58, 471, 
         "From the 2021 UK census", 
         size=13, color="black")
plt.text(-58, -50, 
         "RASHAD MALIK" + " " * 163 + "Source: Office for National Statistics", 
         color="#f0f0f0", backgroundcolor="#4d4d4d", fontsize=12)

axes = axes.flatten()
x_limit = (lsoa["MedianAge"].min(), lsoa["MedianAge"].max())
y_limit = (0, 200)

colours = ["blue", "green", "orange", "purple"]

# Looping for each subplot
for i, area in enumerate(area_types):
    p3 = gbyAreaType.get_group(area)["MedianAge"]

    axes[i].hist(p3, bins=250, alpha=0.7, color = colours[i])  
    axes[i].set_title(area, fontsize=12, pad=0, fontweight="bold")
    
    axes[i].set_xlabel("Median Age", fontsize=10, fontweight="bold", labelpad=5)
    axes[i].set_ylabel("Frequency", fontsize=10, fontweight="bold", labelpad=10)
    axes[i].set_xlim(x_limit)
    axes[i].set_ylim(y_limit)

    axes[i].tick_params(axis="both", labelsize=10)

    axes[i].minorticks_on()
    axes[i].grid(True, which="both", axis="both")
    axes[i].grid(True, which="minor", axis="both", linestyle=":", linewidth=0.5)

plt.subplots_adjust(hspace=0.25, wspace=0.15) 
plt.show()

# Plotting and formatting the scatter plot
plt.figure(figsize = (7, 4))
plt.scatter(lsoa["ProportionYounger"], lsoa["ProportionOlder"], s = 5, alpha = 0.3)

plt.text(-0.073, 0.78, 
         "Comparing older and younger proportions within LSOAs", 
         size = 14, weight = "bold", color = "black")
plt.text(-0.073, 0.73, 
         "From the 2021 UK census", 
         size = 13, color = "black")
plt.text(-0.073, -0.19, 
         "RASHAD MALIK" + " " * 45 + "Source: Office for National Statistics", 
         color = "#f0f0f0", 
         backgroundcolor = "#4d4d4d", 
         fontsize=12)

plt.xlabel("ProportionYounger", fontsize = 10, fontweight = "bold", labelpad = 6)
plt.ylabel("ProportionOlder", fontsize = 10, fontweight = "bold", labelpad = 10)

plt.xticks(fontsize = 9)
plt.yticks(fontsize = 10)
plt.minorticks_on()

plt.grid(True, which="both", axis="y")
plt.grid(True, which="minor", axis="both", linestyle=":", linewidth=0.8)

plt.show()

# Initialising the new column with NaN values
lsoa["AgeClass"] = np.nan
lsoa["AgeClass"] = lsoa["AgeClass"].astype(object)

# Workers
lsoa.loc[(lsoa["ProportionYounger"] < 0.15) & (lsoa["ProportionOlder"] < 0.15), "AgeClass"] = "Workers"

# SchoolKids
lsoa.loc[(lsoa["ProportionYounger"] > 0.35) & (lsoa["ProportionOlder"] < 0.2), "AgeClass"] = "SchoolKids"

# Retirees
lsoa.loc[(lsoa["ProportionYounger"] < 0.2) & (lsoa["ProportionOlder"] > 0.4), "AgeClass"] = "Retirees"

# Normal
lsoa.loc[lsoa["AgeClass"].isna(), "AgeClass"] = "Normal"

# Printing the first 5 rows of each AgeClass for verification
age_class_filter_workers = lsoa[["LSOAName", "ProportionYounger", "ProportionOlder", "AgeClass"]][lsoa["AgeClass"] == "Workers"]
print("First 5 rows for AgeClass 'Workers':\n", age_class_filter_workers.head(5))

age_class_filter_schoolkids = lsoa[["LSOAName", "ProportionYounger", "ProportionOlder", "AgeClass"]][lsoa["AgeClass"] == "SchoolKids"]
print("\nFirst 5 rows for AgeClass 'SchoolKids':\n", age_class_filter_schoolkids.head(5))

age_class_filter_retirees = lsoa[["LSOAName", "ProportionYounger", "ProportionOlder", "AgeClass"]][lsoa["AgeClass"] == "Retirees"]
print("\nFirst 5 rows for AgeClass 'Retirees':\n", age_class_filter_retirees.head(5))

age_class_filter_normal = lsoa[["LSOAName", "ProportionYounger", "ProportionOlder", "AgeClass"]][lsoa["AgeClass"] == "Normal"]
print("\nFirst 5 rows for AgeClass 'Normal':\n", age_class_filter_normal.head(5))

First 5 rows for AgeClass 'Workers':
         LSOAName  ProportionYounger  ProportionOlder AgeClass
536   Brent 018E               0.15            0.128  Workers
818  Camden 028A              0.102            0.134  Workers
824  Camden 019B               0.12              0.1  Workers
875  Camden 028B              0.122            0.136  Workers
877  Camden 027B              0.147           0.0971  Workers

First 5 rows for AgeClass 'SchoolKids':
                      LSOAName  ProportionYounger  ProportionOlder    AgeClass
6   Barking and Dagenham 015B              0.382           0.0352  SchoolKids
9   Barking and Dagenham 015D              0.376           0.0392  SchoolKids
25  Barking and Dagenham 001D              0.384           0.0679  SchoolKids
42  Barking and Dagenham 021C              0.403           0.0425  SchoolKids
82  Barking and Dagenham 020D              0.351           0.0819  SchoolKids

First 5 rows for AgeClass 'Retirees':
                          LSOAName  ProportionYounger  ProportionOlder  \
2757  Kensington and Chelsea 018C              0.101            0.449   
4690                    Bury 010D               0.15            0.428   
5589               Stockport 038B              0.132            0.422   
6450               Liverpool 051E              0.109            0.405   
6545              St. Helens 023D               0.14            0.434   

      AgeClass  
2757  Retirees  
4690  Retirees  
5589  Retirees  
6450  Retirees  
6545  Retirees  

First 5 rows for AgeClass 'Normal':
                     LSOAName  ProportionYounger  ProportionOlder AgeClass
0        City of London 001A             0.0957            0.251   Normal
1        City of London 001B             0.0795              0.2   Normal
2        City of London 001C             0.0787            0.177   Normal
3        City of London 001E              0.201           0.0945   Normal
4  Barking and Dagenham 016A              0.278           0.0858   Normal
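
The same classification can be sketched in one step with `np.select`, which takes a list of conditions and a matching list of labels and falls back to a default. The toy frame below is made up for illustration; the thresholds are the ones used above.

```python
import numpy as np
import pandas as pd

# Toy proportions, one row per intended class (illustrative only)
toy = pd.DataFrame({
    "ProportionYounger": [0.10, 0.40, 0.12, 0.22],
    "ProportionOlder":   [0.10, 0.05, 0.45, 0.18],
})

conditions = [
    (toy["ProportionYounger"] < 0.15) & (toy["ProportionOlder"] < 0.15),
    (toy["ProportionYounger"] > 0.35) & (toy["ProportionOlder"] < 0.2),
    (toy["ProportionYounger"] < 0.2) & (toy["ProportionOlder"] > 0.4),
]
labels = ["Workers", "SchoolKids", "Retirees"]

# Rows matching none of the conditions receive the default label
toy["AgeClass"] = np.select(conditions, labels, default="Normal")
print(toy)
```

Because the three conditions cannot hold at the same time, the order in which they are listed does not change the result here.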

corners = lsoa.loc[lsoa["AgeClass"] != "Normal", :]
pivot_table = corners.pivot_table(index="AreaType", values=["Total"], columns=["AgeClass"], aggfunc="count")

print("Table showing the counts of LSOAs within the specified area types:\n")
print(pivot_table)

Table showing the counts of LSOAs within the specified area types:

                              Total                   
AgeClass                   Retirees SchoolKids Workers
AreaType                                              
London Borough                    1         83     272
Metropolitan Districts           29        427     149
Non-metropolitan Districts      263        108      62
Unitary Authorities             127        211     137
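
A count table of this shape can also be built with `pd.crosstab`, which counts row and column label combinations directly and fills empty cells with 0. A minimal sketch on made-up rows (the toy frame is an assumption for illustration):

```python
import pandas as pd

# Toy rows: one per hypothetical LSOA
toy = pd.DataFrame({
    "AreaType": [
        "London Borough",
        "London Borough",
        "Metropolitan Districts",
        "Metropolitan Districts",
    ],
    "AgeClass": ["Workers", "Workers", "SchoolKids", "Retirees"],
})

# Rows are area types, columns are age classes, cells are counts
counts = pd.crosstab(toy["AreaType"], toy["AgeClass"])
print(counts)
```

Unlike the pivot table above, the result has a single level of column labels, which can make later plotting and legend handling a little simpler.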

# Plotting the pivot table as a grouped bar chart
pivot_table.plot(kind='bar', figsize=(7, 4))

# Adding labels and title
plt.text(-0.89, 490, 
         "Count of LSOAs by area type and age class", 
         size = 14, weight = "bold", color = "black")
plt.text(-0.89, 460, 
         "'Age Class' derived from observing the proportions of older and younger populations", 
         size = 11, color = "black")
plt.text(-0.89, -95, 
         "RASHAD MALIK" + " " * 48 + "Source: Office for National Statistics", 
         color = "#f0f0f0", 
         backgroundcolor = "#4d4d4d", 
         fontsize=12)

plt.xlabel("Area Type", fontsize = 10, fontweight = "bold", labelpad = 6)
plt.ylabel("Count", fontsize = 10, fontweight = "bold", labelpad = 10)

plt.legend(labels=["Retirees", "SchoolKids", "Workers"], fontsize = 10)
plt.xticks(fontsize = 8, rotation=0)
plt.yticks(fontsize = 10)
plt.minorticks_on()

plt.grid(True, which="both", axis="y")
plt.grid(True, which="minor", axis="y", linestyle=":", linewidth=0.5)

# Show the plot
plt.show()

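import geopandas as gpd

# Load the LSOA boundary geometries
bdf = gpd.read_file("data/LSOA_England_geom.gpkg")

# Index the census data by LSOA code, then right-join so that every census row is kept
lsoa_idx = lsoa.set_index('LSOACode')
bdf2 = bdf.join(lsoa_idx, how='right', on='LSOA21CD')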
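
A right join keeps every census row, so any `LSOACode` with no match in the geometry file would end up with a missing geometry and would typically not appear on the map. One way to check for this is sketched below on toy data with plain pandas. The frames `geom` and `census` and all their values are invented for illustration; only the column names `LSOA21CD` and `LSOACode` come from the notebook.

```python
import pandas as pd

# Toy stand-ins for the geometry and census tables (values are invented)
geom = pd.DataFrame({"LSOA21CD": ["A1", "A2", "A3"],
                     "geometry": ["poly1", "poly2", "poly3"]})
census = pd.DataFrame({"LSOACode": ["A1", "A2", "A4"],
                       "Total": [1500, 1720, 1610]})

# indicator=True adds a _merge column recording where each row came from
check = geom.merge(census, how="right",
                   left_on="LSOA21CD", right_on="LSOACode",
                   indicator=True)

# Census rows that found no matching geometry
unmatched = check.loc[check["_merge"] == "right_only", "LSOACode"].tolist()
print(unmatched)
```

An empty list here would suggest that every census row found a geometry; the same check could be run on the real `bdf` and `lsoa` frames before plotting.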

# Basic plot of the geometries
bdf2.plot()
plt.axis("off")
plt.show()

london_lsoas = bdf2[bdf2["AreaType"] == "London Borough"]
nmd_lsoas = bdf2[bdf2["AreaType"] == "Non-metropolitan Districts"]

ax = london_lsoas.plot(column="Total", legend=True, figsize=(10, 6), cmap="magma")

colorbar = ax.get_figure().get_axes()[1]
colorbar.tick_params(labelsize=10)
plt.axis("off")

plt.text(0.1, 0.96, 
         "Population of LSOAs in London", 
         size=14, weight="bold", color="black", 
         transform=plt.gcf().transFigure)  # Adjust position based on figure

plt.text(0.1, 0.925, 
         "From the 2021 UK census", 
         size=11, color="black", 
         transform=plt.gcf().transFigure)  # Adjust position based on figure

plt.text(0.1, 0, 
         "RASHAD MALIK" + " " * 70 + "Source: Office for National Statistics", 
         color="#f0f0f0", 
         backgroundcolor="#4d4d4d", 
         fontsize=12, 
         transform=plt.gcf().transFigure)  # Adjust position based on figure

plt.show()

ax = nmd_lsoas.plot(column="ProportionOlder", legend=True, figsize=(10, 6), cmap="viridis")

colorbar = ax.get_figure().get_axes()[1]
colorbar.tick_params(labelsize=10)
plt.axis("off")

plt.text(0.2, 0.9, 
         "Proportion of older populations in Non-metropolitan Districts", 
         size=14, weight="bold", color="black", 
         transform=plt.gcf().transFigure)

plt.text(0.2, 0.865, 
         "From the 2021 UK census", 
         size=11, color="black", 
         transform=plt.gcf().transFigure)

plt.text(0.2, 0, 
         "RASHAD MALIK" + " " * 48 + "Source: Office for National Statistics", 
         color="#f0f0f0", 
         backgroundcolor="#4d4d4d", 
         fontsize=12, 
         transform=plt.gcf().transFigure)


plt.show()

| Variable | Type | Description |
| --- | --- | --- |
| LSOAName | String | Name of the LSOA |
| LSOACode | String | Code of the LSOA |
| PartOfCode | String | Code of a larger area containing this LSOA |
| PartOfName | String | Name of the larger area containing this LSOA |
| Total | Integer | Total number of usual residents in this LSOA at the time of the 2021 census |
| Age4Under, Age5to9, Age10to14, Age15to19, Age20to24, Age25to29, Age30to34, Age35to39, Age40to44, Age45to49, Age50to54, Age55to59, Age60to64, Age65to69, Age70to74, Age75to79, Age80to84, Age85Over | Integer | Number of usual residents in this LSOA at the time of the 2021 census, broken down into 5-year age ranges (plus an open-ended range for ages 85 and over) |

| First 3 characters of `PartOfCode` | Type of area |
| --- | --- |
| E06 | Unitary Authorities: typically more urban areas. One example is E06000004, Stockton-on-Tees. |
| E07 | Non-metropolitan Districts: more rural areas. One example is E07000085, East Hampshire. |
| E08 | Metropolitan Districts: more urban areas. One example is E08000026, Coventry. |
| E09 | London Borough: one example is E09000030, Tower Hamlets. |
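
The prefix table above translates directly into a lookup on the first three characters of `PartOfCode`. The sketch below shows one way an `AreaType` column like the one used in the analysis could be derived. It is not necessarily how the notebook builds it, and the names `AREA_TYPES`, `codes` and `area_type` are illustrative.

```python
import pandas as pd

# Prefix-to-area-type lookup taken from the table above
AREA_TYPES = {
    "E06": "Unitary Authorities",
    "E07": "Non-metropolitan Districts",
    "E08": "Metropolitan Districts",
    "E09": "London Borough",
}

# The four example codes quoted in the table
codes = pd.Series(["E06000004", "E07000085", "E08000026", "E09000030"],
                  name="PartOfCode")

# Slice off the first three characters and map them to an area type
area_type = codes.str[:3].map(AREA_TYPES)
print(area_type.tolist())
```

Applied to the full data, the same expression would read `lsoa["PartOfCode"].str[:3].map(AREA_TYPES)`. Any prefix missing from the lookup maps to `NaN`, which makes unexpected codes easy to spot.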

Demographic Analysis of UK Population Using 2021 Census Data

Project aim and outline

Introduction

Importing libraries

The dataset

Dataset description and variables

Data loading and preliminary exploration

Analysis

Part 1: Bar chart of the age profile

1.1 Total population of England

1.2 Comparing the population of each age range across different areas

1.3 Age distribution discussion

Part 2: Distribution of the younger, older and median ages in each LSOA

2.1 Adding new variables to the dataframe

2.2 Summary statistics and distributions

2.3 Median age comparison and discussion

Part 3: Comparing the median age distributions by area type

3.1 Grouping the data by area type

3.2 Distributions of the median age for the 4 different areas

3.3 Discussion on the differences between the distributions

Part 4: Classification of Unusual Areas by Proportions

4.1 Scatter plot

4.2 Comment on the overall shape of the scatter plot

4.3 Classifying the extreme corners

4.4 Pivot table

Part 5: Mapping

5.1 Installing the GeoPandas package

5.2 Loading the geometry data

5.3 Joining the data frames

5.4 Plotting and colouring the maps

Summary and conclusion

References