Telco Customer Churn MCP Server¶
Author: Rashad Malik
Objective
This notebook serves as a proof of concept for implementing a Model Context Protocol (MCP) server, demonstrating how to expose data analysis and machine learning capabilities to AI assistants through a standardised interface. Using a telecommunications customer churn dataset as the domain, we prototype multiple tool types to showcase MCP's key features: tool discovery, parameter validation, and seamless integration with LLMs like Claude.
Key Steps
- Load and clean the Telco Customer Churn dataset, preparing it as the foundation for our MCP tools
- Prototype analytical tools including distribution plots, correlation analysis, and pandas query capabilities to demonstrate MCP's versatility
- Train a logistic regression model as an example of exposing machine learning predictions through MCP
- Test all tools to validate functionality before wrapping them in the MCP server interface
By prototyping these tools in a notebook environment first, we can make sure they are working as required, and we can validate our approach before deploying them in a production MCP server. This pattern is transferable to any domain where AI assistants need structured access to data and analytical capabilities.
Required libraries¶
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
1 Loading the dataset¶
The Telco Customer Churn dataset contains 7,043 customers with 21 features covering demographics (gender, senior citizen status, family details), services (phone, internet, streaming), contracts (tenure, type, billing), and charges (monthly and total).
This dataset works well for demonstrating MCP capabilities because it:
- Addresses a real business problem (customer retention)
- Combines categorical and numerical data types
- Has enough complexity to be meaningful (~7,000 rows) without slowing prototyping
- Uses dimensions that non-technical users can understand
# Load the Telco Customer Churn dataset from CSV file
telco_df = pd.read_csv("../data/telco_customer_churn.csv")
print(f"Dataset Loaded: {telco_df.shape[0]} rows, {telco_df.shape[1]} columns")
telco_df.head()
Dataset Loaded: 7043 rows, 21 columns
| customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
5 rows × 21 columns
2 Data Cleaning¶
In this section, we perform data cleaning to ensure our MCP tools work reliably by standardising data types, removing inconsistencies, and preparing features in a format that our functions can process correctly.
Without this step, we'd encounter errors when generating visualisations, calculating correlations, or making predictions.
2.1 Convert TotalCharges to Numeric and Standardise Columns¶
The TotalCharges column is stored as a string due to some blank values. We convert it to numeric and fill missing values with 0 (these represent new customers with no charges yet). We also convert SeniorCitizen from 0/1 to 'Yes'/'No' for consistency with other categorical columns, and drop the non-predictive customerID column.
# Convert 'TotalCharges' to numeric, filling blanks with 0
telco_df['TotalCharges'] = pd.to_numeric(telco_df['TotalCharges'], errors='coerce').fillna(0)
# Convert 'SeniorCitizen' from 0/1 to 'Yes'/'No'
telco_df['SeniorCitizen'] = telco_df['SeniorCitizen'].map({0: 'No', 1: 'Yes'})
# Drop 'customerID' as it is not useful for modelling
telco_df = telco_df.drop(columns=['customerID'])
2.2 Simplify Service-Related Columns¶
We replace "No internet service" and "No phone service" with "No" to reduce redundant categories (the meaning is basically the same).
# Columns with "No internet service"
internet_cols = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
'TechSupport', 'StreamingTV', 'StreamingMovies']
for col in internet_cols:
telco_df[col] = telco_df[col].replace('No internet service', 'No')
# Column with "No phone service"
telco_df['MultipleLines'] = telco_df['MultipleLines'].replace('No phone service', 'No')
2.3 Verify Churn Column Values¶
We check the unique values in the target column Churn before encoding to ensure data quality.
# Checking 'Churn' values
print("Churn unique values:", telco_df['Churn'].unique())
Churn unique values: ['No' 'Yes']
Now, we convert the target variable Churn from 'Yes'/'No' to 1/0, for use in modelling later.
# Encode 'Churn' as binary 0/1 for modelling
telco_df['Churn'] = telco_df['Churn'].map({'No': 0, 'Yes': 1})
2.4 Remove Duplicate Rows¶
Finally, we check for and remove any duplicate rows.
# Drop duplicate rows if any
print(f"Duplicate rows: {telco_df.duplicated().sum()}")
telco_df = telco_df.drop_duplicates()
Duplicate rows: 22
2.5 Final Data Overview¶
Now we can display the cleaned dataset structure, and verify that all of our transformations were applied correctly.
print("\nData Cleaned! Info:")
telco_df.info()
Data Cleaned! Info: <class 'pandas.core.frame.DataFrame'> Index: 7021 entries, 0 to 7042 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 gender 7021 non-null object 1 SeniorCitizen 7021 non-null object 2 Partner 7021 non-null object 3 Dependents 7021 non-null object 4 tenure 7021 non-null int64 5 PhoneService 7021 non-null object 6 MultipleLines 7021 non-null object 7 InternetService 7021 non-null object 8 OnlineSecurity 7021 non-null object 9 OnlineBackup 7021 non-null object 10 DeviceProtection 7021 non-null object 11 TechSupport 7021 non-null object 12 StreamingTV 7021 non-null object 13 StreamingMovies 7021 non-null object 14 Contract 7021 non-null object 15 PaperlessBilling 7021 non-null object 16 PaymentMethod 7021 non-null object 17 MonthlyCharges 7021 non-null float64 18 TotalCharges 7021 non-null float64 19 Churn 7021 non-null int64 dtypes: float64(2), int64(2), object(16) memory usage: 1.1+ MB
print(f"Cleaned dataset: {telco_df.shape[0]} rows, {telco_df.shape[1]} columns")
telco_df.head()
Cleaned dataset: 7021 rows, 20 columns
| gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | No | Yes | No | 1 | No | No | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | 0 |
| 1 | Male | No | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | 0 |
| 2 | Male | No | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | 1 |
| 3 | Male | No | No | No | 45 | No | No | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | 0 |
| 4 | Female | No | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | 1 |
The cleaned dataset maintains all rows from the original data. We've successfully converted TotalCharges to a numeric type with missing values filled, standardised SeniorCitizen to match the Yes/No format of other categorical features, and simplified service-related columns by consolidating "No service" variants into "No". The Churn target variable is now encoded as binary (0/1), and the non-predictive customerID column has been removed. With the 22 duplicate rows removed, the data is ready for analysis and modelling.
We can now export this cleaned dataframe as a fresh CSV file.
# Export for the Server
telco_df.to_csv("../data/clean_data.csv", index=False)
print("clean_data.csv has been saved successfully.")
clean_data.csv has been saved successfully.
3 Prototyping tools¶
Before we build our MCP server, we need to design the tools that Claude (or any AI assistant) will be able to call. Think of these as special functions that the AI can use to interact with our data.
Instead of writing these directly in the server file and hoping they work, we're prototyping them here first. This lets us test and refine the logic in a familiar notebook environment before moving everything to the server.
We'll be creating three main tools:
- plot_distribution: Generates charts showing how values are spread across different columns (e.g., payment methods, monthly charges)
- get_correlation_matrix: Creates a heatmap showing which numerical features are related to each other
- run_pandas_query: Allows filtering the data using simple text queries (e.g., "show me senior citizens paying over £70")
Each function below represents the core logic we'll later wrap in our MCP server.
3.1 Plot Distribution Tool¶
This first tool creates visualisations for any column in the dataset. It automatically detects whether the column contains categories (like "Yes"/"No") or numbers, then picks the appropriate chart type.
def prototype_plot_distribution(column_name):
"""
Generates a plot for a column.
"""
plt.figure(figsize=(8, 5))
# Check if column is categorical or numeric to decide plot type
if telco_df[column_name].dtype == 'object':
# Categorical: Use a Countplot
sns.countplot(y=telco_df[column_name], order=telco_df[column_name].value_counts().index)
plt.title(f"Distribution of {column_name}")
else:
# Numeric: Use a Histogram
sns.histplot(telco_df[column_name], kde=True)
plt.title(f"Distribution of {column_name}")
plt.show() # In the server, this will be plt.savefig()
Let's test the distribution plotter with a categorical column (payment methods):
# Testing on "PaymentMethod", should show a bar chart
prototype_plot_distribution('PaymentMethod')
Now we test with a numerical column (monthly charges):
# Testing on "MonthlyCharges", should show a histogram
prototype_plot_distribution('MonthlyCharges')
3.2 Correlation Matrix Tool¶
Next, we create a correlation matrix tool. This shows which numerical features move together—useful for understanding relationships like "do higher monthly charges correlate with more churn?"
def get_correlation_matrix():
"""
Calculates Pearson correlation between numerical columns and returns a heatmap.
"""
# Select only numeric columns
numeric_df = telco_df.select_dtypes(include=[np.number])
# Calculate correlation matrix
corr_matrix = numeric_df.corr()
# Create heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0,
fmt='.2f', linewidths=0.5, square=True)
plt.title('Correlation Matrix of Numerical Features')
plt.tight_layout()
plt.show() # In the server, this will be plt.savefig()
return corr_matrix
Now we'll test the correlation matrix tool to see relationships between the numerical features:
# Generate and display the correlation matrix heatmap
corr_matrix = get_correlation_matrix()
print("\nCorrelation Matrix Values:")
display(corr_matrix)
Correlation Matrix Values:
| tenure | MonthlyCharges | TotalCharges | Churn | |
|---|---|---|---|---|
| tenure | 1.000000 | 0.245251 | 0.825595 | -0.351508 |
| MonthlyCharges | 0.245251 | 1.000000 | 0.650653 | 0.194508 |
| TotalCharges | 0.825595 | 0.650653 | 1.000000 | -0.197198 |
| Churn | -0.351508 | 0.194508 | -0.197198 | 1.000000 |
3.3 Pandas Query Tool¶
This query tool lets us filter the data using plain text conditions. It's like asking questions of our data: "Show me customers with X and Y characteristics".
def run_pandas_query(query_string: str):
"""
Allows dynamic filtering of the telco dataframe using pandas query syntax.
Example: "SeniorCitizen == 'Yes' and MonthlyCharges > 70"
"""
try:
# Execute the query
filtered_df = telco_df.query(query_string)
# Build summary
summary = {
"total_rows": len(filtered_df),
"original_rows": len(telco_df),
"percentage": f"{(len(filtered_df) / len(telco_df)) * 100:.2f}%"
}
print(f"Query: {query_string}")
print(f"Results: {summary['total_rows']} rows ({summary['percentage']} of data)")
print("-" * 50)
return filtered_df
except Exception as e:
print(f"Error executing query: {e}")
print("Tip: Use column names like 'SeniorCitizen', 'MonthlyCharges', 'Churn', etc.")
return None
Let's test the query tool by finding senior citizens who pay more than £70 per month:
# Show all Senior Citizens who pay more than $70
result = run_pandas_query("SeniorCitizen == 'Yes' and MonthlyCharges > 70")
if result is not None:
display(result.head(10))
Query: SeniorCitizen == 'Yes' and MonthlyCharges > 70 Results: 862 rows (12.28% of data) --------------------------------------------------
| gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | Female | Yes | Yes | No | 71 | Yes | Yes | Fiber optic | Yes | Yes | Yes | Yes | No | No | Two year | Yes | Credit card (automatic) | 96.35 | 6766.95 | 0 |
| 31 | Male | Yes | Yes | No | 2 | Yes | No | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 95.50 | 181.65 | 0 |
| 50 | Female | Yes | No | No | 43 | Yes | Yes | Fiber optic | No | Yes | No | No | Yes | No | Month-to-month | Yes | Electronic check | 90.25 | 3838.75 | 0 |
| 53 | Female | Yes | Yes | No | 8 | Yes | Yes | Fiber optic | No | Yes | No | No | No | No | Month-to-month | Yes | Credit card (automatic) | 80.65 | 633.30 | 1 |
| 54 | Female | Yes | Yes | Yes | 60 | Yes | No | DSL | Yes | Yes | Yes | Yes | No | Yes | One year | Yes | Credit card (automatic) | 74.85 | 4456.35 | 0 |
| 55 | Male | Yes | No | No | 18 | Yes | Yes | Fiber optic | No | No | No | No | Yes | Yes | Month-to-month | Yes | Electronic check | 95.45 | 1752.55 | 1 |
| 57 | Male | Yes | Yes | Yes | 66 | Yes | Yes | Fiber optic | No | Yes | Yes | Yes | Yes | Yes | One year | Yes | Electronic check | 108.45 | 7076.35 | 0 |
| 72 | Male | Yes | Yes | Yes | 64 | Yes | Yes | Fiber optic | Yes | No | Yes | Yes | Yes | Yes | Two year | Yes | Bank transfer (automatic) | 111.60 | 7099.00 | 0 |
| 75 | Female | Yes | No | No | 56 | Yes | Yes | Fiber optic | Yes | Yes | Yes | No | Yes | Yes | One year | No | Electronic check | 110.50 | 6139.50 | 0 |
| 78 | Male | Yes | No | No | 30 | Yes | No | DSL | Yes | Yes | No | No | Yes | Yes | Month-to-month | Yes | Electronic check | 74.75 | 2111.30 | 0 |
3.4 Regression Tool¶
Now we'll prototype a churn prediction tool. This uses machine learning to predict whether a customer is likely to leave based on their characteristics.
# Global variables to store the trained model and preprocessors
churn_model = None
scaler = None
label_encoders = {}
feature_columns = None
The below function trains our prediction model. It processes the data, encodes categories into numbers, and learns patterns that indicate churn risk.
def train_churn_model():
"""
Trains a logistic regression model on the cleaned telco data.
Call this on server startup to prepare the model for predictions.
"""
global churn_model, scaler, label_encoders, feature_columns
# Load the cleaned data
df = pd.read_csv("../data/clean_data.csv")
# Define features (exclude target)
feature_columns = [col for col in df.columns if col != 'Churn']
X = df[feature_columns].copy()
y = df['Churn']
# Encode categorical columns
categorical_cols = X.select_dtypes(include=['object']).columns
for col in categorical_cols:
le = LabelEncoder()
X[col] = le.fit_transform(X[col])
label_encoders[col] = le
# Scale numerical features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
X_scaled, y, test_size=0.2, random_state=42
)
# Train logistic regression
churn_model = LogisticRegression(max_iter=1000, random_state=42)
churn_model.fit(X_train, y_train)
# Evaluate
train_acc = churn_model.score(X_train, y_train)
test_acc = churn_model.score(X_test, y_test)
print(f"✅ Churn model trained successfully!")
print(f" Training accuracy: {train_acc:.2%}")
print(f" Test accuracy: {test_acc:.2%}")
return churn_model
Once trained, this prediction function takes a customer's details and estimates their likelihood of churning. It also provides a risk level and actionable recommendation.
def predict_churn(
gender: str,
senior_citizen: str,
partner: str,
dependents: str,
tenure: int,
phone_service: str,
multiple_lines: str,
internet_service: str,
online_security: str,
online_backup: str,
device_protection: str,
tech_support: str,
streaming_tv: str,
streaming_movies: str,
contract: str,
paperless_billing: str,
payment_method: str,
monthly_charges: float,
total_charges: float
) -> dict:
"""
Predicts the probability of a customer churning based on their attributes.
Returns a dict with:
- churn_probability: float (0-100%)
- risk_level: str ('Low', 'Medium', 'High')
- recommendation: str (business advice)
"""
global churn_model, scaler, label_encoders, feature_columns
if churn_model is None:
return {"error": "Model not trained. Call train_churn_model() first."}
# Build input dataframe
input_data = pd.DataFrame([{
'gender': gender,
'SeniorCitizen': senior_citizen,
'Partner': partner,
'Dependents': dependents,
'tenure': tenure,
'PhoneService': phone_service,
'MultipleLines': multiple_lines,
'InternetService': internet_service,
'OnlineSecurity': online_security,
'OnlineBackup': online_backup,
'DeviceProtection': device_protection,
'TechSupport': tech_support,
'StreamingTV': streaming_tv,
'StreamingMovies': streaming_movies,
'Contract': contract,
'PaperlessBilling': paperless_billing,
'PaymentMethod': payment_method,
'MonthlyCharges': monthly_charges,
'TotalCharges': total_charges
}])
# Encode categorical columns using saved encoders
for col, le in label_encoders.items():
if col in input_data.columns:
try:
input_data[col] = le.transform(input_data[col])
except ValueError as e:
return {"error": f"Invalid value for {col}: {input_data[col].values[0]}"}
# Ensure column order matches training
input_data = input_data[feature_columns]
# Scale features
input_scaled = scaler.transform(input_data)
# Predict probability
proba = churn_model.predict_proba(input_scaled)[0][1]
churn_percentage = proba * 100
# Determine risk level and recommendation
if churn_percentage < 30:
risk_level = "Low"
recommendation = "Customer appears stable. Continue standard engagement."
elif churn_percentage < 60:
risk_level = "Medium"
recommendation = "Monitor this customer. Consider proactive outreach or loyalty offers."
else:
risk_level = "High"
recommendation = "High churn risk! Recommend immediate intervention: discount offers, contract upgrade incentives, or personalized retention call."
return {
"churn_probability": f"{churn_percentage:.1f}%",
"risk_level": risk_level,
"recommendation": recommendation
}
Training the model on our cleaned data:
# Train the model
train_churn_model()
✅ Churn model trained successfully! Training accuracy: 80.36% Test accuracy: 79.57%
LogisticRegression(max_iter=1000, random_state=42)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Parameters
The model has been trained successfully. The training accuracy shows how well the model fits the training data, whilst the test accuracy indicates how well it generalises to unseen customers.
A test accuracy above 70% is acceptable for this use case: it means the model can identify roughly 7 out of 10 at-risk customers correctly.
The small gap between training and test accuracy suggests the model isn't overfitting, which means it should perform reliably on new customer data.
We will now test the prediction with a high-risk customer profile (new customer, month-to-month contract, no add-ons):
# Predict churn for a sample customer (High Risk Profile)
result = predict_churn(
gender='Female',
senior_citizen='No',
partner='No',
dependents='No',
tenure=1,
phone_service='Yes',
multiple_lines='No',
internet_service='Fiber optic',
online_security='No',
online_backup='No',
device_protection='No',
tech_support='No',
streaming_tv='No',
streaming_movies='No',
contract='Month-to-month',
paperless_billing='Yes',
payment_method='Electronic check',
monthly_charges=70.0,
total_charges=70.0
)
print("\nPrediction Result:")
for key, value in result.items():
print(f"{key}: {value}")
Prediction Result: churn_probability: 65.6% risk_level: High recommendation: High churn risk! Recommend immediate intervention: discount offers, contract upgrade incentives, or personalized retention call.
Now testing with a low-risk customer profile (long tenure, 2-year contract, stable payment method):
# Predict churn for a sample customer (Low Risk Profile)
result_low = predict_churn(
gender='Male',
senior_citizen='No',
partner='Yes',
dependents='Yes',
tenure=72,
phone_service='Yes',
multiple_lines='Yes',
internet_service='No',
online_security='No',
online_backup='No',
device_protection='No',
tech_support='No',
streaming_tv='No',
streaming_movies='No',
contract='Two year',
paperless_billing='No',
payment_method='Mailed check',
monthly_charges=25.0,
total_charges=1800.0
)
print("\nPrediction Result (Low Risk):")
for key, value in result_low.items():
print(f"{key}: {value}")
Prediction Result (Low Risk): churn_probability: 0.2% risk_level: Low recommendation: Customer appears stable. Continue standard engagement.
The two test cases demonstrate how the model identifies churn risk based on customer characteristics:
High-Risk Profile (65.6% churn probability): The first customer shows a high churn probability due to several red flags: very short tenure (1 month), month-to-month contract (no commitment), fibre optic internet (higher cost), no value-added services (security, backup, support), and electronic cheque payment (higher churn correlation). These factors combine to indicate an unstable customer relationship. The model correctly flags this as requiring immediate intervention.
Low-Risk Profile (0.2% churn probability): The second customer shows an extremely low churn probability because of stabilising factors: long tenure (72 months), two-year contract (strong commitment), family ties (partner and dependents), basic phone service only (lower cost), and mailed cheque payment (traditional, established customer). This profile suggests a deeply embedded customer unlikely to leave.
The large difference between 65.6% and 0.2% demonstrates that the model has learnt meaningful relationships from the training data.
4 Next Steps¶
Now that we've prototyped and validated all our tools, we need to deploy them as an MCP server that our LLM (in this case, Claude Desktop) can communicate with. This involves three main steps:
- Create the server.py file: Wrap our prototype functions in MCP tool definitions
- Configure the batch script: Write a
start_server.batfile to launch the server with the correct environment - Connect Claude Desktop: Update Claude's configuration to recognise our new server
The MCP server acts as a bridge between Claude and our data analysis tools. When we ask Claude a question like "Show me the distribution of payment methods", Claude will automatically call the appropriate tool through this server.
4.1 Setting Up server.py¶
We create a server.py file that imports our prototype functions and exposes them as MCP tools. Each tool will have a name, description, and parameter schema to ensure Claude can understand how to use them.
This server will run continuously, listening for requests from Claude Desktop.
4.2 Using the Batch Script¶
To run the MCP server, we create a batch script named start_server.bat. This script ensures that the server runs within the correct Conda environment that has all necessary dependencies installed.
This script:
- Activates the
mcp-analyst_py311Conda environment (which has pandas, scikit-learn, matplotlib, etc.) - Runs the server using the environment's Python interpreter
Important: Claude Desktop will launch it automatically when configured, there is no need to run it manually.
4.3 Configuring Claude Desktop¶
To connect Claude to our MCP server, we edit the Claude Desktop configuration file.
Location: %APPDATA%\Claude\claude_desktop_config.json
Inside the mcpServers section, we add this entry:
{
"mcpServers": {
"telco-analyst": {
"command": "cmd.exe",
"args": [
"/c",
"FULL PATH TO THE BATCH SCRIPT FILE"
]
}
}
}
What this does:
- Registers a new MCP server called "telco-analyst"
- Tells Claude to launch it by running our batch script
- The server starts automatically when we open Claude Desktop
5 Testing the MCP Server with Claude Desktop¶
Now that we've set up the MCP server and connected it to Claude Desktop, it's time to test everything to ensure it's working correctly. As seen from the below screenshot, Claude has successfully detected our MCP server and is ready to use the tools we've implemented.
We can ask Claude a few questions that should trigger our MCP tools.
5.1 Distribution Plot Test¶
- "Show me the distribution of MonthlyCharges"
5.2 Correlation Matrix Test¶
- "What's the correlation between tenure and churn?"
5.3 Predict Churn Risk¶
- "Predict churn risk for a customer: Female, not a senior citizen, no partner, no dependents, tenure of 3 months, has phone service with no multiple lines, fibre optic internet, no online security, no online backup, no device protection, no tech support, has streaming TV and movies, month-to-month contract, paperless billing, pays by electronic cheque, monthly charges £85, total charges £255"
5.4 Summary¶
The screenshots above demonstrate that our MCP server integrates successfully with Claude Desktop. When we asked Claude to "show me the distribution of MonthlyCharges", it automatically called the plot_distribution tool and returned a histogram visualisation. For the correlation query, Claude invoked the get_correlation_matrix tool and provided insights about the relationship between tenure and churn.
Most importantly, the churn prediction test confirmed that our predict_churn tool works end-to-end. Claude correctly parsed the customer attributes from natural language, passed them to our trained logistic regression model, and returned a risk assessment with an actionable recommendation.
6 Conclusion¶
This notebook demonstrates a complete proof of concept for building an MCP (Model Context Protocol) server that bridges AI assistants with real-world data and analytical capabilities. Starting from a raw telecommunications dataset, we cleaned and prepared the data, then prototyped four distinct tools: distribution plotting, correlation analysis, flexible data querying, and churn prediction using logistic regression. Each tool was tested in the notebook environment before being deployed as part of an MCP server that Claude Desktop can access directly.
MCP democratises data access, allowing non-technical users to query complex datasets and generate visualisations through natural language without writing code. It provides a standardised integration protocol, reducing the custom development needed to connect AI assistants to different data sources. For enterprises, this means exposing internal databases, machine learning models, and business logic to AI assistants in a controlled manner.
To extend this project, we could add more sophisticated tools such as automated report generation, real-time data connections, or ensemble models for improved prediction accuracy. The approach demonstrated here is transferable to virtually any domain: customer analytics, financial reporting, inventory management, or scientific research.
7 References¶
- Model Context Protocol (MCP): https://modelcontextprotocol.io/
- MCP Python SDK: https://github.com/modelcontextprotocol/python-sdk
- Telco Customer Churn Dataset: https://www.kaggle.com/datasets/blastchar/telco-customer-churn
- Claude Desktop: https://claude.ai/download