# Predicting Apple’s Future: Testing 6 Forecasting Models to Uncover Market Trends (2018–2023)

**Finding the best prediction model among SVR, KNN, Naive Bayes, Decision Tree, Random Forest, and AdaBoost.**

Apple is a well-known company that needs no introduction. As a data science engineer, I decided to use my skills to predict stock prices as a side project. One day, while looking at my iPhone, it hit me: why not use my expertise to predict Apple’s stock prices using various models and find out which one works best?

**Dataset**

The data I used comes from Kaggle’s dataset at https://www.kaggle.com/datasets/guillemservera/aapl-stock-data. Kaggle is known for providing clean datasets, but I still performed a few checks, such as looking for missing values and outliers. As expected, the data was clean. Although the complete dataset spans from the 1980s to 2024, I focused on a smaller range for my experiment: 2018 to 2023.

It includes information such as the opening price, closing price, high and low prices, volume of shares traded, and other metrics like adjusted close, change percent, and average volume over 20 days.
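The date filtering and missing-value check described above can be sketched roughly as follows. This is only an illustration, not the code used for the article: `select_period` is a hypothetical helper, and the four-row frame with placeholder prices stands in for the Kaggle CSV. Only the `date` and `close` column names are taken from the snippets in this article.

```python
import pandas as pd


def select_period(frame, start="2018-01-01", end="2023-12-31"):
    """Keep rows whose date falls inside [start, end], sorted by date."""
    frame = frame.copy()
    frame["date"] = pd.to_datetime(frame["date"])
    mask = (frame["date"] >= pd.Timestamp(start)) & (frame["date"] <= pd.Timestamp(end))
    return frame.loc[mask].sort_values("date").reset_index(drop=True)


# Placeholder rows (not real prices) standing in for the full dataset.
raw = pd.DataFrame({
    "date": ["2017-12-29", "2018-01-02", "2023-12-29", "2024-01-02"],
    "close": [1.0, 2.0, 3.0, 4.0],
})

data = select_period(raw)

# Count missing values per column; all zeros means nothing needs filling.
missing_per_column = data.isna().sum()
print(data)
print(missing_per_column)
```

With the real file, `raw` would come from `pd.read_csv(...)` on the downloaded CSV instead of the inline frame.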

**Let’s Predict the Future**

**Prediction Using SVR Model**

Support Vector Regression (SVR) is a machine learning technique used to predict continuous values, like stock prices. Unlike traditional linear regression, SVR creates a **margin** or a **boundary** around its prediction (a tube whose width is set by the `epsilon` parameter) and ignores errors that fall inside it, so small deviations do not pull the fit around. With a non-linear kernel such as RBF, it can also adapt to non-linear trends, which makes it a versatile candidate for financial data analysis.
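To make the margin idea concrete, here is a minimal sketch of the epsilon-insensitive loss that SVR is built around. The function name and the sample numbers are made up for illustration; they are not taken from the dataset.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Average epsilon-insensitive loss: errors inside the tube cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical closing prices and predictions, for illustration only.
actual = [150.0, 152.0, 149.0, 155.0]
predicted = [150.05, 151.5, 149.0, 158.0]

# The first and third predictions fall inside the tube and contribute zero.
print(epsilon_insensitive_loss(actual, predicted, epsilon=0.1))
```

A larger `epsilon` tolerates bigger misses before they count, which is why it appears as a tunable value in the grid search later on.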

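`# Imports used by the snippets in this article`

import numpy as np

import pandas as pd

from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor

from sklearn.model_selection import GridSearchCV, train_test_split

from sklearn.naive_bayes import GaussianNB

from sklearn.neighbors import KNeighborsRegressor

from sklearn.preprocessing import StandardScaler

from sklearn.svm import SVR

from sklearn.tree import DecisionTreeRegressor

# `data` is assumed to be the Kaggle CSV already loaded into a DataFrame and filtered to 2018–2023

# Derived the numeric 'timestamp' feature used below from the 'date' column

data['date'] = pd.to_datetime(data['date'])

data['timestamp'] = data['date'].astype(np.int64) // 10**9

# Selected features and target variable from the dataset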

features = ['open', 'high', 'low', 'volume', 'adjusted_close', 'change_percent', 'avg_vol_20d', 'timestamp']

X = data[features]

y = data['close']

# Normalized the data

# This step gives all features a similar scale, so that no feature dominates model training because of its larger magnitude.

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train the SVR model

svr = SVR(kernel='rbf', C=1.0, gamma='scale')

svr.fit(X_train, y_train)

# Make predictions and evaluate the model

y_pred = svr.predict(X_test)

The result we got is:

Mean Absolute Error (MAE): 10.835217345694337

Mean Squared Error (MSE): 1052.5231545431413

R-squared: 0.7202456891060729
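The snippet above ends at `y_pred`, so the step that produces these three numbers is not shown. As a minimal sketch of what each metric measures, here are the formulas in plain Python. The helper names and the sample values are illustrative; in scikit-learn the equivalent functions are `mean_absolute_error`, `mean_squared_error`, and `r2_score` from `sklearn.metrics`, called with `y_test` and `y_pred`.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the miss, in price units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: like MAE, but large misses are penalized more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R-squared: share of the target's variance explained by the model."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, for illustration only.
y_true_demo = [100.0, 110.0, 120.0, 130.0]
y_pred_demo = [102.0, 108.0, 121.0, 127.0]

print(mae(y_true_demo, y_pred_demo))
print(mse(y_true_demo, y_pred_demo))
print(r_squared(y_true_demo, y_pred_demo))
```

Because MSE squares each error, a handful of large misses can produce a high MSE even when the MAE looks small.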

As you can see, these results are not good. To improve the accuracy of my model, I started using **hyperparameter tuning with GridSearchCV**.

Hyperparameter tuning with GridSearchCV is a method to find the best settings for a machine learning model. Hyperparameters are the parameters you set before training, like the learning rate in a neural network or the type of kernel in a Support Vector Machine.

GridSearchCV helps you find the best combination of these hyperparameters. It creates a *grid* of different options and tests each combination using **cross-validation**, which means dividing the training data into parts, training on some, and validating on the others, so that each score does not depend on a single lucky split.

In a nutshell, GridSearchCV lets you systematically try different hyperparameter values to find the one that makes your model perform best. This makes your model more **reliable** and **effective** without having to guess the best settings.
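To show what "trying every combination" amounts to, here is a small sketch that expands a grid and builds the folds by hand. The grid values are copied from the SVR grid used in the next section; `expand_grid` and `k_fold_indices` are hypothetical helpers written for this illustration, not part of scikit-learn, and the 10-sample size is arbitrary.

```python
from itertools import product

# Same values as the SVR grid used later in this article.
param_grid = {
    "kernel": ["rbf", "linear", "poly"],
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", "auto"],
    "epsilon": [0.1, 0.01, 0.001],
}


def expand_grid(grid):
    """Return one dict per combination of hyperparameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


candidates = expand_grid(param_grid)
folds = list(k_fold_indices(n_samples=10, n_splits=5))

# Every candidate is trained once per fold.
print(len(candidates), "candidates x", len(folds), "folds =", len(candidates) * len(folds), "fits")
```

The count grows multiplicatively with every added hyperparameter, which is why grids are usually kept small.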

**Prediction using SVR Model (Using GridSearchCV)**

`# Created additional features`

data['date'] = pd.to_datetime(data['date'])

data['timestamp'] = data['date'].astype(np.int64) // 10**9

data['year'] = data['date'].dt.year

data['month'] = data['date'].dt.month

data['day'] = data['date'].dt.day

data['day_of_week'] = data['date'].dt.dayofweek

data['moving_avg_20'] = data['close'].rolling(window=20).mean()

data['price_diff'] = data['close'].diff()

# Fill NaN values (like in moving_avg_20); DataFrame.bfill() replaces the deprecated fillna(method='bfill')

data.bfill(inplace=True)

# Selected features and target variable

features = ['open', 'high', 'low', 'volume', 'adjusted_close', 'change_percent', 'avg_vol_20d', 'timestamp', 'year', 'month', 'day', 'day_of_week', 'moving_avg_20', 'price_diff']

X = data[features]

y = data['close']

# Normalized the data

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Hyperparameter tuning with GridSearchCV

param_grid = {

'kernel': ['rbf', 'linear', 'poly'],

'C': [0.1, 1, 10, 100],

'gamma': ['scale', 'auto'],

'epsilon': [0.1, 0.01, 0.001]

}

svr = SVR()

grid_search = GridSearchCV(svr, param_grid, scoring='neg_mean_absolute_error', cv=5)

grid_search.fit(X_train, y_train)

# The best model is retrieved (best_estimator_), with the best hyperparameters stored in best_params_

best_model = grid_search.best_estimator_

# Made predictions and evaluated the model

y_pred = best_model.predict(X_test)

The result we got is:

Best Hyperparameters: {'C': 100, 'epsilon': 0.1, 'gamma': 'scale', 'kernel': 'linear'}

Mean Absolute Error (MAE): 0.9939365677497963

Mean Squared Error (MSE): 31.075102481634186

R-squared: 0.9917404250508093

**Prediction using KNN Model**

K-Nearest Neighbors (KNN) is a simple yet effective machine learning technique used for both classification and regression tasks. In KNN, the prediction for a data point is made based on the “k” closest points in the dataset, known as “neighbors.”

For stock price prediction, KNN can be useful because it doesn’t make strong assumptions about the underlying data distribution and is flexible with non-linear patterns. The idea is that similar data points tend to have similar outcomes, so by considering a neighborhood of data points, KNN can make predictions based on these similarities.
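The neighbor idea fits in a few lines of plain Python. The sketch below is an illustration under simple assumptions (Manhattan distance, inverse-distance weights, made-up one-feature data); `knn_predict` is a hypothetical helper, not the scikit-learn implementation.

```python
def knn_predict(train_X, train_y, query, k=5, weighted=True):
    """Predict as the (optionally distance-weighted) mean of the k nearest targets."""
    # Manhattan distance: sum of absolute coordinate differences.
    distances = [
        (sum(abs(a - b) for a, b in zip(row, query)), target)
        for row, target in zip(train_X, train_y)
    ]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    if not weighted:
        return sum(t for _, t in nearest) / len(nearest)
    # An exact match takes all the weight, which also avoids dividing by zero.
    exact = [t for d, t in nearest if d == 0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)


# Made-up single-feature data, for illustration only.
train_X = [[1.0], [2.0], [3.0], [10.0]]
train_y = [10.0, 20.0, 30.0, 100.0]

print(knn_predict(train_X, train_y, [2.25], k=2, weighted=False))
print(knn_predict(train_X, train_y, [2.25], k=2, weighted=True))
```

With uniform weights both neighbors count equally; with distance weights the closer neighbor pulls the prediction toward its own target.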

`# Selected features and target variable`

features = ['open', 'high', 'low', 'volume', 'adjusted_close', 'change_percent', 'avg_vol_20d', 'timestamp', 'year', 'month', 'day', 'day_of_week', 'moving_avg_20', 'price_diff']

X = data[features]

y = data['close']

# Normalized the data as it is crucial for algorithms like KNN, where distance measurement is important.

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Hyperparameter tuning with GridSearchCV for KNN

param_grid = {

'n_neighbors': [3, 5, 7, 9, 11], # Experiment with different 'k' values

'weights': ['uniform', 'distance'], # Trying both uniform and distance weighting

'metric': ['euclidean', 'manhattan', 'minkowski'] # Distance metrics to try; with the default p=2, 'minkowski' is the same as 'euclidean'

}

knn = KNeighborsRegressor()

grid_search = GridSearchCV(knn, param_grid, scoring='neg_mean_absolute_error', cv=5)

grid_search.fit(X_train, y_train)

best_model = grid_search.best_estimator_

# Made predictions and evaluated the model

y_pred = best_model.predict(X_test)

The result we got is:

Best Hyperparameters: {'metric': 'manhattan', 'n_neighbors': 5, 'weights': 'distance'}

Mean Absolute Error (MAE): 2.9743789532221245

Mean Squared Error (MSE): 15.397849018573007

R-squared: 0.9959073445340882

**Prediction using Naive Bayes Model**

Naive Bayes is a simple yet powerful classification algorithm based on Bayes’ Theorem. It assumes that each feature contributes independently to the outcome, which is the “naive” assumption that gives the method its name. Because it is a classifier, I used it to predict stock price movement rather than the price itself: it classifies whether a stock’s price is likely to go up or down based on specific features.
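As a rough sketch of how a Gaussian Naive Bayes classifier reaches a decision, the code below estimates a mean and variance per feature and per class, then picks the class with the highest log-probability. The helper names and the two-feature sample rows are made up for illustration; this is not the scikit-learn implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Return {label: (prior, feature means, feature variances)}."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        # Summing per-feature terms is exactly the independence assumption.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


# Made-up scaled feature rows and labels, for illustration only.
X_demo = [[1.0, 0.5], [1.2, 0.7], [-1.0, -0.4], [-0.8, -0.6]]
y_demo = ["up", "up", "down", "down"]

nb_model = fit_gaussian_nb(X_demo, y_demo)
print(predict_gaussian_nb(nb_model, [1.1, 0.6]))
print(predict_gaussian_nb(nb_model, [-0.9, -0.5]))
```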

`# Creating a binary target variable indicating stock price movement`

data['date'] = pd.to_datetime(data['date'])

data['price_change'] = data['close'].diff() # Price change from the previous day

data['movement'] = np.where(data['price_change'] > 0, 'up', 'down') # 'up' if the price increased, 'down' otherwise (unchanged prices and the first row count as 'down')

# Filled NaN values (resulting from diff()); DataFrame.bfill() replaces the deprecated fillna(method='bfill')

data.bfill(inplace=True)

# Selected features and target variable

features = ['open', 'high', 'low', 'volume', 'adjusted_close', 'change_percent', 'avg_vol_20d']

X = data[features]

y = data['movement'] # Target variable: 'up' or 'down'

# Normalized the data

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Training a Naive Bayes classifier

nb = GaussianNB()

nb.fit(X_train, y_train)

# Make predictions and evaluate the model

y_pred = nb.predict(X_test)

The result we got is:

Accuracy: 0.9006622516556292

**Prediction using Decision Tree Model**

Decision Trees are used because they’re easy to understand and visualize, making them a great choice when you need a simple way to explain complex decisions. In a Decision Tree, you start with a big question and then split into smaller questions, just like a tree with branches.
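Each "smaller question" in a regression tree is a threshold on one feature, chosen so that the targets on each side are as similar as possible. The sketch below finds one such split by minimizing the total squared error. `best_split` is a hypothetical helper and the numbers are made up for illustration.

```python
def best_split(values, targets):
    """Find the threshold on one feature that minimizes total squared error."""
    def sse(group):
        # Sum of squared errors around the group's own mean.
        if not group:
            return 0.0
        mean = sum(group) / len(group)
        return sum((t - mean) ** 2 for t in group)

    best_threshold, best_error = None, float("inf")
    unique = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(unique, unique[1:]):
        threshold = (lo + hi) / 2
        left = [t for v, t in zip(values, targets) if v <= threshold]
        right = [t for v, t in zip(values, targets) if v > threshold]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


# Made-up feature values and targets, for illustration only.
feature_demo = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
target_demo = [5.0, 6.0, 5.0, 20.0, 21.0, 20.0]

split_threshold, split_error = best_split(feature_demo, target_demo)
print(split_threshold, split_error)
```

A full tree repeats this search on each side of the split until a stopping rule such as `max_depth` or `min_samples_leaf` is reached.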

`# Selected features and target variable`

features = ['open', 'high', 'low', 'volume', 'adjusted_close', 'change_percent', 'avg_vol_20d', 'timestamp', 'year', 'month', 'day_of_week', 'moving_avg_20', 'price_diff']

X = data[features]

y = data['close']

# Normalized the data

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Created a Decision Tree Regressor with hyperparameter tuning

param_grid = {

'max_depth': [3, 5, 7, 10], # Controls overfitting

'min_samples_split': [2, 5, 10], # Minimum samples required for a split

'min_samples_leaf': [1, 2, 4], # Minimum samples required for a leaf node

}

decision_tree = DecisionTreeRegressor()

grid_search = GridSearchCV(decision_tree, param_grid, scoring='neg_mean_squared_error', cv=5)

grid_search.fit(X_train, y_train)

best_model = grid_search.best_estimator_

# Make predictions and evaluate the model

y_pred = best_model.predict(X_test)

The result we got is:

Best Hyperparameters: {'max_depth': 10, 'min_samples_leaf': 1, 'min_samples_split': 5}

Mean Absolute Error (MAE): 1.2140667648368844

Mean Squared Error (MSE): 6.3588971045335985

R-squared: 0.9983098434748484

**Prediction using Random Forest Model**

Random Forest is a more advanced version of Decision Trees. Instead of just one Decision Tree, you build many trees, typically each on a random sample of the training data, and then combine their results to get a final prediction. The idea is that by using multiple trees, you get a more reliable and accurate result. It’s like asking a bunch of experts for their opinion instead of just one: it helps to avoid mistakes and get a more balanced answer.
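The "many experts" idea can be sketched with bootstrap samples and averaging. To keep the example short, each "tree" here is a one-split stump, which is a deliberate simplification of a real Random Forest. All helper names and data points are hypothetical.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(sample):
    """Tiny stand-in for a tree: one split at the median feature value."""
    xs = sorted(x for x, _ in sample)
    threshold = xs[len(xs) // 2]
    left = [t for x, t in sample if x < threshold]
    right = [t for x, t in sample if x >= threshold]
    overall = sum(t for _, t in sample) / len(sample)
    left_mean = sum(left) / len(left) if left else overall
    right_mean = sum(right) / len(right) if right else overall
    return lambda x: left_mean if x < threshold else right_mean


def bagged_predict(models, x):
    """Average the predictions of all models."""
    return sum(model(x) for model in models) / len(models)


# Made-up (feature, target) pairs, for illustration only.
rows = [(1.0, 10.0), (2.0, 12.0), (3.0, 11.0), (8.0, 30.0), (9.0, 32.0), (10.0, 31.0)]
rng = random.Random(42)

# Each stump sees a different bootstrap sample, so each makes different mistakes.
forest = [fit_stump(bootstrap_sample(rows, rng)) for _ in range(50)]

low = bagged_predict(forest, 1.5)
high = bagged_predict(forest, 9.5)
print(low, high)
```

Averaging smooths out the quirks of any single stump, which is the same reason the `n_estimators` setting matters in the grid below.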

`# Selected features and target variable`

features = ['open', 'high', 'low', 'volume', 'adjusted_close', 'change_percent', 'avg_vol_20d', 'timestamp', 'year', 'month', 'day', 'day_of_week', 'moving_avg_20', 'price_diff']

X = data[features]

y = data['close']

# Normalized the data

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Splitting the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Hyperparameter tuning with GridSearchCV for Random Forest

param_grid = {

'n_estimators': [50, 100, 200], # Number of trees in the forest

'max_depth': [3, 5, 10], # Maximum depth of the trees

'min_samples_split': [2, 5], # Minimum samples required to split a node

'min_samples_leaf': [1, 2, 4], # Minimum samples required to be at a leaf node

'bootstrap': [True, False], # Whether bootstrap samples are used

}

random_forest = RandomForestRegressor()

grid_search = GridSearchCV(random_forest, param_grid, scoring='neg_mean_squared_error', cv=5)

grid_search.fit(X_train, y_train)

# Get the best model and its parameters

best_model = grid_search.best_estimator_

best_params = grid_search.best_params_

# Make predictions and evaluate the model

y_pred = best_model.predict(X_test)

The result we got is:

Best Hyperparameters: {'bootstrap': True, 'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 100}

Mean Absolute Error (MAE): 0.8446546479035453

Mean Squared Error (MSE): 2.3546977587982285

R-squared: 0.9993741355275343

**Prediction using AdaBoost Model**

AdaBoost, short for **Adaptive Boosting**, is a machine learning technique that combines multiple weak learners to create a strong model. A weak learner is a simple model that doesn’t do well on its own but can contribute to a more robust model when combined with others.

In AdaBoost, we start with a **basic model** and then focus on the errors it makes. The next model is trained to correct those errors. This process is repeated, with each new model focusing more on the data points that the previous ones **predicted poorly** (or, in a classification setting, misclassified).
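The reweighting step is easiest to see in the classic binary-classification form of AdaBoost, sketched below. This is an illustration of the idea only: scikit-learn's `AdaBoostRegressor` uses the AdaBoost.R2 variant, which reweights samples by the relative size of their error instead of a right/wrong flag. `adaboost_round` is a hypothetical helper and assumes the weighted error lies strictly between 0 and 1.

```python
import math


def adaboost_round(weights, is_wrong):
    """One boosting round: return the learner's vote and the updated sample weights."""
    # Weighted error: total weight of the samples this learner got wrong.
    error = sum(w for w, wrong in zip(weights, is_wrong) if wrong)
    # alpha grows as the error shrinks; it is the learner's say in the final vote.
    alpha = 0.5 * math.log((1 - error) / error)
    # Raise the weight of mistakes, lower the weight of correct samples.
    updated = [
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, is_wrong)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Five equally weighted samples; the weak learner gets only the first one wrong.
weights = [0.2] * 5
is_wrong = [True, False, False, False, False]

alpha, new_weights = adaboost_round(weights, is_wrong)
print(alpha)
print(new_weights)
```

After one round the single missed sample carries as much weight as the other four combined, so the next weak learner is pushed to get it right.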

`# Selected features and target variable`

features = ['open', 'high', 'low', 'volume', 'adjusted_close', 'change_percent', 'avg_vol_20d', 'timestamp', 'year', 'month', 'day_of_week', 'moving_avg_20', 'price_diff']

X = data[features]

y = data['close']

# Normalized the data

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Hyperparameter tuning with GridSearchCV for AdaBoost

param_grid = {

'n_estimators': [50, 100, 200], # Number of boosting stages

'learning_rate': [0.01, 0.1, 1], # Learning rate

'base_estimator__max_depth': [1, 2, 3], # Depth of base decision tree

}

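# Using a DecisionTreeRegressor as the base estimator

base_estimator = DecisionTreeRegressor()

# AdaBoost with GridSearchCV

# Note: scikit-learn 1.2 renamed this argument to `estimator`, and `base_estimator` was removed in 1.4; on newer versions use estimator=... here and 'estimator__max_depth' in param_grid

adaboost = AdaBoostRegressor(base_estimator=base_estimator)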

grid_search = GridSearchCV(adaboost, param_grid, scoring='neg_mean_squared_error', cv=5)

grid_search.fit(X_train, y_train)

# Getting the best model and its parameters

best_model = grid_search.best_estimator_

best_params = grid_search.best_params_

# Made predictions and evaluated the model

y_pred = best_model.predict(X_test)

The result we got is:

Best Hyperparameters: {'base_estimator__max_depth': 3, 'learning_rate': 0.1, 'n_estimators': 200}

Mean Absolute Error (MAE): 4.0444096617399286

Mean Squared Error (MSE): 24.959422476673847

R-squared: 0.9933659359367698

# Let’s Find the Best Model

After tuning, all the **models scored above 0.9** on the held-out test set (R-squared for the five regression models, accuracy for the Naive Bayes classifier). On this split, every supervised machine learning method I tried performs well.

**Random Forest** is the chosen model because it has the **best score**: the lowest MAE and MSE and the highest R-squared of the regression models. The **Random Forest model** can also handle big data with numerous variables and process it quickly, making it well suited to this use case.
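For a side-by-side view, the short script below ranks the tuned regression models by test-set MSE. The numbers are copied from the results reported above; Naive Bayes is left out because it is a classifier scored by accuracy, which is not comparable with MSE. The dictionary layout is just one convenient way to organize them.

```python
# Test-set metrics copied from the results reported in this article.
regression_results = {
    "SVR (tuned)": {"mae": 0.9939365677497963, "mse": 31.075102481634186, "r2": 0.9917404250508093},
    "KNN": {"mae": 2.9743789532221245, "mse": 15.397849018573007, "r2": 0.9959073445340882},
    "Decision Tree": {"mae": 1.2140667648368844, "mse": 6.3588971045335985, "r2": 0.9983098434748484},
    "Random Forest": {"mae": 0.8446546479035453, "mse": 2.3546977587982285, "r2": 0.9993741355275343},
    "AdaBoost": {"mae": 4.0444096617399286, "mse": 24.959422476673847, "r2": 0.9933659359367698},
}

# Lower MSE is better, so an ascending sort puts the best model first.
ranked = sorted(regression_results.items(), key=lambda item: item[1]["mse"])

for name, metrics in ranked:
    print(f"{name:<14} MSE={metrics['mse']:.2f}  MAE={metrics['mae']:.2f}  R2={metrics['r2']:.4f}")
```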

Thanks for reading the article! I would love to hear your feedback, and if you need any help, feel free to contact me.