Logistic Regression, Decision Tree and Random Forest


Welcome to Module 3 of our journey through Machine Learning with Python! In this lesson, we're going to explore three powerful algorithms for classification tasks: Logistic Regression, Decision Trees, and Random Forest. Understanding these algorithms is essential for building predictive models in various domains, from healthcare to finance. By the end of this lesson, you'll have a solid understanding of the intuition behind each algorithm and how to implement them using scikit-learn in Python.

 

Logistic Regression

Logistic Regression is a widely used algorithm for binary classification problems, where the target variable has two possible outcomes. It also extends naturally to multiclass problems, such as the three-class Iris example below. Despite its name, Logistic Regression is a classification algorithm rather than a regression algorithm: it estimates the probability that a given input belongs to a particular class using the logistic function, then assigns the input to the most probable class.

 

The logistic function, also known as the sigmoid function, maps any real-valued number to the open interval (0, 1), making it suitable for modeling probabilities.

 

\[ P(y=1|x) = \frac{1}{1 + e^{-z}} \]

 

Where:

- \( P(y=1|x) \) is the probability that the target variable \( y \) equals 1 given input \( x \)

- \( z \) is the linear combination of the input features and their corresponding coefficients, plus an intercept: \( z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n \)
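To make the formula concrete, here is a minimal sketch, assuming NumPy and scikit-learn are installed. It restricts Iris to its first two classes so the problem is binary, computes \( z \) by hand from the fitted `coef_` and `intercept_` attributes, passes it through the sigmoid, and checks the result against the model's own `predict_proba`. The `sigmoid` helper is our own illustration, not a library function.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Keep only classes 0 and 1 so the problem is binary, matching P(y=1|x)
iris = load_iris()
mask = iris.target < 2
X_bin, y_bin = iris.data[mask], iris.target[mask]

model = LogisticRegression(max_iter=1000)
model.fit(X_bin, y_bin)

# z = intercept + sum of (coefficient * feature), computed for every sample
z = X_bin @ model.coef_.ravel() + model.intercept_[0]
p_manual = sigmoid(z)

# Column 1 of predict_proba holds P(y=1|x)
p_sklearn = model.predict_proba(X_bin)[:, 1]
print(np.allclose(p_manual, p_sklearn))  # True

print(sigmoid(0.0))  # 0.5: z = 0 lies exactly on the decision boundary
```

A probability above 0.5 corresponds to \( z > 0 \), so the decision boundary of the model is the set of inputs where the linear combination \( z \) equals zero.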

 

Let's see an example of Logistic Regression in Python using scikit-learn:

 

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit the Logistic Regression model;
# max_iter is raised from the default of 100 so the lbfgs solver has room to converge
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on the testing set
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
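Because the Iris target has three classes, scikit-learn returns one probability per class rather than a single \( P(y=1|x) \). A minimal sketch, recreating the same split so it runs on its own, that shows each row of `predict_proba` sums to 1 and that `predict` returns the class with the highest probability:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# One column per class, in the order given by model.classes_
proba = model.predict_proba(X_test)
print(proba.shape)  # (30, 3): 30 test samples, 3 classes
print(proba[:3].round(3))

# Each row is a probability distribution over the classes
print(np.allclose(proba.sum(axis=1), 1.0))  # True

# predict() picks the most probable class for each sample
print(np.array_equal(model.predict(X_test), model.classes_[proba.argmax(axis=1)]))  # True
```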

Decision Trees

Decision Trees are versatile algorithms that can be used for both classification and regression tasks. They work by recursively partitioning the input space into regions: each split is a decision node in the tree, and each final region is a leaf that assigns a prediction. At each decision node, the algorithm chooses the feature and threshold that best separate the data according to a criterion such as Gini impurity or entropy (set through scikit-learn's `criterion` parameter, which defaults to `"gini"`).
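Both criteria measure how mixed the class labels in a node are. A minimal sketch in plain Python, using the standard definitions \( \text{Gini} = 1 - \sum_k p_k^2 \) and \( \text{Entropy} = -\sum_k p_k \log_2 p_k \), where \( p_k \) is the fraction of samples in class \( k \). The helper names are illustrative, not scikit-learn functions. A split is considered good when the size-weighted impurity of its child nodes is much lower than the parent's.

```python
from collections import Counter
from math import log2


def class_fractions(labels):
    """Fraction of samples belonging to each class in a node."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    return 1.0 - sum(p ** 2 for p in class_fractions(labels))


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, 1 for a 50/50 binary node."""
    return -sum(p * log2(p) for p in class_fractions(labels))


def weighted_impurity(left, right, criterion=gini):
    """Impurity of a split: each child's impurity weighted by its share of samples."""
    n = len(left) + len(right)
    return len(left) / n * criterion(left) + len(right) / n * criterion(right)


parent = ["a", "a", "a", "b", "b", "b"]
print(gini(parent))     # 0.5
print(entropy(parent))  # 1.0

# A perfect split separates the classes completely
print(weighted_impurity(["a", "a", "a"], ["b", "b", "b"]))  # 0.0
# A poor split leaves both children mixed
print(round(weighted_impurity(["a", "a", "b"], ["a", "b", "b"]), 3))  # 0.444
```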

 

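Decision Trees are easy to interpret and visualize, making them valuable tools for understanding the patterns a model has learned from the data. The trade-off is that a tree grown without limits can keep splitting until it fits the training data almost perfectly, which tends to cause overfitting.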
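One way to see this interpretability, assuming scikit-learn's `export_text` helper from `sklearn.tree`, is to print a fitted tree's rules as nested threshold tests. The depth is capped at 2 here only to keep the printout short; the cap is our choice, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the rule listing readable
tree_model = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_model.fit(iris.data, iris.target)

# Each line is a threshold test on one feature; leaf lines show the predicted class
rules = export_text(tree_model, feature_names=list(iris.feature_names))
print(rules)
```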

 

Let's illustrate Decision Trees with an example:

 

```python
from sklearn.tree import DecisionTreeClassifier

# Reuses X_train, X_test, y_train, y_test and accuracy_score from the Logistic Regression example

# Create and fit the Decision Tree model (random_state makes the result reproducible)
tree_model = DecisionTreeClassifier(random_state=42)
tree_model.fit(X_train, y_train)

# Predict on the testing set
y_pred_tree = tree_model.predict(X_test)

# Calculate accuracy
accuracy_tree = accuracy_score(y_test, y_pred_tree)
print("Decision Tree Accuracy:", accuracy_tree)
```

Random Forest

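Random Forest is an ensemble learning method that combines many Decision Trees to improve predictive accuracy and reduce overfitting. Each tree is trained on a bootstrap sample of the training data (rows drawn at random with replacement), and each split considers only a random subset of the features, so the individual trees make different errors. The forest then combines the trees' predictions: for classification, scikit-learn averages the trees' predicted class probabilities and picks the most probable class.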
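The two sources of randomness can be sketched by hand: each tree sees a bootstrap sample of the rows, and `max_features="sqrt"` makes each split consider only a random subset of the features. This is a simplified illustration built from `DecisionTreeClassifier`, not scikit-learn's actual `RandomForestClassifier` implementation; it combines the trees by majority vote (Breiman's original scheme) rather than by averaging probabilities, and the number of trees and the seed are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_trees = 25  # arbitrary ensemble size for the illustration
n_samples = X_train.shape[0]
trees = []

for _ in range(n_trees):
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, n_samples, size=n_samples)
    # max_features="sqrt": each split looks at a random subset of the features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Each row holds one tree's predictions for the whole test set
all_preds = np.stack([tree.predict(X_test) for tree in trees])

# Majority vote across trees for every test sample (one column per sample)
y_pred_vote = np.array([np.bincount(col).argmax() for col in all_preds.T])

print("Hand-rolled forest accuracy:", accuracy_score(y_test, y_pred_vote))
```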

 

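Random Forests often perform well on classification tasks with little tuning, including on high-dimensional datasets with complex, nonlinear relationships between features. The price is interpretability: a forest of a hundred or more trees is much harder to read than a single tree.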

 

Let's implement Random Forest in Python:

 

```python
from sklearn.ensemble import RandomForestClassifier

# Reuses X_train, X_test, y_train, y_test and accuracy_score from the Logistic Regression example

# Create and fit the Random Forest model (random_state makes the result reproducible)
forest_model = RandomForestClassifier(random_state=42)
forest_model.fit(X_train, y_train)

# Predict on the testing set
y_pred_forest = forest_model.predict(X_test)

# Calculate accuracy
accuracy_forest = accuracy_score(y_test, y_pred_forest)
print("Random Forest Accuracy:", accuracy_forest)
```
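To connect the forest back to the averaging it relies on, here is a short check using the fitted trees that scikit-learn exposes in the `estimators_` attribute. It confirms that the forest's `predict_proba` is the mean of its individual trees' class probabilities, and that `predict` picks the class with the highest average. The block recreates the data split so it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(X_train, y_train)

# Class probabilities from every individual tree, shape (n_trees, n_samples, n_classes)
tree_probas = np.stack([tree.predict_proba(X_test) for tree in forest_model.estimators_])
mean_proba = tree_probas.mean(axis=0)

print(np.allclose(mean_proba, forest_model.predict_proba(X_test)))  # True

# The forest predicts the class with the highest averaged probability
manual_pred = forest_model.classes_[mean_proba.argmax(axis=1)]
print(np.array_equal(manual_pred, forest_model.predict(X_test)))  # True
```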

Conclusion

In this lesson, we've explored three powerful algorithms for classification tasks: Logistic Regression, Decision Trees, and Random Forest. Each has its own strengths and weaknesses: Logistic Regression is fast and produces interpretable coefficients but learns a linear decision boundary; a single Decision Tree captures nonlinear rules that are easy to read but overfits readily; and a Random Forest reduces that overfitting by combining many trees, at the cost of interpretability and training time. By mastering these algorithms and understanding their intuition, you'll be well equipped to tackle a wide range of real-world classification tasks. Keep experimenting, exploring, and honing your skills in Machine Learning.
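Because a single 30-sample test split gives a noisy estimate, one way to compare the three models on equal footing is 5-fold cross-validation with scikit-learn's `cross_val_score`. This is a sketch for comparison only: the scores depend on the dataset and the settings chosen here, so treat them as illustrative for Iris rather than as a general ranking of the algorithms. The dictionary keys and variable names are our own.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    # 5-fold CV: train on 4/5 of the data, score accuracy on the held-out 1/5, repeat 5 times
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```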