This repository contains my personal study notes across various tech subjects, documenting my continuous learning in computer science.
The basic scikit-learn workflow: instantiate a model, fit it to data, then predict.

```python
my_model = ModelName()          # create a new, untrained model
my_model.fit(features, target)  # learn patterns from the training data
my_model.predict(data)          # make predictions on new data
```
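As a minimal, runnable sketch of this pattern (the `DecisionTreeRegressor` and the toy data here are illustrative stand-ins, not part of the notes):

```python
from sklearn.tree import DecisionTreeRegressor

# Toy data: y = 2 * x from a single feature column (made up for illustration).
features = [[1], [2], [3], [4]]
target = [2, 4, 6, 8]

my_model = DecisionTreeRegressor(random_state=0)  # instantiate an untrained model
my_model.fit(features, target)                    # train
preds = my_model.predict([[2]])                   # predict on new data
print(preds)
```

An unconstrained tree memorizes these four points exactly, so predicting for `x = 2` returns the training value `4.0`.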
Loading and exploring a dataset with pandas:

```python
import pandas as pd

name = pd.read_csv(file_path)  # load the CSV into a DataFrame
name.shape                     # (number of rows, number of columns)
name.columns                   # column labels
name.head()                    # first five rows
name.isnull().sum()            # count of missing values per column
name.describe()                # summary statistics for numeric columns
```
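The same exploration steps can be tried on a small in-memory DataFrame (the columns and values here are made up, so no CSV file is needed):

```python
import pandas as pd

name = pd.DataFrame({
    'Rooms': [2, 3, 4, None],  # one missing value
    'Price': [150000, 200000, 250000, 300000],
})

print(name.shape)           # (rows, columns)
print(name.columns)         # column labels
print(name.head())          # first five rows
print(name.isnull().sum())  # missing values per column
print(name.describe())      # summary statistics
```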
In the output of the `describe()` function, `count` shows how many rows have non-missing values; `pd.read_csv()` is the function that loads the data in the first place.

The prediction target (by convention called `y`) is selected with dot notation:

```python
y = name.Potato  # 'Potato' is a placeholder for the target column name
```
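A quick check that `describe()`'s `count` excludes missing values (the data below is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Potato': [1.0, 2.0, None, 4.0]})  # 4 rows, 1 missing
print(df.describe())
# 'count' reports 3, not 4, because the missing row is excluded
```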
The feature columns (by convention `X`) are selected with a list of column names:

```python
name_features = ['names', 'of', 'columns']  # placeholder feature column names
X = name[name_features]                     # feature matrix
X.describe()                                # sanity-check the selected features
```
Calling `DecisionTreeRegressor()` creates a new, untrained model; `random_state` makes the run reproducible:

```python
from sklearn.tree import DecisionTreeRegressor

name_model = DecisionTreeRegressor(random_state=1)  # new, untrained model
name_model.fit(X, y)                                # train on the features and target
```
Predictions can be spot-checked against the first rows returned by the `head()` function. To measure how the model performs on unseen data, split it into training and validation sets with the `train_test_split()` function:

```python
from sklearn.model_selection import train_test_split

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
```
Validation error is computed with the `mean_absolute_error()` function from the `sklearn.metrics` module:

```python
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    preds_val = model.predict(val_X)
    return mean_absolute_error(val_y, preds_val)
```
```python
for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d \t\t Mean Absolute Error: %d" % (max_leaf_nodes, my_mae))
```
The `max_leaf_nodes` argument controls the tradeoff between overfitting and underfitting: too many leaves leave each leaf with very few examples (the tree memorizes the training data), while too few leaves fail to capture real distinctions in the data.
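The loop above can be turned into an automatic selection step by keeping the scores and taking the leaf count with the lowest validation MAE. A self-contained sketch (the synthetic quadratic data and candidate sizes are made up for illustration):

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: y depends nonlinearly on a single feature x.
X = [[i] for i in range(100)]
y = [i ** 2 for i in range(100)]
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))

# Score each candidate tree size on the validation set.
scores = {n: get_mae(n, train_X, val_X, train_y, val_y) for n in [5, 25, 50]}
best_size = min(scores, key=scores.get)  # leaf count with the lowest validation MAE
print(best_size)
```

A final model would then be retrained on all of the data with `max_leaf_nodes=best_size`.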