
Showing posts from January, 2020

Feature Engineering for Machine Learning in Python

CHAPTER 1:

```python
# import the pandas library
import pandas as pd

# load a CSV file with pandas' read_csv function
# (path_to_csv_file is a placeholder for the path to your file)
df = pd.read_csv(path_to_csv_file)

# print the first 5 rows
print(df.head())

# print the column names of the DataFrame
print(df.columns)

# print the data type of each column
print(df.dtypes)

# select only the integer-typed columns
only_ints = df.select_dtypes(include=['int'])
print(only_ints.columns)

# convert the categorical 'Country' column into indicator columns
# one-hot encoding: one column per category
one_hot_encoded = pd.get_dummies(df, columns=['Country'], prefix='C')
# dummy encoding: drop the first category, which is implied by all zeros
dummy_encoded = pd.get_dummies(df, columns=['Country'], drop_first=True, prefix='C')

# count how often each category occurs
counts = df['Country'].value_counts()
print(counts)

# relabel categories that occur fewer than 5 times as 'Other'
# (.loc avoids chained assignment, which may not modify df)
mask = df['Country'].isin(counts[counts < 5].index)
df.loc[mask, 'Country'] = 'Other'
print(df['Country'].value_counts())

# create a binary column flagging rows with at least one violation
df['Binary_Violation'] = 0
df.loc[df['Number_of_Violations'] > 0,'Binary_Violation...
```
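The chapter's snippet depends on a CSV file that isn't included, so here is a minimal self-contained sketch of the same encoding steps on a small hypothetical DataFrame (the `Country` and `Salary` values are made up for illustration). It contrasts one-hot encoding with dummy encoding and then collapses rare categories into `'Other'` with `.loc` instead of chained indexing.

```python
import pandas as pd

# Hypothetical toy data standing in for the post's CSV file
df = pd.DataFrame({
    "Country": ["India", "USA", "UK", "India", "USA", "France",
                "India", "USA", "India", "USA", "India", "USA"],
    "Salary": [40, 90, 70, 42, 95, 60, 41, 88, 45, 91, 43, 92],
})

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df, columns=["Country"], prefix="C")

# Dummy encoding: drop the first category, since all zeros already implies it
dummy = pd.get_dummies(df, columns=["Country"], drop_first=True, prefix="C")

# Collapse categories seen fewer than 5 times into "Other"
counts = df["Country"].value_counts()
rare = counts[counts < 5].index
df.loc[df["Country"].isin(rare), "Country"] = "Other"

print(one_hot.columns.tolist())
print(dummy.columns.tolist())
print(df["Country"].value_counts())
```

With this data, one-hot encoding yields `C_France`, `C_India`, `C_UK` and `C_USA`, while `drop_first=True` drops `C_France`. In pandas 2.x the indicator columns are boolean by default; pass `dtype=int` to `get_dummies` if you want 0/1 integers.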
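The last step in the snippet turns a count column into a binary flag. A minimal sketch of that idea, using a hypothetical `n_complaints` column (the names and values are illustrative assumptions, not taken from the post) and assuming any positive count should map to 1, shows the initialize-then-`.loc` pattern alongside an equivalent vectorized cast.

```python
import pandas as pd

# Hypothetical data: number of complaints recorded per store
df = pd.DataFrame({"n_complaints": [0, 3, 0, 1, 7]})

# Pattern 1: start every row at 0, then set 1 where the count is positive
df["has_complaint"] = 0
df.loc[df["n_complaints"] > 0, "has_complaint"] = 1

# Pattern 2: cast the boolean comparison directly to 0/1 integers
df["has_complaint_alt"] = (df["n_complaints"] > 0).astype(int)

print(df)
```

Both patterns produce the same column; the `.loc` form is convenient when different conditions need to assign different values.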