End-to-end processing of a dataset with pandas

2020-02-16 01:13:30

Source: reposted

Contributed by: a site user

1. Basic information about the dataset

import pandas as pd
df = pd.read_csv(path)  # path is a placeholder for the location of your CSV file

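df.head(): shows the first five rows.

df.info() prints a summary of the DataFrame:

RangeIndex: the row index; Data columns: the column index; dtypes: the type of each column. The main body lists the non-null count of each column, which shows whether a column contains NaN values.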
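As a minimal sketch of reading missing values off this summary, the example below uses a small made-up DataFrame (the column names rooms and city are assumptions, not from the article). It captures the text that df.info() prints and compares each column's non-null count with the number of rows.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({
    "rooms": [3, 4, np.nan, 2],
    "city": ["A", "B", "B", None],
})

# df.info() prints to stdout by default; buf= redirects the text so it can be inspected.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)

# Non-null count per column; a count below len(df) means the column has missing values.
non_null = df.count()
has_nan = non_null < len(df)
print(has_nan)
```

Here both columns report three non-null entries out of four rows, so both contain one missing value.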

For non-numeric attribute columns:

df['some_categorical_columns'].value_counts(): the distribution of values in the column.
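A short illustration of value_counts() on a made-up column (the name color and its values are assumptions). Passing normalize=True returns relative frequencies instead of absolute counts.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})

# Absolute counts, sorted from most to least frequent.
counts = df["color"].value_counts()
# Relative frequencies that sum to 1.
shares = df["color"].value_counts(normalize=True)

print(counts)
print(shares)
```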

df.describe(): basic statistics for each numeric column:

count, mean, std, min/max, and the 25%, 50%, 75% quantiles.
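To make the rows of the describe() table concrete, here is a small sketch on an assumed single-column DataFrame. The quantile rows can be cross-checked against Series.quantile.

```python
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})

# One row per statistic (count, mean, std, min, 25%, 50%, 75%, max).
stats = df.describe()
print(stats)

# The same quartiles computed directly.
quartiles = df["x"].quantile([0.25, 0.5, 0.75])
print(quartiles)
```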

df.hist(bins=50, figsize=(20, 15)): draws a histogram for each numeric column.

To plot every column of a DataFrame:

train_prices = pd.DataFrame({'price': train_df.SalePrice,
                             'log(price+1)': np.log1p(train_df.SalePrice)})
# train_prices has two columns: one named price, one named log(price+1)
train_prices.hist()
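The article does not say why the log(price+1) column is plotted next to the raw price. A common reason, offered here as an assumption, is that log1p compresses a long right tail while staying defined at zero. The sketch below uses made-up prices and shows that np.expm1 inverts the transform.

```python
import numpy as np
import pandas as pd

# Hypothetical prices spanning several orders of magnitude.
prices = pd.Series([0.0, 9.0, 99.0, 999.0], name="price")

# log(1 + price): defined at price = 0 and much less spread out than the raw values.
logged = np.log1p(prices)
# expm1 is the inverse of log1p, so the original scale can be recovered.
restored = np.expm1(logged)

print(logged)
print(restored)
```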

2. Splitting the dataset

import numpy as np

def split_train_test(data, test_ratio=.3):
    # Shuffle the row positions, then cut off the first test_ratio share as the test set
    shuffled_indices = np.random.permutation(len(data))
    test_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_size]
    train_indices = shuffled_indices[test_size:]
    return data.iloc[train_indices], data.iloc[test_indices]
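The function above draws a fresh permutation on every call, so the test set changes from run to run. One possible variant, sketched here with a hypothetical seed parameter and function name, makes the split reproducible with a seeded NumPy generator.

```python
import numpy as np
import pandas as pd


def split_train_test_seeded(data, test_ratio=0.3, seed=42):
    """Hypothetical seeded variant of the split: same seed, same split."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(data))
    test_size = int(len(data) * test_ratio)
    # Same order as the original function: (train, test).
    return data.iloc[shuffled[test_size:]], data.iloc[shuffled[:test_size]]


# Toy data for illustration only.
df = pd.DataFrame({"x": range(10)})
train, test = split_train_test_seeded(df)
train_again, test_again = split_train_test_seeded(df)

print(len(train), len(test))
print(test.index.equals(test_again.index))
```

A fixed seed only keeps the split stable while the dataset itself is unchanged; adding or removing rows reshuffles the assignment.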

3. Data preprocessing

Convert a categorical (string) feature to integer codes in one step:

>> df['label'] = pd.Categorical(df['label']).codes
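A small sketch of what the codes look like, using made-up labels. Categories inferred from strings are sorted, so the codes follow that order, and a missing value receives the code -1. The label_code column and the mapping dictionary are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical labels, including one missing value.
df = pd.DataFrame({"label": ["cat", "dog", "cat", None, "bird"]})

cat = pd.Categorical(df["label"])
# Integer code per row; missing values are encoded as -1.
df["label_code"] = cat.codes
# Code -> original label, useful for decoding later.
mapping = dict(enumerate(cat.categories))

print(df)
print(mapping)
```

Keeping the mapping around matters because overwriting the column in place, as in the one-liner above, discards the original strings.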

Convert categorical (string) features to a one-hot encoding in one step:

>> df = pd.get_dummies(df)
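For illustration, the sketch below applies get_dummies to an assumed two-column DataFrame. Numeric columns pass through unchanged, and each string column is replaced by one indicator column per distinct value. The dtype=int argument is an addition here so that the indicators print as 0/1.

```python
import pandas as pd

# Hypothetical data: one numeric column and one string column.
df = pd.DataFrame({"size": [1, 2, 3], "color": ["red", "blue", "red"]})

# The string column is expanded into color_blue and color_red.
encoded = pd.get_dummies(df, dtype=int)

print(encoded)
```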

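Counting and filling null values:

>> df.isnull().sum().sort_values(ascending=False).head()
# Fill with the column means (numeric_only=True skips any non-numeric columns)
>> mean_cols = df.mean(numeric_only=True)
>> df = df.fillna(mean_cols)
>> df.isnull().sum().sum()
0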
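The steps above can be traced on a tiny made-up DataFrame (the column names a, b, c are assumptions). Passing a Series of means to fillna fills each column with its own mean. A string column has no mean, so its missing values remain and the final count is zero only when every column is numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in numeric and string columns.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 4.0, 6.0],
    "c": ["x", "y", None],
})

# Missing values per column, largest first.
missing = df.isnull().sum().sort_values(ascending=False)
# Column means for numeric columns only.
means = df.mean(numeric_only=True)
# fillna matches the Series index against column names.
filled = df.fillna(means)
remaining = filled.isnull().sum()

print(missing)
print(filled)
print(remaining)
```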
