Getting Started with TensorFlow: Classification with Keras
cn-stem·@hongtao·
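Besides the regression (fitting and prediction) problems covered in the [previous article](https://busy.org/@hongtao/tensorflow-keras), TensorFlow and Keras can also handle classification problems.
In this article we use Keras to quickly build a linear classifier and then a neural network that decide, from a patient's physiological measurements, whether that patient has diabetes.
As before, to make it easier to follow along, all of the source code is available here: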
https://github.com/zht007/tensorflow-practice
### 1. Importing the data
The CSV file is already included in the project directory; it can also be downloaded from [Kaggle](https://www.kaggle.com/uciml/pima-indians-diabetes-database).
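As a minimal sketch of this step, the data can be loaded into a pandas DataFrame. The two-row sample below is a made-up stand-in for the real file, and the `Class` column name for the 0/1 diabetes label is my assumption; the other column names are the ones used later in this article.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for the real CSV file.
# "Class" is assumed to be the 0/1 diabetes label column.
sample_csv = io.StringIO(
    "Number_pregnant,Glucose_concentration,Blood_pressure,Triceps,"
    "Insulin,BMI,Pedigree,Age,Class\n"
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)

# With the real file, pass its path to read_csv instead of the StringIO object.
diabetes = pd.read_csv(sample_csv)
print(diabetes.head())
print(diabetes.shape)
```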

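### 2. Data preprocessing
#### 2.1 Normalizing the data
The data could be normalized with sklearn's tools, but here I compute it directly with min-max scaling, which maps each column onto the range 0 to 1. Note that age is not normalized here.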
```python
cols_to_norm = ['Number_pregnant', 'Glucose_concentration', 'Blood_pressure', 'Triceps',
'Insulin', 'BMI', 'Pedigree']
diabetes[cols_to_norm] = diabetes[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
```
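To see what the lambda above does, here is a small worked example on a toy column (the values are made up for illustration): the smallest value maps to 0, the largest to 1, and everything else falls in between.

```python
import pandas as pd

# Toy column; the values are made up for illustration.
df = pd.DataFrame({"BMI": [20.0, 30.0, 40.0]})

# Min-max scaling: (x - min) / (max - min) maps each column onto [0, 1].
scaled = df.apply(lambda x: (x - x.min()) / (x.max() - x.min()))
print(scaled["BMI"].tolist())  # [0.0, 0.5, 1.0]
```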
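#### 2.2 Bucketing age
For a feature like age, it is usually better to group the values into age ranges. Let's first look at the age distribution in the data.

The cut function built into pandas can bucket the ages. Here we split age into four ranges, 0-30, 30-50, 50-70 and 70-100, labeled 0, 1, 2 and 3.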
```python
bins = [0,30,50,70,100]
labels =[0,1,2,3]
diabetes["Age_buckets"] = pd.cut(diabetes["Age"],bins=bins, labels=labels, include_lowest=True)
```
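One detail worth checking is which bucket a boundary age falls into. By default `pd.cut` uses right-closed intervals, so an age of exactly 30 lands in bucket 0 and 31 in bucket 1. A quick sketch with made-up ages (the name `bucket_labels` is mine, chosen to avoid confusion with the target labels used later):

```python
import pandas as pd

# Made-up ages chosen to sit on and around the bin edges.
ages = pd.Series([21, 30, 31, 50, 51, 70, 81])
bins = [0, 30, 50, 70, 100]
bucket_labels = [0, 1, 2, 3]

# Intervals are (0, 30], (30, 50], (50, 70], (70, 100];
# include_lowest=True also admits the left edge 0 into the first bucket.
buckets = pd.cut(ages, bins=bins, labels=bucket_labels, include_lowest=True)
print(buckets.tolist())  # [0, 0, 1, 1, 2, 2, 3]
```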
#### 2.3 Train/test split
Nothing new here: we again use train_test_split from sklearn.model_selection.
```python
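from sklearn.model_selection import train_test_split
# x_data: the feature columns; labels: the 0/1 diabetes target column (not the age-bucket labels from 2.2)
X_train, X_test, y_train, y_test = train_test_split(x_data, labels, test_size=0.33, random_state=101)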
```
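The snippet above uses `x_data` and `labels` without showing where they come from. A plausible sketch, assuming the target column is called `Class` (an assumption on my part, as are the toy values below), is to separate the target column from the feature columns:

```python
import pandas as pd

# Toy frame standing in for the preprocessed diabetes data (values made up).
diabetes = pd.DataFrame({
    "Glucose_concentration": [0.74, 0.43, 0.92, 0.45],
    "BMI": [0.50, 0.40, 0.35, 0.42],
    "Age_buckets": [1, 1, 1, 0],
    "Class": [1, 0, 1, 0],  # assumed name of the 0/1 target column
})

# Features: every column except the target. Target: the 0/1 label column.
x_data = diabetes.drop(columns=["Class"])
labels = diabetes["Class"]
print(x_data.shape, labels.shape)
```

Note that this rebinds the name `labels`, which section 2.2 used for the age-bucket labels, so the order of the cells matters.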
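### 3. Building a linear classifier with Keras
The model is the same as the linear regression model from the [previous article](https://busy.org/@hongtao/tensorflow-keras), except that the output layer of the linear classifier has 2 units and needs a "softmax" activation.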
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Activation
from tensorflow.keras.optimizers import SGD,Adam
from tensorflow.keras.utils import to_categorical
model = Sequential()
model.add(Dense(2,input_shape = (X_train.shape[1],),activation = 'softmax'))
```
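To make it concrete what a single Dense layer with softmax computes, here is a rough NumPy sketch of the forward pass: a linear map followed by softmax, so each row of the output is a pair of class probabilities. The input, weights and the `softmax` helper are all made up for illustration; this is not Keras code.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.random((3, 8))        # 3 made-up samples with 8 features each
W = rng.normal(size=(8, 2))   # weights: one column per class
b = np.zeros(2)               # biases: one per class

probs = softmax(X @ W + b)    # shape (3, 2)
print(probs)
print(probs.sum(axis=1))      # each row sums to 1
```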
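Note that the labels y must be converted: each single-value label becomes a two-element "one-hot" vector. For example, if the original labels use "[0]" and "[1]" to mark "no" and "yes" for having diabetes, after conversion they become "[1, 0]" and "[0, 1]" respectively.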
```python
y_binary_train= to_categorical(y_train)
y_binary_test = to_categorical(y_test)
```
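For integer labels 0 and 1, the conversion can be reproduced in plain NumPy by indexing into an identity matrix, which makes the mapping easy to inspect. This is only an illustrative equivalent with made-up labels, not what Keras does internally:

```python
import numpy as np

y = np.array([0, 1, 1, 0])   # made-up labels: 0 = no diabetes, 1 = diabetes

# Row i of the 2x2 identity matrix is the one-hot vector for class i.
one_hot = np.eye(2)[y]
print(one_hot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]
```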
We can again use the SGD optimizer, but note that the loss function passed to compile must be "categorical_crossentropy".
```python
sgd = SGD(0.005)  # the first positional argument is the learning rate
model.compile(loss = 'categorical_crossentropy', optimizer = sgd, metrics=['accuracy'])
```
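For intuition, categorical cross-entropy averages the negative log of the probability the model assigns to the true class. A minimal NumPy sketch with made-up predictions follows; the helper and its `eps` clipping are my own illustration and may differ in detail from the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); then average -sum(y_true * log(y_pred)) over samples.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[1, 0], [0, 1]])          # one-hot labels
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])  # made-up softmax outputs

loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # mean of -log(0.9) and -log(0.8)
```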
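### 4. Training the classifier
During training we can pass in the test data as validation data, which makes it easy to evaluate how training is going.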
```python
H = model.fit(X_train, y_binary_train, validation_data=(X_test, y_binary_test),epochs = 500)
```
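### 5. Checking the training results
The history object returned by fit records how the loss and accuracy change over the epochs; the linear classifier performs reasonably well.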
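As a sketch of how the history can be read, `H.history` is a dict of per-epoch lists. The helper below is hypothetical and runs on a made-up dict rather than a real training run. It also allows for the accuracy key being named `acc` instead of `accuracy`, which I believe is the case in older Keras/TensorFlow versions.

```python
def last_epoch_summary(history):
    """Return the final-epoch metrics from a Keras-style history dict."""
    # The accuracy key is "accuracy" in recent versions, "acc" in older ones.
    acc_key = "accuracy" if "accuracy" in history else "acc"
    return {
        "loss": history["loss"][-1],
        "val_loss": history["val_loss"][-1],
        "accuracy": history[acc_key][-1],
        "val_accuracy": history["val_" + acc_key][-1],
    }

# Made-up numbers standing in for H.history after three epochs.
fake_history = {
    "loss": [0.69, 0.62, 0.58],
    "val_loss": [0.68, 0.63, 0.60],
    "accuracy": [0.60, 0.68, 0.72],
    "val_accuracy": [0.61, 0.66, 0.70],
}
summary = last_epoch_summary(fake_history)
print(summary)
```

With a real run, the call would be `last_epoch_summary(H.history)`.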

### 6. Trying a neural network instead
Here I build a fully connected network with two hidden layers of 20 and 10 units, and use the Adam optimizer.
```python
model = Sequential()
model.add(Dense(20,input_shape = (X_train.shape[1],), activation = 'relu'))
model.add(Dense(10,activation = 'relu'))
model.add(Dense(2, activation = 'softmax'))
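adam = Adam(0.01)
# Compile as before, now with Adam (same loss and metric as the linear classifier)
model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics=['accuracy'])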
```
The training curves show that accuracy is slightly higher than with the linear classifier, but after about 200 epochs the model clearly starts to overfit.
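One simple way to spot where overfitting begins is to look for the epoch with the lowest validation loss: if the training loss keeps falling after that point while the validation loss rises, the model is overfitting. A small sketch on made-up loss values (the `best_epoch` helper is hypothetical):

```python
import numpy as np

def best_epoch(val_loss):
    """Return the 1-based epoch at which validation loss is lowest."""
    return int(np.argmin(val_loss)) + 1

# Made-up curves: training loss keeps falling, validation loss turns upward.
train_loss = [0.70, 0.55, 0.45, 0.38, 0.30, 0.22]
val_loss = [0.70, 0.55, 0.50, 0.52, 0.60, 0.75]

epoch = best_epoch(val_loss)
print(epoch)  # 3
```

With a real run, `val_loss` would come from `H.history["val_loss"]`.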

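### 7. Making predictions with the model
Likewise, we can use the trained model to predict on the validation data. Note that at the end we need np.argmax to convert the two-column softmax output back into single class labels.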
```python
import numpy as np
y_pred_softmax = model.predict(X_test)
y_pred = np.argmax(y_pred_softmax, axis=1)
```
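To illustrate the last step without a trained model, here is a sketch on made-up softmax outputs: `np.argmax` picks the column with the larger probability in each row, and comparing the result with the true labels gives the accuracy.

```python
import numpy as np

# Made-up softmax outputs for three samples, and their true labels.
y_pred_softmax = np.array([[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
y_true = np.array([0, 1, 0])

# Index of the larger probability in each row is the predicted class.
y_pred = np.argmax(y_pred_softmax, axis=1)
accuracy = float(np.mean(y_pred == y_true))

print(y_pred)    # [0 1 1]
print(accuracy)  # 2 of 3 predictions are correct
```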
---
Also published on my Jianshu:
https://www.jianshu.com/u/bd506afc6fc1