Roughly how long should this program take to run?

    acone2003 · 2018-05-23 09:02:44 +08:00 · 1585 views
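    I am planning to learn Python and machine learning. I have just set up my environment and copied a program from a book to try it out, but it has been running for two days without producing a result, and CPU usage has stayed close to 100% the whole time. Could someone check whether my program is wrong or whether it really has not finished yet? Roughly how long should it take? My machine has an E5-2650 (8 cores, 16 threads, clocked at 2.0 GHz I believe) and 8 GB of RAM.
    The program is as follows: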
    # Load libraries
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import validation_curve

    # Load data
    digits = load_digits()

    # Create feature matrix and target vector
    features, target = digits.data, digits.target

    # Create range of values for parameter
    #param_range = np.arange(1, 250, 2)
    param_range = np.arange(1, 250, 25)

    # Calculate accuracy on training and test set using range of parameter values
    train_scores, test_scores = validation_curve(
        # Classifier
        RandomForestClassifier(),
        # Feature matrix
        features,
        # Target vector
        target,
        # Hyperparameter to examine
        param_name="n_estimators",
        # Range of hyperparameter's values
        param_range=param_range,
        # Number of folds
        cv=3,
        # Performance metric
        scoring="accuracy",
        # Use all computer cores
        n_jobs=-1,
    )

    # Calculate mean and standard deviation for training set scores
    train_mean = np.mean(train_scores, axis=1)
    train_std = np.std(train_scores, axis=1)

    # Calculate mean and standard deviation for test set scores
    test_mean = np.mean(test_scores, axis=1)
    test_std = np.std(test_scores, axis=1)

    # Plot mean accuracy scores for training and test sets
    plt.plot(param_range, train_mean, label="Training score", color="black")
    plt.plot(param_range, test_mean, label="Cross-validation score",
             color="dimgrey")

    # Plot accuracy bands for training and test sets
    plt.fill_between(param_range, train_mean - train_std,
                     train_mean + train_std, color="gray")
    plt.fill_between(param_range, test_mean - test_std,
                     test_mean + test_std, color="gainsboro")

    # Create plot
    plt.title("Validation Curve With Random Forest")
    plt.xlabel("Number Of Trees")
    plt.ylabel("Accuracy Score")
    plt.tight_layout()

    plt.legend(loc="best")
    plt.show()

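    The program is essentially a single call to validation_curve: it shows how different values of a hyperparameter (here n_estimators) affect the accuracy of a random forest.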
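    princelai · 2018-05-23 10:46:06 +08:00 · reply #1
    I happened to have an environment ready, so I tried it for you. With the source code completely unchanged, it produced a result in 2 seconds. Could something be wrong with your Python environment?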
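One way to narrow this down is to time a much smaller, single-process version of the same call. The sketch below is only an illustration: the helper name `time_validation_curve`, the three-value parameter range, and `random_state=0` are assumptions added here and are not part of the original program.

```python
import time

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve


def time_validation_curve(param_range, n_jobs):
    """Run validation_curve on the digits data; return (seconds, test_scores)."""
    digits = load_digits()
    start = time.perf_counter()
    _, test_scores = validation_curve(
        # random_state is an assumption added for repeatable timings
        RandomForestClassifier(random_state=0),
        digits.data,
        digits.target,
        param_name="n_estimators",
        param_range=param_range,
        cv=3,
        scoring="accuracy",
        n_jobs=n_jobs,
    )
    return time.perf_counter() - start, test_scores


# Three parameter values times three folds: nine fits, all in one process.
small_range = [1, 26, 51]
elapsed, test_scores = time_validation_curve(small_range, n_jobs=1)
print(f"n_jobs=1: {elapsed:.1f} s, test_scores shape {test_scores.shape}")
```

If this finishes quickly while the original script with `n_jobs=-1` does not, the parallel setup is probably worth a closer look than the program logic. Calling the helper again with `n_jobs=-1` gives a direct comparison.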
    princelai · 2018-05-23 10:50:18 +08:00 · reply #2
    Here is my environment, for your reference:

    pd.show_versions()

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.6.5.final.0
    python-bits: 64
    OS: Linux
    OS-release: 4.14.42-1-MANJARO
    machine: x86_64
    processor:
    byteorder: little
    LC_ALL: None
    LANG: zh_CN.UTF-8
    LOCALE: zh_CN.UTF-8

    pandas: 0.23.0
    pytest: 3.5.1
    pip: 10.0.1
    setuptools: 39.2.0
    Cython: 0.28.2
    numpy: 1.14.3
    scipy: 1.1.0
    pyarrow: None
    xarray: None
    IPython: 6.4.0
    sphinx: 1.7.4
    patsy: 0.5.0
    dateutil: 2.7.3
    pytz: 2018.4
    blosc: None
    bottleneck: 1.2.1
    tables: 3.4.3
    numexpr: 2.6.5
    feather: None
    matplotlib: 2.2.2
    openpyxl: 2.5.3
    xlrd: 1.1.0
    xlwt: 1.3.0
    xlsxwriter: 1.0.4
    lxml: 4.2.1
    bs4: 4.6.0
    html5lib: 1.0.1
    sqlalchemy: 1.2.7
    pymysql: None
    psycopg2: None
    jinja2: 2.10
    s3fs: None
    fastparquet: None
    pandas_gbq: None
    pandas_datareader: None
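The listing above comes from pandas and does not include scikit-learn, which is the library the program actually exercises. A minimal standard-library sketch for collecting a comparable report follows; the helper name `collect_versions` and the choice of packages are assumptions made for illustration.

```python
import importlib.metadata
import os
import platform


def collect_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Basic interpreter and machine facts, followed by the packages of interest.
report = {
    "python": platform.python_version(),
    "os": platform.system(),
    "machine": platform.machine(),
    "cpu_count": os.cpu_count(),
}
# The package list is an assumption; adjust it to whatever the script imports.
report.update(
    collect_versions(["numpy", "scipy", "scikit-learn", "matplotlib", "joblib"])
)

for key, value in report.items():
    print(f"{key}: {value}")
```

Running this on both machines and comparing the two outputs line by line is a quick way to spot a version mismatch.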
    acone2003 (OP) · 2018-05-23 14:44:42 +08:00 · reply #3
    Thanks a lot, princelai. I will look for the cause.
    John60676 · 2018-05-23 18:21:05 +08:00 · reply #4
    MacBook Pro 2015 with 8 GB of RAM here; it also produces a result within seconds.
    neosfung · 2018-05-23 19:03:32 +08:00 via iPhone · reply #5
    Could load_digits be downloading the dataset?
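On the download question: as far as I know, `load_digits` reads a copy of the digits data that ships inside the scikit-learn package, so no network access should be involved. A small sketch to confirm this on a given machine is to time the call by itself; the variable names are illustrative only.

```python
import time

from sklearn.datasets import load_digits

# Time only the dataset load, separate from any model fitting.
start = time.perf_counter()
digits = load_digits()
elapsed = time.perf_counter() - start

print(f"load_digits took {elapsed:.3f} s")
print("features:", digits.data.shape)   # 1797 samples, 64 pixel features each
print("target:", digits.target.shape)
```

If the load returns almost immediately, the dataset can be ruled out as the source of the two-day run.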