[Beginner Practice] - How to extract the main content of web pages using machine learning

    polyrabbit · Sep 18, 2018 · 2902 views

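    This is the first time I have used machine learning to solve a real problem. I had known plenty of theory for a while but never found a suitable project to practice on. Then it occurred to me that I could reimplement my algorithm for extracting the main text of Hacker News articles using machine learning.

    So I wrote this Notebook, hoping it serves as a modest starting point and inspires more people: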

    https://github.com/polyrabbit/hacker-news-digest/blob/master/%5Btutorial%5D%20How-to-extract-main-content-from-web-pages-using-Machine-Learning.ipynb
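
For readers who want a feel for the general idea before opening the notebook, here is a minimal sketch. It is not the notebook's actual method: the features (text length, link density, punctuation count), the tiny hand-labeled HTML samples, and all names such as `BlockCollector` and `score_blocks` are assumptions made up for illustration. It treats each block-level element as one sample and trains a small logistic regression, using only the Python standard library, to tell article text from navigation.

```python
import math
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "li", "td", "article", "section"}

class BlockCollector(HTMLParser):
    """Collect the text and the amount of link text of each block element."""
    def __init__(self):
        super().__init__()
        self.stack = []    # currently open blocks, innermost last
        self.blocks = []   # finished, non-empty blocks in closing order
        self.in_link = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append({"text": "", "link_chars": 0})
        elif tag == "a":
            self.in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.stack:
            block = self.stack.pop()
            if block["text"].strip():
                self.blocks.append(block)
        elif tag == "a" and self.in_link:
            self.in_link -= 1

    def handle_data(self, data):
        if self.stack:  # text outside any block is ignored
            self.stack[-1]["text"] += data
            if self.in_link:
                self.stack[-1]["link_chars"] += len(data.strip())

def collect(html):
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks

def features(block):
    """Illustrative features: log text length, link density, punctuation count."""
    text = block["text"].strip()
    n = len(text)
    link_density = block["link_chars"] / n if n else 0.0
    punctuation = sum(text.count(c) for c in ",.;")
    return [math.log1p(n), link_density, float(punctuation)]

def predict(weights, x):
    z = sum(w * v for w, v in zip(weights, x + [1.0]))  # last weight is the bias
    z = max(-30.0, min(30.0, z))                        # avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=2000, lr=0.1):
    """Plain stochastic gradient descent for logistic regression."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, x)
            for i, v in enumerate(x + [1.0]):
                weights[i] += lr * error * v
    return weights

def score_blocks(html, weights):
    """Return (text, probability of being main content) for each block."""
    return [(b["text"].strip(), predict(weights, features(b))) for b in collect(html)]

# Hand-labeled toy data (1 = main content, 0 = navigation or footer).
TRAIN_HTML = """
<div><a href="/">Home</a> <a href="/new">New</a> <a href="/login">Login</a></div>
<p>Machine learning can replace hand-written rules. Instead of tuning thresholds, we label blocks, and a model learns them.</p>
<li><a href="/about">About</a></li>
<p>Long paragraphs with commas, periods, and few links are usually part of the article body.</p>
<div>Copyright 2018 <a href="/terms">Terms of service</a></div>
"""
LABELS = [0, 1, 0, 1, 0]
weights = train([features(b) for b in collect(TRAIN_HTML)], LABELS)

TEST_HTML = """
<div><a href="/a">Top</a> <a href="/b">Jobs</a></div>
<p>This paragraph explains the idea in full sentences, with punctuation, and contains no links at all.</p>
"""

if __name__ == "__main__":
    for text, prob in score_blocks(TEST_HTML, weights):
        print(f"{prob:.2f}  {text[:50]}")
```

A real extractor would need far more labeled pages and richer features, and a library such as scikit-learn would normally replace the hand-written training loop. The point here is only the shape of the approach: turn DOM blocks into feature vectors, label some of them, and let a classifier decide.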

    4 replies    2018-09-27 11:10:03 +08:00
    #1 ClutchBear · Sep 18, 2018 · ❤️ 1
    Newspaper3k ?
    #2 tshwangq · Sep 18, 2018 · ❤️ 1
    nice
    #3 polythene (OP) · Sep 18, 2018
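    @ClutchBear Wow, thanks for sharing! If I had known about such an amazing library earlier, I would not have had to go through the trouble of reinventing the wheel. I envy that they managed to turn news analysis into such a mature library.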

    @tshwangq Thanks
    #4 yemoluo · Sep 27, 2018
    Dropping by to pay my respects.