
Scraping CNKI: stuck on a bug I cannot fix, urgent help needed

    wei6666 · 2018-06-07 00:47:14 +08:00 · 1111 views

    Traceback (most recent call last):
      File "China_hownet_journal_end.py", line 296, in <module>
        china_hownet.run()
      File "China_hownet_journal_end.py", line 281, in run
        url_list = self.parse_content_html(html3str)
      File "China_hownet_journal_end.py", line 212, in parse_content_html
        html = etree.HTML(html3str)
      File "lxml.etree.pyx", line 2945, in lxml.etree.HTML (src/lxml/lxml.etree.c:62546)
      File "parser.pxi", line 1617, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:93194)
      File "parser.pxi", line 1488, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:91938)
      File "parser.pxi", line 969, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:88328)
      File "parser.pxi", line 577, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:84385)
      File "parser.pxi", line 676, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:85488)
      File "parser.pxi", line 625, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:84945)
    lxml.etree.XMLSyntaxError: line 1046: htmlParseEntityRef: expecting ';'

    #1 · wei6666 (OP) · 2018-06-07 00:48:34 +08:00
    I assumed my XPath was wrong, so I rewrote the XPath expressions many times, but the error still occurs. I have run out of ideas; any help would be appreciated.
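
One thing the traceback does show: the exception is raised inside `etree.HTML(html3str)` (line 212 of `parse_content_html`), before any XPath expression is evaluated, so changing the XPath cannot affect it. `htmlParseEntityRef: expecting ';'` is the parser complaining about an `&` that starts an entity reference but never reaches a closing `;`, and `line 1046` refers to the fetched HTML, not to the script. A minimal sketch of one possible workaround follows, assuming the page contains bare `&` characters (for example in link query strings). The helper names `escape_bare_ampersands` and `parse_html_leniently` are hypothetical and not from the original script, and the sample markup is invented for illustration.

```python
import re

# Matches an "&" that does NOT begin a well-formed entity reference
# such as &amp;, &#38; or &#x26; (name or number followed by ";").
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(html_text):
    """Hypothetical helper: rewrite stray '&' characters as '&amp;'."""
    return _BARE_AMP.sub("&amp;", html_text)


def parse_html_leniently(html_text):
    """Hypothetical helper: clean the markup, then parse it with lxml."""
    # Imported lazily so the escaping helper above works without lxml.
    from lxml import etree

    # recover=True asks the HTML parser to keep going on broken markup.
    parser = etree.HTMLParser(recover=True)
    return etree.HTML(escape_bare_ampersands(html_text), parser=parser)


if __name__ == "__main__":
    # Invented sample: one bare "&" in a URL, one in text, one valid entity.
    sample = '<a href="page.html?a=1&b=2">R&D &amp; more</a>'
    print(escape_bare_ampersands(sample))
```

In the script above, this would mean passing `html3str` through the cleaning step before the `etree.HTML` call. If the error persists, it may be worth saving the fetched page to a file and inspecting line 1046 directly, or trying a newer lxml build, since the cause might be something other than a bare `&`.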