
Scrapy: XPath raises an error when the expression contains Chinese characters

    donglongtu · Jun 28, 2017 · 2732 views

    Problem description

    links = sel.xpath('//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()
    

    Error: ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

    revotu · Jun 28, 2017 · #1
    See the article: [Fixing errors when using Chinese in XPath in Scrapy][1]

    ## Solutions ##
    Method 1: write the whole XPath expression as a Unicode literal
    ```Python
    links = sel.xpath(u'//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()
    ```
    Method 2: interpolate a title variable that is already Unicode into the XPath expression
    ```Python
    title = u"置顶"
    links = sel.xpath('//i[contains(@title,"%s")]/following-sibling::a/@href' %(title)).extract()
    ```
    Method 3: use XPath variable syntax directly (a `$` sign followed by the variable name), here `$title`, and pass title as a keyword argument
    ```Python
    links = sel.xpath('//i[contains(@title,$title)]/following-sibling::a/@href', title="置顶").extract()
    ```
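
To check what the expression in the three methods is meant to return without running a spider, here is a minimal plain-Python sketch of the same logic: find each `i` element whose `title` contains the text, then collect the `href` of every `a` sibling that follows it. It uses the standard library's `xml.etree.ElementTree` rather than Scrapy, because ElementTree's XPath subset has neither `contains()` nor `following-sibling`. The sample markup and the `pinned_links` helper are hypothetical and written for Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the scraped page.
SAMPLE = """
<ul>
  <li><i title="置顶"></i><a href="/t/1">pinned</a></li>
  <li><i title="普通"></i><a href="/t/2">normal</a></li>
  <li><i title="置顶帖"></i><a href="/t/3">also pinned</a></li>
</ul>
""".strip()


def pinned_links(markup, title):
    """Mimic //i[contains(@title, title)]/following-sibling::a/@href."""
    root = ET.fromstring(markup)
    links = []
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            # contains(@title, title): substring match on the attribute value
            if child.tag == "i" and title in child.get("title", ""):
                # following-sibling::a/@href: later siblings that are <a> with an href
                links.extend(
                    sibling.get("href")
                    for sibling in children[index + 1:]
                    if sibling.tag == "a" and sibling.get("href") is not None
                )
    return links


links = pinned_links(SAMPLE, "置顶")
print(links)
```

Comparing this list against what `sel.xpath(...).extract()` returns for the same markup is a quick way to confirm that the Chinese text is being matched as intended.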


    [1]: http://www.revotu.com/solve-unicode-erros-using-xpath-in-scrapy.html
    bsns · Jun 28, 2017 · #2
    I usually just add the u prefix.
    mingyun · Jun 28, 2017 · #3
    @revotu nice
    NaVient · Jun 29, 2017 · #4
    For a standalone crawler project, please use Python 3.
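
A small sketch of why moving to Python 3 likely sidesteps the problem, assuming the original error came from Python 2, where an unprefixed literal is a byte string. In Python 3 every `str` literal is already Unicode text, so the `u` prefix changes nothing, and the encoded bytes (roughly what a Python 2 bare literal holds in a UTF-8 source file) are not plain ASCII.

```python
# Python 3: a bare literal and a u-prefixed literal are the same Unicode text.
plain = "置顶"
prefixed = u"置顶"
same_text = plain == prefixed

# Encoding to UTF-8 gives the raw bytes; each of these characters takes 3 bytes.
raw = plain.encode("utf-8")
is_ascii = all(byte < 128 for byte in raw)

# Interpolating into the XPath expression keeps it a Unicode str.
xpath = '//i[contains(@title,"%s")]/following-sibling::a/@href' % plain

print(same_text, len(raw), is_ascii, type(xpath).__name__)
```

So in Python 3 the original unprefixed expression is already a Unicode string when it reaches the selector.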