Using Chinese characters in an XPath expression raises an error in Scrapy


Problem description

links = sel.xpath('//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()

Error: ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

4 replies

revotu

Jun 28, 2017

See the article: [Fixing the error raised when an XPath expression contains Chinese characters in Scrapy][1]

## Solutions ##
Method 1: make the entire XPath expression a Unicode string
```Python
links = sel.xpath(u'//i[contains(@title,"置顶")]/following-sibling::a/@href').extract()
```
Method 2: build the XPath expression from a title variable that is already Unicode
```Python
title = u"置顶"
links = sel.xpath('//i[contains(@title,"%s")]/following-sibling::a/@href' %(title)).extract()
```
Method 3: use XPath variable syntax directly (a `$` sign followed by the variable name), here `$title`, and pass title as a keyword argument
```Python
links = sel.xpath('//i[contains(@title,$title)]/following-sibling::a/@href', title="置顶").extract()
```

[1]: http://www.revotu.com/solve-unicode-erros-using-xpath-in-scrapy.html

bsns

Jun 28, 2017

I usually just add the u prefix

mingyun

Jun 28, 2017

@revotu nice

NaVient

Jun 29, 2017

For a standalone crawler project, please use py3