V2EX  ›  Python

How to split a dictionary elegantly in Python

    14 · 2014-09-12 15:45:09 +08:00 · 4446 views
    Original data:
    {
    "A0801_000000_201301": "1,321.8",
    "A0801_000000_201302": "1,199.8",
    "A0801_000000_201309": "1,433.4",
    "A0802_000000_201305": "6,688.3",
    "A0802_000000_201306": "8,085.2",
    "A0802_000000_201307": "9,481.0",
    "A0802_000000_201308": "10,878.4",
    "A0802_000000_201309": "12,311.8",
    "A0802_000000_201310": "13,739.9",
    ……


    Goal:
    {
    "A0801": [{"201301": ""}, ……],
    "A0802": [{"201308": ""}, ……],
    ……
    }
    Addendum 1 · 2014-09-12 16:21:29 +08:00
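    # tabledata is the original dict shown above
    output = {}
    # First pass: create an empty list for every prefix such as "A0801"
    for k, v in tabledata.items():
        a, b, c = k.split('_')
        output[a] = []

    # Second pass: append each {suffix: value} pair to its prefix's list
    for k, v in tabledata.items():
        a, b, c = k.split('_')
        output[a].append({c: v})

    print(output)
    Iterating over the dict twice like this gets the job done for now.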
    19 replies · last reply 2020-09-06 21:04:16 +08:00
    #1 · xiaket · 2014-09-12 15:59:15 +08:00
    You haven't even stated the requirements clearly... what should happen to that 000000 part?

    Use a list comprehension or something from itertools.
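    xiaket's itertools hint can be read several ways. One minimal sketch uses itertools.groupby, assuming the middle 000000 segment can simply be dropped and that ordering each list by key is acceptable:

```python
from itertools import groupby

# Sample data copied from the original post
tabledata = {
    "A0801_000000_201301": "1,321.8",
    "A0801_000000_201302": "1,199.8",
    "A0801_000000_201309": "1,433.4",
    "A0802_000000_201305": "6,688.3",
    "A0802_000000_201306": "8,085.2",
    "A0802_000000_201307": "9,481.0",
    "A0802_000000_201308": "10,878.4",
    "A0802_000000_201309": "12,311.8",
    "A0802_000000_201310": "13,739.9",
}

def prefix(key):
    # "A0801_000000_201301" -> "A0801"; the middle segment is ignored
    return key.split('_')[0]

# groupby only merges *adjacent* items, so the keys are sorted first
output = {
    group: [{k.split('_')[2]: tabledata[k]} for k in keys]
    for group, keys in groupby(sorted(tabledata), key=prefix)
}
print(output)
```

    The sort is what makes groupby work here; it costs O(n log n), which the single-pass setdefault answers in this thread avoid.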
    #2 · linKnowEasy · 2014-09-12 15:59:57 +08:00
    Just replace it.
    #3 · alsotang · 2014-09-12 16:00:30 +08:00
    Just plain iteration.
    #4 · icinessz · 2014-09-12 16:09:18 +08:00 · ❤️ 1
    package main

    import "fmt"

    func main() {
        src := map[string]string{
            "A0801_000000_201301": "1,321.8",
            "A0801_000000_201302": "1,199.8",
            "A0801_000000_201309": "1,433.4",
            "A0802_000000_201305": "6,688.3",
            "A0802_000000_201306": "8,085.2",
            "A0802_000000_201307": "9,481.0",
            "A0802_000000_201308": "10,878.4",
            "A0802_000000_201309": "12,311.8",
            "A0802_000000_201310": "13,739.9",
        }
        rs := map[string]map[string]string{}
        for k, v := range src {
            // k[:5] is the prefix ("A0801"); k[13:] is everything after "A0801_000000_"
            if _, ok := rs[k[:5]]; !ok {
                rs[k[:5]] = map[string]string{}
            }
            rs[k[:5]][k[13:]] = v
        }
        fmt.Println(rs)
    }

    --------------------------------
    map[A0801:map[201309:1,433.4 201301:1,321.8 201302:1,199.8] A0802:map[201308:10,878.4 201309:12,311.8 201310:13,739.9 201305:6,688.3 201306:8,085.2 201307:9,481.0]]
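    For comparison with the Python answers, the Go reply's fixed-offset slicing might be ported like this. It assumes, as the Go code does, that every key is exactly 5 + 1 + 6 + 1 + 6 characters long, and like the Go version it builds a dict of dicts rather than the list of single-pair dicts asked for in the post:

```python
# Sample data copied from the original post
src = {
    "A0801_000000_201301": "1,321.8",
    "A0801_000000_201302": "1,199.8",
    "A0801_000000_201309": "1,433.4",
    "A0802_000000_201305": "6,688.3",
    "A0802_000000_201306": "8,085.2",
    "A0802_000000_201307": "9,481.0",
    "A0802_000000_201308": "10,878.4",
    "A0802_000000_201309": "12,311.8",
    "A0802_000000_201310": "13,739.9",
}

rs = {}
for k, v in src.items():
    # Fixed layout assumed: k[:5] is the prefix, k[13:] the part after the second "_"
    prefix, suffix = k[:5], k[13:]
    # Same check as the Go "if _, ok := rs[...]; !ok" branch
    if prefix not in rs:
        rs[prefix] = {}
    rs[prefix][suffix] = v
print(rs)
```

    Slicing avoids building a list per key, but it silently produces wrong keys if any key deviates from that exact width; split('_') is the safer choice when the layout is not guaranteed.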
    #5 · spritevan · 2014-09-12 16:16:18 +08:00 · ❤️ 1
    #!/usr/bin/env python

    from pprint import pprint as pp

    origin = {
    "A0801_000000_201301": "1,321.8",
    "A0801_000000_201302": "1,199.8",
    "A0801_000000_201309": "1,433.4",
    "A0802_000000_201305": "6,688.3",
    "A0802_000000_201306": "8,085.2",
    "A0802_000000_201307": "9,481.0",
    "A0802_000000_201308": "10,878.4",
    "A0802_000000_201309": "12,311.8",
    "A0802_000000_201310": "13,739.9",
    }

    res = {}
    fn = lambda fields, v: res.setdefault(fields[0], []).append({fields[-1]: v})
    for k, v in origin.iteritems():
        fn(k.split('_'), v)
    pp(res)

    ---

    {'A0801': [{'201309': '1,433.4'},
               {'201302': '1,199.8'},
               {'201301': '1,321.8'}],
     'A0802': [{'201305': '6,688.3'},
               {'201306': '8,085.2'},
               {'201307': '9,481.0'},
               {'201310': '13,739.9'},
               {'201308': '10,878.4'},
               {'201309': '12,311.8'}]}
    #6 · imn1 · 2014-09-12 16:20:19 +08:00
    If the original data is a string (JSON), splitting it with a regex is fast.
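    A sketch of imn1's suggestion, assuming the raw JSON text uses plain string keys and values with no escaped quotes inside them (the `raw` string is a trimmed copy of the post's data). Whether this actually beats `json.loads` plus a loop has not been measured here:

```python
import re

# Trimmed copy of the original data, as raw JSON text
raw = '''{
"A0801_000000_201301": "1,321.8",
"A0801_000000_201302": "1,199.8",
"A0802_000000_201305": "6,688.3"
}'''

# Groups: prefix, suffix, value; the middle segment is matched but not captured
pattern = re.compile(r'"([^"_]+)_[^"_]+_([^"_]+)"\s*:\s*"([^"]*)"')

output = {}
for prefix, suffix, value in pattern.findall(raw):
    output.setdefault(prefix, []).append({suffix: value})
print(output)
```

    The pattern is tied to the three-part key shape; any escaped `\"` in a value or key would break it, which a real JSON parser handles for free.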
    #7 · hahastudio · 2014-09-12 16:35:50 +08:00 · ❤️ 1
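    The OP has probably never played with setdefault. I remember being stunned by this idiom the first time I read the Cookbook.
    I don't know what you want to do with that run of zeros in the middle, though.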

    https://gist.github.com/hahastudio/e1d4bb5423be3052935b
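    The gist's contents are not reproduced here. As a hedged illustration of why `dict.setdefault` fits this problem: it returns the value for a key if the key exists, and otherwise inserts the given default and returns that same object, so the returned list can be appended to directly:

```python
d = {}

# Key missing: setdefault inserts [] under "A0801" and returns that very list
lst = d.setdefault('A0801', [])
lst.append({'201301': '1,321.8'})

# Key present: the new default [] is ignored and the existing list is returned
d.setdefault('A0801', []).append({'201302': '1,199.8'})

print(d)
```

    One caveat: the default argument is evaluated on every call, so `setdefault(key, [])` builds a throwaway empty list even when the key already exists.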
    #8 · 14 (OP) · 2014-09-12 16:43:37 +08:00
    @hahastudio Thanks, this is exactly what I was looking for.
    #9 · hahastudio · 2014-09-12 16:57:40 +08:00
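    @14 Actually it's not quite the same as yours = =
    Looking more carefully, I realized that what you asked for in the post is a list whose items are dicts holding a single key-value pair each = =
    In that case, see @spritevan's answer = =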
    #10 · 14 (OP) · 2014-09-12 17:07:34 +08:00
    @hahastudio True... but a small tweak to your code does it:
    for k in d:
        key, mid, subkey = k.split('_')
        new_d.setdefault(key, []).append({subkey: d[k]})
    #11 · advancedxy · 2014-09-12 17:27:59 +08:00
    from collections import defaultdict
    from functools import reduce  # a builtin on Python 2, moved to functools on Python 3

    d = {
    "A0801_000000_201301": "1,321.8",
    "A0801_000000_201302": "1,199.8",
    "A0801_000000_201309": "1,433.4",
    "A0802_000000_201305": "6,688.3",
    "A0802_000000_201306": "8,085.2",
    "A0802_000000_201307": "9,481.0",
    "A0802_000000_201308": "10,878.4",
    "A0802_000000_201309": "12,311.8",
    "A0802_000000_201310": "13,739.9",
    }

    def addItem(dd, item):
        k, v = item
        k1, k2, k3 = k.split('_')
        dd[k1].append({k3: v})  # was {k3: value}, but the name bound above is v
        return dd

    # Reduce over (key, value) pairs; iterating d itself would yield only the keys
    print(dict(reduce(addItem, d.items(), defaultdict(list))))
    #12 · starsoi · 2014-09-12 20:49:36 +08:00
    @hahastudio @14 setdefault looks concise, but it's still slower than a straightforward if/else (the if/else is about 30% faster).

    https://gist.github.com/starsoi/ef3c813ebd2c04e3e8ff.js
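    starsoi's gist is not reproduced here. A comparable timeit harness might look like the sketch below; the function names are hypothetical, and it checks that all three variants return the same result before timing them, since a faster but incorrect variant proves nothing. No particular numbers are claimed:

```python
import timeit
from collections import defaultdict

# Sample data copied from the original post
tabledata = {
    "A0801_000000_201301": "1,321.8",
    "A0801_000000_201302": "1,199.8",
    "A0801_000000_201309": "1,433.4",
    "A0802_000000_201305": "6,688.3",
    "A0802_000000_201306": "8,085.2",
    "A0802_000000_201307": "9,481.0",
    "A0802_000000_201308": "10,878.4",
    "A0802_000000_201309": "12,311.8",
    "A0802_000000_201310": "13,739.9",
}

def with_setdefault(data):
    out = {}
    for k, v in data.items():
        key, _mid, subkey = k.split('_')
        out.setdefault(key, []).append({subkey: v})
    return out

def with_ifelse(data):
    out = {}
    for k, v in data.items():
        key, _mid, subkey = k.split('_')
        if key not in out:
            out[key] = []
        out[key].append({subkey: v})
    return out

def with_defaultdict(data):
    out = defaultdict(list)
    for k, v in data.items():
        key, _mid, subkey = k.split('_')
        out[key].append({subkey: v})
    return dict(out)

# All variants must agree before their timings mean anything
assert with_setdefault(tabledata) == with_ifelse(tabledata) == with_defaultdict(tabledata)

for fn in (with_setdefault, with_ifelse, with_defaultdict):
    # Results vary with Python version, machine, and data size
    seconds = timeit.timeit(lambda: fn(tabledata), number=10000)
    print(fn.__name__, round(seconds, 4))
```

    With only nine keys, most of the time goes to split() and building the small dicts, so differences between the three lookup styles may look smaller than on larger inputs.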
    #13 · hahastudio · 2014-09-12 21:29:31 +08:00
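    @starsoi Well, of course performance suffers = =
    That's the price of the fancy style = =

    But note two things:
    First, it should be new_d[key].
    Second, your version loses the first result every time it creates a new list for a key.
    use_ifelse = """
    new_d = {}
    for k in tabledata:
        key, mid, subkey = k.split('_')
        if key not in new_d:
            new_d[key] = []
        new_d[key].append({subkey: tabledata[k]})
    """
    Compare against this version and the speedup is not as large as you said.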
    #14 · frankzeng · 2014-09-12 21:43:51 +08:00 · ❤️ 1
    #!/usr/bin/python

    ss = {
    "A0801_000000_201301": "1,321.8",
    "A0801_000000_201302": "1,199.8",
    "A0801_000000_201309": "1,433.4",
    "A0802_000000_201305": "6,688.3",
    "A0802_000000_201306": "8,085.2",
    "A0802_000000_201307": "9,481.0",
    "A0802_000000_201308": "10,878.4",
    "A0802_000000_201309": "12,311.8",
    "A0802_000000_201310": "13,739.9",}


    output = {}
    for key, data in ss.items():
        temp = key.split("_")
        try:
            k = temp[0]
            j = temp[2]
        except IndexError:
            # Key does not have three "_"-separated parts: report it and skip it
            print(key, data)
            continue
        if k not in output:
            output[k] = []
        output[k].append({j: data})

    print(output)

    It needs only a single pass and is simple and easy to understand. You deserve it.
    #15 · starsoi · 2014-09-12 22:33:00 +08:00
    @hahastudio Good point, turns out I had a typo in my code..
    #16 · hahastudio · 2014-09-12 22:43:29 +08:00
    @starsoi That said, once the data gets large, defaultdict seems to be the better choice:

    from collections import defaultdict

    new_d = defaultdict(list)
    for k in tabledata:
        key, mid, subkey = k.split('_')
        new_d[key].append({subkey: tabledata[k]})

    http://nbviewer.ipython.org/gist/hahastudio/5f7ed0ee9c4adfa2a86f
    #17 · mengzhuo · 2014-09-12 23:02:35 +08:00 · ❤️ 1
    The famous one-line tree. Absolutely elegant.

    aa = {
    "A0801_000000_201301": "1,321.8",
    "A0801_000000_201302": "1,199.8",
    "A0801_000000_201309": "1,433.4",
    "A0802_000000_201305": "6,688.3",
    "A0802_000000_201306": "8,085.2",
    "A0802_000000_201307": "9,481.0",
    "A0802_000000_201308": "10,878.4",
    "A0802_000000_201309": "12,311.8",
    "A0802_000000_201310": "13,739.9",
    }

    from collections import defaultdict
    def tree(): return defaultdict(tree)

    bb = tree()

    for k,v in aa.items():
        prefix, _, appendix = k.split('_')
        bb[prefix][appendix] = v
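    Printing the tree directly shows nested `defaultdict(<function tree ...>)` reprs, and merely reading a missing key such as `bb['A0803']` silently inserts an empty subtree. A minimal sketch for turning the result into plain dicts, using a hypothetical helper `to_dict`:

```python
import json
from collections import defaultdict

def tree():
    # Every missing key creates another tree, so bb[a][b] = v works at any depth
    return defaultdict(tree)

def to_dict(t):
    # Hypothetical helper: recursively convert nested defaultdicts to plain dicts
    if isinstance(t, defaultdict):
        return {k: to_dict(v) for k, v in t.items()}
    return t

bb = tree()
bb['A0801']['201301'] = '1,321.8'
bb['A0801']['201302'] = '1,199.8'
bb['A0802']['201305'] = '6,688.3'

plain = to_dict(bb)
print(plain)

# defaultdict is a dict subclass, so json can serialize the tree as-is
print(json.dumps(bb, sort_keys=True))
```

    Note that the tree yields a nested dict keyed by suffix, not the list of single-pair dicts asked for in the post.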
    #18 · xylophone21 · 2014-09-13 15:10:21 +08:00
    @hahastudio

    Four lines of code, and they still went and wrapped them up in a dedicated method. Python really goes the extra mile.

    if key in self.data:  # dict.has_key() exists only on Python 2
        self.data[key].append(value)
    else:
        self.data[key] = [value]


    http://starship.python.net/~mwh/hacks/setdefault.html
    #19 · biglazycat · 2020-09-06 21:04:16 +08:00
    tabledata = {
    "A0801_000000_201301": "1,321.8",
    "A0801_000000_201302": "1,199.8",
    "A0801_000000_201309": "1,433.4",
    "A0802_000000_201305": "6,688.3",
    "A0802_000000_201306": "8,085.2",
    "A0802_000000_201307": "9,481.0",
    "A0802_000000_201308": "10,878.4",
    "A0802_000000_201309": "12,311.8",
    "A0802_000000_201310": "13,739.9",
    }

    output = {}
    for k, v in tabledata.items():
        a, b, c = k.split('_')
        output.setdefault(a, []).append({c: v})
    print(output)