Error when adding pika to codis3.0 #83

Closed
wyf0705 opened this issue Apr 28, 2017 · 10 comments

Comments

@wyf0705

wyf0705 commented Apr 28, 2017

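I want to add pika to codis3.0, but the --group-add step fails. If I swap in a redis instance instead, it does not fail. Please help analyze this, thanks! The error output is: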
2017/04/28 15:24:54 dashboard.go:442: [PANIC] call rpc group-add-server to dashboard 192.168.161.19:10597 failed
[error]: [Remote Error] ERR not set slotmigrate
3 /home/jiang_shiyi/rpmbuild/BUILD/codis-3.0.1/src/github.com/CodisLabs/codis/pkg/topom/redis.go:58
github.com/CodisLabs/codis/pkg/topom.(*RedisClient).command
2 /home/jiang_shiyi/rpmbuild/BUILD/codis-3.0.1/src/github.com/CodisLabs/codis/pkg/topom/redis.go:145
github.com/CodisLabs/codis/pkg/topom.(*RedisClient).SlotsInfo
1 /home/jiang_shiyi/rpmbuild/BUILD/codis-3.0.1/src/github.com/CodisLabs/codis/pkg/topom/topom_api.go:303
github.com/CodisLabs/codis/pkg/topom.(*apiServer).GroupAddServer
0 /home/jiang_shiyi/rpmbuild/BUILD/codis-3.0.1/src/github.com/CodisLabs/codis/pkg/topom/topom_api.go:82
github.com/CodisLabs/codis/pkg/topom.(*apiServer).GroupAddServer-fm
... ...
[stack]:
2 /home/jiang_shiyi/rpmbuild/BUILD/codis-3.0.1/src/github.com/CodisLabs/codis/cmd/admin/dashboard.go:442
main.(*cmdDashboard).handleGroupCommand
1 /home/jiang_shiyi/rpmbuild/BUILD/codis-3.0.1/src/github.com/CodisLabs/codis/cmd/admin/dashboard.go:63
main.(*cmdDashboard).Main
0 /home/jiang_shiyi/rpmbuild/BUILD/codis-3.0.1/src/github.com/CodisLabs/codis/cmd/admin/main.go:72
main.main
... ...

@left2right
Contributor

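This is probably because, in codis 3.0 and later, codis sends the slotsinfo command to the backend server as a check when a server is added. pika has a switch for codis support, and slots-related commands only succeed when that switch is on. You can follow http://www.jianshu.com/p/07bc0483e56c to turn the switch on, and that should fix it.
This could be changed so that pika supports slotsinfo by default, even when the switch is off. The problem is that users would then be able to deploy and use codis normally and would assume that all migration-related commands work as well, while in fact (with the switch off) migration would fail. For less experienced codis users, that could mean a failed migration in production and lost data. So for now I think the better option is simply to turn on the codis migration switch. I will find time to think about this further and see how to make it convenient to use without creating serious problems.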
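The check described above can be reproduced by hand before running --group-add. The following is a minimal sketch, not codis's actual implementation: it sends SLOTSINFO over a plain socket using the Redis wire protocol and reports whether the backend answers with an error such as `ERR not set slotmigrate`. The function names are illustrative, and the host and port are placeholders.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a Redis protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def is_error_reply(reply: bytes) -> bool:
    """Error replies in the Redis protocol start with '-'."""
    return reply.startswith(b"-")


def slotsinfo_supported(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if the backend answers SLOTSINFO without an error.

    Only the first chunk of the reply is read, which is enough to tell
    an error reply from any other reply type.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command("SLOTSINFO"))
        reply = sock.recv(4096)
    return bool(reply) and not is_error_reply(reply)
```

Calling `slotsinfo_supported("127.0.0.1", 9221)` against a backend (address assumed here) that returns False suggests the switch is still off. The error text mentions `slotmigrate`, which hints at the name of the setting, but the linked article is the reference for how to enable it.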

@wyf0705
Author

wyf0705 commented May 3, 2017

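Sure enough, it works once the switch is on, thanks!
There is a new problem now, though. With codis3.0 using pika as the backend server, some of the pika-server processes crash while I run redis-benchmark against it. After a restart it still happens. I cannot find the cause of the crash in the logs; they only contain the entries below. Please help analyze this, thanks.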
Log file created at: 2017/05/03 14:22:46
Running on machine: tr730n91-app
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0503 14:22:46.263049 140454 pika_client_conn.cc:136] command: "SET" "key:000015179466" "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", start_time(s): 1493792566, duration(us): 10266
E0503 14:22:51.272236 140341 pika_client_conn.cc:136] command: "SET" "key:000056986528" "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", start_time(s): 1493792571, duration(us): 13178
E0503 14:22:51.272255 140386 pika_client_conn.cc:136] command: "SET" "key:000078435589" "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", start_time(s): 1493792571, duration(us): 12900
E0503 14:22:51.272265 140415 pika_client_conn.cc:136] command: "SET" "key:000016099174" "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", start_time(s): 1493792571, duration(us): 12935
E0503 14:22:51.272274 140401 pika_client_conn.cc:136] command: "SET" "key:000001456346" "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", start_time(s): 1493792571, duration(us): 12932
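The entries above follow the glog line format stated in the log header, with the command, `start_time(s)` and `duration(us)` appended. When triaging a larger log it can help to pull those fields out programmatically. This is a minimal sketch: the regular expression is inferred only from the five lines shown here, and `CommandLog` and `parse_command_log` are illustrative names, not part of pika.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the log lines in this thread (an assumption,
# not a documented pika format).
LINE_RE = re.compile(
    r'^(?P<level>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+'
    r'(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] '
    r'command: (?P<command>.*), start_time\(s\): (?P<start>\d+), '
    r'duration\(us\): (?P<duration>\d+)\s*$'
)


class CommandLog(NamedTuple):
    name: str           # first quoted token, e.g. "SET"
    start_time_s: int   # Unix timestamp in seconds
    duration_us: int    # duration in microseconds


def parse_command_log(line: str) -> Optional[CommandLog]:
    """Parse one log line; return None if it does not match the pattern."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    first = re.match(r'"([^"]*)"', match.group("command"))
    name = first.group(1) if first else ""
    return CommandLog(name, int(match.group("start")), int(match.group("duration")))


# First entry from the log above, with the value shortened for readability.
SAMPLE = (
    'E0503 14:22:46.263049 140454 pika_client_conn.cc:136] command: "SET" '
    '"key:000015179466" "xxxx", start_time(s): 1493792566, duration(us): 10266'
)
```

Sorting the parsed entries by `duration_us` gives a quick view of which commands were slowest around the time of a crash.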

@left2right
Contributor

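Those entries are probably not the cause of the crash. Check whether the log shows any other problems, and look at the core dump file. Most of the crashes I have seen were caused by OOM, so check /var/log/messages as well. Taken together, these should be enough to find the cause.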
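Checking /var/log/messages for OOM kills can be scripted. The sketch below assumes the kernel's usual wording ("invoked oom-killer", "Out of memory", "Killed process"); the exact text varies between kernel versions, so treat the marker list as a starting point. `find_oom_lines` is an illustrative helper name.

```python
from typing import Iterable, List

# Substrings commonly found in kernel OOM-killer messages (assumed wording;
# adjust for the kernel version in use).
OOM_MARKERS = ("invoked oom-killer", "Out of memory", "Killed process")


def find_oom_lines(lines: Iterable[str], process_name: str = "") -> List[str]:
    """Return log lines that look like OOM-killer activity.

    If process_name is given, keep only the lines that also mention it.
    """
    hits = []
    for line in lines:
        if any(marker in line for marker in OOM_MARKERS):
            if not process_name or process_name in line:
                hits.append(line.rstrip("\n"))
    return hits
```

A typical use is `find_oom_lines(open("/var/log/messages"), "pika")`, which usually requires root to read the file. An empty result points away from OOM and towards the core dump.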

@wyf0705
Author

wyf0705 commented May 4, 2017

I looked at the core file, and it shows the backtrace below. Also, in the fe UI every server shows 1GB of space. Is that the default?
(gdb) bt
#0 0x00000034fbc89788 in memcpy () from /lib64/libc.so.6
#1 0x000000000051ca95 in slash::PosixMmapFile::Append (this=0xdd91e0, data=Unhandled dwarf expression opcode 0xf3
) at src//env.cc:433
#2 0x000000000045e695 in Binlog::EmitPhysicalRecord (this=0xdd8ff0, t=Unhandled dwarf expression opcode 0xf3
) at src/pika_binlog.cc:273
#3 0x000000000045e842 in Binlog::Produce (this=0xdd8ff0, item=Unhandled dwarf expression opcode 0xf3
) at src/pika_binlog.cc:321
#4 0x000000000045eaca in Binlog::Put (this=0xdd8ff0, item=...) at src/pika_binlog.cc:199
#5 0x0000000000469696 in PikaClientConn::DoCmd (this=0x7f4e18000a60, opt=...) at src/pika_client_conn.cc:108
#6 0x0000000000469ddc in PikaClientConn::DealMessage (this=0x7f4e18000a60) at src/pika_client_conn.cc:144
#7 0x00000000006285eb in pink::RedisConn::ProcessInputBuffer (this=0x7f4e18000a60) at src/redis_conn.cc:312
#8 0x0000000000628736 in pink::RedisConn::GetRequest (this=0x7f4e18000a60) at src/redis_conn.cc:355
#9 0x00000000004c0405 in pink::WorkerThread::ThreadMain (this=0xc04ba0)
at ./third/pink/output/include/worker_thread.h:175
#10 0x0000000000624ef6 in pink::Thread::RunThread (arg=0xc04ba0) at src/pink_thread.cc:56
#11 0x00000034fc007aa1 in start_thread () from /lib64/libpthread.so.0
#12 0x00000034fbce893d in clone () from /lib64/libc.so.6

@left2right
Contributor

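fe shows memory usage, which it obtains by parsing the output of the redis info command. pika's info output differs from redis's, so the figure shown in fe can be ignored. Look at the actual usage instead.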

@wyf0705
Author

wyf0705 commented May 4, 2017

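Could you help analyze what the core file above points to? This machine has roughly 200G of memory and more than 2T of SSD, so space should be sufficient. Loading data into a single pika directly works fine. It is only when pika is attached to codis that some instances crash...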

@left2right
Contributor

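codis proxy simply forwards requests to the backend server, so there is no real difference from connecting to pika directly. The only issue I can think of when codis connects to a backend pika server is that codis uses a single connection per backend server by default, which lowers performance (this can be solved with multiple connections), but that should not make the backend pika server crash. So I suspect that load-testing pika directly might crash it too. I cannot pin down the cause from this core dump. Maybe @baotiao @KernelMaker can help locate it?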

@wyf0705
Author

wyf0705 commented May 4, 2017

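I tried again. With only one pika attached to codis, nothing crashes while loading data; with more than one, some of them crash. I attached two pika instances, and after a short period of load one of them crashed.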

@left2right
Contributor

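Please join the pika discussion group 294254078 and share your situation, especially the core dump contents, with the pika authors. Your core dump points into the underlying library, which they know well. If it turns out to be a codis problem, I can dig further into the cause; my own knowledge of pika is quite shallow ~ @wyf0705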

@wyf0705
Author

wyf0705 commented May 8, 2017

OK, I have sent the relevant information to the authors so they can help resolve it. Thanks a lot!

@baotiao baotiao closed this as completed May 8, 2017
AlexStocks pushed a commit to ipixiu/pika that referenced this issue Dec 21, 2018
* add RedisZsets::ZScan() interface

* create root directory before open db

* support dump database
luky116 added a commit to luky116/pika that referenced this issue Jun 14, 2023