Product Promotion
0x5a.live
for different kinds of informations and explorations.
GitHub - yanyiwu/gojieba: "结巴"中文分词的Golang版本
"结巴"中文分词的Golang版本. Contribute to yanyiwu/gojieba development by creating an account on GitHub.
Visit SiteGitHub - yanyiwu/gojieba: "结巴"中文分词的Golang版本
"结巴"中文分词的Golang版本. Contribute to yanyiwu/gojieba development by creating an account on GitHub.
Powered by 0x5a.live 💗
GoJieba
GoJieba是"结巴"中文分词的Golang语言版本。
简介
- 支持多种分词方式,包括: 最大概率模式, HMM新词发现模式, 搜索引擎模式, 全模式
- 核心算法底层由C++实现,性能高效。
- 字典路径可配置,NewJieba(...string), NewExtractor(...string) 可变形参,当参数为空时使用默认词典(推荐方式)
用法
go get github.com/yanyiwu/gojieba
分词示例
package main
import (
"fmt"
"strings"
"github.com/yanyiwu/gojieba"
)
func main() {
var s string
var words []string
use_hmm := true
x := gojieba.NewJieba()
defer x.Free()
s = "我来到北京清华大学"
words = x.CutAll(s)
fmt.Println(s)
fmt.Println("全模式:", strings.Join(words, "/"))
words = x.Cut(s, use_hmm)
fmt.Println(s)
fmt.Println("精确模式:", strings.Join(words, "/"))
s = "比特币"
words = x.Cut(s, use_hmm)
fmt.Println(s)
fmt.Println("精确模式:", strings.Join(words, "/"))
x.AddWord("比特币")
// `AddWordEx` 支持指定词语的权重,作为 `AddWord` 权重太低加词失败的补充。
// `tag` 参数可以为空字符串,也可以指定词性。
// x.AddWordEx("比特币", 100000, "")
s = "比特币"
words = x.Cut(s, use_hmm)
fmt.Println(s)
fmt.Println("添加词典后,精确模式:", strings.Join(words, "/"))
s = "他来到了网易杭研大厦"
words = x.Cut(s, use_hmm)
fmt.Println(s)
fmt.Println("新词识别:", strings.Join(words, "/"))
s = "小明硕士毕业于中国科学院计算所,后在日本京都大学深造"
words = x.CutForSearch(s, use_hmm)
fmt.Println(s)
fmt.Println("搜索引擎模式:", strings.Join(words, "/"))
s = "长春市长春药店"
words = x.Tag(s)
fmt.Println(s)
fmt.Println("词性标注:", strings.Join(words, ","))
s = "区块链"
words = x.Tag(s)
fmt.Println(s)
fmt.Println("词性标注:", strings.Join(words, ","))
s = "长江大桥"
words = x.CutForSearch(s, !use_hmm)
fmt.Println(s)
fmt.Println("搜索引擎模式:", strings.Join(words, "/"))
wordinfos := x.Tokenize(s, gojieba.SearchMode, !use_hmm)
fmt.Println(s)
fmt.Println("Tokenize:(搜索引擎模式)", wordinfos)
wordinfos = x.Tokenize(s, gojieba.DefaultMode, !use_hmm)
fmt.Println(s)
fmt.Println("Tokenize:(默认模式)", wordinfos)
keywords := x.ExtractWithWeight(s, 5)
fmt.Println("Extract:", keywords)
}
我来到北京清华大学
全模式: 我/来到/北京/清华/清华大学/华大/大学
我来到北京清华大学
精确模式: 我/来到/北京/清华大学
比特币
精确模式: 比特/币
比特币
添加词典后,精确模式: 比特币
他来到了网易杭研大厦
新词识别: 他/来到/了/网易/杭研/大厦
小明硕士毕业于中国科学院计算所,后在日本京都大学深造
搜索引擎模式: 小明/硕士/毕业/于/中国/科学/学院/科学院/中国科学院/计算/计算所/,/后/在/日本/京都/大学/日本京都大学/深造
长春市长春药店
词性标注: 长春市/ns,长春/ns,药店/n
区块链
词性标注: 区块链/nz
长江大桥
搜索引擎模式: 长江/大桥/长江大桥
长江大桥
Tokenize: [{长江 0 6} {大桥 6 12} {长江大桥 0 12}]
See Details in gojieba-demo See example in jieba_test, extractor_test
Benchmark
Unittest
go test ./...
Benchmark
go test -bench "Jieba" -test.benchtime 10s
go test -bench "Extractor" -test.benchtime 10s
Contributors
Code Contributors
This project exists thanks to all the people who contribute.
GoLang Resources
are all listed below.
GitHub - GuilhermeCaruso/anko: :crystal_ball: Simple application watcher
resource
~/github.com
resource
GitHub - jidicula/go-fuzz-action: GitHub Action for Go 1.18 fuzz testing
resource
~/github.com
resource
GitHub - tucnak/climax: Climax is an alternative CLI with the human face
resource
~/github.com
resource
GitHub - lawrencewoodman/roveralls: A Go recursive coverage testing tool
resource
~/github.com
resource
GitHub - nakagami/firebirdsql: Firebird RDBMS sql driver for Go (golang)
resource
~/github.com
resource
GitHub - liweiyi88/onedump: Effortlessly database dump with one command.
resource
~/github.com
resource
GitHub - beefsack/go-astar: Go implementation of the A* search algorithm
resource
~/github.com
resource
GitHub - lxn/walk: A Windows GUI toolkit for the Go Programming Language
resource
~/github.com
resource
GitHub - mongodb/mongo-go-driver: The Official Golang driver for MongoDB
resource
~/github.com
resource
GitHub - mozillazg/go-unidecode: ASCII transliterations of Unicode text.
resource
~/github.com
resource
GitHub - bolknote/go-gd: Go bingings for GD (http://www.boutell.com/gd/)
resource
~/github.com
resource
GitHub - mosajjal/dnsmonster: Passive DNS Capture and Monitoring Toolkit
resource
~/github.com
resource
GitHub - haxpax/gosms: :mailbox_closed: Your own local SMS gateway in Go
resource
~/github.com
resource
GitHub - wajox/gobase: This is a simple skeleton for golang applications
resource
~/github.com
resource
GitHub - VividCortex/gohistogram: Streaming approximate histograms in Go
resource
~/github.com
resource
GitHub - malaschitz/randomForest: Random Forest implementation in golang
resource
~/github.com
resource
GitHub - google/gopacket: Provides packet processing capabilities for Go
resource
~/github.com
resource
GitHub - khezen/evoli: Genetic Algorithm and Particle Swarm Optimization
resource
~/github.com
resource
GitHub - didip/tollbooth: Simple middleware to rate-limit HTTP requests.
resource
~/github.com
resource
GitHub - mustafaakin/gongular: A different approach to Go web frameworks
resource
~/github.com
resource
GitHub - songgao/colorgo: Colorize (highlight) `go build` command output
resource
~/github.com
resource
Made with ❤️
to provide different kinds of informations and resources.