Unicode transliterator (also known as unidecode) for Go.

Unicode transliterator (also known as unidecode) for Go
=======================================================

Use the following command to install gounidecode:

    go get -u github.com/fiam/gounidecode/unidecode

Example usage
=============

    package main

    import (
        "fmt"
        "github.com/fiam/gounidecode/unidecode"
    )

    func main() {
        fmt.Println(unidecode.Unidecode("áéíóú")) // Will print aeiou
        fmt.Println(unidecode.Unidecode("\u5317\u4EB0")) // Will print Bei Jing
        fmt.Println(unidecode.Unidecode("Κνωσός")) // Will print Knosos
    }


gounidecode

# getlang

[![GoDoc](https://godoc.org/github.com/rylans/getlang?status.svg)](https://godoc.org/github.com/rylans/getlang) [![Go Report Card](https://goreportcard.com/badge/github.com/rylans/getlang)](https://goreportcard.com/report/github.com/rylans/getlang) [![Build Status](https://travis-ci.org/rylans/getlang.svg?branch=master)](https://travis-ci.org/rylans/getlang) ![cover.run go](https://cover.run/go/github.com/rylans/getlang.svg?tag=golang-1.10)

getlang provides fast natural language detection in Go.

## Features

* Offline -- no internet connection required
* Supports [29 languages](https://github.com/rylans/getlang/blob/master/LANGUAGES.md)
* Provides ISO 639 language codes
* Fast

## Getting started

Installation:
```sh
    go get -u github.com/rylans/getlang
```

Example:
```go
package main

import (
	"fmt"
	"github.com/rylans/getlang"
)

func main() {
	info := getlang.FromString("Wszyscy ludzie rodzą się wolni i równi w swojej godności i prawach")
	fmt.Println(info.LanguageCode(), info.Confidence())
}
```

## Documentation
[getlang on godoc](https://godoc.org/github.com/rylans/getlang)

## License
[MIT](https://github.com/rylans/getlang/blob/master/LICENSE)

## Acknowledgements and Citations
* Thanks to [abadojack](https://github.com/abadojack) for the trigram generation logic in whatlanggo
* Cavnar, William B., and John M. Trenkle. "N-gram-based text categorization." Ann arbor mi 48113.2 (1994): 161-175.


getlang

Functions to determine the natural language of a unicode text.

# guesslanguage [![Build Status](https://travis-ci.org/endeveit/guesslanguage.svg?branch=master)](https://travis-ci.org/endeveit/guesslanguage)

This is a Go version of the Python [guess-language](http://code.google.com/p/guess-language) library.

guesslanguage provides a simple way to detect the natural language of a Unicode string. It detects over 60 languages, listed in the [models](https://github.com/endeveit/guesslanguage/tree/master/models) directory.

## Supported Go versions

guesslanguage is regularly tested against Go 1.1, 1.2, 1.3 and tip.

## Usage

Install in your `${GOPATH}` using `go get -u github.com/endeveit/guesslanguage`

Then call it:
```go
package main

import (
	"fmt"
	"github.com/endeveit/guesslanguage"
)

func main() {
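	lang, err := guesslanguage.Guess("This is a test of the language checker.")
	if err != nil {
		// Detection failed; report the error instead of printing a result.
		fmt.Println("detection failed:", err)
		return
	}

	// Output:
	// en
	fmt.Println(lang)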
}
```


guesslanguage

An accurate natural language detection library, suitable for long and short text alike. Supports detecting multiple languages in mixed-language text.

lingua-go

Natural language detection package for Go. Supports 84 languages and 24 scripts (writing systems e.g. Latin, Cyrillic, etc).

# Whatlanggo

[![Build Status](https://travis-ci.org/abadojack/whatlanggo.svg?branch=master)](https://travis-ci.org/abadojack/whatlanggo) [![Go Report Card](https://goreportcard.com/badge/github.com/abadojack/whatlanggo)](https://goreportcard.com/report/github.com/abadojack/whatlanggo) [![GoDoc](https://godoc.org/github.com/abadojack/whatlanggo?status.png)](https://godoc.org/github.com/abadojack/whatlanggo) [![Coverage Status](https://coveralls.io/repos/github/abadojack/whatlanggo/badge.svg)](https://coveralls.io/github/abadojack/whatlanggo)

Natural language detection for Go.
## Features
* Supports [84 languages](https://github.com/abadojack/whatlanggo/blob/master/SUPPORTED_LANGUAGES.md)
* 100% written in Go
* No external dependencies
* Fast
* Recognizes not only a language, but also a script (Latin, Cyrillic, etc)

## Getting started
Installation:
```sh
 go get -u github.com/abadojack/whatlanggo
```

Simple usage example:
```go
package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	info := whatlanggo.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script], " Confidence: ", info.Confidence)
}
```

## Blacklisting and whitelisting
```go
package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	//Blacklist
	options := whatlanggo.Options{
		Blacklist: map[whatlanggo.Lang]bool{
			whatlanggo.Ydd: true,
		},
	}

	info := whatlanggo.DetectWithOptions("האקדמיה ללשון העברית", options)

	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])

	//Whitelist
	options1 := whatlanggo.Options{
		Whitelist: map[whatlanggo.Lang]bool{
			whatlanggo.Epo: true,
			whatlanggo.Ukr: true,
		},
	}

	info = whatlanggo.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script])
}
```
For more details, please check the [documentation](https://godoc.org/github.com/abadojack/whatlanggo).

## Requirements
Go 1.8 or higher

## How does it work?

### How does the language recognition work?

The algorithm is based on trigram language models, a particular case of n-gram models.
To understand the idea, please check the original whitepaper [Cavnar and Trenkle '94: N-Gram-Based Text Categorization'](https://www.researchgate.net/publication/2375544_N-Gram-Based_Text_Categorization).
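
The trigram idea can be sketched as follows. This is a minimal illustration of the profile ranking and "out-of-place" distance described by Cavnar and Trenkle, not the whatlanggo implementation; the function names `trigramCounts`, `rankedProfile` and `outOfPlace` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// trigramCounts returns how often each 3-rune window occurs in text.
func trigramCounts(text string) map[string]int {
	runes := []rune(strings.ToLower(text))
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	return counts
}

// rankedProfile orders the trigrams of text by descending frequency.
// Ties are broken alphabetically so the result is deterministic.
func rankedProfile(text string) []string {
	counts := trigramCounts(text)
	profile := make([]string, 0, len(counts))
	for t := range counts {
		profile = append(profile, t)
	}
	sort.Slice(profile, func(i, j int) bool {
		if counts[profile[i]] != counts[profile[j]] {
			return counts[profile[i]] > counts[profile[j]]
		}
		return profile[i] < profile[j]
	})
	return profile
}

// outOfPlace sums the rank differences between a text profile and a
// language profile. Trigrams missing from the language profile are
// charged the maximum penalty (the length of the language profile).
func outOfPlace(textProfile, langProfile []string) int {
	rank := make(map[string]int, len(langProfile))
	for i, t := range langProfile {
		rank[t] = i
	}
	dist := 0
	for i, t := range textProfile {
		r, ok := rank[t]
		switch {
		case !ok:
			dist += len(langProfile)
		case r > i:
			dist += r - i
		default:
			dist += i - r
		}
	}
	return dist
}

func main() {
	// A real language profile would be built from a large corpus.
	english := rankedProfile("the quick brown fox jumps over the lazy dog and the cat")
	text := rankedProfile("the dog and the fox")
	fmt.Println("distance to English profile:", outOfPlace(text, english))
}
```

The language whose profile yields the smallest distance is the best guess; the gap between the best and second-best distances is what a reliability measure can build on.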

### How is _IsReliable_ calculated?

It is based on the following factors:
* How many unique trigrams are in the given text
* How big the difference is between the first and the second (not returned) detected languages. This metric is called `rate` in the code base.

Therefore, it can be presented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas.
This function is a hyperbola and looks like the following:

<img alt="Language recognition whatlang rust" src="https://raw.githubusercontent.com/abadojack/whatlanggo/master/images/whatlang_is_reliable.png" width="450" height="300" />

For more details, please check a blog article [Introduction to Rust Whatlang Library and Natural Language Identification Algorithms](https://www.greyblake.com/blog/2017-07-30-introduction-to-rust-whatlang-library-and-natural-language-identification-algorithms/).
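
A hyperbolic threshold of this kind could look like the sketch below. The constants `hyperbolaScale` and `rateFloor` and both function names are assumptions chosen for illustration; they are not taken from the whatlanggo code base.

```go
package main

import "fmt"

// Hypothetical constants: they only shape the example curve.
const (
	hyperbolaScale = 3.0   // how quickly the required rate drops as text grows
	rateFloor      = 0.015 // minimum rate required even for very long texts
)

// threshold returns the minimum rate needed for a text with the given
// number of unique trigrams. It is a hyperbola in uniqueTrigrams.
func threshold(uniqueTrigrams int) float64 {
	return hyperbolaScale/float64(uniqueTrigrams) + rateFloor
}

// isReliable reports whether the point (uniqueTrigrams, rate) lies in
// the "Reliable" area above the threshold curve.
func isReliable(uniqueTrigrams int, rate float64) bool {
	if uniqueTrigrams <= 0 {
		return false
	}
	return rate > threshold(uniqueTrigrams)
}

func main() {
	for _, n := range []int{5, 20, 100, 300} {
		fmt.Printf("unique trigrams: %3d  required rate: %.3f\n", n, threshold(n))
	}
}
```

Short texts with few unique trigrams need a large gap between the top two languages, while long texts are accepted with a much smaller gap.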

## License
[MIT](https://github.com/abadojack/whatlanggo/blob/master/LICENSE)

## Derivation
whatlanggo is a derivative of [Franc](https://github.com/wooorm/franc) (JavaScript, MIT) by [Titus Wormer](https://github.com/wooorm).

## Acknowledgements
Thanks to [greyblake](https://github.com/greyblake) (Potapov Sergey) for creating [whatlang-rs](https://github.com/greyblake/whatlang-rs) from where I got the idea and algorithms.

whatlanggo

Implementation of the porter stemming algorithm.

go-stem

Sentiment analyzer using sentiwordnet lexicon in Go.

# 💬 GoSentiwordnet

[![PkgGoDev](https://pkg.go.dev/badge/github.com/dinopuguh/gosentiwordnet/v2)](https://pkg.go.dev/github.com/dinopuguh/gosentiwordnet/v2)
[![Unit Test Status](https://github.com/dinopuguh/gosentiwordnet/actions/workflows/unit-test.yml/badge.svg?branch=master)](https://github.com/dinopuguh/gosentiwordnet/actions) 
[![Go Report Card](https://goreportcard.com/badge/github.com/dinopuguh/gosentiwordnet)](https://goreportcard.com/report/github.com/dinopuguh/gosentiwordnet)
[![codecov](https://codecov.io/gh/dinopuguh/gosentiwordnet/branch/master/graph/badge.svg)](https://codecov.io/gh/dinopuguh/gosentiwordnet)

Sentiment analyzer using the [sentiwordnet](https://github.com/aesuli/SentiWordNet) lexicon in Go. This library produces a sentiment score for each word, including positive, negative, and objective scores.

## ⚙ Installation

First of all, [download](https://golang.org/dl/) and install Go. Version `1.14` or higher is required.

Install this library using the [`go get`](https://golang.org/cmd/go/#hdr-Add_dependencies_to_current_module_and_install_them) command:

```bash
$ go get github.com/dinopuguh/gosentiwordnet/v2
```

## ⚡ Quickstart

```go
package main

import (
    "fmt"

    goswn "github.com/dinopuguh/gosentiwordnet/v2"
)

func main() {
    sa := goswn.New()

    scores, exist := sa.GetSentimentScore("love", "v", "2")
    if exist {
        fmt.Println("💬 Sentiment score:", scores) // => 💬 Sentiment score: {1 0 0}
    }
}
```

`GetSentimentScore` requires 3 parameters (word, POS tag, and word usage):

1. **Word**: the word to process
2. **POS tag**: the part-of-speech tag of the word
3. **Word usage**: 1 for the most common usage; higher numbers indicate less common usages

## 👍 Contributing

If you want to say **thank you** and/or support the active development of `Gosentiwordnet`:

1. Add a [GitHub Star](https://github.com/dinopuguh/gosentiwordnet/stargazers) to the project.
2. Write a review or tutorial on [Medium](https://medium.com/), [Dev.to](https://dev.to/) or personal blog.
3. Be a part of our [sponsors](https://github.com/sponsors/dinopuguh) to support this project.

## 💻 Contributors

- Dino Puguh (initial work)

Open for any pull requests to develop this project.


gosentiwordnet

Go implementation of [VADER Sentiment Analysis](https://github.com/cjhutto/vaderSentiment).

govader

Cgo binding for libtextcat C library. Guaranteed compatibility with version 2.2.

About
==========

Cgo binding for libtextcat C library. Guaranteed compatibility with version 2.2.

Installation
==========

Installation consists of several simple steps. They may be a bit different on your target system (e.g. require more permissions) so adapt them to the parameters of your system.

### Get libtextcat C library code

* Download original libtextcat archive from [libtextcat download section](http://software.wise-guys.nl/libtextcat/). 
* Unarchive it.

NOTE: If this link is not working or there are some problems with downloading, there is a stable version 2.2 snapshot saved in [Downloads](https://github.com/downloads/goodsign/libtextcat/libtextcat-2.2.tar.gz).

### Build and install libtextcat C library

From the directory, where you unarchived libtextcat, run:

```
./configure
make
sudo make install
sudo ldconfig 
```

### Install Go wrapper

```
go get github.com/goodsign/libtextcat
go test github.com/goodsign/libtextcat  # must PASS
```

Installation notes
==========

Make sure that you have your local library paths set correctly and that installation was successful. Otherwise, **go build** or **go test** may fail.

libtextcat is installed in your local library directory (e.g. **/usr/local/lib**) and puts its libraries there. This path should be registered in your system (using ldconfig or exporting LD_LIBRARY_PATH, etc.) or the linker would fail.

Usage
==========

```go
cat, err := NewTextCat(ConfigPath) // See 'Usage notes' section

if nil != err {
    // ... Handle error ...
}
defer cat.Close()

matches, err := cat.Classify(text)

if nil != err {
    // ... Handle error ...
}

// Use matches. 
// NOTE: matches[0] is the best match.

```

Usage notes
==========

libtextcat library needs to load language models to start guessing languages. These models are set using a configuration file and a number of language model (.lm) files.

Configuration file maps .lm files to identifiers used in the library. See [example](https://github.com/goodsign/libtextcat/blob/master/defaultcfg/conf.txt). Path to this file is specified in the **NewTextCat** call.

.lm files contain language patterns and frequencies for a specified language. See [example](https://github.com/goodsign/libtextcat/blob/master/defaultcfg/english.lm). Paths to these files are specified in the config file above. They can be absolute or relative (to the caller).

Quickstart
----------

To immediately get started, copy **/defaultcfg** folder contents to the directory of your target project and use:

```go
cat, err := NewTextCat("defaultcfg/conf.txt")
```

This will give you a standard set of languages described in the **Default configuration** section below.

Default configuration
----------

This package contains a default configuration (/defaultcfg) designed to work under the following conditions:

* UTF-8 only languages
* Language list is taken from the [snowball](https://github.com/goodsign/snowball) package
* Language identifiers are the same as in the [snowball](https://github.com/goodsign/snowball) package

This configuration is meant to be used together with the [snowball](https://github.com/goodsign/snowball) package.

More info
----------

For more information on libtextcat refer to the original [website](http://software.wise-guys.nl/libtextcat/), which contains links on theory and other details.

libtextcat Licence
==========

The libtextcat library is released under the [BSD Licence](http://opensource.org/licenses/bsd-license.php)

[LICENCE file](https://github.com/goodsign/libtextcat/blob/master/LICENCE_libtextcat)

Licence
==========

The goodsign/libtextcat binding is released under the [BSD Licence](http://opensource.org/licenses/bsd-license.php)

[LICENCE file](https://github.com/goodsign/libtextcat/blob/master/LICENCE)

libtextcat

Extract values from strings and fill your structs with nlp.

[![GoDoc](https://godoc.org/github.com/shixzie/nlp?status.svg)](https://godoc.org/github.com/shixzie/nlp) 
[![Go Report Card](https://goreportcard.com/badge/github.com/shixzie/nlp)](https://goreportcard.com/report/github.com/shixzie/nlp)
[![Build Status](https://travis-ci.org/shixzie/nlp.svg?branch=master)](https://travis-ci.org/shixzie/nlp)
[![codecov](https://codecov.io/gh/shixzie/nlp/branch/master/graph/badge.svg)](https://codecov.io/gh/shixzie/nlp)


# nlp

> `nlp` is a general purpose any-lang Natural Language Processor that parses the data inside a text and returns a filled model

## Supported types
```go
int  int8  int16  int32  int64
uint uint8 uint16 uint32 uint64
float32 float64
string
time.Time
time.Duration
```

## Installation
```
// go1.8+ is required
go get -u github.com/shixzie/nlp
```


**Feel free to create PR's and open Issues :)**

## How it works

You always begin by creating an NL value by calling `nlp.New()`. NL is a 
Natural Language Processor with 3 methods: `RegisterModel()`, `Learn()` and `P()`.

### RegisterModel(i interface{}, samples []string, ops ...ModelOption) error

RegisterModel takes 3 parameters, an empty struct, a set of samples and some options for the model.

The empty struct lets nlp know all possible values inside the text, for example:
```go
type Song struct {
	Name        string // fields must be exported
	Artist      string
	ReleasedAt  time.Time
}
err := nl.RegisterModel(Song{}, someSamples, nlp.WithTimeFormat("2006"))
if err != nil {
	panic(err)
}
// ...
```

tells nlp that inside the text may be a Song.Name, a Song.Artist and a Song.ReleasedAt.

The samples are the key part of nlp, not just because they set the *limits*
between *keywords* but also because they are used to choose which model 
to use to handle an expression.

Samples must have a special syntax to set those *limits* and *keywords*.
```go
songSamples := []string{
	"play {Name} by {Artist}",
	"play {Name} from {Artist}",
	"play {Name}",
	"from {Artist} play {Name}",
	"play something from {ReleasedAt}",
}
```

In the example above, you can see we're referring to the Name and Artist fields
of the `Song` type declared earlier. Both `{Name}` and `{Artist}` are our *keywords*,
and, as you may have guessed, everything between `play` and `by` will be treated as a
`{Name}`, and everything after `by` will be treated as an `{Artist}`, meaning 
that `play` and `by` are our *limits*.
```
     limits
 ┌─────┴─────┐
┌┴─┐        ┌┴┐
play {Name} by  {Artist}
     └─┬──┘     └───┬──┘
       └──────┬─────┘
           keywords
```

Any character can be a *limit*, a `,` for example can be used as a limit.

*keywords* as well as *limits* are case-sensitive, so be sure to type them right.

**Note that putting 2 *keywords* next to each other, with no *limit* between them, means that only 1 or none of them will be detected**

> *limits are important* - Me :3


### Learn() error

Learn maps each model's samples to that model using the Naive Bayes 
algorithm. `Learn()` also trains all registered models
so they're able to fit expressions in the future.

```go
// must be called after all models are registered and before calling nl.P()
err := nl.Learn() 
if err != nil {
	panic(err)
}
// ...
```

Once the algorithm has finished learning, we're now ready to start Processing 
those texts.

**Note that you must call NL.Learn() after all models are registered and before calling NL.P()**
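
To make the model-selection step more concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing that picks a model name for an expression. It is an assumption about how such a step can work, not the code nlp uses; the `classifier` type, its methods and the `Weather` model are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// classifier is a tiny multinomial Naive Bayes model over whole words.
type classifier struct {
	wordCounts map[string]map[string]int // model name -> word -> count
	totalWords map[string]int            // model name -> number of words seen
	samples    map[string]int            // model name -> number of samples
	vocab      map[string]bool
	total      int // total number of samples
}

func newClassifier() *classifier {
	return &classifier{
		wordCounts: make(map[string]map[string]int),
		totalWords: make(map[string]int),
		samples:    make(map[string]int),
		vocab:      make(map[string]bool),
	}
}

// learn records one sample for the given model.
func (c *classifier) learn(model, sample string) {
	if c.wordCounts[model] == nil {
		c.wordCounts[model] = make(map[string]int)
	}
	c.samples[model]++
	c.total++
	for _, w := range strings.Fields(sample) {
		c.wordCounts[model][w]++
		c.totalWords[model]++
		c.vocab[w] = true
	}
}

// classify returns the model with the highest log-probability for expr.
// Ties are resolved by model name so the result is deterministic.
func (c *classifier) classify(expr string) string {
	best, bestScore := "", math.Inf(-1)
	for model, counts := range c.wordCounts {
		score := math.Log(float64(c.samples[model]) / float64(c.total))
		for _, w := range strings.Fields(expr) {
			// Laplace smoothing keeps unseen words from zeroing the score.
			p := float64(counts[w]+1) / float64(c.totalWords[model]+len(c.vocab))
			score += math.Log(p)
		}
		if score > bestScore || (score == bestScore && model < best) {
			best, bestScore = model, score
		}
	}
	return best
}

func main() {
	c := newClassifier()
	c.learn("Song", "play {Name} by {Artist}")
	c.learn("Song", "play {Name}")
	c.learn("Weather", "weather in {City}")
	c.learn("Weather", "forecast for {City}")
	fmt.Println(c.classify("play King by Lauren Aquilina"))
}
```

In this sketch the *limits* (`play`, `by`, `weather`, `in`) are what separate the models, which is one reason distinctive limits matter.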

### P(expr string) interface{}

P first asks the trained algorithm which model should be used. Once we get
the right *and already trained* model, we just make it fit the expression.

**Note that everything in the expression must be separated by a _space_ or _tab_**

When processing an expression, nlp searches for the *limits* inside that 
expression and evaluates which sample fits the expression best. It doesn't
matter if the text has `trash`. In this example:
```
     limits
 ┌─────┴─────┐
┌┴─┐        ┌┴┐
play {Name} by  {Artist}
     └─┬──┘     └───┬──┘
       └──────┬─────┘
           keywords
```

we have 2 *limits*, `play` and `by`. It doesn't matter if the expression is 
*hello sir can you pleeeeeease play King by Lauren Aquilina*, since:
```
                                  limits
            trash              ┌────┴────┐
┌─────────────┴─────────────┐ ┌┴─┐      ┌┴┐
hello sir can you pleeeeeease play King by  Lauren Aquilina
                                   └┬─┘     └─────┬───────┘
                                 {Name}       {Artist}
                                 └─┬──┘       └───┬──┘
                                   └──────┬───────┘
                                       keywords
```

`{Name}` would be replaced with `King`, 
`{Artist}` would be replaced with `Lauren Aquilina`, 
`trash` would be ignored as well as the *limits* `play` and `by`, 
and then **a pointer to a filled struct with the type used to register the model** (`Song`) 
( `Song.Name` being `{Name}` and `Song.Artist` being `{Artist}` ) 
**will be returned**.
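
The fitting step described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not nlp's implementation: the `match` function is hypothetical, it returns a map instead of a filled struct, and it handles string values only.

```go
package main

import (
	"fmt"
	"strings"
)

// match fits expr to a sample such as "play {Name} by {Artist}".
// It returns the text captured for each keyword, or false when one of
// the sample's limits does not occur in expr.
func match(sample, expr string) (map[string]string, bool) {
	words := strings.Fields(expr) // split on spaces and tabs
	captured := make(map[string]string)
	pos := 0      // index of the next unread word in expr
	keyword := "" // keyword waiting for its closing limit

	for _, tok := range strings.Fields(sample) {
		if strings.HasPrefix(tok, "{") && strings.HasSuffix(tok, "}") {
			keyword = strings.Trim(tok, "{}")
			continue
		}
		// tok is a limit: find its next occurrence (case-sensitive).
		idx := -1
		for i := pos; i < len(words); i++ {
			if words[i] == tok {
				idx = i
				break
			}
		}
		if idx < 0 {
			return nil, false
		}
		// Words between the previous limit and this one belong to the
		// pending keyword; words before the first limit are trash.
		if keyword != "" {
			captured[keyword] = strings.Join(words[pos:idx], " ")
			keyword = ""
		}
		pos = idx + 1
	}
	// A trailing keyword takes everything after the last limit.
	if keyword != "" {
		captured[keyword] = strings.Join(words[pos:], " ")
	}
	return captured, true
}

func main() {
	got, ok := match("play {Name} by {Artist}",
		"hello sir can you pleeeeeease play King by Lauren Aquilina")
	fmt.Println(ok, got["Name"], got["Artist"])
}
```

Because a keyword is only closed by the next limit, two adjacent keywords in this sketch would overwrite each other, which mirrors the warning about putting 2 *keywords* together.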

## Usage

```go
type Song struct {
	Name       string
	Artist     string
	ReleasedAt time.Time
}

songSamples := []string{
	"play {Name} by {Artist}",
	"play {Name} from {Artist}",
	"play {Name}",
	"from {Artist} play {Name}",
	"play something from {ReleasedAt}",
}

nl := nlp.New()
err := nl.RegisterModel(Song{}, songSamples, nlp.WithTimeFormat("2006"))
if err != nil {
	panic(err)
}

err = nl.Learn() // you must call Learn after all models are registered and before calling P
if err != nil {
	panic(err)
}

// after learning you can call P as many times as you want
s := nl.P("hello sir can you pleeeeeease play King by Lauren Aquilina") 
if song, ok := s.(*Song); ok {
	fmt.Println("Success")
	fmt.Printf("%#v\n", song)
} else {
	fmt.Println("Failed")
}

// Prints
//
// Success
// &main.Song{Name: "King", Artist: "Lauren Aquilina"}
```


Go Natural Language Processing library supporting LSA (Latent Semantic Analysis).

# Natural Language Processing 
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) 
[![GoDoc](https://godoc.org/github.com/james-bowman/nlp?status.svg)](https://godoc.org/github.com/james-bowman/nlp) 
[![Build Status](https://travis-ci.org/james-bowman/nlp.svg?branch=master)](https://travis-ci.org/james-bowman/nlp)
[![Go Report Card](https://goreportcard.com/badge/github.com/james-bowman/nlp)](https://goreportcard.com/report/github.com/james-bowman/nlp)
[![codecov](https://codecov.io/gh/james-bowman/nlp/branch/master/graph/badge.svg)](https://codecov.io/gh/james-bowman/nlp)
[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge-flat.svg)](https://github.com/avelino/awesome-go)
[![Sourcegraph](https://sourcegraph.com/github.com/james-bowman/nlp/-/badge.svg)](https://sourcegraph.com/github.com/james-bowman/nlp?badge)


<img src="https://github.com/james-bowman/nlp/raw/master/Gophers.008.crop.png" alt="nlp" align="left" />

Implementations of selected machine learning algorithms for natural language processing in golang. The primary focus for the package is the statistical semantics of plain-text documents supporting semantic analysis and retrieval of semantically similar documents.

Built upon the [Gonum](https://www.gonum.org/) package for linear algebra and scientific computing with some inspiration taken from Python's [scikit-learn](http://scikit-learn.org/stable/) and [Gensim](https://radimrehurek.com/gensim/).

Check out [the companion blog post](http://www.jamesbowman.me/post/semantic-analysis-of-webpages-with-machine-learning-in-go/) or [the Go documentation page](https://godoc.org/github.com/james-bowman/nlp) for full usage and examples.

 

## Features

* [LSA (Latent Semantic Analysis aka Latent Semantic Indexing (LSI))][LSA] implementation using truncated [SVD (Singular Value Decomposition)](https://en.wikipedia.org/wiki/Singular-value_decomposition) for dimensionality reduction.
* Fast comparison and retrieval of semantically similar documents using [SimHash](https://en.wikipedia.org/wiki/SimHash)(random hyperplanes/[sign random projection](https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Random_projection)) algorithm with multi-index and Forest schemes for [LSH (Locality Sensitive Hashing)](https://en.wikipedia.org/wiki/Locality-sensitive_hashing) to support fast, approximate cosine similarity/angular distance comparisons and approximate nearest neighbour search using significantly less memory and processing time.
* [Random Indexing (RI)](https://en.wikipedia.org/wiki/Random_indexing) and Reflective Random Indexing (RRI) (which extends RI to support indirect inference) for scalable [Latent Semantic Analysis (LSA)][LSA] over large, web-scale corpora.
* [Latent Dirichlet Allocation (LDA)](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) using a parallelised implementation of the fast [SCVB0 (Stochastic Collapsed Variational Bayesian inference)][SCVB0] algorithm for unsupervised topic extraction. 
* [PCA (Principal Component Analysis)](https://en.wikipedia.org/wiki/Principal_component_analysis)
* [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) weighting to account for frequently occurring words
* [Sparse matrix](http://github.com/james-bowman/sparse) implementations used for more efficient memory usage and processing over large document corpora.
* Stop word removal to remove frequently occurring English words e.g. "the", "and"
* [Feature hashing](https://en.wikipedia.org/wiki/Feature_hashing) ('the hashing trick') implementation (using [MurmurHash3](http://github.com/spaolacci/murmur3)) for reduced memory requirements and reduced reliance on training data
* Similarity/distance measures to calculate the similarity/distance between feature vectors.
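
As a rough illustration of two of the features above, TF-IDF weighting and cosine similarity, the following sketch uses plain maps as sparse vectors. It does not use this package's API; the `tfidf` and `cosine` functions and the weighting formula `tf * ln(N/df)` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tfidf returns one sparse vector (term -> weight) per document, with
// weight = term frequency * ln(number of documents / document frequency).
// Terms that occur in every document therefore get weight 0.
func tfidf(docs []string) []map[string]float64 {
	df := make(map[string]int)
	vectors := make([]map[string]float64, len(docs))
	for i, d := range docs {
		tf := make(map[string]float64)
		for _, w := range strings.Fields(strings.ToLower(d)) {
			tf[w]++
		}
		for w := range tf {
			df[w]++
		}
		vectors[i] = tf
	}
	n := float64(len(docs))
	for _, v := range vectors {
		for w, count := range v {
			v[w] = count * math.Log(n/float64(df[w]))
		}
	}
	return vectors
}

// cosine returns the cosine similarity of two sparse vectors,
// or 0 when either vector has zero length.
func cosine(a, b map[string]float64) float64 {
	var dot, normA, normB float64
	for w, x := range a {
		dot += x * b[w]
		normA += x * x
	}
	for _, y := range b {
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	v := tfidf([]string{"the cat sat", "the dog sat", "the bird flew"})
	fmt.Printf("cat/dog: %.3f  cat/bird: %.3f\n", cosine(v[0], v[1]), cosine(v[0], v[2]))
}
```

The word "the" appears in every document, so it contributes nothing to the similarity, which is the effect the TF-IDF weighting is meant to achieve.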

## Planned

* Expanded persistence support
* Stemming to treat words with common root as the same e.g. "go" and "going"
* Clustering algorithms e.g. Hierarchical, K-means, etc.
* Classification algorithms e.g. SVM, KNN, random forest, etc.

## References

1. [Rosario, Barbara. Latent Semantic Indexing: An overview. INFOSYS 240 Spring 2000](http://people.ischool.berkeley.edu/~rosario/projects/LSI.pdf)
1. [Latent Semantic Analysis, a scholarpedia article on LSA written by Tom Landauer, one of the creators of LSA.](http://www.scholarpedia.org/article/Latent_semantic_analysis)
1. [Thomo, Alex. Latent Semantic Analysis (Tutorial).](http://webhome.cs.uvic.ca/~thomo/svd.pdf)
1. [Latent Semantic Indexing. Stanford NLP Course](http://nlp.stanford.edu/IR-book/html/htmledition/latent-semantic-indexing-1.html)
1. [Charikar, Moses S. "Similarity Estimation Techniques from Rounding Algorithms" in Proceedings of the thirty-fourth annual ACM symposium on Theory of computing - STOC ’02, 2002, p. 380.](https://www.cs.princeton.edu/courses/archive/spr04/cos598B/bib/CharikarEstim.pdf)
1. [M. Bawa, T. Condie, and P. Ganesan, “LSH forest: self-tuning indexes for similarity search,” Proc. 14th Int. Conf. World Wide Web - WWW ’05, p. 651, 2005.](http://dl.acm.org/citation.cfm?id=1060745.1060840)
1. [A. Gionis, P. Indyk, and R. Motwani, “Similarity Search in High Dimensions via Hashing,” VLDB ’99 Proc. 25th Int. Conf. Very Large Data Bases, vol. 99, no. 1, pp. 518–529, 1999.](http://www.cs.princeton.edu/courses/archive/spring13/cos598C/Gionis.pdf)
1. [Kanerva, Pentti, Kristoferson, Jan and Holst, Anders (2000). Random Indexing of Text Samples for Latent Semantic Analysis](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.4.6523&rep=rep1&type=pdf)
1. [Rangan, Venkat. Discovery of Related Terms in a corpus using Reflective Random Indexing](https://www.umiacs.umd.edu/~oard/desi4/papers/rangan.pdf)
1. [Vasuki, Vidya and Cohen, Trevor. Reflective random indexing for semi-automatic indexing of the biomedical literature](https://ac.els-cdn.com/S1532046410000481/1-s2.0-S1532046410000481-main.pdf?_tid=f31f92e8-028a-11e8-8c31-00000aab0f6c&acdnat=1516965824_e24a804445fff1744281ca6f5898a3a4)
1. [QasemiZadeh, Behrang and Handschuh, Siegfried. Random Indexing Explained with High Probability](http://pars.ie/publications/papers/pre-prints/random-indexing-dr-explained.pdf)
1. [Foulds, James; Boyles, Levi; Dubois, Christopher; Smyth, Padhraic; Welling, Max (2013). Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation][SCVB0]



[LSA]: https://en.wikipedia.org/wiki/Latent_semantic_analysis
[SCVB0]: https://arxiv.org/pdf/1305.2452

Go port of the Rapid Automatic Keyword Extraction Algorithm (RAKE).

A Go implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons.

Original Python implementation available at: https://github.com/aneesha/RAKE

The source code is released under the MIT License.
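
For readers new to RAKE, the core scoring idea can be sketched as follows. This is a simplified illustration, not this package's code: the function names and the tiny `stopWords` list are assumptions, and a real implementation uses a much larger stop-word list.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// stopWords is a deliberately tiny, hypothetical stop-word list.
var stopWords = map[string]bool{
	"a": true, "and": true, "for": true, "in": true,
	"is": true, "of": true, "over": true, "the": true,
}

// candidatePhrases splits text into runs of words that contain no
// stop word and no punctuation.
func candidatePhrases(text string) [][]string {
	var phrases [][]string
	var current []string
	flush := func() {
		if len(current) > 0 {
			phrases = append(phrases, current)
			current = nil
		}
	}
	clauses := strings.FieldsFunc(strings.ToLower(text), unicode.IsPunct)
	for _, clause := range clauses {
		for _, w := range strings.Fields(clause) {
			if stopWords[w] {
				flush()
				continue
			}
			current = append(current, w)
		}
		flush()
	}
	return phrases
}

// scorePhrases gives each word the score degree/frequency, where the
// degree counts the words it co-occurs with (itself included), and
// scores each phrase as the sum of its word scores.
func scorePhrases(phrases [][]string) map[string]float64 {
	freq := make(map[string]float64)
	degree := make(map[string]float64)
	for _, p := range phrases {
		for _, w := range p {
			freq[w]++
			degree[w] += float64(len(p))
		}
	}
	scores := make(map[string]float64)
	for _, p := range phrases {
		var sum float64
		for _, w := range p {
			sum += degree[w] / freq[w]
		}
		scores[strings.Join(p, " ")] = sum
	}
	return scores
}

func main() {
	text := "Compatibility of systems of linear constraints over the set of natural numbers"
	scores := scorePhrases(candidatePhrases(text))
	keys := make([]string, 0, len(scores))
	for k := range scores {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if scores[keys[i]] != scores[keys[j]] {
			return scores[keys[i]] > scores[keys[j]]
		}
		return keys[i] < keys[j]
	})
	for _, k := range keys {
		fmt.Printf("%s --> %f\n", k, scores[k])
	}
}
```

Under this scoring, a phrase of n words that appear nowhere else scores n², which is why multi-word phrases rank above single words.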

## Docs and Report Card
- godoc.org: https://godoc.org/github.com/afjoseph/RAKE.Go
- goreportcard.com: https://goreportcard.com/report/github.com/afjoseph/RAKE.Go

## Example Usage

```go
package main

import (
	"fmt"

	"github.com/afjoseph/goRAKE"
)

func main() {
	text := `The growing doubt of human autonomy and reason has created a state of moral confusion where man is left without the guidance of either revelation or reason. The result is the acceptance of a relativistic position which proposes that value judgements and ethical norms are exclusively matters of arbitrary preference and that no objectively valid statement can be made in this realm... But since man cannot live without values and norms, this relativism makes him an easy prey for irrational value systems.`

	candidates := rake.RunRake(text)

	for _, candidate := range candidates {
		fmt.Printf("%s --> %f\n", candidate.Key, candidate.Value)
	}

	fmt.Printf("\nsize: %d\n", len(candidates))
}




 9.000000-->
 4.000000-->
 4.000000-->
 4.000000-->
 4.000000-->
 4.000000-->
 4.000000-->
 4.000000-->
 4.000000-->
 3.500000-->
 1.500000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->
 1.000000-->


```

RAKE.go

Snowball stemmer port (cgo wrapper) for Go. Provides word stem extraction functionality [Snowball native](http://snowball.tartarus.org/).

Description
====

Snowball stemmer port (cgo wrapper) for Go. Provides word stem extraction functionality. For more detailed info see http://snowball.tartarus.org/

Installing
====

```
go get github.com/goodsign/snowball
go test github.com/goodsign/snowball  # must PASS
```

Done! Use it in your go files. (import 'github.com/goodsign/snowball')

Usage
====

```go
  stemmer, err := NewWordStemmer(algorithm, encoding)
  
  if nil != err {
    /*...handle error...*/
  }
  defer stemmer.Close() 

  wordStem, err := stemmer.Stem(word)
  if nil != err {
    /*...handle error...*/
  }

  /* Use wordStem */

```
Usage notes
-----------

According to Snowball documentation:

```
Creating a stemmer is a relatively expensive operation - the expected
usage pattern is that a new stemmer is created when needed, used
to stem many words, and deleted after some time.
```

Algorithms & encodings
----

File **modules.txt** contains all the main algorithms for each language, in UTF-8, and also with
the most commonly used encoding.

```
Language        Encodings               Algorithms

danish          UTF_8,ISO_8859_1        danish,da,dan
dutch           UTF_8,ISO_8859_1        dutch,nl,dut,nld
english         UTF_8,ISO_8859_1        english,en,eng
finnish         UTF_8,ISO_8859_1        finnish,fi,fin
french          UTF_8,ISO_8859_1        french,fr,fre,fra
german          UTF_8,ISO_8859_1        german,de,ger,deu
hungarian       UTF_8,ISO_8859_1        hungarian,hu,hun
italian         UTF_8,ISO_8859_1        italian,it,ita
norwegian       UTF_8,ISO_8859_1        norwegian,no,nor
portuguese      UTF_8,ISO_8859_1        portuguese,pt,por
romanian        UTF_8,ISO_8859_2        romanian,ro,rum,ron
russian         UTF_8,KOI8_R            russian,ru,rus
spanish         UTF_8,ISO_8859_1        spanish,es,esl,spa
swedish         UTF_8,ISO_8859_1        swedish,sv,swe
turkish         UTF_8                   turkish,tr,tur
```

Thread-safety
====

The original Snowball documentation says:

```
Stemmers are re-entrant, but not threadsafe.  In other words, if
you wish to access the same stemmer object from multiple threads,
you must ensure that all access is protected by a mutex or similar
device.
```

Thus this Go wrapper uses **sync.Mutex** for each stem operation, so it is thread safe.
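
The locking pattern can be sketched like this. The `unsafeStemmer` type below is a stand-in for the native stemmer object and its toy rule is not a real stemming algorithm; `SafeStemmer` and its methods are hypothetical names, not this package's API.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// unsafeStemmer stands in for a native stemmer object that must not be
// used from several threads at once.
type unsafeStemmer struct {
	calls int // unsynchronized internal state
}

func (s *unsafeStemmer) stem(word string) string {
	s.calls++
	return strings.TrimSuffix(word, "s") // toy rule for the example
}

// SafeStemmer serializes every access to the wrapped stemmer.
type SafeStemmer struct {
	mu    sync.Mutex
	inner *unsafeStemmer
}

func NewSafeStemmer() *SafeStemmer {
	return &SafeStemmer{inner: &unsafeStemmer{}}
}

// Stem holds the mutex for the duration of one stem operation.
func (s *SafeStemmer) Stem(word string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.inner.stem(word)
}

// Calls reports how many stem operations have run so far.
func (s *SafeStemmer) Calls() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.inner.calls
}

func main() {
	stemmer := NewSafeStemmer() // create once, reuse for many words
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			stemmer.Stem("cats")
		}()
	}
	wg.Wait()
	fmt.Println(stemmer.Stem("dogs"), stemmer.Calls())
}
```

Sharing one long-lived stemmer across goroutines also fits the usage pattern quoted earlier, since creating a stemmer is relatively expensive.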

Snowball Licence
==========

The Snowball library is released under the [BSD Licence](http://opensource.org/licenses/bsd-license.php)

Licence
==========

The goodsign/snowball binding is released under the [BSD Licence](http://opensource.org/licenses/bsd-license.php)

[LICENCE file](https://github.com/goodsign/libtextcat/blob/master/LICENCE)

snowball

Self-contained Machine Learning and Natural Language Processing library in Go.

<img src="https://github.com/nlpodyssey/spago/blob/main/assets/spago_logo.png" width="400"/>
 


 <a href="https://github.com/nlpodyssey/spago/actions/workflows/go.yml?query=branch%3Amain">
 <img alt="Build" src="https://github.com/nlpodyssey/spago/actions/workflows/go.yml/badge.svg?branch=main">
 </a>
 <a href="https://codecov.io/gh/nlpodyssey/spago">
 <img alt="Coverage" src="https://codecov.io/gh/nlpodyssey/spago/branch/main/badge.svg">
 </a>
 <a href="https://goreportcard.com/report/github.com/nlpodyssey/spago">
 <img alt="Go Report Card" src="https://goreportcard.com/badge/github.com/nlpodyssey/spago">
 </a>
 <a href="https://codeclimate.com/github/nlpodyssey/spago/maintainability">
 <img alt="Maintainability" src="https://api.codeclimate.com/v1/badges/be7350d3eb1a6a8aa503/maintainability">
 </a>
 <a href="https://pkg.go.dev/github.com/nlpodyssey/spago/">
 <img alt="Documentation" src="https://pkg.go.dev/badge/github.com/nlpodyssey/spago/.svg">
 </a>
 <a href="https://opensource.org/licenses/BSD-2-Clause">
 <img alt="License" src="https://img.shields.io/badge/License-BSD%202--Clause-orange.svg">
 </a>
 <a href="http://makeapullrequest.com">
 <img alt="PRs Welcome" src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square">
 </a>
 <a href="https://github.com/avelino/awesome-go">
 <img alt="Awesome Go" src="https://awesome.re/mentioned-badge.svg">
 </a>


 
 If you like the project, please ★ star this repository to show your support! 🤩
 
 


> 15 Jan 2024 - As I reflect on the journey of Spago, I am filled with gratitude for the enriching experience it has provided me. Mastering Go and revisiting the fundamentals of Deep Learning through Spago has been immensely rewarding. The unique features of Spago, especially its asynchronous computation graph and focusing on clean coding, have made it an extraordinary project to work on. Our goal was to create a minimalist ML framework in Go, eliminating the dependency on Python in production by enabling the creation of standalone executables. This approach of Spago successfully powered several of my projects in challenging production environments.
> 
> However, the endeavor to elevate Spago to a level where it can compete effectively in the evolving 'AI space', which now extensively involves computation on GPUs, requires substantial commitment. At the same time, the vision that Spago aspired to achieve is now being impressively realized by the [Candle](https://github.com/huggingface/candle) project in Rust. With my limited capacity to dedicate the necessary attention to Spago, and in the absence of a supporting maintenance team, I have made the pragmatic decision to pause the project for now.
>
> I am deeply grateful for the journey Spago has taken me on and for the community that has supported it. As we continue to explore the ever-evolving field of machine learning, I look forward to the exciting developments that lie ahead.
> 
> Warm regards,
>
> Matteo Grella

---

Spago is a **Machine Learning** library written in pure Go designed to support relevant neural architectures in **Natural
Language Processing**.

Spago is self-contained, in that it uses its own lightweight computational graph for both training and
inference, and it is easy to understand from start to finish.

It provides:
- Automatic differentiation via dynamic define-by-run execution
- Feed-forward layers (Linear, Highway, Convolution...)
- Recurrent layers (LSTM, GRU, BiLSTM...)
- Attention layers (Self-Attention, Multi-Head Attention...)
- Gradient descent optimizers (Adam, RAdam, RMS-Prop, AdaGrad, SGD)
- Gob compatible neural models for serialization

If you're interested in NLP-related functionalities, be sure to explore the [Cybertron](https://github.com/nlpodyssey/cybertron) package!

## Usage

Requirements:

* [Go 1.21](https://golang.org/dl/)

Clone this repo or get the library:

```console
go get -u github.com/nlpodyssey/spago
```

### Getting Started

A good place to start is by looking at the implementation of built-in neural models, such as the LSTM.

### Example 1
Here is an example of how to calculate the sum of two variables:

```go
package main

import (
	"fmt"
	"log"

	"github.com/nlpodyssey/spago/ag"
	"github.com/nlpodyssey/spago/mat"
)

func main() {
	// define the type of the elements in the tensors
	type T = float32

	// create a new node of type variable with a scalar
	a := mat.Scalar(T(2.0), mat.WithGrad(true))
	// create another node of type variable with a scalar
	b := mat.Scalar(T(5.0), mat.WithGrad(true))
	// create an addition operator (the calculation is actually performed here)
	c := ag.Add(a, b)

	// print the result
	fmt.Printf("c = %v (float%d)\n", c.Value(), c.Value().Item().BitSize())

	c.AccGrad(mat.Scalar(T(0.5)))

	if err := ag.Backward(c); err != nil {
		log.Fatalf("error during Backward(): %v", err)
	}

	fmt.Printf("ga = %v\n", a.Grad())
	fmt.Printf("gb = %v\n", b.Grad())
}
```

Output:

```console
c = [7] (float32)
ga = [0.5]
gb = [0.5]
```
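
To see why both gradients equal the seed value, here is a toy define-by-run sketch in plain Go. It does not use Spago; the `node` type and `add` function are hypothetical and only mimic the idea that the sum is computed immediately while a closure remembers how to propagate the gradient (for `c = a + b`, both partial derivatives are 1).

```go
package main

import "fmt"

// node is a toy scalar graph node; it is not Spago's type.
type node struct {
	value    float64
	grad     float64
	backward func() // propagates this node's grad to its operands
}

// add computes the sum immediately (define-by-run) and records how to
// push the gradient back: d(a+b)/da = d(a+b)/db = 1.
func add(a, b *node) *node {
	c := &node{value: a.value + b.value}
	c.backward = func() {
		a.grad += c.grad
		b.grad += c.grad
	}
	return c
}

func main() {
	a := &node{value: 2}
	b := &node{value: 5}
	c := add(a, b)

	c.grad = 0.5 // seed gradient, playing the role of AccGrad in Example 1
	c.backward()

	fmt.Println(c.value, a.grad, b.grad) // 7 0.5 0.5
}
```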

### Example 2

Here is a simple implementation of the perceptron formula:

```go
package main

import (
	"fmt"
	
	. "github.com/nlpodyssey/spago/ag"
	"github.com/nlpodyssey/spago/mat"
)

func main() {
	x := mat.Scalar(-0.8)
	w := mat.Scalar(0.4)
	b := mat.Scalar(-0.2)

	y := Sigmoid(Add(Mul(w, x), b))

	fmt.Printf("y = %0.3f\n", y.Value().Item())
}
```
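
As a sanity check, the same formula, `sigmoid(w*x + b)`, can be evaluated with the standard library alone. This sketch is independent of Spago; the helper names `sigmoid` and `perceptron` are chosen for this example.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid is the logistic function 1 / (1 + e^-z).
func sigmoid(z float64) float64 {
	return 1 / (1 + math.Exp(-z))
}

// perceptron evaluates sigmoid(w*x + b) for scalar inputs.
func perceptron(w, x, b float64) float64 {
	return sigmoid(w*x + b)
}

func main() {
	y := perceptron(0.4, -0.8, -0.2)
	fmt.Printf("y = %0.3f\n", y) // y = 0.373
}
```

Here `w*x + b = -0.52`, and `sigmoid(-0.52)` is approximately 0.373.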

## Contributing

If you think something is missing or could be improved, please open issues and pull requests.

To start contributing, check the [Contributing Guidelines](https://github.com/nlpodyssey/spago/blob/main/CONTRIBUTING.md).

## Contact

We highly encourage you to create an issue as it will contribute to the growth of the community. However, if you prefer to communicate with us privately, please feel free to email [Matteo Grella](mailto:matteogrella@gmail.com) with any questions or comments you may have.

spaGO

A spelling corrector for the Spanish language; it can also be used to create your own.

spelling-corrector

Efficient text segmentation in Go; supports English, Chinese, Japanese and other languages.

# gse

Efficient multilingual NLP and text segmentation in Go; supports English, Chinese, Japanese and other languages.
It also integrates with [elasticsearch](https://github.com/vcaesar/go-gse-elastic) and [bleve](https://github.com/vcaesar/gse-bleve).




[![Build Status](https://github.com/go-ego/gse/workflows/Go/badge.svg)](https://github.com/go-ego/gse/commits/master)
[![CircleCI Status](https://circleci.com/gh/go-ego/gse.svg?style=shield)](https://circleci.com/gh/go-ego/gse)
[![codecov](https://codecov.io/gh/go-ego/gse/branch/master/graph/badge.svg)](https://codecov.io/gh/go-ego/gse)
[![Build Status](https://travis-ci.org/go-ego/gse.svg)](https://travis-ci.org/go-ego/gse)
[![Go Report Card](https://goreportcard.com/badge/github.com/go-ego/gse)](https://goreportcard.com/report/github.com/go-ego/gse)
[![GoDoc](https://godoc.org/github.com/go-ego/gse?status.svg)](https://godoc.org/github.com/go-ego/gse)
[![GitHub release](https://img.shields.io/github/release/go-ego/gse.svg)](https://github.com/go-ego/gse/releases/latest)
[![Join the chat at https://gitter.im/go-ego/ego](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/go-ego/ego?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)




[简体中文](https://github.com/go-ego/gse/blob/master/README_zh.md)

Gse is a Go implementation of jieba that also aims to add NLP support and more features.

## Feature:

- Supports multiple word segmentation modes: common, search engine, full, precise, and HMM
- Supports user and embedded dictionaries, part-of-speech (POS) tagging, segment info analysis, and stop and trim words
- Supports multiple languages: English, Chinese, Japanese and others
- Supports Traditional Chinese
- Supports HMM text cutting using the Viterbi algorithm
- NLP support via TensorFlow (in progress)
- Named Entity Recognition (in progress)
- Integrates with [elasticsearch](https://github.com/vcaesar/go-gse-elastic) and bleve
- Can run as a <a href="https://github.com/go-ego/gse/blob/master/tools/server/server.go">JSON RPC service</a>

## Algorithm:

- The [Dictionary](https://github.com/go-ego/gse/blob/master/dictionary.go) is implemented with a double-array trie.
- The [Segmenter](https://github.com/go-ego/gse/blob/master/dag.go) finds the shortest path (based on word frequency and dynamic programming), and also supports DAG and HMM word segmentation.
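
The frequency-based dynamic programming idea can be sketched in a few lines. This is a simplified illustration in the style of jieba, not gse's implementation: the `segment` function, the toy frequencies, and the pseudo-count of 1 for unknown single characters are all assumptions made for this example, and a plain map replaces the double-array trie.

```go
package main

import (
	"fmt"
	"math"
)

// segment splits text into the word sequence with the highest total
// log-probability under a unigram model built from freq.
func segment(text string, freq map[string]float64) []string {
	runes := []rune(text)
	n := len(runes)

	var total float64
	for _, f := range freq {
		total += f
	}
	// logProb scores a candidate word; unknown single runes get a
	// pseudo-count of 1 so that every position stays reachable.
	logProb := func(w string) float64 {
		f, ok := freq[w]
		if !ok {
			f = 1
		}
		return math.Log(f / total)
	}

	// best[i] is the best score for runes[i:]; next[i] is the end index
	// (exclusive) of the first word on that best path.
	best := make([]float64, n+1)
	next := make([]int, n+1)
	for i := n - 1; i >= 0; i-- {
		// Fallback edge: a single rune is always allowed.
		best[i] = logProb(string(runes[i:i+1])) + best[i+1]
		next[i] = i + 1
		// Dictionary edges of the DAG: every longer word starting at i.
		for j := i + 2; j <= n; j++ {
			w := string(runes[i:j])
			if _, ok := freq[w]; !ok {
				continue
			}
			if s := logProb(w) + best[j]; s > best[i] {
				best[i], next[i] = s, j
			}
		}
	}

	var words []string
	for i := 0; i < n; i = next[i] {
		words = append(words, string(runes[i:next[i]]))
	}
	return words
}

func main() {
	// Toy frequencies, invented for this example.
	freq := map[string]float64{
		"helloworld": 1, "hello": 50, "world": 50, "hell": 5, "o": 2,
	}
	fmt.Println(segment("helloworld", freq)) // [hello world]
}
```

The sketch tries every substring, which is quadratic in the text length; a trie lookup limits the candidates to words that actually start at each position.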

## Text Segmentation speed:

- <a href="https://github.com/go-ego/gse/blob/master/tools/benchmark/benchmark.go"> single thread</a> 9.2MB/s
- <a href="https://github.com/go-ego/gse/blob/master/tools/benchmark/goroutines/goroutines.go">goroutines concurrent</a> 26.8MB/s.
- HMM text segmentation single thread 3.2MB/s. (2-core, 4-thread MacBook Pro).

## Binding:

[gse-bind](https://github.com/vcaesar/gse-bind) provides bindings for JavaScript and other languages.

## Install / update

With Go module support (Go 1.11+), just import:

```go
import "github.com/go-ego/gse"
```

Otherwise, to install the gse package, run the command:

```
go get -u github.com/go-ego/gse
```

## Use

```go
package main

import (
	_ "embed"
	"fmt"

	"github.com/go-ego/gse"
)

//go:embed testdata/test_en2.txt
var testDict string

//go:embed testdata/test_en.txt
var testEn string

var (
	text = "To be or not to be, that's the question!"
	test1 = "Hiworld, Helloworld!"
)

func main() {
	var seg1 gse.Segmenter
	seg1.DictSep = ","
	err := seg1.LoadDict("./testdata/test_en.txt")
	if err != nil {
		fmt.Println("Load dictionary error: ", err)
	}

	s1 := seg1.Cut(text)
	fmt.Println("seg1 Cut: ", s1)
	// seg1 Cut: [to be or not to be , that's the question!]

	var seg2 gse.Segmenter
	seg2.AlphaNum = true
	seg2.LoadDict("./testdata/test_en_dict3.txt")

	s2 := seg2.Cut(test1)
	fmt.Println("seg2 Cut: ", s2)
	// seg2 Cut: [hi world , hello world !]

	var seg3 gse.Segmenter
	seg3.AlphaNum = true
	seg3.DictSep = ","
	err = seg3.LoadDictEmbed(testDict + "\n" + testEn)
	if err != nil {
		fmt.Println("loadDictEmbed error: ", err)
	}
	s3 := seg3.Cut(text + test1)
	fmt.Println("seg3 Cut: ", s3)
	// seg3 Cut: [to be or not to be , that's the question! hi world , hello world !]

	// example2()
}
```

Example2:

```go
package main

import (
	"fmt"
	"regexp"

	"github.com/go-ego/gse"
	"github.com/go-ego/gse/hmm/pos"
)

var (
	text = "Hello world, Helloworld. Winter is coming! こんにちは世界, 你好世界."

	new, _ = gse.New("zh,testdata/test_en_dict3.txt", "alpha")

	seg gse.Segmenter
	posSeg pos.Segmenter
)

func main() {
	// Loading the default dictionary
	seg.LoadDict()
	// Loading the default dictionary with embed
	// seg.LoadDictEmbed()
	//
	// Loading the Simplified Chinese dictionary
	// seg.LoadDict("zh_s")
	// seg.LoadDictEmbed("zh_s")
	//
	// Loading the Traditional Chinese dictionary
	// seg.LoadDict("zh_t")
	//
	// Loading the Japanese dictionary
	// seg.LoadDict("jp")
	//
	// Load the dictionary
	// seg.LoadDict("your gopath"+"/src/github.com/go-ego/gse/data/dict/dictionary.txt")

	cut()

	segCut()
}

func cut() {
	hmm := new.Cut(text, true)
	fmt.Println("cut use hmm: ", hmm)

	hmm = new.CutSearch(text, true)
	fmt.Println("cut search use hmm: ", hmm)
	fmt.Println("analyze: ", new.Analyze(hmm, text))

	hmm = new.CutAll(text)
	fmt.Println("cut all: ", hmm)

	reg := regexp.MustCompile(`(\d+年|\d+月|\d+日|[\p{Latin}]+|[\p{Hangul}]+|\d+\.\d+|[a-zA-Z0-9]+)`)
	text1 := `헬로월드 헬로 서울, 2021年09月10日, 3.14`
	hmm = seg.CutDAG(text1, reg)
	fmt.Println("Cut with hmm and regexp: ", hmm, hmm[0], hmm[6])
}

func analyzeAndTrim(cut []string) {
	a := seg.Analyze(cut, "")
	fmt.Println("analyze the segment: ", a)

	cut = seg.Trim(cut)
	fmt.Println("cut all: ", cut)

	fmt.Println(seg.String(text, true))
	fmt.Println(seg.Slice(text, true))
}

func cutPos() {
	po := seg.Pos(text, true)
	fmt.Println("pos: ", po)
	po = seg.TrimPos(po)
	fmt.Println("trim pos: ", po)

	pos.WithGse(seg)
	po = posSeg.Cut(text, true)
	fmt.Println("pos: ", po)

	po = posSeg.TrimWithPos(po, "zg")
	fmt.Println("trim pos: ", po)
}

func segCut() {
	// Text Segmentation
	tb := []byte(text)
	fmt.Println(seg.String(text, true))

	segments := seg.Segment(tb)
	// Handle word segmentation results, search mode
	fmt.Println(gse.ToString(segments, true))
}

```

[Look at a custom dictionary example](/examples/dict/main.go)

```Go
package main

import (
	"fmt"
	_ "embed"

	"github.com/go-ego/gse"
)

//go:embed test_en_dict3.txt
var testDict string

func main() {
	// var seg gse.Segmenter
	// seg.LoadDict("zh, testdata/zh/test_dict.txt, testdata/zh/test_dict1.txt")
	// seg.LoadStop()
	seg, err := gse.NewEmbed("zh, word 20 n"+testDict, "en")
	if err != nil {
		fmt.Println("NewEmbed error: ", err)
	}
	// seg.LoadDictEmbed()
	seg.LoadStopEmbed()

	text1 := "Hello world, こんにちは世界, 你好世界!"
	s1 := seg.Cut(text1, true)
	fmt.Println(s1)
	fmt.Println("trim: ", seg.Trim(s1))
	fmt.Println("stop: ", seg.Stop(s1))
	fmt.Println(seg.String(text1, true))

	segments := seg.Segment([]byte(text1))
	fmt.Println(gse.ToString(segments))
}
```

[Look at a Chinese example](/examples/main.go)

[Look at a Japanese example](/examples/jp/main.go)

## Elasticsearch

How to use it with elasticsearch?

[go-gse-elastic](https://github.com/vcaesar/go-gse-elastic)

## Authors

- [Maintainers](https://github.com/orgs/go-ego/people)
- [Contributors](https://github.com/go-ego/gse/graphs/contributors)

## License

Gse is primarily distributed under the terms of "both the MIT license and the Apache License (Version 2.0)".
See [LICENSE-APACHE](http://www.apache.org/licenses/LICENSE-2.0), [LICENSE-MIT](https://github.com/go-vgo/robotgo/blob/master/LICENSE).

Thanks to [sego](https://github.com/huichen/sego) and [jieba](https://github.com/fxsjy/jieba) ([jiebago](https://github.com/wangbin/jiebago)).

This is a Go implementation of [MMSEG](http://technology.chtsai.org/mmseg/), a Chinese word splitting algorithm.

MMSEGO
=====
This is a Go implementation of [MMSEG](http://technology.chtsai.org/mmseg/), a Chinese word splitting algorithm.

TO DO list
----------
* Documentation/comments
* Benchmark

Usage
---------
### Input Dictionary Format
```sh
Key\tFreq
```
Each key occupies one line, followed by a tab and its frequency. The file should be UTF-8 encoded; for details, refer to [go-darts](https://github.com/awsong/go-darts).
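
For illustration, a reader for this `Key\tFreq` layout might look like the sketch below. It is not code from mmsego or go-darts; `parseDict` is a hypothetical helper, and treating the frequency as an integer is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseDict reads "Key\tFreq" lines from data into a map.
// Assumption: frequencies are integers; blank lines are skipped.
func parseDict(data string) (map[string]int, error) {
	dict := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(data))
	for lineNo := 1; sc.Scan(); lineNo++ {
		line := sc.Text()
		if line == "" {
			continue
		}
		key, freqStr, ok := strings.Cut(line, "\t")
		if !ok {
			return nil, fmt.Errorf("line %d: missing tab separator", lineNo)
		}
		freq, err := strconv.Atoi(strings.TrimSpace(freqStr))
		if err != nil {
			return nil, fmt.Errorf("line %d: bad frequency: %w", lineNo, err)
		}
		dict[key] = freq
	}
	return dict, sc.Err()
}

func main() {
	dict, err := parseDict("中国\t100\n北京\t80\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(dict), dict["中国"], dict["北京"]) // 2 100 80
}
```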

### Code example
```go
package main

import (
    "fmt"
    "time"
    "os"
    "mmsego"
    "bufio"
    "log"
    )

func main() {
    var s = new(mmsego.Segmenter)
    // Init loads the dictionary; keep its error so the check below compiles.
    err := s.Init("darts.lib")
    if err != nil {
        log.Fatal(err)
    }

    t := time.Now()
    offset := 0

    unifile, _ := os.Open("/tmp/a.txt")
    uniLineReader := bufio.NewReaderSize(unifile, 4000)
    line, bufErr := uniLineReader.ReadString('\n')
    for nil == bufErr {
	//takeWord := func(off int, length int){ fmt.Printf("%s ", string(line[off-offset:off-offset+length])) }
	takeWord := func(off, length int){ }
	s.Mmseg(line[:], offset, takeWord, nil, false)
	offset += len(line)
	line, bufErr = uniLineReader.ReadString('\n')
    }
    takeWord := func(off int, length int){ fmt.Printf("%s ", string(line[off-offset:off-offset+length])) }
    s.Mmseg(line, offset, takeWord, nil, true)

    fmt.Printf("Duration: %v\n", time.Since(t))
}
```
LICENSE
-----------
Apache License 2.0


MMSEGO

Library for text processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more. English only.

# prose [![Build Status](https://travis-ci.org/jdkato/prose.svg?branch=master)](https://travis-ci.org/jdkato/prose) [![GoDoc](https://godoc.org/github.com/golang/gddo?status.svg)](https://pkg.go.dev/github.com/jdkato/prose/v2@v2.0.0?tab=doc) [![Coverage Status](https://coveralls.io/repos/github/jdkato/prose/badge.svg?branch=master)](https://coveralls.io/github/jdkato/prose?branch=master) [![Go Report Card](https://goreportcard.com/badge/github.com/jdkato/prose)](https://goreportcard.com/report/github.com/jdkato/prose) [![codebeat badge](https://codebeat.co/badges/a867ec38-c025-4f65-85f9-89a9188cc458)](https://codebeat.co/projects/github-com-jdkato-prose-master) [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/avelino/awesome-go#natural-language-processing)

`prose` is a natural language processing library (English only, at the moment) in *pure Go*. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction.

You can find a more detailed summary on the library's performance here: [Introducing `prose` v2.0.0: Bringing NLP *to Go*](https://medium.com/@errata.ai/introducing-prose-v2-0-0-bringing-nlp-to-go-a1f0c121e4a5).

## Installation

```console
$ go get github.com/jdkato/prose/v2
```

## Usage

### Contents

* [Overview](#overview)
* [Tokenizing](#tokenizing)
* [Segmenting](#segmenting)
* [Tagging](#tagging)
* [NER](#ner)

### Overview


```go
package main

import (
    "fmt"
    "log"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, err := prose.NewDocument("Go is an open-source programming language created at Google.")
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's tokens:
    for _, tok := range doc.Tokens() {
        fmt.Println(tok.Text, tok.Tag, tok.Label)
        // Go NNP B-GPE
        // is VBZ O
        // an DT O
        // ...
    }

    // Iterate over the doc's named-entities:
    for _, ent := range doc.Entities() {
        fmt.Println(ent.Text, ent.Label)
        // Go GPE
        // Google GPE
    }

    // Iterate over the doc's sentences:
    for _, sent := range doc.Sentences() {
        fmt.Println(sent.Text)
        // Go is an open-source programming language created at Google.
    }
}
```

The document-creation process adheres to the following sequence of steps:

```text
tokenization -> POS tagging -> NE extraction
            \
             segmentation
```

Each step may be disabled (assuming later steps aren't required) by passing the appropriate [*functional option*](https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis). To disable named-entity extraction, for example, you'd do the following:

```go
doc, err := prose.NewDocument(
        "Go is an open-source programming language created at Google.",
        prose.WithExtraction(false))
```

### Tokenizing

`prose` includes a tokenizer capable of processing modern text, including the non-word character spans shown below.

| Type            | Example                           |
|-----------------|-----------------------------------|
| Email addresses | `Jane.Doe@example.com`            |
| Hashtags        | `#trending`                       |
| Mentions        | `@jdkato`                         |
| URLs            | `https://github.com/jdkato/prose` |
| Emoticons       | `:-)`, `>:(`, `o_0`, etc.         |


```go
package main

import (
    "fmt"
    "log"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, err := prose.NewDocument("@jdkato, go to http://example.com thanks :).")
    if err != nil {
        log.Fatal(err)
    }

    // Iterate over the doc's tokens:
    for _, tok := range doc.Tokens() {
        fmt.Println(tok.Text, tok.Tag)
        // @jdkato NN
        // , ,
        // go VB
        // to TO
        // http://example.com NN
        // thanks NNS
        // :) SYM
        // . .
    }
}
```

### Segmenting

`prose` includes one of the most accurate sentence segmenters available, according to the [Golden Rules](https://github.com/diasks2/pragmatic_segmenter#comparison-of-segmentation-tools-libraries-and-algorithms) created by the developers of the `pragmatic_segmenter`.

| Name                | Language | License   | GRS (English)  | GRS (Other) | Speed†   |
|---------------------|----------|-----------|----------------|-------------|----------|
| Pragmatic Segmenter | Ruby     | MIT       | 98.08% (51/52) | 100.00%     | 3.84 s   |
| prose               | Go       | MIT       | 75.00% (39/52) | N/A         | 0.96 s   |
| TactfulTokenizer    | Ruby     | GNU GPLv3 | 65.38% (34/52) | 48.57%      | 46.32 s  |
| OpenNLP             | Java     | APLv2     | 59.62% (31/52) | 45.71%      | 1.27 s   |
| Stanford CoreNLP    | Java     | GNU GPLv3 | 59.62% (31/52) | 31.43%      | 0.92 s   |
| Splitta             | Python   | APLv2     | 55.77% (29/52) | 37.14%      | N/A      |
| Punkt               | Python   | APLv2     | 46.15% (24/52) | 48.57%      | 1.79 s   |
| SRX English         | Ruby     | GNU GPLv3 | 30.77% (16/52) | 28.57%      | 6.19 s   |
| Scalpel             | Ruby     | GNU GPLv3 | 28.85% (15/52) | 20.00%      | 0.13 s   |

> † The original tests were performed using a *MacBook Pro 3.7 GHz Quad-Core Intel Xeon E5 running 10.9.5*, while `prose` was timed using a *MacBook Pro 2.9 GHz Intel Core i7 running 10.13.3*.

```go
package main

import (
    "fmt"
    "strings"

    "github.com/jdkato/prose/v2"
)

func main() {
    // Create a new document with the default configuration:
    doc, _ := prose.NewDocument(strings.Join([]string{
        "I can see Mt. Fuji from here.",
        "St. Michael's Church is on 5th st. near the light."}, " "))

    // Iterate over the doc's sentences:
    sents := doc.Sentences()
    fmt.Println(len(sents)) // 2
    for _, sent := range sents {
        fmt.Println(sent.Text)
        // I can see Mt. Fuji from here.
        // St. Michael's Church is on 5th st. near the light.
    }
}
```

### Tagging

`prose` includes a tagger based on Textblob's ["fast and accurate" POS tagger](https://github.com/sloria/textblob-aptagger). Below is a comparison of its performance against [NLTK](http://www.nltk.org/)'s implementation of the same tagger on the Treebank corpus:

| Library | Accuracy | 5-Run Average (sec) |
|:--------|---------:|--------------------:|
| NLTK    |    0.893 |               7.224 |
| `prose` |    0.961 |               2.538 |

(See [`scripts/test_model.py`](https://github.com/jdkato/aptag/blob/master/scripts/test_model.py) for more information.)

The full list of supported POS tags is given below.

| TAG        | DESCRIPTION                               |
|------------|-------------------------------------------|
| `(`        | left round bracket                        |
| `)`        | right round bracket                       |
| `,`        | comma                                     |
| `:`        | colon                                     |
| `.`        | period                                    |
| `''`       | closing quotation mark                    |
| ``` `` ``` | opening quotation mark                    |
| `#`        | number sign                               |
| `$`        | currency                                  |
| `CC`       | conjunction, coordinating                 |
| `CD`       | cardinal number                           |
| `DT`       | determiner                                |
| `EX`       | existential there                         |
| `FW`       | foreign word                              |
| `IN`       | conjunction, subordinating or preposition |
| `JJ`       | adjective                                 |
| `JJR`      | adjective, comparative                    |
| `JJS`      | adjective, superlative                    |
| `LS`       | list item marker                          |
| `MD`       | verb, modal auxiliary                     |
| `NN`       | noun, singular or mass                    |
| `NNP`      | noun, proper singular                     |
| `NNPS`     | noun, proper plural                       |
| `NNS`      | noun, plural                              |
| `PDT`      | predeterminer                             |
| `POS`      | possessive ending                         |
| `PRP`      | pronoun, personal                         |
| `PRP$`     | pronoun, possessive                       |
| `RB`       | adverb                                    |
| `RBR`      | adverb, comparative                       |
| `RBS`      | adverb, superlative                       |
| `RP`       | adverb, particle                          |
| `SYM`      | symbol                                    |
| `TO`       | infinitival to                            |
| `UH`       | interjection                              |
| `VB`       | verb, base form                           |
| `VBD`      | verb, past tense                          |
| `VBG`      | verb, gerund or present participle        |
| `VBN`      | verb, past participle                     |
| `VBP`      | verb, non-3rd person singular present     |
| `VBZ`      | verb, 3rd person singular present         |
| `WDT`      | wh-determiner                             |
| `WP`       | wh-pronoun, personal                      |
| `WP$`      | wh-pronoun, possessive                    |
| `WRB`      | wh-adverb                                 |

### NER

`prose` v2.0.0 includes a much improved version of v1.0.0's chunk package, which can identify people (`PERSON`) and geographical/political Entities (`GPE`) by default.

```go
package main

import (
    "fmt"

    "github.com/jdkato/prose/v2"
)

func main() {
    doc, _ := prose.NewDocument("Lebron James plays basketball in Los Angeles.")
    for _, ent := range doc.Entities() {
        fmt.Println(ent.Text, ent.Label)
        // Lebron James PERSON
        // Los Angeles GPE
    }
}
```

However, in an attempt to make this feature more useful, we've made it straightforward to train your own models for specific use cases. See [Prodigy + `prose`: Radically efficient machine teaching *in Go*](https://medium.com/@errata.ai/prodigy-prose-radically-efficient-machine-teaching-in-go-93389bf2d772) for a tutorial.


prose

Go library for performing Unicode Text Segmentation as described in [Unicode Standard Annex #29](https://www.unicode.org/reports/tr29/)

# segment

[![Tests](https://github.com/blevesearch/segment/workflows/Tests/badge.svg?branch=master&event=push)](https://github.com/blevesearch/segment/actions?query=workflow%3ATests+event%3Apush+branch%3Amaster)

A Go library for performing Unicode Text Segmentation
as described in [Unicode Standard Annex #29](http://www.unicode.org/reports/tr29/)

## Features

* Currently only segmentation at Word Boundaries is supported.

## License

Apache License Version 2.0

## Usage

The functionality is exposed in two ways:

1.  You can use a bufio.Scanner with the SplitWords implementation of SplitFunc.
The SplitWords function will identify the appropriate word boundaries in the input
text and the Scanner will return tokens at the appropriate place.

		scanner := bufio.NewScanner(...)
		scanner.Split(segment.SplitWords)
		for scanner.Scan() {
			tokenBytes := scanner.Bytes()
		}
		if err := scanner.Err(); err != nil {
			t.Fatal(err)
		}

2.  Sometimes you would also like information returned about the type of token.
To do this we have introduced a new type named Segmenter.  It works just like Scanner
but additionally a token type is returned.

		segmenter := segment.NewWordSegmenter(...)
		for segmenter.Segment() {
			tokenBytes := segmenter.Bytes()
			tokenType := segmenter.Type()
		}
		if err := segmenter.Err(); err != nil {
			t.Fatal(err)
		}
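
The Scanner pattern from the first option can be tried without installing anything by swapping in a standard-library split function. In this sketch, `bufio.ScanWords` is only a stand-in for `segment.SplitWords`; it splits on whitespace, so its tokens differ from UAX #29 word boundaries. The helper names `tokens` and `wordTokens` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tokens runs a bufio.Scanner over text with the given split function
// and collects every token it yields.
func tokens(text string, split bufio.SplitFunc) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(text))
	scanner.Split(split)
	var out []string
	for scanner.Scan() {
		out = append(out, scanner.Text())
	}
	return out, scanner.Err()
}

// wordTokens uses the standard whitespace splitter. With the library
// installed, segment.SplitWords would be passed to tokens instead.
func wordTokens(text string) ([]string, error) {
	return tokens(text, bufio.ScanWords)
}

func main() {
	toks, err := wordTokens("Hello, world! 你好")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Printf("%q\n", toks) // ["Hello," "world!" "你好"]
}
```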

## Choosing Implementation

By default segment does NOT use the fastest runtime implementation.  The reason is that it adds approximately 5s to compilation time and may require more than 1 GB of RAM on the machine performing compilation.

However, you can choose to build with the fastest runtime implementation by passing the build tag as follows:

		-tags 'prod'

## Generating Code

Several components in this package are generated.

1.  Several Ragel rules files are generated from Unicode properties files.
2.  The Ragel machine is generated from the Ragel rules.
3.  Test tables are generated from the Unicode test files.

All of these can be generated by running:

		go generate

## Fuzzing

There is support for fuzzing the segment library with [go-fuzz](https://github.com/dvyukov/go-fuzz).

1.  Install go-fuzz if you haven't already:

		go get github.com/dvyukov/go-fuzz/go-fuzz
		go get github.com/dvyukov/go-fuzz/go-fuzz-build

2.  Build the package with go-fuzz:

		go-fuzz-build github.com/blevesearch/segment

3.  Convert the Unicode provided test cases into the initial corpus for go-fuzz:

		go test -v -run=TestGenerateWordSegmentFuzz -tags gofuzz_generate

4.  Run go-fuzz:

		go-fuzz -bin=segment-fuzz.zip -workdir=workdir

## Status


[![Build Status](https://travis-ci.org/blevesearch/segment.svg?branch=master)](https://travis-ci.org/blevesearch/segment)

[![Coverage Status](https://img.shields.io/coveralls/blevesearch/segment.svg)](https://coveralls.io/r/blevesearch/segment?branch=master)

[![GoDoc](https://godoc.org/github.com/blevesearch/segment?status.svg)](https://godoc.org/github.com/blevesearch/segment)

segment

Sentence tokenizer: converts text into a list of sentences.

[![release](https://github.com/neurosnap/sentences/actions/workflows/release.yml/badge.svg)](https://github.com/neurosnap/sentences/actions/workflows/release.yml)
[![GODOC](https://godoc.org/github.com/nathany/looper?status.svg)](https://godoc.org/github.com/neurosnap/sentences)
![MIT](https://img.shields.io/packagist/l/doctrine/orm.svg)
[![Go Report Card](https://goreportcard.com/badge/github.com/neurosnap/sentences)](https://goreportcard.com/report/github.com/neurosnap/sentences)

# Sentences - A command line sentence tokenizer

This command line utility will convert a blob of text into a list of sentences.

* [Demo](https://sentences-231000.appspot.com/)
* [Docs](https://godoc.org/github.com/neurosnap/sentences)

## Features

* Supports multiple languages (English, Czech, Dutch, Estonian, Finnish,
  German, Greek, Italian, Norwegian, Polish, Portuguese, Slovene, and Turkish)
* Zero dependencies
* Extendable
* Fast

## Install

### arch

[aur](https://aur.archlinux.org/packages/sentences-bin)

### mac

```
brew tap neurosnap/sentences
brew install sentences
```

### other

Or you can find the pre-built binaries on [the github
releases page](https://github.com/neurosnap/sentences/releases).

### using golang

```
go get github.com/neurosnap/sentences
go install github.com/neurosnap/sentences/cmd/sentences
```

## Command

![Command line](sentences.gif?raw=true)

## Get it

```
go get github.com/neurosnap/sentences
```

## Use it

```Go
package main

import (
    "fmt"
    "os"

    "github.com/neurosnap/sentences"
)

func main() {
    text := `A perennial also-ran, Stallings won his seat when longtime lawmaker David Holmes
    died 11 days after the filing deadline. Suddenly, Stallings was a shoo-in, not
    the long shot. In short order, the Legislature attempted to pass a law allowing
    former U.S. Rep. Carolyn Cheeks Kilpatrick to file; Stallings challenged the
    law in court and won. Kilpatrick mounted a write-in campaign, but Stallings won.`

    // download the training data from this repo (./data) and save it somewhere
    b, _ := os.ReadFile("./path/to/english.json")

    // load the training data
    training, _ := sentences.LoadTraining(b)

    // create the default sentence tokenizer
    tokenizer := sentences.NewSentenceTokenizer(training)
    sentences := tokenizer.Tokenize(text)

    for _, s := range sentences {
        fmt.Println(s.Text)
    }
}
```

## English

This package attempts to fix some problems I noticed for English.

```Go
package main

import (
    "fmt"

    "github.com/neurosnap/sentences/english"
)

func main() {
    text := "Hi there. Does this really work?"

    tokenizer, err := english.NewSentenceTokenizer(nil)
    if err != nil {
        panic(err)
    }

    sentences := tokenizer.Tokenize(text)
    for _, s := range sentences {
        fmt.Println(s.Text)
    }
}
```

## Contributing

I need help maintaining this library.  If you are interested in contributing
to this library then please start by looking at the [golden-rules](https://github.com/neurosnap/sentences/tree/golden-rule) branch which
tests the [Golden Rules](https://github.com/diasks2/pragmatic_segmenter/blob/master/README.md#the-golden-rules)
for english sentence tokenization created by the [Pragmatic Segmenter](https://github.com/diasks2/pragmatic_segmenter)
library.

Create an issue for a particular failing test and submit an issue/PR.

I'm happy to help anyone willing to contribute.

## Customize

`sentences` was built around composability; most major components of this package
can be extended.

Eager to make ad-hoc changes but don't know how to start?
Have a look at `github.com/neurosnap/sentences/english` for a solid example.

## Notice

I have not tested this tokenizer in any language other than English.  By default
the command line utility loads English. I welcome anyone willing to test the
other languages to submit updates as needed.

A primary goal for this package is to be multilingual so I'm willing to help in
any way possible.

This library is a port of [NLTK](http://www.nltk.org)'s Punkt tokenizer.

## A Punkt Tokenizer

An unsupervised multilingual sentence boundary detection library for Go.
The Punkt system accomplishes this by training the tokenizer
on text in the given language. Once the likelihoods of abbreviations, collocations,
and sentence starters are determined, finding sentence boundaries becomes easier.

Many problems arise when tokenizing text into sentences, the primary
issue being abbreviations. The Punkt system attempts to determine whether a word
is an abbreviation, the end of a sentence, or both by training the system on text
in the given language. The Punkt system incorporates both token- and type-based
analysis of the text through two different phases of annotation.

[Unsupervised multilingual sentence boundary detection](http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=BAE5C34E5C3B9DC60DFC4D93B85D8BB1?doi=10.1.1.85.5017&rep=rep1&type=pdf)
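
To make the abbreviation problem concrete, the sketch below contrasts naive splitting after every terminal punctuation mark with splitting that consults a small abbreviation set. It is only an illustration: the hand-written `abbreviations` map and the `splitSentences` helper are hypothetical stand-ins for what Punkt learns from training text, and this is not how the `sentences` package is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// abbreviations is a hand-written stand-in for the abbreviation
// likelihoods that a Punkt-style tokenizer would learn from training text.
var abbreviations = map[string]bool{"dr.": true, "mr.": true, "u.s.": true}

// splitSentences ends a sentence after any word whose last byte is '.', '?'
// or '!'. When useAbbrev is true, known abbreviations do not end a sentence.
func splitSentences(text string, useAbbrev bool) []string {
	var sentences, current []string
	for _, word := range strings.Fields(text) {
		current = append(current, word)
		last := word[len(word)-1]
		if last != '.' && last != '?' && last != '!' {
			continue
		}
		if useAbbrev && abbreviations[strings.ToLower(word)] {
			continue
		}
		sentences = append(sentences, strings.Join(current, " "))
		current = nil
	}
	if len(current) > 0 {
		sentences = append(sentences, strings.Join(current, " "))
	}
	return sentences
}

func main() {
	text := "Dr. Smith arrived late. Did Mr. Jones notice?"
	fmt.Println(len(splitSentences(text, false))) // naive: 4 fragments
	fmt.Println(len(splitSentences(text, true)))  // abbreviation-aware: 2 sentences
}
```

A fixed list like this breaks down quickly (an abbreviation can also end a sentence), which is why Punkt estimates these likelihoods from text instead.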

## Performance

Using the [Brown Corpus](http://www.hit.uib.no/icame/brown/bcm.html), which is annotated American English
text, we compare this package with other libraries across multiple programming languages.

|Library    | Avg Speed (s, 10 runs) | Accuracy (%)
|:----------|:----------------------:|:-----------:
| Sentences | 1.96                   | 98.95
| NLTK      | 5.22                   | 99.21


sentences

shamoji is a word filtering package written in Go.

# shamoji

[![GitHub Actions](https://github.com/osamingo/shamoji/workflows/CI/badge.svg?branch=master)](https://github.com/osamingo/shamoji/actions?query=workflow%3ACI+branch%3Amaster)
[![codecov](https://codecov.io/gh/osamingo/shamoji/branch/master/graph/badge.svg)](https://codecov.io/gh/osamingo/shamoji)
[![Go Report Card](https://goreportcard.com/badge/github.com/osamingo/shamoji)](https://goreportcard.com/report/github.com/osamingo/shamoji)
[![codebeat badge](https://codebeat.co/badges/9d9fdf3d-0c6d-455f-8444-8399a07d49ae)](https://codebeat.co/projects/github-com-osamingo-shamoji-master)
[![GoDoc](https://godoc.org/github.com/osamingo/shamoji?status.svg)](https://godoc.org/github.com/osamingo/shamoji)
[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/osamingo/shamoji/master/LICENSE)

## About

shamoji (杓文字) is a word filtering package.

## Install

```
$ go get github.com/osamingo/shamoji@latest
```

## Usage

```go
package main

import (
	"fmt"
	"sync"

	"github.com/osamingo/shamoji"
	"github.com/osamingo/shamoji/filter"
	"github.com/osamingo/shamoji/tokenizer"
	"golang.org/x/text/unicode/norm"
)

var (
	o sync.Once
	s *shamoji.Serve
)

func main() {
	yes, word := Contains("我が生涯に一片の悔い無し")
	fmt.Printf("Result: %v, Word: %s", yes, word)
}

func Contains(sentence string) (bool, string) {
	o.Do(func() {
		tok, err := tokenizer.NewKagomeTokenizer(norm.NFKC)
		if err != nil {
			panic(err)
		}
		s = &shamoji.Serve{
			Tokenizer: tok,
			Filer:     filter.NewCuckooFilter("涯に", "悔い"),
		}
	})
	return s.Do(sentence)
}
```

## License

Released under the [MIT License](https://github.com/osamingo/shamoji/blob/master/LICENSE).


shamoji

Stemmer packages for the Go programming language. Includes English, German and Dutch stemmers.

Stemmer package for Go
======================

Stemmer package provides an interface for stemmers and includes English,
German and Dutch stemmers as sub-packages:

 - `porter2` sub-package implements English (Porter2) stemmer as described in
 <http://snowball.tartarus.org/algorithms/english/stemmer.html>

 - `german` sub-package implements German stemmer as described in
 <http://snowball.tartarus.org/algorithms/german/stemmer.html>

 - `dutch` sub-package implements Dutch stemmer as described in
 <http://snowball.tartarus.org/algorithms/dutch/stemmer.html>


Installation
-------------

English stemmer:

    go get github.com/dchest/stemmer/porter2

German stemmer:

    go get github.com/dchest/stemmer/german

Dutch stemmer:

    go get github.com/dchest/stemmer/dutch

This will also install the top-level `stemmer` package.

Example
-------

    import (
        "github.com/dchest/stemmer/porter2"
        "github.com/dchest/stemmer/german"
        "github.com/dchest/stemmer/dutch"
    )

    // English.
    eng := porter2.Stemmer
    eng.Stem("delicious")   // => delici
    eng.Stem("deliciously") // => delici

    // German.
    ger := german.Stemmer
    ger.Stem("abhängen")   // => abhang
    ger.Stem("abhängiger") // => abhang

    // Dutch.
    dt := dutch.Stemmer
    dt.Stem("lichamelijke") // => licham
    dt.Stem("opglimpende")  // => opglimp

Tests
-----

Included `test_output.txt` and `test_voc.txt` are from the referenced original
implementations, used only when running tests with `go test`.


License
-------

2-clause BSD-like (see LICENSE and AUTHORS files).

stemmer

Go package for n-gram based text categorization, with support for utf-8 and raw text.

textcat

Another i18n pkg for golang, which follows GNU gettext style and supports .po/.mo files: `t.T (gettext)`, `t.N (ngettext)`, etc. And it contains a cmd tool [xtemplate](https://github.com/youthlin/t/blob/main/cmd/xtemplate), which can extract messages as a pot file from text/html template.

# t
t: a translation util for go, inspired by GNU gettext  
t: a Go implementation of GNU gettext, an internationalization tool for Go programs  
[![sync-to-gitee](https://github.com/youthlin/t/actions/workflows/gitee.yaml/badge.svg)](https://github.com/youthlin/t/actions/workflows/gitee.yaml)
[![test](https://github.com/youthlin/t/actions/workflows/test.yaml/badge.svg)](https://github.com/youthlin/t/actions/workflows/test.yaml)
[![codecov](https://codecov.io/gh/youthlin/t/branch/main/graph/badge.svg?token=6RyU5nb3YT)](https://codecov.io/gh/youthlin/t)
[![Go Report Card](https://goreportcard.com/badge/github.com/youthlin/t)](https://goreportcard.com/report/github.com/youthlin/t)
[![Go Reference](https://pkg.go.dev/badge/github.com/youthlin/t.svg)](https://pkg.go.dev/github.com/youthlin/t)
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fyouthlin%2Ft.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fyouthlin%2Ft?ref=badge_shield)


## Install

```bash
go get -u github.com/youthlin/t
```

go.mod
```go
require (
    github.com/youthlin/t latest
)
```

Gitee mirror: [gitee.com/youthlin/gottext](https://gitee.com/youthlin/gottext) (gottext: go + gettext)
> Thanks to the repository sync tool: https://github.com/Yikun/hub-mirror-action
```
// Use the Gitee mirror
// go.mod:
replace github.com/youthlin/t latest => gitee.com/youthlin/gottext latest
```


## Usage
```go
path := "path/to/filename.po" // .po, .mo file
path = "path/to/po_mo/dir"    // or dir.
// (If a .mo and a .po file share a name, the .po file is loaded later and overrides
// the .mo file; .po is a text file, so edits are easy to make and take effect.)
// 1 bind domain
t.Load(path) // binds to the default domain; searches the path for files and registers the language tags read from the po/mo files
t.Bind("my-domain", path) // or bind to a specific text domain
// 2 set current domain
t.SetDomain("my-domain")
// 3 set user language
// t.SetLocale("zh_CN")
t.SetLocale(t.MostMatchLocale()) // empty to use system default
// 4 use the gettext api
fmt.Println(t.T("Hello, world"))
fmt.Println(t.T("Hello, %v", "Tom"))
fmt.Println(t.N("One apple", "%d apples", 1)) // One apple
fmt.Println(t.N("One apple", "%d apples", 2)) // %d apples
// t.N(single, plural, n, args...)
// n => used to choose single or plural
// args => to format
// args... are supported and used to format the string
fmt.Println(t.N("One apple", "%d apples", 2, 2)) // 2 apples
fmt.Println(t.N("%[2]s has one apple", "%[2]s has %[1]d apples", 2, 200, "Bob"))
// Bob has 200 apples
t.X("msg_context_text", "msg_id")
t.XN("msg_context_text", "msg_id", "msg_plural", n)
```
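
The selection rule described in the comments above can be pictured with the standard library alone. The sketch below is a rough illustration, not the package's implementation: `pickPlural` is a hypothetical helper that assumes the English rule (singular only when `n == 1`) and no loaded translations, whereas `t` takes its plural rule from the loaded po/mo files.

```go
package main

import "fmt"

// pickPlural is a hypothetical helper mimicking ngettext-style selection
// with the English plural rule: n == 1 picks single, anything else plural.
// The chosen message is formatted only when args are supplied.
func pickPlural(single, plural string, n int, args ...interface{}) string {
	msg := plural
	if n == 1 {
		msg = single
	}
	if len(args) == 0 {
		return msg // no args: returned verbatim, verbs stay unexpanded
	}
	return fmt.Sprintf(msg, args...)
}

func main() {
	fmt.Println(pickPlural("One apple", "%d apples", 1))    // One apple
	fmt.Println(pickPlural("One apple", "%d apples", 2))    // %d apples
	fmt.Println(pickPlural("One apple", "%d apples", 2, 2)) // 2 apples
	// Explicit argument indexes let a message use the args in any order.
	fmt.Println(pickPlural("%[2]s has one apple", "%[2]s has %[1]d apples", 2, 200, "Bob"))
	// Bob has 200 apples
}
```

Keeping `n` separate from `args` is what allows the count used for choosing the form (2) to differ from the number that is printed (200).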

## API
```go
T(msgID, args...)
N(msgID, msgIDPlural, n, args...) // and N64
X(msgCTxt, msgID, args...)
XN(msgCTxt, msgID, msgIDPlural, n, args...) // and XN64
D(domain)
L(locale)
// T:  gettext
// N:  ngettext
// X:  pgettext
// XN: npgettext
// D:  with domain
// L:  with locale(language)
```

## Domain
```go
t.Bind(domain1, path1)
t.Bind(domain2, path2)
t.SetLocale("zh_CN")

t.T("msg_id")           // use default domain

t.SetDomain(domain1)
t.T("msg_id")            // use domain1
t.D(domain2).T("msg_id") // use domain2
t.D("unknown-domain").T("msg_id") // return "msg_id" directly

```
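
The fallback behaviour shown above (an unknown domain returns the message ID unchanged) can be modelled as a nested map lookup. This is a toy model under stated assumptions, not the package's internals: `catalogs` and `lookup` are hypothetical names, and the translations are made up for illustration.

```go
package main

import "fmt"

// catalogs maps a hypothetical text domain to its msgid -> translation table.
var catalogs = map[string]map[string]string{
	"default": {"msg_id": "translation from default"},
	"domain1": {"msg_id": "translation from domain1"},
}

// lookup returns the translation of msgID in the given domain, or msgID
// itself when the domain or the message is unknown.
func lookup(domain, msgID string) string {
	// Indexing a missing domain yields a nil map, and reading from a nil
	// map is safe in Go, so no separate existence check is needed.
	if translated, ok := catalogs[domain][msgID]; ok {
		return translated
	}
	return msgID
}

func main() {
	fmt.Println(lookup("domain1", "msg_id"))        // translation from domain1
	fmt.Println(lookup("unknown-domain", "msg_id")) // msg_id
}
```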

## Language
If you are building a web application, you may want each request to use a different language. The code below may help:

```go
t.Load(path)

// a) Specify a language
t.L("zh_CN").T("msg_id")

// b) Each user gets their own accepted language
// b.1) Languages the server supports
langs := t.Locales()
// golang.org/x/text/language
// EN: https://blog.golang.org/matchlang
// Chinese: https://learnku.com/docs/go-blog/matchlang/6525
var supported []language.Tag
for _, lang := range langs {
    supported = append(supported, language.Make(lang))
}
matcher := language.NewMatcher(supported)
// b.2) Languages the user accepts
// Judging by the browser header (Accept-Language)
// or: userAccept := []language.Tag{ language.Make("lang-code-from-cookie") }
// (read the user's language preference from a cookie)
userAccept, q, err := language.ParseAcceptLanguage("zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6")
// b.3) Match the most suitable language
matchedTag, index, confidence := matcher.Match(userAccept...)
// confidence may be language.No, language.Low, language.High, or language.Exact.
// It indicates how good the match is; decide based on your needs whether to use
// the matched language. For example, if the server supports English and Simplified
// Chinese and the user prefers Traditional Chinese, the match is not Exact, and
// you decide whether to serve English or Simplified Chinese.
userLang := langs[index]
t.L(userLang).T("msg_id")

// with domain and language: specify both the text domain and the user language
t.D(domain).L(userLang).T("msg_id")
```
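
The per-request flow above can be sketched with the standard library alone so that it runs on its own. Note the assumptions: `pickLocale` is a hypothetical, deliberately simplified stand-in for `language.ParseAcceptLanguage` plus `matcher.Match` (it only handles q-values, exact matches, and a primary-subtag fallback), and the supported locale list is made up. In real code, prefer the `golang.org/x/text/language` matcher shown above and pass the result to `t.L(...)`.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// pickLocale returns the entry of supported (e.g. "zh_CN") that best matches
// an Accept-Language header value, or fallback when nothing matches.
func pickLocale(header string, supported []string, fallback string) string {
	type pref struct {
		tag string
		q   float64
	}
	var prefs []pref
	for _, part := range strings.Split(header, ",") {
		fields := strings.Split(strings.TrimSpace(part), ";")
		tag := strings.ToLower(strings.TrimSpace(fields[0]))
		if tag == "" {
			continue
		}
		q := 1.0 // a missing q parameter means full preference
		for _, param := range fields[1:] {
			param = strings.TrimSpace(param)
			if strings.HasPrefix(param, "q=") {
				if v, err := strconv.ParseFloat(param[2:], 64); err == nil {
					q = v
				}
			}
		}
		if q > 0 { // q=0 marks a language as not acceptable
			prefs = append(prefs, pref{tag, q})
		}
	}
	// Highest q first; the stable sort keeps header order among equal weights.
	sort.SliceStable(prefs, func(i, j int) bool { return prefs[i].q > prefs[j].q })

	normalize := func(s string) string {
		return strings.ToLower(strings.ReplaceAll(s, "_", "-"))
	}
	for _, p := range prefs {
		for _, s := range supported { // exact match, e.g. zh-cn vs zh_CN
			if normalize(s) == p.tag {
				return s
			}
		}
		primary := strings.SplitN(p.tag, "-", 2)[0]
		for _, s := range supported { // same primary language, e.g. en-GB -> en_US
			if strings.SplitN(normalize(s), "-", 2)[0] == primary {
				return s
			}
		}
	}
	return fallback
}

func main() {
	supported := []string{"en_US", "zh_CN"}
	fmt.Println(pickLocale("zh-CN,zh;q=0.9,en;q=0.8", supported, "en_US")) // zh_CN
	fmt.Println(pickLocale("fr;q=0.9,en-GB;q=0.7", supported, "zh_CN"))    // en_US
	fmt.Println(pickLocale("ja", supported, "en_US"))                      // en_US
}
```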

> More examples can be found at: [example_test.go](example_test.go)

## How to extract strings
```bash
# If you use PoEdit, add an extractor (Settings > Extractors):
# Language: Go; extension: *.go; command to extract translations:
# xgettext -C --add-comments=TRANSLATORS: --force-po -o %o %C %K %F
# Fill the last three input boxes with, respectively:
# -k%k
# %f
# --from-code=%c
# Set the keywords like this:
# T:1;N:1,2;N64:1,2;X:2,1c;XN:2,3,1c;XN64:2,3,1c
xgettext -C --add-comments=TRANSLATORS: --force-po -kT -kN:1,2 -kX:2,1c -kXN:2,3,1c  *.go
```

## Done
- ✅ mo file support (binary .mo files)
- ✅ extract from html templates: [xtemplate](cmd/xtemplate/)
```bash
go install github.com/youthlin/t/cmd/xtemplate@latest
```

## Links
- https://www.gnu.org/software/gettext/manual/html_node/index.html
- https://github.com/search?l=Go&q=gettext&type=Repositories
- https://github.com/antlr/antlr4/
- https://blog.gopheracademy.com/advent-2017/parsing-with-antlr4-and-go/
- https://xuanwo.io/2019/12/11/golang-i18n/ (in Chinese)



## License
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fyouthlin%2Ft.svg?type=large)](https://app.fossa.com/projects/git%2Bgithub.com%2Fyouthlin%2Ft?ref=badge_large)

Minimal cgo bindings for [libenca](https://cihar.com/software/enca/), which detects character encodings.

# enca [![Build Status](https://travis-ci.org/endeveit/enca.svg?branch=master)](https://travis-ci.org/endeveit/enca)

These are minimal cgo bindings for [libenca](http://cihar.com/software/enca/).

If you need to detect the language of a string, you can use the [guesslanguage](https://github.com/endeveit/guesslanguage) package.

## Supported Go versions

enca is tested against Go 1.0, 1.1, 1.2, 1.3 and tip.

## Usage

Install libenca to your system:
```
$ sudo apt-get install libenca0 libenca-dev
```

Install in your `${GOPATH}` using `go get -u github.com/endeveit/enca`

Then call it:
```go
package main

import (
	"fmt"
	"github.com/endeveit/enca"
)

func main() {
	analyzer, err := enca.New("zh")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Free the analyzer once we are done with it.
	defer analyzer.Free()

	encoding, err := analyzer.FromString("美国各州选民今天开始正式投票。据信，", enca.NAME_STYLE_HUMAN)
	if err != nil {
		fmt.Println(err)
		return
	}

	// Output:
	// UTF-8
	fmt.Println(encoding)
}
```

## Documentation

godoc is available [here](http://godoc.org/github.com/endeveit/enca).


enca

go-unidecode
==============

[![Build Status](https://github.com/mozillazg/go-unidecode/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/mozillazg/go-unidecode/actions/workflows/ci.yml)
[![Coverage Status](https://coveralls.io/repos/mozillazg/go-unidecode/badge.svg?branch=master)](https://coveralls.io/r/mozillazg/go-unidecode?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/mozillazg/go-unidecode)](https://goreportcard.com/report/github.com/mozillazg/go-unidecode)
[![GoDoc](https://godoc.org/github.com/mozillazg/go-unidecode?status.svg)](https://godoc.org/github.com/mozillazg/go-unidecode)

ASCII transliterations of Unicode text. Inspired by [python-unidecode](https://github.com/avian2/unidecode).


Installation
------------

```
go get github.com/mozillazg/go-unidecode
```

Install CLI tool:

```
$ go install github.com/mozillazg/go-unidecode/cmd/unidecode@latest

$ unidecode 北京kožušček
Bei Jing kozuscek
```


Documentation
--------------

API documentation can be found here:
https://godoc.org/github.com/mozillazg/go-unidecode


Usage
------

```go
package main

import (
	"fmt"
	"github.com/mozillazg/go-unidecode"
)

func main() {
	s := "abc"
	fmt.Println(unidecode.Unidecode(s))
	// Output: abc

	s = "北京"
	fmt.Println(unidecode.Unidecode(s))
	// Output: Bei Jing

	s = "kožušček"
	fmt.Println(unidecode.Unidecode(s))
	// Output: kozuscek
}
```


go-unidecode

Provides one-way string transliteration with support for language-specific transliteration rules.

Golang text Transliterator
==============

[![Build Status](https://travis-ci.com/alexsergivan/transliterator.svg?branch=master)](https://travis-ci.com/github/alexsergivan/transliterator)
[![Coverage Status](https://coveralls.io/repos/github/alexsergivan/transliterator/badge.svg)](https://coveralls.io/github/alexsergivan/transliterator)
[![Go Report Card](https://goreportcard.com/badge/github.com/alexsergivan/transliterator)](https://goreportcard.com/report/github.com/alexsergivan/transliterator)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/9b062cd8ba9f4f7f850e167d6966b75b)](https://www.codacy.com/manual/alexsergivan/transliterator?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=alexsergivan/transliterator&amp;utm_campaign=Badge_Grade)


Golang Transliterator provides one-way string transliteration. It takes Unicode text and converts it to ASCII characters.
Example use case: transliterate a Cyrillic city name so that it can be used in a URL ("Київ" ==> "Kyiv").

For now, only these languages have specific transliteration rules: DE, DA, EO, RU, BG, SV, HU, HR, SL, SR, NB, UK, MK, CA, BS. For other languages, general ASCII transliteration rules will be applied. Also, this package supports adding custom transliteration rules for your specific use-case. Please check the examples section below.
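
The relationship between language-specific and general rules can be pictured as a two-level lookup. The sketch below is an assumption-laden illustration using only the standard library, not this package's implementation: the `general` and `perLanguage` tables and the `transliterate` function are hypothetical and cover just a handful of runes.

```go
package main

import (
	"fmt"
	"strings"
)

// general is a tiny, illustrative fallback table (rune -> ASCII).
var general = map[rune]string{'ü': "u", 'ö': "o", 'ä': "a", 'ß': "ss"}

// perLanguage holds language-specific overrides keyed by ISO 639-1 code.
var perLanguage = map[string]map[rune]string{
	"de": {'ü': "ue", 'ö': "oe", 'ä': "ae"},
}

// transliterate checks the language-specific table first, then the general
// table, and passes through any rune found in neither (so the output is
// ASCII only for runes these toy tables cover).
func transliterate(text, lang string) string {
	var b strings.Builder
	for _, r := range text {
		if s, ok := perLanguage[lang][r]; ok {
			b.WriteString(s)
		} else if s, ok := general[r]; ok {
			b.WriteString(s)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(transliterate("München", "de")) // Muenchen
	fmt.Println(transliterate("München", "en")) // Munchen
}
```

Custom rules fit the same shape: adding another entry to the per-language table changes the result for that language code only.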


Installation
------------

```
go get -u github.com/alexsergivan/transliterator
```


Language specific transliteration example
------

```go
package main

import (
	"fmt"
	"github.com/alexsergivan/transliterator"
)

func main() {
	trans := transliterator.NewTransliterator(nil)
	text := "München"
	// Langcode should be provided according to ISO 639-1.
	fmt.Println(trans.Transliterate(text, "de")) // Result: Muenchen
	fmt.Println(trans.Transliterate(text, "en")) // Result: Munchen

	anotherText := "你好"
	fmt.Println(trans.Transliterate(anotherText, "")) // Result: Ni Hao

	oneMoreText := "Київ"
	fmt.Println(trans.Transliterate(oneMoreText, "uk")) // Result: Kyiv
	fmt.Println(trans.Transliterate(oneMoreText, "en")) // Result: Kiyiv
	fmt.Println(trans.Transliterate(oneMoreText, "")) // Result: Kiyiv
}
```

Adding custom language transliteration rules
------

```go
package main

import (
	"fmt"
	"github.com/alexsergivan/transliterator"
)

func main() {
	customLanguageOverrides := make(map[string]map[rune]string)

	customLanguageOverrides["myLangcode"] = map[rune]string{
		// Ї
		0x407: "CU",
		// и
		0x438: "y",
	}
	trans := transliterator.NewTransliterator(&customLanguageOverrides)
	text := "КиЇв"
	fmt.Println(trans.Transliterate(text, "myLangcode")) // Result: KyCUv

}
```


The Best Go Libraries For Natural Language Processing (27)