HanLP Tutorial

HanLP provides an out-of-the-box RESTful API and a native Python API. The two are designed for different scenarios but share very similar interfaces.

Naming Convention

| key | Task | Chinese |
| --- | --- | --- |
| tok | Tokenization. Each element is a token. | 分词 |
| pos | Part-of-Speech Tagging. Each element is a tag. | 词性标注 |
| lem | Lemmatization. Each element is a lemma. | 词干提取 |
| fea | Features of Universal Dependencies. Each element is a feature. | 词法语法特征 |
| ner | Named Entity Recognition. Each element is a tuple of (entity, type, begin, end), where end offsets are exclusive. | 命名实体识别 |
| dep | Dependency Parsing. Each element is a tuple of (head, relation), where head index 0 denotes ROOT. | 依存句法分析 |
| con | Constituency Parsing. Each list is a bracketed constituent. | 短语成分分析 |
| srl | Semantic Role Labeling. Similar to ner, each element is a tuple of (arg/pred, label, begin, end), where the predicate is labeled PRED. | 语义角色标注 |
| sdp | Semantic Dependency Parsing. Similar to dep, but each token can have any number of heads (including zero) and corresponding relations. | 语义依存分析 |
| amr | Abstract Meaning Representation. Each AMR graph is represented as a list of logical triples. See AMR guidelines. | 抽象意义表示 |
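
To make the offset and head conventions above concrete, here is a minimal sketch that decodes `ner` and `dep` entries. The `doc` dictionary is hand-written, hypothetical data for a single sentence, not output from a HanLP model. The helper names `entity_spans` and `head_words` are illustrative and not part of HanLP. The sketch also assumes that `begin` and `end` index into the `tok` list.

```python
# Hypothetical, hand-written annotations for "Alice visited New York",
# shaped after the naming convention above (not produced by a model).
doc = {
    "tok": ["Alice", "visited", "New", "York"],
    "ner": [("Alice", "PERSON", 0, 1), ("New York", "LOCATION", 2, 4)],
    "dep": [(2, "nsubj"), (0, "root"), (4, "compound"), (2, "obj")],
}


def entity_spans(doc):
    """Recover each entity's tokens from its [begin, end) offsets.

    Assumption: begin/end are token offsets into doc["tok"], end exclusive.
    """
    return [(doc["tok"][begin:end], label) for _, label, begin, end in doc["ner"]]


def head_words(doc):
    """Return (token, relation, head word) per token; head 0 denotes ROOT."""
    tokens = doc["tok"]
    result = []
    for token, (head, relation) in zip(tokens, doc["dep"]):
        # Head indices are 1-based because index 0 is reserved for ROOT.
        head_word = "ROOT" if head == 0 else tokens[head - 1]
        result.append((token, relation, head_word))
    return result


if __name__ == "__main__":
    for span, label in entity_spans(doc):
        print(label, span)
    for token, relation, head in head_words(doc):
        print(f"{token} --{relation}--> {head}")
```

Because the end offset is exclusive, `doc["tok"][begin:end]` maps directly onto Python slicing, and reserving index 0 for ROOT means every real head is looked up at `head - 1`.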