Link Grammar Parser

by Davy Temperley, John Lafferty and Daniel Sleator
Maintained and extended by Linas Vepstas - <linasvepstas@gmail.com>, Dom Lachowicz - <domlachowicz@gmail.com>, the Open Cognition project and Abiword.


January, 2017: link-grammar 5.3.14 released! See below for a description of recent changes.

The 5.0.0 version of Link Grammar now uses a new license: the LGPL v2.1 license. Older versions remain available under the BSD license. This license change was made to allow greater participation in the project.

The new version includes the Persian and Arabic systems, which were previously distributed separately. It also includes prototype, experimental dictionaries for Lithuanian, Indonesian, Vietnamese, Hebrew and Turkish. In addition, the programming interfaces for Python and OCaml are now integrated, joining those for Java and Common Lisp. A shell script to run the JSON network parse server is included.

What is Link Grammar?

The Link Grammar Parser is a syntactic parser of English, Russian, Arabic and Persian (and other languages as well), based on Link Grammar, an original theory of syntax and morphology. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labelled links connecting pairs of words. The parser also produces a "constituent" (HPSG style phrase tree) representation of a sentence (showing noun phrases, verb phrases, etc.). The RelEx extension provides Stanford-style Dependency Grammar output.
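The idea of "a set of labelled links connecting pairs of words" can be pictured with a small data-structure sketch. This is only an illustration: the `Link` type, the `describe` helper and the hand-written linkage are hypothetical and are not part of the Link Grammar API, and the single-letter labels (D for determiner, S for subject, O for object) are simplified stand-ins for what the parser would actually emit.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    """One labelled link between two word positions (hypothetical type)."""
    left: int    # index of the left-hand word
    right: int   # index of the right-hand word
    label: str   # link type


words = ["the", "cat", "chased", "a", "mouse"]

# Hand-written, illustrative linkage; this is not parser output.
links = [
    Link(0, 1, "D"),  # determiner "the" to noun "cat"
    Link(1, 2, "S"),  # subject "cat" to verb "chased"
    Link(2, 4, "O"),  # verb "chased" to object "mouse"
    Link(3, 4, "D"),  # determiner "a" to noun "mouse"
]


def describe(words: List[str], links: List[Link]) -> List[str]:
    """Render each link as 'left-word --LABEL-- right-word'."""
    return [f"{words[l.left]} --{l.label}-- {words[l.right]}" for l in links]


for line in describe(words, links):
    print(line)
```

The point of the sketch is that a parse here is a flat set of word-to-word relations rather than a tree of nested phrases.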

The theory of Link Grammar parsing, and the original version of the parser, were created in 1991 by Davy Temperley, John Lafferty and Daniel Sleator, at the time professors of linguistics and computer science at Carnegie Mellon University. It is the product of decades of academic research into grammar and morphology, and is discussed in numerous publications.

Ongoing development by OpenCog

The practical day-to-day mechanics of maintaining an open-source project made ongoing hosting by Carnegie-Mellon impractical. Thus, this, the main Link Grammar website, is hosted by AbiWord, while the source code is located at GitHub.

Ongoing development of Link Grammar is guided and supported by the Open Cognition project, where the parser plays an important role in the OpenCog natural language processing subsystem. Research and implementation are ongoing; current work includes investigations into unsupervised learning of language, unsupervised learning of morphology, semantically guided parsing and grammatically induced word-sense disambiguation.

A sibling project, RelEx, uses constraint-grammar-like techniques to extract dependency relations and assorted additional linguistic information, including FrameNet-style framing and reference (anaphora) resolution. The dependency output is similar to that of the Stanford parser. Its performance is comparable to the Stanford PCFG parsing model, and it is more than three times faster than the Stanford "lexicalized" (factored) model.

For sentence generation, i.e. the creation of grammatically correct sentences from a bag of semantic relations, the microplanner and surface realization (sureal) portion of OpenCog is strongly recommended. A short example is here.

We previously recommended two projects that should now be considered obsolete: NLGen and NLGen2. For your entertainment, they're still listed below: The NLGen and NLGen2 projects provide natural language generation modules, based on, and compatible with link-grammar and RelEx. They implement the SegSim ideas for NL generation. See the following YouTube videos of a virtual dog, showing some of NLGen's capabilities (circa 2009): Demo of Virtual Dog Learning to Play Fetch via Imitation and Reinforcement, AI Virtual Dog's Emotions Fluctuate Based on Its Experiences, Demo of Embodied Anaphora Resolution and AI Virtual Dog Answers Simple Questions about Itself and Its Environment.

Although based on the original Carnegie-Mellon code base, the current Link Grammar package has evolved and changed in certain profound and important ways. There have been innumerable bug fixes, and performance has improved by more than an order of magnitude. Other notable differences include:

  • Actively maintained! New releases typically happen quarterly.
  • Russian dictionaries!
  • Morphology support!
  • Expanded English dictionaries, with many thousands of new words; dramatically improved parse coverage for a wide variety of constructions.
  • Merger of BioLG project changes, for improved parsing of biomedical text. This includes enhanced entity recognition, and precise identification of numeric quantities.
  • New bindings, including Ruby, Python, Perl, Lisp, Java and OCaml.
  • Support for UTF8 Unicode; Arabic and Persian dictionaries; prototype German dictionary.
  • Multi-threading support; a standard build system; pkg-config integration; a CMake config file; dynamic/shared library support; a TCP/IP-based parse server; fixes for non-Linux platforms, including Windows, MacOSX, FreeBSD.

Downloading Link Grammar

The source code to the system can be downloaded as a tarball. The current stable version is Link Grammar 5.3.14 (January, 2017). Older versions are available here. Unstable, development versions are available via the link-grammar github repository. These are not recommended unless you are a developer, mostly because they require the autoconf infrastructure.


One of the best ways to obtain a solid, easy-to-understand overview of the parser is to review the original papers describing it, here, here, here and here. There is an extensive set of pages documenting the dictionary; specifically, the names of links and their meanings, as well as how to write new rules. There is also a short primer for creating dictionaries for new languages. The documentation for the programming API is here. Documentation for additions made in the 4.0 release is on the improvements page. A fairly comprehensive bibliography of papers written before 2004 is here (mirror).

Mailing Lists

The mailing list for Link Grammar discussion is at the link-grammar google group.

Linguistic Disclaimer

Link Grammar is a natural language parser, not a human-level artificial general intelligence. This means that there are many sentences that it cannot parse correctly, or at all. There are entire classes of speech and writing that it cannot handle, including Twitter posts, IRC chat logs, Valley-girl basilect, Old and Middle English, stock-market listings and raw HTML dumps.

Link Grammar works best with "newspaper English", as taught to and written by those educated in American colleges: standard-sized sentences, with good grammar, proper punctuation, and correct capitalization. Link Grammar has difficulties with the following types of textual input:

  • Phrases (that are not a part of a complete sentence).
  • Twitter posts. These tend to be sentence fragments, often lacking proper grammatical structure.
  • Any text containing a large number of spelling errors.
  • "Registers", such as newspaper headlines, where determiners are omitted; for example, "Thieves rob bank."
  • Dialog, stage plays and movie scripts. Such dialog tends to consist of interleaved sentences.
  • Speech-to-text output. Such systems generate large numbers of mis-heard words that, taken at face value, cannot be a part of valid sentences. Even if such recognition were perfect, spoken English tends not to be as well-constructed or grammatical as written English.
  • Support for British English and Commonwealth English is poor. This includes any English dialects spoken in India, Pakistan, Nigeria, Bangladesh, South Africa, as well as former American protectorates, such as the Philippines. British and regional spelling of words is missing from the dictionaries.
  • Slang and various regional non-middle-class-American dialects. This includes most dialects spoken by anyone living in economically poor or under-educated geographical regions, whether in urban housing projects or the red-state small-town and rural poor. Self-identifying subgroup dialects are also not handled, such as drug-culture, gang-culture and hacker-culture.
  • Long run-on sentences. These can generate thousands of alternative parses in a combinatorial explosion.

It is hoped that the unsupervised learning of language proposal will be of sufficient power and ability to handle most of these exceptional cases. Work is ongoing.
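
To give a feel for the "combinatorial explosion" mentioned for long run-on sentences, the sketch below counts the distinct binary bracketings of a sentence of a given length, using the Catalan numbers. This is only an analogy for how quickly the number of candidate structures can grow; it is not how Link Grammar counts or enumerates parses, and the function names are hypothetical.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number, comb(2n, n) // (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def bracketings(n_words: int) -> int:
    """Number of distinct binary bracketings of a sequence of n_words items."""
    return catalan(n_words - 1)


# Print the growth for a few sentence lengths.
for n_words in (5, 10, 20, 40):
    print(n_words, bracketings(n_words))
```

Even under this simplified model, doubling the sentence length multiplies the number of candidate structures by many orders of magnitude.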


Ranked in order of maturity.

The main English documentation is here.
A set of Russian dictionaries providing full coverage for the language has been incorporated into the main distribution as of version 4.7.10 (March 2013). An older version, from which these are derived, can be found at http://slashzone.ru/parser/. By Sergey Protasov. Includes link documentation (mirror) and subscript (morphology) documentation (mirror). Russian morpheme dictionaries can be had at http://aot.ru.

Documentation on the links and on the word classes is available as a list of examples.

The Persian dictionaries from Jon Dehdari have been incorporated into the main distribution, as of version 5.0.0 (April 2014). This includes a copy of the Persian stemming engine, as significant morphology analysis needs to be performed to parse Persian.
The Arabic dictionaries from Jon Dehdari have been incorporated into the main distribution, as of version 5.0.0 (April 2014). These are derived from the older, original version. [Mirror] These require the Aramorph stemming package, which is included.
A small German dictionary is available as a part of the distribution. It contains roughly one thousand words. A brief description is provided here.
A small Lithuanian prototype dictionary has been created. It contains a few hundred words. A few basic sentences parse just fine; the current version focuses on morphological analysis coupled with grammatical analysis. Documentation is here.

A very rough Lithuanian dictionary has been created; almost nothing works so far. Documentation is here.

A small Vietnamese prototype dictionary has been created. It contains several hundred words.
A small Indonesian prototype dictionary has been created. It contains about one hundred words.
A very small Hebrew prototype dictionary has been created. It contains a few dozen words. Almost nothing works correctly (yet).
A very small Kazakh prototype dictionary has been created. It contains a few dozen words. Almost nothing works correctly (yet).
A very small Turkish prototype dictionary has been created. It contains a few dozen words. Almost nothing works correctly (yet).
French, Luthor project
The Luthor project aims to develop a set of scripts to automatically construct Link Grammar linkage dictionaries by mining Wiktionary data. Current efforts are focusing on French. (This project appears to be defunct).

Adjunct Projects

The default distribution for Link Grammar includes bindings for Java, Python, OCaml, Common Lisp, and AutoIt, as well as a SWIG FFI interface file. Additional language bindings, and some related projects, are listed below:

RelEx Semantic Relation Extractor
RelEx is an English-language semantic relationship extractor, built on the Link Parser. It can identify subject, object, indirect object and many other relationships between words in a sentence. It will also provide part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. RelEx includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm.
Ruby bindings
Ruby bindings are coordinated at the Ruby-LinkParser website. The code can be found at the ged/link-parser github page.
Perl bindings
The Perl bindings, created by Danny Brian, have been updated. See the Lingua-LinkParser page on CPAN. There is also a tutorial written against an older version of the bindings; some details may be different.
Psi Toolkit (Perl)
The Psi Toolkit, an NLP toolkit aimed at linguists and NLP engineers, includes bindings for link-grammar, via Perl.
Obsolete Javascript bindings can be found at the dijs/link-grammar github page. Someone, please port these to the latest version!
Pre-parsed Wikipedia
Parsed versions of various texts, including all articles from a May 2008 dump of Wikipedia, as well as a partial parse of an October 2010 dump, are available at http://gnucash.org/linas/nlp/data/

Recent Applications and Publications

The original homepage hosted at the Carnegie Mellon University lists an extensive bibliography (mirror) referencing several dozen older (pre-2004) papers pertaining to the Link Grammar Parser. More recent publications and announcements are listed here.

Recent Changes

Version 5.3.14 (19 Jan 2017)

  • Fix printing widths for Unicode-9 CJK ideographs and emoji.
  • Fix broken randomization in the "any" language.
  • Add UTF-8 support to the random morpheme splitter (amy).
  • Create an "ady" language for two-part morphology splits.
  • Improved error notification facility (experimental).

Version 5.3.13: (19 November 2016)

Emergency fix: remove accidental dependency on zlib and python.

  • Fix fatal errors w/ zlib-dev and python dependencies.

Version 5.3.12: (17 November 2016)

Notable: Both python2 and python3 bindings are built by default.

  • Fix bug in 'any' language (opencog/relex/issues/248).
  • Preliminary support for common typos in English.
  • Enable both python2 & python3 bindings by default.
  • Fix locale_t use for the newly introduced Cygwin 2.6.0.
  • Include in the distribution the missing make-check.py (for Windows).
  • Minisat configuration improvements + fix a problem on Gentoo.
  • When using the bundled minisat, link it statically, don't install it.

Version 5.3.11: (28 September 2016)

Notable: Conflicts between the bundled version of minisat and the system-provided version are minimized: LG will now use the system-provided version, if it is available (and will not install the bundled version).

  • Re-enable postscript header printing!
  • Cleanup python API (in a non-backwards-compat fashion).
  • Fix certain adverbial uses of "only".
  • Fix some interjective openers to questions.
  • Fix serious error with subject-verb inversion to past participle.
  • Remove most calls to exit() from the library.
  • Update the SAT solver code to use MiniSAT 2.2.
  • Use the system minisat2 library if available, instead of the bundled one.

Version 5.3.10: (14 September 2016)

Notable: Fixes a build-break for OSX! Also, a large restructuring of the English-language dictionaries to handle a greater variety of sentences with "as" and "so" in them.

  • Implement `make installcheck`.
  • Pull #371: Simplification of API when handling disconnected words.
  • Fix SAT parser crashes.
  • Expand default list of Java JDK search paths.
  • Fix python bindings: after timeout, no further parsing is performed.
  • Fix various adverbial, conjunctive uses of "as", "so".
  • Extended list of exclamations.
  • Remove CC link, add VC link, for clauses to coordinating conjunctions.
  • Fixes for the verb "dare", "someone or other", etc.
  • Fix OSX build break, concerning undefined locale_t.
  • Pull #385: Fix ancient bug that made dictionary debugging difficult.

Version 5.3.9: (27 August 2016)

Emergency release to fix a fatal error in the previous release!

  • Pull req #354: Major changes to support Cygwin.
  • Pull req #356: SAT parser bug fix.
  • General python binding cleanup.
  • Fatal error: Unable to open default dictionary.

Version 5.3.8: (15 August 2016)

The big change in this release is the support for python2 and python3 bindings, large improvements in Windows support, and the use of locales in dictionaries, which should help avoid locale-related difficulties (for example, capitalization is locale-dependent; and so mis-set locales break Turkish).

  • Disambiguate "Bob" the given name from "bob" the verb.
  • Pull req #300: Crash while parsing certain Russian sentences.
  • Pull req #301: MSVC compiler error, and warnings.
  • Pull req #304: Python failure when no parses are possible.
  • Pull req #309: Add MSVC14 support, remove MSVC9, MSVC12.
  • Pull req #317: Make Java build reproducible.
  • Remove (obsolete) binreloc support.
  • Enable both python2.7 and python3.4 bindings to be built.
  • Improved Cygwin and MinGW support (as well as improved MSVC support).
  • Dictionaries now specify the appropriate locale.
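
The remark above that capitalization is locale-dependent can be illustrated with the Turkish dotted and dotless "i". The sketch below is a plain-Python illustration, not code from Link Grammar; `lower_turkish` is a hypothetical helper that applies the Turkish case pairs (I/ı and İ/i) before falling back to the default lowercasing.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # ı, LATIN SMALL LETTER DOTLESS I


def lower_default(text: str) -> str:
    """Locale-independent lowercasing, as done by str.lower()."""
    return text.lower()


def lower_turkish(text: str) -> str:
    """Lowercase using the Turkish pairs: İ -> i and I -> ı (hypothetical helper)."""
    text = text.replace(DOTTED_CAPITAL_I, "i")
    text = text.replace("I", DOTLESS_SMALL_I)
    return text.lower()


word = "ISPARTA"  # a Turkish place name
print(lower_default(word))
print(lower_turkish(word))
```

The two functions disagree on the first letter, which is why a dictionary lookup can fail when the wrong case rules are applied to a capitalized word.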

Version 5.3.7: (7 May 2016)

  • Fix another MacOS build break, regarding library exports.

Version 5.3.6: (1 May 2016)

  • Add missing `parses-quotes-en.txt` file that python tests need.
  • Fix build break related to lg_fgetc when libeditline is missing.

Version 5.3.5: (28 April 2016)

Fix strange Apple Mac OSX behavior.

  • Modified (hacked) Kazakh.
  • MacOS bug fix: fgetc behaves oddly in OSX, see bug #293.

Version 5.3.4: (16 March 2016)

  • Fix broken handling of apostrophe (issue #281).
  • Revamp the README file; describe transitivity.
  • Revised Turkish dictionary from Tatiana Batura, et al.
  • Prototype Kazakh dictionary from Tatiana Batura, et al.
  • Parse priority tweaks for the OpenCog chatbot.
  • Fix Windows printing problem affecting some utf8 codepoints (issue #285).

Version 5.3.3: (23 December 2015)

Fix build break for Apple Mac OSX.

  • Improve support for quoted phrases.
  • Fixes for assorted zero-infinitive speech acts.
  • Add 37 paraphrasing verbs.
  • Add Greek mythological names.
  • A few dozen more common computing terms added to dictionary.
  • Misc coordination and question fixes.
  • Misc abbreviations.
  • Vietnamese dictionaries!
  • Major overhaul of subject-verb inversion.
  • Performance improvements on long sentences. (pull #247)
  • Change default setting of 'islands_ok' back to false (bug #140).
  • Fix for build break on Mac OSX el_capitan w/clang (bug #255).
  • Disable perl bindings by default; use Lingua::LinkParser

Version 5.3.2: (4 December 2015)

Fix build break for Apple Mac OSX.

  • Performance improvements, esp. for long sentences.
  • Use std=c11 (the 2011 C standard) by default.
  • Partial Irish English support.
  • A few dozen common computing terms added to dictionary.
  • Fix for build break on Mac OSX.

Version 5.3.1: (22 November 2015)

Fix build break.

  • Fix build break with SAT solver.

Version 5.3.0: (22 November 2015)

This is a major release of the parser, with many important changes in it. Most fundamentally, the tokenizer has been completely redesigned; the tokenizer is the device that splits sentences into sequences of words and (for non-English languages) morphemes.
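
As a rough picture of what splitting a sentence into words and morphemes involves, the toy sketch below keeps, for each word, both the unsplit form and any stem-plus-suffix alternatives. It is not the tokenizer used by Link Grammar: the suffix list and the function names are hypothetical, chosen only to show why a tokenizer may need to carry several alternative splits forward to the parser.

```python
from typing import List, Tuple

# Hypothetical suffix list, for illustration only.
SUFFIXES = ("s", "ed", "ing")


def split_alternatives(word: str) -> List[Tuple[str, ...]]:
    """Return the whole word plus every stem + suffix split allowed by SUFFIXES."""
    alternatives: List[Tuple[str, ...]] = [(word,)]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            alternatives.append((word[: -len(suffix)], suffix))
    return alternatives


def tokenize(sentence: str) -> List[List[Tuple[str, ...]]]:
    """Split on whitespace, keeping all alternative splits for each word."""
    return [split_alternatives(word) for word in sentence.split()]


for alternatives in tokenize("cats walked"):
    print(alternatives)
```

Which alternative is correct generally cannot be decided at this stage; in this sketch the choice is simply deferred by returning all of them.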

Another very important change: The python bindings are completely redesigned, and not in a backwards-compatible way. The new python bindings are much closer to how the parsing process should be thought about in the abstract.

There are also various fixes: the SAT solver is no longer crippled. Assorted performance speedups have been implemented, especially affecting longer sentences. Assorted bug fixes and cleanups have been made.

  • Major redesign of the python bindings.
  • Major redesign of sentence tokenization (the "wordgraph" design)
  • Verb 'steal' is optionally transitive.
  • Fixes for misc MSVC warnings.
  • Hebrew dictionary expansion.
  • Enhanced diagram printing, giving more space for link names.
  • Minor work on phonetic agreement for 'a' vs. 'an'.
  • Add ability to histogram the costs of different parses.
  • Improve support for splitting sentences.
  • Change default setting of 'islands_ok' to true.
  • Improve performance on long sentences.
  • Fix rare crash due to memory corruption on long sentences.
  • Random morphology generation can be enabled at runtime.
  • Remove obsolete, unmaintained MacOSX build file.
  • Extensive updates to man page.
  • Fix crash on long sentences (issue #137).
  • Fix a memory leak in language bindings (issue #138).
  • Remove bogus post-processor API function.
  • Fix broken domain letter printing.
  • New regex-file feature - negative regex'es.
  • Correct the handling of morphology stems with non-LL links.
  • Fix !!LEFT-WALL and !!RIGHT-WALL
  • SAT solver now linked statically.
  • Assorted SAT solver cleanup and improvements.
  • Performance improvement in fast matcher: 15% faster on fixes.batch.

A list of older changes can be found here.


Current versions of the Link Grammar parser software, language dictionaries and documentation are available under the LGPL v2.1 license. Versions prior to 5.0.0 are available under a variant of the BSD license.

Copyright (c) 2003-2004 Daniel Sleator, David Temperley, and John Lafferty. All rights reserved.
Copyright (c) 2003 Peter Szolovits
Copyright (c) 2004,2012,2013 Sergey Protasov
Copyright (c) 2006 Sampo Pyysalo
Copyright (c) 2007 Mike Ross
Copyright (c) 2008,2009,2010 Borislav Iordanov
Copyright (c) 2008-2017 Linas Vepstas
Copyright (c) 2014-2017 Amir Plivatsky