A parser built for the messy reality of Japanese.

akusento is not a dictionary lookup wrapped in a pretty interface. It is a context-aware Japanese pitch accent parser designed to handle the way Japanese actually appears in sentences: conjugated, compounded, attached to particles, interrupted by punctuation, and full of edge cases.

From raw Japanese text to readable pitch accent.

The system combines established morphological analysis with a custom rule engine. The goal is to turn hard-to-read pitch accent information into something learners can read directly, without hiding the grammar decisions that produced it.

01

Tokenize the sentence

Input text is split into word objects with MeCab and UniDic. Each token carries the information needed for later decisions: surface form, lemma, reading, part of speech, conjugation details, pitch drop candidates, ruby data, and grammar metadata.
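A minimal sketch of such a word object follows. It assumes UniDic-style feature names (`lemma`, `kana`, `pos1`, `cType`, `cForm`, `aType`) and assumes the accent-type field holds comma-separated candidate drop positions such as "1,0". The class and helper names are hypothetical, not akusento's actual code, and a real pipeline would fill the feature dict from a MeCab binding rather than by hand.

```python
from dataclasses import dataclass, field


def parse_drop_candidates(a_type: str) -> list[int]:
    """Turn an accent-type field such as "1,0" into [1, 0]; "*" or "" means no data."""
    if a_type.strip() in ("", "*"):
        return []
    return [int(part) for part in a_type.split(",") if part.strip().isdigit()]


@dataclass
class Token:
    surface: str                # text as it appears in the sentence
    lemma: str                  # dictionary form
    reading: str                # kana reading, used for furigana and mora counting
    pos: str                    # coarse part of speech, e.g. "名詞"
    conj_type: str              # conjugation class, "*" if the word does not conjugate
    conj_form: str              # conjugated form, "*" if none
    drop_candidates: list[int]  # candidate pitch drop positions (0 = no drop)
    ruby: list[tuple[str, str]] = field(default_factory=list)  # (base text, reading) pairs
    grammar: dict[str, str] = field(default_factory=dict)      # grammar metadata added later


def token_from_features(surface: str, feats: dict[str, str]) -> Token:
    """Build a Token from UniDic-style features; the key names are assumptions."""
    reading = feats.get("kana", surface)
    return Token(
        surface=surface,
        lemma=feats.get("lemma", surface),
        reading=reading,
        pos=feats.get("pos1", "*"),
        conj_type=feats.get("cType", "*"),
        conj_form=feats.get("cForm", "*"),
        drop_candidates=parse_drop_candidates(feats.get("aType", "*")),
        ruby=[(surface, reading)],  # whole-word ruby; per-kanji splitting is left out
    )
```

For example, `token_from_features("橋", {"lemma": "橋", "kana": "ハシ", "pos1": "名詞", "aType": "2"})` yields a token whose `drop_candidates` is `[2]`, leaving the rule engine to choose among candidates when there is more than one.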

02

Apply context rules

Those tokens are passed through a custom sentence-level rule engine. It checks neighboring words, particles, suffixes, conjugations, compounds, grammar boundaries, and known exceptions before deciding how the accent should behave in context.
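A sentence-level rule engine can be sketched as a loop that hands each rule the current token together with its neighbors and records every rule that fires, which is also what lets the frontend explain its decisions. This is a minimal sketch assuming a simplified `Token` with a single drop position; all names are hypothetical, and the one rule shown (particle height after accented versus heiban words in Tokyo Japanese) stands in for the many checks listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Token:
    surface: str
    pos: str                 # UniDic-style coarse POS, e.g. "名詞" or "助詞"
    drop: int                # 0 = no drop (heiban)
    notes: dict[str, str] = field(default_factory=dict)


# A rule receives the previous, current, and next token (None at sentence edges)
# and returns its own name if it fired, or None if it did not apply.
Rule = Callable[[Optional[Token], Token, Optional[Token]], Optional[str]]


def particle_height(prev: Optional[Token], tok: Token, nxt: Optional[Token]) -> Optional[str]:
    """A particle after a word with a drop is low; after a heiban word it stays high."""
    if tok.pos != "助詞" or prev is None:
        return None
    if prev.pos == "助詞":
        # Particle chains (e.g. に + は) inherit the height of the previous particle.
        tok.notes["height"] = prev.notes.get("height", "H")
    else:
        tok.notes["height"] = "L" if prev.drop != 0 else "H"
    return "particle-height"


def apply_rules(tokens: list[Token], rules: list[Rule]) -> list[dict[str, object]]:
    """Run every rule left to right and record which ones fired, for the frontend."""
    applied: list[dict[str, object]] = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        for rule in rules:
            name = rule(prev, tok, nxt)
            if name is not None:
                applied.append({"index": i, "surface": tok.surface, "rule": name})
    return applied
```

Processing left to right matters here: by the time は in 箸には is visited, に has already been marked low, so the chain stays low.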

03

Render the result

The parsed data is returned to the frontend as structured output and rendered as readable Japanese: furigana, pitch drop marks, devoicing marks, color-coded pattern classes, and clickable explanations for rules that were applied.
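The pattern classes and drop marks in the rendered output follow directly from a word's mora count and drop position. Below is a minimal sketch under stated assumptions: the kana reading is already known, `ꜜ` stands in for whatever drop mark the frontend actually draws, and the `pitch-*` CSS class names are hypothetical. Devoicing marks and clickable rule explanations are left out.

```python
import html

SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")
DROP_MARK = "ꜜ"  # placeholder for the frontend's pitch drop mark


def split_morae(kana: str) -> list[str]:
    """Split kana into morae: small glides join the previous kana; っ, ん and ー count on their own."""
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


def pattern_class(drop: int, mora_count: int) -> str:
    """Name the Tokyo accent pattern from the drop position (0 = no drop)."""
    if not 0 <= drop <= mora_count:
        raise ValueError(f"drop {drop} out of range for {mora_count} morae")
    if drop == 0:
        return "heiban"
    if drop == 1:
        return "atamadaka"
    if drop == mora_count:
        return "odaka"
    return "nakadaka"


def render_token(surface: str, reading: str, drop: int) -> str:
    """Render one word as ruby markup with a drop mark and a pattern class."""
    morae = split_morae(reading)
    cls = pattern_class(drop, len(morae))
    marked = "".join(m + (DROP_MARK if i == drop else "") for i, m in enumerate(morae, start=1))
    return (
        f'<span class="pitch-{cls}"><ruby>{html.escape(surface)}'
        f"<rt>{html.escape(marked)}</rt></ruby></span>"
    )
```

With the reading はし, drop 1 gives atamadaka (箸), drop 2 gives odaka (橋), and drop 0 gives heiban (端), so the same furigana ends up in three different color classes.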

Input → Tokenize → Rules → JSON { html, json_data, pitch_accents, applied_rules } → Output

Example rule from the Rules stage (auxiliary rule, ~です): です has a lexical pitch drop of 1, and the previous word has a drop of 1. The condition previous.drop !== 0 is true, so です is deaccented (drop = 0).
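The auxiliary rule in the example above fits in a few lines. This is a minimal sketch, not akusento's implementation: the `Token` class and the rule name are hypothetical, and only the plain です surface form is checked (conjugated forms such as でした are ignored). It reproduces the textbook contrast between 本です, where 本 already drops after its first mora and です is deaccented, and 学生です, where the heiban 学生 lets です keep its own drop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    surface: str
    drop: int  # position of the pitch drop in morae; 0 = no drop


def deaccent_desu(prev: Optional[Token], tok: Token) -> Optional[str]:
    """Auxiliary rule: です keeps its lexical drop (1) only after a word without a drop."""
    if tok.surface != "です" or prev is None:
        return None
    if prev.drop != 0:  # same test as previous.drop !== 0
        tok.drop = 0    # です is deaccented
        return "auxiliary:desu-deaccent"
    return None
```

Returning the rule name rather than a boolean lets the caller add it straight to `applied_rules`, which is what the frontend needs to explain the change.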

Measured on real prose, not toy examples.

Pitch accent parsing becomes difficult when the input is no longer a clean dictionary entry. That is why akusento is tested against long-form novel text, where compounds, names, kana spellings, particles, suffixes, and ambiguous readings appear naturally.

99.52% estimated token-level accuracy
87 logged corrections
44,998 characters reviewed
~517 characters per correction

How the benchmark was collected

The current internal benchmark was collected while reading real novel text from . I read the pitch-annotated output generated by akusento while listening to the professionally narrated audiobook version of the same passage. Whenever the parser’s prediction appeared to differ from the narrator’s pronunciation, I paused, researched the case, and only then flagged it as an issue.

Each flagged case was logged with the word, reading, pitch class, character gap, timestamp, and error type, such as reading, pitch, compounding, or chunking. This makes the benchmark both an accuracy estimate and a debugging loop: every correction points to a concrete sentence-level failure mode that can be inspected and improved.
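A log entry with the fields listed above maps naturally onto a small record type, and tallying entries by error type shows which failure modes dominate. This is a minimal sketch; the class name, field names, and timestamp format are assumptions rather than the schema of the published JSON.

```python
from collections import Counter
from dataclasses import dataclass

ERROR_TYPES = {"reading", "pitch", "compounding", "chunking"}


@dataclass(frozen=True)
class Correction:
    word: str
    reading: str
    pitch_class: str  # e.g. "heiban", "odaka"
    char_gap: int     # characters read since the previous logged correction
    timestamp: str    # when the case was logged; format assumed
    error_type: str   # one of ERROR_TYPES

    def __post_init__(self) -> None:
        if self.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {self.error_type}")
        if self.char_gap < 0:
            raise ValueError("char_gap must be non-negative")


def tally_by_type(log: list[Correction]) -> Counter:
    """Count corrections per error type so the most common failure modes surface first."""
    return Counter(c.error_type for c in log)
```

Calling `tally_by_type(log).most_common()` then lists the failure modes from most to least frequent, which is a natural order for deciding which rules to improve next.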

The accuracy estimate uses the same telemetry formula as the development dashboard: the character gaps between logged corrections are summed, the total is divided by a 2.5 characters-per-token heuristic to estimate the number of word tokens, and accuracy is one minus the ratio of logged errors to that estimated token count.
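The published figures can be reproduced from the totals reported above. This is a minimal sketch with hypothetical function names, assuming the summed character gaps equal the 44,998 characters reviewed.

```python
CHARS_PER_TOKEN = 2.5  # characters-per-token heuristic described above


def estimated_accuracy(total_chars: int, errors: int, chars_per_token: float = CHARS_PER_TOKEN) -> float:
    """Estimated token-level accuracy: 1 - errors / (total_chars / chars_per_token)."""
    est_tokens = total_chars / chars_per_token
    if est_tokens <= 0:
        raise ValueError("need a positive character count")
    return 1 - errors / est_tokens


def chars_per_correction(total_chars: int, errors: int) -> float:
    """Average number of characters read between logged corrections."""
    return total_chars / errors


# Reproducing the published figures from the reported totals.
acc = estimated_accuracy(44_998, 87)
print(f"{acc:.2%}")                             # 99.52%
print(round(chars_per_correction(44_998, 87)))  # 517
```

Here 44,998 characters become about 18,000 estimated tokens, and 87 errors against that estimate give the 99.52% figure. Because the token count is itself an estimate, a different characters-per-token ratio would shift the result.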

Download sanitized benchmark JSON

Sentence context has been removed from the public report for copyright reasons.

What these numbers actually mean: the benchmark is an internal, manually audited development run on real literary text, not a guarantee of 99.52% accuracy on every possible input. It is published because transparency matters: the errors are counted, categorized, and used to improve the parser.

Readable output without hiding the complexity.

Context over lookup

Dictionaries are useful, but Japanese is not spoken as isolated entries. akusento is built around sentence context: what comes before, what comes after, and how grammar changes the accent shape.

Explainable rules

When the parser applies a sentence-level rule, the frontend can expose that decision. This makes the tool useful not only as an answer machine, but also as a learning surface.

Real-world edge cases

The rule system is shaped by actual failures: ambiguous readings, counter expressions, compounds, suffix behavior, deaccenting chains, verb-noun ambiguity, and punctuation around grammar boundaries.

Standard Tokyo Japanese

akusento focuses on standard Tokyo-style pitch accent. Proper nouns, dialectal forms, rare literary expressions, and creative spellings can still be difficult, but each logged error becomes a concrete path for improvement.