Japanese text translation webapp

Post by **KLSymph** » Fri Apr 22, 2016 11:33 am

Congratulations to Baka-Tsuki on ten years of translations! In commemoration, I would like to offer the community a tool that will, in the years to come, hopefully ease the job of fellow translators who produce content for everyone's enjoyment.

Japanese Text Translation on Heroku.

This is a single-page web application which takes Japanese text input in the left panel, separates that text into sections, looks up dictionary definitions for the words within, and presents them to you for easy scanning. For example, you can try it out with:

Sample text: ばか！バカ！　馬鹿ー月！

十年目の記念日、おめでとうございます！

...and get an output like:

You can also perform quick dictionary lookups on specific terms using the input on the bottom-right of the page. For example, try 記念日, and see how it's different from the text translation. A more in-depth tutorial on how to use the tool is available from the information button on the top-right corner of the page.

This tool is based on the WWWJDIC text glossing tool and uses its dictionary, but adds a substantially different user interface. I decided to make it after reflecting on my experience of using WWWJDIC to translate volume four of Rakudai Kishi no Eiyuutan over the course of one and a half months with daily updates. That experience was predominantly about balancing two efforts: reading gigantic blocks of JDIC output from submitting large passages (try using the sample text above in WWWJDIC, and imagine that going on for pages), versus manually splitting the passage into more manageable pieces and submitting them individually (a workflow which involves four active windows: the text to be copied from, the translation to be written into, WWWJDIC itself, and the original source for error checking). After a period of philosophical meditation, I concluded that this experience was total flaming garbage (although the readers seemed happy about the daily updates), so I examined my workflow and tried to identify where it could be facilitated. This tool is a result of that exercise, and I'd like to spread the benefits to the rest of the community. Please try out the version above and see if it benefits you.

Technical matters: Heroku, the cloud platform on which the application is deployed, imposes rather severe limitations in exchange for being free, making the application's performance considerably worse. For example:

It may take you a long time to even go to the link I provided. This is because Heroku sleeps the application when it is idle for a while, and now it must spin up again, and until that happens the page is blank. Please wait.
It may happen that a translation completely fails. The cause is probably either that your text is so long that an atomic part of the query timed out on Heroku's enforced thirty-second limit, or that Heroku just randomly dropped your query. This behavior is unfortunately not predictable, especially under the load of multiple users, so all I can say is try it again, keeping your text to around sub-chapter length. Lookups are usually not affected by this, because they are much simpler.
It generally takes a long time to translate a text, much longer than the equivalent text in WWWJDIC. Heroku is slower than the application is on localhost; it takes about two minutes to translate a test case of about ten pages on Heroku, while I've successfully translated an entire 300+ page light novel in under a minute using localhost (though the localhost setup has other technical issues which currently make it infeasible for general deployment). In particular, when you start a translation, the progress spinner tends to begin while the progress bar does not move for a long time. This is because Heroku is taking a long time to perform the text pre-processing that comes before the actual translation measured by the progress bar, but yes, the app is actually working (unless Heroku fails the pre-processing step, see above).
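As a rough illustration of "keeping your text to around sub-chapter length", here is a minimal Python sketch that groups paragraphs into chunks under a character budget before submitting them one at a time. The function name `chunk_text` and the `max_chars` value are assumptions made for this example; this is not code from the webapp.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    Hypothetical helper: a single paragraph longer than max_chars is kept
    whole rather than being cut in the middle of a sentence.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip blank lines between paragraphs
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and current_len + len(paragraph) > max_chars:
            chunks.append("\n".join(current))
            current, current_len = [], 0
        current.append(paragraph)
        current_len += len(paragraph)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each returned chunk can then be pasted into the left panel separately, which keeps any single query small.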

For these reasons and a list of other technical concerns, please consider the link I provided as a demonstration of the tool's functionality while I continue to improve its design. Ideally, it would be an honor if Baka-Tsuki could consider adopting this webapp as an official community tool and serve a copy of the application itself. I may even resort to just offering it as a downloadable local program.

But that would depend on people actually using and liking it (or at least the idea of it, if not the performance of the Heroku deployment). And on the administration not hammering it as quasi-machine translation. I've used this program's prototypes for translating a number of manga chapters, and it makes my work more pleasant. I hope it helps others as well.

Post by **Shadowys** » Mon May 09, 2016 2:26 am

Great tool you have there! Though it does look more like a dictionary than a translator.

Post by **KLSymph** » Mon May 09, 2016 6:11 am

Thanks for the feedback. I admit it's not the most immediately accurate title, but the webapp is slightly more than a dictionary since it reads text and splits it into words based on grammar (in a very limited way). It's not a translator in the sense that it translates stuff for you, but it's translation in the sense that someone can use it as a major part of the translation process. Maybe I should call it glossing like WWWJDIC, but that's a technical term and in common usage has a negative connotation, and the definition of gloss uses "translation" anyway.

Post by **Shadowys** » Mon May 09, 2016 6:46 am

Is it possible to list possible variations of the splitting? Since some may have pretty complicated ways to split a sentence depending on context.

Post by **KLSymph** » Mon May 09, 2016 9:18 am

I use Kuromoji for lexical analysis, but unfortunately Kuromoji documentation is not readily available, and as far as I can tell, Kuromoji only splits any particular sentence in one single way. A way that, incidentally, is way more grammatically pedantic than a LN translator would like. For example, it will tokenize します into the tokens し and ます rather than recognizing it as a single-verb variation of する, which is certainly not wrong but spectacularly unhelpful for actual translations (try searching for し in a dictionary), forcing me to do complicated post-processing to hook up auxiliary verbs onto main verbs and other nonsense to get even common words to work.
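To make the post-processing idea concrete, here is a minimal Python sketch of attaching auxiliary verbs to the verb in front of them. The `Token` class and `merge_auxiliaries` function are hypothetical stand-ins, not Kuromoji's API and not the webapp's actual code, and the part-of-speech labels assume IPADIC-style tags (動詞 for verb, 助動詞 for auxiliary verb).

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token record; not Kuromoji's own Token class."""
    surface: str    # text as it appears in the sentence
    pos: str        # coarse part of speech, assumed IPADIC-style
    base_form: str  # dictionary form used for the lookup


def merge_auxiliaries(tokens: list[Token]) -> list[Token]:
    """Glue each auxiliary verb onto the verb that precedes it."""
    merged: list[Token] = []
    for token in tokens:
        if token.pos == "助動詞" and merged and merged[-1].pos == "動詞":
            head = merged[-1]
            # Keep the verb's part of speech and dictionary form so the
            # merged token still looks up as the main verb.
            merged[-1] = Token(head.surface + token.surface, head.pos, head.base_form)
        else:
            merged.append(token)
    return merged
```

With this, し (base form する) followed by ます becomes one token します that still looks up as する. Because the merged token keeps the verb tag, chains such as し + まし + た collapse into a single token as well.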

To answer your question, it might be possible to get tokenization variants, but you probably won't want them once you see them.

If only I could get my hands on WWWJDIC's lexer.

Post by **KLSymph** » Sun Jun 05, 2016 10:28 pm

After considerable redesign for the last two months, I've deployed a new release of the webapp onto Heroku. The major differences are:

The original 10-page test case was sped up from "more than two minutes" to "less than thirty seconds".
Failing on long atomic operations should no longer be a problem.
Definitions of common postpositional particles ("wa", "ga", "wo", etc.) are no longer displayed, because they're extremely common and can be extremely ambiguous, making for frequent definition dumps of low usefulness.
The translate and lookup features were renamed to document translation and term translation, because...
A new feature, line translation, was added for translations of single lines (useful for manga work, for example).
The progress bar indicator and the saving of standalone translation results were removed, because I had to heavily change the data handling, and I need to rethink how these two features fit in the user workflow.
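The particle change in the list above can be pictured with a small Python sketch. The stop list, the `(surface, part of speech)` tuple layout, and the function name `tokens_to_define` are all assumptions for illustration; the only particles named in this post are "wa", "ga", and "wo", and the real list may differ.

```python
# Hypothetical stop list: only the three particles named in the post.
SUPPRESSED_PARTICLES = {"は", "が", "を"}


def tokens_to_define(tokens: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop common particles so no definitions are shown for them.

    Each token is a (surface, part_of_speech) pair; "助詞" is the
    IPADIC-style tag for particles (an assumption in this sketch).
    """
    return [
        (surface, pos)
        for surface, pos in tokens
        if not (pos == "助詞" and surface in SUPPRESSED_PARTICLES)
    ]
```

Checking the part of speech as well as the surface form keeps words that merely happen to be spelled like a particle.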

The progress bar and savefile functionality are not very useful on localhost, as the webapp will finish a light novel chapter in seconds. But as usual, Heroku is much slower, so it's likely that I still need to find a way to add the progress bar back in, though it would be difficult because the data flow of the app doesn't have equally spaced checkpoints for progress. The save standalone results feature may also be needed if Heroku insists on being so slow as well, though I have not had any feedback on whether people use it, and implementing it causes interface problems.
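One common way to drive a progress bar when checkpoints are not equally spaced is to give each stage a weight proportional to its expected share of the run time. The Python sketch below shows that idea only; the class name, stage names, and weights are invented for the example and say nothing about how the webapp is actually structured.

```python
class WeightedProgress:
    """Progress tracker for stages of unequal duration (illustrative only)."""

    def __init__(self, stage_weights: dict[str, float]) -> None:
        total = sum(stage_weights.values())
        # Normalize so that the shares of all stages sum to 1.
        self._share = {name: weight / total for name, weight in stage_weights.items()}
        self._done = {name: 0.0 for name in stage_weights}

    def update(self, stage: str, fraction_done: float) -> float:
        """Record how far a stage has advanced (0.0 to 1.0) and return the overall percent."""
        self._done[stage] = min(max(fraction_done, 0.0), 1.0)
        return self.percent()

    def percent(self) -> float:
        return 100.0 * sum(self._share[name] * self._done[name] for name in self._share)
```

For example, with hypothetical weights `{"preprocess": 3, "lookup": 1}`, finishing pre-processing moves the bar to 75% even though it is only one of two stages. The weights would have to be measured on the real deployment, since Heroku and localhost differ so much in timing.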

For testing, I used this app to translate two volumes of Madan no Ou manga this week, and it was a pretty decent experience. The localhost version, anyway. An executable release is looking more and more tempting, but it does force a pile of complication on the user.
