Rpapo's Translation Assistant.

rpapo
I.D.S.E Humanoid Interface [LSB]
Posts: 1530
Joined: Mon Dec 21, 2009 5:15 am
Favourite Light Novel: Ahouka!
Location: Michigan, USA
Contact:

Re: Rpapo's Translation Assistant.

Post by rpapo »

Xplorer30 wrote:Haha, it got to line 420 and crashed with:
(null): can't open JUMAN.grammar .

It might have run out of my computer's memory. But it gave me a 4 MB output text file.

Here is the last line in my output file, if it helps:
こと = (n) (1) thing/matter/(2) incident/occurrence/event/something serious/trouble/crisis/(3) circumstances/situation/state of affairs/(4) work/business/affair/(5) after an inflectable word, creates a noun phrase indicating something the speaker does not feel close to/(n-suf) (6) nominalizing suffix/(7) pretending to .../playing make-believe .../(P
Could be memory. Might be something else. In any case I've not seen that error message before. Could you PM me your input file? I can look at it in the morning. I'm signing off for the night.
Xplorer30
Temporal Time Variant Entity
Posts: 250
Joined: Sun Jul 17, 2011 2:22 pm
Favourite Light Novel:

Post by Xplorer30 »

rpapo wrote:Could be memory. Might be something else. In any case I've not seen that error message before. Could you PM me your input file? I can look at it in the morning. I'm signing off for the night.
It's an unedited script dump from OCR for a chapter of Accel World. Normally, I only use those to look up hard kanji or strange sentences on the web.

As for sending you the file, not sure how to go about it. Guess I will check out mediafire or something.

Post by rpapo »

Xplorer30 wrote:
rpapo wrote:Could be memory. Might be something else. In any case I've not seen that error message before. Could you PM me your input file? I can look at it in the morning. I'm signing off for the night.
It's an unedited script dump from OCR for a chapter of Accel World. Normally, I only use those to look up hard kanji or strange sentences on the web.

As for sending you the file, not sure how to go about it. Guess I will check out mediafire or something.
It appears there is what we call a "memory leak" going on. That is, my code is requesting memory for something, then forgetting about it without actually releasing that memory. The Java people out there would say, "Why don't you use Java? It doesn't leak memory," but if I were to use Java for this I would expect the memory consumption to be higher and the program to be 50-100 times slower than it already is. Java is fine for many kinds of programming applications, but does not do very well for something that is "CPU bound" (that is, it doesn't spend much time waiting for anything).
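
For anyone curious what a leak looks like in practice, here is a minimal C++ sketch. It is not taken from the program itself: the names `Entry`, `leaky_length` and `tidy_length` are invented for illustration, and the live-object counter exists only so the leak can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical record type; it counts live instances so a leak is visible.
struct Entry {
    static int live;                 // number of Entry objects currently alive
    std::string text;
    explicit Entry(std::string t) : text(std::move(t)) { ++live; }
    ~Entry() { --live; }
};
int Entry::live = 0;

// Leaks: memory is requested with new, then the pointer is forgotten.
std::size_t leaky_length(const std::string& s) {
    Entry* e = new Entry(s);
    return e->text.size();           // no delete, so the Entry is never released
}

// Does not leak: unique_ptr releases the Entry when it goes out of scope.
std::size_t tidy_length(const std::string& s) {
    auto e = std::make_unique<Entry>(s);
    return e->text.size();
}
```

After one call to `tidy_length` the counter `Entry::live` is back at zero, while each call to `leaky_length` leaves it one higher. Repeat that once per line of input and the process slowly runs out of memory.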

For the moment I would suggest you use the command line parameters to restrict the program to doing about 100-200 lines at a time, like this:

Code: Select all

x64\release\nihongo.exe Accelworld.txt Results.txt 0 199
I will get to the bottom of the memory leak eventually (did I mention I'm rather busy right now?).

The ironic thing about this is that my day job is being one of the main programmers working deep inside the product BoundsChecker (see http://en.wikipedia.org/wiki/BoundsChecker), which is a tool used for discovering such leaks. I haven't used it against this program, for various reasons.

Post by Xplorer30 »

Haha, I was just trying to see how long it takes your program to go through one chapter on my laptop. Guess I shook out a bug, just call me an accidental tester.

Anyway, I will keep to a low number of lines when I'm actually going to use it, thanks.

This will go back to the startup speed problem again, sigh.

Post by rpapo »

Xplorer30 wrote:Haha, I was just trying to see how long it takes your program to go through one chapter on my laptop. Guess I shook out a bug, just call me an accidental tester.

Anyway, I will keep to a low number of lines when I'm actually going to use it, thanks.

This will go back to the startup speed problem again, sigh.
It's on my to-do list . . .

Post by rpapo »

<ALERT Class="Non-programmer eyes may glaze over here.">

As it turns out, simply saving the generated dictionary the first time it's built, and reloading it on later runs, doesn't really gain us anything. The problem here is in the gyrations the Standard Template Library goes through with the heap in setting up the dictionary map. And then you still have to index it all, which actually takes longer than building or loading the basic dictionary does.
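
To put a rough number on those heap gyrations, the following sketch counts how many separate heap requests a `std::map` makes while it is being filled. `CountingAlloc`, `g_allocs` and `allocations_for` are made-up names for this illustration, and the map of `int` to `int` is a stand-in: the real dictionary presumably maps strings, which would add further allocations per entry.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Number of heap requests made through CountingAlloc (illustration only).
static std::size_t g_allocs = 0;

// Minimal allocator that forwards to std::allocator and counts each request.
template <class T>
struct CountingAlloc {
    using value_type = T;
    CountingAlloc() = default;
    template <class U>
    CountingAlloc(const CountingAlloc<U>&) noexcept {}
    T* allocate(std::size_t n) {
        ++g_allocs;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

using CountedMap = std::map<int, int, std::less<int>,
                            CountingAlloc<std::pair<const int, int>>>;

// Inserts n entries and reports how many heap requests that took.
std::size_t allocations_for(int n) {
    g_allocs = 0;
    CountedMap m;
    for (int i = 0; i < n; ++i) m.emplace(i, i);
    return g_allocs;
}
```

Every entry gets its own node, so `allocations_for(n)` comes out to at least `n` (the exact figure depends on the library implementation). With a dictionary of millions of entries, that is millions of trips to the heap before the first lookup can happen.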

So, what might be done? I may have to devise a non-STL construct for the dictionary that could be saved as a block to disk, and reloaded as a block, and quickly fixed up once loaded. The key thing will be to entirely avoid the use of STL, since it is in all the various heap allocations and releases, constructors, copy constructors, assignment operators and so on that we are killing time. The results of STL are very nice, but getting to where it is all set up so you can use it takes time. No offense to the designers of STL, but STL relies on the heap, and when you are dealing with a construct that winds up using millions of small chunks of the heap, the overhead of using the heap is what kills you.
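
A minimal sketch of what such a block might look like follows. This is only one possible layout, not the format the program actually uses: `Slot`, `Block`, `build_block`, `find_value`, `save_block` and `load_block` are all hypothetical names. It stores offsets rather than pointers, so in this variant no fix-up is needed after loading at all, and it uses 32-bit offsets, which caps the block at 4 GB. A `std::vector<char>` holds the block purely for convenience; the point is that it is a single allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout: [count][count x Slot][NUL-terminated strings].
// Slots hold offsets into the block, not pointers, so the block is valid
// wherever it ends up in memory.
struct Slot { std::uint32_t key_off, val_off; };
using Block = std::vector<char>;   // one contiguous allocation

// Build the block from key/value pairs, sorted by key for binary search.
Block build_block(std::vector<std::pair<std::string, std::string>> items) {
    std::sort(items.begin(), items.end());
    const std::uint32_t count = static_cast<std::uint32_t>(items.size());
    Block b(sizeof count + count * sizeof(Slot));
    std::memcpy(b.data(), &count, sizeof count);
    for (std::uint32_t i = 0; i < count; ++i) {
        Slot s;
        s.key_off = static_cast<std::uint32_t>(b.size());
        b.insert(b.end(), items[i].first.begin(), items[i].first.end());
        b.push_back('\0');
        s.val_off = static_cast<std::uint32_t>(b.size());
        b.insert(b.end(), items[i].second.begin(), items[i].second.end());
        b.push_back('\0');
        std::memcpy(b.data() + sizeof count + i * sizeof(Slot), &s, sizeof s);
    }
    return b;
}

// Binary search over the slots; returns nullptr when the key is absent.
// The sketch trusts the block contents and does not validate offsets.
const char* find_value(const Block& b, const char* key) {
    std::uint32_t count = 0;
    if (b.size() < sizeof count) return nullptr;
    std::memcpy(&count, b.data(), sizeof count);
    std::uint32_t lo = 0, hi = count;
    while (lo < hi) {
        const std::uint32_t mid = lo + (hi - lo) / 2;
        Slot s;
        std::memcpy(&s, b.data() + sizeof count + mid * sizeof(Slot), sizeof s);
        const int c = std::strcmp(b.data() + s.key_off, key);
        if (c == 0) return b.data() + s.val_off;
        if (c < 0) lo = mid + 1; else hi = mid;
    }
    return nullptr;
}

// Write the whole block to disk in one call.
bool save_block(const Block& b, const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    out.write(b.data(), static_cast<std::streamsize>(b.size()));
    out.flush();
    return static_cast<bool>(out);
}

// Read the whole block back in one call; returns an empty block on failure.
Block load_block(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff n = in.tellg();
    Block b(static_cast<std::size_t>(n));
    in.seekg(0);
    in.read(b.data(), static_cast<std::streamsize>(b.size()));
    if (!in) return {};
    return b;
}
```

Loading then costs one allocation and one read, no matter how many entries the dictionary holds, and lookups work directly on the loaded bytes.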

Mind you, the overhead is in building the dictionary and index, not in using it. I cheat with the teardown: I simply exit the application and let the operating system release the resources.

</ALERT>

Post by Xplorer30 »

Hmm, that does sound complicated.

Is there any way you can make this into a Windows prog? So you can either feed it small sentences or big chunks. That way I can run the prog so it can do all the indexing, then keep it running and do small translations with it on the go, without shutting it down and taking time for all the re-indexing on startup. There's no way that would be possible with a DOS prog.

Post by rpapo »

Xplorer30 wrote:Hmm, that does sound complicated.

Is there any way you can make this into a Windows prog? So you can either feed it small sentences or big chunks. That way I can run the prog so it can do all the indexing, then keep it running and do small translations with it on the go, without shutting it down and taking time for all the re-indexing on startup. There's no way that would be possible with a DOS prog.
I have every intention of doing that . . . eventually. What you describe is the start of what I had in mind.

Anyway, for the time being, as indicated by my earlier post, I may still be able to devise a quick dictionary reload.
Mystes
Heaven's Blade Successor
Posts: 15932
Joined: Thu Aug 05, 2010 6:54 am
Favourite Light Novel:
Contact:

Post by Mystes »

Is it normal that once I open Nihongo.exe, there's nothing?
Kira0802

#campione at rizon for some #campione discussions~~ And other stuffs.

Post by rpapo »

An update. We have found that some input text drives the JUMAN parser crazy, consuming memory left and right and eventually blowing up. For some reason, this has never happened with my input texts, but it happened on the very first line of a text that Kira gave me. Gotta hand it to him, that young man has talent!

Anyway, I have updated the program to default to not running the JUMAN analysis, and have added a command-line parameter to request that analysis if you want it. The updated package, with an updated dictionary as well, can be downloaded from http://mywebpages.comcast.net/rpapo/Nihongo.zip.

The program is intended to run from the command line. Once you have unpacked it to whatever directory you want, you run it by typing "x64\release\nihongo.exe", followed by several command line parameters separated by spaces. The first parameter is the name of the input file, which is expected to be in Unicode TXT format (save it from NOTEPAD if there's any doubt), and must exist already. The second parameter is the name of the output file (which will be obliterated). The third parameter will be the number (from zero) of the first line to be processed from the file, and the fourth parameter will be the number (from zero) of the last line to be processed.

There are two optional parameters:
/REGEN = Regenerate the dictionary file, even if the file Dictionary.dat already exists.
/JUMAN = Analyze the text with the JUMAN parser as well, and include that output in the output file.
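
For the programmers following along, the parameter handling described above could be sketched roughly as below. This is not the program's actual source: `Options` and `parse_args` are hypothetical names, and the sketch assumes that all four positional parameters are required, that the switches are case-insensitive, and that they may appear anywhere on the command line.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical holder for the parameters described above.
struct Options {
    std::string input, output;
    long first_line = 0, last_line = 0;   // zero-based, inclusive
    bool regen = false, juman = false;
    bool ok = false;                      // true only if everything parsed
};

// Upper-case copy, so /juman and /JUMAN are treated alike (an assumption).
static std::string upper(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Parses a non-negative decimal line number; rejects trailing junk.
static bool to_line(const std::string& s, long& out) {
    if (s.empty()) return false;
    char* end = nullptr;
    out = std::strtol(s.c_str(), &end, 10);
    return *end == '\0' && out >= 0;
}

// args holds the command line without the program name.
Options parse_args(const std::vector<std::string>& args) {
    Options o;
    std::vector<std::string> positional;
    for (const std::string& a : args) {
        const std::string u = upper(a);
        if (u == "/REGEN") o.regen = true;
        else if (u == "/JUMAN") o.juman = true;
        else positional.push_back(a);
    }
    if (positional.size() != 4) return o;   // o.ok stays false
    o.input = positional[0];
    o.output = positional[1];
    o.ok = to_line(positional[2], o.first_line) &&
           to_line(positional[3], o.last_line) &&
           o.first_line <= o.last_line;
    return o;
}
```

In a real `main`, the vector would be filled from `argv + 1` up to `argv + argc`, and the program would print a usage message whenever `ok` comes back false.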

WARNING: For now, limit yourself to processing no more than 100 lines at a time. There appears to be a memory leak in the program somewhere.

NOTE: The saved dictionary file, Dictionary.dat, is a transitional thing that I am still working on. Hopefully, when that project is done, the program will start and run much quicker than it has been running.

Post by rpapo »

OK. It took three days of my spare time, but I have succeeded in drastically overhauling the innards of the program. Once you've generated the dictionary file, it can be loaded quickly now, and what's more, the whole program is a good deal quicker now. On my system, when generating the dictionary file, it consumes 4.4 GB of memory. The dictionary gets saved in a very raw format, taking 1.2 GB of disk space, but the next time you go to run the program, it loads that raw file only, saving a lot of time and memory space. Running with a pregenerated dictionary file, the program only consumes 1.3 GB of memory.

And it is several times faster than before. 8)

Anyway, the update can be downloaded from the usual place http://mywebpages.comcast.net/rpapo/Nihongo.zip. The first time you run it, it will complain about not finding a valid dictionary file, and will announce that it is generating a new dictionary from EDICT. The next time you run it, it will simply state that it is loading the dictionary file. On my system, loading the prebuilt dictionary takes 2-3 seconds.

Once more, this is an x64 program, and requires an x64 version of Windows to run. The operating instructions are the same as what I gave in my previous post, and the only real change to what I said there is that the file "Dictionary.dat" is no longer something temporary. It is the saved prebuilt dictionary. I would give it to you all prebuilt in the ZIP file, except it makes the ZIP file go from 8 MB to 110 MB in size. I didn't think you would appreciate the added download time...

Post by Mystes »

Thanks for the update, rpapo. I'll try it ASAP. :D
Kira0802

#campione at rizon for some #campione discussions~~ And other stuffs.

Post by rpapo »

Additional note: Please delete the file "Dictionary.dat" before you try to generate the new dictionary. Otherwise, I have found, it will simply keep trying over and over again to generate the new dictionary.

Post by rpapo »

I've updated the code yet again, though this time only in minor ways:

(1) You don't need to worry about Dictionary.dat anymore. It will take care of itself.
(2) The format of the output listing has been tweaked so it looks better when the JUMAN analysis is not requested. Leaving off the /juman option was resulting in the original text not being output at all, only the kana and romaji versions.
(3) The memory occupied by the dictionary has been locked down, and is treated as read-only.
(4) Some code no longer used has been removed.

The download link is the same as ever . . .

Post by rpapo »

Oops. Something's not working as well as before. Going to have to look into it . . . :oops: