Sunday 5 January 2014

Zero Ambiguity Version of LANGANA's First Outputs

My Turkish Parser page at SourceForge:
http://sourceforge.net/projects/turkishlanguageparser/files/?source=navbar

This work is licensed under:
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

This is the Turkish parser output for Steinbeck's 'Of Mice and Men'.  I first
parsed the text with my Turkish Language Parser, LANGANA, and reduced the
ambiguity to less than 5% using LANGANA's various modules.  I then brought the
ambiguity below 3% with procedural approaches, using editor commands such as
'change'.  I corrected the remainder manually, relying on my fluency in Turkish.
In the end the ambiguity was reduced to 0.00.
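
The percentages above can be read as a simple ratio.  The following is only a
minimal sketch, not LANGANA's own code.  It assumes that "ambiguity" means the
share of words that still have more than one candidate parse; the class name,
the method name and the sample arrays are all hypothetical.

```java
class AmbiguityRate {
    // candidateCounts[i] = number of candidate parses still attached to word i.
    // Assumption: a word counts as ambiguous when it has more than one candidate.
    public static double percentAmbiguous(int[] candidateCounts) {
        if (candidateCounts.length == 0) {
            return 0.0;
        }
        int ambiguous = 0;
        for (int n : candidateCounts) {
            if (n > 1) {
                ambiguous++;
            }
        }
        return 100.0 * ambiguous / candidateCounts.length;
    }

    public static void main(String[] args) {
        // Made-up data: four words, two of which still have several parses.
        int[] beforeCleanup = {1, 2, 1, 3};
        // After disambiguation every word keeps exactly one parse.
        int[] afterCleanup = {1, 1, 1, 1};
        System.out.println(percentAmbiguous(beforeCleanup)); // 50.0
        System.out.println(percentAmbiguous(afterCleanup));  // 0.0
    }
}
```

Under this assumed definition, "zero ambiguity" simply means that every word
is left with exactly one parse.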

A careful reader will notice style choices that stem from the dictionary I have
used and from my own personal judgment.

LANGANA has a module that uploads this parse output to a MySQL database.  I have
also written a very simple Java program that uses queries to answer questions
about the text.
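
To give an idea of the kind of question such queries can answer, here is a
small in-memory sketch.  It is not the actual program and it does not connect
to MySQL.  The table name "parse", the columns (word, root, category) and the
four sample rows are assumptions made up for illustration; the SQL in the
comments shows what the equivalent query might look like under that assumed
schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ParseQuery {
    // One row of a hypothetical "parse" table: surface word, root, category.
    static final class Parse {
        final String word;
        final String root;
        final String category;

        Parse(String word, String root, String category) {
            this.word = word;
            this.root = root;
            this.category = category;
        }
    }

    // Made-up sample rows, not taken from the real parse output.
    public static List<Parse> sampleRows() {
        return Arrays.asList(
            new Parse("geldi", "gel", "verb"),
            new Parse("evler", "ev", "noun"),
            new Parse("evde", "ev", "noun"),
            new Parse("iyi", "iyi", "adjective"));
    }

    // Roughly: SELECT COUNT(*) FROM parse WHERE category = ?
    public static long countByCategory(List<Parse> rows, String category) {
        return rows.stream().filter(p -> p.category.equals(category)).count();
    }

    // Roughly: SELECT word FROM parse WHERE root = ?
    public static List<String> wordsWithRoot(List<Parse> rows, String root) {
        List<String> words = new ArrayList<>();
        for (Parse p : rows) {
            if (p.root.equals(root)) {
                words.add(p.word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<Parse> rows = sampleRows();
        System.out.println(countByCategory(rows, "noun")); // 2
        System.out.println(wordsWithRoot(rows, "ev"));     // [evler, evde]
    }
}
```

Because every word has exactly one parse, questions such as "how many nouns
are there" or "which word forms share this root" reduce to plain filters and
counts.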

The book is approximately 100 pages long and contains 27470 words.  You can find
statistical data below.

rootParseCount=27470
wordParseCount=27470

verbParseCount=6813
questionParseCount=192
nounParseCount=9120
SpecialNounParseCount=1512
adjectiveParseCount=2735
adverbParseCount=2588
prepositionParseCount=643
pronounParseCount=2065
notIdentifiedParseCount=39
conjunctParseCount=1531
exclamationParseCount=232
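
As a quick consistency check, the eleven category counts above add up to
27470, the same value as wordParseCount.  The short Java sketch below repeats
that sum and prints each category's share of the total.  The numbers are copied
from the list above; the class name, the method names and the shortened
category labels are my own choices for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class ParseStats {
    // Category counts copied from the statistics listed above.
    public static Map<String, Integer> counts() {
        Map<String, Integer> c = new LinkedHashMap<>();
        c.put("verb", 6813);
        c.put("question", 192);
        c.put("noun", 9120);
        c.put("specialNoun", 1512);
        c.put("adjective", 2735);
        c.put("adverb", 2588);
        c.put("preposition", 643);
        c.put("pronoun", 2065);
        c.put("notIdentified", 39);
        c.put("conjunct", 1531);
        c.put("exclamation", 232);
        return c;
    }

    // Sum of all category counts.
    public static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int v : counts.values()) {
            sum += v;
        }
        return sum;
    }

    // Share of one category in percent of the total.
    public static double share(Map<String, Integer> counts, String category) {
        return 100.0 * counts.get(category) / total(counts);
    }

    public static void main(String[] args) {
        Map<String, Integer> c = counts();
        System.out.println("total = " + total(c)); // total = 27470
        for (Map.Entry<String, Integer> e : c.entrySet()) {
            System.out.printf(Locale.ROOT, "%-14s %5d  %5.2f%%%n",
                e.getKey(), e.getValue(), share(c, e.getKey()));
        }
    }
}
```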

Currently, I am looking for financial support, possibly from a customer, to continue
my work.  My intention is to make a product that reads a long text and then answers
questions over the internet.  The questions will be read from a web page automatically,
and the answers will be posted the same way or sent by e-mail automatically.

Some possibilities for this project follow:
1-An online version of this may follow as a result.
2-Searching very large texts, such as 1-2 thousand pages, and outlining a specific area.
Specialization in legal texts, medical texts, etc.
3-Scanning of legal recordings.  Scanning of telephone recordings and aviation-related recordings.
4-Turkish-to-English translation machine.

Ali Riza SARAL

License conditions:
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

You are free to:
Share — copy and redistribute the material in any medium or format

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

NonCommercial — You may not use the material for commercial purposes.
NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.