public abstract class LexiconImpl extends Object implements Lexicon
This implementation reads either a plain ASCII file or a binary file. When reading from an ASCII file, you can specify when each input line is tokenized: load, lookup, or never. If you specify 'load', the entire file is parsed when it is loaded. If you specify 'lookup', the file is loaded, but each line is parsed only when it is first referenced, and the parsed form is saved for later lookups. If you specify 'never', each line is parsed every time it is referenced. The default is 'never'. To specify the load type, set the system property as follows:
-Dcom.sun.speech.freetts.lexicon.LexTokenize=load
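The three tokenization modes come down to when a stored phone string is split into an array. The following is a minimal sketch, not FreeTTS code. `TokenizeModeSketch` and its methods are hypothetical, and each entry is simplified to a key plus a space-separated phone string. Only the property name and the 'never' default come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the "load", "lookup" and "never" modes. */
class TokenizeModeSketch {
    // Values are either the raw phone String or the already-split String[].
    private final Map<String, Object> entries = new HashMap<>();
    private final String mode;

    TokenizeModeSketch(String mode) {
        this.mode = mode;
    }

    /** Stores an entry, splitting it immediately only in "load" mode. */
    void add(String key, String phones) {
        entries.put(key, "load".equals(mode) ? phones.split(" ") : phones);
    }

    /** Returns the phones; "lookup" caches the split form, "never" re-splits each time. */
    String[] lookup(String key) {
        Object value = entries.get(key);
        if (value == null) {
            return null;
        }
        if (value instanceof String[]) {
            return (String[]) value;            // already tokenized
        }
        String[] phones = ((String) value).split(" ");
        if ("lookup".equals(mode)) {
            entries.put(key, phones);           // save the parsed form for next time
        }
        return phones;
    }

    boolean isTokenized(String key) {
        return entries.get(key) instanceof String[];
    }

    /** Reads the mode from the system property, defaulting to "never". */
    static String configuredMode() {
        return System.getProperty("com.sun.speech.freetts.lexicon.LexTokenize", "never");
    }

    public static void main(String[] args) {
        TokenizeModeSketch lex = new TokenizeModeSketch(configuredMode());
        lex.add("hello0", "hh ax l ow1");
        System.out.println(String.join(",", lex.lookup("hello0")));
    }
}
```

The trade-off is memory against repeated work. 'load' pays the parsing cost once up front. 'never' keeps only compact strings but parses on every lookup. 'lookup' parses each word at most once, and only for words that are actually used.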
If a binary file is used, you can also specify whether the new IO package (java.nio) is used. The new IO package was introduced in JDK 1.4 and can greatly improve the speed of loading files. New IO is enabled by default. To set it explicitly, use the following system property:
-Dcom.sun.speech.freetts.useNewIO=true
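The following is a minimal sketch of how a `useNewIO` switch could choose between memory-mapping the binary file with java.nio and reading it through an ordinary stream. `NewIOSketch` and its `load` method are hypothetical. `FileInputStream.getChannel` and `FileChannel.map` are standard JDK calls, and `InputStream.readAllBytes` requires Java 9 or later.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper; not part of FreeTTS. */
class NewIOSketch {
    /** New IO stays on unless the property is set to something other than "true". */
    static boolean useNewIO() {
        return Boolean.parseBoolean(
                System.getProperty("com.sun.speech.freetts.useNewIO", "true"));
    }

    /** Returns the file contents, memory-mapped when new IO is enabled. */
    static ByteBuffer load(FileInputStream is) throws IOException {
        if (useNewIO()) {
            FileChannel channel = is.getChannel();
            // Read-only mapping of the whole file; the OS pages data in on demand.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
        // Fallback: copy the whole stream into a heap buffer (Java 9+).
        return ByteBuffer.wrap(is.readAllBytes());
    }
}
```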
The implementation also allows users to define their own addenda, which are used in addition to the system addenda. If the user defines their own addenda, its values are added to the system addenda, overriding any existing entries in the system addenda. To define a user addenda, the user needs to set the following property:
-Dcom.sun.speech.freetts.lexicon.userAddenda=<URLToUserAddenda>
where <URLToUserAddenda> is a URL pointing to an ASCII file containing addenda entries.
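The override rule can be pictured as layering the user entries over the system addenda, so that a user entry wins whenever both define the same key. The following is a minimal sketch. `UserAddendaSketch` is hypothetical. Parsing of the ASCII addenda file is omitted because its line format is not described here. The property name assumes the spelling `com.sun.speech.freetts.lexicon.userAddenda`, which matches the package name.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating how user addenda override system addenda. */
class UserAddendaSketch {
    static final String PROPERTY = "com.sun.speech.freetts.lexicon.userAddenda";

    /** Returns the URL named by the property, or null when it is unset. */
    static URL userAddendaURL() throws MalformedURLException {
        String spec = System.getProperty(PROPERTY);
        return spec == null ? null : new URL(spec);
    }

    /** System entries first, then user entries; on a shared key the user entry wins. */
    static Map<String, String[]> merge(Map<String, String[]> system,
                                       Map<String, String[]> user) {
        Map<String, String[]> merged = new HashMap<>(system);
        merged.putAll(user);
        return merged;
    }
}
```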
[[[TODO: support multiple homographs with the same part of speech.]]]
Modifier and Type | Field and Description |
---|---|
private Map | addenda: The addenda. |
private URL | addendaURL: URL for the addenda. |
private boolean | binary: Type of lexicon to load (binary if true, otherwise text). |
private char[] | charBuffer: Temporary placeholder. |
private Map | compiled: The compiled lexicon. |
private URL | compiledURL: URL for the compiled form. |
private LetterToSound | letterToSound: The letter-to-sound rules. |
private URL | letterToSoundURL: URL for the letter-to-sound rules. |
private boolean | loaded: Loaded state of the lexicon. |
private static Map | loadedCompiledLexicons: A static directory of compiledURL URL objects and associated already-loaded compiled Map objects. |
private static int | MAGIC: Magic number for binary lexicon files. |
private static String[] | NO_PHONES: No phones for this word. |
private ArrayList | partsOfSpeech: Parts of speech. |
protected boolean | tokenizeOnLoad: If true, the phone string is replaced with the phone array in the hashmap when the phone array is loaded. |
protected boolean | tokenizeOnLookup: If true, the phone string is replaced with the phone array in the hashmap when the phone array is first looked up. |
private boolean | useNewIO: Use the new IO package? |
private static int | VERSION: Current binary file version. |
Constructor and Description |
---|
LexiconImpl(): Class constructor for an empty Lexicon. |
LexiconImpl(URL compiledURL, URL addendaURL, URL letterToSoundURL, boolean binary): Create a new LexiconImpl by reading from the given URLs. |
Modifier and Type | Method and Description |
---|---|
void | addAddendum(String word, String partOfSpeech, String[] phones): Adds a word to the addenda. |
boolean | compare(LexiconImpl other): Tests to see if this lexicon is identical to the other for debugging purposes. |
private boolean | compare(Map lex, Map other): Determines if the two lexicons are identical for debugging purposes. |
protected Map | createLexicon(InputStream is, boolean binary, int estimatedSize): Reads the given input stream as lexicon data and returns the results in a Map. |
void | dumpBinary(String path): Dumps this lexicon (just the compiled form). |
private void | dumpBinaryLexicon(Map lexicon, String path): Dumps a binary form of the database. |
private List | findPhonemes(Map lexicon): Returns a list of the unique phonemes in the lexicon. |
protected static String | fixPartOfSpeech(String partOfSpeech): Fixes the part of speech if it is null. |
protected String[] | getPhones(Map lexicon, String wordAndPartOfSpeech): Gets a phone list for a word from a given lexicon. |
protected String[] | getPhones(Map lexicon, String word, String partOfSpeech): Gets a phone list for a word from a given lexicon. |
protected String[] | getPhones(String phones): Turns the phone String into a String[], using " " as the delimiter. |
String[] | getPhones(String word, String partOfSpeech): Gets the phone list for a given word. |
String[] | getPhones(String word, String partOfSpeech, boolean useLTS): Gets the phone list for a given word. |
private String | getString(ByteBuffer bb): Inputs a string from a ByteBuffer. |
private String | getString(DataInputStream dis): Inputs a string from a DataInputStream. |
boolean | isLoaded(): Determines if this lexicon is loaded. |
void | load(): Loads the data for this lexicon. |
private Map | loadBinaryLexicon(InputStream is, int estimatedSize): Loads the binary lexicon from the given InputStream. |
private Map | loadMappedBinaryLexicon(FileInputStream is, int estimatedSize): Loads the binary lexicon from the given FileInputStream. |
protected Map | loadTextLexicon(InputStream is, int estimatedSize): Reads the given input stream as text lexicon data and returns the results in a Map. |
private void | outString(DataOutputStream dos, String s): Outputs a string to a data output stream. |
protected void | parseAndAdd(Map lexicon, String line): Creates a word from the given input line and adds it to the lexicon. |
void | removeAddendum(String word, String partOfSpeech): Removes a word from the addenda. |
protected void | setLexiconParameters(URL compiledURL, URL addendaURL, URL letterToSoundURL, boolean binary): Sets the lexicon parameters. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Lexicon: isSyllableBoundary
protected boolean tokenizeOnLoad
protected boolean tokenizeOnLookup
private static final int MAGIC
private static final int VERSION
private URL compiledURL
private URL addendaURL
private URL letterToSoundURL
private LetterToSound letterToSound
private ArrayList partsOfSpeech
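private static Map loadedCompiledLexicons
A static directory of compiledURL URL objects and associated already-loaded compiled Map objects. Because the addenda can be changed with addAddendum() and removeAddendum(), each lexicon instance has its own addenda.
private boolean loaded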
private boolean binary
private char[] charBuffer
private boolean useNewIO
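The loadedCompiledLexicons field describes a process-wide cache. Each compiled lexicon is loaded once per URL and then shared, while the addenda stay per instance. The following is a minimal sketch of such a cache. `CompiledLexiconCache` is hypothetical. Keying the cache by `URL.toExternalForm()` rather than by the URL object is a choice made here; the documentation does not state it.

```java
import java.net.URL;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical shared cache of compiled lexicons, one per URL. */
class CompiledLexiconCache {
    // String keys: URL.equals/hashCode may resolve host names, which is slow
    // and network-dependent for a map key.
    private static final Map<String, Map<String, String[]>> LOADED = new HashMap<>();

    /** Returns the cached lexicon for compiledURL, calling loader only on first use. */
    static synchronized Map<String, String[]> get(
            URL compiledURL, Function<URL, Map<String, String[]>> loader) {
        // Wrapped read-only because every lexicon instance shares the same map;
        // per-instance addenda would be kept in a separate, mutable map.
        return LOADED.computeIfAbsent(compiledURL.toExternalForm(),
                key -> Collections.unmodifiableMap(loader.apply(compiledURL)));
    }
}
```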
public LexiconImpl(URL compiledURL, URL addendaURL, URL letterToSoundURL, boolean binary)
Create a new LexiconImpl by reading from the given URLs.
Parameters:
compiledURL - a URL pointing to the compiled lexicon
addendaURL - a URL pointing to lexicon addenda
letterToSoundURL - a URL pointing to the LetterToSound rules to use if a word cannot be found in the compiled form or the addenda
binary - if true, the input streams are binary; otherwise, they are text.
public LexiconImpl()
Class constructor for an empty Lexicon.
protected void setLexiconParameters(URL compiledURL, URL addendaURL, URL letterToSoundURL, boolean binary)
Sets the lexicon parameters.
Parameters:
compiledURL - a URL pointing to the compiled lexicon
addendaURL - a URL pointing to lexicon addenda
letterToSoundURL - a URL pointing to the LetterToSound to use
binary - if true, the input streams are binary; otherwise, they are text.
public boolean isLoaded()
Determines if this lexicon is loaded.
public void load() throws IOException
Loads the data for this lexicon.
Specified by:
load in interface Lexicon
Throws:
IOException - if errors occur during loading
protected Map createLexicon(InputStream is, boolean binary, int estimatedSize) throws IOException
Reads the given input stream as lexicon data and returns the results in a Map.
Parameters:
is - the input stream
binary - if true, the data is binary
estimatedSize - the estimated size of the lexicon
Throws:
IOException - if errors are encountered while reading the data
protected Map loadTextLexicon(InputStream is, int estimatedSize) throws IOException
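Reads the given input stream as text lexicon data and returns the results in a Map.
Parameters:
is - the input stream
estimatedSize - the estimated number of entries of the lexicon
Throws:
IOException - if errors are encountered while reading the data
protected void parseAndAdd(Map lexicon, String line)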
Creates a word from the given input line and adds it to the lexicon.
Parameters:
lexicon - the lexicon
line - the input text
public String[] getPhones(String word, String partOfSpeech)
Gets the phone list for a given word. If a phone list cannot be found, returns null. The format is lexicon dependent. If the part of speech does not matter, pass in null.
public String[] getPhones(String word, String partOfSpeech, boolean useLTS)
Gets the phone list for a given word. If a phone list cannot be found, null is returned. The partOfSpeech is implementation dependent, but null always matches.
protected String[] getPhones(Map lexicon, String word, String partOfSpeech)
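Gets a phone list for a word from a given lexicon. If a phone list cannot be found, returns null. The format is lexicon dependent. If the part of speech does not matter, pass in null.
Parameters:
lexicon - the lexicon
word - the word to find
partOfSpeech - the part of speech
Returns:
the list of phones for the word, or null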
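The parameters of getPhones(Map, String, String), wordAndPartOfSpeech and fixPartOfSpeech together suggest a key scheme: the word followed by the part of speech, with a null part of speech stored as "0". The following sketch shows that lookup. The fallback loop over the other known parts of speech is an assumption. It is one way the "null always matches" rule and the partsOfSpeech field could fit together. `PosLookupSketch` and its tag list are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of word + part-of-speech keys. */
class PosLookupSketch {
    /** Mirrors fixPartOfSpeech: a null part of speech is represented as "0". */
    static String fixPartOfSpeech(String partOfSpeech) {
        return partOfSpeech == null ? "0" : partOfSpeech;
    }

    /** Tries word + pos, then (assumed fallback) word + each other known tag. */
    static String[] getPhones(Map<String, String[]> lexicon, List<String> partsOfSpeech,
                              String word, String partOfSpeech) {
        String pos = fixPartOfSpeech(partOfSpeech);
        String[] phones = lexicon.get(word + pos);
        for (int i = 0; phones == null && i < partsOfSpeech.size(); i++) {
            String other = partsOfSpeech.get(i);
            if (!other.equals(pos)) {
                phones = lexicon.get(word + other);
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        Map<String, String[]> lexicon = new HashMap<>();
        lexicon.put("leadn", new String[] {"l", "eh1", "d"});
        List<String> tags = Arrays.asList("0", "n", "v");
        System.out.println(String.join(" ", getPhones(lexicon, tags, "lead", null)));
    }
}
```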
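protected String[] getPhones(Map lexicon, String wordAndPartOfSpeech)
Gets a phone list for a word from a given lexicon. If a phone list cannot be found, returns null.
Parameters:
lexicon - the lexicon
wordAndPartOfSpeech - word and part of speech concatenated together
Returns:
the list of phones for the word, or null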
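The lookup pieces can be combined into a sketch of getPhones(word, partOfSpeech, useLTS). It consults the addenda first, then the compiled lexicon, and only then falls back to letter-to-sound rules. Stored phone strings are split on " ". Several details are assumptions: that the addenda are checked before the compiled form, so an added word overrides a compiled one, and that LetterToSound can be modelled as a plain `Function`. `LookupOrderSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical lexicon showing an assumed addenda -> compiled -> LTS lookup order. */
class LookupOrderSketch {
    private final Map<String, String> addenda = new HashMap<>();
    private final Map<String, String> compiled;
    private final Function<String, String[]> letterToSound; // stand-in for LetterToSound

    LookupOrderSketch(Map<String, String> compiled, Function<String, String[]> letterToSound) {
        this.compiled = compiled;
        this.letterToSound = letterToSound;
    }

    private static String fixPartOfSpeech(String partOfSpeech) {
        return partOfSpeech == null ? "0" : partOfSpeech;
    }

    /** Splits a stored phone string on " ". */
    static String[] getPhones(String phones) {
        return phones.split(" ");
    }

    void addAddendum(String word, String partOfSpeech, String[] phones) {
        addenda.put(word + fixPartOfSpeech(partOfSpeech), String.join(" ", phones));
    }

    void removeAddendum(String word, String partOfSpeech) {
        addenda.remove(word + fixPartOfSpeech(partOfSpeech));
    }

    String[] getPhones(String word, String partOfSpeech, boolean useLTS) {
        String key = word + fixPartOfSpeech(partOfSpeech);
        String phones = addenda.get(key);       // addenda first (assumed)
        if (phones == null) {
            phones = compiled.get(key);         // then the compiled lexicon
        }
        if (phones != null) {
            return getPhones(phones);
        }
        return useLTS ? letterToSound.apply(word) : null; // last resort
    }
}
```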
protected String[] getPhones(String phones)
Turns the phone String into a String[], using " " as the delimiter.
Parameters:
phones - the phones
public void addAddendum(String word, String partOfSpeech, String[] phones)
Adds a word to the addenda.
Specified by:
addAddendum in interface Lexicon
Parameters:
word - the word to add
partOfSpeech - the part of speech
phones - the phones for the word
public void removeAddendum(String word, String partOfSpeech)
Removes a word from the addenda.
Specified by:
removeAddendum in interface Lexicon
Parameters:
word - the word to remove
partOfSpeech - the part of speech
private void outString(DataOutputStream dos, String s) throws IOException
Outputs a string to a data output stream.
Parameters:
dos - the data output stream
s - the string to output
Throws:
IOException - if errors occur during writing
private String getString(DataInputStream dis) throws IOException
Inputs a string from a DataInputStream.
Parameters:
dis - the data input stream
Throws:
IOException - if errors occur during reading
private String getString(ByteBuffer bb) throws IOException
Inputs a string from a ByteBuffer.
Parameters:
bb - the input byte buffer
Throws:
IOException - if errors occur during reading
private void dumpBinaryLexicon(Map lexicon, String path)
Dumps a binary form of the database. The binary format is:
MAGIC
VERSION
(int) numPhonemes
(String) phoneme0
(String) phoneme1
...
(String) phonemeN
(int) numEntries
(String) nameWithPOS
(byte) numPhonemes
phoneme index 1
phoneme index 2
...
phoneme index n
Strings are formatted as: (byte) len char0 char1 ... charN
Limits: at most 128 chars per string and at most 128 phonemes per word.
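The layout above can be illustrated with a round-trip sketch using `DataOutputStream` and `DataInputStream`. Several details are assumptions that the description does not fix:

- MAGIC and VERSION are written as ints, with made-up values.
- Each string character is written with `writeChar` (two bytes).
- Lengths and phoneme indices are single bytes read back as unsigned values, so the 128 limits stay representable.

`BinaryLexiconSketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical writer/reader for the binary layout described above. */
class BinaryLexiconSketch {
    static final int MAGIC = 0xBABB1E; // made-up value for illustration
    static final int VERSION = 1;      // made-up value for illustration
    static final int LIMIT = 128;      // chars per string, phonemes per word

    /** Writes (byte) len followed by each char as two bytes. */
    static void outString(DataOutputStream dos, String s) throws IOException {
        if (s.length() > LIMIT) throw new IOException("string too long: " + s);
        dos.writeByte(s.length());
        for (int i = 0; i < s.length(); i++) dos.writeChar(s.charAt(i));
    }

    /** Reads a string written by outString. */
    static String getString(DataInputStream dis) throws IOException {
        char[] chars = new char[dis.readUnsignedByte()];
        for (int i = 0; i < chars.length; i++) chars[i] = dis.readChar();
        return new String(chars);
    }

    static byte[] dump(Map<String, String[]> lexicon) throws IOException {
        // Unique phonemes in first-seen order, like a findPhonemes pass.
        List<String> phonemes = new ArrayList<>();
        Map<String, Integer> index = new HashMap<>();
        for (String[] phones : lexicon.values()) {
            for (String p : phones) {
                if (!index.containsKey(p)) {
                    index.put(p, phonemes.size());
                    phonemes.add(p);
                }
            }
        }
        if (phonemes.size() > 256) throw new IOException("too many phonemes for byte indices");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bytes);
        dos.writeInt(MAGIC);
        dos.writeInt(VERSION);
        dos.writeInt(phonemes.size());
        for (String p : phonemes) outString(dos, p);
        dos.writeInt(lexicon.size());
        for (Map.Entry<String, String[]> e : lexicon.entrySet()) {
            String[] phones = e.getValue();
            if (phones.length > LIMIT) throw new IOException("too many phonemes: " + e.getKey());
            outString(dos, e.getKey());               // nameWithPOS
            dos.writeByte(phones.length);             // numPhonemes for this word
            for (String p : phones) dos.writeByte(index.get(p));
        }
        dos.flush();
        return bytes.toByteArray();
    }

    static Map<String, String[]> load(byte[] data) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
        if (dis.readInt() != MAGIC || dis.readInt() != VERSION) {
            throw new IOException("not a binary lexicon of the expected version");
        }
        String[] phonemes = new String[dis.readInt()];
        for (int i = 0; i < phonemes.length; i++) phonemes[i] = getString(dis);
        int numEntries = dis.readInt();
        Map<String, String[]> lexicon = new LinkedHashMap<>();
        for (int i = 0; i < numEntries; i++) {
            String nameWithPOS = getString(dis);
            String[] phones = new String[dis.readUnsignedByte()];
            for (int j = 0; j < phones.length; j++) phones[j] = phonemes[dis.readUnsignedByte()];
            lexicon.put(nameWithPOS, phones);
        }
        return lexicon;
    }
}
```

The shared phoneme table lets each entry store one-byte indices instead of repeating the phoneme strings.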
Parameters:
lexicon - the lexicon to dump
path - the path to dump the file to
private Map loadMappedBinaryLexicon(FileInputStream is, int estimatedSize) throws IOException
Loads the binary lexicon from the given FileInputStream.
Parameters:
is - the InputStream to load the database from
estimatedSize - estimate of how large the database is
Returns:
a Map containing the lexicon
Throws:
IOException - if an IO error occurs
private Map loadBinaryLexicon(InputStream is, int estimatedSize) throws IOException
Loads the binary lexicon from the given InputStream.
Parameters:
is - the InputStream to load the database from
estimatedSize - estimate of how large the database is
Returns:
a Map containing the lexicon
Throws:
IOException - if an IO error occurs
public void dumpBinary(String path)
Dumps this lexicon (just the compiled form).
Parameters:
path - the root path to dump it to
private List findPhonemes(Map lexicon)
Returns a list of the unique phonemes in the lexicon.
Parameters:
lexicon - the lexicon of interest
public boolean compare(LexiconImpl other)
Tests to see if this lexicon is identical to the other for debugging purposes.
Parameters:
other - the other lexicon to compare to
private boolean compare(Map lex, Map other)
Determines if the two lexicons are identical for debugging purposes.
Parameters:
lex - this lexicon
other - the other lexicon to check
protected static String fixPartOfSpeech(String partOfSpeech)
Fixes the part of speech if it is null. The default representation of a null part of speech is the number "0".
WebARTS Library Licensed Under the GNU - General Public License. Other Libraries licensed under their respective Open Source Licenses