Dictionaries are used to eliminate words that should not be considered in a search (stop words), and to normalize words so that different derived forms of the same word will match. A successfully normalized word is called a lexeme. Aside from improving search quality, normalization and removal of stop words reduce the size of the tsvector representation of a document, thereby improving performance. Normalization does not always have linguistic meaning and usually depends on application semantics.

Examples of normalization:

- Linguistic: Ispell dictionaries try to reduce input words to a normalized form; stemmer dictionaries remove word endings.
- URL locations can be canonicalized to make equivalent URLs match.
- Color names can be replaced by their hexadecimal values, e.g., red, green, blue, magenta -> FF0000, 00FF00, 0000FF, FF00FF.
- If indexing numbers, we can remove some fractional digits to reduce the range of possible numbers, so for example 3.14159265359, 3.1415926, 3.14 will be the same after normalization if only two digits are kept after the decimal point.

A dictionary is a program that accepts a token as input and returns:

- An array of lexemes if the input token is known to the dictionary (notice that one token can produce more than one lexeme).
- A single lexeme with the TSL_FILTER flag set, to replace the original token with a new token to be passed to subsequent dictionaries (a dictionary that does this is called a filtering dictionary).
- An empty array if the dictionary knows the token, but it is a stop word.
- NULL if the dictionary does not recognize the input token.

PostgreSQL provides predefined dictionaries for many languages.
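The four possible return values of a dictionary are easier to see in a toy model. The Python sketch below is not PostgreSQL's C API: the names `number_filter`, `color_dict`, `simple_dict`, and `normalize`, the stop-word list, and the `(lexemes, is_filter)` tuple are all assumptions made for illustration. A boolean stands in for the TSL_FILTER flag, and `None` stands in for NULL.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical result shape: (lexemes, is_filter), or None for "not recognized".
Result = Optional[Tuple[List[str], bool]]
Dictionary = Callable[[str], Result]

STOP_WORDS = {"a", "the", "of"}  # illustrative stop-word list
COLORS = {"red": "FF0000", "green": "00FF00",
          "blue": "0000FF", "magenta": "FF00FF"}


def number_filter(token: str) -> Result:
    """Filtering dictionary: keep only two digits after the decimal point."""
    if not re.fullmatch(r"\d+(\.\d+)?", token):
        return None  # not a number: this dictionary does not recognize it
    whole, dot, frac = token.partition(".")
    return ([whole + dot + frac[:2]], True)  # True plays the role of TSL_FILTER


def color_dict(token: str) -> Result:
    """Replace a known color name by its hexadecimal value."""
    hex_value = COLORS.get(token.lower())
    return ([hex_value], False) if hex_value else None


def simple_dict(token: str) -> Result:
    """Lowercase the token; return an empty list for stop words."""
    word = token.lower()
    if word in STOP_WORDS:
        return ([], False)  # known to the dictionary, but a stop word
    return ([word], False)


def normalize(token: str, dictionaries: List[Dictionary]) -> List[str]:
    """Consult the dictionaries in order until one of them settles the token."""
    for dictionary in dictionaries:
        result = dictionary(token)
        if result is None:
            continue  # unrecognized: ask the next dictionary
        lexemes, is_filter = result
        if is_filter:
            token = lexemes[0]  # pass the replacement token further down
            continue
        return lexemes  # lexemes, or an empty list for a stop word
    return []  # no dictionary recognized the token, so it is dropped


chain = [number_filter, color_dict, simple_dict]
```

With this chain, `normalize("3.14159265359", chain)` and `normalize("3.14", chain)` both give `["3.14"]`, `normalize("Red", chain)` gives `["FF0000"]`, and `normalize("the", chain)` gives `[]`. In PostgreSQL itself, the output of a real dictionary for a single token can be inspected with the `ts_lexize` function.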