Vocabulary
- class matchup.structure.vocabulary.Vocabulary(save, **kwargs)
Bases: object
Crucial data structure that represents and stores the results of all text processing.
Attributes Summary
- idf: Get the data structure that represents the IDF weighting.
- keys: Get all keywords present in the vocabulary.
- sanitizer: Sanitizer property getter.
- tf: Get the data structure that represents the TF weighting.
Methods Summary
- documents_with_keywords(kwds)
- generate_idf()
- import_collection(): Recover a previously generated vocabulary.
- import_file(file_path): Given the file path of a document, append the document to an internal structure, provided the path is correct.
- import_folder(folder_path): Generalization of import_file(). Receives a folder path and tries to append all documents in that folder.
- index_files(): Process the content of all previously imported files, generating the vocabulary data structure.
- maximum_frequencies_per_document()
- save(): Persist the data structure on disk.
Attributes Documentation
- idf
Get the data structure that represents the IDF weighting.
Returns: IDF object
- keys
Get all keywords present in the vocabulary.
Returns: list of all keywords
- sanitizer
Sanitizer property getter.
Returns:
- tf
Get the data structure that represents the TF weighting.
Returns: TF object
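To illustrate what TF and IDF structures of this kind typically hold, here is a minimal sketch over a toy corpus. It assumes the textbook definitions (term frequency normalized by the most frequent term in the document, and idf = log2(N / df)); this page does not state which formulas matchup uses, and the names `compute_tf`, `compute_idf` and `docs` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical data): document name -> already-tokenized text.
docs = {
    "a.txt": ["cat", "sat", "cat"],
    "b.txt": ["dog", "sat"],
}

def compute_tf(docs):
    """Map term -> {document: frequency / highest frequency in that document}."""
    tf = defaultdict(dict)
    for name, tokens in docs.items():
        if not tokens:
            continue  # an empty document contributes no terms
        counts = Counter(tokens)
        max_count = max(counts.values())
        for term, count in counts.items():
            tf[term][name] = count / max_count
    return tf

def compute_idf(docs):
    """Map term -> log2(number of documents / documents containing the term)."""
    n_docs = len(docs)
    document_frequency = Counter()
    for tokens in docs.values():
        document_frequency.update(set(tokens))
    return {term: math.log2(n_docs / df) for term, df in document_frequency.items()}

tf = compute_tf(docs)
idf = compute_idf(docs)
```

With these assumptions, a term that appears in every document (here "sat") gets an IDF of zero, so it contributes nothing to a TF-IDF weight.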
Methods Documentation
- documents_with_keywords(kwds: Set[str]) → Set[str]
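This page gives only the signature for this method. As a rough sketch of what a lookup with this shape could do, the example below assumes an inverted index and returns every document that contains at least one of the keywords; whether matchup matches any keyword or all of them is not stated here, and the `inverted_index` data and the extra `index` parameter are hypothetical.

```python
from typing import Dict, Set

# Hypothetical inverted index: keyword -> names of documents that contain it.
inverted_index: Dict[str, Set[str]] = {
    "cat": {"a.txt"},
    "sat": {"a.txt", "b.txt"},
    "dog": {"b.txt"},
}

def documents_with_keywords(index: Dict[str, Set[str]], kwds: Set[str]) -> Set[str]:
    """Return the documents that contain at least one keyword (assumed semantics)."""
    found: Set[str] = set()
    for kwd in kwds:
        # Unknown keywords simply add nothing to the result.
        found |= index.get(kwd, set())
    return found

matches = documents_with_keywords(inverted_index, {"cat", "dog"})
```

Replacing the union with an intersection would give the stricter "contains all keywords" reading.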
- generate_idf()
- import_collection() → bool
Recover a previously generated vocabulary.
Returns: boolean flag that indicates success, or failure if the vocabulary has not been generated yet.
- import_file(file_path: str) → bool
Given the file path of a document, append the document to an internal structure, provided the path is correct. Processing of the file can then be started by calling index_files().
Parameters: file_path – string that represents a relative or absolute path to a txt file
Returns: boolean flag that indicates whether the file was identified
- import_folder(folder_path: str) → bool
Generalization of import_file(). Receives a folder path and tries to append all documents in that folder to an internal structure. Processing of these files can then be started by calling index_files().
Parameters: folder_path – string that represents a relative or absolute path to a folder
Returns: boolean flag that indicates whether the folder was identified
- index_files() → None
Process the content of all files that were imported earlier, producing a vocabulary data structure that is ready for use.
Returns: None
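The import methods and index_files() describe a two-phase pattern: paths are registered first and processed later. The sketch below imitates that pattern with a small stand-in class so the flow can be run without the library. `ToyVocabulary`, its `pending` and `index` attributes, and the whitespace tokenizer are all hypothetical and are not matchup's implementation.

```python
import os
import tempfile
from collections import Counter

class ToyVocabulary:
    """Hypothetical stand-in that mimics the register-then-index flow."""

    def __init__(self):
        self.pending = set()  # paths registered but not yet processed
        self.index = {}       # term -> {path: count}

    def import_file(self, file_path):
        # Register the file only if the path points to an existing file.
        if os.path.isfile(file_path):
            self.pending.add(os.path.abspath(file_path))
            return True
        return False

    def import_folder(self, folder_path):
        if not os.path.isdir(folder_path):
            return False
        for name in sorted(os.listdir(folder_path)):
            self.import_file(os.path.join(folder_path, name))
        return True

    def index_files(self):
        # Process everything registered so far, then clear the queue.
        for path in sorted(self.pending):
            with open(path, encoding="utf-8") as handle:
                counts = Counter(handle.read().lower().split())
            for term, count in counts.items():
                self.index.setdefault(term, {})[path] = count
        self.pending.clear()

def demo():
    with tempfile.TemporaryDirectory() as folder:
        for name, text in [("a.txt", "cat sat cat"), ("b.txt", "dog sat")]:
            with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
                handle.write(text)
        vocab = ToyVocabulary()
        found = vocab.import_folder(folder)
        missing = vocab.import_file(os.path.join(folder, "nope.txt"))
        queued = len(vocab.pending)
        vocab.index_files()
        return found, missing, queued, sorted(vocab.index)
```

Calling `demo()` registers two files, reports the nonexistent path as not found, and only builds the term index once `index_files()` runs.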
- maximum_frequencies_per_document() → DefaultDict[str, float]
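Only the signature is documented for this method. One plausible reading, assumed in the sketch below, is that it maps each document to the highest term count found in that document, a value often used to normalize term frequencies. The function name `maximum_frequencies` and its `docs` argument are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List

def maximum_frequencies(docs: Dict[str, List[str]]) -> DefaultDict[str, float]:
    """Map each document name to the count of its most frequent term (assumed meaning)."""
    result: DefaultDict[str, float] = defaultdict(float)
    for name, tokens in docs.items():
        if tokens:
            result[name] = float(max(Counter(tokens).values()))
    return result

freqs = maximum_frequencies({"a.txt": ["cat", "sat", "cat"], "b.txt": ["dog"]})
```

Because the result is a defaultdict of floats, looking up a document that was never seen yields 0.0 instead of raising KeyError.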
- save() → bool
Persist the data structure on disk.
Returns: boolean flag that indicates whether the data structure could be persisted.
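The boolean contracts of save() and import_collection() can be pictured as a simple round trip: restoring fails until something has been saved. The sketch below uses JSON in a temporary directory purely for illustration. matchup's on-disk format is not described on this page, and `ToyStore`, its `path` argument and the file name are hypothetical.

```python
import json
import os
import tempfile

class ToyStore:
    """Hypothetical stand-in for a structure that can be persisted and restored."""

    def __init__(self, path):
        self.path = path
        self.index = {}

    def save(self):
        # Report failure as False rather than raising, mirroring a bool return.
        try:
            with open(self.path, "w", encoding="utf-8") as handle:
                json.dump(self.index, handle)
            return True
        except OSError:
            return False

    def import_collection(self):
        # Nothing to recover if no earlier save produced the file.
        if not os.path.isfile(self.path):
            return False
        with open(self.path, encoding="utf-8") as handle:
            self.index = json.load(handle)
        return True

def round_trip():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "vocabulary.json")
        fresh = ToyStore(path)
        before = fresh.import_collection()  # nothing saved yet
        fresh.index = {"cat": {"a.txt": 2}}
        saved = fresh.save()
        restored = ToyStore(path)
        after = restored.import_collection()
        return before, saved, after, restored.index
```

In this sketch `round_trip()` shows the first restore failing, the save succeeding, and a second instance recovering the same data.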