parser API reference
- class taxadb2.parser.Accession2TaxidParser(acc_file=None, chunk=500, fast=False, **kwargs)
Main parser class for nucl_xxx_accession2taxid files
This class is used to parse accession2taxid files.
- Parameters:
acc_file (str) – File to parse
chunk (int) – Chunk insert size. Default 500
fast (bool) – Load accessions directly into the database, skipping the existence check.
- __init__(acc_file=None, chunk=500, fast=False, **kwargs)
Base class
- accession2taxid(acc2taxid=None, chunk=None)
Parses the accession2taxid files
This method parses the accession2taxid file, builds one dictionary per entry, collects the dictionaries in a list, and yields each list for insertion into the database. Each dictionary has the form:
{ 'accession': accession_id_from_file, 'taxid': associated_taxonomic_id }
- Parameters:
acc2taxid (str) – Path to the gzipped accession2taxid input file
chunk (int) – Number of entries to gather before yielding. Default 500 (set at object construction)
- Yields:
list – A chunk of parsed entries
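To make the chunking concrete, here is a minimal standard-library sketch of the behaviour described above; it is not taxadb2's own code. It assumes the usual NCBI layout (a header line, then tab-separated accession, accession.version, taxid and gi columns), that the accession.version column is the one stored, and that the existence check skipped by fast=True means dropping TaxIDs missing from the Taxa table. The name iter_accession_chunks and its known_taxids parameter are hypothetical.

```python
import gzip
from typing import Dict, Iterator, List, Optional, Set


def iter_accession_chunks(
    path: str,
    chunk: int = 500,
    known_taxids: Optional[Set[int]] = None,
) -> Iterator[List[Dict[str, object]]]:
    """Yield lists of {'accession': ..., 'taxid': ...} dicts of at most `chunk` items."""
    entries: List[Dict[str, object]] = []
    with gzip.open(path, "rt") as handle:
        next(handle, None)  # skip header: accession, accession.version, taxid, gi
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # ignore blank or malformed lines
            accession, taxid = fields[1], int(fields[2])
            # Assumed meaning of fast=False: skip TaxIDs absent from the Taxa table.
            if known_taxids is not None and taxid not in known_taxids:
                continue
            entries.append({"accession": accession, "taxid": taxid})
            if len(entries) == chunk:
                yield entries  # a full chunk, ready for one bulk insert
                entries = []
    if entries:
        yield entries  # the final chunk may be shorter
```

Passing known_taxids=None plays the role of fast=True. Yielding lists instead of single rows keeps memory bounded while still letting each list go to the database as one bulk insert.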
- set_accession_file(acc_file)
Set the accession file to use
- Parameters:
acc_file (str) – File to be set
- Returns:
True
- Raises:
SystemExit – If acc_file is None or not a file (check_file)
- class taxadb2.parser.TaxaDumpParser(nodes_file=None, names_file=None, merged_file=None, dbtype=None, dbname=None, hostname=None, password=None, port=None, username=None, config=None, **kwargs)
Main parser class for ncbi taxdump files
This class is used to parse the NCBI taxonomy files found in the taxdump.tar.gz archive
- Parameters:
nodes_file (str) – Path to nodes.dmp file
names_file (str) – Path to names.dmp file
merged_file (str) – Path to merged.dmp file
- __init__(nodes_file=None, names_file=None, merged_file=None, dbtype=None, dbname=None, hostname=None, password=None, port=None, username=None, config=None, **kwargs)
- resolve_taxid(taxid)
Check whether a TaxID is deprecated in the database and, if so, return its replacement.
- Parameters:
taxid (int) – The queried TaxID.
- Returns:
Updated TaxID if found in DeprecatedTaxID table, otherwise original.
- Return type:
int
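The lookup that resolve_taxid describes amounts to a dictionary lookup that falls back to the queried value. The sketch below is a hypothetical standalone function, not the method itself, and assumes the old-to-new mapping is already in memory (as cache_merged_taxids suggests). Whether taxadb2 follows chains of merges is not stated here, so the sketch does a single lookup.

```python
from typing import Mapping


def resolve_taxid(taxid: int, merged: Mapping[int, int]) -> int:
    """Return the replacement TaxID if `taxid` was merged, otherwise `taxid` itself."""
    # One lookup with the original value as the fallback;
    # chains of merges (old -> mid -> new) are deliberately not followed.
    return merged.get(taxid, taxid)


# Hypothetical, illustrative mapping of old_taxid -> new_taxid
merged = {12: 74109}
print(resolve_taxid(12, merged))    # 74109
print(resolve_taxid(9606, merged))  # 9606
```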
- set_merged_file(merged_file)
Set merged_file
Set the merged file to use (deprecated TaxIDs mapping)
- Parameters:
merged_file (str) – Merged file to be set
- Returns:
True
- Raises:
SystemExit – If merged_file is None or not a file (check_file)
- set_names_file(names_file)
Set names_file
Set the names file to use
- Parameters:
names_file (str) – Names file to be set
- Returns:
True
- Raises:
SystemExit – If names_file is None or not a file (check_file)
- set_nodes_file(nodes_file)
Set nodes_file
Set the nodes file to use
- Parameters:
nodes_file (str) – Nodes file to be set
- Returns:
True
- Raises:
SystemExit – If nodes_file is None or not a file (check_file)
- taxdump(nodes_file=None, names_file=None, merged_file=None)
Parse .dmp files and return both taxa and deprecated taxid data
Parse the nodes.dmp, names.dmp, and merged.dmp files and return a tuple of two lists: taxonomic records to insert into the Taxa table, and deprecated TaxID mappings to insert into the DeprecatedTaxID table.
- Parameters:
nodes_file (str) – Path to nodes.dmp file
names_file (str) – Path to names.dmp file
merged_file (str) – Path to merged.dmp file
- Returns:
List of dictionaries to insert into Taxa
List of dictionaries to insert into DeprecatedTaxID
- Return type:
tuple
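The .dmp files share one line format: fields separated by `"\t|\t"` and each line terminated by `"\t|"`. The following sketch illustrates how the three files could be combined into the two lists taxdump returns; it is not taxadb2's parser. It assumes only names of class "scientific name" are kept, and the dictionary keys (ncbi_taxid, parent_taxid, tax_name, lineage_level, old_taxid, new_taxid) are hypothetical stand-ins for the real table columns. It accepts any iterable of lines, so open file handles work as well as lists.

```python
from typing import Dict, Iterable, List, Tuple


def split_dmp(line: str) -> List[str]:
    """Split one .dmp line into stripped fields (separator: tab, pipe, tab)."""
    return [field.strip() for field in line.rstrip("\n").rstrip("|").split("\t|\t")]


def parse_taxdump(
    nodes_lines: Iterable[str],
    names_lines: Iterable[str],
    merged_lines: Iterable[str],
) -> Tuple[List[Dict[str, object]], List[Dict[str, int]]]:
    # names.dmp: tax_id | name_txt | unique name | name class
    sci_names: Dict[int, str] = {}
    for line in names_lines:
        tax_id, name_txt, _unique, name_class = split_dmp(line)[:4]
        if name_class == "scientific name":  # keep one name per TaxID (assumption)
            sci_names[int(tax_id)] = name_txt

    # nodes.dmp: tax_id | parent tax_id | rank | ...
    taxa: List[Dict[str, object]] = []
    for line in nodes_lines:
        fields = split_dmp(line)
        tax_id = int(fields[0])
        taxa.append({
            "ncbi_taxid": tax_id,
            "parent_taxid": int(fields[1]),
            "tax_name": sci_names.get(tax_id, ""),
            "lineage_level": fields[2],
        })

    # merged.dmp: old_tax_id | new_tax_id
    deprecated = [
        {"old_taxid": int(old), "new_taxid": int(new)}
        for old, new in (split_dmp(line)[:2] for line in merged_lines)
    ]
    return taxa, deprecated
```

names.dmp is read first so that each node can be given its scientific name in a single pass over nodes.dmp.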
- class taxadb2.parser.TaxaParser(verbose=False)
Base parser class for taxonomic files
- __init__(verbose=False)
Base class
- static cache_merged_taxids()
Load merged tax IDs data from the DeprecatedTaxID table into a dictionary
- Returns:
Dictionary mapping old_taxid to new_taxid
- Return type:
dict
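Caching the merged TaxIDs replaces one query per lookup with a single table scan. A sketch with the standard sqlite3 module, assuming a DeprecatedTaxID table with old_taxid and new_taxid columns (names taken from the description above). The schema, the function name load_merged_map and the demo rows are illustrative, not taxadb2's actual storage.

```python
import sqlite3
from typing import Dict


def load_merged_map(conn: sqlite3.Connection) -> Dict[int, int]:
    """Read every (old_taxid, new_taxid) row into a dict keyed by old_taxid."""
    cursor = conn.execute("SELECT old_taxid, new_taxid FROM DeprecatedTaxID")
    return {old: new for old, new in cursor}


# Demo with an in-memory database and a hypothetical two-column schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DeprecatedTaxID (old_taxid INTEGER PRIMARY KEY, new_taxid INTEGER)")
conn.executemany("INSERT INTO DeprecatedTaxID VALUES (?, ?)", [(12, 74109), (30, 29)])
print(load_merged_map(conn))  # {12: 74109, 30: 29}
```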
- static cache_taxids()
Load data from the Taxa table into a dictionary
- Returns:
Data from the Taxa table, as a dictionary
- Return type:
dict
- static check_file(element)
Check that a file exists
This method checks that element points to an existing file.
- Parameters:
element (str) – Path of the file to check
- Returns:
True
- Raises:
SystemExit – If element does not exist
SystemExit – If element is not a file
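A minimal sketch of the checks check_file documents, using os.path. It is not the library's implementation: it assumes the None case mentioned by the set_* methods is also handled here, and the exit messages are invented for illustration.

```python
import os
from typing import Optional


def check_file(element: Optional[str]) -> bool:
    """Return True if `element` names an existing file; otherwise exit."""
    if element is None:
        raise SystemExit("No file given")  # assumed: None handled here
    if not os.path.exists(element):
        raise SystemExit(f"{element} does not exist")
    if not os.path.isfile(element):
        raise SystemExit(f"{element} is not a file")  # e.g. a directory
    return True
```

Raising SystemExit instead of returning False makes a bad path stop a command-line run immediately, which matches the documented Raises entries. Library callers that want to recover must catch SystemExit explicitly.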