parser API reference

class taxadb2.parser.Accession2TaxidParser(acc_file=None, chunk=500, fast=False, **kwargs)

Main parser class for nucl_xxx_accession2taxid files

This class is used to parse accession2taxid files.

Parameters:
  • acc_file (str) – File to parse

  • chunk (int) – Chunk insert size. Default 500

  • fast (bool) – Load accessions directly into the database without checking whether they already exist.

__init__(acc_file=None, chunk=500, fast=False, **kwargs)

Base class

accession2taxid(acc2taxid=None, chunk=None)

Parses the accession2taxid files

This method parses the accession2taxid file, builds a dictionary for each entry, stores the dictionaries in a list, and yields that list for insertion into the database. Each dictionary has the form:

{
    'accession': accession_id_from_file,
    'taxid': associated_taxonomic_id
}

Parameters:
  • acc2taxid (str) – Path to acc2taxid input file (gzipped)

  • chunk (int) – Number of entries to gather before yielding. Defaults to the value set at object construction (500 unless overridden)

Yields:

list – Chunk of entries read from the file
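
The chunked-yield behaviour described above can be illustrated with a minimal standalone sketch. It does not use taxadb2 itself: the function name iter_accession_chunks is hypothetical, and the column layout (accession, accession.version, taxid, gi, preceded by one header line) is an assumption based on the NCBI accession2taxid format, as is the choice of the first column for 'accession'.

```python
import gzip
from typing import Dict, Iterator, List


def iter_accession_chunks(path: str, chunk: int = 500) -> Iterator[List[Dict[str, object]]]:
    """Yield lists of {'accession', 'taxid'} dicts, at most `chunk` per list.

    Hypothetical helper for illustration only; not part of taxadb2.
    """
    entries: List[Dict[str, object]] = []
    with gzip.open(path, "rt") as handle:
        next(handle, None)  # skip the header line (assumed to be present)
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # ignore blank or malformed lines
            # Assumed columns: accession, accession.version, taxid, gi
            entries.append({"accession": fields[0], "taxid": int(fields[2])})
            if len(entries) == chunk:
                yield entries
                entries = []
    if entries:
        yield entries  # final, possibly shorter, chunk
```

Yielding fixed-size lists keeps memory use bounded and lets the caller issue one bulk insert per chunk rather than one insert per accession.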

set_accession_file(acc_file)

Set the accession file to use

Parameters:

acc_file (str) – File to be set

Returns:

True

Raises:

SystemExit – If acc_file is None or not a file (check_file)

class taxadb2.parser.TaxaDumpParser(nodes_file=None, names_file=None, merged_file=None, dbtype=None, dbname=None, hostname=None, password=None, port=None, username=None, config=None, **kwargs)

Main parser class for ncbi taxdump files

This class is used to parse the NCBI taxonomy files found in the taxdump.tar.gz archive

Parameters:
  • nodes_file (str) – Path to nodes.dmp file

  • names_file (str) – Path to names.dmp file

  • merged_file (str) – Path to merged.dmp file

__init__(nodes_file=None, names_file=None, merged_file=None, dbtype=None, dbname=None, hostname=None, password=None, port=None, username=None, config=None, **kwargs)

resolve_taxid(taxid)

Check whether a TaxID is deprecated in the database and, if so, return its replacement.

Parameters:

taxid (int) – The queried TaxID.

Returns:

Updated TaxID if found in DeprecatedTaxID table, otherwise original.

Return type:

int
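
The lookup rule above (return the replacement if the TaxID is deprecated, otherwise the original) can be sketched without a database. This is only an illustration: the free function and the in-memory merged dictionary below are hypothetical stand-ins for the method and the DeprecatedTaxID table, and the values are made up for the example.

```python
from typing import Dict


def resolve_taxid(taxid: int, merged: Dict[int, int]) -> int:
    """Return the replacement for a deprecated TaxID, or the TaxID itself."""
    # dict.get falls back to the queried TaxID when no mapping exists.
    return merged.get(taxid, taxid)


# Hypothetical old -> new mapping standing in for the DeprecatedTaxID table.
merged = {12: 74109, 30: 29}

updated = resolve_taxid(12, merged)      # deprecated: replacement is returned
unchanged = resolve_taxid(9606, merged)  # not deprecated: returned as-is
```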

set_merged_file(merged_file)

Set merged_file

Set the merged file to use (deprecated TaxIDs mapping)

Parameters:

merged_file (str) – Merged file to be set

Returns:

True

Raises:

SystemExit – If merged_file is None or not a file (check_file)

set_names_file(names_file)

Set names_file

Set the names file to use

Parameters:

names_file (str) – Names file to be set

Returns:

True

Raises:

SystemExit – If names_file is None or not a file (check_file)

set_nodes_file(nodes_file)

Set nodes_file

Set the nodes file to use

Parameters:

nodes_file (str) – Nodes file to be set

Returns:

True

Raises:

SystemExit – If nodes_file is None or not a file (check_file)

taxdump(nodes_file=None, names_file=None, merged_file=None)

Parse .dmp files and return both taxa and deprecated taxid data

Parse the nodes.dmp, names.dmp, and merged.dmp files and return:
  • A list of taxonomic information to insert into the Taxa table

  • A list of deprecated TaxID mappings to insert into the DeprecatedTaxID table

Parameters:
  • nodes_file (str) – Path to nodes.dmp file

  • names_file (str) – Path to names.dmp file

  • merged_file (str) – Path to merged.dmp file

Returns:

  • List of dictionaries to insert into Taxa

  • List of dictionaries to insert into DeprecatedTaxID

Return type:

tuple
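
To make the shape of the returned tuple concrete, here is a minimal sketch of parsing the three .dmp inputs from in-memory lines. It is not the library's implementation. The helper names and the dictionary keys (ncbi_taxid, parent_taxid, tax_name, lineage_level, old_taxid, new_taxid) are assumptions chosen for illustration; the field delimiter and column order follow the NCBI taxdump format.

```python
from typing import Dict, Iterable, List, Tuple


def split_dmp(line: str) -> List[str]:
    """Split one .dmp line into fields (delimiter: tab, pipe, tab)."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the end-of-row terminator
    return line.split("\t|\t")


def parse_taxdump(
    nodes: Iterable[str], names: Iterable[str], merged: Iterable[str]
) -> Tuple[List[Dict[str, object]], List[Dict[str, int]]]:
    """Return (taxa, deprecated) lists of dictionaries. Hypothetical sketch."""
    # names.dmp columns: tax_id, name_txt, unique name, name class
    sci_names: Dict[int, str] = {}
    for line in names:
        fields = split_dmp(line)
        if fields[3] == "scientific name":
            sci_names[int(fields[0])] = fields[1]

    # nodes.dmp columns start with: tax_id, parent tax_id, rank
    taxa: List[Dict[str, object]] = []
    for line in nodes:
        fields = split_dmp(line)
        taxid = int(fields[0])
        taxa.append({
            "ncbi_taxid": taxid,             # assumed key name
            "parent_taxid": int(fields[1]),  # assumed key name
            "tax_name": sci_names.get(taxid, ""),
            "lineage_level": fields[2],
        })

    # merged.dmp columns: old_tax_id, new_tax_id
    deprecated = [
        {"old_taxid": int(f[0]), "new_taxid": int(f[1])}
        for f in map(split_dmp, merged)
    ]
    return taxa, deprecated
```

Reading names.dmp first means each node can be paired with its scientific name in a single pass over nodes.dmp.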

class taxadb2.parser.TaxaParser(verbose=False)

Base parser class for taxonomic files

__init__(verbose=False)

Base class

static cache_merged_taxids()

Load merged tax IDs data from the DeprecatedTaxID table into a dictionary

Returns:

Dictionary mapping old_taxid to new_taxid

Return type:

dict

static cache_taxids()

Load data from taxa table into a dictionary

Returns:

Data from taxa table mapped as dictionary

Return type:

dict

static check_file(element)

Perform basic checks on a file

This method checks that an element is an existing regular file.

Parameters:

element (type) – File to check

Returns:

True

Raises:
  • SystemExit – If the element file does not exist

  • SystemExit – If the element is not a file
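
The checks listed above can be sketched with the standard library alone. This is a standalone illustration under assumptions, not the library's code: the exit messages are invented, and the None check mirrors the behaviour the set_*_file methods attribute to check_file.

```python
import os
import sys


def check_file(element):
    """Return True if `element` is an existing regular file, else exit."""
    if element is None:
        # sys.exit raises SystemExit; the message text is hypothetical.
        sys.exit("Please provide an input file to check")
    if not os.path.exists(element):
        sys.exit("File %s does not exist" % element)
    if not os.path.isfile(element):
        sys.exit("%s is not a file" % element)
    return True
```

Because SystemExit is raised rather than an ordinary exception, callers that embed the parser in a larger program may want to catch it explicitly instead of letting the interpreter terminate.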