Utils

padua.utils.build_combined_label(sl, idxs, sep=' ', label_format=None)

Generate a combined label from a list of indexes into sl, by joining them with sep (str).

Parameters:
  • sl (dict of str) – Strings to combine
  • idxs (list of sl keys) – Indexes into sl
  • sep (str) – Separator used to join the strings
Returns:

str of combined label
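
As a rough illustration of the joining behaviour described above, the following standalone sketch looks up each index in sl and joins the results with sep. The helper name combine_labels is hypothetical, and the sketch ignores label_format, whose behaviour is not documented here; it is not padua's implementation.

```python
def combine_labels(sl, idxs, sep=' '):
    """Hypothetical stand-in: join sl[i] for each i in idxs using sep."""
    return sep.join(str(sl[i]) for i in idxs)


# Made-up labels, keyed by arbitrary strings.
labels = {'a': 'Control', 'b': 'Treated', 'c': '24h'}
combined = combine_labels(labels, ['b', 'c'], sep=' ')
print(combined)
```

The order of idxs determines the order of the parts in the combined label.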

padua.utils.calculate_s0_curve(s0, minpval, maxpval, minratio, maxratio, curve_interval=0.1)

Calculate s0 curve for volcano plot.

Taking a min and max p value, and a min and max ratio, calculate a smooth curve starting from parameter s0 in each direction.

The curve_interval parameter defines the smoothness of the resulting curve.

Parameters:
  • s0 (float) – offset of curve from intersect
  • minpval (float) – minimum p value
  • maxpval (float) – maximum p value
  • minratio (float) – minimum ratio
  • maxratio (float) – maximum ratio
  • curve_interval (float) – stepsize (smoothness) of curve generator
Returns:

x, y, fn – x, y points of curve, and fn generator
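
The exact formula is not given in this reference. The sketch below only illustrates the general idea of sampling a smooth threshold curve at fixed steps and returning the points together with the generating function. It assumes a hyperbolic form y = y0 + s0 / (x - x0), which may differ from what padua computes; the function name and the x0, y0 and x_max parameters are hypothetical.

```python
import math


def s0_curve_sketch(s0, x0, y0, x_max, curve_interval=0.1):
    """Sample an assumed hyperbola y = y0 + s0 / (x - x0) on x0 < x <= x_max.

    x0 plays the role of a ratio cutoff and y0 of a significance cutoff
    (both are assumptions made for this sketch).
    """
    def fn(x):
        return y0 + s0 / (x - x0)

    # Number of whole steps that fit between x0 and x_max.
    n = int(math.floor((x_max - x0) / curve_interval + 1e-9))
    xs = [x0 + (k + 1) * curve_interval for k in range(n)]
    ys = [fn(x) for x in xs]
    return xs, ys, fn


xs, ys, fn = s0_curve_sketch(0.5, 1.0, 1.3, 2.0, curve_interval=0.25)
```

A smaller curve_interval yields more points and therefore a smoother plotted line. A curve for the opposite direction could be obtained by mirroring the x values.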

padua.utils.chunks(seq, num)

Separate seq (np.array) into num parts of as-near-as-possible equal length.

Parameters:
  • seq (np.array) – Sequence to split
  • num (int) – Number of parts to split sequence into
Returns:

np.array of split parts
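
To make "as-near-as-possible equal length" concrete, here is a minimal sketch of one way to split a sequence into num consecutive parts whose lengths differ by at most one. The helper name split_near_equal is hypothetical, and padua may place the longer parts differently; numpy.array_split provides comparable behaviour for arrays.

```python
def split_near_equal(seq, num):
    """Hypothetical helper: split seq into num consecutive parts.

    The first len(seq) % num parts receive one extra element.
    """
    base, extra = divmod(len(seq), num)
    parts = []
    start = 0
    for i in range(num):
        size = base + (1 if i < extra else 0)
        parts.append(seq[start:start + size])
        start += size
    return parts


parts = split_near_equal(list(range(10)), 3)
print(parts)
```

Because the sketch only uses len() and slicing, it works on lists as well as on one-dimensional NumPy arrays.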

padua.utils.find_nearest_idx(array, value)
Parameters:
  • array
  • value
Returns:

padua.utils.get_index_list(l, ms)
Parameters:
  • l
  • ms
Returns:

padua.utils.get_protein_id(s)

Return a shortened string, split on spaces, underscores and semicolons.

Extract the first, highest-ranked protein ID from a string containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268

Long names (containing species information) are eliminated (split on ‘ ‘) and isoforms are removed (split on ‘_’).

Parameters:
  • s (str or unicode) – protein IDs in MaxQuant format
Returns:

string

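
The splitting order described above (semicolon first, then space, then underscore) can be sketched as a chain of str.split calls. The helper name first_protein_id and the second input string are made up for illustration; this is a sketch of the described behaviour, not necessarily padua's code.

```python
def first_protein_id(s):
    """Hypothetical helper: keep the first ID, then drop anything after
    the first space (long names) and the first underscore (isoforms)."""
    return str(s).split(';')[0].split(' ')[0].split('_')[0]


short_id = first_protein_id('P07830;P63267;Q54A44;P63268')
long_id = first_protein_id('P07830_2 Example species;P63267')
print(short_id, long_id)
```
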
padua.utils.get_protein_id_list(df, level=0)

Return a complete list of shortform IDs from a DataFrame.

Extract all protein IDs from a dataframe from multiple rows containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268

Long names (containing species information) are eliminated (split on ‘ ‘) and isoforms are removed (split on ‘_’).

Parameters:
  • df (pandas.DataFrame) – DataFrame
  • level (int or str) – Level of DataFrame index to extract IDs from
Returns:

list of string ids

padua.utils.get_protein_ids(s)

Return a list of shortform protein IDs.

Extract all protein IDs from a string containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268

Long names (containing species information) are eliminated (split on ‘ ‘) and isoforms are removed (split on ‘_’).

Parameters:
  • s (str or unicode) – protein IDs in MaxQuant format
Returns:

list of string ids

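
A minimal sketch of the list variant: split on semicolons, then shorten every entry in the same way as for a single ID. The helper name all_protein_ids is hypothetical and the code only mirrors the description above.

```python
def all_protein_ids(s):
    """Hypothetical helper: return every ID in s in shortened form."""
    return [part.split(' ')[0].split('_')[0] for part in str(s).split(';')]


ids = all_protein_ids('P07830;P63267;Q54A44;P63268')
print(ids)
```
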
padua.utils.get_shortstr(s)

Return the first part of a string before a semicolon.

Extract the first, highest-ranked protein ID from a string containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268

Parameters:
  • s (str or unicode) – protein IDs in MaxQuant format
Returns:

string

padua.utils.hierarchical_match(d, k, default=None)

Match a key against a dict, simplifying the key one element at a time.

Parameters:
  • d (dict) – Dictionary to match against
  • k – Key to match
  • default – Value returned if no match is found
Returns:

hierarchically matched value or default
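
One way to read "simplifying one element at a time" is sketched below: try the full key first, then repeatedly drop the last element until a match is found. Dropping from the end and unwrapping single-element keys are assumptions made for this sketch; the helper name hierarchical_lookup is hypothetical.

```python
def hierarchical_lookup(d, k, default=None):
    """Hypothetical helper: look up k in d, shortening k from the right."""
    if d is None:
        return default
    if not isinstance(k, (list, tuple)):
        k = [k]
    for n in range(len(k)):
        key = tuple(k[:len(k) - n])
        if len(key) == 1:
            key = key[0]  # a single remaining element is matched as a plain key
        if key in d:
            return d[key]
    return default


# Made-up mapping with one specific and one general entry.
colors = {('A', 'x'): 'red', 'A': 'blue'}
specific = hierarchical_lookup(colors, ('A', 'x', '1'))
general = hierarchical_lookup(colors, ('A', 'y'))
missing = hierarchical_lookup(colors, ('B',), default='grey')
```

This pattern lets a mapping define general defaults for a group while overriding them for more specific keys.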

padua.utils.qvalues(pv, m=None, verbose=False, lowmem=False, pi0=None)

Copyright (c) 2012, Nicolo Fusi, University of Sheffield. All rights reserved.

Estimates q-values from p-values

For most GWAS, estimating pi0 is not necessary, since pi0 is extremely likely to be 1.

Parameters:
  • pv – p-values
  • m – Number of tests. If not specified, m = pv.size
  • verbose – Print verbose messages? (default False)
  • lowmem – Use memory-efficient in-place algorithm
  • pi0 – If None, it is estimated as suggested in Storey and Tibshirani, 2003
Returns:
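
To show what a q-value computation looks like in the simplest case, the sketch below fixes pi0 at a given value (1 by default), which with pi0 = 1 reduces to the Benjamini-Hochberg adjusted p-value. It does not estimate pi0 and does not reproduce the lowmem or verbose options; the function name qvalues_sketch is hypothetical.

```python
def qvalues_sketch(pvals, pi0=1.0):
    """Hypothetical helper: q-values for a list of p-values with fixed pi0."""
    m = len(pvals)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pi0 * pvals[i] * m / rank)
        q[i] = prev
    return q


q = qvalues_sketch([0.01, 0.04, 0.03, 0.5])
```

The results are returned in the same order as the input p-values.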