Utils
-
padua.utils.build_combined_label(sl, idxs, sep=' ')
Generate a combined label from a list of indexes into sl, by joining them with sep (str).
Parameters: - sl (dict of str) – Strings to combine
- idxs (list of sl keys) – Indexes into sl
- sep (str) – Separator placed between the joined strings
Returns: str of combined label
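The joining behaviour described above can be sketched in a few lines. This is a minimal illustration and not padua's implementation: the function name combined_label_sketch and the example labels are hypothetical, and the real function may post-process each string before joining.

```python
def combined_label_sketch(sl, idxs, sep=' '):
    """Join the strings found at each key in idxs, in order, using sep."""
    # Hypothetical stand-in for build_combined_label; str() guards against
    # non-string values stored in sl.
    return sep.join(str(sl[i]) for i in idxs)


labels = {0: 'Control', 1: 'Treated', 2: '24h'}  # made-up example data
print(combined_label_sketch(labels, [1, 2]))           # Treated 24h
print(combined_label_sketch(labels, [0, 2], sep='_'))  # Control_24h
```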
-
padua.utils.calculate_s0_curve(s0, minpval, maxpval, minratio, maxratio, curve_interval=0.1)
Calculate s0 curve for volcano plot.
Taking a minimum and maximum p value, and a minimum and maximum ratio, calculate a smooth curve starting from parameter s0 in each direction.
The curve_interval parameter defines the smoothness of the resulting curve.
Parameters: - s0 – float offset of curve from intersect
- minpval – float minimum p value
- maxpval – float maximum p value
- minratio – float minimum ratio
- maxratio – float maximum ratio
- curve_interval – float stepsize (smoothness) of curve generator
Returns: x, y, fn – the x, y points of the curve, and fn, the function used to generate them
-
padua.utils.chunks(seq, num)
Separate seq (np.array) into num series of as-near-as-possible equal length values.
Parameters: - seq (np.array) – Sequence to split
- num (int) – Number of parts to split sequence into
Returns: np.array of split parts
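As a rough illustration of near-equal splitting, the sketch below divides a plain Python list so that part lengths differ by at most one. The name chunks_sketch is hypothetical and this is not padua's code; the library may place the longer parts differently and returns an np.array rather than a list.

```python
def chunks_sketch(seq, num):
    """Split seq into num consecutive parts whose lengths differ by at most one."""
    size, extra = divmod(len(seq), num)
    parts = []
    start = 0
    for i in range(num):
        # The first `extra` parts each take one additional element.
        stop = start + size + (1 if i < extra else 0)
        parts.append(seq[start:stop])
        start = stop
    return parts


print(chunks_sketch(list(range(10)), 3))
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

For NumPy arrays, numpy.array_split(seq, num) gives a comparable near-equal division.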
-
padua.utils.get_protein_id(s)
Return a shortened string, split on spaces, underscores and semicolons.
Extract the first, highest-ranked protein ID from a string containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268
Long names (containing species information) are eliminated (split on ‘ ‘) and isoforms are removed (split on ‘_’).
Parameters: s (str or unicode) – protein IDs in MaxQuant format
Returns: string
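The three splits described above can be sketched as follows. This is an illustrative stand-in, not the library source: first_protein_id_sketch is a hypothetical name, and the second input string is made up to exercise the space and underscore handling.

```python
def first_protein_id_sketch(s):
    """Keep the first ';'-separated entry, then cut at the first space and underscore."""
    first = str(s).split(';')[0]              # highest-ranked entry
    return first.split(' ')[0].split('_')[0]  # drop long name, then isoform suffix


print(first_protein_id_sketch('P07830;P63267;Q54A44;P63268'))   # P07830
print(first_protein_id_sketch('P07830_2 example name;P63267'))  # P07830
```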
-
padua.utils.get_protein_id_list(df, level=0)
Return a complete list of shortform IDs from a DataFrame.
Extract all protein IDs from a dataframe from multiple rows containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268
Long names (containing species information) are eliminated (split on ‘ ‘) and isoforms are removed (split on ‘_’).
Parameters: - df (pandas.DataFrame) – DataFrame
- level (int or str) – Level of DataFrame index to extract IDs from
Returns: list of string ids
-
padua.utils.get_protein_ids(s)
Return a list of shortform protein IDs.
Extract all protein IDs from a string containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268
Long names (containing species information) are eliminated (split on ‘ ‘) and isoforms are removed (split on ‘_’).
Parameters: s (str or unicode) – protein IDs in MaxQuant format
Returns: list of string ids
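A minimal sketch of the list form, assuming each semicolon-separated entry is shortened independently by the same space and underscore splits. The name protein_ids_sketch is hypothetical and the code is not taken from padua.

```python
def protein_ids_sketch(s):
    """Return one shortened ID per ';'-separated entry, in the original order."""
    return [entry.split(' ')[0].split('_')[0] for entry in str(s).split(';')]


print(protein_ids_sketch('P07830;P63267;Q54A44;P63268'))
# ['P07830', 'P63267', 'Q54A44', 'P63268']
```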
-
padua.utils.get_shortstr(s)
Return the first part of a string before a semicolon.
Extract the first, highest-ranked protein ID from a string containing protein IDs in MaxQuant output format: e.g. P07830;P63267;Q54A44;P63268
Parameters: s (str or unicode) – protein IDs in MaxQuant format
Returns: string
-
padua.utils.hierarchical_match(d, k, default=None)
Match a key against a dict, simplifying the key one element at a time.
Parameters: - d (dict) – Dict to match against
- k – Key to match
- default – Value returned if no match is found
Returns: hierarchically matched value or default
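One plausible reading of "simplifying one element at a time" is that a multi-part key is tried in full and then with trailing elements dropped until a match is found. The sketch below implements that reading only; it is an assumption about the behaviour, hierarchical_match_sketch is a hypothetical name, and the colors dict is made-up data.

```python
def hierarchical_match_sketch(d, k, default=None):
    """Try the full key, then progressively shorter prefixes of it."""
    if d is None:
        return default
    parts = list(k) if isinstance(k, (list, tuple)) else [k]
    for n in range(len(parts), 0, -1):
        # Assumption: a one-element prefix is looked up as a bare value,
        # longer prefixes as tuples.
        key = tuple(parts[:n]) if n > 1 else parts[0]
        if key in d:
            return d[key]
    return default


colors = {('A', 'x'): 'red', 'A': 'blue'}
print(hierarchical_match_sketch(colors, ('A', 'x', '1')))        # red
print(hierarchical_match_sketch(colors, ('A', 'y')))             # blue
print(hierarchical_match_sketch(colors, 'B', default='grey'))    # grey
```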
-
padua.utils.qvalues(pv, m=None, verbose=False, lowmem=False, pi0=None)
Copyright (c) 2012, Nicolo Fusi, University of Sheffield. All rights reserved.
Estimates q-values from p-values.
Parameters: - pv – p-values
- m – number of tests. If not specified, m = pv.size
- verbose – print verbose messages? (default False)
- lowmem – use memory-efficient in-place algorithm
- pi0 – if None, it’s estimated as suggested in Storey and Tibshirani, 2003. For most GWAS this is not necessary, since pi0 is extremely likely to be 1
Returns:
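To make the p-value to q-value step concrete, here is a minimal pure-Python sketch that takes pi0 as given instead of estimating it. It is not the function documented above: qvalues_sketch is a hypothetical name, the m, verbose and lowmem options are omitted, and the input values are made up. With pi0 = 1 the result coincides with Benjamini-Hochberg adjusted p-values.

```python
def qvalues_sketch(pv, pi0=1.0):
    """Storey-style q-values for a list of p-values, with pi0 supplied by the caller."""
    m = len(pv)
    order = sorted(range(m), key=lambda i: pv[i])  # indices by ascending p-value
    qv = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # pi0 * m * p / rank over its own rank and every higher rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pv[i] / rank)
        qv[i] = running_min
    return qv


q = qvalues_sketch([0.01, 0.04, 0.03, 0.20])
print([round(x, 4) for x in q])
# [0.04, 0.0533, 0.0533, 0.2]
```

The backward pass keeps the q-values monotone in the p-values, so a larger p-value never receives a smaller q-value.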