

FAQ Section


This section explains the terminology associated with PathwayWorld and network analysis in biology.


What do the nodes/entities and edges represent in a pathway/network?

Nodes: The nodes in a pathway/network mostly represent proteins, enzymes, families of proteins/enzymes, biological processes, functions, complexes or small molecules.
Edges: The edges in a pathway/network represent the molecular interaction between two molecules. The type of interaction can be binding, expression, metabolism, promoter binding, protein modification, regulation or transport.

Note: Edge length has no meaning in a PathwayWorld network.

Why are the edges directed?

The arrow on an edge represents a directed interaction. For example, if molecule A regulates molecule B, the arrow points from A to B. A bidirectional arrow shows a binding relationship or a bidirectional regulation.
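
The node and edge description above can be sketched as a small data structure. This is only an illustration, not PathwayWorld's or GeneSpring®'s internal representation: the class names, the `bidirectional` flag and the `targets_of` helper are hypothetical, and the interaction types are the ones listed in this FAQ.

```python
from dataclasses import dataclass, field

# Interaction types listed in this FAQ for edges.
INTERACTION_TYPES = {
    "binding", "expression", "metabolism", "promoter binding",
    "protein modification", "regulation", "transport",
}


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    interaction: str
    bidirectional: bool = False  # e.g. binding or mutual regulation


@dataclass
class Network:
    edges: list = field(default_factory=list)

    def add_edge(self, source, target, interaction, bidirectional=False):
        if interaction not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction type: {interaction}")
        self.edges.append(Edge(source, target, interaction, bidirectional))

    def targets_of(self, node):
        """Nodes reachable from `node` in one step, honouring edge direction."""
        reachable = set()
        for edge in self.edges:
            if edge.source == node:
                reachable.add(edge.target)
            elif edge.bidirectional and edge.target == node:
                reachable.add(edge.source)
        return reachable


net = Network()
net.add_edge("A", "B", "regulation")                    # A -> B
net.add_edge("B", "C", "binding", bidirectional=True)   # B <-> C
```

Here `net.targets_of("A")` contains only B, while C and B reach each other because the binding edge is bidirectional. No edge length is stored, in line with the note that edge length carries no meaning.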

What is the source of the molecular interaction database used in PathwayWorld?

The information on molecular interactions is available in Strand's proprietary interaction database. This database is built from the following sources:
i) Physical interaction data from the IntAct database.
ii) Data derived by text mining from published abstracts in PubMed.

Where is the information on molecular interactions available?

Strand's proprietary molecular interaction database is part of the GeneSpring® microarray data analysis tool. A trial version of GeneSpring® is available for download.

Where are the microarray data sets analysed in PathwayWorld sourced from?

The data sets in PathwayWorld are sourced as follows:
i) Data sets already analyzed are taken from NCBI's Gene Expression Omnibus (GEO).
ii) Current data sets cover three organisms (Human, Mouse and Rat) and two microarray platforms (Agilent and Affymetrix).
iii) Forthcoming data sets: the analysis will be extended to all publicly available data sets from different microarray data repositories and to all organisms.

How was the microarray data analyzed to create significant gene lists?

Analysis of the microarray data was performed using GeneSpring®, which is built on Strand's proprietary AVADIS™ platform. The algorithms used are those available in the GeneSpring® software.

The analysis pipeline:
i) Analysis was done using an automated pipeline.
ii) Samples were grouped using the meta-data associated with each GEO data set.
iii) Data normalization was done using different algorithms present in GeneSpring®.
iv) A quality check was performed on the arrays.
v) Significance analysis was done between any two compared groups using an unpaired t-test to obtain a list of significantly differentially expressed genes.

Note: The fold change given for each network is on the linear scale.
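
As a rough illustration of the significance step and the linear-scale fold change, the sketch below computes a two-sample t statistic and a fold change for one gene. It is not the GeneSpring® implementation. It assumes the equal-variance (Student's) form of the unpaired t-test and log2-scale input intensities, neither of which is stated in this FAQ, and the function names are hypothetical. Converting the statistic to a p value needs the t-distribution, which is left out here.

```python
import math
from statistics import mean, variance


def unpaired_t_statistic(group_a, group_b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two arrays")
    df = n_a + n_b - 2
    # Pooled variance: assumes both groups share the same variance.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


def linear_fold_change(group_a, group_b):
    """Fold change of A over B on the linear scale, from log2 intensities."""
    return 2 ** (mean(group_a) - mean(group_b))


# Hypothetical log2 intensities of one gene in two groups of three arrays.
treated = [8.0, 8.5, 7.5]
control = [6.0, 6.5, 5.5]
t_stat, dof = unpaired_t_statistic(treated, control)
fold = linear_fold_change(treated, control)  # 4.0: a log2 difference of 2
```

A log2 difference of 2 between the group means corresponds to a four-fold change on the linear scale, which is the kind of value the note above refers to.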

What are the QC criteria?

A common concern with public microarray datasets is that the quality of the data cannot be assumed for every experiment. To address this, we apply a common QC criterion for including an array within an experiment: each array in an experimental group must have a correlation coefficient of at least 0.8, in its expression data, with every other array in the same group. An array that fails this criterion is discarded from further analysis, and the groups are re-formed without the discarded arrays. Normalization is performed once all such arrays have been discarded from all groups. The resulting data are then subjected to significance analysis.
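
The correlation check can be sketched as follows. This is one possible reading of the criterion, not the pipeline's actual code. The FAQ does not say which correlation coefficient is used or how a single outlier is handled, so the sketch assumes Pearson correlation and drops the worst-offending array one at a time until every remaining pair reaches 0.8. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def filter_group(group, threshold=0.8):
    """Return the array names kept after discarding poorly correlated arrays.

    `group` maps an array name to its expression values. Assumption: the
    array with the most failing pairs is removed first, then the check
    is repeated on the remaining arrays.
    """
    kept = dict(group)
    while len(kept) > 1:
        failures = {
            name: sum(
                1 for other, other_values in kept.items()
                if other != name and pearson(values, other_values) < threshold
            )
            for name, values in kept.items()
        }
        worst = max(failures, key=failures.get)
        if failures[worst] == 0:
            break  # every remaining pair meets the threshold
        del kept[worst]
    return list(kept)


# Hypothetical group: a1 and a2 agree, a3 does not.
group = {
    "a1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "a3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = filter_group(group)
```

Removing arrays one at a time avoids the situation where a single bad array causes every other array in the group to fail the pairwise check.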

What are the selection criteria for the significant gene list?

Significant genes were selected with a p-value cut-off of 0.1. The resulting list was then sorted in descending order of fold change, and the top 100 genes were selected for pathway/network analysis. If the list contained fewer than 100 genes, the entire list was used for pathway/network analysis.
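
The selection rule amounts to a filter, a sort and a truncation. A minimal sketch, assuming each gene is available as a (name, p value, fold change) tuple; the function name and the tuple layout are hypothetical, and the cut-off is applied as "less than 0.1".

```python
def select_significant(genes, p_cutoff=0.1, top_n=100):
    """Keep genes below the p-value cut-off, ranked by fold change.

    `genes` is an iterable of (name, p_value, fold_change) tuples.
    """
    significant = [gene for gene in genes if gene[1] < p_cutoff]
    # Descending order of fold change, as described above.
    significant.sort(key=lambda gene: gene[2], reverse=True)
    # Slicing returns the whole list when it has fewer than top_n entries.
    return significant[:top_n]


# Hypothetical input: g2 has the largest fold change but fails the cut-off.
genes = [("g1", 0.05, 2.0), ("g2", 0.20, 9.0), ("g3", 0.01, 5.0)]
selected = select_significant(genes)
```

With this input, `selected` holds g3 followed by g1, and g2 is excluded despite its high fold change.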

Why do pathways/networks contain fewer nodes than the significant gene list?

Pathways/networks contain only the nodes that have a known molecular relationship. A few nodes are added to the network to create a continuous graph using the 'shortest connect' algorithm in GeneSpring® [Ref:GS:Manual].
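
The details of the 'shortest connect' algorithm are in the GeneSpring® manual and are not reproduced here. Purely to illustrate the idea of adding connector nodes, the sketch below uses a plain breadth-first search over an interaction map to find nodes lying on a shortest path between two genes from the significant list. It is a generic technique swapped in for illustration, not the GeneSpring® algorithm, and all names are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a shortest node path or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def connector_nodes(adjacency, seeds):
    """Nodes outside `seeds` that lie on a shortest path between two seeds."""
    seeds = list(seeds)
    extra = set()
    for index, first in enumerate(seeds):
        for second in seeds[index + 1:]:
            path = shortest_path(adjacency, first, second)
            if path:
                extra.update(n for n in path if n not in seeds)
    return extra


# Hypothetical interaction map: A and B interact only through X; C is isolated.
adjacency = {"A": ["X"], "X": ["A", "B"], "B": ["X"], "C": []}
added = connector_nodes(adjacency, ["A", "B", "C"])
```

In this toy example X is added to connect A and B, while C has no known relationship and would be left out of the network, which is why a network can end up with fewer nodes than the gene list.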

How close are these networks to those published by the original authors of the data set?

The results of microarray data analysis vary depending on the choice of algorithms, quality control and filter criteria.
We have analyzed the data using an automated pipeline that applies standard values for these settings:
i) Normalization algorithm: any one or a combination of RMA, MAS5, baseline transformation, median shift and lowess
ii) Significance analysis: unpaired t-test
iii) QC: correlation between the samples within a group should be at least 0.8
iv) P value: should be less than 0.1 for a gene to be considered significant






DISCLAIMER:
The interactions, networks, list of genes/molecules or any other information presented here is HYPOTHETICAL in nature and is based on information collected from public as well as other sources, and is analyzed by certain automated mining and analysis software workflows. Strand Life Sciences Pvt. Ltd. or the editors of PathwayWorld offer this information with NO CLAIMS of accuracy and WARRANTIES of any kind, and DO NOT accept liability or legal responsibility for losses allegedly caused by using this information.

GeneSpring® is a registered trademark of Agilent Technologies.
GeneSpring® is developed on AVADIS™ from Strand Life Sciences
