Bioinformatics and biostatistics for the analysis of omics data and biological big data
Drawing on 20 years of experience, Acobiom has acquired unique expertise in the bioinformatics analysis of gene expression and in the identification of RNA biomarkers specific to the pathology or biological question under study. The company has developed methods and tools for the analysis of sequencing data using proprietary bioinformatics programs and databases.
Overview of the provided Bioinformatics and Biostatistics analyses and the MaRS database
Bioinformatics for sequencing data analysis
Faced with the increasing demand for statistical methods and bioinformatics tools to analyze and manage huge amounts of omics (sequencing) data, Acobiom adapts its tools and parameters to the specifications of its partners and their specific biological questions.
Acobiom’s team offers customized pipelines and algorithms for gene expression studies and sequencing data analysis.
The main steps of a gene expression analysis pipeline are as follows: a) data quality control, b) mapping to the reference genome, c) counting of the mapped sequences, d) normalization and comparison, e) sequence annotation, f) overview of expressed genes, including statistical analysis, visualization and validation.
For each of the steps, Acobiom selects and adapts the tools to answer the biological question as precisely as possible.
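As an illustration of the normalization step (d) listed above, the sketch below scales raw read counts to counts per million (CPM) so that samples sequenced at different depths become comparable. It is a minimal example under stated assumptions, not Acobiom's proprietary method: CPM is only one of several common normalization schemes, and the gene names, counts and function names are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_cpm(counts, pseudocount=1.0):
    """Log-transform CPM values; the pseudocount avoids log2(0)."""
    return {
        gene: math.log2(value + pseudocount)
        for gene, value in counts_per_million(counts).items()
    }


# Hypothetical count tables: sample_b was sequenced twice as deeply as sample_a.
sample_a = {"GENE1": 500, "GENE2": 1500, "GENE3": 8000}
sample_b = {"GENE1": 1000, "GENE2": 3000, "GENE3": 16000}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
log_a = log2_cpm(sample_a)
```

After scaling, the two samples yield identical CPM values, which shows that the difference in sequencing depth has been removed before any comparison between conditions.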
Acobiom combines cutting-edge expertise with proprietary biostatistics tools, and offers consulting and training in data science, modelling and statistics.
Its generic roadmap includes several steps: i) pre-processing, ii) quality control, iii) normalization, iv) expression quantification, v) differential expression analysis, etc. It can be customized to the needs and specifications of the client or partner, for example with machine learning, multivariate statistical analysis or survival analysis.
Upstream, Acobiom can also propose the experimental design best suited to the research question, so that the data produced by the associated omics analyses (sequencing, qPCR) can be exploited to their full potential.
MaRS (Matrix of RNA-Seq): a database of 21,000 Human RNA-Seq profiles
ACOBIOM has gathered 21,000 Human gene expression profiles (RNA-Seq profiles), produced by laboratories worldwide, in a single database called MaRS.
MaRS is focused on the RNA-Seq method, which reflects the expression of genes in a specific condition. Generated by Next-Generation Sequencing (NGS), all these profiles have been analyzed in a standardized way on High Performance Computing (HPC) resources. The MaRS matrix allows direct comparison of these 21,000 RNA-Seq profiles and the associated transcriptomics data.
MaRS contains profiles of Human diseases (cancers, infections, genetic diseases…) and several organs (blood, breast, lung, liver, pancreas…). ACOBIOM uses MaRS to explore and identify new biomarkers, and also to combine these collected data with newly generated data.
In addition, MaRS makes it possible to qualify and/or identify new reference genes (housekeeping genes), which are just as important as the targeted genes/biomarkers but are too often neglected in such studies.
The MaRS database is a major internal development tool for ACOBIOM. The company also gives its partners access to this database through services and/or scientific collaborations.
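One common way to screen for candidate reference genes in a large expression matrix is to rank genes by how stable their expression is across profiles. The sketch below uses the coefficient of variation (standard deviation divided by mean) as the stability score. It is an assumption for illustration only: the source does not describe how MaRS qualifies reference genes, and the gene names, values and the `min_mean` threshold are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the standard deviation divided by the mean (lower = more stable)."""
    m = mean(values)
    if m == 0:
        return float("inf")
    return pstdev(values) / m


def rank_reference_candidates(expression, min_mean=1.0):
    """Rank genes from most to least stable across profiles.

    expression maps a gene name to its normalized expression values, one
    value per profile. Genes whose mean is below min_mean are skipped,
    because near-zero genes are not usable as references.
    """
    scored = [
        (gene, coefficient_of_variation(values))
        for gene, values in expression.items()
        if mean(values) >= min_mean
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical normalized expression values across four profiles.
expression = {
    "GENE_STABLE": [100.0, 102.0, 98.0, 101.0],
    "GENE_VARIABLE": [10.0, 200.0, 55.0, 400.0],
    "GENE_LOW": [0.0, 0.1, 0.0, 0.2],
}
ranking = rank_reference_candidates(expression)
```

Here `ranking` lists `GENE_STABLE` first and omits `GENE_LOW`, whose expression is too low to serve as a reference.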
Technical view of Bioinformatics and Biostatistics analyses
Workflows in bioinformatics processing
a) Gene expression profiling for different biological conditions (mapping on an annotated reference genome, counting, normalization, sequence annotation), b) Overview and comparisons of expressed genes (differential analyses, visualization and statistical validation).
Method used to study transcriptomes: RNA-Seq analysis (NGS sequencing of all mRNA for a given condition).
Services provided: i) Quality control and sequence cleaning, ii) Sequence mapping on the reference genome, iii) Expression analysis (differentially expressed or co-expressed genes).
Deliverables: i) Raw sequences (FASTQ), ii) Mapping results (BAM), iii) Tables of results of counting mapped sequences, iv) Comparison of read counts with annotations and statistical analyses.
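The comparison of read counts between conditions can be sketched as follows for a single gene: a log2 fold change between group means, plus a Welch t statistic as a rough measure of how well the groups separate. This is a simplified illustration under stated assumptions, not the statistical model used in the service. Dedicated RNA-Seq methods model count data differently, and the replicate values and function names here are hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of group means; the pseudocount avoids division by zero."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def welch_t(treated, control):
    """Welch's t statistic for two groups with possibly unequal variances.

    Each group needs at least two replicates. Returns NaN when both groups
    have zero variance, since the statistic is undefined in that case.
    """
    squared_se = variance(treated) / len(treated) + variance(control) / len(control)
    if squared_se == 0:
        return math.nan
    return (mean(treated) - mean(control)) / math.sqrt(squared_se)


# Hypothetical normalized counts for one gene, three replicates per condition.
treated = [30.0, 31.0, 32.0]
control = [6.0, 7.0, 8.0]

lfc = log2_fold_change(treated, control)
t_stat = welch_t(treated, control)
```

With these values the gene is four times more expressed in the treated group (log2 fold change of 2). Turning the t statistic into a p-value, and correcting for multiple testing across all genes, is left out of this sketch.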
Study of genetic variations in a condition compared to a reference genome.
Exome: part of the genome constituted by the exons (expressed part).
Method used: a) Exon analysis by high-quality NGS for variant detection (variant calling), b) Precise detection of InDels and SNPs, annotations.
Services provided: i) Quality control and sequence cleaning, ii) Sequence mapping on the reference genome, iii) Analysis of variants in relation to a reference or other condition.
Deliverables: i) Raw sequences (FASTQ), ii) Mapping results (BAM), iii) Variant tables (VCF), annotations, iv) Statistical analysis to identify variants of interest.
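To show how a delivered variant table (VCF) distinguishes SNPs from InDels, the sketch below reads the REF and ALT columns of each record and classifies every alternate allele by length. It is a minimal example, assuming plain tab-separated VCF text with the standard column order. The records are hypothetical, and a production workflow would normally rely on a dedicated VCF library.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT allele pair as SNP, insertion, deletion or other."""
    # Symbolic (<DEL>), missing (.) and overlapping-deletion (*) alleles
    # cannot be classified by length alone.
    if alt.startswith("<") or alt in (".", "*"):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "other"  # same length > 1, e.g. a multi-nucleotide substitution


def summarize_vcf(lines):
    """Count variant types over the data lines of a VCF file."""
    counts = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):  # one record may list several ALT alleles
            kind = classify_variant(ref, alt)
            counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hypothetical VCF content.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t60\tPASS\t.",
    "chr2\t300\t.\tC\tCGG,T\t45\tPASS\t.",
]
summary = summarize_vcf(vcf_lines)
```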
Metagenome Analysis and Microbial Composition study
a) Study of the microbial composition of a sample under a given condition, b) Cleaning and clustering of representative sequences into OTUs (Operational Taxonomic Units), counting of populations, c) Taxonomic analysis of samples (species detection), d) Population and diversity analysis (Alpha-, Beta-diversity).
Services provided: i) Quality control and sequence cleaning, ii) OTU clustering tables and counts, iii) Visualization files (species distribution, histograms…).
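The alpha- and beta-diversity analyses mentioned above can be illustrated from an OTU count table. The sketch below computes the Shannon index (diversity within one sample) and the Bray-Curtis dissimilarity (difference between two samples). These are two widely used measures chosen here as an assumption, since the source does not say which indices are reported, and the OTU names and counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over the OTUs of one sample."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared OTU."""
    otus = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(o, 0), sample_b.get(o, 0)) for o in otus)
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        return 0.0
    return 1 - 2 * shared / total


# Hypothetical OTU count tables for two samples.
sample_1 = {"OTU_1": 50, "OTU_2": 50}
sample_2 = {"OTU_2": 50, "OTU_3": 50}

alpha_1 = shannon_index(sample_1)
beta_12 = bray_curtis(sample_1, sample_2)
```

In this example each sample holds two equally abundant OTUs, so the Shannon index equals ln 2, and the samples share half of their reads, giving a Bray-Curtis dissimilarity of 0.5.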
De Novo Assembly of Genomes and Transcriptomes
a) De Novo Assembly: Genome or transcriptome assembly without using a reference, b) Cleaning and assembly of sequences from high throughput sequencing, c) Contig creation, d) Consensus sequence creation, e) Sequence annotations, f) Optimal strategy, specifically adapted to the type of sequencing and the organism studied.
Services provided: i) Quality control and sequence cleaning, ii) Sequence files (FASTA/FASTQ), iii) Annotation files, iv) Statistical Overview and Tools for Visualization.
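A statistical overview of a de novo assembly typically includes contiguity metrics. The sketch below reads contig lengths from FASTA text and computes N50, the contig length at which the longest contigs together cover at least half of the assembly. Choosing N50 is an assumption for illustration, since the source does not list the statistics delivered, and the sequences shown are hypothetical.

```python
def fasta_lengths(lines):
    """Return the length of each sequence found in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start a new record
        elif line and current is not None:
            current += len(line)  # sequences may span several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs")
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


# Hypothetical assembly with three short contigs.
fasta = [
    ">contig_1",
    "ACGTACGTAC",
    "GTACGT",
    ">contig_2",
    "ACGTAC",
    ">contig_3",
    "ACG",
]
lengths = fasta_lengths(fasta)
assembly_n50 = n50(lengths)
```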
Workflows in biostatistical analyses
Acobiom supports its partners in the experimental design of their studies by proposing designs optimized for their needs: i) Classic and adaptive design, ii) Power and sample size calculation, iii) Statistical Analysis Plan (SAP), iv) Writing the statistical parts of the protocol, v) CRF review, vi) Randomization.
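Two of the design tasks listed above, sample size calculation and randomization, can be sketched with the standard library. The first function uses the normal approximation for comparing two group means, n = 2 × ((z₁₋α/₂ + z₁₋β) / d)² per group, where d is the standardized effect size. The second builds a permuted-block allocation list. Both are simplified illustrations with hypothetical parameter values, not the procedures applied in a given study.

```python
import math
import random
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided, two-sample comparison of means.

    Normal approximation; effect_size is the standardized difference
    (difference in means divided by the common standard deviation).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def block_randomization(n_subjects, block=("A", "A", "B", "B"), seed=None):
    """Allocate subjects to arms in shuffled blocks to keep the arms balanced."""
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        shuffled = list(block)
        rng.shuffle(shuffled)
        allocation.extend(shuffled)
    return allocation[:n_subjects]


n_per_group = sample_size_per_group(effect_size=0.5)
allocation = block_randomization(8, seed=1)
```

For a medium effect size of 0.5, a 5% two-sided significance level and 80% power, the approximation gives 63 subjects per group. An exact calculation based on the t distribution gives a slightly larger number.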
Statistical analysis & Reporting
Statistical analysis and study report: i) Inferential analysis, descriptive analysis, multivariate analysis, machine learning, supervised/unsupervised clustering, time series, geostatistics… ii) Production of tables, lists and graphs dedicated to communication and regulated information, iii) Writing the statistical parts of the clinical study, producing statistics, iv) Statistical reporting in compliance with regulatory authorities, v) ISE and ISS (Integrated Summaries of Efficacy and Safety) writing for health authorities.
Services provided: i) Differential Expression cross-analysis, ii) Dimension reduction, iii) Classifier building, iv) Modelling, v) Features extraction and Parameter optimization…
Deliverables: i) Complete graphical summary, ready to use in any kind of presentation, ii) Analytical summary for compliance with health authorities, iii) Predictive/diagnostic tool for decision support, iv) Writing for scientific/marketing valorization, v) List of validated biomarkers…