Precision Medicine - Biomarkers & Diagnostics

Bioinformatics and biostatistics for the analysis of omics (big) data

Drawing on 20 years of experience, Acobiom has acquired extensive expertise in the bioinformatics analysis of gene expression. The company has developed proprietary bioinformatics programs and databases for analyzing sequencing data.

Overview of the provided Bioinformatics and Biostatistics analyses and the MaRS database

Bioinformatics for sequencing data analysis

To meet the increasing demand for statistical methods and bioinformatics tools to analyze and manage huge amounts of data, Acobiom adapts its tools and parameters to the specifications of its partners and their specific biological questions.

Acobiom’s team offers personalized pipelines and algorithms for gene expression and sequencing data analysis:
> Data Quality check
> Mapping on reference genome
> Comparison of read counts
> Annotation of sequences
> Overview of expressed genes, including statistical analysis, visualization and validation
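
The team's own pipeline is not shown here. The sketch below only illustrates the Data Quality check step. It parses FASTQ records and flags reads whose mean Phred quality falls below a threshold. It assumes Phred+33 encoding, and the Q20 cut-off is a hypothetical default. Dedicated tools such as FastQC produce much richer reports.

```python
from statistics import mean

def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4:
        raise ValueError("FASTQ input is not a whole number of 4-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch for {header!r}")
        # keep only the read ID, dropping any description after whitespace
        yield header[1:].split()[0], seq, qual

def phred_scores(qual, offset=33):
    """Decode an ASCII quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]

def quality_report(lines, min_mean_q=20):
    """Return per-read mean quality and the IDs of reads below min_mean_q."""
    report = {read_id: mean(phred_scores(qual))
              for read_id, _seq, qual in parse_fastq(lines)}
    failing = [rid for rid, q in report.items() if q < min_mean_q]
    return report, failing
```

With Phred+33 encoding, the character 'I' decodes to Q40 and '#' to Q2. A read of all '#' would therefore be flagged.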


Acobiom combines cutting-edge expertise with proprietary biostatistics tools and offers consulting and training in Data Science, Modelling and Statistics.

Its generic roadmap includes several steps (pre-processing, quality control, normalization, expression quantitation, differential expression analysis, etc.). It can be customized to client/partner needs and specifications: machine learning, multivariate statistical analysis, survival analysis…
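
One widely used approach for the normalization step is the median-of-ratios method introduced with the DESeq package. The plain-Python sketch below re-implements the idea for illustration. It is neither the package's code nor Acobiom's pipeline, and the dictionary input layout is an assumption.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts maps gene -> list of raw counts (same sample order for every gene).
    Genes with a zero in any sample are left out of the reference because
    their geometric mean would be zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if any(v == 0 for v in values):
            continue
        # log of the gene's geometric mean across samples
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, v in enumerate(values):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    # the back-transformed median log-ratio of each sample is its size factor
    return [math.exp(median(r)) for r in log_ratios]

def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return {gene: [v / f for v, f in zip(values, factors)]
            for gene, values in counts.items()}
```

Using the median rather than the total count makes the factor robust to a few very highly expressed or strongly changing genes.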

Acobiom can provide the experimental design for research projects, so that clients can exploit the full potential of data from omics (sequencing, qPCR) analyses.

MaRS (Matrix of RNA-Seq): a database of 22,000 Human RNA-Seq profiles

ACOBIOM has collected Human RNA-Seq profiles produced by laboratories worldwide, which allows all of these NGS transcriptomic data to be compared. MaRS focuses on the RNA-Seq method, which reflects gene expression under a specific condition. MaRS contains ~22,000 Human RNA-Seq profiles generated by Next Generation Sequencing.

ACOBIOM uses MaRS to explore new targets or pathways in a large amount of available data and to combine them with newly generated data. MaRS also makes it possible to qualify and/or identify new reference genes (housekeeping genes). These are just as important as the targeted genes/biomarkers, but they are all too often neglected in such studies.
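
The selection criteria used with MaRS are not described here. As a hypothetical illustration of qualifying reference genes from many profiles, the sketch below ranks genes by the coefficient of variation (standard deviation divided by mean) of their normalized expression across profiles. The input layout and the min_mean cut-off are assumptions. Published stability measures such as geNorm or NormFinder are more elaborate.

```python
from statistics import mean, stdev

def rank_reference_candidates(expression, min_mean=1.0):
    """Rank genes by expression stability across many profiles.

    expression: dict mapping gene name -> list of normalized expression
    values, one per RNA-Seq profile. Genes with a low mean are skipped
    because their variability is dominated by sampling noise.
    Returns (gene, coefficient_of_variation) pairs, most stable first.
    """
    scores = []
    for gene, values in expression.items():
        if len(values) < 2:
            continue  # stdev needs at least two profiles
        m = mean(values)
        if m < min_mean:
            continue
        scores.append((gene, stdev(values) / m))
    return sorted(scores, key=lambda pair: pair[1])
```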

The MaRS database is used for in-house development, but ACOBIOM opens access to it for partners through services and/or scientific collaborations.

Technical view of Bioinformatics and Biostatistics analyses

Tasks followed in Bioinformatics treatments

De Novo Assembly of Genomes and Transcriptomes

De Novo Assembly of Genomes
> Optimal assembly strategy specifically adapted to sequencing outputs and expected genome size.

Deliverables for De Novo Assembly of Genomes:
> FASTQ files and FASTA files (for contigs)
> Statistical overview
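
An assembly's statistical overview typically reports the number of contigs, the total length and the N50. The minimal sketch below computes these from FASTA lines already held in memory. The function names are illustrative.

```python
def read_fasta(lines):
    """Return {sequence_id: sequence} from FASTA-formatted lines."""
    records, current_id, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # the ID is the first word of the header line
            current_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records

def n50(lengths):
    """Largest length L such that contigs of length >= L cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

def assembly_stats(contigs):
    lengths = [len(seq) for seq in contigs.values()]
    return {
        "n_contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

For contigs of 100, 80, 50, 40 and 30 bp (300 bp in total), the two longest contigs already cover 180 bp. That is at least half of the assembly, so N50 is 80.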

De Novo Assembly of Transcriptomes

Deliverables for De Novo Assembly of Transcriptomes:
> FASTQ files and FASTA files
> Statistical overview

Mapping & Variant Analysis

> Accurate study of all types of mutations
> Detection of all SNPs and InDels
> Exact mapping of each read; differences between the reads and the reference sequence are detected
> Accurate SNP and InDel calls
> Results are reported in the standard VCF (Variant Call Format)
> Human-readable mutation tables (tab-delimited)
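
The minimal sketch below turns VCF records into a human-readable tab-delimited table. It labels each REF/ALT pair as an SNP or an InDel by comparing allele lengths. The column selection is an assumption, and symbolic alleles such as <DEL> are not handled.

```python
import csv
import io

def classify_variant(ref, alt):
    """Label a REF/ALT pair by allele length (symbolic alleles not handled)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"
    return "MNP/other"

def vcf_to_table(vcf_lines):
    """Convert VCF data lines into a tab-delimited table of selected columns."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "REF", "ALT", "TYPE", "QUAL"])
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, qual = fields[:6]
        for alt in alts.split(","):  # multi-allelic site: one row per ALT allele
            writer.writerow([chrom, pos, ref, alt, classify_variant(ref, alt), qual])
    return out.getvalue()
```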

Services provided:
– Mapping of re-sequenced data to a reference genome for whole genome sequencing or amplicon sequencing projects
– SNP and InDel detection based on the mapping results

Deliverables:
– FASTQ files
– BAM files (Binary Alignment/Map) with their BAI index files, for further analysis and visualization of mapping results
– Mapping report: number of reads and bases mapped, reference coverage, etc.
– Annotated variants in VCF format
– Mapping and variant calling results formatted for easy viewing in the Integrative Genomics Viewer (IGV)
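
The read and base counts in a mapping report can be derived from the SAM FLAG and CIGAR fields. The sketch below works on SAM text lines and follows these conventions:
- Each read is counted once, so secondary (0x100) and supplementary (0x800) alignments are skipped.
- FLAG bit 0x4 marks a read as unmapped.
- M, = and X CIGAR operations are summed as aligned bases.

In production, samtools flagstat reports comparable figures directly from BAM files.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def aligned_bases(cigar):
    """Count read bases aligned to the reference (M, = and X operations)."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")

def mapping_report(sam_lines):
    total = mapped = bases = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & (0x100 | 0x800):
            continue  # skip secondary/supplementary so each read counts once
        total += 1
        if not flag & 0x4:  # 0x4 = segment unmapped
            mapped += 1
            bases += aligned_bases(cigar)
    return {"reads": total, "mapped": mapped,
            "mapped_pct": 100.0 * mapped / total if total else 0.0,
            "aligned_bases": bases}
```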

Exome Analysis

> Easy-to-interpret genetic variation reports
> High-quality variant detection from exome sequencing data
> Accurate, annotated SNP and InDel detection reports in VCF (Variant Call Format)
> Bioinformatic pipeline: quality control (FastQC), SNP and InDel detection (VCF format), annotation of variants and comparison to known SNP databases (dbSNP, DijonSNP), target coverage statistics

Deliverables:
– FASTQ files
– Mapping files (BAM, BAI)
– Variant files with annotation (VCF)
– Targeted enrichment statistics (tab-delimited table)
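
Target coverage statistics are commonly summarized as the mean depth and the share of targeted bases that reach given depths. The sketch below assumes per-base depths over the target regions have already been extracted, for example with samtools depth. The 1x/10x/20x/30x thresholds are illustrative defaults.

```python
def target_coverage_stats(depths, thresholds=(1, 10, 20, 30)):
    """Summarize coverage over targeted (exome capture) bases.

    depths: per-base read depths, one value per targeted base.
    Returns the mean depth and the percentage of target bases
    at or above each depth threshold.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no target bases supplied")
    stats = {"mean_depth": sum(depths) / n}
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        stats[f"pct_ge_{t}x"] = 100.0 * covered / n
    return stats
```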

Transcriptome Analysis

> Gene expression profiling (Mapping on an annotated reference genome, Comparison of read counts, annotation of sequences)
> Overview of expressed genes, including statistical analysis, visualization and validation

Services provided:
– Mapping mRNA-Seq
– Expression analysis

Deliverables:
> Mapping results
> Visualization software tools
> Read count results (tab-delimited)
> Comparison of read counts
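
For a quick comparison of read counts between two samples, counts can be scaled to counts per million (CPM) and compared as log2 fold changes. The sketch below assumes one dictionary of raw counts per sample and a pseudocount of 1. It does no statistical testing; with replicates, that is the job of packages such as edgeR or DESeq2.

```python
import math

def cpm(counts):
    """Counts-per-million scaling for one sample (dict gene -> raw count)."""
    library_size = sum(counts.values())
    return {g: c * 1e6 / library_size for g, c in counts.items()}

def compare_read_counts(sample_a, sample_b, pseudocount=1.0):
    """Log2 fold change of CPM values, sample B relative to sample A.

    The pseudocount avoids division by zero (and log of zero) for genes
    absent from one of the samples.
    """
    a, b = cpm(sample_a), cpm(sample_b)
    genes = sorted(set(a) | set(b))
    return {g: math.log2((b.get(g, 0.0) + pseudocount) / (a.get(g, 0.0) + pseudocount))
            for g in genes}
```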

Metagenome Analysis and Microbial Composition study

> Study of the microbial composition of the samples

Bioinformatic pipeline:
> Quality clipping of reads, noise removal and sorting
> Representative sequences for OTUs (Operational Taxonomic Units)
> Taxonomic assignment and read abundance estimation for OTUs at species level

Deliverables:
> Detailed report (PDF)
> Result tables (Excel)
> HTML files for visual exploration
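
Read abundance at species level amounts to summing the reads of the OTUs assigned to the same species and dividing by the total number of reads. The sketch below assumes OTU read counts and OTU-to-species assignments are already available as dictionaries. Pooling unassigned OTUs under one label is an illustrative choice.

```python
from collections import defaultdict

def species_abundance(otu_counts, otu_taxonomy):
    """Aggregate OTU read counts to species level as relative abundances.

    otu_counts: {otu_id: number of reads assigned to the OTU}
    otu_taxonomy: {otu_id: species name}; OTUs without an assignment
    are pooled under "Unassigned".
    Returns {species: fraction of reads}, most abundant first.
    """
    totals = defaultdict(int)
    for otu, reads in otu_counts.items():
        totals[otu_taxonomy.get(otu, "Unassigned")] += reads
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {species: n / grand_total for species, n in ranked}
```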

Tasks followed in Biostatistics analyses


Design of the study
> Classic and adaptive designs
> Calculation of statistical power and sample size
> Writing of the statistical sections of the protocol
> Case report form (CRF) review
> Randomization
> Statistical Analysis Plan (SAP)
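
For power and sample size calculation, a standard normal-approximation formula for comparing two means with equal group sizes is n per group = 2(z_{1-α/2} + z_{1-β})² σ² / δ². Here δ is the smallest difference worth detecting and σ the common standard deviation. The sketch below implements that formula only. Real protocols may call for t-based, adaptive or endpoint-specific methods.

```python
import math
from statistics import NormalDist

def sample_size_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-sample comparison of means.

    Normal approximation, equal allocation and equal variances assumed.
    delta: smallest difference worth detecting; sigma: common SD.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up to whole individuals
```

For a difference of half a standard deviation (delta=0.5, sigma=1.0) at 5% two-sided alpha and 80% power, this gives 63 individuals per group.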

Statistical analysis and reporting of the study
> Interim analysis
> Final analysis
> Production of tables, listings and graphs
> Writing of the statistical sections of clinical study reports
> Statistical reporting
> Writing of the Integrated Summary of Efficacy (ISE) and the Integrated Summary of Safety (ISS)


Services provided:
> Differential expression analysis using a process adapted from two R packages, “edgeR” and “DESeq”
> Addition of two more R packages, “DEXSeq” and “DESeq2”, to substantially increase the resolution of the results

> Results displayed in easily interpretable plots, such as heatmaps or other plots suitable for presentation and publication
> Optimization of the experiment with an omics score that can be combined with other variables, e.g., in medicine, clinical variables such as pain or tumor localization
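
How the omics score is built is not specified here. The sketch below is purely hypothetical and works in two steps:
1. The score is a weighted sum of gene expression z-scores, each computed relative to a reference cohort.
2. The score is combined with clinical variables such as pain in a logistic model.

Gene names, weights and coefficients are placeholders, not fitted values.

```python
import math

def omics_score(expression, reference, weights):
    """Weighted sum of standardized expression values (hypothetical score).

    expression: {gene: patient's normalized expression}
    reference: {gene: (mean, sd)} from a reference cohort
    weights: {gene: weight}, e.g. coefficients from a previously fitted model
    """
    score = 0.0
    for gene, w in weights.items():
        mu, sd = reference[gene]
        score += w * (expression[gene] - mu) / sd  # z-score times weight
    return score

def combined_risk(omics, clinical, coefs, intercept):
    """Logistic model combining the omics score with clinical covariates."""
    linear = intercept + coefs["omics"] * omics
    linear += sum(coefs[name] * value for name, value in clinical.items())
    return 1.0 / (1.0 + math.exp(-linear))
```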