Welcome to the Institute of Genomics
The CNG (Centre National de Génotypage) is the French national research center that addresses scientific questions requiring high-throughput sequencing and genotyping by developing and deploying innovative integrated technologies. Its organization optimizes genetic and genomic research on human diseases by creating essential links between cohort building (DNA samples), identification of the responsible genes, and study of the transcriptome and epigenome.
List of the main projects of the Institut de Génomique
Well known to students as a model unicellular organism, Paramecium (Paramecium tetraurelia) is a very large eukaryotic cell (120 micrometers) covered with beating cilia. It belongs to the Ciliate phylum (Ciliophora). Within the Alveolate clade, the Ciliates are related to the unicellular parasites called apicomplexans, which include Plasmodium falciparum, the main causative agent of malaria. Because Paramecium is both unicellular and complex, it is an excellent model for the genetic study of the many differentiated functions of multicellular organisms that are absent from simpler eukaryotes such as yeast.
Ciliates have the remarkable property of maintaining separate germline and somatic lineages within a single cytoplasm: each cell has two nuclei. A germline nucleus (the micronucleus) transmits genetic information through sexual processes, whereas a somatic nucleus (the macronucleus) expresses this information. At each sexual generation, a new somatic nucleus is produced by programmed rearrangements of the whole genome contained in the germline nucleus.
More than 50 years of classical genetics have produced almost 200 Mendelian mutations in Paramecium, affecting very diverse cellular processes (e.g. morphogenesis, regulated secretion, the cell cycle, antigenic variation, sex determination and expression of mating type, and genome rearrangements). Paramecium is well suited to genetic analysis because it has two modes of sexual reproduction, autogamy and conjugation. Autogamy is a process of self-fertilization that makes the genome of the zygote completely homozygous in a single generation, so stock strains are as easy to manipulate as haploid organisms. Conjugation is a process of reciprocal fertilization that produces identical zygotic nuclei in the two partners; this makes it very easy to identify traits with Mendelian inheritance and to distinguish them from maternally inherited traits. Genes identified by mutation can be cloned by functional complementation.
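The genetic consequences of the two sexual processes can be illustrated with a small simulation. This is a minimal sketch, not a model of the actual nuclear events: the genotype representation and the function names (`autogamy`, `conjugation`, `is_homozygous`) are hypothetical, and each locus is assumed to assort independently (unlinked loci). Autogamy keeps one allele per locus and duplicates it; conjugation gives both partners the same pair of gamete genotypes, because each partner keeps one haploid nucleus and sends an identical sister copy to the other.

```python
import random

# Hypothetical representation: a genotype maps each locus name to its two alleles.
Genotype = dict[str, tuple[str, str]]


def autogamy(genotype: Genotype, rng: random.Random) -> Genotype:
    """Self-fertilization: a single haploid meiotic product is duplicated and the
    two identical copies fuse, so every locus ends up homozygous."""
    zygote = {}
    for locus, (allele1, allele2) in genotype.items():
        kept = rng.choice((allele1, allele2))  # allele carried by the surviving nucleus
        zygote[locus] = (kept, kept)
    return zygote


def conjugation(parent1: Genotype, parent2: Genotype,
                rng: random.Random) -> tuple[Genotype, Genotype]:
    """Reciprocal fertilization: each partner keeps one haploid nucleus and sends
    an identical sister copy to the other, so both zygotic nuclei are the same.
    Assumes both parents are typed at the same loci."""
    gamete1 = {locus: rng.choice(pair) for locus, pair in parent1.items()}
    gamete2 = {locus: rng.choice(pair) for locus, pair in parent2.items()}
    zygote = {locus: (gamete1[locus], gamete2[locus]) for locus in parent1}
    return zygote, dict(zygote)  # the two exconjugants share one genotype


def is_homozygous(genotype: Genotype) -> bool:
    return all(a == b for a, b in genotype.values())


if __name__ == "__main__":
    rng = random.Random(1)
    ex1, ex2 = conjugation({"locusA": ("A", "A")}, {"locusA": ("a", "a")}, rng)
    print(ex1, ex2)  # {'locusA': ('A', 'a')} {'locusA': ('A', 'a')}
    # Autogamy of the heterozygous F1 gives homozygous A/A and a/a in roughly 1:1.
    counts = {"A": 0, "a": 0}
    for _ in range(1000):
        counts[autogamy(ex1, rng)["locusA"][0]] += 1
    print(counts)
```

Crossing two homozygous parents by conjugation yields identical heterozygous exconjugants, and autogamy of those exconjugants then segregates the two alleles into homozygous lines at about 1:1, which is why Mendelian traits are so easy to recognize.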
Gene silencing, which can be induced by introducing transgenes, provides a powerful tool for the functional analysis of genomes. The ideal tool in Paramecium, however, is RNA interference, which can be triggered with remarkable efficiency simply by feeding the cells bacteria that produce double-stranded RNA. This "feeding" method, originally developed for the nematode Caenorhabditis elegans, makes large-scale functional analysis of the identified ORFs feasible.
The two nuclei of Paramecium, the micronucleus and the macronucleus, differ in both structure and function. The diploid micronucleus, present in two copies in P. tetraurelia, represents the germ line and is transcriptionally silent. It is the nucleus that undergoes meiosis and fertilization during sexual events (conjugation between competent cells, or autogamy in a single cell).
The macronucleus, which is highly polyploid (about 1000n), represents the somatic line and is the site of transcription. Both the macronucleus and the micronucleus derive from copies of the zygotic nucleus. Programmed development of the macronucleus includes DNA amplification by a factor of about 250, precise excision of short internal eliminated sequences (IESs), and imprecise elimination of regions that are rich in transposons and repeated sequences and are probably heterochromatic. These events cause chromosome fragmentation, and the resulting chromosome ends are healed by the addition of telomeres.
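Precise IES excision can be pictured as cutting defined segments out of the germline sequence. The sketch below adds one detail not stated above but well documented for Paramecium: each IES is flanked by a TA dinucleotide repeat, and excision leaves a single TA copy in the macronuclear sequence. The function name `excise_ies` and its coordinate convention are hypothetical, and the sketch does not model amplification, imprecise elimination or telomere addition.

```python
def excise_ies(germline: str, ies_spans: list[tuple[int, int]]) -> str:
    """Return the macronuclear version of a germline sequence.

    Each span (start, end) is 0-based and half-open. Hypothetical convention:
    germline[start:start + 2] is the upstream TA (removed with the IES) and
    germline[end:end + 2] is the downstream TA (kept in the macronucleus)."""
    pieces = []
    cursor = 0
    for start, end in sorted(ies_spans):
        if start < cursor:
            raise ValueError("overlapping IES spans")
        if germline[start:start + 2] != "TA" or germline[end:end + 2] != "TA":
            raise ValueError(f"IES {start}-{end} is not bounded by TA dinucleotides")
        pieces.append(germline[cursor:start])  # macronuclear-destined sequence
        cursor = end  # skip the IES; the downstream TA stays
    pieces.append(germline[cursor:])
    return "".join(pieces)


# Toy sequence: GGC + TA + (IES body ACGTTT) + TA + GGC
print(excise_ies("GGCTAACGTTTTAGGC", [(3, 11)]))  # GGCTAGGC
```

Checking the TA boundaries makes a mis-specified coordinate fail loudly instead of silently producing a shifted sequence.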
The majority of the heterochromatin is eliminated during development of the macronucleus, which is therefore essentially euchromatic and devoid of repeat sequences and microsatellites. This represents a real advantage for a first genome sequencing project, because the presence of repeat sequences causes technical problems, notably during the genome assembly stage.
Because repeated sequences are eliminated, the macronuclear genome of Paramecium is very compact: its coding fraction is estimated at over 70% (compared with about 1% in the human genome). Introns are uniformly small (18 to 35 bases), and intergenic regions are generally shorter than 50-100 bases and may be only a few bases long (the shortest observed to date is 9 bases). This compactness makes the macronucleus the material of choice for an inventory of Paramecium genes.
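The two compactness measures quoted above, coding fraction and intergenic length, can be computed directly from an annotation. The following is a minimal sketch assuming 0-based, half-open intervals; the function names and the toy contig are hypothetical, not real Paramecium data, and the toy treats each gene as an intron-less coding interval.

```python
def coding_fraction(cds_intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of bases covered by at least one coding interval.
    Overlapping intervals are merged so that no base is counted twice."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(cds_intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


def intergenic_lengths(gene_intervals: list[tuple[int, int]]) -> list[int]:
    """Gap lengths between consecutive genes; assumes genes do not overlap."""
    genes = sorted(gene_intervals)
    return [nxt[0] - cur[1] for cur, nxt in zip(genes, genes[1:])]


# Toy annotation of a hypothetical 1,000-base contig.
genes = [(0, 300), (309, 709), (769, 969)]
print(coding_fraction(genes, 1000))  # 0.9
print(intergenic_lengths(genes))     # [9, 60]
```

With real data the coding intervals would be exons (introns excluded), while intergenic gaps would be measured between whole gene models.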
This inventory will also benefit from the sequencing of the genome of Tetrahymena thermophila, another Ciliate, separated from Paramecium by more than 100 million years of evolution. Tetrahymena is also studied by a large scientific community.
The DNA of the Paramecium tetraurelia macronucleus is being sequenced at Genoscope using a whole-genome shotgun strategy. Several libraries with different insert sizes (3 kb, 5 kb and 10 kb) have been constructed from macronuclear DNA extracted from purified macronuclei of stock d4-2. Because of their size and density, macronuclei are easy to isolate by simple centrifugation. Cloning the large (10 kb) inserts, which provide long-range clone links during assembly, required a special development effort because of the high A + T content of the Paramecium genome (70-75%, and up to 85% in intergenic regions).
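Large inserts matter because the two end reads of one clone can land on different contigs, linking them and giving an estimate of the distance between them. The following is a minimal sketch of that calculation, assuming contig B lies to the right of contig A, the forward read sits on A and its reverse-strand mate on B, and every clone has the nominal insert size. The function name `estimate_gap` and the numbers are hypothetical; this is not the Genoscope assembler's method.

```python
from statistics import median


def estimate_gap(len_a: int, pairs: list[tuple[int, int]], insert_size: int) -> float:
    """Estimate the gap between contig A and a contig B assumed to lie to its right.

    Each pair is (start_on_a, end_on_b): the 0-based start of the forward read on A
    and the exclusive end coordinate of its reverse-strand mate on B. The insert then
    spans (len_a - start_on_a) bases of A, the gap, and end_on_b bases of B.
    A negative result suggests the contigs overlap."""
    estimates = [insert_size - (len_a - start_a) - end_b for start_a, end_b in pairs]
    return median(estimates)  # median is robust to a few chimeric or mis-sized clones


# Three hypothetical 10 kb clone links between a 20 kb contig A and contig B.
links = [(15_000, 3_000), (16_000, 2_000), (14_000, 3_500)]
print(estimate_gap(20_000, links, 10_000))  # 2000
```

A 3 kb insert could not link these two contigs at all if the gap plus the read offsets exceeded 3 kb, which is the practical reason for investing in the harder-to-clone 10 kb library.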
Together, the reads will provide 10X coverage of the haploid macronuclear genome, whose size is estimated at about 100 Mb. The reads are assembled with an assembler developed at Genoscope. Because repeated sequences are eliminated during macronuclear development and duplicated regions appear to be absent, assembly should be straightforward even though no mapping data are available. Problems may arise at chromosome ends: each of the roughly 350 macronuclear chromosomes is present in about a thousand copies, which may differ slightly at the sites where telomeres are added. These problems should be easy to resolve.
The assembly is unlikely to yield as few contigs, or even scaffolds, as there are chromosomes in the macronucleus. Nevertheless, 10X coverage should leave only a few gaps. Sequencing of the Paramecium micronucleus, which is at the project stage, should help to order the macronuclear sequences.
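Why 10X coverage should leave few gaps can be checked with the classical Lander-Waterman model of shotgun sequencing. This is an idealized sketch: it assumes uniformly random read positions, that any read overlap is detected, and a hypothetical average read length of 700 bases (the read length is not given above). The function name `lander_waterman` is illustrative.

```python
import math


def lander_waterman(genome_size: float, coverage: float, read_length: float) -> dict[str, float]:
    """Idealized Lander-Waterman estimates for whole-genome shotgun sequencing.

    Assumes read start positions are uniform and independent and that any overlap
    between reads is detected (minimum overlap of zero)."""
    n_reads = coverage * genome_size / read_length
    p_uncovered = math.exp(-coverage)  # probability that a given base is in no read
    return {
        "reads": n_reads,
        "uncovered_bases": genome_size * p_uncovered,
        "expected_contigs": n_reads * p_uncovered,  # each contig ends at a gap
    }


# 100 Mb genome at 10X with a hypothetical average read length of 700 bases.
for name, value in lander_waterman(100e6, 10, 700).items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions, about 1.4 million reads leave roughly 4.5 kb of the 100 Mb uncovered, spread over about 65 gaps. Real numbers will be higher: the roughly 350 macronuclear chromosomes each end in a telomere, and AT-rich regions that clone poorly add further gaps, so the model gives a lower bound rather than a prediction.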
Laboratories involved: Laboratory for the Genomic Analysis of Eukaryotes, Sequencing Laboratory
CEA is a French government-funded technological research organisation in four main areas: low-carbon energies, defense and security, information technologies and health technologies. A prominent player in the European Research Area, it is involved in setting up collaborative projects with many partners around the world.