Brain SPORE Blog

Recently in TCGA Category

This is the agenda of the recent TCGA steering committee meeting, with links to the presentations made at that meeting:


National Cancer Institute and National Human Genome Research Institute, National Institutes of Health


The Cancer Genome Atlas (TCGA) Steering Committee Meeting

December 3-4, 2008, Bethesda North Marriott Hotel & Conference Center

Bethesda, Maryland


PRESENTATIONS

Wednesday, December 3

8:30 a.m. - 8:40 a.m. Opening Remarks

Anna D. Barker, Ph.D., National Cancer Institute, NIH

Alan E. Guttmacher, M.D., National Human Genome Research Institute, NIH


8:40 a.m. - 10:40 a.m. Center Presentations


Report on data generation and analysis; presenter should include brief comment on status of ovarian project (10 minutes each)

Moderator: Bruce E. Johnson, M.D., Dana-Farber Cancer Institute

a. Cancer Genome Characterization Centers

University of North Carolina at Chapel Hill D. Neil Hayes, M.D.

Memorial Sloan-Kettering Cancer Center Marc Ladanyi, M.D.

Lawrence Berkeley National Laboratory Elizabeth Purdom, Ph.D.

Johns Hopkins/University of Southern California Peter W. Laird, Ph.D.

HudsonAlpha Institute of Biotechnology Devin Absher, Ph.D.

Harvard Medical School Peter J. Park, Ph.D.

Broad Institute of MIT and Harvard Gad Getz, Ph.D.


b. Genome Sequencing Centers

Washington University School of Medicine Li Ding, Ph.D.

Broad Institute of MIT and Harvard Michael S. Lawrence, Ph.D.

Baylor College of Medicine David A. Wheeler, Ph.D.


11:00 a.m. - 12 noon Analysis Working Group Reports

Moderator: D. Neil Hayes, M.D. University of North Carolina at Chapel Hill


Publications Using Integrated Data

Expression Analysis and GBM Subtypes (15 minutes) Roel G.W. Verhaak, Ph.D. Broad Institute of MIT and Harvard

Integrative Copy Number Analysis (15 minutes) Terrence P. Speed, Ph.D. (presented by E. Purdom) University of California, Berkeley


State of Ovarian Cancer Genomics (Literature Review) Paul T. Spellman, Ph.D., Lawrence Berkeley National Laboratory

c. Planning for the Ovarian Analysis Jamboree (15 minutes) Gad Getz, Ph.D., Broad Institute of MIT and Harvard


1:15 p.m. - 2:45 p.m. Technology Development (10 minutes each + 5-minute discussion)

Moderator: Robert H. Waterston, M.D., Ph.D. University of Washington


TCGA R21 Technology Development Grantees

Cytosine methylation and mammary carcinoma Timothy Bestor, Ph.D. Columbia University

Methylation Profiling of Normal and Cancer Genomes Gerd P. Pfeifer, Ph.D., City of Hope

Allele-Specific DNA Methylation in Normal and Cancer Tissues Benjamin Tycko, M.D., Ph.D., Columbia University

Detecting Structural Mutations in Cancer Genomes by Long-Range End-Tag Profiling

Aleksandar Milosavljevic, Ph.D., Baylor College of Medicine

Microarray Sequence Capture for Large-Scale Targeted Sequencing of Cancer Genomes Thomas Albert, Ph.D., Roche/Nimblegen Systems

Targeted Genomic Circularization for Cancer Genome Resequencing, Hanlee Ji, M.D., Stanford University School of Medicine


3:00 p.m. - 3:30 p.m. Technology Development of the CGCCs (10 minutes each)

RNA Profiling Via High-Throughput DNA Sequencing, Jonathan G. Seidman, Ph.D., Harvard Medical School

Detecting Gene Rearrangement Associated with Intragenic CNA, Cameron W. Brennan, M.D., Memorial Sloan-Kettering Cancer Center




3:30 p.m. - 5:30 p.m. GSC Technology Implementation

Moderator: Mark S. Chee, Ph.D., Prognosys Biosciences, Inc.



Results From Pilots, Planning for 2009, Discussion

Richard A. Gibbs, Ph.D., Baylor College of Medicine

Stacey Gabriel, Ph.D. & Gad Getz, Ph.D., Broad Institute of MIT and Harvard

Elaine R. Mardis, Ph.D., Washington University School of Medicine



Thursday, December 4

8:30 a.m. - 10:00 a.m. Discussion: Cancer Genomics, Biology, and Translation

Moderators: Ronald A. DePinho, M.D. & Geoffrey Duyk, M.D., Ph.D.

What are the predictions for cancer genomics in 2 years? In 5 years?

What are the key challenges in technology? In analysis? In translation?

What are the expectations for diagnostic markers? For therapeutic targets?

How should cancer research communities be preparing for the influx of new findings?


10:10 a.m. - 11:45 a.m. Analyzing and Applying TCGA Data (15 minutes each)


Moderator: Sean Eddy, Ph.D., Janelia Farm Research Campus

caBIG® and TCGA, Kenneth H. Buetow, Ph.D., National Cancer Institute, NIH

Visualizing Cancer Genome Data, Jill P. Mesirov, Ph.D., Broad Institute of MIT and Harvard

UCSC Cancer Genomics Browser for TCGA Data, David Haussler, Ph.D., M.S., University of California, Santa Cruz


TCGA Functional Analysis Protein Mutations & Pathways Chris Sander, Ph.D., Memorial Sloan-Kettering Cancer Center

Analysis of mRNA Expression Data from TCGA and non-TCGA Glioma Samples, Kenneth D. Aldape, M.D., University of Texas M.D. Anderson Cancer Center

11:45 a.m. - 12:30 p.m. Samples Update

Samples Update: A Cautionary Tale (25 minutes), Carolyn C. Compton, M.D., Ph.D., National Cancer Institute, NIH

Breast Cancer Samples from SPORES (5 minutes), Charles M. Perou, Ph.D., University of North Carolina at Chapel Hill


12:30 p.m. Closing Remarks and General Adjournment

Alan E. Guttmacher, M.D.

Anna D. Barker, P

Splice Center @ NCI Genomics and Bioinformatics Group


At today's TCGA meeting, Dr. Weinstein presented a useful tool called Splice Center.


Visit the Genomics and Bioinformatics Group and look under tools at left, or use this link to Splice Center.

You can see the splice variants from GenBank and RefSeq, and see Affy probes superimposed. Links to NCBI allow you to look at the genes.

Another tool, GoMiner, is an interpretation tool for 'omics' data. You can upload gene IDs and look at gene ontologies. According to the website: "Addresses the question, "Now that I've done the gene expression experiment and identified a set of 'interesting' genes, what do those genes mean biologically?" GoMiner batch-processes and organizes lists of thousands or tens of thousands of genes and provides two fluent, robust visualizations of the genes in the framework of the Gene Ontology hierarchy. (Zeeberg, et al., Genome Biology 2003; 4:R28)".

Contact Dr. Weinstein if you would like to be referred to someone who can help.

Dr. Chris Pelloski presented his analysis of methylation status in glioblastoma at the American Association for Cancer Research "Molecular Diagnostics in Cancer Therapeutic Development" meeting, which is under way at the moment. AACR issued a press release that you can see here:

Methylation levels key to glioblastoma survival

Quote:
"This study shows that the methylation status of CpG islands may serve a robust, and previously unexplored, source of biomarkers for this disease," said lead author Christopher E. Pelloski, M.D., an assistant professor of radiation oncology and pathology at M. D. Anderson. "It also indicates that there seems to be a common theme to glioblastoma that the more closely the tumor cells resemble cells of neural development, the less aggressive the clinical course; whereas if they more resemble mesenchymal cells, which are poorly differentiated and invasive, the worse the clinical outcome will be."

TCGA Study highlighted on web


A nice comprehensive blog post on the TCGA brain study can be seen here:
http://www.highlighthealth.com/diseases-and-conditions/the-cancer-genome-atlas-reports-molecular-characterization-of-brain-tumors/trackback/

It is interesting how the perception of TCGA has changed - it is hard to remember that there was a time when some people were very concerned about whether it would yield anything of value. The blogger at highlighthealth.com says:

"This is an excellent example of how current genome characterization technologies can systematically explore the universe of genomic changes involved in cancer. The TCGA is also studying lung and ovarian cancer."

TCGA Data: Local Update by Pablo Freire


Pablo Freire, working with Dr. Jonas Almeida on the TCGA data, has posted new data to our (secure) TCGA intranet site. If you are part of our MDACC TCGA group and wish to have access to these data, please send an email to Oliver Bogler.

Pablo writes:

The data are presented as tab-delimited files and can be imported into Excel. For the expression and copy number data, the values are calculated for every Affymetrix probe position. Sequence data are the exact tab-delimited files from the TCGA portal.

For the expression and copy number files, I did the following:

Expression platform: Affymetrix U133A plate set, performed by Broad Inst.
Copy number platform: Agilent 244A aCGH, performed by Harvard Medical School.

For expression, the raw data (CEL files) were retrieved from the TCGA portal. For the copy number, the normalized data were used.

Expression data normalized with RMA.
Copy number data segmented with CBS.

For every Affymetrix probe position, I eliminated genes that were not annotated in RefSeq, as well as probes that either don't match anywhere or match more than one location in the genome.
Also, for every Affymetrix probe position, I calculated the associated copy number by taking the mean value of the copy number probes within the span of the gene.
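To make the averaging step concrete, here is a minimal sketch of how a per-gene copy number could be computed from probes falling inside the gene's span. It is not Pablo's actual script: the function name, the `(position, value)` tuple layout and the sample numbers are all assumptions made up for illustration, and it presumes the probes are already restricted to the gene's chromosome.

```python
from statistics import mean


def gene_copy_number(gene_start, gene_end, cn_probes):
    """Mean copy-number value of the probes located within the gene span.

    cn_probes: iterable of (position, value) tuples on the gene's chromosome
    (hypothetical layout). Returns None when no probe falls inside the span.
    """
    inside = [value for pos, value in cn_probes if gene_start <= pos <= gene_end]
    return mean(inside) if inside else None


# Hypothetical probe positions and segmented values, for illustration only
probes = [(100, 0.1), (250, 0.5), (400, 0.7), (900, -0.2)]
print(gene_copy_number(200, 500, probes))  # averages the probes at 250 and 400
```

Returning `None` for a gene with no covering probes keeps missing values distinct from a true value of zero when the table is later written out as tab-delimited text.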

Just one sample per patient was used. In the cases where there were multiple samples per patient, the first one was used, yielding a total of 207 patients.
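The one-sample-per-patient rule can be sketched in a few lines. This is only an illustration under assumptions: the identifiers are invented, and "first" is taken to mean first in the order the samples are listed, which the note above does not spell out.

```python
def first_sample_per_patient(samples):
    """Keep the first sample listed for each patient, preserving input order.

    samples: iterable of (patient_id, sample_id) tuples (hypothetical layout).
    Returns a dict mapping patient_id to the chosen sample_id.
    """
    chosen = {}
    for patient_id, sample_id in samples:
        # setdefault stores a value only the first time a patient is seen
        chosen.setdefault(patient_id, sample_id)
    return chosen


# Hypothetical identifiers, for illustration only
samples = [("P01", "P01-A"), ("P02", "P02-A"), ("P01", "P01-B")]
selected = first_sample_per_patient(samples)
print(len(selected), selected["P01"])
```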

Links to the data are (access & password required):

TCGA sequence data

TCGA Agilent copy number

TCGA Affymetrix Expression

MD Anderson Licenses GeneGo Metacore


According to the GeneGo website:

"MD ANDERSON BECOMES A GENEGO CENTER OF EXCELLENCE USING METACORE FOR ONCOLOGY RESEARCH

St. Joseph, MI. September 2nd, 2008 - GeneGo, Inc., the leading systems biology tools company, announced today that MD Anderson has become a certified GeneGo Center of Excellence. MD Anderson researchers will have institution-wide access to GeneGo's MetaCore data analysis suite, training and advanced support. MD Anderson specializes in cancer treatment and research and is well known for its advanced clinical trials programs. MetaCore will be used throughout many research programs both as a central data repository, management and collaboration platform for clinical OMICs data and as an integrative pathway analysis suite.

Mary E. Edgerton, a pathologist performing cancer research at MD Anderson and a long-term client, says "I have been using Metacore for many years in my research into pathways and networks that control aggressive behavior in cancers. I am currently using it to infer networks from analysis of gene expression array data for pathways in lung, brain, and breast cancer. I also use the curated pathways to formulate mathematical models of molecular networks that predict tumor behavior using multiscale modeling. Not only do I find the product to be very useful, but I also appreciate the responsiveness of the staff at Genego to my technical questions and suggestions."

"MD Anderson is renowned for their pioneering work in oncology, I have known some of the MDs there for a while and I admire their dedication to their patients and to working hard to find cures." said Julie Bryant, GeneGo's VP of Business development. "We are proud to have them as an institution customer and a Center of Excellence. We have a large development program in the area of cancer system biology tools supported by NCI and we are glad to see this work appreciated by some of the best oncology professionals in the world working at MD Anderson.""

GeneGo, Inc. Systems Biology for Drug Discovery

How can we all get into the GeneGo action? Anybody tried it yet? Please comment below.

TCGA Home Page @ Genboree

Since my post on July 11 I have obtained access to the Genboree website, which is built and maintained by the Bioinformatics Research Laboratory within the Human Genome Sequencing Center at Baylor College of Medicine.

The site has an area dedicated to the TCGA project, and there you can see reports relating to the progress of the sequencing effort, which are compiled quarterly and show progress and quality metrics. The May report says that about 300 samples (of the 1000 total - 500 tumor, 500 blood) have had about 2/3rds of the genes in the list completed.

More interesting (OK, for me) is the genome browser there, which is based on the UCSC browser, but allows you to see the mutations in specific genes, categorized as SNPs. You can zoom down to the sequence level and see the sites that have been found to be mutated. There is an interactive list of the genes included, which you can use to navigate straight to that gene's region - very handy.

This is a follow-up to the previous post, with the full text of the article, which sometimes appears to be blocked for non-subscribers. Here it is:
-----------------------------------------------------------

NCI to Roll Out Cancer Molecular Analysis Portal to Integrate Oncology Data from TCGA

[July 18, 2008; from bioinform.com]

By Vivien Marx

In an effort to broaden access to complex oncology data sets, the National Cancer Institute is preparing to unveil a new resource called the Cancer Molecular Analysis portal, which will integrate large, disparate genomics data sets from the Cancer Genome Atlas project and other cancer genomics studies.

The CMA portal is scheduled to launch next week with about a terabyte of brain cancer data from TCGA. TCGA's ovarian and lung cancer datasets are the next scheduled to arrive, with "a continuous flow of datasets into the CMA Portal over the next few months," said Subha Madhavan, associate director of life sciences informatics in the NCI's Center for Biomedical Informatics and Information Technology.

The second data set to be loaded into the CMA portal will be brain glioma data from approximately 500 patients from the so-called Rembrandt study headed by the NCI's Neuro-Oncology branch. "The reason for lining this up right after TCGA is to enable comparisons and correlations between two brain tumor datasets and neuro-oncologists will benefit from two large-scale comprehensive studies in one portal," said Madhavan, who is managing the CMA portal project.

Other NCI-supported projects will be imported into the portal as the data become available and as the data sharing policies are worked out for those studies. Two examples are the Target study, a childhood cancer initiative to catalog genomic changes in high-risk acute lymphoblastic leukemia and neuroblastoma; and CGEMS, or the Cancer Genetic Markers of Susceptibility study, an initiative to identify genetic alterations that make people susceptible to prostate and breast cancer.

Madhavan expects Rembrandt data to be available via the CMA portal by the end of this year. The timing for Target and CGEMS in CMA portal has yet to be determined, as the data must first become available and the sharing policies worked out, she said.

The portal, part of the NCI's Cancer Biomedical Informatics Grid project, is expected to enable researchers to integrate, visualize, and explore clinical and genomic characterization data, said Madhavan.

The initial version of the CMA portal will include genomics data from more than 200 patients suffering from glioblastoma multiforme, along with diagnosis information, treatment history, pathology status, the site of the tumor, and background on the patients’ surgery, said Madhavan.

Genome characterizations available through the portal will encompass sequence data, gene expression studies, copy number and SNP analysis, methylation studies, and miRNA expression data.

Kenneth Aldape, associate professor in MD Anderson Cancer Center's department of pathology, told BioInform in an e-mail that he expects this data to be of great value, particularly because "for each tumor sample, multiple platforms have been used to profile the cancer genome."

As a result, he said, "for the first time, we can integrate data from changes in the cancer cell on the DNA, RNA, and epigenetic levels. Insights gained from this integration of data will most likely lead to new ways that we can understand the molecular pathogenesis of glioblastoma."

Navigating the portal, users can view and access mutation profiles from tumor samples in reference to the human genome, mine clinical characteristics such as survival data and tumor staging, and correlate that with mutation and genome characterization results using a number of analytical tools.

While it is true that scientists can currently accomplish these tasks using other resources, Madhavan noted that there is currently no single integrated source for this information. "Look at the number of databases one would need to access," she said, citing clinical information, the metastatic status of patient tumors, tissue annotation, and expression data as examples.

"A lot of these tools and databases are geared toward sophisticated statisticians and analysts who know how to handle these tools, but the goal for CMA is to put [them] in the hands of the decision-makers, the physician-scientists," said Madhavan.

The portal is designed to let these end users work with the data without expert assistance, she said, using caBIG software functionality to help scientists find the type of datasets they need, from TCGA and elsewhere. "They can put in a gene name and it will bring back a Kaplan-Meier survival chart."
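For readers wondering what sits behind a Kaplan-Meier survival chart, here is a minimal sketch of the standard product-limit estimate: at each time where a death is observed, the running survival probability is multiplied by (1 - deaths / number still at risk). This is not the portal's code; the function name and the toy follow-up data are assumptions for illustration only.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed death.

    times:  follow-up time for each patient
    events: 1 if the death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event flags
for t, s in kaplan_meier([5, 8, 8, 12], [1, 1, 0, 1]):
    print(t, round(s, 3))
```

Patients censored at a given time are still counted as at risk at that time, which is the usual convention for this estimator.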

The first data set in the CMA portal is from the TCGA Pilot project, which is run jointly by the NCI and the National Human Genome Research Institute and aims to assess the feasibility of characterizing all human cancers by starting with three cancer types: brain, lung, and ovarian cancer. To date, TCGA has been making data available to the research community through a data portal launched last year.

The TCGA data portal, set up by the project's Data Coordinating Center, supports the program’s immediate data release policy. "This is a simple FTP site wrapped into a web site," said Madhavan, explaining that this site provides access to raw archives that the TCGA centers submitted.

The Cancer Molecular Analysis portal, however, is slated to become a comprehensive site that will include TCGA data and other data, too. "It presents analysis, summaries and allows users to link clinical data and genomic data, which is not possible in the [TCGA portal's] FTP wrapper," said Madhavan.

Bulk download of TCGA data will be possible through the TCGA portal, whereas the CMA portal will provide analysis and data visualization capabilities under one site, said Madhavan.

"Some of the vision is to provide a unified view across multiple studies, so people can not only drill deeper into one study but they can cross-correlate and compare data across studies," she said.

Building in User Needs

The CMA portal offers researchers several data views: a "gene view" to analyze expression, copy number, SNP, and pathway data; a "genome view" to look at entire chromosomal regions; a "clinical view," which includes Kaplan-Meier survival plots and other data of clinical interest; and analytical tools such as GenePattern, a software platform developed by the Broad Institute that combines workflow with dozens of computational and visualization tools, or the Cancer Genome Workbench, developed by the NCI as a computational platform to integrate clinical tumor mutation profiles with the reference human genome.

Madhavan said that a key goal for the project was to maintain a user focus. "If your tools are not easy to use, you don't get adoption, and these clinician-scientists are so busy that you don't want these tools to have such a steep learning curve," she said.

"I think it will help my work and others in the field," said Herbert Newton, director of the division of neuro-oncology at Ohio State University Medical Center & James Cancer Hospital.

Newton told BioInform via e-mail that he believes the portal and this kind of data integration "will become more and more valuable as we make further progress with translational programs to develop molecular-based treatments."

In particular, the glioblastoma multiforme data set "will be very helpful to neuro-oncology researchers working on molecular aspects of high-grade gliomas," he said. Although there is much information available on the topic, "this will be a much broader effort for characterization of these genes, with a very large and ambitious set of genes to analyze."

Newton said he expects the TCGA glioblastoma multiforme data set to "eventually become the 'gold standard' for molecular characterization and analysis of GBM."

Madhavan said that an important source of input was a use case workshop for the TCGA data portal held in January, which brought together bench researchers, clinicians, statisticians, and computer scientists who jointly defined how the portal should be configured to house TCGA data. The participants were both eager to build technology up and also take down barriers between clinical and research disciplines, said Madhavan. "It's absolutely amazing to see what these groups can do when you put them in one room. They don't talk to each other every day."

The CMA data can be explored online with runtime analysis tools that are part of the portal, but it can also be downloaded for downstream analysis by biostatisticians. "Users can go in and select the data types and patients of interest for easy bulk download of data along with clinical and tissue annotations," said Madhavan.

To obtain that functionality, the NCI team partnered with a number of external researchers, including Peter Park, a bioinformaticist at Boston's Children's Hospital Informatics Program and at the Harvard-MIT Division of Health Sciences and Technology who is also on the faculty of Harvard Medical School, to understand how the community will want to access that data and to create ways to let them do so.

Madhavan noted that Park was "very passionate" about how researchers will want to "slice and dice" these datasets, such as according to clinical parameters like tissue quality, in order to prepare the data for further analysis with tools of their choice.

Working the Matrix

The NCI developers worked with colleagues from Lawrence Berkeley National Laboratory, Stanford University, MD Anderson, and the University of North Carolina to create a "data access matrix," which offers users access to different "levels" of data and is "a key functionality of the CMA portal," she said. This group also became the portal's beta testers.

As Madhavan explained, "level 1" data is anything that comes out of a machine, such as probe-level data in case of an Affymetrix array. "Level 2" data in that example would be CHP files with information normalized within a given sample, while "level 3" would be segmented data and "level 4" would comprise genomic regions of interest.

For the matrix, the team sought to clearly indicate to portal users what level of data they are downloading, she said. Scientists seeking to do their own analysis will want mainly raw data, such as what is found in levels 1 and 2, while others may want only processed information.

"The data matrix simply allows one to select sections of the data more easily and reduces the time and effort necessary to obtain the data in a usable format," Park told BioInform via e-mail. In the case of copy number data, for example, "level 1 is the raw log-ratios, level 2 is normalized log-ratios, level 3 is segmented profiles, and level 4 is the regions called significant aberrations," he said.

"For instance, a bioinformatician interested in every step of the analysis may want to download the raw data, but clinicians might want data at the level of genes," said Park. One researcher might want to study expression levels and matched methylation levels for patients with poor survival rates, while another may want to study copy number and expression in another group of patients, he said.
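The four data levels described above map naturally onto a small enumeration. The sketch below is purely illustrative and is not how the portal is implemented: the class name, the helper function and the file names are all hypothetical, and the comments follow Park's copy number example.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    RAW = 1         # e.g. raw log-ratios straight off the instrument
    NORMALIZED = 2  # normalized log-ratios
    SEGMENTED = 3   # segmented profiles
    REGIONS = 4     # regions called as significant aberrations


def files_at_level(files, level):
    """Select the file names whose recorded level matches the requested one.

    files: iterable of (file_name, DataLevel) tuples (hypothetical layout).
    """
    return [name for name, file_level in files if file_level == level]


# Hypothetical file names, for illustration only
catalog = [
    ("sample1_raw.txt", DataLevel.RAW),
    ("sample1_norm.txt", DataLevel.NORMALIZED),
    ("sample1_seg.txt", DataLevel.SEGMENTED),
]
print(files_at_level(catalog, DataLevel.SEGMENTED))
```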

An important goal for the portal, Park said, was to reduce the time that it currently takes to download public data sets and format them for analysis. "Most available data sets are poorly annotated and much effort is required by users to link different parts of the data," he said.

Another issue is reproducibility. "In general, it is nearly impossible to replicate a result described in a paper by downloading the data and following the description given by the authors, especially when the data are complex." The data matrix approach "is attempting to make this a bit more friendly," he said.

Another aspect that the CMA developers considered was patient privacy. The TCGA project defined its own patient-protection policies, and "our job on the CMA portal was to implement those patient privacy protection policies to help ensure that we are protecting the research participants in a manner that is consistent with HIPAA as well as their consent forms," said Madhavan.

However, as the portal expands to include data from other projects, it will likely encounter a range of different access models. "One has to think carefully about how this data will be shared," said R. Mark Adams in a presentation outlining the Cancer Molecular Analysis portal at last month’s caBIG annual meeting in Washington, DC.

Adams, a senior associate at caBIG contractor Booz Allen Hamilton, added that grappling with privacy issues "can be as challenging or more challenging than informatics or technical issues." The problem, he said, is "coming up with ways that we can safely provide widespread access to the data to the widest range of researchers in keeping with protecting the participants."

CMA handled this challenge by using a tiered approach. One tier is open-access data, such as gene expression profiles, which are publicly available to users without a log-in, Madhavan said, adding that this information "cannot be aggregated to generate data that is unique to an individual."

The portal also includes a controlled-access data tier, which contains clinical data and individually unique information and requires user certification for data access.
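The two-tier rule just described boils down to a simple check. Here is a minimal sketch under assumptions: the constant names and the `can_access` function are invented for illustration and say nothing about how the portal actually enforces its policy.

```python
OPEN_ACCESS = "open"              # hypothetical label for the open tier
CONTROLLED_ACCESS = "controlled"  # hypothetical label for the controlled tier


def can_access(tier, user_is_certified):
    """Open-tier data needs no log-in; controlled-tier data needs a certified user."""
    if tier == OPEN_ACCESS:
        return True
    if tier == CONTROLLED_ACCESS:
        return user_is_certified
    raise ValueError(f"unknown tier: {tier!r}")


print(can_access(OPEN_ACCESS, user_is_certified=False))
print(can_access(CONTROLLED_ACCESS, user_is_certified=False))
```

Raising an error for an unrecognized tier, rather than defaulting to open access, keeps a mislabeled dataset from being exposed by accident.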

For small research labs and community-based cancer centers with only a small number of samples, researchers might use the CMA Portal to increase the statistical power of an analysis, said Madhavan, adding that everyone benefits from the portal's "instantaneous data release" policy.

"These projects are putting out these data sets in a publicly accessible way even before the publication has come out," she said. "This is why we are getting interest from outside the TCGA group," she said.

As Madhavan explained, the goal of the CMA portal is "to lower the barrier to entry to the portal by making open-tier datasets available to users in an easily usable fashion." As datasets are prepared for the portal, the access policy will need to be tailored to the dataset. For example, Target is a childhood cancer initiative to catalog genomic changes in certain types of pediatric cancers.

"Such an implementation [the open-access tier] may not readily work for Target, where children are involved and the patient privacy concerns are heightened. Hence, we may have to make some changes to the CMA portal software to implement the data release policies of the Target project," said Madhavan.

Powered by an Integrator

The CMA Portal is powered by caBIG's caIntegrator module, which had only been applied to smaller studies prior to the portal project. As a result, Madhavan said, the team's first task was to see if the 1 terabyte dataset could even be loaded into it.

Madhavan said that at the "heart" of caIntegrator is CGOM, or the clinical genomics object model, which is caBIG's standard representation for clinical and genomic findings and the annotations that go along with them.

"There is also a real-time analytic engine that provides this on-the-fly computational analysis," she said. Users can select patient cohorts with certain criteria and punt that over to any of dozens of analytic tools, such as GenePattern.

This semantic interoperability is expected to save researchers time, Madhavan said. For example, if a scientist wants to correlate overall survival in patients with a mutation rate in a particular gene, that would require "a lot of semantic connectivity between mutation data and clinical information, [so] that is what we spent most of the time on … figuring out the semantic touch-points between these different data types."

Adams said in his caBIG talk that an important goal of the portal is to make the data accessible to researchers in a user-friendly, integrated format. "Often the insights in this information are hidden in terms of finding how to correlate the multiple subsets of information," he said.

Quoting part of a wish list by Daniela Gerhard, the NCI's director of the office of cancer genomics, Adams said that the CMA portal is envisioned as a way to make this data accessible, and not by saying, "'Go to the FTP site and knock yourself out.'"

Visit the new CMA Portal here.

Cancer Molecular Analysis Portal


BioInform: NCI to Roll Out Cancer Molecular Analysis Portal to Integrate Oncology Data from TCGA

A little snippet appeared in one of the subscription bioinformatics newsletters, suggesting the imminent rollout of a new portal for TCGA data. If anyone has a subscription, please post more information, or send me some text from the full post (obogler@mdanderson.org).

The Cancer Genome Atlas (TCGA) is an effort by the NCI to characterize cancers in depth, at the genomic level. One of the cancers in the TCGA is glioblastoma.

Even more interesting, in the context of the SPORE, is that NCI sees a strong connection between the TCGA and the SPOREs. One aspect of this is that the SPOREs are acting as a source of tissue samples - our group, under the leadership of Dr. Ken Aldape in Pathology, was the first center to provide glioblastoma tissues to the TCGA.

TCGA is analyzing tumors at the following centers:
- Broad Institute of MIT and Harvard, Cambridge, Mass.
Using the Affymetrix platform, this center will identify changes in expression and copy number alterations that occur in cancer.
- Harvard Medical School and Brigham and Women's Hospital, Boston, Mass.
Using the Agilent platform, this center will characterize tumor samples for alterations in chromosome segment copy number. This center will also develop new technologies to analyze expression profiles.
- Lawrence Berkeley National Laboratory, Berkeley, Calif.
Using an Affymetrix Exon 1.0 array platform, this center will identify changes in the transcription profiles that occur in cancer.
- Memorial Sloan-Kettering Cancer Center, New York, N.Y.
Using Agilent arrays, this center will provide characterization of chromosome segment gains and losses. This center will also develop new approaches to detect novel genetic rearrangements.
- The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University, Baltimore, Md.
This is a joint project with the University of Southern California/Norris Comprehensive Cancer Center, which will use the Illumina GoldenGate Genotyping platform to detect changes in methylation profiles associated with transcribed genes in cancer samples.
- Stanford University School of Medicine, Palo Alto, Calif.
Using the Illumina HumanHap550 Genotyping BeadChip, this project will identify chromosome segment copy number variation found in cancer.
- University of North Carolina Lineberger Comprehensive Cancer Center, Chapel Hill, N.C.
Using an Agilent array platform, this center will identify changes in the transcription profiles that occur in cancer.

Join Us
Now the SPOREs are thinking about how to digest the data emerging from TCGA and how to design the right back-end studies. As part of that effort we hold TCGA/SPORE meetings every second Friday at 3pm in FC7.3035 - all welcome.

TCGA Knowledge Base
In today's meeting we discussed the state of the data available and how to translate specific biological questions into queries that can be applied to the current data. The new TCGA Knowledge Base was introduced. The TCGA-KB is an interface to access the raw TCGA data (e.g. CEL files) and the goal is to eventually add normalized data. Eventually a query interface will be added to the site.

Genboree
This is a group that is managing the sequencing effort for NCI - see www.genboree.org. We will attempt to make contact to get access.