Tuesday, May 15, 2012

Map of Life demo launch

A recent article in Nature News describes the demo launch of the Map of Life, an online resource for biodiversity analysis.  The resource integrates many different data types into one mapping tool.  


My favorite part:
"Map of Life will soon allow users to add or update species data, thereby becoming the first two-way portal of biodiversity information."
This will be essential.  I quickly used the map to plot distributions of fossil hominins, mostly from museum collections.  I found that many well-known fossil sites from widely distributed groups like Neandertals and Homo erectus are not represented yet.


The article mentions that the biggest challenge facing the new resource is drawing interest from biodiversity researchers.  This seems to me like a very good opportunity for open access, collaborative data management.  The vision of the database creators is highlighted in their recent article in Trends in Ecology and Evolution.  (Beware: the article is behind a paywall.)


References:

Jetz, W., McPherson, J. M. & Guralnick, R. P. 2012. Integrating biodiversity distribution knowledge: toward a global map of life. Trends in Ecology & Evolution 27, 151–159. http://dx.doi.org/10.1016/j.tree.2011.09.007


Thursday, February 23, 2012

Technology in the classroom

A story by Jeffrey Young in the Chronicle of Higher Education discusses the value of new technology in the classroom.  The most salient point is that technology only aids teaching if used in the right way.  It can't make up for a lack of connection between teacher and students.  I liked this quote, which I have certainly experienced in my own teaching:
"Students can all sniff out an inauthentic place of learning," the professor argues. "They think, If it's a game, fine, I'll play it for the grade, but I'm not going to learn anything."

Sunday, January 15, 2012

A Basic Guide to the UCSC Liftover Command Line Utility


If, like me, you have spent time working with data files from various projects on the human genome, you have probably needed to compare files that are based on different builds of the human reference genome.


If you are only working with a small number of sites, it is very convenient to use the online UCSC Batch Coordinate Conversion (Liftover) tool.  Lately, however, I have needed to translate some larger genome files from an older build of the human genome (hg18) to the newest version (hg19).


As an amateur programmer, I had a lot of trouble finding instructions on how to properly execute a liftover on my local machine.


After some trial and error I was finally successful in converting between genome builds on my local machine using the UCSC executable liftOver command line utility. So, I thought I would share some very basic instructions on how I accomplished this.  (Note that I work on a Mac, so instructions for other machines might vary!)


DOWNLOADING AND CONFIGURING THE LIFTOVER EXECUTABLE
To begin, the data file you want to convert has to be in the Browser Extensible Data (BED) format. Basically, this is a tab-delimited file in which the first three columns are chromosome, start position, and end position.
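Here is a minimal sketch of how a BED file could be built, assuming your data start out as a tab-delimited list of chromosome and 1-based position. The file names sites.txt and sites.bed are made up for illustration. BED start coordinates are 0-based and the end coordinate is exclusive, so a single site at position p is written as start p-1 and end p.

```shell
# Hypothetical input: tab-delimited chromosome and 1-based position.
printf 'chr1\t1000\nchr2\t5000\n' > sites.txt

# BED uses a 0-based start and an exclusive end, so a single
# site at position p is written as start = p - 1, end = p.
awk 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $2 }' sites.txt > sites.bed

# Show the resulting three-column BED file.
cat sites.bed
```

If your file already has start and end columns, check whether they are 0-based or 1-based before converting, since an off-by-one shift is easy to miss.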


Next, a few files need to be downloaded from the UCSC Genome Bioinformatics site and configured. The first of these is the actual liftover executable file. To download and configure this:

  1. Go to http://hgdownload.cse.ucsc.edu/admin/exe/ and download the appropriate version of the liftOver utility (depending on your system).  Running a MacBook Pro with an Intel Core i5 processor, I chose the macOSX.i386/ version.
  2. Click on macOSX.i386/, then choose liftOver from the directory (I had to right click to download the file).
  3. Once downloaded, you must give the file permission to run as an executable by running at the terminal prompt:

    • $ chmod +x liftOver  (make sure your present directory is the one where liftOver resides)

THE BASIC LIFTOVER COMMAND
The liftOver executable is now ready to go. Run $ liftOver with no arguments at the terminal prompt to see its usage message and options. Note that if liftOver is not in a directory in your $PATH, you either have to move it to a directory in $PATH or give the full location of the file: ex. /path/to/liftOver
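As a rough sketch of the $PATH option, assuming liftOver was saved to ~/Downloads (that location is my assumption, so adjust it to wherever your copy lives), you can add that directory to $PATH for the current terminal session:

```shell
# Assumption: liftOver was downloaded to ~/Downloads; adjust as needed.
LIFTOVER_DIR="$HOME/Downloads"

# Put that directory at the front of PATH for this session only.
export PATH="$LIFTOVER_DIR:$PATH"

# Report where the shell now finds liftOver, if anywhere.
command -v liftOver || echo "liftOver not found in $LIFTOVER_DIR"
```

To make the change permanent, the export line can go in your shell startup file (for example ~/.bash_profile, depending on which shell you use).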


The basic command of the liftOver utility takes this form:


$ liftOver oldFile map.chain newFile unMapped


Where:

  • liftOver is the initial command (again use full file location if executable is not in $PATH).
  • oldFile is the location of the file you want to convert to a new build (again, must be in BED format).
  • map.chain is the UCSC chain file that holds the instructions for conversion (instructions for download follow).
  • newFile is the location and name of the file that will hold the results of the successfully remapped output.
  • unMapped is the location and name of the file that will hold the results of the unmapped output.  
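Putting the pieces together, a run might look something like the sketch below. All four file names are hypothetical placeholders. The small count_records helper is my own addition, not part of liftOver; it skips lines starting with "#", since in my experience the unmapped file includes a "#" comment line before each record that failed to convert.

```shell
# Hypothetical file names; substitute your own paths.
OLD_BED="sites.hg18.bed"
CHAIN="hg18ToHg19.over.chain"
NEW_BED="sites.hg19.bed"
UNMAPPED="sites.unmapped.bed"

# Count BED records in a file, skipping "#" comment lines and blanks.
count_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Only run if the executable and both input files are actually present.
if command -v liftOver >/dev/null 2>&1 && [ -f "$OLD_BED" ] && [ -f "$CHAIN" ]; then
    liftOver "$OLD_BED" "$CHAIN" "$NEW_BED" "$UNMAPPED"
    echo "input records:    $(count_records "$OLD_BED")"
    echo "mapped records:   $(count_records "$NEW_BED")"
    echo "unmapped records: $(count_records "$UNMAPPED")"
else
    echo "liftOver, $OLD_BED, or $CHAIN not found; nothing to run" >&2
fi
```

Comparing the three counts is a quick sanity check: the mapped and unmapped records should together account for the input.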



DOWNLOADING THE CHAIN FILE
As mentioned above, the liftover process requires an applicable .chain file from the UCSC site:

  1. Go to the UCSC Genome Browser Downloads Page.
  2. Find the download section for the species you are working with, then find the build of the genome that you wish to convert from.  Ex. If converting the human genome from hg18 to hg19, go to the downloads section for hg18.
  3. Find the link entitled LiftOver files and click.
  4. Find the appropriate conversion file and click to download (or right click and select Download Linked File).
  5. Once downloaded, use this file when you run the liftOver utility at the terminal prompt. 
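If you prefer to stay at the terminal, the download can be sketched as follows. This assumes the chain files follow the naming pattern I saw for human builds (for example hg18ToHg19.over.chain.gz under the hg18 liftOver directory); the chain_url helper is hypothetical, so confirm the exact file name on the LiftOver files page for your species before relying on it.

```shell
# Build the download URL for a chain file from two build names.
# Assumes the naming pattern <from>To<To>.over.chain.gz, with the
# first letter of the target build capitalized (hg19 -> Hg19).
chain_url() {
    from="$1"
    to="$2"
    first=$(printf '%s' "$to" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$to" | cut -c2-)
    printf 'http://hgdownload.cse.ucsc.edu/goldenPath/%s/liftOver/%sTo%s%s.over.chain.gz\n' \
        "$from" "$from" "$first" "$rest"
}

# Print the URL for the hg18 -> hg19 conversion.
chain_url hg18 hg19

# To fetch and unpack the file (requires network access):
#   curl -O "$(chain_url hg18 hg19)"
#   gunzip hg18ToHg19.over.chain.gz
```

The download and unpack steps are left as comments so that nothing is fetched until you have checked the URL.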

I have successfully used these basic instructions to convert a few relatively large (100 to 200 MB) genome position files. If no large sections are left unmapped, this tool should work without any additional options.  If you encounter any problems, I found the following threads on the UCSC mailing list helpful.


I'm going to be trying to use the conversion tool on some larger whole-genome data files and I will update this post if I encounter any major issues.


I hope that some other amateur computational biologists out there find these brief instructions useful!

Saturday, December 10, 2011

Neandertal Introgression and 1000 Genomes

I've been swamped lately and have been neglecting my blogging this fall.  To try and get back into things slowly, I want to point to a post by John Hawks that discusses the work we are currently doing in our lab with the Neandertal and Denisova genomes.


John has provided some very nice graphics under a Creative Commons license (free for anyone to use) describing his comparisons of data from the 1000 Genomes Project to the Neandertal genome.  Using this large dataset we have been able to replicate the original results of Green and colleagues (2010) and Reich and colleagues (2010), which provided evidence for introgression from Neandertals in all populations outside of Africa.  In addition, this large-scale dataset is providing some unexpected details about the variation of Neandertal ancestry within continental regions.


John goes into more detail about some of the specifics of his recent findings.  There will be more interesting results to come!




References:


Green, R.E. et al. 2010. A Draft Sequence of the Neandertal Genome. Science. 328:710–722.


Reich, D. et al. 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 468:1053–1060.

Tuesday, August 9, 2011

Ancient DNA and human history

I just wanted to point to a recent Nature News article that mentions some of my current research with my advisor, John Hawks.


John makes a really great point in the article that is relevant to science education, both at the high school and college level.


"These genomes are publicly available. There's nothing stopping high-school students from doing this, and the kind of stuff that I'm putting out on my blog is the stuff that a smart high-school student could do." 
Hopefully we will see some ancient genomics popping up at high-school science fairs soon! 

Sunday, July 17, 2011

Whole-genome inference of population histories

A recent paper in Nature (Li and Durbin 2011) introduces a new method to infer population history and effective population size from a single whole genome.  I won't go into great detail because Razib Khan has already posted a very thorough review.  

Reference:
Li, H. and Durbin, R. 2011. Inference of human population history from individual whole-genome sequences. Nature. doi:10.1038/nature10231

Wednesday, July 13, 2011

African genomes

A recent feature in Current Biology (Gross, 2011) discusses the growing awareness of the need for genomic data from Africa.
Now, however, there is an emerging movement within Africa aiming to empower African researchers to participate in the exploration of the treasures of human diversity available on their own continent, and to ensure that medical benefits from such research also reach the native population.
The medical benefits of an increase in the number of genome-wide samples from Africa would be quite large.  As of now, very few Genome-Wide Association Studies (GWAS) have been conducted among African populations. Nonetheless, the potential for GWAS to identify causal loci has been demonstrated by a few recent studies (for example, Jallow et al., 2009).


Gross discusses two projects currently underway in Africa (Southern African Human Genome Programme and Human Heredity and Health in Africa, H3Africa) geared towards understanding the breadth of genomic diversity in Africa and applying this knowledge to medically benefit the people of Africa.
While a large part of the disease burden afflicting Africa is due to avoidable infectious diseases, such as malaria, the continent is also seeing a rise in non-communicable diseases, such as cancer and diabetes. H3Africa aims to improve health in Africa both by adding to the understanding of different susceptibility to both communicable and non-communicable diseases.
I imagine that geneticists across the globe will welcome the insight from these African projects.  Not only do these data stand to greatly benefit medical research in Africa, but also medical research in other populations.
Mutual benefit is also what the funders of the H3Africa project hope for. As NIH director Francis Collins explained in a recent comment in the Huffington Post, “Not only will this [H3Africa project] help people living in Africa, but, since Africa is the cradle of humanity, what is learned about genetic variation and disease likely will have an impact on the health of populations around the globe. [...] Rather than seeing biomedical innovation as something that flows from developed nations to low-income nations, we need to start viewing innovation as a two-way street from which the entire world stands to benefit.”
These benefits also extend to the study of human evolution.  In my own research, for example, I am interested in the evolution of complex (often medically relevant) traits such as those related to human disease and diet.  Disease risk does not always appear to be distributed across the same genetic loci in all human populations.  Particularly where complex genetic pathways are concerned, there is growing evidence for a great deal of parallelism across human populations, meaning that different populations may often have different risk alleles for the same conditions.  Further, a large part of the heritable disease risk is likely caused by extremely rare variants (McClellan and King, 2010).  Identifying the locations of rare causal variants for the same conditions in different populations will give us a much greater ability to identify the shared genetic networks in which diseases occur, and may ultimately give us the potential to discuss disease risk in the context of ancient genomes.




References:
Gross, M. (2011). African genomes. Current Biology, 21(13), R481–R484. doi:10.1016/j.cub.2011.06.047


Jallow, M., Teo, Y. Y., Small, K. S., Rockett, K. A., Deloukas, P., Clark, T. G., Kivinen, K., et al. (2009). Genome-wide and fine-resolution association analysis of malaria in West Africa. Nature Genetics, 41(6), 657–665. doi:10.1038/ng.388


McClellan, J., & King, M.-C. (2010). Genetic Heterogeneity in Human Disease. Cell, 141, 210–217.