Bioinformatics, and more widely computational biology, is a largely data-driven science. The array of high-throughput technology platforms introduced in the last 10 years means that the amount of data being generated in this field is likely to reach exabytes by 2020. The associated challenges are quite different from those posed by the data sets generated in high-energy physics or astrophysics, in that the data tend to be gathered from a wide variety of different providers. Meta-analyses of these data sets can give startling new insights but come with many caveats, in particular that the quality of the data from each provider can be highly variable. I will spend some time talking about one set of experiences I have had dealing with one specific technology platform and, in particular, how they make clear that the detection of bias in data sets is a key element of any high-throughput analysis.