Wednesday, 13 November 2013
The folks at Backblaze published their own field report on HDD failure rates, which is interesting reading for anyone running a data center.
Earlier, I had read about Google's study showing that temperature doesn't significantly affect HDD failure rates, and promptly removed the noisy HDD cooling fans in my Linux box.
Their latest blog post at http://blog.backblaze.com/2013/11/12/how-long-do-disk-drives-last/ has me thinking that some of my colleagues elsewhere who are running Backblaze-like setups should switch to consumer-grade HDDs to save on cost.
I do have an 80 GB Seagate HDD that has survived the years. Admittedly, I am not sure what to do with it anymore, as it is too small (80 GB) to be useful and too big (3.5") to be portable. It was used as a main HDD until its size rendered it obsolete, so it now sits in a USB HDD dock that I use occasionally.
Maybe you can find out the age by looking up the serial number, but I use the SMART data that you can see in Disk Utility in Ubuntu.
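If you'd rather pull the same number from the command line, smartmontools exposes it via `smartctl -A`; here's a minimal sketch of parsing that output in Python (the attribute row layout shown is what smartctl typically prints, but treat the exact column format as an assumption):

```python
import re

def power_on_days(smart_output: str) -> float:
    """Return the Power_On_Hours raw value from smartctl -A output, in days."""
    for line in smart_output.splitlines():
        if "Power_On_Hours" in line:
            # RAW_VALUE is the last column of the attribute row
            hours = int(re.split(r"\s+", line.strip())[-1])
            return hours / 24.0
    raise ValueError("Power_On_Hours attribute not found")

if __name__ == "__main__":
    # On a real system (needs root): smartctl -A /dev/sda
    sample = ("  9 Power_On_Hours          0x0032   091   091   000    "
              "Old_age   Always       -       7536")
    print(f"~{power_on_days(sample):.0f} days powered on")  # 7536 h / 24 = 314 days
```

The sample row is the kind of figure my old drive reports: 7536 hours works out to roughly 314 powered-on days.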
|My ancient 3.5" HDD|
|Powered on only for 314 days!|
Hmm, pretty low mileage for an 80 GB HDD, eh?
Check out this 320 GB IDE HDD
Completely anecdotal, but I have had three Seagate 1 TB HDDs die within a year in a five-disk software RAID array under CentOS. When I checked the powered-on days, SMART said they had been running for 3 years. So I am
1) confused about how SMART data records HDD age,
2) in agreement with Backblaze that HDDs have distinct failure phases (where usage patterns perhaps play less of a role), and
3) guessing that most of the data at Backblaze is archival in nature, i.e. write once and forget until disaster strikes. So it would be great if Backblaze could 'normalize' HDD lifespan against per-drive data access patterns, making the numbers more relevant for a crowd whose usage differs from pure archival storage.
That said, I think it's an excellent piece of reading if you are concerned about using consumer-grade HDDs. Kudos to the Backblaze team, who managed to 'shuck' 5.5 petabytes of raw HDDs to weather the Thailand flood crisis (I wonder how that affected the economics of using consumer-grade drives).
As usual, YMMV applies here. Feel free to use consumer-grade HDDs for your archival needs, but be sure to build redundancy and resilience into your system like the folks at Backblaze.
Monday, 28 October 2013
Sequencing Difficult Templates - Why Quality is Everything
Wednesday, November 6
Time: 1:00 pm AEDT
Josquin Tibbits, PhD,
Senior Research Scientist, Dept of Environment and Primary Industries
For most applications, sequence quality (low error rates, correct library size, even coverage, etc.) stands out as the key metric for the downstream utility of data from NGS platforms. I investigate the quality and utility of data generated from a range of platforms (454, HiSeq, MiSeq and PGM) for the reference-initiated assembly of homopolymer, repeat and low-complexity plant plastid genomes. These types of sequences are a good proxy for the more difficult sequence regions found when exploring larger genomes in both agricultural and human sequencing projects. The analysis will show in detail how the different platforms cope with these challenging regions.
Wednesday, 2 October 2013
"As the previously rapid climb in cost efficiency brought about by next-generation sequencing plateaus, the failure of single-molecule sequencing to deliver might leave some genomics aficionados despondent about the prospects for their field. But a recent Correspondence article in Genome Biology saw Nobel laureate Richard Roberts, together with Cold Spring Harbor’s Mike Schatz and Mauricio Carneiro of the Broad Institute, argue that the latest iteration of Pacific Biosciences’ SMRT platform is a powerful tool, whose value should be reassessed by a skeptical community."
Genome Biology 2013, 14:405
Go to article >>
Wednesday, 28 August 2013
About the Course
Wednesday, 24 July 2013
- assembly of complex genomes (polyploid, containing excessive long repeat regions, etc.),
- accurate transcript assembly,
- metagenomics of complex communities,
- and phasing of long haplotype blocks.
This latest set of data was released on BaseSpace.
|Read length distribution of synthetic long reads for a D. melanogaster library|
image source: http://blog.basespace.illumina.com/2013/07/22/first-data-set-from-fasttrack-long-reads-early-access-service/
With the integration of Moleculo, they have managed to generate ~30 GB of raw sequence data. They have refrained from talking about the 'key analysis metrics' available in the PDF report. Perhaps it's much easier to let the blogosphere and data scientists dissect the new data themselves.
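For a quick DIY look at the read-length distribution (the kind of plot in the figure above), a minimal Python sketch follows; point it at one of the released files such as mol-32-281c.fastq.gz. The demo at the bottom uses a tiny made-up two-read FASTQ rather than the real ~250 MB files:

```python
import gzip
import tempfile
from collections import Counter

def read_lengths(fastq_path):
    """Tally read lengths in a FASTQ file (gzipped or plain text)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    lengths = Counter()
    with opener(fastq_path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # the sequence line of each 4-line FASTQ record
                lengths[len(line.rstrip("\n"))] += 1
    return lengths

if __name__ == "__main__":
    # For the real data: read_lengths("mol-32-281c.fastq.gz")
    # Demo on a tiny two-read FASTQ written to a temp file:
    demo = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGTA\n+\nIIIII\n"
    with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as tmp:
        tmp.write(demo)
    print(read_lengths(tmp.name))  # one read of length 8, one of length 5
```

From the resulting Counter it's a one-liner to dump a histogram into your plotting tool of choice.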
Am wondering when the 454-versus-Illumina long-read side-by-side comparison will pop up.
UPDATE: Can't find the 'key analysis metrics' in the PDF report files. Perhaps they're still being uploaded? *shrugs* Please update me if you see them; otherwise I'll just have to run something on the data myself.
These are the files that I have now
259M Jul 18 01:01 mol-32-2832.fastq.gz
44K Jul 24 2013 FastTrackLongReads_dmelanogaster_281c.pdf
149K Jul 24 2013 mol-32-281c-scaffolds.txt
44K Jul 24 2013 FastTrackLongReads_dmelanogaster_2832.pdf
151K Jul 24 2013 mol-32-2832-scaffolds.txt
253M Jul 24 2013 mol-32-281c.fastq.gz
I have run FastQC (v0.10.1) on both samples; the images below are from 281c.
You can download the full HTML report here.
Reading about the Moleculo sample prep method, it seems to be a rather ingenious way to stitch barcoded short reads into a single long contig. If that is the case, then I am not sure the base quality scores here are meaningful anymore, since each long read is a mini-assembly. This also takes away any quantitative value of the read counts, I presume, so accurate quantification of long RNA molecules or splice variants isn't possible. Nevertheless, it's an interesting development on the Illumina platform. Looking forward to seeing more news about it.
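One way to sanity-check whether those stitched-together quality scores still behave like per-base qualities is to compute summary statistics on them directly. A tiny sketch, assuming Phred+33 encoding (standard for these Illumina FASTQs):

```python
def mean_phred(quality_string: str, offset: int = 33) -> float:
    """Mean Phred score of one FASTQ quality line (Phred+33/Sanger encoding)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)

# 'I' is ASCII 73, i.e. Phred+33 score 40 -- a typical high-quality Illumina call
print(mean_phred("IIII"))  # 40.0
```

If the per-read means sit in a narrow band regardless of position, that would support the suspicion that they are assembly-derived confidence values rather than raw sequencer qualities.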
Other links:
Illumina Long-Read Sequencing Service
Moleculo technology: synthetic long reads for genome phasing, de novo sequencing
CoreGenomics: Genome partitioning: my moleculo-esque idea
Moleculo and Haplotype Phasing - Next Generation Technologist
Abstract: Production Of Long (1.5kb – 15.0kb), Accurate, DNA Sequencing Reads Using An Illumina HiSeq2000 To Support De Novo Assembly Of The Blue Catfish Genome (Plant and Animal Genome XXI Conference)
http://www.moleculo.com/ (no info on this page though)
Illumina Announces Phasing Analysis Service for Human Whole-Genome Sequencing - MarketWatch
First publication using Long Read Seq (LRseq): The genome sequence of the colonial chordate, Botryllus schlosseri | eLife. Contains a diagram explaining the LRseq protocol. This experiment yielded ~1000 6.3 kb fragments.
Friday, 5 July 2013
The ISOs should be helpful if you wish to 'future-proof' your spanking-new application for the latest Windows, or to test existing apps to see if they might break in the new Windows 8.1.
Well, another good reason to use it: I am pretty sure this ain't happening on Mac or Linux.
Microsoft is adding native support for 3D printing as part of the Windows 8.1 update, making it possible to print directly from an app to a 3D printer. The company is announcing the new feature this morning, working with partners including MakerBot Industries, 3D Systems, Afinia, AutoDesk, Netfabb and others.
Go to http://msdn.microsoft.com/en-us/windows/apps/bg182409 now!
Loving the 1.5 Mb/s download speed here.