估算双端测序的有效长度
使用picard软件的CollectInsertSizeMetrics功能进行估算
~/app/pichard/picard CollectInsertSizeMetrics \I=/zfsqd1/ST_LBI/PROJECT/P17Z10200N0166-2/bidui/normal/104.subjunc.bam \O=104_insert_size_metrics.txt \H=104_insert_size_histogram.pdf \M=0.5
然后计算片段的平均长度
然后计算有效长度
计算TPM
countToTpm <- function(counts, effLen){rate <- log(counts) - log(effLen)denom <- log(sum(exp(rate)))exp(rate - denom + log(1e6))}countToFpkm <- function(counts, effLen){N <- sum(counts)exp( log(counts) + log(1e9) - log(effLen) - log(N) )}fpkmToTpm <- function(fpkm){exp(log(fpkm) - log(sum(fpkm)) + log(1e6))}countToEffCounts <- function(counts, len, effLen){counts * (len / effLen)}
################################################################################# An example################################################################################cnts <- c(4250, 3300, 200, 1750, 50, 0)lens <- c(900, 1020, 2000, 770, 3000, 1777)countDf <- data.frame(count = cnts, length = lens)# assume a mean(FLD) = 203.7countDf$effLength <- countDf$length - 203.7 + 1countDf$tpm <- with(countDf, countToTpm(count, effLength))countDf$fpkm <- with(countDf, countToFpkm(count, effLength))with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))countDf$effCounts <- with(countDf, countToEffCounts(count, length, effLength))
说在最后:根据原理,我想大家一定都注意到了,tpm的计算是需要计算插入长度的,而这个矫正在想要比较多个样本的时候,我认为反而不利,究其原因是因为多个不同样本的插入片段长度会有所区别,而我们将其引入tpm的计算得到的结果就是,即使定量的是同一个基因,也会因为测序的不同而导致表达量的不同?
参考文章:FPKM与TPM计算
