Motivation
Every year avian influenza virus causes significant harm worldwide and is a matter of global concern. In recent years, outbreaks of highly pathogenic avian influenza have repeatedly caused panic in Asia. More worryingly, avian influenza can make humans ill. Previous studies have yielded substantial knowledge about the structure of the avian influenza virus and the mechanisms of its treatment. However, the virus is highly variable, which makes vaccines and antiviral drugs rather weak. Birds such as chickens, ducks and geese are the natural hosts of influenza virus and are easily infected by it. We therefore hypothesise that certain sequences exist in bird nucleotide sequences that may be associated with the generation of the avian influenza virus.
Aims: In this project, the main work is locating the conserved regions in influenza virus nucleotide sequences and comparing the similarities between these conserved regions and chicken nucleotide sequences using BLAST. The perfectly matching sequences will be used as the raw data to build phylogenetic trees with the software tool MEGA. Based on the phylogenetic trees, the meaningful sequence segments will be checked against the miRBase database to identify the structure and function of these short sequences. Once confirmed, these sequences will be a great help in understanding influenza virus. They will also improve the human ability to fight influenza by providing a new direction for research on influenza antiviral drugs and vaccines.
Contact: jingxuan.zhang@ncl.ac.uk
Introduction
Humans and the influenza virus have co-existed for at least 400 years (Earn et al., 2002). In 1918, an influenza pandemic (Spanish flu) led to 20 to 100 million deaths, and there were 1 to 1.5 million deaths from H2N2 (Hilleman, 2002). In the USA, there were over 426,000 deaths due to influenza virus between 1972 and 1992 (Klimov et al., 1999). Since the first detection of the Asian lineage of highly pathogenic avian influenza virus (H5N1) in southern China in 1996, H5N1 outbreaks have continued in Asia for the past 13 years (Suzuki et al., 2009). Highly pathogenic avian influenza readily kills poultry. Furthermore, in 2003-2004, H5N1 caused deaths in several species, including ducks and humans (Sturm-Ramirez et al., 2004; Tran et al., 2004). According to a WHO report, 371 people had been infected by H5N1 and 235 of them had died by the end of March 2008. This warned the world that a worldwide outbreak of avian influenza would be a serious hazard to human beings. The study of avian influenza is therefore urgent and necessary. Furthermore, if this project could show that influenza genomes have similarities with chicken genomes, it could provide a new research direction for the vaccine and drug industry. This project is based on the biological hypothesis that viruses arose after cellular life; the main aim is to better understand the relationship between influenza virus and its host, such as the chicken. According to previous research, birds are among the best hosts for influenza viruses (Wang, 2009) and harbour a natural reservoir of influenza viruses (Hey et al.). The genome of the chicken, a modern descendant of the dinosaurs, has been sequenced (International Chicken Genome Sequencing Consortium, 2004).
This gives us an opportunity to carry out whole-genome analyses. In 1944, Frank Macfarlane Burnet showed that influenza virus lost virulence when it was cultured in fertilised hen eggs. Chicken eggs remain the preferred medium for influenza vaccine production, more than 50 years after their first use (Gerdil, 2003; Hay et al., 2001; Palese, 2006). In this project, I will focus on the similarities between chicken genomes and avian influenza genomes by comparing the relatively conserved components of avian influenza virus genomes with chicken sequences, and by trying to find the endogenous retroviruses and the sites close to the transposons of the chicken. The results of the phylogenetic tree analysis will be used to examine our hypothesis that avian influenza virus genomes may be related to the endogenous retroviruses and to the sites close to the transposons.
Influenza virus
Influenza is a negative-strand RNA virus belonging to the Orthomyxoviridae family. According to the antigenicity of their viral nucleoproteins (NP) and matrix proteins (MP), influenza viruses can be divided into three types, A, B and C. Within each type the viruses have a very similar structure (Wang, 2009). Based on differences in haemagglutinin (HA) and neuraminidase (NA), avian influenza viruses can be divided into 15 H subtypes (H1~H15) and 9 N subtypes (N1~N9). HA binds the virus to the host cell and mediates fusion with it. NA disrupts viral aggregation and promotes the release of newly formed virus from infected cells (Hilleman, 2002). Influenza A viruses infect a wide variety of animals, such as humans, horses, pigs, ferrets and birds. Type B viruses, however, are only able to infect mammals and lack the distinct serotypes seen in type A. Finally, influenza C viruses can also only infect mammals, but seldom cause disease. Structurally, types A and B have 8 segments (Figure 1), whereas influenza C viruses consist of single-stranded negative-sense RNA in only 7 segments.
Figure 1. The common structure of influenza virus. The figure shows that the influenza virus normally contains 8 segments: NA, NP, NS, MP, HA, PA, PB1 and PB2. The figure is from: http://www.osc.edu/education/si/projects/flu_virus/index.html.
Unlike many other viruses, whose genome is a single piece of nucleic acid, the influenza virus genome consists of seven or eight pieces of segmented negative-sense RNA, each containing either one or two genes (Bouvier et al., 2008).
The different types have diverse genome structures and different infection abilities (Tang et al., 2005), and types B and C are of lesser importance than type A. Influenza virus has a wide host range that includes waterfowl, gulls, pigs, horses, dogs and many other mammals. Restrictions on host range mean that hosts play different roles in the avian influenza virus ecosystem. For example, type A avian influenza survives well in waterfowl, where it is in evolutionary stasis, so the virus can persist for a long time. In some land-based poultry such as chickens, however, only a limited number of influenza virus subtypes are found. Avian influenza can be classified into two types, low pathogenic and highly pathogenic (Wang, 2009), both of which can be isolated from chickens. Because chickens catch avian influenza virus quite easily, they are treated as mixing vessels. According to Borisenko, the endogenous avian retrovirus family (AEV) has been found in the four Gallus species. Endogenous retroviruses have not only been shown to infect animals through vertical transmission from their parents (Borisenko, 2003; Katzourakis et al., 2005), but have also been shown, in a well publicised case in pigs, to cross the species barrier and infect human cells (Bartosch et al., 2004). We therefore conjecture that influenza viruses may be related to the endogenous retroviruses of birds and to other microbes.
Perspectives on the domestic chicken
The chicken (Gallus gallus) is an important model organism because it bridges the evolutionary gap between mammals and other vertebrates. Chickens are known to have existed in Asia as far back as 5400 BC (Hillier et al., 2004). Genetic analysis of the chicken began at the start of the 20th century (Hillier et al., 2004). The chicken has been widely used in a variety of studies in fields such as virology, oncogenesis and immunology (Stehelin et al., 1976). The chicken karyotype is made up of 38 pairs of autosomes and one pair of sex chromosomes (Hillier et al., 2004).
Conserved regions
Conserved sequences are base sequences in a DNA molecule (or amino acid sequences in a protein) that have remained essentially unchanged during evolution (Attwood and Parry-Smith, 1999). According to Prasad et al. (1990), the sequence conservation of a region indicates functional importance. Strongly conserved regions are an important focus for designing effective remedies with broad-spectrum antiviral activity (Ghosh et al., 2010). High-affinity antibodies against a conserved epitope could provide immunity to diverse influenza subtypes and prevent future pandemic virus infections (Ekiert, 2009). Ghosh et al. (2010) tested 173 H5N1 NA gene sequences and identified a highly conserved 50-base region at the 3′-terminal end of the NA gene.
Endogenous retroviruses
Endogenous retroviruses are ancient genetic parasites that are widely found in a range of vertebrates (Borisenko, 2003; Gifford and Tristem, 2003). They are formed during the integration step of the retrovirus infection cycle (Borisenko, 2003): in this process, the retrovirus integrates its genome into the host genome, producing what is termed a provirus (Borisenko, 2003; Gifford and Tristem, 2003; Katzourakis et al., 2005). The chicken genome includes three groups of avian endogenous retroviruses: the first is the ev loci, the second is the endogenous avian retroviruses, and the last is the human-related type one retroviruses (Borisenko, 2003). Retroviruses and pararetroviruses, which evolved from LTR retrotransposons, can leave and re-enter host cells. They can do this by acquiring new proteins, a process called horizontal or lateral transfer (Doolittle et al., 1989). LTR retrotransposons exist widely in eukaryotic genomes, especially in plants (Ganko et al., 2001), and move via a mechanism quite similar to that used by retroviruses (Boeke, 2003).
Phylogenetic trees
The goal of phylogenetic analysis is to work out the relationships among species, populations, individuals or genes (Lesk, 2005). Phylogenetic trees are useful graphs that give the reader a brief view of evolutionary relationships, and they are therefore widely used in research on evolutionary relationships. The phylogenetic tree has been developed over more than 100 years. In 1859, Charles Darwin introduced the theory that populations evolve over the course of generations through a process of natural selection in his book The Origin of Species, where he used a branching pattern of evolution to illustrate the diversity of life (Darwin, 1859). Both the algorithms and the tools have since improved greatly, so the results of phylogenetic trees are now more reliable. A phylogenetic tree consists of nodes, branches and leaves. Normally, branch lengths are used to measure the dissimilarity between two species, or the length of time since their separation. Trees can be divided into two kinds, rooted and unrooted: the former shows the pattern of descent while the latter shows only the topology of the relationships. By analysing the branch lengths of a rooted tree, researchers can understand the rate of evolution in species or genes. Unrooted trees, however, can only illustrate the change in the number of bases (Jiang, 2003).
MicroRNA
MicroRNAs were discovered by Victor Ambros in 1993 (Lee, 1993). About 500 mammalian miRNA genes are known, and each miRNA may be able to regulate many different protein-coding genes (Williams, 2008). They are small (21 nt to 23 nt) non-coding RNAs that recognise and bind to complementary sites in the 3′ untranslated regions of target genes in animals. At least 40% of microRNA genes lie in the introns of protein-coding and non-protein-coding genes, or in exons (Rodriguez, 2004). Most microRNA genes have their own promoter and regulatory units and, by unknown mechanisms, microRNAs can regulate protein production from the target transcript (Lau et al., 2001; Lee et al., 2004). MicroRNAs are a topic of interest in evolutionary biology as well as in functional genomics.
Methodology overview
Analysis procedure and workflow
ClustalX2 and BioEdit software were used to find the conserved regions of influenza virus.
BLAST was used to check ambiguous conserved region sequences.
The LTR_FINDER database was used to find the LTR sequences in chicken chromosomes.
BLAST was used to compare the similarity between the conserved regions in influenza virus and the chicken genomes.
MEGA software was used to construct the phylogenetic trees.
The microRNA database (miRBase) was used to find microRNAs and related results.
Figure 2. The workflow of the research. The squares represent the result of each step, and the circles stand for the tools that were used. According to the workflow, the research tools include three software packages and three databases. The results fall into four parts: the conserved regions, the similar sequences between influenza virus and chicken genome sequences, the phylogenetic trees, and the results of checking the miRBase database.
Software
The core of this project uses three software tools:
ClustalX2 is a tool for multiple alignment of protein and nucleotide sequences. It can also be used to prepare phylogenetic trees (Jeanmougin et al., 1998). It is freely available at: http://www.clustal.org/download/current/.
BioEdit is powerful software for sequence editing and sequence analysis on Windows systems (Hall, 1999). It is freely available to download at: http://www.softii.com/downinfo/9529.html
MEGA is short for Molecular Evolutionary Genetics Analysis software. The first version of MEGA was released in 1993; the latest version at present is MEGA5. MEGA is widely used for reconstructing the evolutionary histories of species and multi-gene families and for estimating rates of molecular evolution. In this project MEGA will be used to build phylogenetic trees (Kumar, 2004); it is available at: http://www.megasoftware.net/
Data collection and processing
Sample selection is the first and most important step in preparing the research that follows, so the research data must be credible. For this reason, all data in this project will come from the EMBL database. The sample should satisfy two criteria: universality and representativeness. In sample selection, the genomes of influenza type A viruses will be chosen rather than type C, because type A viruses have 8 segments in the whole genome, whereas type C viruses have only 7 and so cannot fulfil the universality criterion. Type C viruses are therefore not a good research object in this project and will be discarded. The avian influenza nucleotide sequences will be downloaded from: http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html. In this influenza virus sequence database, protein or nucleotide sequences can be retrieved using GenBank accession numbers or search terms. The influenza sequences are annotated with type, host, country/region, segment and subtype (see Table 1). The sequence types that can be selected are protein, protein coding region and nucleotide.
Table 1. Definition of the search set. It shows the details of the sample selection. In this project, all samples are type A H5N1 influenza viruses from Asia.
Type   Host    Country/Region   Segment   Subtype (H)   Subtype (N)
A      Avian   Asia             Any       5             1
The search returned 13712 nucleotide sequences; the next step was to select samples by region of origin. Finally, 31 samples were selected as the research sample. Each sample includes eight segments and is from China. The chicken nucleotide sequence was downloaded from http://www.ncbi.nlm.nih.gov/projects/genome/guide/chicken/. The database provides only chromosomes 1 to 28 and 32, together with other chromosomes such as W, Z, MT, LGE22C19W28_E50C23 and LGE64. In this project, chromosomes 1 to 28 and 32 were selected, and all resources were stored on the local machine for further analysis.
Finding conserved regions in influenza viruses
Multiple sequence alignment is a widely used bioinformatics analysis method. Many tools for multiple sequence alignment have become available since the first tool was written by Higgins in 1988 (Higgins, 1988). The first tool had limited computing ability, but there are now several successors such as Clustal W (Thompson et al., 1994) and Clustal X (Thompson et al., 1997). T-Coffee, MAFFT and MUSCLE can also be used for multiple alignment (Larkin et al., 2007). To find conserved regions, the ClustalX2 software will be used, because its alignment algorithm is optimised for sets of sequences that are entirely collinear, that is, sequences that have the same domains in the same order. If the sequences do not meet this condition, the software will give unreliable alignments. Therefore, when aligning influenza viruses, each segment has to be aligned separately, and the sequences of a segment have to be saved in the same file. After multiple alignment of each segment across the different influenza viruses, the results mark the identical positions with the symbol "*" along the whole sequence (see Figure 3).
Figure 3. Example of multiple sequence alignment. On the left-hand side of the main window is the sequence ID information (GI number and GenBank number). At the top of the main window, the symbol "*" means that the bases in the different sequences are the same in that column. The horizontal ruler at the bottom of the window shows the position within the multiple sequence alignment.
ClustalX2 cannot itself pick out the conserved regions. In this part, the conserved regions are therefore found with the 'Find Conserved Regions' option in BioEdit, a biological sequence editor intended to provide basic functions for nucleic acid sequence editing, alignment, manipulation and analysis. Several values have to be set for finding conserved regions; they are shown in Figure 4.
Figure 4. Settings for finding conserved regions in an alignment. It shows the values set to find conserved regions with the BioEdit tool. Gaps are the main factor affecting the calculation of the information value, which in turn affects the length of the conserved regions. The number of gaps allowed in any segment should therefore be limited; an appropriate gap setting is a great help in locating the true conserved regions.
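As an illustration of what locating conserved regions involves, the following Python sketch scans an alignment column by column and reports runs of columns in which every sequence carries the same base and no gap. It is a simplified stand-in and not BioEdit's actual 'Find Conserved Regions' procedure; the function name find_conserved_regions, the min_length parameter and the toy alignment are assumptions made for this example.

```python
def find_conserved_regions(alignment, min_length=15):
    """Return 1-based inclusive (start, end) runs of fully conserved columns.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    min_length: shortest run to report (an illustrative threshold).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    regions = []
    run_start = None
    # Iterate one step past the end so that a run reaching the last
    # column is closed and reported as well.
    for col in range(length + 1):
        conserved = False
        if col < length:
            column = {seq[col].upper() for seq in alignment}
            conserved = len(column) == 1 and "-" not in column
        if conserved:
            if run_start is None:
                run_start = col
        elif run_start is not None:
            if col - run_start >= min_length:
                regions.append((run_start + 1, col))
            run_start = None
    return regions


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = ["ACGTACGTAA",
           "ACGTTCGTAA",
           "ACGTACGTA-"]
    print(find_conserved_regions(toy, min_length=4))
```

Requiring complete identity is stricter than an entropy-based criterion, so a real analysis would be expected to tolerate some variation and a limited number of gaps, as the settings in Figure 4 do.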
Investigating the LTR sequences in chicken chromosomes
The main aim of this step is to find the LTR sequences and retrotransposons in chicken nucleotide sequences. According to Kazazian and Moran (1998), non-LTR retrotransposons are less well understood mechanistically. In this project I will therefore use LTR sequences to locate the retrotransposons. LTR_FINDER will be used to find LTR sequences in chicken chromosomes. LTR_FINDER can scan large-scale sequences quickly and, for a given DNA sequence, can accurately predict the location and structure of full-length LTR retrotransposons (Zhao et al., 2007). The tool is available at: http://tlife.fudan.edu.cn/ltr_finder/. Based on the LTR_FINDER results, the chicken nucleotide sequences that contain retrotransposons will be selected for the BLAST search.
Comparing the similarity between the conserved regions in influenza virus and the chicken genomes
Comparing the similarity is the core part of this research. The conserved region files are compared with the selected chicken chromosome sequences using NCBI BLAST, and the results show the similar sequences between the two species. The results include the location of the similar sequences, the sequence matching information, E-value, Score, Identities and Gaps. The BLAST results will be saved as HTML documents on the local machine, and the files that contain the more similar sequences (E-value ≤ 0.01) will be picked out for phylogenetic analysis, because an E-value below 0.01 suggests that the sequences may be homologous.
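The E-value cut-off can also be applied programmatically. The sketch below assumes that the hits have been exported in the tab-separated format with the twelve standard columns produced by BLAST+ with "-outfmt 6", which differs from the HTML files described above; the function name filter_hits and the example rows are invented for illustration and are not results from this project.

```python
import csv
import io

# Standard column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, max_evalue=0.01):
    """Return the hits (as dicts) whose E-value is at most max_evalue."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) != len(COLUMNS) or row[0].startswith("#"):
            continue  # skip blank, comment or malformed lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


# Invented rows for illustration only; not data from this study.
EXAMPLE = (
    "region_1\tchicken_contig_A\t100.00\t24\t0\t0\t1\t24\t5001\t5024\t0.004\t44.1\n"
    "region_1\tchicken_contig_B\t95.00\t20\t1\t0\t3\t22\t910\t929\t2.1\t30.2\n"
    "region_2\tchicken_contig_C\t100.00\t30\t0\t0\t1\t30\t77\t106\t1e-05\t56.5\n"
)

if __name__ == "__main__":
    for hit in filter_hits(EXAMPLE):
        print(hit["qseqid"], hit["sseqid"], hit["sstart"], hit["send"], hit["evalue"])
```

Keeping each hit as a dictionary preserves the subject coordinates, which are needed later to locate the matching chicken sequences.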
Building phylogenetic trees for evolutionary analysis
Phylogenetic trees have been widely used for evolutionary analysis, to understand the evolutionary relationships among different and similar species. There are three main methods for building phylogenetic trees: the minimum evolution method, the maximum parsimony method and the neighbor-joining (NJ) method (Saitou and Nei, 1987). The standard tree-making algorithms are based on the principle of examining all possible topologies, or a certain number of topologies (Saitou and Nei, 1987), and choosing as the final tree the one with the smallest total amount of evolutionary change, which is likely to be close to the true tree. However, such methods take a long time to build phylogenetic trees, and only small numbers of topologies are examined (Saitou and Nei, 1987), so the results are sometimes not very satisfactory. According to previous studies, some methods are not guaranteed to produce the minimum evolution tree but search for it much more efficiently than the maximum parsimony algorithms, for example the Distance Wagner (DW) method (Farris, 1972), the Modified Farris (MF) method (Tateno et al., 1982) and the neighborliness methods of Sattath and Tversky (Sattath and Tversky, 1977). In 1987, Naruya Saitou and Masatoshi Nei introduced the neighbor-joining (NJ) method for reconstructing phylogenetic trees; it produces the final tree under the principle of minimum evolution and is quite efficient in obtaining the correct tree topology (Saitou and Nei, 1987). Therefore, all the phylogenetic trees in this project will be built with the neighbor-joining method.
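To make the neighbor-joining principle concrete, the following Python sketch implements the usual textbook formulation: at each step it joins the pair of nodes that minimises the criterion Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), where r is the sum of distances from a node to all others, and then replaces the pair with a new internal node. It is a minimal illustration and not MEGA's implementation; the function name, the tuple representation of internal nodes and the small distance matrix are assumptions made for the example.

```python
def neighbor_joining(names, matrix):
    """Neighbor-joining on a symmetric distance matrix.

    Returns (joins, last_edge). joins is a list of
    (node_a, node_b, length_a, length_b); each join creates the internal
    node (node_a, node_b). last_edge is (x, y, length) for the final edge.
    """
    if len(names) < 3:
        raise ValueError("at least three taxa are required")
    nodes = list(names)
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # r(a): sum of distances from a to every other current node.
        total = {a: sum(d[a][b] for b in nodes) for a in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to the new internal node.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = (a, b)
        d[new] = {new: 0.0}
        for k in nodes:
            if k != a and k != b:
                dist = 0.5 * (d[a][k] + d[b][k] - d[a][b])
                d[new][k] = dist
                d[k][new] = dist
        nodes = [k for k in nodes if k != a and k != b] + [new]
        joins.append((a, b, len_a, len_b))
    x, y = nodes
    return joins, (x, y, d[x][y])


if __name__ == "__main__":
    # Small made-up distance matrix for five taxa.
    taxa = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    joins, last = neighbor_joining(taxa, dist)
    for step in joins:
        print(step)
    print(last)
```

The result is an unrooted tree described by its joins; ties in Q are broken here by taking the first pair encountered, which is an arbitrary choice.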
The sequences are first aligned, and the number of substitutions per site (the evolutionary distance) between sequences is calculated from the proportion of differences between them. Gaps may be present in any sequence and will be excluded from the analysis. Finally, the tree will be constructed by the neighbor-joining method and tested by the bootstrap method. In the phylogeny test, the bootstrap method is used to check the stability of the topology under perturbation in the neighbor-joining and parsimony methods. The test will be repeated 1,000 times. The Kimura 2-parameter model is used to correct for multiple hits, taking into account transitional and transversional substitution rates and the differences in substitution rates among sites. To deal with missing data and alignment gaps, the pairwise deletion option will be used to remove these positions from the analysis.
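The distance calculation described above can be sketched as follows. Under the Kimura 2-parameter model the distance between two aligned nucleotide sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The Python function below is an illustrative sketch, not MEGA's code: the name k2p_distance is an assumption, pairwise deletion is modelled by skipping any site at which either sequence has a gap or an ambiguous base, and rate variation among sites is not modelled.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # pairwise deletion: ignore gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    a = 1 - 2 * p - q
    b = 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(a * math.sqrt(b))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

With very short alignments, such as conserved regions of about 20 bases, a single substitution changes the distance considerably, so the resulting branch lengths should be read with caution.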
Assessing confidence in phylogenetic analysis
The bootstrap method is widely used in phylogenetic analyses. Bootstrapping was originally applied in phylogenetics to assess the repeatability of a given result (Hillis, 1993); it is now generally interpreted as assessing the probability that a phylogenetic estimate represents the correct phylogeny (Hillis, 1993). As far back as 1985, Felsenstein suggested that the bootstrap could be used as a statistical test to assess confidence limits on the internal branches in a phylogenetic analysis. He pointed out that the characters in a taxa × characters matrix can be sampled with replacement to create many new matrices of the same size as the original, each of which can be used to find the best tree (Felsenstein, 1985). According to earlier research (Hillis, 1993), an internal branch with a bootstrap proportion greater than 80% is likely to define a true clade, and more than 95% of the estimated clades with bootstrap proportions above 70% were correct, whereas fewer than 10% of the estimated branches with bootstrap proportions below 30% were correct (Hillis, 1993). Generally, if the bootstrap value is greater than 95%, the topology can be considered correct (Jiang, 2003), which means that the results of the phylogenetic tree are reliable. Once bootstrap proportions have been obtained for the sequences, the sequences are selected for microRNA analysis regardless of whether their bootstrap proportions are high.
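The resampling step of the bootstrap can be sketched in Python as follows: alignment columns (the characters) are drawn with replacement to build replicate alignments of the original size, and the bootstrap proportion of a clade is the percentage of replicates whose tree contains it. The function names and the supports_clade callback are hypothetical; in practice the callback would rebuild a tree from each replicate, for example by neighbor-joining, and test whether the clade of interest is present.

```python
import random


def bootstrap_replicates(alignment, replicates=1000, seed=None):
    """Yield replicate alignments made by resampling columns with replacement."""
    rng = random.Random(seed)
    n_cols = len(alignment[0])
    for _ in range(replicates):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        # Apply the same sampled columns to every sequence so that
        # each resampled character stays intact across taxa.
        yield ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_proportion(alignment, supports_clade, replicates=1000, seed=None):
    """Percentage of replicates for which supports_clade(replicate) is true."""
    hits = sum(1 for rep in bootstrap_replicates(alignment, replicates, seed)
               if supports_clade(rep))
    return 100.0 * hits / replicates


if __name__ == "__main__":
    # Toy alignment and a stand-in test (first two sequences identical),
    # both invented for illustration only.
    toy = ["ACGTACGT", "ACGTACGA", "TCGAACGA"]
    print(bootstrap_proportion(toy, lambda rep: rep[0] == rep[1],
                               replicates=1000, seed=1))
```

Fixing the seed makes a run repeatable, which is convenient when bootstrap values are compared between analyses.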
Searching for microRNAs using the miRBase database
MicroRNAs may be closely related to certain sequences identified by a phylogenetic tree. The meaningful sequences in the chicken genome will therefore be selected and tested to see whether they are microRNAs. This will give a better understanding of the function of those sequences and make the relationship between the conserved regions and the chicken genome clearer. This part of the work depends on the miRBase database, which stores microRNA (miRNA) nomenclature, sequence data, annotation and target prediction. The database contains 5,071 miRNA loci from 58 species and 5,922 distinct mature miRNA sequences (Griffiths-Jones et al., 2008), providing researchers with a large amount of information for studying miRNA genomics. All miRNAs are mapped to their genomic coordinates, and sequences that contain miRNAs are highlighted. The miRBase search function accepts searches not only by miRNA name but also by genomic location and by sequence. Because the sequences are quite short and have already been selected, the search will be based on sequence.
Results and discussion
The conserved regions in influenza virus
Using ClustalX2 and BioEdit, the conserved regions in each segment were selected. In total, 24 conserved regions were found in HA, 19 in MP, 17 in NA, 16 in NP, 16 in NS, 20 in PA, 28 in PB1 and 9 in PB2. Table 2 shows the location of each conserved region in the multiple sequence alignment.
Table 2. The locations of the conserved regions in influenza virus. It shows the location of each conserved region, based on multiple sequence alignment, in each influenza virus segment. The results have been checked with BLAST, and the ambiguous conserved regions have been discarded.
No.  HA         MP         NA         NP         NS         PA         PB1        PB2
1    90-111     67-86      39-60      87-109     79-130     96-112     68-82      109-128
2    113-147    88-108     86-103     111-127    132-202    114-130    97-114     166-188
3    210-225    115-137    399-415    132-154    215-235    132-151    116-135    475-506
4    235-249    199-236    418-445    249-277    237-253    165-181    137-153    823-839
5    340-355    261-278    513-538    345-364    264-283    339-358    173-195    1246-1262
6    464-480    280-299    555-577    411-430    291-308    411-427    247-279    1393-1409
7    498-517    510-532    603-625    723-766    375-400    513-528    299-318    1870-1898
8    597-612    543-559    636-664    852-871    425-449    570-586    320-357    2128-2150
9    645-660    572-607    666-682    1023-1039  460-483    609-634    416-435    2221-2300
10   758-777    624-640    717-736    1164-1186  516-538    636-666    476-498    -
11   803-855    660-679    750-766    1335-1351  558-581    717-734    545-582    -
12   944-964    769-784    873-892    1392-1408  584-607    741-764    637-654    -
13   984-1005   810-833    967-982    1428-1447  625-643    766-799    743-759    -
14   1092-1138  849-889    1149-1166  1449-1489  723-747    978-1003   875-891    -
15   1206-1264  891-906    1224-1264  1491-1516  853-872    1356-1372  917-936    -
16   1266-1285  911-952    1371-1397  1539-1555  889-909    1521-1537  1079-1095  -
17   1332-1351  961-983    1399-1447  -          -          1887-1903  1112-1128  -
18   1353-1381  1002-1019  -          -          -          2109-2125  1151-1179  -
19   1398-1426  1021-1038  -          -          -          2145-2167  1274-1290  -
20   1473-1494  -          -          -          -          2169-2191  1376-1395  -
21   1496-1524  -          -          -          -          -          1416-1434  -
22   1526-1568  -          -          -          -          -          1535-1557  -
23   1575-1609  -          -          -          -          -          1643-1665  -
24   1611-1645  -          -          -          -          -          1697-1731  -
25   -          -          -          -          -          -          1811-1830  -
26   -          -          -          -          -          -          2069-2085  -
27   -          -          -          -          -          -          2090-2115  -
28   -          -          -          -          -          -          2267-2314  -
The BLAST results of comparing the conserved regions and chicken genomes
The E-value is an important factor used to assess the similarity between biological sequences: the lower the E-value, the more significant the match. The value depends on the score given to the alignment and on the lengths of the sequences being compared. The conserved region sequences are not long enough to give very low E-values, so the meaningful BLAST results have E-values only up to 0.01. Table 3 shows the most significant results in this project. The positions of these sequences will be helpful for understanding sequence evolution.
Table 3. The important sequences found using BLAST. It presents the location of the chicken sequences that obtained the lower E-values with BLAST. The row in red is the most significant result, and its corresponding phylogenetic tree is illustrated in Figure 7.
Reference No.     Location             E-value
NW_001471565.1    1136006-1136043
NW_001471532.1    418524-418543
NW_001471651.1    7410711-7410737
NW_001471677.1    1993027-1993046
NW_001471673.1    15355501-15355530
NW_001471723.1    1285767-1285790
NW_001471505.1    928131-928166
NW_001471629.1    307181-307203
NW_001471720.1    8212308-8212333
NW_001471437.1    207986-208002
NW_001471447.1    86580-86601
The phylogenetic tree results
In constructing the phylogenetic trees, the data were cut off at a BLAST E-value of ≤ 0.01 when comparing the conserved regions in influenza virus with the chicken genomes. BioEdit finds many conserved regions in the same influenza virus segment; for example, 26 conserved regions were found in the HA segment and 32 in the PB1 segment. When BLAST is used to compare the conserved regions with the chicken genomes, the same conserved region produces many perfect matches in different sequences, so in the multiple sequence alignment the short perfectly matching chicken sequences cluster at the conserved region they were compared with. When building phylogenetic trees, these clustered regions have to be analysed separately to prevent information from being lost. If the perfectly matching sequences were analysed together across the whole sequence, they would have to be extended to the same length, which would lead to mismatches, and the short perfectly matching sequences might not stay at their original positions. In this project, therefore, the same influenza virus segment gives rise to many phylogenetic trees. Finally, 91 phylogenetic trees were constructed. Among these, 16 were in the HA segment, 10 in MP, 9 in NP, 12 in NS, 11 in PA, 11 in PB1 and 8 in PB2. However, only 52 of the phylogenetic trees could be used to analyse the evolutionary relationship between influenza virus and the chicken genome. Figure 7 shows an example phylogenetic tree.
[Figure 7: phylogenetic tree of the HA segment.
Influenza virus sequences (31): gi|85062564|gb|DQ343150|, gi|115502972|gb|DQ997283|, gi|225165033|gb|EU874899|, gi|115382807|gb|DQ997122|, gi|224181181|gb|FJ784854|, gi|268527184|gb|GU182142|, gi|268527180|gb|GU182158|, gi|61698013|gb|AY950230|, gi|115382881|gb|DQ997182|, gi|57915979|gb|AY737289|, gi|116583103|gb|DQ997538|, gi|61698021|gb|AY950234|, gi|115502957|gb|DQ997268|, gi|85062568|gb|DQ343152|, gi|50956627|gb|AY684706|, gi|47716772|gb|AY609312|, gi|85062566|gb|DQ343151|, gi|61698017|gb|AY950232|, gi|115502990|gb|DQ997377|, gi|115503009|gb|DQ997547|, gi|57916028|gb|AY737296|, gi|115397012|gb|DQ997308|, gi|61698019|gb|AY950233|, gi|225548021|gb|DQ914814|, gi|86753761|gb|DQ366330|, gi|224181179|gb|FJ784853|, gi|115382830|gb|DQ997133|, gi|224181167|gb|FJ784847|, gi|115502941|gb|DQ997156|, gi|50365728|gb|AY653200|, gi|224181175|gb|FJ784851|.
Chicken sequences (5): ref|NW_001471505.1| (Gga18 WGA25), ref|NW_001471565.1|, ref|NW_001471675.1|, ref|NW_001471428.1|, ref|NW_001471521.1|.
Node labels (bootstrap values): 39, 54, 96, 37, 25. Scale bar: 0.05.
Legend: flu virus sequence; significant sequence; chicken sequence.]
Figure 7. A meaningful result for the HA segment. The phylogenetic tree was built using the Neighbor-Joining method (Saitou and Nei, 1987); the optimal tree, with a sum of branch lengths of 0.93668520, is shown. The percentages of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) are shown next to the branches. Branch lengths are in the same units as the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were calculated using the Kimura 2-parameter method (Kimura, 1980). The analysis involved 36 nucleotide sequences. All ambiguous positions were removed for each sequence pair. There were a total of 19 positions in the final dataset. The sequence shown in red is the significant sequence found in this analysis; its bootstrap value is 96, so the result is considered reliable. It will be used to search the miRBase database for similar microRNAs. The evolutionary analyses were conducted in MEGA (Tamura, 2007).
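As an illustration of the distance measure named in the caption, the following is a minimal sketch of the Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among the compared sites. It is not MEGA's implementation; the function name `k2p_distance` and the example sequences are hypothetical, and positions with gaps or ambiguous characters are skipped per pair, mirroring the pairwise removal mentioned in the caption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous character
    are ignored for this pair (pairwise deletion).
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: skip this site
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the formula to be defined (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # One transition (C -> T) in ten comparable sites.
    print(k2p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

With only 19 positions in the final dataset, a single substitution changes P or Q by about 0.05, so distances between such short sequences are coarse; this is consistent with the weak bootstrap support reported for many of the trees.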
All trees were produced using the neighbor-joining (NJ) method of Saitou & Nei (1987), and most of them divide into two main lineages, one containing the influenza virus sequences and the other the chicken sequences. Because the tested sequences are not very long, the NJ method sometimes cannot give strong bootstrap support for the group of interest. Weakly supported sequences include NW001471693.1 (20%), NW001471743.1 (12%) and NW001471581.1 (26%), and some other sequences had even lower bootstrap values. Sequences with bootstrap support between 30% and 70% include NW001471685.1 and NW001471589.1 (35%), NW001471521.1 (64%), NW001471532.1 (55%), NW001471700.1 (59%) and NW001471431.1 (63%). Only two results have strong bootstrap values: NW001471505.1 (96%) and NW001471562.1 (73%). According to the phylogenetic trees, most of the chicken sequences form a monophyletic out-group at the base of the tree. Where the tested chicken sequences fall within the in-group, most of them are placed as sister taxa to influenza virus sequences (see Table 4). They may be paralogous, which suggests that gene duplication has occurred in the evolutionary relationship of the chicken sequences within each individual species of influenza virus.
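The grouping by bootstrap support used in this paragraph can be sketched as follows. The class boundaries (below 30% weak, 30% to 70% intermediate, above 70% strong) are read off the text and are an assumption about the exact cut-offs; the function names are hypothetical, and the dictionary holds only a few of the values quoted above.

```python
def support_class(bootstrap_percent):
    """Classify a bootstrap value using the boundaries assumed from the text."""
    if bootstrap_percent < 30:
        return "weak"
    if bootstrap_percent <= 70:
        return "intermediate"
    return "strong"


def group_by_support(bootstrap_values):
    """Return {class name: sorted list of sequence IDs}."""
    groups = {"weak": [], "intermediate": [], "strong": []}
    for seq_id, value in bootstrap_values.items():
        groups[support_class(value)].append(seq_id)
    return {name: sorted(ids) for name, ids in groups.items()}


# A subset of the bootstrap values (in percent) quoted in the paragraph above.
QUOTED = {
    "NW001471693.1": 20,
    "NW001471743.1": 12,
    "NW001471521.1": 64,
    "NW001471700.1": 59,
    "NW001471505.1": 96,
    "NW001471562.1": 73,
}

if __name__ == "__main__":
    for name, ids in group_by_support(QUOTED).items():
        print(name, ids)
```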
Table 4. The sister taxa sequences in the phylogenetic trees. The table lists the sister taxa found in the different segments by analysing the phylogenetic trees. The different colours, from top to bottom, stand for the different segments, and each pair of columns gives an influenza virus sequence and the chicken sequence that is its sister taxon. There are 11 in HA, eight in MP, six in NA, three in NP, three in NS, four in PA, 12 in PB1 and three in PB2.
Flu virus ref    Chicken ref      Flu virus ref    Chicken ref
gi|115382807     NW001471723.1    gi|115503009     NW001471609.1
gi|268527184     NW001471645.1    gi|85062568      NW001471563.1
gi|115503009     NW001471595.1    gi|50365728      NW001471551.1
gi|268527184     NW001471457.1    gi|47716772      NW001471591.1
gi|57916028      NW001471670.1    gi|85062566      NW001471651.1
gi|115502990     NW001471426.1    gi|115502955     NW001471667.1
gi|224181089     NW001471590      gi|268527099     NW001471552.1
gi|47716778      NW001471555      gi|224181095     NW001471505.1
gi|50956633      NW001471667      gi|115502955     NW001471458.1
gi|85692680      NW001471458      gi|86753781      NW001471654.1
gi|57916045      NW001471743.1    gi|115502963     NW001471640.1
gi|224181356     NW001471455.1    gi|85681814      NW001471447.1
gi|57916045      NW001471545.1    gi|86753771      NW001471521.1
gi|115397016     NW001471441.1    gi|61698069      NW001471627.1
gi|115502978     NW001471615.1    gi|116583109     NW001471697.1
gi|115503015     NW001471551.1    gi|115502970     NW001471588.1
gi|116583112     NW001471454.1    gi|86753751      NW001471720.1
gi|115397021     NW001471581.1    gi|61698110      NW001471509.1
gi|115503000     NW001471562.1    gi|117414791     NW001471532.1
gi|115397022     NW001471535.1    gi|224181254     NW001471728.1
gi|268527236     NW001471627.1    gi|115382877     NW001471445.1
gi|47716768      NW001471531.1    gi|224181236     NW001471700.1
gi|115382827     NW001471680.1    gi|115397022     NW001471503.1
gi|57915965      NW001471549.1    gi|85692710      NW001471641.1
gi|50956621      NW001471437.1    gi|115502953     NW001471698.1
Furthermore, the sequences in the out-group usually showed strong bootstrap values, sometimes 99% or 100%, which suggests that gene evolution is quite common within the same species.
The results given by the miRBase database
Based on the phylogenetic results, 52 important short sequences were selected for microRNA analysis. The miRBase database shows that 11 short sequences may be related to chicken microRNAs and 32 short sequences may have an evolutionary relationship with microRNAs of other species. Table 5 shows the important sequences, which were generated by comparing the different segments of the influenza virus.
The phylogenetic trees identify the significant sequences in the chicken genome. The matched nucleotide sequences were taken from the earlier BLAST results and checked against the miRBase database; where a sequence matched a microRNA, the related microRNA was reported. The following table gives the details of the possibly related microRNAs found in miRBase for the chicken sequences.
Table 5. Details of the similar microRNAs found by checking the significant short sequences. According to miRBase, these sequences are very similar to microRNAs in the chicken genome; the similar microRNAs are identified by their mature sequence accession numbers. The related protein IDs were taken from the EMBL database, and the protein functions were predicted using the Gene Ontology database.
Mature sequence accession   Related region   Related protein ID                                            Protein function
MIMAT0007714                Intron           ENSGALP00000038644, ENSGALP00000038645, ENSGALP00000009867    nucleobase, nucleoside, nucleotide and nucleic acid metabolic process; ATP binding; ATP-dependent DNA helicase activity; DNA binding; hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides
MIMAT0001112                Intergenic       Unknown                                                       None
MIMAT0007746                Intron           ENSGALP00000002103, ENSGALP00000040260                        signal transducer activity; receptor activity; regulation of biological process; protein signaling pathway; membrane
MIMAT0007472                Intergenic       Unknown                                                       None
MIMAT0011205                Intergenic       Unknown                                                       None
MIMAT0007569                Intron           ENSGALP00000007732                                            RNA polymerase II transcription mediator activity; regulation of biological process; protein binding; transcription regulator activity; biological process
MIMAT0007526                Intergenic       Unknown                                                       None
MIMAT0007470                Intergenic       Unknown                                                       None
MIMAT0007339                Intergenic       Unknown                                                       None
MIMAT0007676                Intron           ENSGALP00000011596                                            negative regulation of gene expression; histone acetyltransferase activity; transcription repressor activity; positive regulation of transcription from RNA polymerase II promoter
MIMAT0003774                Intergenic       Unknown                                                       None
According to these results, the related microRNAs are located either in introns or in intergenic regions. An intron is a sequence within a gene that is removed during RNA processing; although it is part of the gene, it is not translated into protein. The non-coding sections in an intron can be transcribed into precursor mRNA and some other RNAs. Hence, establishing the function of these sequences and their related proteins would greatly help in understanding the relationship between the conserved regions of the influenza virus and the chicken genome sequences. By checking the EMBL database, the proteins related to the introns were found, and their functions were predicted using the Gene Ontology database. However, the databases do not show functional details for the microRNAs located in intergenic regions, and no related protein has been found for them so far. According to Pfeffer et al. (2005), a few mammalian viruses have been found to be associated with miRNAs, including Epstein-Barr virus (Pfeffer et al., 2004) and Kaposi's sarcoma-associated herpesvirus (Cai et al., 2005). Furthermore, Lecellier and colleagues suggested that endogenous miRNAs can affect antiviral defence mechanisms (Lecellier et al., 2005).
Conclusion of the results
In this project, 140 conserved regions were identified. Using BLAST to compare these 140 conserved regions with the chicken genome, I found 247 matching results with an E-value of less than 0.01. There are 11 matching results with an E-value in the range between and. Using the MEGA tool, 91 phylogenetic trees were built, and the topology of 52 of them reflected an evolutionary relationship. Because 50 sister taxa were identified in these 52 trees, the sequences may be related as paralogs. Checking the miRBase database showed that 32 sequences may be related to microRNAs in other species and 11 sequences may be related to chicken microRNAs. However, the functions of some of these regions cannot yet be established, because the database lacks supporting experimental evidence. All of the selected chicken sequences lie close to a retrotransposon, and, taking the phylogenetic trees into account, I consider that the positions close to the retrotransposon in the chicken genome have an evolutionary relationship with chicken genomes. However, I cannot guarantee that this conclusion is entirely correct, as further work is needed to support it.
Conclusions
Limitations of methods and improvements
In this project, BioEdit was a powerful tool for editing sequences, and much of the work was based on it; for example, it was used to find conserved regions, select sequences and convert file formats. However, BioEdit has some limitations in finding conserved regions. Smagala (2005) reported that the 'Find Conserved Regions' option in BioEdit does not cope well with detecting conserved regions when not all records contain a full-length sequence. This is because BioEdit uses entropy plots to measure the lack of information content at each position, an idea based on Claude Shannon's information theory.
$$H(l) = -\sum_{b} f(b, l) \log_2 f(b, l) \qquad (1)$$
Equation 1 is the Shannon entropy, where $H(l)$ is the uncertainty (entropy) at position $l$, $b$ stands for a residue, and $f(b, l)$ is the probability with which residue $b$ is found at position $l$. With the entropy scale normalised so that the range for a four-base alphabet is [0, 2], the entropy reported by BioEdit may nevertheless reach as high as 2.32. Therefore, when the sequences include gaps or ambiguous characters, the entropy is high, and high entropy causes sequence information to be lost. Although the threshold settings (Table 2) help to improve the accuracy of finding conserved regions, it is still very difficult for a researcher to choose appropriate values, because the sequence lengths and the numbers of gaps and ambiguous characters are not known in advance. To improve on this, conserved regions could be found with a tool that normalises the entropy scale differently; the ConFind software (Smagala, 2005) would be the best choice. Alternatively, other software could be developed to simulate the value settings in BioEdit and help the user choose the best values.
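The effect of gaps on the entropy can be shown with a minimal sketch of Shannon entropy per alignment column. It is not BioEdit's or ConFind's code; it assumes a base-2 logarithm and that every character in a column, including the gap character `-`, counts as its own symbol. The function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Every distinct character, including '-', is treated as a symbol.
    """
    total = len(column)
    counts = Counter(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy of each column of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


if __name__ == "__main__":
    print(column_entropy("AAAA"))   # fully conserved column: 0.0
    print(column_entropy("ACGT"))   # four equally frequent bases: 2.0
    print(column_entropy("ACGT-"))  # gap as a fifth symbol: about 2.32
```

Under these assumptions a column with four equally frequent bases reaches the nominal maximum of 2, and counting the gap as a fifth equally frequent symbol gives log2(5), about 2.32, which matches the upper value quoted above.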
Furthermore, some other limitations remain in this project. Aligning the BLAST results with the influenza sequences using ClustalX2 is limited: because the short sequences are widely scattered across different positions within one sequence, it may be difficult to obtain an accurate alignment, and mismatched sequences may affect downstream results such as phylogenetic tree building.
Finally, the limitations of miRBase should be considered. According to the latest release of the miRNA database, only 121 distinct mature miRNAs have been found (Griffiths-Jones et al. 2006). Many of the microRNAs in miRBase were identified only by sequence similarity to known miRNA structures and still need experimental support (Hillier et al. 2004; Griffiths-Jones et al. 2006); such miRNAs carry no details for function prediction. Therefore, I cannot draw an accurate conclusion from the microRNA database results alone.
Future work
In this project, 29 chromosomes were tested, so the remaining chromosomes have to be tested in further work. Furthermore, the comparison relied on the online BLAST database, and I found that when many sequences were compared the online search took a very long time to finish. Using a local BLAST database instead would make the computation fast, and this will be tried for the similarity comparison.
Because of the software limitations described in the section on limitations of methods and improvements, the conserved regions found by BioEdit have to be checked with other software or by laboratory experiments to confirm that the conserved regions matched in the chicken sequences are correct. Furthermore, the conserved regions have to be studied at the protein level. First, the conserved regions have to be analysed as protein sequences; once the protein sequences have been shown to have an evolutionary relationship, the protein secondary structure can be used to predict protein function. In addition, because the phylogenetic trees were constructed with MEGA only, the trees could be rebuilt with different software and the results compared with the MEGA results to obtain more accurate conclusions. The miRBase search indicated that some of these sequences may be related to microRNAs. However, this has to be investigated further, particularly for the sequences closely related to introns. Whether those sequences have the same function as the relevant introns, and what kind of RNA is transcribed from them, could be examined in further work if more time were available.
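A local search of this kind could be set up with the NCBI BLAST+ command-line tools, as in the following minimal sketch. It only builds the two command lines and does not run them; the file names, the database name and the helper name `local_blast_commands` are hypothetical, and the choice of `-task blastn-short` is an assumption made here because the conserved regions are short.

```python
import shlex


def local_blast_commands(genome_fasta, query_fasta, db_name="chicken_db",
                         out_path="hits.tsv", max_evalue=0.01):
    """Build BLAST+ command lines for a local nucleotide search.

    Returns (make_db, search) as argument lists suitable for
    subprocess.run. Nothing is executed here.
    """
    # Index the chicken genome FASTA file as a nucleotide database.
    make_db = ["makeblastdb", "-in", genome_fasta,
               "-dbtype", "nucl", "-out", db_name]
    # Search the conserved regions against it, keeping hits up to the
    # e-value threshold and writing tabular output (format 6).
    search = ["blastn", "-task", "blastn-short",
              "-query", query_fasta, "-db", db_name,
              "-evalue", str(max_evalue),
              "-outfmt", "6", "-out", out_path]
    return make_db, search


if __name__ == "__main__":
    for command in local_blast_commands("chicken_genome.fa",
                                        "conserved_regions.fa"):
        print(shlex.join(command))
```

With BLAST+ installed, each list could be passed to `subprocess.run(command, check=True)`; the database is built once and can then be reused for every influenza segment.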
Acknowledgements
I would like to thank my project supervisor, Dr. Peter Andras, and my tutor, Professor Anil Wipat, for their valuable support and advice during these six months. I would like to thank my family for supporting and encouraging me during my studies. I am grateful to all the programme lecturers and classmates for making the course an enjoyable and valuable experience for me. Finally, I thank my classmate Andrew David King for helping me with English corrections.