
Motivation: Worldwide, the damage caused each year by the avian influenza virus is significant and a cause of global concern. In recent years, outbreaks of highly pathogenic avian influenza have frequently caused panic in Asia. More worryingly, avian influenza can make humans ill. Previous studies have given researchers substantial knowledge about the structure of the bird flu virus and how it can be treated. However, the avian influenza virus is highly variable, which makes vaccines and drugs look rather weak. Birds such as chickens, ducks and geese are the natural hosts of the influenza virus and are easily infected by it. We therefore hypothesize that certain sequences exist in bird nucleotide sequences that may be associated with the generation of the avian influenza virus.

Aims: In this project, the main work is locating the conserved regions in influenza virus nucleotide sequences and comparing the similarities between these conserved regions and chicken nucleotide sequences using BLAST. The perfectly matching sequences will be used as the raw data to build phylogenetic trees with the software tool MEGA. Based on the phylogenetic tree results, the meaningful sequence segments will be checked against the miRBase database to identify the structure and function of these short sequences. Once these sequences are confirmed, they will be a great help in understanding the influenza virus. They will also greatly improve the human ability to fight influenza, providing a new direction for studying new influenza drugs and vaccines.

Contact: jingxuan.zhang@ncl.ac.uk

Introduction

Humans and the influenza virus have co-existed for at least 400 years (Earn et al., 2002). In 1918, an influenza pandemic (the Spanish flu) led to 20 to 100 million deaths, and there were 1 to 1.5 million deaths from H2N2 (Hilleman, 2002). In the USA, there were over 426,000 deaths due to influenza virus between 1972 and 1992 (Klimov et al., 1999). Since the first detection of the Asian lineage of highly pathogenic avian influenza virus (H5N1) in southern China in 1996, the H5N1 outbreak has continued for the past 13 years in Asia (Suzuki et al., 2009). Highly pathogenic avian influenza can easily kill domestic poultry. Furthermore, between 2003 and 2004, H5N1 caused deaths in several species, including ducks and humans (Sturm-Ramirez et al., 2004; Tran et al., 2004). According to a report by the WHO, 371 people had been infected by H5N1 and 235 of them had died by the end of March 2008. This warned the world that a worldwide outbreak of avian influenza would be a serious hazard to human beings. The study of avian influenza is therefore urgent and necessary. Furthermore, if this project can show that influenza genomes have similarities with chicken genomes, it may provide a new research direction for the vaccine and drug industry. This project is based on the biological hypothesis that viruses arose after the emergence of cellular life; the main aim is to better understand the relationship between the influenza virus and its host, such as the chicken. According to previous research, birds are among the best hosts for influenza viruses (Wang, 2009) and harbour a natural reservoir of influenza viruses (Hey et al.). The genome of the chicken, a modern descendant of the dinosaurs, has been sequenced (Consortium ICGS, 2004), which gives us an opportunity to carry out whole-genome analyses.

In 1944, Frank Macfarlane Burnet showed that the influenza virus lost virulence when it was cultured in fertilized hen eggs. Chicken eggs are still the preferred medium for influenza vaccine production, more than 50 years after they were first used (Gerdil, 2003; Hay et al., 2001; Palese, 2006). In this project, I will focus on the similarities between chicken genomes and avian influenza genomes, by comparing the relatively conserved components of avian influenza virus genomes with chicken sequences and by trying to find the endogenous retroviruses and the sites that are close to the transposons of the chicken. The results of the phylogenetic tree analysis will illustrate our hypothesis that avian influenza virus genomes may be related to the endogenous retroviruses and to the sites close to the transposons.

Influenza virus

Influenza is a negative-stranded RNA virus belonging to the Orthomyxoviridae family. According to the antigenicity of their viral nucleoproteins (NP) and matrix proteins (MP), influenza viruses can be divided into three types: A, B and C. Within each type the viruses have very similar structures (Wang, 2009). Based on differences in haemagglutinin (HA) and neuraminidase (NA), the avian influenza virus can be divided into 15 H subtypes (H1~H15) and 9 N subtypes (N1~N9). HA binds the virus to the host cell and fuses with it. NA disrupts viral aggregation and allows newly formed virus to be released from the infected cell (Hilleman, 2002). Influenza A viruses infect a wide variety of animals, such as humans, horses, pigs, ferrets and birds. Type B viruses, however, are only able to infect mammals and lack the distinct serotypes of type A. Finally, influenza C viruses can also only infect mammals, but seldom cause disease. From a structural perspective, types A and B have 8 segments (Figure 1), whereas influenza C viruses consist of single-stranded negative-sense RNA in 7 segments.

Figure 1. The common structure of the influenza virus. According to the figure, the influenza virus normally contains 8 segments: NA, NP, NS, MP, HA, PA, PB1 and PB2. The figure is from: http://www.osc.edu/education/si/projects/flu_virus/index.html.

Unlike many other viruses, whose genome is a single piece of nucleic acid, the influenza virus genome consists of seven or eight pieces of segmented negative-sense RNA, and each RNA contains either one or two genes (Bouvier et al., 2008).

The different types have diverse genome structures and different infection abilities (Tang et al., 2005), and virus types B and C are of lesser importance than type A. The influenza virus has a wide host range that includes waterfowl, seagulls, pigs, horses, dogs and many other mammals. The restriction of host range means that hosts play different roles in the avian influenza virus ecosystem. For example, type A avian influenza survives well in waterfowl, where it is in evolutionary stasis, which means that the virus can persist for a long time. In some land-based poultry, such as chickens, however, only a limited number of influenza virus subtypes are found. Avian influenza can be classified into two types, low pathogenic avian influenza and highly pathogenic avian influenza (Wang, 2009), both of which can be isolated from chickens. Because chickens catch the avian influenza virus quite easily, they are treated as mixing vessels. According to Borisenko, the endogenous avian retrovirus family (AEV) has been found in the four Gallus species. Endogenous retroviruses have not only been shown to infect animals through vertical transmission from their parents (Borisenko, 2003; Katzourakis et al., 2005), but have also been shown, in a well-publicized case in pigs, to cross the species barrier and infect human cells (Bartosch et al., 2004). We therefore conjecture that influenza viruses may be related to the endogenous retroviruses of birds and to other microbes.

Perspectives on the domestic chicken

The chicken (Gallus gallus) is an important model organism because it bridges the evolutionary gap between mammals and other vertebrates. Chickens were known in Asia as far back as 5400 BC (Hillier et al., 2004). Genetic analysis of the chicken started at the beginning of the 20th century (Hillier et al., 2004). The chicken has been widely used in a variety of studies, such as virology, oncogenesis and immunology (Stehelin et al., 1976). Like that of most animals, the chicken karyotype is made up of autosomes and sex chromosomes: 38 autosomes and one pair of sex chromosomes (Hillier et al., 2004).

Conserved regions

Conserved sequences are sequences of bases in a DNA molecule (or of amino acids in a protein) that have remained essentially unchanged during evolution (Attwood and Parry-Smith, 1999). According to Prasad et al. (1990), the sequence conservation of regions has functional importance. Previous studies show that strongly conserved regions are an important point of focus for designing effective remedies with broad-spectrum antiviral activity (Ghosh et al., 2010). High-affinity antibodies against a conserved epitope could provide immunity to diverse influenza subtypes and prevent future pandemic virus infections (Ekiert, 2009). By testing 173 H5N1 NA gene sequences, Ghosh et al. identified 50-base highly conserved regions at the 3′ terminal end of the NA gene (Ghosh et al., 2010).

Endogenous retroviruses

Endogenous retroviruses are ancient genetic parasites that are widely found in a range of vertebrates (Borisenko, 2003; Gifford and Tristem, 2003). They are formed during the integration step of the retrovirus infection cycle (Borisenko, 2003): in this process, the retrovirus integrates its genome into the host genome, producing what is termed a provirus (Borisenko, 2003; Gifford and Tristem, 2003; Katzourakis et al., 2005). The chicken genome includes three groups of avian endogenous retroviruses: the first is the ev loci, the second is the endogenous avian retroviruses and the last is the human-related type one retroviruses (Borisenko, 2003). Retroviruses and pararetroviruses, which have evolved from LTR retrotransposons, can leave and re-enter host cells. They can do this by acquiring new proteins, and this process is called horizontal or lateral transfer (Doolittle et al., 1989). LTR retrotransposons exist widely in eukaryotic genomes, especially in plants (Ganko et al., 2001), and move via a mechanism quite similar to that used by retroviruses (Boeke, 2003).

Phylogenetic trees

The goal of phylogenetic analysis is to work out the relationships among species, populations, individuals or genes (Lesk, 2005). Phylogenetic trees are useful graphs that give the reader a brief view of evolutionary relationships, and they are therefore widely used in research on evolutionary relationships. The phylogenetic tree has been developed for over 100 years. In 1859, Charles Darwin introduced the theory that populations evolve over the course of generations through a process of natural selection in his book, The Origin of Species; he used a branching pattern of evolution to illustrate the diversity of life (Darwin, 1859). Both the algorithms and the tools have since improved greatly, so the results of phylogenetic trees are now more reliable. A phylogenetic tree consists of nodes, branches and leaves. Normally, branch lengths are used to measure the dissimilarity between two species, or the length of time since their separation. Trees can be divided into two kinds, rooted and unrooted: the former shows the pattern of descent, while the latter shows the topology of the relationships. By analysing the branch lengths of a rooted tree, researchers can understand the rate of evolution in species or genes. Unrooted trees, however, can only illustrate the change in the number of bases (Jiang, 2003).


MicroRNA

MicroRNAs were discovered by Victor Ambros in 1993 (Lee, 1993). About 500 mammalian miRNA genes have been discovered, and each miRNA may be able to regulate many different protein-coding genes (Williams, 2008). MicroRNAs are small (21 nt to 23 nt) non-coding RNAs that recognize and bind to complementary sites in the 3′ untranslated regions of target genes in animals. At least 40% of microRNA genes lie in the introns of protein-coding and non-protein-coding genes, or in exons (Rodriguez, 2004). Most microRNA genes have their own gene promoter and regulatory units; by unknown mechanisms, microRNAs can regulate protein production from the target transcript (Lau et al., 2001; Lee et al., 2004). They are a topic of interest in evolutionary biology as well as in functional genomics.

Methodology overview

Analysis procedure and workflow

ClustalX2 and BioEdit software were used to find the conserved regions of the influenza virus.

BLAST was used to check ambiguous conserved region sequences.

The LTR_FINDER database was used to find the LTR sequences in chicken chromosomes.

BLAST was used to compare the similarity between the conserved regions in the influenza virus and the chicken genome.

MEGA software was used to construct the phylogenetic trees.

The microRNA database was used to find microRNAs and related results.

Figure 2. The workflow of the research. The squares represent the result of each step, and the circles stand for the tools used. According to the workflow, the research tools include three software packages and three databases. The results comprise four parts: the conserved regions, the similar sequences between the influenza virus and the chicken genome, the phylogenetic trees, and the results of searching the miRBase database.

Software

The core of this project uses three software tools:

ClustalX2 is a tool for multiple alignment of protein and nucleotide sequences. It can also be used for preparing phylogenetic trees (Jeanmougin et al., 1998). It is freely available at: http://www.clustal.org/download/current/ .

BioEdit is a powerful package for sequence editing and sequence analysis on Windows systems (Hall, 1999). It is freely available to download at: http://www.softii.com/downinfo/9529.html

MEGA is the short name for Molecular Evolutionary Genetics Analysis software. The first version of MEGA was released in 1993; at present the latest version is MEGA5. MEGA is widely used for reconstructing the evolutionary histories of species and multi-gene families, and for estimating rates of molecular evolution. In this project MEGA will be used to build phylogenetic trees (Kumar, 2004). It is available at: http://www.megasoftware.net/

Data collection and processing

Sample selection is the first and most important step in preparing the research that follows, so the research data must be credible. To deal with this issue, all data in this project will come from the EMBL database. The sample should satisfy two characteristics: universality and representativeness. In sample selection, the genomes of influenza type A viruses will be chosen rather than type C, since type A viruses have 8 segments in the whole genome, whereas type C viruses have only 7 segments and so cannot fulfil the universality requirement. Type C viruses are therefore not a good research object for this project and will be discarded. The avian influenza nucleotide sequences will be downloaded from: http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html. In the influenza virus sequence database, protein or nucleotide sequences can be retrieved using GenBank accession numbers or search terms. The influenza sequences in the dataset are annotated with type, host, country/region, segment and subtype (see Table 1). The selectable sequence types are protein, protein coding region and nucleotide.

Table 1. The defined search set. It shows the details of the sample selection. In this project, all samples are type A H5N1 influenza viruses from Asia.

| Type | Host | Country/Region | Segment | Subtype (H) | Subtype (N) |
| A | Avian | Asia | Any | 5 | 1 |

The search returned 13712 nucleotide sequences, and the next step was to select samples according to the region they come from. Finally, 31 samples were selected as the research sample. Each sample includes eight segments and is from China. The chicken nucleotide sequence was downloaded from http://www.ncbi.nlm.nih.gov/projects/genome/guide/chicken/ . The database provides only chromosomes 1 to 28 and 32, together with other chromosomes such as W, Z, MT, LGE22C19W28_E50C23 and LGE64. In this project, chromosomes 1 to 28 and 32 were selected, and all resources were stored on the local machine for further analysis.

Finding conserved regions in influenza viruses

Multiple sequence alignment is a widely used bioinformatics analysis method. Many tools have become available for multiple sequence alignment since the first tool was written in 1988 by Higgins (Higgins, 1988). The first tool had limited computing ability, but there are now several more capable ones, such as Clustal W (Thompson et al., 1994) and Clustal X (Thompson et al., 1997). T-Coffee, MAFFT and MUSCLE can also be used for multiple alignment (Larkin et al., 2007). To find conserved regions, the ClustalX2 software will be used, because its alignment algorithm has been optimized to align sets of sequences that are entirely collinear, that is, sequences that have the same domains in the same order. If the input does not meet these conditions, the software will give unreliable alignments. Therefore, when aligning influenza viruses, each segment has to be aligned separately, and the sequences for one segment have to be saved in the same file. After multiple alignment of each segment across the different influenza viruses, the results mark identical positions with the symbol "*" along the whole sequence (see Figure 3).

Figure 3. Example of a multiple sequence alignment. On the left-hand side of the main window is the sequence ID information (gi number and gb number). At the top of the main window, the symbol "*" means that the bases in the different sequences are identical in that column. The horizontal ruler at the bottom of the window shows the position within the multiple sequence alignment.

ClustalX2 is not able to pick out the conserved regions itself. In this part, therefore, the conserved regions are found with the 'Find Conserved Regions' option in BioEdit. BioEdit is a biological sequence editor intended to provide basic functions for nucleic acid sequence editing, alignment, manipulation and analysis. In this step, several values have to be set for finding conserved regions. The values are shown in Figure 4.

Figure 4. Value settings for finding conserved regions of an alignment. It shows the value settings used to find conserved regions with the BioEdit tool. Gaps are the main factor affecting the calculation of the information value, and the information value affects the length of the conserved regions. The number of gaps allowed in any segment should therefore be limited; appropriate gap settings are a great help in locating the true conserved regions.
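The idea of reading conserved regions off an alignment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not BioEdit's own procedure: a column is treated as conserved when every sequence carries the same base and no gap, and the function name `conserved_regions`, the `min_length` parameter and the toy alignment are hypothetical.

```python
from typing import List, Tuple


def conserved_regions(alignment: List[str], min_length: int = 15) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) column ranges in which all sequences agree.

    Simplifying assumption: a column is conserved only if it contains a
    single symbol and that symbol is not a gap ('-').
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    regions = []
    run_start = None  # 0-based start of the current conserved run, if any
    for col in range(width):
        column = {seq[col].upper() for seq in alignment}
        conserved = len(column) == 1 and "-" not in column
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_length:
                regions.append((run_start + 1, col))
            run_start = None
    # Close a run that extends to the last column.
    if run_start is not None and width - run_start >= min_length:
        regions.append((run_start + 1, width))
    return regions


# Toy alignment invented for illustration (not data from this project).
toy_alignment = [
    "ATGCCGTA-ACGT",
    "ATGCCGTTAACGT",
    "ATGCCGTAAACGT",
]
print(conserved_regions(toy_alignment, min_length=4))
```

Reporting 1-based inclusive ranges keeps the output in the same form as the start-end locations listed for each segment in the results.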

Investigating the LTR sequences in chicken chromosomes

In this step, the main aim is to find the LTR sequences and retrotransposons in chicken nucleotide sequences. According to Kazazian and Moran (1998), the non-LTR retrotransposons are less well understood mechanistically. In this project I therefore chose LTR sequences to locate the retrotransposons. In this section, the LTR_FINDER database will be used to find LTR sequences in chicken chromosomes. LTR_FINDER is able to scan large-scale sequences quickly and, for given DNA sequences, can accurately predict the location and structure of full-length LTR retrotransposons (Zhao et al., 2007). The tool is available at: http://tlife.fudan.edu.cn/ltr_finder/ . Based on the LTR_FINDER results, the chicken nucleotide sequences that contain retrotransposons will be selected for the BLAST search.

Comparing the similarity between the conserved regions in the influenza virus and the chicken genome

Comparing the similarity is the core part of this research. The conserved region files are compared with the selected chicken chromosome sequences using NCBI BLAST, and the results show the similar sequences between the two species. The results include the location of the similar sequences, the sequence matching information, E-value, Score, Identities and Gaps. The BLAST results will be saved as HTML documents on the local machine, and the files that contain the more similar sequences (E-value ≤ 0.01) will be picked out for phylogenetic analysis. An E-value below 0.01 suggests that the sequences may be homologous.
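The E-value cut-off described above can be sketched as a small filtering step. This is a minimal sketch that assumes the hits have already been read from the saved BLAST reports into simple records; the `BlastHit` class, the `filter_hits` function and the example hits (including their identifiers and E-values) are invented for illustration and are not results from this project.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BlastHit:
    """One BLAST hit; a hypothetical record type used only in this sketch."""
    subject_id: str
    start: int
    end: int
    e_value: float


def filter_hits(hits: Iterable[BlastHit], max_e_value: float = 0.01) -> List[BlastHit]:
    """Keep hits with E-value <= max_e_value, most significant first."""
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)


# Invented example hits (not data from this project).
example_hits = [
    BlastHit("contig_A", 100, 120, 0.5),
    BlastHit("contig_B", 2000, 2025, 0.008),
    BlastHit("contig_C", 50, 86, 0.0004),
]
for hit in filter_hits(example_hits):
    print(hit.subject_id, hit.start, hit.end, hit.e_value)
```

Sorting by E-value puts the candidates most likely to be homologous at the top of the list that goes on to phylogenetic analysis.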

Building phylogenetic trees for evolutionary analysis

Phylogenetic trees are widely used in evolutionary analysis to understand the evolutionary relationships among different and similar species. There are three main methods for building phylogenetic trees: the minimum evolution method, the maximum parsimony method and the neighbor-joining (NJ) method (Saitou and Nei, 1987). The standard tree-making algorithms work by examining all possible topologies, or a certain number of topologies (Saitou and Nei, 1987). The final tree is the one with the smallest amount of total evolutionary change, and it is likely to be close to the true tree. However, this approach takes a long time to build phylogenetic trees, and only small numbers of topologies are examined (Saitou and Nei, 1987), so the results are sometimes not very satisfactory. According to previous studies, some methods are not guaranteed to produce the minimum evolution tree, but their search for the minimum evolution tree is much more efficient than the maximum parsimony algorithms; examples are the Distance Wagner (DW) method (Farris, 1972), the Modified Farris (MF) method (Tateno et al., 1982) and the neighborliness method of Sattath and Tversky (1977). In 1987, Naruya Saitou and Masatoshi Nei introduced a new method, the neighbor-joining (NJ) method, for reconstructing phylogenetic trees; it produces the final tree under the principle of minimum evolution and is efficient in obtaining the correct tree topology (Saitou and Nei, 1987). In this project, therefore, all the phylogenetic trees will be built with the neighbor-joining method.
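To make the neighbor-joining principle more concrete, the following Python sketch performs a single joining step: it selects the pair of taxa that minimizes the usual NJ criterion, Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from taxon i to all other taxa, and computes the branch lengths from the two taxa to their new internal node. It is an illustrative sketch rather than MEGA's implementation; the function name `pick_pair_to_join` and the five-taxon distance matrix are made up for the example.

```python
from itertools import combinations
from typing import Dict, Tuple

Distances = Dict[str, Dict[str, float]]


def pick_pair_to_join(dist: Distances) -> Tuple[str, str, float, float]:
    """One neighbor-joining step.

    Returns (taxon_i, taxon_j, branch_i, branch_j): the pair to join and the
    branch lengths from each taxon to the new internal node.
    """
    taxa = sorted(dist)
    n = len(taxa)
    if n < 3:
        raise ValueError("need at least three taxa")
    # r_i: total distance from taxon i to every other taxon.
    row_sum = {t: sum(dist[t][u] for u in taxa if u != t) for t in taxa}

    def q(i: str, j: str) -> float:
        return (n - 2) * dist[i][j] - row_sum[i] - row_sum[j]

    i, j = min(combinations(taxa, 2), key=lambda pair: q(*pair))
    branch_i = 0.5 * dist[i][j] + (row_sum[i] - row_sum[j]) / (2 * (n - 2))
    branch_j = dist[i][j] - branch_i
    return i, j, branch_i, branch_j


# Invented symmetric distance matrix for five taxa (illustration only).
labels = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_dist = {x: {y: float(rows[i][j]) for j, y in enumerate(labels)}
            for i, x in enumerate(labels)}
print(pick_pair_to_join(toy_dist))
```

A full neighbor-joining run repeats this step, replacing the joined pair with the new node and updating the distance matrix, until only two nodes remain.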


Basically, after aligning the amino acid sequences, the number of amino acid substitutions per site, or evolutionary distance between sequences, was calculated by comparing the proportion of amino acid differences. Gaps can be present in any sequence and will be excluded from the analysis. Finally, the tree will be constructed by the neighbor-joining method and tested by the bootstrap method. In the phylogeny test, the bootstrap method is used to examine how stable the topology is under perturbation in the neighbor-joining and parsimony methods. The test will be repeated 1,000 times. The Kimura 2-parameter model is used to correct for multiple hits, and transitional and transversional substitution rates and the differences in substitution rates among sites will be taken into consideration. To deal with missing data and alignment gaps, the pairwise deletion option will be used to remove these sites from the analysis.
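The Kimura 2-parameter correction with pairwise deletion can be written out as a short Python sketch for nucleotide sequences. With P the proportion of transitions and Q the proportion of transversions among the compared sites, the distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The function name `k2p_distance` is hypothetical and the code is a simplified illustration, not MEGA's implementation (for instance, it ignores rate variation among sites).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Pairwise deletion: any site where either sequence has a gap or an
    ambiguous symbol is skipped for this pair only.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")

    compared = transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Toy pair: one transition and one transversion over ten sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```

Because the deletion is pairwise, a gap in one sequence removes the site only from comparisons involving that sequence, so short matches keep as many comparable positions as possible.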

Measuring confidence in phylogenetic analysis

The bootstrap method is widely used in phylogenetic analysis. Bootstrapping was originally applied in phylogenetics to assess the repeatability of a given result; it is now generally interpreted as assessing the probability that a phylogenetic estimate represents the correct phylogeny (Hillis, 1993). As far back as 1985, Felsenstein suggested that the bootstrap could be used as a statistical test to assess confidence limits on the internal branches of a phylogenetic analysis. He pointed out that the characters in a taxa × characters matrix can be sampled with replacement to create many new matrices of the same size as the original matrix, each of which can be used to find the best tree (Felsenstein, 1985). According to earlier research (Hillis, 1993), an internal branch with a bootstrap proportion greater than 80% is likely to define a true clade, and more than 95% of the estimated clades with bootstrap confidence limits above 70% were correct. However, even when fewer than 10% of the estimated branches have bootstrap proportions below 30%, the tree is still considered correct (Hillis, 1993). Generally, if the bootstrap value is greater than 95%, the topology can be considered correct (Jiang, 2003), which means that the phylogenetic tree results are reliable. Therefore, once sequences had obtained bootstrap proportions, they were selected for microRNA analysis whether or not the proportions were high.
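The resampling idea attributed to Felsenstein above can be sketched in Python as follows. The sketch resamples alignment columns with replacement and reports the percentage of replicates in which a grouping of interest is recovered. Building a tree for each replicate is outside the scope of the sketch, so the caller supplies a function `clade_test` that decides whether the grouping appears; the function names, the Hamming-distance stand-in and the toy alignment are all assumptions made for illustration.

```python
import random
from typing import Callable, List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Sample alignment columns with replacement; output has the same size."""
    width = len(alignment[0])
    columns = [rng.randrange(width) for _ in range(width)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_support(alignment: List[str],
                      clade_test: Callable[[List[str]], bool],
                      replicates: int = 1000,
                      seed: int = 0) -> float:
    """Percentage of bootstrap replicates in which clade_test returns True."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        if clade_test(bootstrap_alignment(alignment, rng)):
            hits += 1
    return 100.0 * hits / replicates


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def first_two_group_together(aln: List[str]) -> bool:
    # Stand-in for tree building: sequences 0 and 1 count as a group when
    # they are closer to each other than either is to sequence 2.
    return hamming(aln[0], aln[1]) < min(hamming(aln[0], aln[2]),
                                         hamming(aln[1], aln[2]))


# Toy alignment invented for illustration.
toy = ["AAAA", "AAAA", "CCCC"]
print(bootstrap_support(toy, first_two_group_together, replicates=100))
```

In a real analysis `clade_test` would rebuild the tree from each resampled alignment, for example by neighbor-joining, and check whether the branch of interest is present.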

Searching for microRNAs using the miRBase database

MicroRNAs may be closely related to certain sequences identified by a phylogenetic tree. The meaningful sequences in the chicken genome will therefore be selected and tested to see whether they are microRNAs. This will give a better understanding of the function of those sequences and make the relationship between the conserved regions and the chicken genome clearer. In this section, the work will rely on the miRBase database, which stores microRNA (miRNA) nomenclature, sequence data, annotation and target prediction. The database contains 5,071 miRNA loci from 58 species, and 5,922 distinct mature miRNA sequences (Griffiths-Jones et al., 2008). It provides the researcher with a large amount of information for studying miRNA genomics. All miRNAs are mapped to their genomic coordinates, and sequences that contain miRNAs are highlighted. The miRBase search function accepts searches not only by miRNA name, but also by genomic location and by sequence. Because the sequences are quite short and have already been selected, the search will be based on sequence.

Results and Discussion

The conserved regions in the influenza virus

Using ClustalX2 and BioEdit, the conserved regions in each segment were selected. In total, 24 conserved regions were found in HA, 19 in MP, 17 in NA, 16 in NP, 16 in NS, 20 in PA, 28 in PB1 and 9 in PB2. Table 2 shows the location of each conserved region in the multiple sequence alignment.

Table 2. The locations of the conserved regions in the influenza virus. It shows the location of each conserved region, based on the multiple sequence alignment, in each influenza virus segment. The results have been checked with BLAST, and the ambiguous conserved regions have been discarded.

| No. | HA | MP | NA | NP | NS | PA | PB1 | PB2 |
| 1 | 90-111 | 67-86 | 39-60 | 87-109 | 79-130 | 96-112 | 68-82 | 109-128 |
| 2 | 113-147 | 88-108 | 86-103 | 111-127 | 132-202 | 114-130 | 97-114 | 166-188 |
| 3 | 210-225 | 115-137 | 399-415 | 132-154 | 215-235 | 132-151 | 116-135 | 475-506 |
| 4 | 235-249 | 199-236 | 418-445 | 249-277 | 237-253 | 165-181 | 137-153 | 823-839 |
| 5 | 340-355 | 261-278 | 513-538 | 345-364 | 264-283 | 339-358 | 173-195 | 1246-1262 |
| 6 | 464-480 | 280-299 | 555-577 | 411-430 | 291-308 | 411-427 | 247-279 | 1393-1409 |
| 7 | 498-517 | 510-532 | 603-625 | 723-766 | 375-400 | 513-528 | 299-318 | 1870-1898 |
| 8 | 597-612 | 543-559 | 636-664 | 852-871 | 425-449 | 570-586 | 320-357 | 2128-2150 |
| 9 | 645-660 | 572-607 | 666-682 | 1023-1039 | 460-483 | 609-634 | 416-435 | 2221-2300 |
| 10 | 758-777 | 624-640 | 717-736 | 1164-1186 | 516-538 | 636-666 | 476-498 | |
| 11 | 803-855 | 660-679 | 750-766 | 1335-1351 | 558-581 | 717-734 | 545-582 | |
| 12 | 944-964 | 769-784 | 873-892 | 1392-1408 | 584-607 | 741-764 | 637-654 | |
| 13 | 984-1005 | 810-833 | 967-982 | 1428-1447 | 625-643 | 766-799 | 743-759 | |
| 14 | 1092-1138 | 849-889 | 1149-1166 | 1449-1489 | 723-747 | 978-1003 | 875-891 | |
| 15 | 1206-1264 | 891-906 | 1224-1264 | 1491-1516 | 853-872 | 1356-1372 | 917-936 | |
| 16 | 1266-1285 | 911-952 | 1371-1397 | 1539-1555 | 889-909 | 1521-1537 | 1079-1095 | |
| 17 | 1332-1351 | 961-983 | 1399-1447 | | | 1887-1903 | 1112-1128 | |
| 18 | 1353-1381 | 1002-1019 | | | | 2109-2125 | 1151-1179 | |
| 19 | 1398-1426 | 1021-1038 | | | | 2145-2167 | 1274-1290 | |
| 20 | 1473-1494 | | | | | 2169-2191 | 1376-1395 | |
| 21 | 1496-1524 | | | | | | 1416-1434 | |
| 22 | 1526-1568 | | | | | | 1535-1557 | |
| 23 | 1575-1609 | | | | | | 1643-1665 | |
| 24 | 1611-1645 | | | | | | 1697-1731 | |
| 25 | | | | | | | 1811-1830 | |
| 26 | | | | | | | 2069-2085 | |
| 27 | | | | | | | 2090-2115 | |
| 28 | | | | | | | 2267-2314 | |

The BLAST results of comparing the conserved regions and the chicken genome

The E-value is an important factor used to evaluate the similarity between biological sequences: the lower the E-value, the more similar the sequences are. However, the value depends on the score given to the alignment and on the lengths of the sequences being compared. The conserved region sequences are not long enough to give very low E-values, so the meaningful BLAST results have E-values no greater than 0.01. Table 3 shows the most significant results in this project. The positions of these sequences will help in understanding the evolution of the sequences.
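The dependence of the E-value on score and sequence length can be made concrete with the standard approximation from BLAST statistics, E = m · n · 2^(-S'), where S' is the bit score, m the query length and n the database length. The Python sketch below uses raw lengths rather than the effective lengths that BLAST applies internally, so it is only a rough illustration; the function names and the numbers in the example (a 20-base query against a database of 10^9 bases) are assumptions, not values taken from this project.

```python
import math


def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value: E = m * n * 2**(-S') for bit score S'."""
    return query_length * database_length * 2.0 ** (-bit_score)


def min_bit_score(query_length: int, database_length: int,
                  max_e_value: float = 0.01) -> float:
    """Smallest bit score whose approximate E-value is at most max_e_value."""
    return math.log2(query_length * database_length / max_e_value)


# Illustrative numbers only: a 20-base query against 10**9 database bases.
needed = min_bit_score(20, 10**9, max_e_value=0.01)
print(f"bit score needed for E <= 0.01: {needed:.1f}")
print(f"E-value at that score: {expected_hits(needed, 20, 10**9):.4f}")
```

Since the attainable bit score is limited by the length of the match, a short conserved region can only just reach a threshold such as 0.01, which is consistent with the modest E-values discussed above.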

Table 3. The important sequences found using BLAST. It presents the location of the chicken sequences that obtained the lowest E-values with BLAST. The row in red is the most significant result, and its phylogenetic tree is illustrated in Figure 7.

| Reference No. | Location | E-Value |
| NW_001471565.1 | 1136006-1136043 | |
| NW_001471532.1 | 418524-418543 | |
| NW_001471651.1 | 7410711-7410737 | |
| NW_001471677.1 | 1993027-1993046 | |
| NW_001471673.1 | 15355501-15355530 | |
| NW_001471723.1 | 1285767-1285790 | |
| NW_001471505.1 | 928131-928166 | |
| NW_001471629.1 | 307181-307203 | |
| NW_001471720.1 | 8212308-8212333 | |
| NW_001471437.1 | 207986-208002 | |
| NW_001471447.1 | 86580-86601 | |

The phylogenetic tree results

In constructing the phylogenetic trees, the data were cut off at a BLAST E-value of ≤ 0.01 when comparing the conserved regions in the influenza virus with the chicken genome. Many conserved regions were found in the same segment of the influenza virus using BioEdit; for example, 26 conserved regions were found in the HA segment and 32 in the PB1 segment. Because BLAST is used to compare the conserved regions with the chicken genome, the same conserved region produces many perfect matches in different sequences, so in the multiple sequence alignment the short, perfectly matching sequences from the chicken genome cluster at the conserved region they were compared with. When building phylogenetic trees, these clustered regions have to be analysed separately to prevent information from being lost. If all the perfectly matching sequences were analysed together, they would have to be extended to the same length, which would introduce mismatches, and the short perfectly matching sequences might not stay at their original positions. In this project, therefore, the same segment of the influenza virus produces many phylogenetic trees. In total, 91 phylogenetic trees were constructed, among them 16 for the HA segment, 10 for MP, 9 for NP, 12 for NS, 11 for PA, 11 for PB1 and 8 for PB2. However, only 52 phylogenetic trees could be used to analyse the evolutionary relationship between the influenza virus and the chicken genome. Figure 7 shows an example phylogenetic tree.

[Figure 7: phylogenetic tree for the HA segment containing 31 influenza virus sequences (gi|... GenBank entries) and five chicken sequences (NW_001471505.1 Gga18 WGA25, NW_001471565.1, NW_001471675.1, NW_001471428.1, NW_001471521.1). Bootstrap values shown on the branches: 39, 54, 96, 37, 25. Scale bar: 0.05. Legend: flu virus sequence, significant sequence, chicken sequence.]

Figure 7. A meaningful result in the HA segment. The phylogenetic tree was built using the Neighbor-Joining method (Saitou and Nei, 1987); the optimal tree has a sum of branch lengths of 0.93668520. The percentages of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) are shown next to the branches. Branch lengths are in the same units as the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were calculated using the Kimura 2-parameter method (Kimura, 1980). The analysis involved 36 nucleotide sequences. All ambiguous positions were removed for each sequence pair. There were a total of 19 positions in the final dataset. The sequence shown in red is the significant sequence that I found; its bootstrap value is 96, so the result is reliable. It will be used to search the miRBase database for similar microRNAs. The evolutionary analyses were conducted in MEGA (Tamura, 2007).
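For readers unfamiliar with the Kimura 2-parameter distance named in the caption, the following minimal Python sketch shows how it is computed for one pair of aligned sequences. It is not MEGA's code; the function name `k2p_distance` and the example sequences are hypothetical, and positions with gaps or ambiguous characters are simply skipped for the pair, as a rough analogue of removing ambiguous positions per sequence pair. The distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous characters for this pair
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P formula")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


if __name__ == "__main__":
    # One transition (A -> G) in ten compared positions.
    print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

With only 19 positions in the final dataset, a single substitution changes P or Q by more than 5%, which is one reason short sequences give unstable distances and weak bootstrap support.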

All trees were produced with the neighbour-joining method of Saitou & Nei (1987), and most of them split into two main lineages, influenza virus and chicken. Because the tested sequences are not very long, the NJ method sometimes cannot give strong bootstrap support for the group of interest. Sequences with weak support include NW001471693.1 (bootstrap value 20%), NW001471743.1 (12%) and NW001471581.1 (26%); some other sequences had even lower bootstrap values. Sequences with bootstrap support between 30% and 70% include NW001471685.1 and NW001471589.1 (35%), NW001471521.1 (64%), NW001471532.1 (55%), NW001471700.1 (59%) and NW001471431.1 (63%). Only two results have strong bootstrap values: NW001471505.1 (96%) and NW001471562.1 (73%). According to the phylogenetic trees, most of the chicken sequences form a monophyletic out-group at the base of the tree. Where the tested chicken sequences fall within the in-group, most of them are placed as sister taxa to influenza virus taxa (see Table 4). They may have a paralogous relationship, which suggests that gene duplication has occurred in the evolutionary relationship of the chicken sequences within each individual species of influenza virus.
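The three support classes used above can be written down explicitly. The sketch below is an assumption about where the boundaries lie, read off the paragraph (below 30% weak, 30% to 70% moderate, above 70% strong); the class names and the function `support_class` are hypothetical, and the values are a subset of those reported in the text.

```python
def support_class(bootstrap_percent):
    """Classify a bootstrap value using the boundaries implied by the text."""
    if bootstrap_percent < 30:
        return "weak"
    if bootstrap_percent <= 70:
        return "moderate"
    return "strong"


# A subset of the bootstrap values reported in the paragraph above.
REPORTED = {
    "NW001471693.1": 20,
    "NW001471743.1": 12,
    "NW001471581.1": 26,
    "NW001471685.1": 35,
    "NW001471589.1": 35,
    "NW001471521.1": 64,
    "NW001471700.1": 59,
    "NW001471431.1": 63,
    "NW001471505.1": 96,
    "NW001471562.1": 73,
}

if __name__ == "__main__":
    for accession, value in sorted(REPORTED.items(), key=lambda item: item[1]):
        print(accession, value, support_class(value))
```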

Table 4. Sister taxa in the phylogenetic trees. The table lists the sister taxa found in the different segments by analysing the phylogenetic trees. The different colours from top to bottom stand for the different segments. Each influenza virus reference is paired with the chicken reference that is its sister taxon. There are 11 pairs in HA, eight in MP, six in NA, three in NP, three in NS, four in PA, 12 in PB1 and three in PB2.


Flu virus ref    Chicken ref
gi|115382807     NW001471723.1
gi|115503009     NW001471609.1
gi|268527184     NW001471645.1
gi|85062568      NW001471563.1
gi|115503009     NW001471595.1
gi|50365728      NW001471551.1
gi|268527184     NW001471457.1
gi|47716772      NW001471591.1
gi|57916028      NW001471670.1
gi|85062566      NW001471651.1
gi|115502990     NW001471426.1
gi|115502955     NW001471667.1
gi|224181089     NW001471590
gi|268527099     NW001471552.1
gi|47716778      NW001471555
gi|224181095     NW001471505.1
gi|50956633      NW001471667
gi|115502955     NW001471458.1
gi|85692680      NW001471458
gi|86753781      NW001471654.1
gi|57916045      NW001471743.1
gi|115502963     NW001471640.1
gi|224181356     NW001471455.1
gi|85681814      NW001471447.1
gi|57916045      NW001471545.1
gi|86753771      NW001471521.1
gi|115397016     NW001471441.1
gi|61698069      NW001471627.1
gi|115502978     NW001471615.1
gi|116583109     NW001471697.1
gi|115503015     NW001471551.1
gi|115502970     NW001471588.1
gi|116583112     NW001471454.1
gi|86753751      NW001471720.1
gi|115397021     NW001471581.1
gi|61698110      NW001471509.1
gi|115503000     NW001471562.1
gi|117414791     NW001471532.1
gi|115397022     NW001471535.1
gi|224181254     NW001471728.1
gi|268527236     NW001471627.1
gi|115382877     NW001471445.1
gi|47716768      NW001471531.1
gi|224181236     NW001471700.1
gi|115382827     NW001471680.1
gi|115397022     NW001471503.1
gi|57915965      NW001471549.1
gi|85692710      NW001471641.1
gi|50956621      NW001471437.1
gi|115502953     NW001471698.1

Furthermore, in most cases the sequences in the out-group showed strong bootstrap values, sometimes 99% or 100%, which suggests that gene evolution is quite common within the same species.

The results given by the miRBase database

According to the phylogenetic results, 52 important short sequences were selected for microRNA analysis. The miRBase database shows that 11 short sequences may have a relationship with chicken microRNAs and 32 short sequences may have an evolutionary relationship with microRNAs in other species. Table 5 shows the important sequences, which were generated by comparing the different segments of the influenza virus.

The phylogenetic trees identify the significant sequences in the chicken genome, and the matched nucleotide sequences can then be obtained from the earlier BLAST results. These sequences were checked against the miRBase database, and where a sequence matched a microRNA, the related microRNA was reported. The following table shows the details of the possibly related microRNAs found in miRBase and the chicken sequences.

Table 5. Details of the similar microRNAs found by checking the significant short sequences. According to the miRBase database, these sequences are quite similar to microRNAs in the chicken genome; the similar microRNAs are listed by mature sequence accession number. The related protein IDs were given by the EMBL database, and the protein functions were predicted using the Gene Ontology database.

Mature sequence accession | Relative region | Relative protein ID | Protein function
MIMAT0007714 | Intron | ENSGALP00000038644, ENSGALP00000038645, ENSGALP00000009867 | nucleobase, nucleoside, nucleotide and nucleic acid metabolic process; ATP binding; ATP-dependent DNA helicase activity; DNA binding; hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides
MIMAT0001112 | Intergenic | Unknown | None
MIMAT0007746 | Intron | ENSGALP00000002103, ENSGALP00000040260 | signal transducer activity; receptor activity; regulation of biological process; protein signaling pathway; membrane
MIMAT0007472 | Intergenic | Unknown | None
MIMAT0011205 | Intergenic | Unknown | None
MIMAT0007569 | Intron | ENSGALP00000007732 | RNA polymerase II transcription mediator activity; regulation of biological process; protein binding; transcription regulator activity; biological process
MIMAT0007526 | Intergenic | Unknown | None
MIMAT0007470 | Intergenic | Unknown | None
MIMAT0007339 | Intergenic | Unknown | None
MIMAT0007676 | Intron | ENSGALP00000011596 | negative regulation of gene expression; histone acetyltransferase activity; transcription repressor activity; positive regulation of transcription from RNA polymerase II promoter
MIMAT0003774 | Intergenic | Unknown | None

According to these results, the related microRNAs are located either in introns or in intergenic regions. An intron is a sequence within a gene that is removed during RNA processing; although it lies within a gene, it is not translated into protein. The non-coding sections in introns can be transcribed into precursor mRNA and some other RNAs. Hence, if the function of these sequences and their related proteins could be established, it would greatly help in understanding the relationship between the conserved regions in the influenza virus and the chicken genome sequences. By checking the EMBL database, the proteins related to the introns were found, and their functions were predicted using the Gene Ontology database. However, the database does not show function details for the microRNAs located in intergenic regions, and no related protein has been found for them so far. According to Pfeffer et al. (2005), a few mammalian viruses have been shown to have a relationship with miRNAs. These viruses include Epstein-Barr virus (Pfeffer et al., 2004) and Kaposi sarcoma-associated virus (Cai et al., 2005). Furthermore, Lecellier and colleagues suggested that endogenous miRNAs can affect antiviral defence mechanisms (Lecellier et al., 2005).

Conclusion of the results

In this project, 140 conserved regions were identified. Using BLAST to compare these 140 conserved regions with the chicken genome, I found 247 matching results with an E-value below 0.01. There are 11 matching results with an E-value in the range between [...] and [...]. Using MEGA, 91 phylogenetic trees were built, and the topology of 52 of them reflected an evolutionary relationship. As 50 sister taxa were identified in these 52 trees, the sequences may have a paralogous evolutionary relationship. According to the miRBase database, 32 sequences may have a relationship with microRNAs in other species, and 11 sequences may have a relationship with chicken microRNAs. However, the functions of some of these regions cannot be proven yet, owing to the limitations of the database and the lack of experimental support. As all of the selected chicken sequences are located close to retrotransposons, and in view of the phylogenetic trees, I consider that the positions close to retrotransposons in the chicken genome have an evolutionary relationship with influenza virus genomes. However, I cannot guarantee that this conclusion is absolutely right, as further work has to be carried out to prove it.

Conclusions

Limitations of methods and improvements

In this project, BioEdit was a powerful tool for editing sequences, and a lot of the work was based on it. For example, it can be used to find conserved regions, select sequences and convert file formats. However, BioEdit still has some limitations, which show up when finding conserved regions. Smagala suggested that the 'Find Conserved Regions' option in BioEdit is not very good at detecting conserved regions when not all records contain a full-length sequence (Smagala, 2005). This is because BioEdit uses entropy plots to measure the lack of information content at each position, an idea based on Claude Shannon's theory.

$$H(l) = -\sum_{b} f(b,l)\,\log_{2} f(b,l) \qquad (1)$$

Equation 1 shows the Shannon entropy, where $H(l)$ is the uncertainty, that is, the entropy at position $l$, $b$ stands for a residue, and $f(b,l)$ is the probability with which residue $b$ is found at position $l$. BioEdit uses $\log_2$ to normalise the entropy scale, so the range for four bases is [0, 2], yet the values given by BioEdit may reach as high as 2.32. Therefore, when the sequences include gaps or ambiguous characters, the entropy values become high, and high entropy values cause sequence information to be lost. Although the value settings (Table 2) help to improve the accuracy of finding conserved regions, it is still very difficult for a researcher to set an appropriate value, because the lengths of the sequences and the numbers of gaps and ambiguous characters are not known in advance. To improve this, conserved region finding could use a different normalisation of the entropy scale; software such as ConFind (Smagala, 2005) would be the best choice. Alternatively, other software could be developed to simulate the value settings in BioEdit and help the user choose the best values.
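To make the effect of gaps on the entropy concrete, here is a minimal Python sketch of a per-column Shannon entropy with base-2 logarithms, together with a simple scan for runs of low-entropy columns. It is not BioEdit's or ConFind's implementation: the function names, the toy alignment, and the `max_entropy` and `min_length` settings are hypothetical, and the gap character is deliberately counted as a fifth symbol to show how the value can exceed 2.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (base 2) of one alignment column; gaps count as a symbol."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy at every position of a list of equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same aligned length")
    return [column_entropy([s[i] for s in sequences]) for i in range(length)]


def conserved_runs(entropies, max_entropy, min_length):
    """Return (start, end) index pairs (end exclusive) of low-entropy runs."""
    runs = []
    start = None
    for i, h in enumerate(entropies):
        if h <= max_entropy:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    if start is not None and len(entropies) - start >= min_length:
        runs.append((start, len(entropies)))
    return runs


if __name__ == "__main__":
    toy_alignment = ["ACGTA", "ACGTC", "ACGTG", "ACG-T"]  # hypothetical data
    entropies = alignment_entropy(toy_alignment)
    print(entropies)
    print(conserved_runs(entropies, max_entropy=0.2, min_length=3))
```

A column with four equally frequent bases gives an entropy of 2, while a column with four bases and a gap in equal proportions gives log2(5), about 2.32, which matches the upper value mentioned above.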

Furthermore, there are some other limitations in this project. Aligning the BLAST results with the influenza sequences using ClustalX2 is limited: because the short sequences are widely scattered across different positions in one sequence, it may be difficult to obtain an accurate alignment, and misaligned sequences may affect downstream results such as phylogenetic tree building.

Finally, the limitations of miRBase should be considered. According to the latest release of the miRNA database, only 121 distinct mature miRNAs have been found (Griffiths-Jones et al. 2006). Many of the microRNAs contained in miRBase were identified only by sequence similarity to known miRNA structures and still need experimental support (Hillier et al. 2004; Griffiths-Jones et al. 2006); entries of this kind therefore do not include any details for function prediction. For this reason, I cannot draw an accurate conclusion from the microRNA database results alone.

Future work

In this project, 29 chromosomes were tested, so in further work the remaining chromosomes have to be tested. Furthermore, the comparison work was based on the online BLAST database, and I found that if too many sequences were compared at once, the database took a very long time to finish the calculation. If a local BLAST database were used instead of the online one, the calculation would be faster, so it would be worth introducing a local database for the similarity comparison step.
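As a rough sketch of what a local comparison could look like, the Python snippet below builds the command lines for the NCBI BLAST+ programs `makeblastdb` and `blastn` and runs them only if both are installed. It is an assumption-laden illustration rather than a tested pipeline: the file names (`chicken_genome.fa`, `conserved_regions.fa`, `hits.tsv`) and the database name are hypothetical, and the choice of the `blastn-short` task assumes that the conserved regions are short queries.

```python
import shutil
import subprocess


def makeblastdb_command(genome_fasta, db_name):
    """Command line that builds a local nucleotide BLAST database."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_command(query_fasta, db_name, out_path, evalue=0.01):
    """Command line that searches short queries against the local database."""
    return [
        "blastn",
        "-task", "blastn-short",   # assumption: queries are short conserved regions
        "-query", query_fasta,
        "-db", db_name,
        "-evalue", str(evalue),    # same E-value cut-off as quoted in the text
        "-outfmt", "6",            # tabular output
        "-out", out_path,
    ]


def run_local_blast(genome_fasta, query_fasta, db_name, out_path):
    """Build the database and run the search; requires BLAST+ on the PATH."""
    for tool in ("makeblastdb", "blastn"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} was not found on the PATH")
    subprocess.run(makeblastdb_command(genome_fasta, db_name), check=True)
    subprocess.run(blastn_command(query_fasta, db_name, out_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    print(" ".join(makeblastdb_command("chicken_genome.fa", "chicken_db")))
    print(" ".join(blastn_command("conserved_regions.fa", "chicken_db", "hits.tsv")))
```

Building the database once and reusing it for every conserved region avoids the repeated waiting time of the online service.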

Because of the software limitations described in the section on limitations of methods and improvements, the conserved regions found by BioEdit have to be tested with other software or by laboratory experiments to make sure that the conserved regions in the chicken sequences are correct. Furthermore, the conserved regions have to be analysed from the protein perspective. First, the conserved regions have to be analysed as protein sequences; once the protein sequences have been shown to have an evolutionary relationship, the protein secondary structure can be used to predict the function of the protein. In addition, because the phylogenetic trees were constructed only with MEGA, more accurate results could be obtained by building the trees with different software and comparing the results with those from MEGA. The miRBase search indicated that some of these sequences may have a relationship with microRNAs. However, this has to be proven further, especially for the sequences closely related to introns. Whether those sequences have the same function as the relevant introns, and what kind of RNA is transcribed, could be addressed in further work if more time were available.

Acknowledgements

I would like to thank my project supervisor, Dr. Peter Andras, and my tutor, Professor Anil Wipat, for their valuable support and advice during these six months. I would like to thank my family for supporting and encouraging me during my studies. I am grateful to all the programme lecturers and classmates for making the course an enjoyable and precious experience for me. Finally, thanks to my classmate Andrew David King for helping me with English corrections.