Automatically clustering similar units for unit selection in speech synthesis
Deploying the First Exchange 2013 Server (2013-06-30 11:10:42)

Introduction: Exchange Server 2013 is currently the latest version of Microsoft's mail system product, and its many attractive features are well known. Over the past year quite a few friends have asked me what mail system their company should choose; the key requirements are that it be easy to get started with, easy to manage and use, stable, and secure. My natural recommendation was the cloud-based Exchange Online. However, many of them said that because of company security policies, and various "wise decisions" from the boss, they were unwilling to adopt a public-cloud service. They were, however, quite interested in building a private cloud in-house and willing to experiment further. So all I could recommend was an on-premises deployment of Exchange Server 2013 Enterprise edition.

The next problem is that some companies, having bought a powerful mail server product (Microsoft positions it as a unified collaboration platform, but in the Chinese market most users still think of Exchange simply as a mail system), are unwilling to spend more on related product services. That leaves the work to those of us in IT, and in the end I do it for free. After a few rounds of that it gets tiring and annoying, so I decided to simply write it up so that they can look things up and sort it out themselves; if that fails, I can still come to the rescue. Continuing the style of the articles I previously contributed to Microsoft's "Easy Guide" (易宝典) series, this series is written in the same format.

1. System Requirements

Since what you recommend to friends should of course be the best, when recommending Exchange Server 2013 I suggest using Windows Server 2012 as the server operating system. Although Microsoft has just released the preview of Windows Server 2012 R2, given the pace at which Microsoft ships products, Windows Server 2012 is still a sensible choice today. Choosing Windows Server 2008 R2 could mean facing an operating-system upgrade in the near term, whereas with Windows Server 2012 you might, with luck, catch a promotion when Windows Server 2012 R2 ships and get a cheap or free upgrade to Windows Server 2012 R2.
Building Maximum-Likelihood Phylogenetic Trees with RAxML

RAxML is one of the programs for building phylogenetic trees by maximum likelihood. It can handle very large sequence datasets: thousands to tens of thousands of taxa, and alignments of hundreds to tens of thousands of aligned bases. Its author is Dr. A. Stamatakis of the University of Munich, Germany. RAxML comes in several versions (some support running on multiple CPUs); this article uses the most common single-machine version, raxmlHPC, as its example.

1 Download and installation
RAxML runs under Linux, MacOS and DOS and can be downloaded from http://icwww.epfl.ch/~stamatak/index-Dateien/Page443.htm; it can also be run on supercomputers. Linux and Mac users download RAxML-7.0.4.tar.gz and compile it with gcc: make -f Makefile.gcc. Windows users can download a precompiled exe file, with no installation required.

2 Input data
RAxML reads data in PHYLIP format, but sequence names may be up to 256 characters long. RAxML is not sensitive to tabs or inserted whitespace in the PHYLIP file. Input trees are in Newick format.

RAxML's error checking covers:
1 Duplicate sequence names, i.e. different sequences sharing the same name.
2 Duplicate sequence content, i.e. two sequences with different names whose bases are identical.
3 A site (column) consisting entirely of unknown symbols, e.g. an amino-acid column made up only of X, ?, *, -, or a DNA column made up only of N, O, X, ?, -.
4 A sequence consisting entirely of unknown symbols, e.g. an amino-acid sequence made up only of X, ?, *, -, or a DNA sequence made up only of N, O, X, ?, -.
5 Forbidden characters in sequence names, such as spaces, tabs, newlines, :, (), [] and so on.

3 Options of raxmlHPC
-s sequenceFileName   the phy file to be analysed
-n outputFileName     name for the output files
-m substitutionModel  the substitution model to use
Options in square brackets are optional:
[-a weightFileName]   assign a weight to each site; the per-site weights must be given in a file in the same directory
[-b bootstrapRandomNumberSeed]   set the random seed for bootstrapping
[-c numberOfCategories]   set the number of rate categories
[-d]   search for the tree starting from a completely random tree rather than from a maximum parsimony tree
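As a minimal usage sketch based only on the options documented above (the file name alignment.phy and the run name run1 are arbitrary illustrative choices; GTRGAMMA is one of RAxML's standard nucleotide models but is not taken from this text, and depending on the RAxML version additional options such as a random-number seed may also be required):

    raxmlHPC -s alignment.phy -n run1 -m GTRGAMMA

This would read the alignment from alignment.phy, search for a maximum-likelihood tree under the chosen model, and write output files whose names end in .run1.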
A Quick Guide for the MixfMRI Package
Wei-Chen Chen¹ and Ranjan Maitra²
¹ pbdR Core Team, Silver Spring, MD, USA
² Department of Statistics, Iowa State University, Ames, IA, USA

Contents
1. Introduction
 1.1. Dependent Packages
 1.2. The Main Function
 1.3. Datasets
 1.4. Examples
 1.5. Workflows
2. Demonstrations
 2.1. 2D Phantoms
 2.2. 2D Simulations
 2.3. 2D Clustering

© 2018 Wei-Chen Chen and Ranjan Maitra. Permission is granted to make and distribute verbatim copies of this vignette and its source provided the copyright notice and this permission notice are preserved on all copies. This publication was typeset using LaTeX.

Warning: The findings and conclusions in this article have not been formally disseminated by the U.S. Food and Drug Administration and should not be construed to represent any determination or policy of any University, Institution, Agency, Administration or National Laboratory. Ranjan Maitra and this research were supported in part by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health (NIH) under its Award No. R21EB016212. The content of this paper, however, is solely the responsibility of the authors and does not represent the official views of either the NIBIB or the NIH.

This document is written to explain the main function of MixfMRI (?), version 0.1-0. Every effort will be made to ensure future versions are consistent with these instructions, but features in later versions may not be explained in this document.

1. Introduction
The main purpose of this vignette is to demonstrate basic usage of MixfMRI, which implements the methodology, simulation studies, and data analyses developed in "A Practical Model-based Segmentation Approach for Accurate Activation Detection in Single-Subject Functional Magnetic Resonance Imaging Studies" (?). The methodology mainly utilizes model-based (unsupervised) clustering of functional Magnetic Resonance Imaging (fMRI) data to identify regions of brain activation associated with the performance of a task or the application of a stimulus. The implemented methods include 2D and 3D unsupervised segmentation analyses for fMRI signals. For simplicity, only 2D clustering is demonstrated in this vignette. In this package, the data on fMRI signals are in the form of the p-values at each voxel of a Statistical Parametric Map (SPM). The clustering and segmentation analyses identify activated voxels/signals (in terms of small p-values) from normal brain behavior within a single subject, and also use spatial context.
Note that the p-values may be derived from statistical models where typically a proper experiment design is required. These p-values are our data and are used for analysis; no statements about the significance level of the p-values associated with activated voxels/signals are needed. Our analysis approach allows for the prespecification of a priori expected upper bounds for the proportion of activated voxels/signals, which can guide the determination of activated voxels.

For large datasets, the methods and analyses are also implemented in a distributed manner, especially using the SPMD programming framework. The package also includes workflows which utilize SPMD techniques. The workflows serve as examples of data analyses and large-scale simulation studies. Several workflows are also built in to automatically process clusterings, hypotheses, merging of clusters, and visualizations. See Section 1.5 and the files in MixfMRI/inst/workflow/ for more information.

1.1. Dependent Packages
The MixfMRI package depends on other R packages to be functional, even though they are not always required. This is because some examples, functions and workflows of MixfMRI may need utilities from those dependent packages. For instance: Imports: MASS, Matrix, RColorBrewer, fftw, MixSim, EMCluster. Enhances: pbdMPI, oro.nifti.

1.2. The Main Function
The main function, fclust(), implements model-based clustering using the EM algorithm (?) for fMRI signal data and provides unsupervised clustering results that identify activated regions in the brain. The fclust() function contains an initialization method and EM algorithms for clustering fMRI signal data, which have two parts:
• PV.gbd for the p-values of signals associated with voxels, and
• X.gbd for voxel information/locations in either 2D or 3D,
where PV.gbd is of length N (number of voxels) and X.gbd is of dimension N×2 or N×3 (for 2D or 3D). Each signal (per voxel) is assumed to follow a mixture distribution of K components with mixing proportions ETA. Each component has two independent coordinates (one for each part) with density functions Beta and multivariate Normal, one for each part of the fMRI signal data.

Beta density: The first component (k = 1) is restricted by min.1st.prop and has a Beta(1,1) (equivalently, standard uniform) distribution. The remaining components (k = 2, 3, ..., K) have different Beta(alpha, beta) distributions with alpha < 1 < beta for all k > 1. This coordinate mainly represents the results of test statistics for determining activation of voxels (those that have smaller p-values). Note that the test statistics may be developed, smoothed, or computed from a time-course model associated with voxel behavior. See the main paper (?) for information.

Multivariate Normal density: The logarithm of the multivariate normal density is used as a penalty that regularizes the estimated parameters and the estimated activation. model.X="I" specifies an identity covariance matrix for this multivariate Normal distribution, and "V" an unstructured covariance matrix. ignore.X=TRUE ignores X.gbd and the normal density, i.e. there is no regularization and only the Beta density is used. Note that this coordinate (for each axis) is recommended to be normalized to the (0,1) scale, which is the same scale as the Beta density.
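One way to write down the mixture model implied by this description is sketched below; this is an inference from the text above, not necessarily the exact penalized likelihood coded in fclust(). For voxel $i$ with p-value $p_i$ and scaled location $x_i$,

$$\log L \;=\; \sum_{i=1}^{N} \log \sum_{k=1}^{K} \eta_k \,\mathrm{Beta}(p_i;\,\alpha_k,\beta_k)\,\phi(x_i;\,\mu_k,\Sigma_k),$$

with $\alpha_1=\beta_1=1$ for the first (inactive) component, $\alpha_k<1<\beta_k$ for $k>1$, and $\phi$ the multivariate normal density whose logarithm acts as the spatial penalty (identity covariance for model.X="I", unstructured covariance for model.X="V", and omitted entirely when ignore.X=TRUE).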
From a modeling perspective, rescaling X.gbd has no effect. In this package, the two parts PV.gbd and X.gbd are assumed to be independent because the latter enters through the addition of a penalty term to the log-likelihood of the voxel-wise data on p-values. The goal of the main function is to provide spatial clusters (in addition to the PV.gbd) indicating spatial correlations.

Currently, the APECMa (?) and EM algorithms are implemented, with the EGM algorithm (?) to speed up convergence when MPI and pbdMPI (?) are available. RndEM initialization (?), with a specific way of choosing initial seeds, is implemented for obtaining good initial values that can increase the chances of convergence.

1.3. Datasets
The package is built with several datasets, including:
• three 2D phantoms, shepp0fMRI, shepp1fMRI, and shepp2fMRI;
• one 3D dataset, pstats, with p-values obtained from the SPM produced after running the Analysis of Functional Neuroimaging (AFNI) software on the imagination dataset of (?);
• two small 2D voxel datasets, plex and pval.2d.mag, in p-values;
• two toy examples, toy1 and toy2.

1.4. Examples
The scripts in MixfMRI/demo/ have several examples that demonstrate the main function, the example datasets and other utilities in this package. For a quick start:
• the scripts MixfMRI/demo/fclust2d.r and MixfMRI/demo/fclust3d.r show the basic usage of the main function fclust() using the two toy datasets,
• the scripts MixfMRI/demo/maitra_2d.r and MixfMRI/demo/shepp.r show and visualize how to generate simulated datasets with given overlap levels, and
• the scripts MixfMRI/demo/alter_*.r show alternative methods.

1.5. Workflows
The package also has several workflows established for simulation studies. The main examples are located in MixfMRI/inst/workflow/simulation/. See the file create_simu.txt, which generates scripts for the simulations. The files under MixfMRI/inst/workflow/spmd/ contain the main scripts for the workflows.
Note that MPI and pbdMPI are required for the workflows because these simulations require potentially long computing times.

2. Demonstration
The examples presented below are simulated and are not necessarily meant to represent a meaningful activation study of the brain. Their purpose is to demonstrate our segmentation methodology in activation detection.

2.1. 2D Phantoms
The three 2D phantoms built into the MixfMRI package can be displayed in R from the demo as simply as

Maitra's Phantoms
R> demo(maitra_phantom, package = "MixfMRI", ask = F)

which runs the code in MixfMRI/demo/maitra_phantom.r. The R command should give a plot similar to Figure 1, containing three different simulated 2D slices of a hypothesized brain. Each phantom may have a different amount of activated voxels (in terms of smaller p-values). Colors represent different activation intensities. The total proportions of truly active voxels are listed in the title of each phantom.

Figure 1: Simulated 2D Phantoms (panel titles: shepp2fMRI: 3.968%, shepp1fMRI: 2.249%, shepp0fMRI: 0.917%).

The examples used below mimic some active regions (in 2D) depending on different types of stimuli that may trigger responses in the brain. Hypothetically, the voxels may be active by region, but each region may not be active in the same way (or magnitude) even though the regions may need to respond collectively to the stimuli (for example, due to time delay, response order, or the sensitivity of the study design). As an example, only 3.968% of the voxels in the shepp2fMRI phantom are active; they are indicated by two different colors (blue and brown) for different activation types, where p-values may be smaller than 0.05 and may follow two Beta distributions (with different configurations) for the truly active voxels and one uniform distribution (i.e. Beta(1,1)) for the truly inactive voxels. The following code provides counts for each group of active and inactive voxels in the shepp2fMRI phantom.

Summary of the shepp2fMRI Phantom
R> table(shepp2fMRI, useNA = "always")
    0     1     2  <NA>
13408   472    82 51574

The summary says that this phantom has three kinds of activation with group ids 0, 1 and 2. There are 13,408 voxels belonging to cluster 0 (inactive), followed by 472 voxels belonging to cluster 1 (active, highlighted in blue in Figure 1), and 82 voxels belonging to cluster 2 (active, highlighted in brown in Figure 1). There are 51,574 pixels (NA) of this imaging dataset which are not within the brain (contoured by the black line in Figure 1). See Section 2.2 for information on generating p-values from a mixture of three Beta distributions.

2.2. 2D Simulations
MixfMRI provides a function gendataset(phantom, overlap) to generate p-values of activations. The function needs two arguments: phantom and overlap. The phantom is a map containing voxel group id's, and p-values are simulated from a mixture Beta distribution with the mixture level specified by the overlap argument. The example can be found in MixfMRI/demo/maitra_2d.r and can be run in R as simply as

Simulations of Active Voxels
R> demo(maitra_2d, package = "MixfMRI", ask = F)

Note that the overlap represents the similarity of activation signals: the higher the overlap, the more difficult it becomes to distinguish between activation and inactivation, and also between the kinds of activation. The command above should give a plot similar to Figure 2, containing group id's on the left and their associated p-values for stimulus responses on the right. The top row displays examples for phantom shepp1fMRI, and the bottom row displays examples for phantom shepp2fMRI.
• Inside the brain, the group id 2's are indicated by white (highly active, associated with the stimulus due to the experiment design), 1's are indicated by light gray (slightly active), and 0's are indicated by dark gray (inactive). Note that the white region was the region colored blue in Figure 1, and the light gray region corresponds to the region colored brown in Figure 1.
• The simulated p-values are colored by a map using a red-orange-yellow palette from 0 to 1. Note that small p-values (redder voxels) may also occur at truly inactive voxels. See Figure 3 for the distribution of simulated p-values for the phantom shepp2fMRI.

The methodology and analyses implemented in this package aim to identify those active voxels in spatial clusters, for example regions of active voxels associated with imagining the playing of certain sports. When an experiment is designed and conducted to detect brain behavior, the statistical model and the p-values of the treatment effect should be able to reflect the voxel activations. Typically, the statistical tests are done independently voxel-by-voxel due to the complexity of computation and modeling. This package provides post hoc clustering that adds spatial context to the p-values and helps to isolate meaningful regions clouded with many small p-values. See (?) for information on clustering performance and comprehensive assessments of this post hoc approach.

2.3. 2D Clustering
The example can be found in MixfMRI/demo/maitra_2d_fclust.r and can be run as simply as

Clustering of Active Voxels
R> demo(maitra_2d_fclust, package = "MixfMRI", ask = F)

This demo (explained below) clusters the simulated p-values (see Section 2.2) using the developed method.

Figure 2: Activated regions of voxels and simulated p-values (shepp1fMRI, overlap 0.01; shepp2fMRI, overlap 0.01).

Code of maitra_2d_fclust.r
library(MixfMRI, quietly = TRUE)
set.seed(1234)
da <- gendataset(phantom = shepp2fMRI, overlap = 0.01)$pval
### Check 2d data.
id <- !is.na(da)
PV.gbd <- da[id]
# pdf(file = "maitra_2d_fclust.pdf", width = 6, height = 4)
hist(PV.gbd, nclass = 100, main = "p-value")
# dev.off()
### Test 2d data.
id.loc <- which(id, arr.ind = TRUE)
X.gbd <- t(t(id.loc) / dim(da))
ret <- fclust(X.gbd, PV.gbd, K = 3)
print(ret)
### Check performance
library(EMCluster, quietly = TRUE)
RRand(ret$class, shepp2fMRI[id] + 1)

In the code above, the histogram of simulated p-values is plotted in Figure 3. Then fclust(X.gbd, PV.gbd, K = 3) groups the voxels into three clusters. At the end, ret saves the clustering results. print(ret) shows the results below in detail:
• N is the total number of voxels to be clustered/segmented
• K is the total number of segments
• n.class is the number of voxels in each segment
• ETA is the mixing proportion of each segment
• BETA is the set of parameters of the Beta distributions (by column)
• MU is the centers of the segments (the spatial locations inside the brain)
• SIGMA is the dispersion of the segments

Figure 3: Activation (p-values) Distribution. The x-axis is for the p-values.

The numbers of voxels for each segment in this example are 13,394, 184, and 384, associated with new cluster ids 0, 1, and 2. Comparing with the true classifications (see the table in Section 2.1), the adjusted Rand index is 0.9749, indicating good agreement between the truly active voxels and the activated voxels as determined by our segmentation methodology.
Outputs of Clustering
R> print(ret)
Algorithm: apecma  Model.X: I  Ignore.X: FALSE
- Convergence: 1  iter: 16  abs.err: 0.02091979  rel.err: 7.375343e-07
- N: 13962  p.X: 2  K: 3  logL: 28364.52
- AIC: -56693.04  BIC: -56557.25  ICL-BIC: -55712.18
- n.class: 13394 184 384
- init.class.method:
- ETA: (min.1st.prop: 0.8  max.PV: 0.1)
[1] 0.95266704 0.01307847 0.03425449
- BETA: (2 by K)
     [,1]         [,2]       [,3]
[1,]    1 1.127244e-01 0.04237128
[2,]    1 4.429518e+04 1.00000130
- MU: (p.X by K)
          [,1]      [,2]      [,3]
[1,] 0.5013538 0.3105460 0.5917377
[2,] 0.5076080 0.3718145 0.3749842
- SIGMA: (d.normal by K)
           [,1]         [,2]        [,3]
[1,] 0.01271198 0.0001859628 0.009462210
[2,] 0.02186662 0.0011582052 0.004284016

R> RRand(ret$class, shepp2fMRI[id] + 1)
   Rand adjRand  Eindex
 0.9964  0.9749  1.7012
A plain-language explanation of Dirichlet clustering

Dirichlet clustering, also known as Dirichlet process mixture modeling, is a probabilistic clustering algorithm that allows the number of clusters in a dataset to be determined automatically. It is named after the Dirichlet distribution, which is used to model the distribution of cluster assignments.

In Dirichlet clustering, each data point is assigned to one of the clusters, and the cluster assignments are determined by the similarity between data points. However, unlike traditional clustering algorithms, Dirichlet clustering does not require the number of clusters to be specified in advance. Instead, it uses a non-parametric Bayesian approach to determine the number of clusters automatically from the data.

The Dirichlet process is a stochastic process that allows for an infinite number of clusters. It is characterized by two parameters: a concentration parameter, which controls the number of clusters, and a base distribution, which specifies the distribution of cluster assignments. The concentration parameter determines the probability of creating a new cluster when a new data point is encountered, while the base distribution determines the distribution of data points within each cluster.

To perform Dirichlet clustering, we start with an empty set of clusters and iteratively assign data points to clusters. At each iteration, we calculate the probability of assigning a data point to each existing cluster, as well as the probability of creating a new cluster. The data point is then assigned to the cluster with the highest probability; if the probability of creating a new cluster is higher than the probability of assigning the data point to any existing cluster, a new cluster is created.

The process continues until all data points have been assigned to clusters. The resulting clusters can then be used for further analysis or visualization.
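The assignment loop described above can be sketched in a few lines of Python. This is a simplified, greedy hard-assignment variant (a full Dirichlet process mixture would average over assignments, e.g. with Gibbs sampling); the spherical-Gaussian likelihood, the function name dp_greedy_cluster, and the default values of alpha and sigma are illustrative assumptions, not part of the explanation above.

    import numpy as np

    def dp_greedy_cluster(points, alpha=1.0, sigma=1.0):
        """Greedy sketch of the Dirichlet-process assignment loop described above.

        points : array of shape (n_points, n_features)
        alpha  : concentration parameter (larger -> more clusters)
        sigma  : assumed within-cluster spread (spherical Gaussian likelihood)
        """
        points = np.asarray(points, dtype=float)
        members = []                       # members[k] = indices assigned to cluster k
        centers = []                       # running mean of each cluster
        labels = np.full(len(points), -1)

        for i, x in enumerate(points):
            # score of joining each existing cluster: cluster size times the
            # Gaussian likelihood of x under that cluster's current mean
            scores = [len(m) * np.exp(-np.sum((x - c) ** 2) / (2.0 * sigma ** 2))
                      for m, c in zip(members, centers)]
            # score of opening a new cluster, controlled by the concentration parameter
            scores.append(alpha)

            k = int(np.argmax(scores))
            if k == len(members):          # creating a new cluster wins
                members.append([i])
                centers.append(x.copy())
            else:                          # join the existing cluster and update its mean
                members[k].append(i)
                centers[k] = points[members[k]].mean(axis=0)
            labels[i] = k
        return labels

For example, dp_greedy_cluster(np.random.randn(200, 2), alpha=2.0) returns one cluster label per row without the number of clusters having been fixed in advance.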
AUTO SPRUCE TRIAL SYSTEM (ASTS)

Huma Hassan Rizvi, Computer Engineering Department, Sir Syed University of Engineering and Technology, Karachi (Pakistan)
Sana, Software Engineering Department, Sir Syed University of Engineering and Technology, Karachi (Pakistan)
Dr. Sadiq Ali Khan, Department of Computer Science, University of Karachi (Pakistan)
Muhammad Khurrum, Department of Informatics, Malaysia University of Science & Technology (Malaysia)
Khalique Ahmed, Computer Engineering Department, Sir Syed University of Engineering & Technology, Karachi (Pakistan)

ABSTRACT
Auto Spruce Trial System (ASTS) is designed to provide a platform in which children's intelligence and cognitive behavior are tested through the Wechsler Intelligence Scale for Children (WISC). The test is used to assess children and identify their learning abilities and disabilities, and it also serves as a clinical tool. ASTS is an automated testing system which administers the children's test and generates their intelligence result automatically. We automate this test, which in Pakistan is administered manually and consumes a lot of time. It does not really matter how much intelligence one has; what makes a difference is how well one uses it. The test is applicable for children whose parents are worried about their mental health and learning potential. An intelligence test can help guardians and instructors make judgments about an individual child's educational course, standard, or need for special education.

KEYWORDS
Wechsler Intelligence Scale for Children (WISC).

1. INTRODUCTION
The WISC, developed by David Wechsler, is an individually administered intelligence test for children between the ages of six and sixteen. The original test was developed in 1939 and is divided into several subtests, arranged into Verbal and Performance scales. The test scores are based on:
• Verbal IQ (VIQ)
• Performance IQ (PIQ)
• Full Scale IQ (FSIQ)
The third edition, WISC-III, was published in 1991. This edition introduced a new subtest as a measure of processing speed, and four new index scores were introduced to represent narrower domains of cognitive function:
• Verbal Comprehension Index (VCI)
• Perceptual Organization Index (POI)
• Freedom from Distractibility Index (FDI)
• Processing Speed Index (PSI)
The WISC-IV and WISC-V were published in 2003 and 2014 respectively. The WISC-V includes a total of 21 subtests based on 15 composite scores [2].
In this paper we present an application named ASTS which can help guardians build a bright and prosperous future for their kids. ASTS can be used in schools for special children. It is used not only as an intelligence test but also for other diagnostic purposes. The IQ scores reported by ASTS can be used as one component in diagnosing mental retardation and specific learning disabilities in children. Here, however, the application focuses only on determining the cognitive functioning of a child.

1.1. Purpose
ASTS is quite different from other systems and is more reliable; standardized intelligence tests are developed under strict rules to guarantee reliability and validity. A test is reliable when it consistently achieves the desired outcome.
The main purpose of this system is to provide guidance to parents who are genuinely concerned about their children's mental health issues, so it is a good approach to create something new and more reliable.

1.2. Scope
Everybody wants a new idea or something innovative. The Auto Spruce Trial System (ASTS) is designed for the diagnosis of the Intelligence Quotient (IQ) level of a child and will be able to predict the presence of a disorder in children based on age, the number of questions answered in a specific time, and patterns based on the scaling system. Using ASTS, we can easily determine the cognitive functioning of any child. The test is applicable for children from six to sixteen years of age and is used both as an intelligence test and as a clinical tool. This project draws on all our work, academic skills and experience, giving us a remarkable opportunity to learn more and grow further in this field.

1.3. Modules of the Auto Spruce Trial System
Our system consists of 3 modules, as follows:
Module-1: Pre-designed Testing System
Module-2: Consultancy
Module-3: Bulletin Board

Figure 1. Logo of our application.

Module-1: Pre-designed Testing System
In this module we automate the testing system described above; it allows the psychologist to identify the stages of mind development. It has four main indexes:
i) Verbal Comprehension Index (VCI)
ii) Perceptual Reasoning Index (PRI)
iii) Working Memory Index (WMI)
iv) Processing Speed Index (PSI)
There are a variety of subtests within each of these indexes [3].
1. The VCI score tests your child's intelligence and knowledge. The subtests include: Vocabulary, Similarities, Comprehension, Information*, Word Reasoning*.
2. The PRI score is related to intelligence and the ability to learn new information. The subtests include: Block Design, Matrix Reasoning, Picture Concepts, Picture Completion*.
3. The WMI score is related to short-term memory. The subtests include: Forward Digit Span, Backward Digit Span, Letter-Number Sequencing, Arithmetic*.
4. The PSI test focuses on mental quickness and task performance; it is mainly concerned with concentration and attention. The subtests include: Coding, Symbol Search, Cancellation* [3].

Module-2: Consultancy
This module provides a consultancy section. Not everyone who takes the test understands their result, so for their convenience we provide consultants who guide them online and evaluate their personality properly.

Module-3: Bulletin Board
The Bulletin Board is a section where users can view updates about a particular issue or topic. It is a surface intended for the posting of public messages and displays the daily updates of the website: anything new is shown on the bulletin board.

2. LITERATURE REVIEW
The Auto Spruce Trial System is an automated intelligence testing system based on a real and authorized question-bank data set. Automated systems and websites that conduct WISC-IV testing exist, but their data sets are not appropriate or in line with the credibility requirements of the test. ASTS applies an accurate and authenticated data set based on the WISC-IV levels. ASTS is quite different from the others; it is more reliable than other systems and standardized intelligence tests, and it is built under strict rules to ensure reliability and validity.
A test result is considered reliable if we can get an equivalent or comparable outcome over and over. ASTS is designed to overcome the difficulties psychologists face when they administer tests manually and generate the result days later. The difficulties they face are:
1. The test is taken manually.
2. The result is generated late.
3. Proper time is not given to individuals.

2.1. Old methods used

A Test Sheet Algorithm for Assessments
A dynamic programming approach is used to solve the multi-criteria test-sheet generation problem. It combines clustering and dynamic programming to construct a feasible test sheet in accordance with the specified requirements. The paper discusses some experimental results and the test-sheet-generating strategy of ITED to evaluate the efficiency of the approach [5].

Fuzzy Logic-based Student Learning Assessment Model
In this article a diagnosis model based on fuzzy logic is presented. One of the main advantages of this model is that it allows an interpretable representation of knowledge, since it is based on rules both when the reasoning is well defined and when the reasoning is intuitive, as a result of experience. The qualitative and quantitative criteria in student assessment proposed by the teachers can easily be improved (linguistic variables as well as fuzzy rules), adding a high degree of flexibility [11].

In the development of a Computer-Assisted Testing System, a genetic test-sheet technique is followed [12]. The authors proposed two genetic algorithms:
• CLFG
• FIFG
These techniques are used for test-sheet-generating problems. By applying these approaches, test sheets with near-optimal discrimination degrees can be obtained in less time. The two algorithms were embedded in a CAI system, Intelligent Tutoring, Evaluation, and Diagnosis, which provides an easier and more informative tool for instructors and learners. The ITED-II testing subsystem generates the test sheets by accepting the assessment requirements and reading test items from the item banks. In the end the test results are sent to the tutoring subsystem for the arrangement of adaptive subject materials [12].

Generation Algorithm for Test Sheet Results
Test-sheet-generation issues are solved through an adaptive cellular genetic algorithm based on a selection strategy. The algorithm combines Adaptive Test Sheet Generation with a cellular genetic algorithm. This approach reduces the test-sheet generation search space, improves the fitness of the test sheet and also improves the assessment of the child. These techniques also improve the accuracy and the convergence speed of the calculations in test-sheet generation [13].

An Evolutionary Intelligent Water Drops Approach for Intelligence Test Sheet Results Generation
This paper addresses problems and issues in generating intelligence test sheet results. Computerized test-sheet results with multiple assessments and calculations are one of the major issues in Computer Assisted Testing Systems and E-Learning technology. A huge variety of tests, questions and task banks of differing difficulty are involved in the assessment tests; even a randomized test cannot serve the purpose of assessment and cannot generate an accurate output. The accuracy of the system rests on a correct question bank and algorithm, and it is difficult to develop an assessment sheet that satisfies all the assessment criteria.
Evolutionary Intelligent Water Drops is presented as the most suitable algorithm: it solves the issues related to test sheet results and also handles assessment tests over very large question banks [14].

Genetic Algorithm used for assessment tests
In this research, a genetic algorithm approach is used for assessment tests. The method optimizes sequences over a variety of groups of tests that serve the same purpose, and it uses fewer hardware resources to reach optimal solutions. These tests are time consuming and some restrictions apply. In this approach, representative keywords are used for a particular test. The approach has three major elements:
• Teaching
• Learning
• Evaluation
The genetic algorithm helps in finding the most appropriate solutions [15].
A report produced by the testing toolkit for a child is referenced below; it describes all the tables that are used for performance evaluation [1][4].

3. METHODOLOGY

3.1. Method/Technique
The main motive of our development is to produce precise and trustworthy results. We do not have the right to ruin anyone's life, as this is a very serious matter, so the results of the system must be reliable. There have been many approaches that derive different results based on decisions made in different states; at every stage, a decision whose reward is closer to the total reward is desirable. The new approach adopts fuzzy logic theory to diagnose the difficulty level of test items in accordance with the learning status and personal features of each student, and then applies these techniques to test sheet construction. Clustering with dynamic programming is another approach to such issues. ASTS is an automated testing system which conducts the children's test and generates their intelligence result automatically; in it we apply fuzzy logic instead of clustering techniques and the dynamic programming approach. We automate the test, which in Pakistan is administered manually and consumes a lot of time [4]. ASTS will serve as an intelligent assistant to the psychologist. The fuzzy logic will make the system more efficient and time saving, and it will also be very helpful for the psychologist.

3.2. Product Perspective
The Auto Spruce Trial System (ASTS) is designed for the diagnosis of the Intelligence Quotient (IQ) level of a child and will be able to predict the presence of a disorder in the child based on age, the number of questions answered in a specific time, and patterns based on the scaling system or the algorithm to be decided. The scores are cross-matched with the scaled scores, composite scores, percentile rank or algorithm, and the result is provided in terms of normal functioning or disorder. The type, kind, level and seriousness of the disorder are provided further in the consultancy section if required.

3.3. System Functions
The functions of the system are as follows (a hypothetical sketch of this flow is given after the list):
• Generation of questions from the question bank.
• Recording of the answered questions.
• Comparison of the answers by the engine and calculation of scores.
• Prediction of the IQ level.
• Prediction of a disorder in the child, if present, on the basis of his/her answers.
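The following Python fragment is a purely hypothetical sketch of that flow: the class names, the ask callback, and the norm table are placeholders invented for illustration, and neither the real WISC norms nor the ASTS engine's actual calculations are reproduced here.

    from dataclasses import dataclass

    @dataclass
    class Question:
        subtest: str      # e.g. "Vocabulary" or "Block Design" (illustrative only)
        prompt: str
        answer: str

    def run_session(bank, ask):
        """Present each question, record the answer, and tally raw subtest scores."""
        raw = {}
        for q in bank:
            given = ask(q.prompt)             # callback into the web/mobile front end
            correct = given.strip().lower() == q.answer.strip().lower()
            raw[q.subtest] = raw.get(q.subtest, 0) + int(correct)
        return raw

    def scaled_scores(raw, norms):
        """Map raw subtest scores to scaled scores via a (hypothetical) norm table."""
        # norms[subtest] is a list indexed by raw score; values are placeholders
        return {subtest: norms[subtest][score] for subtest, score in raw.items()}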
3.4. User View
• The user must know about this application's features and have basic knowledge of using the internet.
• The users are generally the children who will interact with the system, so only their proper attention is required.
• The system can also be used by parents for tracking the child's results, so they should have basic computer knowledge.

Figure 2. Abstract view of the system.

3.5. Operating Environment
This is a web- and Android-based system and hence requires a good GUI for good results. The basic requirements are a suitable browser version for web users and a suitable Android version for Android users.

3.6. Constraints
• The members' expertise in the software used can be a constraint on the timely completion of the system.
• Improper working of the database and interface may be a constraint.
• An internet connection is required to run the application.
• The database is shared between the web and mobile applications; it may be forced to queue incoming requests, increasing the time it takes to fetch data.

3.7. System Assumptions
• A large amount of memory is required in a cell phone to use this system.
• If your cell phone does not have sufficient memory and hardware resources, you cannot access this system.

3.8. User Role
• First, users register themselves in the system.
• Second, the system provides an access key to the user.
• Third, according to the age level of the user, the system starts showing the test questions and the user begins the test.
• Next, when the user has completed the test, the system shows the test results. The result is in percentage form, which indicates the level of the user's intelligence and cognitive abilities.
• Finally, the user logs out of the system.

3.9. Overall System Working through Diagrams
The overall working of the system is shown in the following diagrams:
Figure 3. Web view of the system.
Figure 4. Android view of the system.

3.10. Hardware and Software
Hardware: Laptop, Mobile
Operating Systems: Windows, Android
Databases: MySQL, SQLite
Programming Languages and Web Technologies: Java, PHP, HTML5, CSS3, Bootstrap, JSON

4. RESULTS AND DISCUSSIONS
We automate the test, which in Pakistan is administered manually and consumes a lot of time. Psychologists complete the test in two to three days or over an entire week. They cannot administer the test continuously, because neither can they concentrate on a test after two or three hours, nor can children keep taking the test continuously. The manual system is fully converted into an automated system. It exceeds the credibility level of the available websites conducting WISC-IV tests. Its output is a score sheet and a recommendations report on the child's intelligence for parents and psychologists.

Comparison:
Website   Reliable  Correct  Usability  Authentic
[6]       No        Yes      No         No
[7]       No        Yes      No         No
[8]       No        No       No         No
[9]       No        No       No         No
[10]      Yes       Yes      Yes        Yes

The comparison can be easily understood through this chart:
Figure 5. Comparison of systems.

5. CONCLUSION
In this research we have presented an application with which psychologists can administer their assessment test easily. They do not have to wait an entire week for result generation, and they do not have to do the calculations on their own. All they have to do is observe the behavior of the children and help them with queries if they are stuck. The rest of the work is done by the system itself, including the calculations, displaying the questions and generating the results. This will help kids in their primary years, while they are studying.
Not only kids but also parents who are concerned about their children will benefit. The system will let parents know about their child's weaknesses and IQ. Parents can also consult with the consultants about how to increase their children's IQ and what should and should not be done. This will help children make their future bright and prosperous. In the future, this application can be made more user friendly by implementing a different GUI. For now, this application is only a final-year-project demonstration; after interacting with senior psychologists, if they allow this application to be used for clinical purposes, we will implement it in clinics. And we will not stop there: it will soon be implemented in schools and other educational institutions.

6. REFERENCES

Patent
[1] A WISC Descriptive and Graphical Report by Michelle C. Rexach, Licensed School Psychologist, Florida Department of Health.

Websites
[2] .au/blog/wechslerintelligence-scale-for-children-wisc-iv/
[3] https:///overview/the-q-interactive-library/wisc-iv.html

Conference Proceedings
[4] Anne-Marie Kimbell, "An Overview of the WISC", Ph.D. National Training Consultant, Pearson, 2015.
[5] Gwo-Jen Hwang, "A Test-Sheet-Generating Algorithm for Multiple Assessment Requirements", IEEE Transactions on Education, vol. 46, no. 3, August 2003.

Websites
[6] /
[7] /
[8] /
[9] /
[10] /

Research Papers
[11] Constanza Huapaya, "Proposal of Fuzzy Logic-based Students Learning Assessment Model".
[12] Gwo-Jen Hwang, Bertrand M. T. Lin, Hsien-Hao Tseng, and Tsung-Liang Lin, "On the Development of a Computer-Assisted Testing System With Genetic Test Sheet-Generating Approach", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 35, no. 4, November 2005.

Journal Articles
[13] Ankun Huang, Dongmei Li, Jiajia Hou, Tao Bi, "An Adaptive Cellular Genetic Algorithm Based on Selection Strategy for Test Sheet Generation", International Journal of Hybrid Information Technology, Vol. 8, No. 9 (2015).
[14] Kavitha, "Composition of Optimized Assessment Sheet with Multi-criteria using Evolutionary Intelligent Water Drops (EvIWD) Algorithm", International Journal of Software Engineering and Its Applications, Vol. 10, No. 6 (2016).
[15] Doru Popescu Anastasiu, Nicolae Bold, and Daniel Nijloveanu, "A Method Based on Genetic Algorithms for Generating Assessment Tests Used for Learning", vol. 54, 2016, pp. 53–60.
Automated Meta Data Generation for Personalized Music Portals

Erich Gstrein, Florian Kleedorfer, Brigitte Krenn
Smart Agents Technologies, Research Studios Austria, ARC Seibersdorf research GmbH, Vienna, Austria
{erich.gstrein, florian.kleedorfer, brigitte.krenn}@researchstudio.at

Abstract
Providing appropriate meta information is essential for personalized e- and m-portals, especially in the music domain with its huge archives and short-dated content. Unfortunately, the meta data typically coming with a portal's content is not appropriate for such systems. In this paper we describe an application implemented for use in a large personalized music portal. We explain the way our system – the feature extraction engine FE2 – generates meta information, its architecture, and how the meta data is used within the portal. Both the portal and FE2 are real-world systems designed to operate on huge music archives.

1. Introduction
The distribution of music or music-relevant content is currently one of the hottest topics, with an enormous market potential especially in the mobile world. Due to the large scale of the data sets and the restricted usability of mobile devices, intelligent personalization systems are necessary to ensure satisfaction of the users [13]. Thus the more a personalization system knows about the items it recommends, the better it will perform. Concerning this knowledge, the content domain 'music' bears some specific traps that current commercial personalization systems must overcome; in particular these are:

Meta Data Quality: Audio content providers deliver their audio data together with only some basic information, such as artist name, album name, track name, year, genre and pricing information. From a recommender's point of view – where similarity relations between items often form the basis for recommendations – this data quality is very poor, because songs are not necessarily similar if they have been created by the same artist, and tracks with similar names do not necessarily sound alike.

Genres: Although a disgraced concept, it is an indispensable one to a music portal. The most serious problem is that genres are not standardized and thus are likely to be a source of dispute. For example, when it comes to music styles the AllMusicGuide offers 531, Amazon 719, and about 430 different genres [1].

Content volume/life-cycle: Music portals often use huge music archives (e.g., promises more than 1,500,000 songs) with rapidly increasing content and dynamically changing relevance of the content (think of all the one-day wonders produced by the music industry). Ensuring or creating high-quality meta information is an enormous problem in this context.

Cultural dependency: The cultural background of the people interested in music plays an important role too [1], because it influences many dimensions of the selection, profiling, and recommendation components.
This implies that the provided meta information must also incorporate cultural aspects.
Summing up, concerning music meta data we have to account for at least the following issues:
1. deal with the fact that content providers do not provide appropriate information for high-quality recommendations
2. classify items without the availability of a sound set of classes (genres)
3. create appropriate meta information for archives that are constantly increasing in size
4. account for cultural diversity

The work presented in this paper describes an application developed to support a personalization system for a large international mobile music portal – called the Personal Music Platform (PMP) [13]. The PMP, currently online in Europe and Asia, offers music and music-relevant products such as wallpapers, ringtones, etc. (A white-labeled demo application can be visited at .) The paper is organized as follows. While section 2 describes the basic concepts we build our system upon, a short overview of the architecture of FE2 is presented in section 3. After discussing the application scenarios in section 4, the further development of FE2 is highlighted in section 5.

2. Audio Meta Data Generation
Within recommender systems, similarity is a major concept used in collaborative filtering as well as in item-based filtering approaches [1, 2]. While the former systems refer to similarity among users, the latter focus on item similarity, which is especially important for the music domain, where 'sounding similar' is a major selection criterion applied by end users. But how do we extract such meta information to support a personalization system? In the worst case, the sources of information available to a designer of a music personalization system are:
1. a set of audio files, each coming with a title and the name of the artist
2. the web, with an unforeseeable number of pages related to some music topics, such as fan pages, artist home pages, etc.

The field of Music Information Retrieval (MIR) has recently started to investigate the respective issues, in particular the definition of similarity measures between music items. Two approaches are in focus: (1) the definition of similarity based on audio data, see for instance [3, 4, 5]; (2) the definition of similarity based on cultural aspects extracted from web pages [6, 7, 8]. In the following we concentrate on the audio-based approach. Unfortunately, the state-of-the-art techniques for feature extraction and similarity calculation – with similarity defined as the distance between two feature vectors – are very resource consuming. Thus the main challenge was to incorporate high-quality meta data generation in a scalable application dedicated to real-world music archives.

The MIR techniques we make use of in our approach are based on MFCCs (Mel Frequency Cepstral Coefficients), which summarize timbre aspects of the music based on psychoacoustic considerations. For the computation of similarity between tracks, the feature vectors summarizing the spectral characteristics are compared [4, 10, 11]. A serious drawback of these techniques is their time complexity: the extraction of a timbre model from an audio file (WAVE, 22 kHz, mono) takes about 30 seconds, while the calculation of the distance between two vectors takes 0.05 seconds on a single machine (PC, 4 GHz CPU, 1 GB RAM).
In the further discussion, the terms feature vector and timbre model will be used without further distinction. Even though the extraction of the timbre model of a track takes about 30 seconds, it is far less critical than the computation of the similarity relations between tracks, because of its linear behavior: each vector has to be extracted only once, and distributing the task over n machines speeds up the process by a factor of n. In contrast, the complexity of pair-wise distance computation is O(n²), which in addition raises a storage problem (e.g. think of a complete n × n distance matrix where n = 10⁶). Therefore the optimization/reduction of distance calculations is a key factor for an application feasible for real-world archives.
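A back-of-the-envelope calculation with the figures quoted above makes the point: for n = 10⁶ tracks there are n(n−1)/2 ≈ 5 × 10¹¹ pairs, and at 0.05 s per distance that is roughly 2.5 × 10¹⁰ CPU-seconds, on the order of 800 CPU-years on a single machine. Exhaustive pair-wise computation is therefore out of reach without distributing the work and sharply reducing the number of comparisons.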
In the course of the PMP project we pursue two different approaches: manual cluster generation and semi automated cluster generation.In the case of handpicked clusters (manual cluster generation), the prototypes are defined by a human expert and the resulting classification process is used to support the content administrator. In PMP this approach is mainly used for defining mood clusters – such as ‘feeling blue’, ‘feeling excited’, etc. – containing music that best matches the given mood.Semi automated cluster generation is performed by applying clustering algorithms (e.g. k-means clustering) on a sample set of the audio archive followed by a tuning /cleaning process by the content administrator. Within the PMP project this approach is mainly used to define a more intuitive genre concept based on sound similarity.4.2.Supporting the ContentAdministratorBeing confronted with hundreds of thousands of tracks and with a time consuming classification process an appropriate tool supported modus operandiis essential for an industrial application. Apart from the technical aspects of scalability and performance the support of the content administrator is one of the most important aspects of the FE2. The core features are:•different sample and test sets can be pulledout of the archive•several sample sets can be classified inparallel•the affiliation of tracks to clusters isrepresented graphically•the consequences of using a track asprototype of a cluster are displayed on-line •cluster metrics and tests provideinformation about the quality of theclusters4.3.FE2 in the Context of the MusicPortalIn the context of the PMP the feature extraction engineis used twofold:1.as a batch process for classifying itemsagainst a defined set of prototypes, triggeredby the content feed process of the portal2.as an administration tool used by a contentadministrator to define and refineclassification schemesThe information flow between PMP and the FE2 is bidirectional. In a first step the meta data information calculated by the FE2 is imported into the PMP portalto boost the personalization system. In a second stepuser feedback concerning the quality of the meta data collected by the personalization system is employed to refine the meta data generation process.For the time being the following kinds of feedback are incorporated in the meta data generation process: •the affiliation of tracks to predefined, mood specific music genres like ‘feeling blue’,‘excited’, etc. by the user. These tracks areused to refine/define the set of the specificcluster prototypes•users ratings on elements of recommendation lists are utilized to tune the ‘playlist’ and the‘similar artist’ generation process4.4.3rd Party ApplicationsCompanies like Gracenote(), All Music Guide() or Hifind () are creating high-quality meta-information on the basis of the human expert knowledge. The opponents of high-quality hand crafted content are up-to-dateness, focus on mainstream and of course the costs as a killer argument for small or medium size portals.The application area of automatic meta data generation software like FE2 is therefore not only limited to the improvement of specific portals (like PMP) but also can generally improve/support an editorial approach as illustrated in section 4.2.5.Future DirectionBeside the ongoing improvements of the audio based approach the capability of our FE2 will be extended to other sources of information. 
Promising approaches have recently been presented that try to exploit lyrics [12] or analyze the content of websites [6]. These approaches are complementary to the audio analysis and are particularly suited to capture cultural information. Concerning the FE2 we currently explore the applicability of ‘artist similarity’ – based on features extracted from websites – and ‘track similarity’ based on lyrics. Furthermore, the applicability of visualized similarity relations (e.g. visualization of clusters) for improving navigation will be investigated.6. References[1] A. Uitdenbogerd, R.v. Schyndl. A Review of Factors Affecting Music Recommender Systems. Dep. of Computer Science, Melbourne, Australia.[2] Herlocker, J., L., and Konstan, J., A., Terveen, L., G., and Riedl, J., T., (2004). Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information systems, Vol. 22 No. 1, January 2004, Pages 5 – 53[3] Pampalk, E, Dixon, S., and Widmer, G. (2003). On the Evaluation of Perceptual Similarity Measures for Music. In Proceedings of the 6th International Conference on Digital Audio Effects (DAFx-03), London.[4] Aucouturier, J.J and Pachet, F. (2002). Music Similarity Measures: What's the Use? In Proceedings of the International Conference on Music Information Retrieval (ISMIR'02), Paris, France.[5] Aucouturier, J.-J. and Pachet, F. (2004). Improving Timbre Similarity: How High isthe Sky? Journal of Negative Results in Speech and Audio Sciences 1(1).charge of[6] Knees, P., Pampalk, E., and Widmer, G. (2004). Artist Classification with Web-based Data. In Proceedings of the 5th International Conference on Music Information Retrieval(ISMIR'04), Barcelona, Spain, October 10-14, 2004.[7] Whitman, B. and Lawrence, S. (2002). Inferring Descriptions and Similarity for Music from Community Metadata. In Proceedings of the 2002 International Computer Music Conference, pp 591-598. 16-21 September 2002, Göteborg, Sweden.[8] Baumann, S. and Hummel, O. (2003). Using Cultural Metadata for Artist Recommendation. In Proceedings of the International Conference on Web Delivery of Music (WedelMusic), Leeds, UK.[9] Bozkaya, T. and Ozsoyoglu M., (1999). Indexing Large Metric Spaces for Similarity Search Queries. ACM Transactions on Database Systems, Vol. 24, No. 3, September 1999, Pages 361–404.[10] Foote, J.T. (1997). Content-based Retrieval of Music and Audio. In Proceedings of the SPIE Multimedia Storage and Archiving Systems II, 1997, vol. 3229.[11] Logan, B. and Salomon, A. (2001). A Music Similarity Function Based on Signal Analysis.[12] Logan, B., Kositsky, A., and Moreno, P. (2004). Semantic Analysis of Song Lyrics. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME).[13] Gstrein, E., et al. (2005). Adaptive Personalization: A Multi-Dimensional Approach to Boosting a Large Scale Mobile Music Portal. In Fifth Open Workshop on MUSICNETWORK: Integration of Music in Multimedia Applications, Vienna, Austria.。
AUTOMATICALLY CLUSTERING SIMILAR UNITS FOR UNIT SELECTION IN SPEECH SYNTHESIS

Alan W Black and Paul Taylor
Centre for Speech Technology Research, University of Edinburgh, 80 South Bridge, Edinburgh, U.K. EH1 1HN
email: awb@, Paul.Taylor@

ABSTRACT
This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class based on their phonetic and prosodic context. The appropriate cluster is then selected for a target unit, offering a small set of candidate units. An optimal path is found through the candidate units based on their distance from the cluster center and an acoustically based join cost. Details of the method and its justification are presented. The results of experiments using two different databases are given, optimising various parameters within the system. A comparison with other existing selection-based synthesis techniques is also given, showing the advantages this method has over existing ones. The method is implemented within a full text-to-speech system offering efficient, natural-sounding speech synthesis.

1. BACKGROUND
Speech synthesis by concatenation of sub-word units (e.g. diphones) has become basic technology. It produces reliable, clear speech and is the basis for a number of commercial systems. However, with simple diphones, although the speech is clear, it does not have the naturalness of real speech. In an attempt to improve naturalness, a variety of techniques have recently been reported which expand the inventory of units used in concatenation beyond the basic diphone schema (e.g. [7] [5] [6]). There are a number of directions in which this has been done: changing the size of the units, the classification of the units themselves, and the number of occurrences of each unit. A convenient term for these approaches is selection-based synthesis. In general, there is a large database of speech with a variable number of units from a particular class. The goal of these algorithms is to select the best sequence of units from all the possibilities in the database, and to concatenate them to produce the final speech.

The higher-level (linguistic) components of the system produce a target specification, which is a sequence of target units, each of which is associated with a set of features. In the algorithm described here the database units are phones, but they can be diphones or other sized units. In the work of Sagisaka et al. [9], units are of variable length, giving rise to the term non-uniform unit synthesis; in that sense our units are uniform. The features include both phonetic and prosodic context, for instance the duration of the unit, or its position in a syllable. The selection algorithm has two jobs: (1) to find units in the database which best match this target specification, and (2) to find units which join together smoothly.

2. CLUSTERING ALGORITHM
Our basic approach is to cluster units within a unit type (i.e. a particular phone) based on questions concerning prosodic and phonetic context. Specifically, these questions relate to information that can be produced by the linguistic component, e.g. is the unit phrase-final, or is the unit in a stressed syllable. Thus for each phone in the database a decision tree is constructed whose leaves are a list of database units that are best identified by the questions which lead to that leaf. At synthesis time, for each target in the target specification the appropriate decision tree is used to find the best cluster of candidate units. A search is then made to find the best path through the candidate units, taking into account the distance of a candidate unit from its cluster center and the cost of joining two adjacent units.
A search is then made to find the best path through the candidate units, taking into account the distance of a candidate unit from its cluster center and the cost of joining two adjacent units.

2.1. Clustering units

To cluster the units, we first define an acoustic measure of the distance between two units of the same phone type. Expanding on [7], we use an acoustic vector which comprises Mel frequency cepstrum coefficients, F0, power, and delta cepstrum, delta F0 and delta power. The acoustic distance between two units is simply the average distance over the vectors of all the frames in the units, plus X% of the frames in the previous units, which helps ensure that close units will have similar preceding contexts. More formally, we use a weighted Mahalanobis distance metric to define the acoustic distance between two units U and V of the same phoneme class as

    Adist(U, V) = (1/|U|) * sum_{i=1..|U|} sum_{j=1..p} ( W_j * |U_{i,j} - V_{i',j}| / SD_j ) + W_d * (|U| - |V|),   if |U| >= |V|,

where |U| is the number of frames in U, U_{i,j} is parameter j of frame i of unit U, SD_j is the standard deviation of parameter j, W_j is the weight for parameter j, and i' = i * |V| / |U| indexes the linearly interpolated frame of the shorter unit. This measure gives the mean weighted distance between the units, with the shorter unit linearly interpolated to the longer one. W_d is the duration penalty, weighting the difference between the two units' lengths.

This acoustic measure is used to define the impurity of a cluster of units as the mean acoustic distance between all its members. The object is to split clusters, based on questions, so as to produce a better classification of the units. A CART method [2] is used to build a decision tree whose questions best minimise the impurity of the sub-clusters at that point in the tree. A standard greedy algorithm is used for building the tree. This technique may not be globally optimal, but a full global search would be prohibitively computationally expensive. A minimum cluster size is specified (typically between 10 and 20).

Although the available questions are the same for each phone type, the tree-building algorithm selects only the questions that are significant in partitioning that particular type. The features used for CART questions include only those features that are available for target phones during synthesis. In our experiments these were: previous and following phonetic context (both phonetic identity and phonetic features), prosodic context (pitch and duration, including that of the previous and next units), stress, position in syllable, and position in phrase. Additional features were originally included, such as the delta F0 between a phone and its preceding phone, but they did not appear as significant and were removed. Different features are significant for different phones: for example, lexical stress is used only in the phones schwa, i, a and n, while a feature representing pitch is only rarely used in unvoiced consonants. The CART-building algorithm implicitly deals with sparseness of units in that it will only split a cluster if there are sufficient examples and significant differences to warrant it.
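The distance and impurity computations described above might look roughly like the following sketch. The array layout (one row of cepstral, F0, power and delta parameters per frame), the per-parameter weights, and the duration-penalty handling are illustrative assumptions rather than the authors' implementation.

    import numpy as np

    def acoustic_distance(U, V, weights, sd, dur_penalty=1.0):
        """Mean weighted (Mahalanobis-style) frame distance between two units.

        U, V: arrays of shape (num_frames, num_params) holding cepstra, F0,
        power and their deltas for each frame (assumed layout). The shorter
        unit is linearly interpolated to the longer one; dur_penalty is the
        duration-penalty weight (value here is an assumption).
        """
        if U.shape[0] < V.shape[0]:
            U, V = V, U                          # make U the longer unit
        n = U.shape[0]
        # indices of V frames aligned (linearly interpolated) to the frames of U
        idx = np.round(np.linspace(0, V.shape[0] - 1, n)).astype(int)
        frame_dist = np.abs(U - V[idx]) / sd     # per-parameter normalised distance
        mean_dist = (frame_dist * weights).sum(axis=1).mean()
        return mean_dist + dur_penalty * abs(U.shape[0] - V.shape[0])

    def impurity(cluster, weights, sd):
        """Mean pairwise acoustic distance between all members of a cluster."""
        pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
        if not pairs:
            return 0.0
        return float(np.mean([acoustic_distance(a, b, weights, sd) for a, b in pairs]))

An impurity of this kind is what the greedy CART splitting would aim to minimise when choosing the question at each node.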
2.2. Joining units

To join consecutive candidate units from the clusters selected by the decision trees, we use an optimal coupling technique [4] to measure the concatenation cost between two units. This technique offers two results: the cost of a join and a position for the join. Allowing the join point to move is particularly important when our units are phones: initial unit boundaries fall on phone-phone boundaries, which are probably the least stable part of the signal. Optimal coupling allows us to select more stable positions towards the center of the phone.

In our implementation, if the previous phone in the database is of the same type as the selected phone, we use a search region that extends 60% into the previous phone; otherwise the search region is defined to be the phone boundaries of the current phone.

Our actual measure of join cost is a frame-based Euclidean distance. The frame information includes F0, Mel frequency cepstrum coefficients, and power, together with their delta counterparts. Although this uses the same parameters as the acoustic measure used in clustering, it is now necessary to weight the F0 parameter to deter discontinuities in local F0, which can be particularly distracting in synthesized examples. Except for the delta features, this measure is similar to that used in [7].

2.3. Selecting units

At synthesis time we have a stream of target segments that we wish to synthesize. For each target we use the CART for that unit type and ask the questions to find the appropriate cluster, which provides a set of candidate units. The function Tdist(u_i) is defined as the distance of a unit to its cluster center, and the function Jcost(u_{i-1}, u_i) as the join cost at the optimal coupling point between a candidate unit and the previous candidate unit it is to be joined to. We then use a Viterbi search to find the optimal path through the candidate units that minimizes the following expression:

    sum_{i=1..n} [ Tdist(u_i) + W * Jcost(u_{i-1}, u_i) ]

W allows a weight to be set, trading join cost against target cost. Given that clusters typically contain units that are very close, the join cost is usually the more important measure and hence is weighted accordingly.

2.4. Pruning

As distributing the whole database as part of a synthesis voice may make it prohibitively large, especially if multiple voices are required, appropriate pruning of units can be done to reduce the size of the database. This has two effects. The first is to remove spurious, atypical units which may have been caused by mislabelling or poor articulation in the original recording. The second is to remove those units which are so common that there is no significant distinction between candidates. Given this clustering algorithm it is easy (and worthwhile) to achieve the first by removing the units from a cluster that are furthest from its center. Results of some experiments on pruning are shown below.

The second type of pruning, removing overly common units, is a little harder, as it requires looking at the distribution of the distances within clusters for a unit type to find what can be determined as "close enough." Again this involves removing those units furthest from the cluster center, though this is best done before the final splits in the tree, and only for the most common unit types. As with all the measures and parameters, there is a trade-off between synthesis resources (size of database and time to select) and quality, but it seems that pruning 20% of units makes no significant difference (and may even improve the results), while up to 50% may be removed without seriously degrading the quality. (Similar figures were also found in the work described in [7].)
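To make the selection and pruning steps concrete, here is a hedged sketch of a Viterbi search over the candidate clusters and of distance-based pruning. The cost-function signatures, the join-cost weight value, and the pruning fraction are assumptions for illustration only, not taken from the paper.

    def viterbi_select(clusters, target_dist, join_cost, w=5.0):
        """Pick one unit per cluster minimising sum of target cost + w * join cost.

        clusters:    list of candidate-unit lists, one list per target segment.
        target_dist: unit -> distance from its cluster center (precomputed).
        join_cost:   (prev_unit, unit) -> join cost at the optimal coupling point.
        w:           join-cost weight (value is an assumption).
        """
        # best[i][k] = (cost of best path ending at candidate k of cluster i, backpointer)
        best = [[(target_dist(u), None) for u in clusters[0]]]
        for i in range(1, len(clusters)):
            layer = []
            for u in clusters[i]:
                costs = [best[i - 1][k][0] + w * join_cost(p, u)
                         for k, p in enumerate(clusters[i - 1])]
                k_min = min(range(len(costs)), key=costs.__getitem__)
                layer.append((costs[k_min] + target_dist(u), k_min))
            best.append(layer)
        # trace back the cheapest path
        k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
        path = []
        for i in range(len(clusters) - 1, -1, -1):
            path.append(clusters[i][k])
            k = best[i][k][1]
        return list(reversed(path))

    def prune_cluster(cluster, target_dist, fraction=0.2):
        """Drop the given fraction of units lying furthest from the cluster center."""
        keep = int(round(len(cluster) * (1.0 - fraction)))
        return sorted(cluster, key=target_dist)[:keep]

In this scheme the target cost of a candidate is simply its distance from the cluster center, so most of the per-synthesis work lies in the join-cost evaluations.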
3. EXPERIMENTS

Two databases have so far been tested with this technique: a male British English RP speaker consisting of 460 TIMIT phonetically balanced sentences (about 14,000 units), and a female American news reader from the Boston University FM Radio corpus [8] (about 37,000 units).

Testing the quality of speech synthesis is difficult. Initially we tried to score a model under some set of parameters by synthesizing a set of 50 sentences. The results were scored on a scale of 1-5 (excellent to incomprehensible). However, the results were not consistent except when the quality differed widely. Therefore, instead of using an absolute score we used a relative one, as it was found to be much easier and more reliable to judge whether an example was better than, equal to or worse than another than to state its quality on some absolute scale.

In these tests we generated 20 sentences for a small set of models, varying some parameter (e.g. cluster size). The 20 sentences consisted of 10 "natural target" sentences (where the segments, duration and F0 were derived directly from naturally spoken examples) and 10 examples of text-to-speech. None of the sentences in the test set were in the databases used to build the cluster models. Each set of 20 was played against each other set (in random order) and a score of better, worse or equal was recorded. A sample set was said to "win" if it had more better examples than the other. A league table was kept recording the number of "wins" for each sample set, thus giving an ordering on the sets.

In the following tests we varied the cluster size, the F0 weight in the acoustic cost, and the amount by which final clusters were pruned. These full tests were carried out only on the male 460-sentence database.

For the cluster size we fixed the other parameters at what we thought were mid-values. The following table gives the number of "wins" of each sample set over the others.

    minimum cluster size    8    10    12
    wins                    1     4     2

We can see that when the cluster size is too restrictive the quality decreases, but at around 10 it is at its best, decreasing again as the cluster size gets bigger.

The importance of F0 in the acoustic measure was tested by varying its weighting relative to the other parameters in the acoustic vector.

    F0 acoustic weight    1.0    3.0    30

The optimal value was lower than we expected, but we believe this is because our listening test did not test against an original or actual desired F0; thus no penalty was given to a "wrong" but acceptable F0 contour in a synthesized example.

The final test was to find the effect of pruning the clusters. In this case clusters of size 15 and 10 were tested, and pruning involved discarding a number of units from each cluster. In both cases discarding 1 or 2 units made no perceptible difference in quality (though the results actually differed in 2 units). In the size-10 cluster case, further pruning began to degrade quality. In the size-15 cluster case, quality only degraded after discarding more than 3 units. Overall the best quality was for the size-10 cluster, and pruning 2 units allows the database size to be reduced without affecting quality. Pruning was also tested on the f2b database with its much larger inventory. The best overall results with that database were found when pruning 3 and 4 units from a cluster size of 20.

In these experiments no signal modification was done after selection, even though we believe that such processing (e.g. PSOLA) is necessary. We do not expect all prosodic forms to exist in the database, and it is better to introduce a small amount of modification to the signal in return for fixing obvious discontinuities. However, it is important for the selection algorithm to be sensitive to the prosodic variation required by the targets so that the selected units require only minimal modification. Ideally, the selection scoring should take into account the cost of signal modification, and we intend to run similar tests on selections modified by signal processing.
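The relative listening test amounts to a pairwise win count. A minimal sketch is given below, under the assumption that each pairwise comparison is reduced to a signed margin of "better" judgements; the names and data layout are hypothetical.

    from itertools import combinations
    from collections import Counter

    def league_table(sample_sets, compare):
        """Count 'wins' for each sample set from pairwise listening comparisons.

        sample_sets: dict name -> list of 20 synthesized sentences.
        compare:     (sentences_a, sentences_b) -> number of sentences judged
                     better in a minus number judged better in b (from listeners).
        """
        wins = Counter({name: 0 for name in sample_sets})
        for a, b in combinations(sample_sets, 2):
            margin = compare(sample_sets[a], sample_sets[b])
            if margin > 0:
                wins[a] += 1          # set a had more 'better' examples
            elif margin < 0:
                wins[b] += 1
        return wins.most_common()     # ordering of the sets by number of wins

The returned ordering corresponds to the league table used above to rank the sample sets.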
4. DISCUSSION

This algorithm has a number of advantages over other selection-based synthesis techniques. First, the cluster method based on acoustic distances avoids the problem of estimating weights in a feature-based target distance measure, as described in [7], while still allowing unit clusters to be sensitive to general prosodic and phonetic distinctions. It also neatly finesses the problem of variability in the sparseness of units: the tree-building algorithm only splits a cluster when there are a significant number of units and identifiable variation to make the split worthwhile. The second advantage over [7] is that no target cost measurement need be done at synthesis time, as the tree has effectively pre-calculated the "target cost" (in this case simply the distance from the cluster center). This makes for more efficient synthesis, as many distance measurements no longer need to be done.

Although this method removes the need for the target feature weights used in [7] for estimating acoustic distance, there are still many other places in the model where parameters need to be estimated, particularly the acoustic cost and the continuity cost. Any frame-based distance measure will not easily capture "discontinuity errors" perceived as bad joins between units. This probably makes it difficult to find automatic training methods for measuring the quality of the synthesis produced.

Donovan and Woodland [5] use a similar clustering method, but the method described here differs in that, instead of a single example being chosen from the cluster, all the members are used, so that continuity costs may take part in the criteria for selecting the best units. In [5], HMMs are used instead of a direct frame-based measure for acoustic distance. The advantage of using an HMM is that different states can be used for different parts of the unit. Our model is equivalent to a single-state HMM and so may not capture transient information in the unit. We intend to investigate the use of HMMs as representations of units, as this should lead to a better unit distance score.

Other selection algorithms also use clustering, though not always in the way presented here. As stated, the cluster method presented here is most similar to [5]. Sagisaka et al. [9] also cluster units, but only using phonetic information; they combine units to form longer, "non-uniform" units based on the distribution found in the database. Campbell and Black [3] also use similar phonetic-based clustering and further cluster the units based on prosodic features, but still resort to a weighted feature-based target distance for the ultimate selection.

It is difficult to give realistic comparisons of the quality of this method over others. Unit selection techniques are renowned both for their extremely high-quality examples and for their extremely low-quality ones, and minimising the bad examples is a major priority. This technique does not yet remove all low-quality examples, but it does try to minimise them. Most examples lie in the middle of the quality spectrum, with mostly good selection but a few noticeable errors which detract from the overall acceptability of the utterance. The best examples, however, are nearly indistinguishable from natural utterances.

This cluster method is fully implemented as a waveform synthesis component in the Festival Speech Synthesis System [1].

5. ACKNOWLEDGEMENTS

We gratefully acknowledge the support of the UK Engineering and Physical Sciences Research Council (EPSRC grants GR/K54229 and GR/L53250).

REFERENCES

[1] A. W. Black and P. Taylor. The Festival Speech Synthesis System: system documentation. Technical Report
HCRC/TR-83, Human Communication Research Centre, University of Edinburgh, Scotland, UK, January 1997. Available at /projects/festival.html.

[2] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth & Brooks, Pacific Grove, CA, 1984.

[3] N. Campbell and A. Black. Prosody and the selection of source units for concatenative synthesis. In J. van Santen, R. Sproat, J. Olive, and J. Hirschberg, editors, Progress in Speech Synthesis, pages 279-282. Springer Verlag, 1996.

[4] A. Conkie and S. Isard. Optimal coupling of diphones. In J. van Santen, R. Sproat, J. Olive, and J. Hirschberg, editors, Progress in Speech Synthesis, pages 293-305. Springer Verlag, 1996.

[5] R. Donovan and P. Woodland. Improvements in an HMM-based speech synthesiser. In Eurospeech 95, volume 1, pages 573-576, Madrid, Spain, 1995.

[6] X. Huang, A. Acero, H. Hon, Y. Ju, J. Liu, S. Meredith, and M. Plumpe. Recent improvements on Microsoft's trainable text-to-speech synthesizer: Whistler. In ICASSP-97, volume II, pages 959-962, Munich, Germany, 1997.

[7] A. Hunt and A. Black. Unit selection in a concatenative speech synthesis system using a large speech database. In ICASSP-96, volume 1, pages 373-376, Atlanta, Georgia, 1996.

[8] M. Ostendorf, P. Price, and S. Shattuck-Hufnagel. The Boston University Radio News Corpus. Technical Report ECS-95-001, Electrical, Computer and Systems Engineering Department, Boston University, Boston, MA, 1995.

[9] Y. Sagisaka, N. Kaiki, N. Iwahashi, and K. Mimura. ATR ν-TALK speech synthesis system. In Proceedings of ICSLP 92, volume 1, pages 483-486, 1992.