Inferring processes underlying B-cell repertoire diversity. We model the VDJ recombination and somatic hypermutation processes in B-cells using probabilistic inference methods on high-throughput DNA sequence data of B-cell receptor heavy chains. Our method captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for function. We also identify known features of the somatic hypermutation process independent of statistical effects of selection. Submitted 26 February; v1 submitted 10 February.

T-cell receptor diversity results from a stochastic DNA recombination process, called VDJ recombination, followed by thymic selection of cells according to the affinity of their receptors for self and foreign antigenic peptides. We infer population statistics for V and J gene choice, and for the length and amino-acid content of the variable region. Our approach is designed to disentangle the effects of selection from biases inherent in the recombination process. Inferred selection factors differ significantly between individuals or between naive and memory repertoires. The number of sequences shared between individuals is well predicted by the model, indicating a purely stochastic origin of such "public" sequences. We find a strong correlation between biases produced by VDJ recombination and our inferred selection factors, together with a measure of hydrophobicity during selection. Both effects suggest that thymic selection acting on the recombination process has shaped the selection pressures acting during somatic evolution. Submitted 19 October.

Recombination scenarios are composed of recombination events (choices of gene segments, base-pair deletions and insertions) described by probability distributions. Not all scenarios are equally likely, and the same receptor sequence may be produced in several different ways.
Quantifying the distribution of these scenarios is an important baseline for repertoire analysis. Inferring the parameters of the distributions from receptor sequences is a computationally difficult problem, since it requires enumerating every possible scenario for every observed receptor sequence. We develop a hidden Markov model that accounts for all recombination scenarios that can produce the observed sequences. We develop and test a sequential algorithm, based on the Baum-Welch algorithm, that can effectively learn the parameters of the recombination distributions. We tested our inference tool on data for both the alpha and beta chains of the T cell receptor. To validate our inference, we generated synthetic sequences from a known model and verified that its parameters could be accurately inferred back from the sequences. The inferred model can be used to generate synthetic sequences, to estimate the probability of generation of any receptor sequence, and to estimate the total diversity of the repertoire. The model provides a baseline to quantify selection. Submitted 31 October. Bioinformatics 32.

Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination. Despite the large diversity of T cell repertoires, many identical T-cell receptor (TCR) sequences are found in a large number of unrelated individuals and species. Here we show that, even for large cohorts, the observed degree of sharing of TCR sequences between individuals is well predicted by a model accounting for the quantitative statistical biases in the generation process, together with a simple model of thymic selection. Whether a sequence is shared by many individuals is predicted to depend on the number of observed individuals and the sequencing depth, as well as on the sequence itself, in agreement with the data. We predict the extent of sharing as a function of the number of observed individuals and the size of the sampled repertoires. Submitted 2 March.
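As an illustration of the two ideas above (a generation probability obtained by summing over all scenarios that yield a sequence, and sharing predicted from that probability, the repertoire size and the cohort size), here is a minimal Python sketch. It is a toy under stated assumptions, not the authors' software: the gene segments, the probability tables and the function names are all hypothetical, the scenarios are enumerated by brute force rather than with a hidden Markov model, selection is ignored, and each individual is treated as an independent sample of `n_cells` recombination events.

```python
import math
from itertools import product

# Hypothetical toy recombination model; every name and number is an assumption.
V_SEGMENTS = {"V1": "ACGT", "V2": "ACGA"}   # made-up germline segments
P_V = {"V1": 0.7, "V2": 0.3}                # probability of choosing each segment
P_DEL = {0: 0.6, 1: 0.4}                    # nucleotides deleted from the 3' end
P_INS_LEN = {0: 0.5, 1: 0.5}                # number of inserted nucleotides
P_NT = 0.25                                 # each inserted nucleotide is uniform


def generation_probability(seq):
    """Sum the probabilities of all scenarios that produce `seq`."""
    total = 0.0
    for v, n_del, n_ins in product(V_SEGMENTS, P_DEL, P_INS_LEN):
        germline = V_SEGMENTS[v]
        kept = germline[: len(germline) - n_del]
        # The scenario must reproduce the sequence exactly:
        # the trimmed segment followed by n_ins inserted nucleotides.
        if len(kept) + n_ins != len(seq) or not seq.startswith(kept):
            continue
        total += P_V[v] * P_DEL[n_del] * P_INS_LEN[n_ins] * P_NT ** n_ins
    return total


def presence_probability(pgen, n_cells):
    """P(sequence appears at least once among n_cells independent draws).

    Equals 1 - (1 - pgen)**n_cells, written with log1p/expm1 so that it
    stays accurate for very small pgen. Assumes 0 <= pgen < 1.
    """
    return -math.expm1(n_cells * math.log1p(-pgen))


def sharing_probability(pgen, n_cells, n_individuals, min_shared):
    """P(sequence is present in at least min_shared of n_individuals)."""
    p = presence_probability(pgen, n_cells)
    return sum(
        math.comb(n_individuals, k) * p ** k * (1 - p) ** (n_individuals - k)
        for k in range(min_shared, n_individuals + 1)
    )


if __name__ == "__main__":
    pgen = generation_probability("ACGT")
    print(pgen, sharing_probability(pgen, n_cells=10, n_individuals=5, min_shared=2))
```

In this toy, "ACGT" can arise from three distinct scenarios (V1 untouched, or either segment trimmed by one nucleotide followed by an inserted T), which is why the probabilities must be summed rather than read off a single scenario.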
Mouse T cell repertoires as statistical ensembles: The ability of the adaptive immune system to respond to diverse pathogens stems from the large diversity of T cell receptors (TCRs). This diversity originates in a stochastic DNA recombination process (VDJ recombination) that occurs each time a new T cell is generated from a progenitor cell. By analyzing T cell sequence data obtained from the blood and thymus of mice of different ages, we quantify the changes in this process that occur in mice from embryo to adult. We find a dramatic increase with age in the number of nucleotide insertions in the VDJ recombination process, leading to a corresponding increase in diversity. Since the thymus maintains its output over time, mature repertoires are mixtures of different statistical generation processes and, by analyzing the mixture, we can obtain a clear picture of the time evolution of the adaptive immune system. Repertoire analysis also allows us to measure the effect of selection on the output of the VDJ recombination process. The biases we find are very similar between thymus and spleen, suggesting that they mainly reflect selection for proper folding of the TCR protein.

Identifying shared clonotypes reveals the structure and dynamics of the mouse T cell receptor repertoire. The diversity of T-cell receptors recognizing different pathogens is generated through a highly stochastic recombination process, making the independent generation of the same sequence rare. Yet unrelated individuals do share sequences, which together form a "public" repertoire of shared clonotypes. The TCR repertoire is largely formed prenatally, when the enzyme inserting random nucleotides is downregulated, creating a low-diversity population. By statistically analyzing repertoire data, we identify shared clonotypes and study their properties.
Our results suggest that large, low-diversity clonal expansions are created during development and persist over long periods, providing the backbone of the public repertoire.
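The first step of such an analysis, identifying which clonotypes are shared and by how many individuals, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the individual labels and the amino-acid sequences are hypothetical, and a clonotype is assumed here to be identified by an exact sequence match.

```python
from collections import Counter


def sharing_counts(repertoires):
    """Map each clonotype to the number of individuals in which it appears."""
    counts = Counter()
    for clonotypes in repertoires.values():
        # set() ensures a clonotype is counted once per individual,
        # however many reads or cells carry it.
        counts.update(set(clonotypes))
    return counts


def public_clonotypes(repertoires, min_individuals=2):
    """Return the clonotypes found in at least min_individuals repertoires."""
    counts = sharing_counts(repertoires)
    return sorted(c for c, n in counts.items() if n >= min_individuals)


if __name__ == "__main__":
    # Made-up example data: individual label -> list of observed clonotypes.
    repertoires = {
        "mouse_a": ["CASSLGQYF", "CASSPDRGF", "CASSLGQYF"],
        "mouse_b": ["CASSLGQYF", "CASRTGEQF"],
        "mouse_c": ["CASSPDRGF", "CASSLGQYF"],
    }
    print(sharing_counts(repertoires))
    print(public_clonotypes(repertoires, min_individuals=3))
```

Raising `min_individuals` restricts the result to progressively more "public" clonotypes; with real data the threshold would need to account for cohort size and sequencing depth.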