September 2014:  This section of the website is currently under revision.

NAIS bridges the gap between numerical analysts, computer scientists and high performance computing algorithm developers by creating new systems of code annotation, compilation and efficient implementations for application-oriented computational methods such as adaptive finite elements, multiscale modelling, molecular simulation and optimization.

Impact and outputs from the NAIS programme include

  • Scientific Advances

  • Collaboration

  • Impact on Research Strategy

  • UK Research Capability

  • Publications resulting from the project

Scientific Advances

The broad range of research interests across the NAIS programme has produced a correspondingly wide range of scientific advances.  Highlights include

  • NAIS Lecturer Peter Richtarik has written and submitted 18 papers in the areas of optimization and machine learning. Two papers, on randomized coordinate descent methods, scalable to problems with billions of variables, attracted major international best paper awards.
  • An algorithm developed in a joint paper by Peter Richtarik and Martin Takac was implemented by Amazon and a version thereof is operational at Google.
  • Research by NAIS Lecturer James Maddison on automated model development for time-dependent problems extends the high-level automated code generation system FEniCS while providing adjoint methods for sensitivity studies. This technology will be used in a soon-to-start NERC project for ocean eddy parameterisation.
  • NAIS funding enabled research into Soft Matter that appeared as a cover story in the RSC journal 'Soft Matter'. Further investigations into these liquid crystals by EPCC and collaborators have recently been published in Nature Communications.
  • Collaboration with Cray, and others, has enabled cutting-edge research into the challenges involved with designing, utilising, and programming Exascale computers.
  • Exascale research has been disseminated via numerous publications, presentations and training (including prestigious joint EPCC/Cray tutorials at a number of the supercomputing conferences), plus activity within global standards organisations for all the main parallel programming mechanisms (EPCC is now an active member of the standards bodies for MPI, OpenMP, and OpenACC).
  • A new research group (memory models and consistency schemes) has been established by NAIS Lecturer Vijay Nagarajan and is expected to expand.
  • NAIS Lecturer, Sebastien Loisel, has written and submitted 13 papers during the NAIS appointment. The new algorithms developed solve problems with billions of unknowns on tens of thousands of processors.
  • Significant progress was made in two developing areas of domain decomposition: with Waad Subber on domain decomposition for uncertainty quantification, and with Hieu Nguyen on the analysis of the Bank-Holst paradigm for heterogeneous grids with local refinements. Both projects were supervised by Sebastien Loisel.
  • Significant scientific advances were made in understanding how far HPC techniques and algorithms can help in computing realistic approximations to interesting and challenging problems.
  • The work of Ainsworth and Rankin is one particular example. In the series of papers they wrote together (some in collaboration with Allendes and Barrenechea), the parallelisation of adaptive finite element schemes was studied extensively, and much larger problem sizes were shown to be within reach.
  • Riaz’s thesis provided a different way of designing finite element software, in which the parallel structure is built directly into the design. Auto-tuning of this code is also a very promising area for future research.
  • The appointment of NAIS Reader Victorita Dolean has brought much-needed expertise at the interface of mathematics and HPC to the Department. Her expertise also acts as a bridge between different parts of the Numerical Analysis and Scientific Computing group.

Collaboration

The NAIS programme has led to greater collaboration between the partner institutions, across UK Research Groups and internationally.  Ultimately this has led to improved research excellence and capability building within the UK.

UK Research Groups

The NAIS initiative has led to increased collaboration both within the partner institutions and with NA-HPC communities across the UK, especially the NAIS extended network partners (Bath, Leicester and Warwick). NAIS co-organised the NA-HPC networks, based in Manchester. Collaboration between academics was promoted throughout the lifetime of the NAIS programme and supported through sponsored seminar series, workshops/meetings and research visits. This culminated in a series of meetings in the final 6 months of the programme. Either organised or co-supported by NAIS, these meetings covered a broad range of topics. They were held at the NAIS centres at Edinburgh and Glasgow, as well as at NAIS Network venues throughout the UK (Cambridge, Durham, Bath and Dundee). With over 500 delegates, these meetings provided an excellent opportunity for cross-discipline networking and the exchange of research ideas. As a consequence of these meetings, and of earlier activities, we are already aware of joint projects being initiated and of NAIS members arranging future research visits.

Industry and commerce

During the NAIS programme there has been a wide range of external (industry/commerce) involvement through participation in workshops/meetings, student placements, specific research collaborations, fellowships/awards for NAIS researchers and industry visits/briefings. Areas of collaboration have included HPC/Software/Hardware (Cray, Agilent, Intel, OpenACC Standards Consortium), Engineering/Defence/Aerospace (Dstl/Selex/QinetiQ/Airbus/Cobham/Sandia/Naval Research Laboratory), Drug Design/Molecular Dynamics (Plebiotic, Accelrys, NVIDIA, IBM), Industrial Fellowships/Awards/Student Placements (Google/Intel/IBM), Marketing/Social Media/Big Data (Bloom Media/IQcodex/Schuh/Tag Digital/Merchant Soul/NHS/Scottish Government/HRL Labs), Finance (Bloomberg LP, Lloyds Banking Group), Optimisation (Amazon, SAS Institute, IBM, Baidu, Western General Hospital, Arup), Medical Imaging (SINAPSE), and Oil and Gas (DNV-GL).

International collaboration

Through the meetings/workshops discussed previously, sponsored research visits, and the appointment of staff with established connections, the NAIS programme has led to substantially enhanced international collaborations. These include links/collaborations with researchers based in the US (Rice, Chicago, Texas, Wisconsin-Madison, UC Berkeley), across Europe (CERFACS (France), Besancon (France), Paris (France), Geneva (Switzerland), Louvain-la-Neuve (Belgium), Bonn (Germany) and the Swiss National Supercomputer Centre), as well as the rest of the world (Singapore).

Collaboration leading to building capability

Collaborations with other UK research groups established in this programme will be key to ongoing research excellence and success. For example, the links between the Informatics Group and EPCC are a key feature of the CDT in Pervasive Parallelism, and links between EPCC and UoE Maths were key to obtaining a joint NSF-EPSRC software infrastructures grant. Networks established and co-organised under this programme will provide long-term opportunities for collaboration among UK NAIS members.

The established industrial/commercial links are being used in ongoing and future research projects. For example, Google, SAS, Amazon and Baidu are involved as project partners in an EPSRC bid, SAS is a project partner in a funded EPSRC grant, and NAIS commercial links were used in shaping the industrial participation for the successful MIGSAA CDT.

Furthermore, there are examples where industrial collaborators are using tangible outputs from research, e.g. algorithms developed in Peter Richtarik’s group. These tangible outputs and collaborations are a key success of the programme and will feed directly into future impact case studies from the partner institutions.

The international collaborations have provided enhanced scope for research visits. The ability to attract world-leading academics to the partner institutions for seminars/workshops/research visits was a real asset of the NAIS programme. These collaborations have enhanced the international standing of the research as well as providing opportunities to attract world-leading expertise to the UK/partner institutions.

Impact on Research Strategy

The NAIS programme has had significant impact at each of the partner institutions.  The specific benefits vary for each partner.

Edinburgh – Mathematics

NAIS has profoundly altered the scale and intensity of research being conducted in the area of computational mathematics. There are many more active researchers, more grant funding, and steady streams of speakers addressing themes such as numerical analysis, computational modelling and parallel numerical operational research-related methods.

Computational fluid dynamics is being developed with a rich, algorithm-intensive approach.

Moreover, NAIS funding has helped us to move forward in research areas such as exploratory data-guided molecular sampling, large scale optimization methods, and other areas that are on the forefront of data-centric modelling.

New hires in the HPC area are now established within a School that actively encourages and values their contributions.

Edinburgh – EPCC

We have built ties with the maths departments within, and associated with, this project, and also with important commercial companies that either produce hardware used for computational simulation or actively use simulation in their business. In particular, the establishment of an Intel Parallel Computing Centre at EPCC and the involvement of EPCC in the OpenACC standard have both grown out of NAIS work. Furthermore, NAIS has enabled EPCC to be a leader in Exascale software and technologies, which has driven our involvement in a wide range of different projects and initiatives. NAIS funding also enabled EPCC to embed academic researchers within the department. This moves beyond the model of focusing on parallel and HPC enabling at EPCC while collaborating with external application scientists: bringing application scientists into EPCC lets the two groups work much more closely together. Application scientists gain a more detailed understanding of the HPC issues and challenges for their work, and HPC experts gain a more detailed understanding of the requirements of specific scientific fields and the simulation codes they use.

Edinburgh – Informatics

ICSA, our research institute, has expanded by around 35% in size over the course of NAIS, gaining critical mass for a substantial future. The case for this expansion, most directly including Nagarajan's post, has been supported by the increased research activity in the area (all of our NAIS funded work sits squarely within ICSA). Parallel computing is now firmly embedded within the research agenda of both ICSA and the School, and indeed we are now able to play a full role in the growing strength of parallelism research within central Scotland, leading to further collaborative actions and proposals.

Heriot-Watt

The NAIS project has enabled us to enlarge and diversify our computational mathematics group. Our focus is now firmly on the development of approximation methods for PDEs, SDEs and BIEs that are both provably reliable and efficient to implement on current and likely future computer systems. Diversification into the linear algebra needed for efficient PDE solvers has come through the appointment of Sebastien Loisel and his continuing group of students. We have built excellent new connections with EPCC, strengthened links with the other NAIS partners and developed a mutually beneficial arrangement with our own CS colleagues, sharing HPC equipment and expertise.

Strathclyde

The Department made major commitments to this project, including the cluster and partial funding for scholarships. The expertise built in the Department with the hiring of Dolean, together with the topics of the PhD studentships and PDRAs, has helped to shift the direction of the Numerical Analysis and Scientific Computing (NASC) group, and also of other groups in the Department. The Population Modelling Group, in particular the Marine Population Modelling (MPM) subgroup, has been using the cluster intensively to run finite difference models of the Clyde River. This has created an awareness that accurate and fast methods are necessary, and collaboration between the NASC and MPM groups is expected to increase in the future. Moreover, the involvement of D. Higham in problems involving Big Data, and the two NAIS-funded workshops organised by him, have had a sizeable impact on the future directions of the NASC group.

UK Research Capability

During the NAIS programme a number of open-ended academic appointments were made, numerous training courses were run (e.g. HPC and GPU programming), 50+ meetings/workshops were organised and a large number of collaborative projects were initiated.  These all served to strengthen the UK Research Capability.

Highlights include

  • Strengthening of cross-discipline collaboration between the partner institutions and across the UK-wide NAIS network
  • Significant contribution to the parallel computing research within the NAIS partner institutions.
  • NAIS appointments heavily involved in PhD supervision and post-docs
  • NAIS appointments  with expertise in bridging the interface between Maths and HPC
  • HPC-related research within University of Edinburgh, School of Mathematics has doubled
  • Enabling EPCC to be a leader in Exascale research in UK, EU and global projects.
  • Involvement in Exascale projects such as Nu-FuSE and CRESTA
  • Strengthening of ICSA group, attracting substantial RCUK and Industrial funding.
  • NAIS provided the foundation for follow-on projects, examples include
    • CDTs – Analysis and its Applications, Pervasive Parallelism and Data Science
    • EPSRC-NSF Software Infrastructure for Sustained Innovation
    • EPSRC Grants, Coordinate Descent for Big Data Problems and Algorithms for Data Simplicity.

Publications

On the adaptive selection of the parameter in stabilized finite element approximations, Ainsworth, M., Allendes, A., Barrenechea, G.R. and Rankin, R. SIAM Journal on Numerical Analysis 51(3):1585-1609, 2013

Fully computable a posteriori error bounds for stabilized FEM approximations of convection-reaction-diffusion problems in three dimensions, Ainsworth, M., Allendes, A., Barrenechea, G.R. and Rankin, R. International Journal for Numerical Methods in Fluids 73(9):765-790, 2013

Computable error bounds for nonconforming Fortin-Soulie finite element approximation of the Stokes problem, Ainsworth, M., Allendes, A., Barrenechea, G.R. and Rankin, R. IMA Journal of Numerical Analysis 32(2):417-447, 2012

Guaranteed computable bounds on quantities of interest in finite element computations, Ainsworth, M. and Rankin, R. International Journal for Numerical Methods in Engineering 89(13):1605-1634, 2012

Computable bounds for the error in finite element approximations of linear elasticity problems, Ainsworth, M. and Rankin, R. Proceedings of 2nd International Symposium on Frontiers of Computational Sciences 74-87, 2012

Realistic computable error bounds for three dimensional finite element analyses in linear elasticity, Ainsworth, M. and Rankin, R. Computer Methods in Applied Mechanics and Engineering 200(21-22):1909-1926, 2011

Bernstein-Bezier finite elements of arbitrary order and optimal assembly procedures, Ainsworth, M., Andriamaro, G. and Davydov, O. SIAM Journal on Scientific Computing 33(6):3087-3109, 2011

Stable numerical coupling of exterior and interior problems for the wave equation, Banjai, L., Lubich, Ch. and Sayas, FJ., Numerische Mathematik, 2014

Fast convolution quadrature for wave equation in three dimensions, Banjai, L. and Kachanovska, M., Journal of Computational Physics

Sparsity of Runge-Kutta convolution weights for the three-dimensional wave equation, Banjai, L. and Kachanovska, M., BIT Numerical Mathematics, 2014

Fully discrete versions of Kirchhoff's formula with CQ-BEM, Banjai, L., Laliena, A. and Sayas, FJ., IMA Journal of Numerical Analysis

Time-domain Dirichlet-to-Neumann map and its discretization, Banjai, L. IMA Journal of Numerical Analysis, 2013

Numerical Simulations for the Non-Linear Molodensky Problem, Banz, L., Costea, A., Gimperlein, H. and Stephan, E.P., Studia Geophysica et Geodaetica 2014

Time domain BEM for sound radiation of tyres, Banz, L., Gimperlein, H., Nezhi, Z. and Stephan, E.P., 2014

Stabilized mixed hp-BEM for frictional contact problems in linear elasticity, Banz, L., Gimperlein, H., Issaoui, A. and Stephan, E.P., arXiv 1407.1803, 2014

Finite element eigenvalue enclosures for the Maxwell operator, Barrenechea, G.R., Boulton, L. and Boussaid, N. arXiv:1112.1592, 2013

The 2–Lagrange Multiplier Method Applied to Nonlinear Transmission Problems for the Richards Equation in Heterogeneous Soil with Cross Points, Berninger, H., Loisel, S. and Sander, O. accepted for publication in SISC, 2014

Turbulent Navier-Stokes Analysis of an Oscillating Wing in a Power-extraction Regime using the Shear Stress Transport Turbulence Model, Campobasso, M.S., Piskopakis, A., Drofelnik, J. and Jackson, A. Computers and Fluids 88:136-155, 2013

Branching and Bounding Improvements for Global Optimization Algorithms with Lipschitz Continuity Properties, Cartis, C., Fowkes, J. and Gould, N., ERGO Technical Report 13-010, 2013 (see also oBB software)

A Machine Learning-Based Approach for Thread Mapping on Transactional Memory Applications, Castro, M., Wanderley Goes, L.F., Pousa Ribeiro, C., Cole, M., Cintra, M. and Mehaut, J-F., Proceedings of 18th Annual International Conference on High Performance Computing (HiPC11), 1-10, 2011

MaSIF: Machine Learning Guided Auto-tuning of Parallel Skeletons, Collins, A., Fensch, C., Leather, H. and Cole, M. to appear in IEEE International Conference on High Performance Computing (HiPC13), 2013

Auto-tuning Parallel Skeletons, Collins, A., Fensch, C. and Leather, H., Parallel Processing Letters 22(02):124005, 2012

Optimization Space Exploration of the FastFlow Parallel Framework, Collins, A., Fensch, C. and Leather, H., presented at HLPUGPU'12 as part of HiPEAC'12, 2012

A Nash-Hormander Iteration and Boundary Elements for the Molodensky Problem, Costea, A., Gimperlein, H. and Stephan, E.P., Numerische Mathematik 127:1-34, 2014

Approximation by sums of piecewise linear polynomials, Davydov, O. and Rabarison, F. Journal of Approximation Theory 185:107-123, 2014

Managing and Analysing Genomic Data using HPC and Clouds, Dobrzelecki, B., Krause, A., Piotrowski, M. and Hong, N.C., in Grid and Cloud Database Management (Eds Fiore, S. and Giovanni, A.) Springer 2011

Sharp Condition Number Estimates for the Symmetric 2-Lagrange Multiplier Method, Drury, S.W. and Loisel, S., Lecture Notes in Computational Science and Engineering 91:255-261, 2013 in Domain Decomposition Methods in Science and Engineering XX

The Optimized Schwarz Method with a Coarse Grid Correction, Dubois, O., Gander, M.J., Loisel, S., St-Cyr, A. and Szyld, D.B., SIAM Journal on Scientific Computing 34(1):A421-458, 2012

TSO-CC: Consistency Directed Cache Coherence for TSO, Elver, M. and Nagarajan, V. presented at The International Symposium on High-Performance Computer Architecture (HPCA14), 2014

Smart, Adaptive Mapping of Parallelism in the Presence of External Workload, Emani, M.K., Wang, Z. and O'Boyle, M.F.P., in International Symposium on Code Generation and Optimization (CGO'13), 2013

A Novel Technique to Improve Parallel Program Performance Co-executing with Dynamic Workloads, Emani, M.K. and O'Boyle, M.F.P., Workshop on Performance Engineering and Applications, IEEE International Conference on High Performance Computing (HiPC) 2013

Self-Adaptive Parallelism Mapping in Dynamic Environments, Emani, M.K. and O'Boyle, M.F.P., in Doctoral Forum, USENIX International Conference on Autonomic Computing (ICAC), 2013

Fast and accurate analysis of large-scale composite structures with the parallel multilevel fast multipole algorithm, Ergul, O. and Gurel, L. Journal of the Optical Society of America A 30(3):509-517, 2013

Accurate solutions of extremely large integral-equation problems in computational electromagnetics, Ergul, O. and Gurel, L. IEEE Proceedings 101(2):342-349, 2013

Rigorous analysis of double-negative materials with the multilevel fast multipole algorithm, Ergul, O. and Gurel, L. ACES Journal 27(2):161-168, 2012

Fast and accurate solutions of electromagnetics problems involving lossy dielectric objects with the multilevel fast multipole algorithm, Ergul, O. Engineering Analysis with Boundary Elements 36(3):423-432, 2012

Analysis of composite nanoparticles with surface integral equations and the multilevel fast multipole algorithm, Ergul, O. Journal of Optics 14(6):062701-062701, 2012

Parallel-MLFMA solutions of large-scale problems involving composite objects, Ergul, O. and Gurel, L. in Antennas and Propagation Society International Symposium (APSURSI) IEEE, 2012

Analysis of composite objects involving multiple dielectric and metallic parts with the parallel multilevel fast multipole algorithm, Ergul, O. and Gurel, L. in Proceedings of the International Review of Progress in Applied Computational Electromagnetics (ACES), 2012

Solutions of large-scale electromagnetics problems involving dielectric objects with the parallel multilevel fast multipole algorithm, Ergul, O. Journal of the Optical Society of America A 28(11):2261-2268, 2011

Parallel implementation of MLFMA for homogeneous objects with various material properties, Ergul, O. Progress in Electromagnetics Research 121:505-520, 2011

Fast and accurate analysis of homogenized metamaterials with the surface integral equations and the multilevel fast multipole algorithm, Ergul, O. IEEE Antennas and Wireless Propagation Letters 10:1286-1289, 2011

Rigorous solutions of large-scale dielectric problems with the parallel multilevel fast multipole algorithm, Ergul, O. and Gurel, L. in Proceedings of the XXX URSI General Assembly and Scientific Symposium of the International Union of Radio Science, 2011

Analysis of double-negative materials with surface integral equations and the multilevel fast multipole algorithm, Ergul, O. and Gurel, L. in Proceedings of the Computational Electromagnetics International Workshop, 2011

Analysis of lossy dielectric objects with the multilevel fast multipole algorithm, Ergul, O. and Gurel, L. in Antennas and Propagation Society International Symposium (APSURSI) IEEE, 2011

Benchmark solutions of large problems for evaluating accuracy and efficiency of electromagnetics solvers, Ergul, O. and Gurel, L. in Antennas and Propagation Society International Symposium (APSURSI) IEEE, 2011

Accuracy: the frequently overlooked parameter in the solution of extremely large problems, Ergul, O. and Gurel, L. in Proceedings of the European Conference on Antennas and Propagation (EuCAP), 2011

Smooth Minimization of Nonsmooth Functions with Parallel Coordinate Descent Methods, Fercoq, O. and Richtarik, P., arXiv:1309.5885, 2013

Accelerated, Parallel and Proximal Coordinate Descent, Fercoq, O. and Richtarik, P., arXiv:1312.5799, 2013

An Optimal Block Iterative Method and Preconditioner for Banded Matrices with Applications to PDEs on Irregular Domains, Gander, M.J., Loisel, S. and Szyld, D.B., SIAM Journal on Matrix Analysis and Applications 33(2):653-680, 2012

A priori error estimates for a time-dependent boundary element method for the acoustic wave equation in a half-space, Gimperlein, H., Nezhi, Z. and Stephan, E.P., arXiv:1406.7566, 2014

Adaptive FE-BE coupling for strongly nonlinear transmission problems with friction II, Gimperlein, H. and Stephan, E.P., arXiv:1310.6325, 2013

Parallel Skeletons, Gorlatch, S. and Cole, M., in Encyclopedia of Parallel Computing (Ed. D. Padua), pages 1417-1422, Springer, 2011

Portable Mapping of Data Parallel Programs to OpenCL for Heterogeneous Systems, Grewe, D., Wang, Z. and O'Boyle, M.F.P., in International Symposium on Code Generation and Optimization (CGO'13), 2013

OpenCL Task Partitioning in the Presence of GPU Contention, Grewe, D., Wang, Z. and O'Boyle, M.F.P., in International Workshop on Languages and Compilers for Parallel Computing (LCPC'13), 2013

A Workload-Aware Mapping Approach for Data-Parallel Programs, Grewe, D., Wang, Z. and O'Boyle, M.F.P., in 6th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), 2011

Hierarchical parallelization of the multilevel fast multipole algorithm (MLFMA), Gurel, L. and Ergul, O. IEEE Proceedings 101(2):332-341, 2013

Accuracy and efficiency considerations in the solution of extremely large electromagnetics problems, Gurel, L. and Ergul, O. in Proceedings of the Computational Electromagnetics International Workshop, 2011

Partition of unity finite element method for time-dependent diffusion problems: an error estimate, Iqbal, M., Gimperlein, H., Mohammed, M.S. and Laghrouche, O. Infrastructure and Environment Scotland 2nd Postgraduate Conference, 2014

Distributed Computing Practice for Large-Scale Science & Engineering Applications, Jha, S., Cole, M., Katz, D.S., Parashar, M., Rana, O. and Weissman, J., Concurrency and Computation: Practice and Experience 25(11):1559-1585, 2012

The EPCC OpenACC Benchmark Suite, Johnson, N. and Jackson, A., presented at Exascale Applications and Software Conference, 2013

Semi-Stochastic Gradient Descent Methods, Konecny, J. and Richtarik, P. arXiv:1312.1666, 2013

Address-aware Fences, Lin, C., Nagarajan, V. and Gupta, R., Proceedings of the 27th International ACM Conference on International Conference on Supercomputing (ICS'13), 313-324, 2013

Efficient Sequential Consistency via Conflict Ordering, Lin, C., Nagarajan, V., Gupta, R. and Rajaram, B., Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVII), 273-286, 2012

Efficient Sequential Consistency Using Conditional Fences, Lin, C., Nagarajan, V. and Gupta, R., International Journal of Parallel Programming 40(1):84-117, 2012

Efficient Sequential Consistency Using Conditional Fences, Lin, C., Nagarajan, V. and Gupta, R., Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT'10), 295-306, 2010

Condition Number Estimates and Weak Scaling for 2-Level 2-Lagrange Multiplier Methods for General Domains and Cross Points, Loisel, S. Submitted

Condition Number Estimates for the Nonoverlapping Optimized Schwarz Method and the 2-Lagrange Multiplier Method for General Domains and Cross Points, Loisel, S., SIAM Journal on Numerical Analysis 51(6):3062-3083, 2013

Minimum Polynomial Extrapolation in MATLAB and in R, Loisel, S. and Takane, Y., Submitted

Generalized GIPSCAL Re-revisited: a Fast Convergent Algorithm with Acceleration by the Minimal Polynomial Extrapolation, Loisel, S. and Takane, Y., Advances in Data Analysis and Classification 5(1):57-75, 2011

Optimized Domain Decomposition Methods for the Spherical Laplacian, Loisel, S., Cote, J., Gander, M.J., Laayouni, L. and Qaddouri, A., SIAM Journal on Numerical Analysis 48(2):524-551, 2010

Partans: An Autotuning Framework for Stencil Computation on Multi-GPU Systems, Lutz, T., Fensch, C., and Cole, M. ACM Transactions on Architecture and Code Optimization 9(4) Article 59, 2013

Autotuning Wavefront Applications for Multicore Multi-GPU Hybrid Architectures, Mohanty, S. and Cole, M. PMAM 2014, International Workshop of Programming Models and Applications for Multicores and Manycores, 2014

Autotuning Wavefront Abstractions for Heterogeneous Architectures, Mohanty, S. and Cole, M., Proceedings of the 3rd Workshop on Applications for Multi-Core Architectures at SBAC-PAD, 2012

A System for Debugging via Online Tracing and Dynamic Slicing, Nagarajan, V., Jeffrey, D., Gupta, R. and Gupta, N., Software: Practice and Experience 42(8):995-1014, 2012

Fast RMWs for TSO: Semantics and Implementation, Rajaram, B., Nagarajan, V., Sarkar, S. and Elver, M., Proceedings of the 34th ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI'13), 61-72, 2013

SuperCoP: a General, Correct, and Performance-efficient Supervised Memory System, Rajaram, B., Nagarajan, V., McPherson, A.J. and Cintra, M., Proceedings of the 9th Conference on Computing Frontiers (CF'12), 85-94, 2012

Iteration Complexity of Randomized Block-Coordinate Descent Methods for Minimizing a Composite Function, Richtarik, P. and Takac, M., Mathematical Programming 144(1-2):1-38, 2014

Distributed Coordinate Descent Method for Learning with Big Data, Richtarik, P. and Takac, M., arXiv:1310.2059, 2013

On Optimal Probabilities in Stochastic Coordinate Descent Methods, Richtarik, P. and Takac, M., arXiv:1310.3438, 2013

Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes, Richtarik, P., Takac, M. and Ahipasaoglu, S.D., arXiv:1212.4137, 2012

Parallel Coordinate Descent Methods for Big Data Optimization, Richtarik, P. and Takac, M., arXiv:1212.0873, 2012

Efficient Serial and Parallel Coordinate Descent Methods for Huge-Scale Truss Topology Design, Richtarik, P. and Takac, M., in Operations Research Proceedings 2011, SpringerLink 2012

Schwarz Preconditioner for the Stochastic Finite Element Method, Subber, W. and Loisel, S., submitted to DD12

Schwarz Preconditioners for Stochastic Elliptic PDEs, Subber, W. and Loisel, S., Computer Methods in Applied Mechanics and Engineering 272:34-57, 2014

TOP-SPIN: TOPic discovery via Sparse Principal component INterference, Takac, M., Ahipasaoglu, S.D., Cheung, N-M. and Richtarik, P., arXiv:1311.1406, 2013

Mini-Batch Primal and Dual Methods for SVMs, Takac, M., Bijral, A., Srebro, N. and Richtarik, P., JMLR Workshop and Conference Proceedings 28(3):1022-1030, 2013

Inexact Coordinate Descent: Complexity and Preconditioning, Tappenden, R., Richtarik, P. and Gondzio, J., arXiv:1304.5530, 2013

Separable Approximations and Decomposition Methods for the Augmented Lagrangian, Tappenden, R., Richtarik, P. and Buke, B., arXiv:1308.6774, 2013

Resource Analyses for Parallel and Distributed Coordination, Trinder, P. W., Cole, M., Hammond, K., Loidl, H-W. and Michaelson, G.J., Concurrency and Computation: Practice and Experience 25(3):309-348, 2013

Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications, Wanderley Goes, L.F., Pousa Ribeiro, C., Castro, M., Mehaut, J-F., Cole, M. and Cintra, M., International Journal of Parallel Programming 42(2):365-382, 2014

Autotuning Skeleton-Driven Optimizations for Transactional Worklist Applications, Wanderley Goes, L.F., Ioannou, N., Xekalakis, P., Cole, M. and Cintra, M., IEEE Transactions on Parallel and Distributed Computing 23(12):2205-2218, 2012

Integrating Profile-Driven Parallelism Detection and Machine-Learning Based Mapping, Wang, Z., Tournavitis, G., Franke, B. and O'Boyle, M.F.P., ACM Transactions on Architecture and Code Optimization 11(1) Article 2, 2014

Partitioning Streaming Parallelism for Multi-cores: A Machine Learning Based Approach, Wang, Z. and O'Boyle, M.F.P., in International Conference on Parallel Architectures and Compilation Techniques (PACT'10), 2010