This article reports the findings of the Workshop on High Performance
Computing and Communications (HPCC) for Grand Challenge Applications:
Computer Vision, Speech and Natural Language Processing (SNLP), and Artificial
Intelligence (AI). The goals of the workshop are to identify applications,
research problems, and designs of HPCC systems for supporting applications
in these areas. In computer vision, we have identified the main scientific issues as
machine learning, surface reconstruction, inverse optics and integration,
model acquisition, and perception and action. Since vision algorithms
operate in different levels of granularity, computers for supporting these
algorithms need to be heterogeneous and modular. Advances in technology,
new architectural concepts, and software design methods are essential for
this area. In SNLP, we have identified issues in statistical analysis in
corpus-based speech and language understanding, search strategies for language
analysis, auditory and vocal-tract modeling, integration of multiple levels
of speech and language analyses, and connectionist systems. Similar to
algorithms in computer vision, algorithms in SNLP require high
computational power, ranging from general-purpose supercomputing to special-purpose
VLSI systems. As processing requirements vary widely, a hybrid, heterogeneous
computer system is the most desirable. In AI, important issues that need immediate attention include the
development of efficient machine learning and heuristic search methods that
can adapt to different architectural configurations, and the design and
construction of scalable and verifiable knowledge bases, active memories,
and artificial neural networks. A computer system for supporting AI
applications will be heterogeneous, requiring research in high-speed computer networks, mass storage and efficient retrieval methods, computer languages,
and hardware and software design tools. Research in these areas is inherently multidisciplinary and will
require active participation of researchers in device and networking
technologies, signal processing, computer architecture, software engineering,
and knowledge engineering. Besides extending current frontiers in
research, an important aspect to be emphasized is the integration of
existing components and results into working systems.
In this paper, we have studied various supervised learning methods for
training feed-forward neural networks. In general, such learning can be
considered as a nonlinear global optimization problem in which the goal is
to minimize a nonlinear error function that spans the space of weights,
using heuristic strategies that look for global optima (in contrast to
local optima). We have surveyed various global optimization methods
suitable for neural-network learning, and have proposed the Novel method
(Nonlinear Optimization via External Lead) for nonlinear
optimization and neural network learning. By combining global and local
searches, we show how Novel can be used to find a good local minimum in
the error space. Our key idea is to use a user-defined trace that pulls
a search out of a local minimum without having to restart it from a new
starting point. Using five benchmark problems, we have compared Novel
against some of the best global optimization algorithms and have
demonstrated its superior performance. The key points studied and summarized in this paper are as follows:
(a) Global optimization involves strategies to escape from local
minima after reaching there,
(b) A good global search strategy can find better local minima,
(c) The search space of small neural networks is more rugged, requiring
more powerful global search methods to get convergence,
(d) A better local minimum in the error space does not always lead to
neural networks that generalize better,
(e) Smaller networks tend to generalize better.
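As a rough illustration of the trace idea (not the actual Novel implementation), the sketch below runs a trace-guided global phase followed by plain local descents on a toy one-dimensional error surface; the error function, the trace schedule, and the pull strength are all illustrative assumptions.

```python
import numpy as np

def error(w):
    # Illustrative rugged one-dimensional error surface (not a real network error).
    return 0.05 * w ** 2 + np.sin(3.0 * w)

def grad(f, w, eps=1e-5):
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)

def trace(t):
    # A bounded, aperiodic lead that sweeps the weight space over time.
    return 4.0 * np.sin(0.9 * t) * np.sin(0.23 * t)

def novel_like_search(steps=4000, dt=0.01, pull=2.0):
    # Global phase: follow the gradient but stay loosely attached to the trace,
    # so the trajectory is pulled out of local minima without restarts.
    w, visited = 0.0, []
    for i in range(steps):
        t = i * dt
        w += dt * (-grad(error, w) - pull * (w - trace(t)))
        visited.append((error(w), w))
    # Local phase: plain descent from the most promising points recorded.
    results = []
    for _, start in sorted(visited)[:5]:
        w = start
        for _ in range(2000):
            w -= 0.01 * grad(error, w)
        results.append((error(w), w))
    return min(results)

print(novel_like_search())
```

The global phase passes through several basins without restarts; the local phase then refines the best points recorded along the trajectory.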
In this paper, we propose the concept of regression cubes in which regression models can be
obtained for any data cuboid efficiently without accessing the raw data. The traditional classification
of measures into distributive, algebraic, and holistic measures cannot accommodate advanced
statistical models such as regression and filters. In this paper, we propose a fundamentally new
class of measures, compressible measures, in order to support efficient computation of the statistical
models. For compressible measures, we can efficiently compute the statistical models at an
arbitrary data cell, while each data cell is compressed into a small auxiliary matrix with a fixed
size independent of the number of tuples. We can then compute the statistical measures for any
data cell from the compressed data of the lower-level cells without accessing the raw data. Time-
and space-efficient lossless aggregation formulae are derived for regression and filtering measures.
Our analytical and experimental studies show that the proposed approach substantially reduces
the memory usage and the overall response time for statistical analysis of multidimensional data.
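A minimal sketch of the compressible-measure idea for linear regression follows, assuming ordinary least squares as the statistical model: each base cell is compressed into fixed-size sufficient statistics that can be added losslessly to obtain the model of any higher-level cell. The function names and data are illustrative, not from the paper.

```python
import numpy as np

def compress(X, y):
    # Fixed-size summary of one base cell: independent of the number of tuples.
    return {"xtx": X.T @ X, "xty": X.T @ y, "n": len(y)}

def merge(summaries):
    # Lossless aggregation: summaries of child cells add component-wise.
    return {"xtx": sum(s["xtx"] for s in summaries),
            "xty": sum(s["xty"] for s in summaries),
            "n": sum(s["n"] for s in summaries)}

def regress(summary):
    # OLS coefficients for the aggregated cell, without touching the raw tuples.
    return np.linalg.solve(summary["xtx"], summary["xty"])

rng = np.random.default_rng(0)
cells = []
for _ in range(4):                      # four base cells
    X = np.column_stack([np.ones(50), rng.normal(size=50)])
    y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=50)
    cells.append(compress(X, y))
print(regress(merge(cells)))            # ~[2, 3], the same as regressing the raw data
```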
In this paper, we investigate methods for online, multi-dimensional regression analysis of
time-series stream data, with the following contributions: (1) our analysis shows that only a
small number of compressed regression measures instead of the complete stream of data
need to be registered for multi-dimensional linear regression analysis, (2) to facilitate
online stream data analysis, a partially materialized data cube model, with regression as
measure, and a tilt time frame as its time dimension, is proposed to minimize the amount
of data to be retained in memory or stored on disks, and (3) an exception-guided drilling
approach is developed for on-line, multi-dimensional exception-based regression analysis.
Based on this design, algorithms are proposed for efficient analysis of time-series data
streams. Our performance study compares the proposed algorithms and identifies the most
memory- and time-efficient one for stream data analysis.
Real-time surveillance systems and other dynamic environments often generate tremendous (potentially
infinite) volumes of stream data: the volume is too huge to be scanned multiple times.
However, much of such data resides at a rather low level of abstraction, whereas most analysts are
interested in dynamic changes (such as trends and outliers) at relatively high levels of abstraction.
To discover such high level characteristics, one may need to perform online multilevel analysis of
stream data, similar to OLAP (online analytical processing) of relational and data warehouse data.
With limited storage space and the demand for fast response, is it realistic to promote online,
multi-dimensional analysis and mining of stream data to alert people about dramatic changes of
situations at multiple levels of abstraction?
In this paper, we present an architecture, called stream-cube, which, based on our analysis,
is feasible for successful online multi-dimensional, multi-level analysis of stream data. By
successful, we mean that the system will provide analytical power and flexibility, derive timely
and quality responses, and consume limited memory space and other resources.
The general design of the stream-cube architecture is described as follows.
First, a tilt time frame model is taken as the default model for time dimension.
Such a model reduces the amount of data to be retained in memory or stored on disks but still
achieves flexibility and analysis power. Second, a small number of critical layers are maintained
for flexible analysis. Consider that the stream data resides at the primitive layer. It is desirable
to identify two critical higher layers in applications: the minimal interest layer, and the observation
layer. These two layers can be used as the basis for stream cube construction and for multi-level
online analysis. Third, instead of computing a complete data cube based on the notion of critical
layers, one may compute only the layers along a popular path and leave others for query-driven,
online computation. Our analysis shows that this architecture provides a good time-space trade-off:
it decreases substantially the demand for memory usage and reduces the overall response time.
Other possible designs for OLAPing stream data are also discussed in the paper.
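A minimal sketch of a tilt time frame follows; the particular granularities, the sum measure, and the roll-up policy are illustrative assumptions rather than the design described above.

```python
from collections import deque

class TiltedTimeFrame:
    """Keep recent measures at fine granularity and older ones coarser,
    e.g. the last 4 quarters, 24 hours, 31 days (illustrative choices)."""
    def __init__(self, capacities=(4, 24, 31)):
        self.levels = [deque(maxlen=c) for c in capacities]

    def insert(self, measure):
        # New measures enter the finest level; when it is full, the whole
        # window is collapsed into one coarser value for the next level.
        finest = self.levels[0]
        if len(finest) == finest.maxlen:
            self._rollup(0)
        finest.append(measure)

    def _rollup(self, i):
        if i + 1 < len(self.levels):
            nxt = self.levels[i + 1]
            if len(nxt) == nxt.maxlen:
                self._rollup(i + 1)
            nxt.append(sum(self.levels[i]))   # aggregate, e.g. a sum measure
        self.levels[i].clear()

frame = TiltedTimeFrame()
for t in range(100):
    frame.insert(1)                           # one unit of activity per tick
print([list(level) for level in frame.levels])
```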
Efficient scheduling techniques of computing resources
are essential for achieving satisfactory performance for
users as computer systems and their applications become more complex.
In this paper, we survey research on scheduling algorithms,
review previous classifications of scheduling problems,
and present a broader classification scheme.
Using a uniform terminology for scheduling strategies
and the new classification scheme, previous
work on scheduling strategies is reviewed,
and trends in scheduling research are identified.
Finally, a methodology for developing scheduling strategies
is presented.
In this paper, we present the design of a system for automatically
learning and evaluating new heuristic methods that can be used to map a set
of communicating processes on a network of computers. Our learning system
is based on testing a population of competing heuristic methods within a
fixed time constraint. We develop and analyze various resource scheduling
strategies based on a statistical model that trades off the number of
new heuristic methods considered against the amount of testing performed on
each. We implement a prototype learning system (Teacher 4.1) for learning
new heuristic methods used in Post-Game Analysis, a system that iteratively
generates and refines mappings of a set of communicating processes on a
network of computers. Our performance results show that a significant
improvement can be obtained by a systematic exploration of the space of
possible heuristic methods.
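The following sketch illustrates, under heavily simplified assumptions, the trade-off that the scheduling strategies manage: within a fixed testing budget, time spent testing incumbent heuristic methods competes with time spent generating and testing new ones. The scoring and replacement rules are illustrative and are not Teacher's actual strategies.

```python
import random

def evaluate(hm):
    # Stand-in for running one test case under heuristic method `hm`:
    # here each HM has a hidden true quality plus observation noise.
    return hm["quality"] + random.gauss(0.0, 0.3)

def learn(budget=200, pool_size=5, tests_per_round=5):
    random.seed(1)
    pool = [{"id": i, "quality": random.random(), "scores": []} for i in range(pool_size)]
    spent = 0
    while spent < budget:
        # Spend part of the budget testing the current pool of competing HMs...
        for hm in pool:
            for _ in range(tests_per_round):
                hm["scores"].append(evaluate(hm))
                spent += 1
        # ...then replace the worst performer with a freshly generated HM,
        # trading further tests of incumbents against exploration of new candidates.
        pool.sort(key=lambda h: sum(h["scores"]) / len(h["scores"]), reverse=True)
        pool[-1] = {"id": spent, "quality": random.random(), "scores": []}
    return pool[0]

print(learn()["id"])
```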
In this paper we present new methods for the automated learning of
heuristics in knowledge-lean applications and for finding heuristics that can be
generalized to unlearned domains. These applications lack domain knowledge
for credit assignment; hence, operators for composing new heuristics are
generally model-free, domain independent, and syntactic in nature. The
operators we have used are genetics-based; examples include mutation and
crossover. Learning is based on a generate-and-test paradigm
that maintains a pool of competing heuristics, tests them to a limited
extent, creates new ones from those that perform well in the past, and
prunes poor ones from the pool. We have studied three important issues in
learning better heuristics: (a) anomalies in performance evaluation, (b)
rational scheduling of limited computational resources in testing candidate
heuristics in single-objective as well as multi-objective learning, and (c)
finding heuristics that can be generalized to unlearned domains. We show
experimental results in learning better heuristics for (a) process
placement for distributed-memory multicomputers, (b) node decomposition in a
branch-and-bound search, (c) generation of test patterns in VLSI circuit
testing, and (d) VLSI cell placement and routing.
In this paper, we model search algorithms with meta-control, allowing
resource constraints, approximation, and parallel processing to be
incorporated easily in the search process. The basic building block of the
model is a hierarchical search process (HSP) consisting of context-free and
context-sensitive grammars classified according to problem-independent and
problem-dependent parts. The context-sensitive components are used mainly
for evaluating decision parameters and in ordering production rules in the
context-free grammar. The execution of the grammars for given initial
conditions may invoke other HSPs already defined in the system. We describe
ISE (acronym for Integrated Search Environment), a tool that implements
hierarchical searches with meta-control. By separating the
problem-dependent and problem-independent components in ISE, new search methods
based on a combination of existing methods can be developed easily by
coding a single master control program. Further, new applications solved by
searches can be developed by coding the problem-dependent parts and reusing
the problem-independent parts already developed. We describe the organization of ISE and present experiments carried out on the system.
In this paper, we study a new learning model for designing heuristics
automatically under resource constraints. We focus on improving
performance-related heuristic methods (HMs) in knowledge-lean application
domains. We assume that learning is episodic, that the performance
measures of an episode are dependent only on the final state reached in
evaluating the corresponding test case and are independent of the
intermediate decisions and internal states, and that the aggregate performance
measures of the HMs involved are independent of the order of evaluation of
test cases. The learning model is based on testing a population of
competing HMs for an application problem and switches from one to another
dynamically, depending on the outcome of previous tests. Its goal is to find a
good HM within the resource constraints, with proper tradeoff between cost
and quality. It complements existing point-based machine learning models
that maintain only one incumbent HM and test it extensively
before switching to alternative ones. It extends existing work on classifier
systems by addressing issues related to delays in feedback, scheduling of
tests of HMs under limited resources, anomalies in performance evaluation,
and scalability of HMs. Finally, we describe our experience in applying
the learning method on five application problems.
In this paper, we develop new methods for adjusting configuration
parameters of genetic algorithms operating in a noisy environment. Such methods
are related to the scheduling of resources for tests performed in genetic
algorithms. Assuming that the population size is given, we address two
problems related to the design of efficient scheduling algorithms
specifically important in noisy environments. First, we study the duration-scheduling
problem that is related to setting dynamically the duration of
each generation. Second, we study the sample-allocation problem that
entails the adaptive determination of the number of evaluations taken from
each candidate in a generation. In our approach, we model the search
process as a statistical selection process and derive equations useful for
these problems. Our results show that our adaptive procedures improve the
performance of genetic algorithms over that of commonly used static ones.
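As an illustration of the sample-allocation problem (not the statistical selection procedure derived in the paper), the sketch below spends a fixed evaluation budget adaptively, giving additional noisy evaluations to candidates that could still plausibly be the best rather than sampling all candidates equally.

```python
import random, statistics

def noisy_fitness(x):
    # Toy noisy objective; in a real GA this would be one evaluation of a candidate.
    return -(x - 0.7) ** 2 + random.gauss(0.0, 0.2)

def upper_bound(scores):
    # Optimistic estimate: sample mean plus a confidence half-width.
    return statistics.mean(scores) + 2.0 * statistics.stdev(scores) / len(scores) ** 0.5

def allocate_samples(population, total_budget=300, batch=3):
    random.seed(2)
    samples = {x: [noisy_fitness(x) for _ in range(batch)] for x in population}
    used = batch * len(population)
    while used + batch <= total_budget:
        # Give the next batch of evaluations to the candidate that could still
        # plausibly be best, instead of sampling every candidate equally.
        target = max(population, key=lambda x: upper_bound(samples[x]))
        samples[target] += [noisy_fitness(target) for _ in range(batch)]
        used += batch
    return max(population, key=lambda x: statistics.mean(samples[x]))

print(allocate_samples([0.1 * i for i in range(10)]))
```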
Two-level pipelining in processor arrays (PAs) involves pipelining of
operations across processing elements (PEs) and pipelining of operations in
functional units in each PE. Although it is an attractive method for
improving the throughput of PAs, existing methods for generating PAs with
two-level pipelining are restricted and cannot systematically explore the
entire space of feasible designs. In this paper, we extend a systematic
design method, called General Parameter Method (GPM), we have developed
earlier to find optimal designs of PAs with two-level pipelines. The basic
idea is to add new constraints on periods of data flows to include the
effect of internal functional pipelines in the PEs. As an illustration, we
present pipelined PA designs for computing matrix products. For n-dimensional meshes and other symmetric problems, we provide an efficient
scheme to obtain a pipelined PA from a non-pipelined PA using a reindexing
transformation. This scheme is used in GPM as a pruning condition to
arrive at optimal pipelined PAs efficiently. For pipelines with minimum initiation
interval (MII) greater than unity, we show additional constraints
that ensure correctness of the synthesized PAs.
In this paper, we have studied the scheduling of computational
resources for genetic algorithms operating in a noisy environment.
Assuming that the number of candidates in each generation is fixed, we
have studied two related scheduling problems: determining a suitable
duration for a generation, and finding the appropriate number of
samples to be drawn from each candidate. Our results show that dynamic
scheduling policies perform better than existing static ones.
Although we have used simple test functions in our experiments, the
advantage of using dynamic scheduling is universal and is independent
of the reproduction mechanism of individual genetic algorithms.
Processor arrays are frequently used to deliver high performance in many
applications with computationally intensive operations. This paper
presents the General Parameter Method (GPM), a systematic parameter-based
approach for synthesizing such algorithm-specific architectures. GPM can
synthesize processor arrays of any lower dimension from a
uniform-recurrence description of the algorithm. The design objective is a general
non-linear and non-monotonic user-specified function, and depends on
attributes such as computation time of the recurrence on the processor array,
completion time, load time, and drain time. In addition, bounds on some or
all of these attributes can be specified. GPM performs an efficient search
of polynomial complexity to find the optimal design satisfying the
user-specified design constraints. As an illustration, we show how GPM can be
used to find optimal linear processor arrays for computing transitive
closures. We consider design objectives that minimize computation time, or
processor count, or completion time (including load and drain times), and
user-specified constraints on number of processing elements and/or
computation/completion times. We show that GPM can be used to obtain
optimal designs that trade off the number of processing elements against
completion time, thereby allowing the designer to choose a design that best meets
the specified design objectives. We also show the equivalence between the
model assumed in GPM and that in the popular dependence-based methods.
Consequently, GPM can be used to find optimal designs for both models.
In this paper, we present the design of a processor array (PA) with bounded
number of input/output (I/O) ports (L-by-N PEs with O(L) I/O ports, where
1 ≤ L ≤ N) for computing transitive closures. Previous designs of PAs to
solve this problem use either N-by-N PAs with O(N) I/O ports or linear PAs
with O(1) I/O ports. These designs are restricted by their I/O bandwidth
because designs using O(N) I/O ports are limited by the number of PEs that
can be supported by these ports, whereas designs with O(1) ports are
limited by the I/O time to load and drain data. In contrast, our proposed
design uses O(L) I/O ports, where L is a technology-dependent design
parameter. In our proposed architecture, we have used more powerful processing
elements (PEs), each with a microprogram-driven ALU, a limited number of
I/O buffers and O(N/L) memory words. Our technique for mapping
transitive-closure computations consists of four steps. (1) Starting from
a mapping of a transitive-closure problem on an N-by-N PA, we map the
computations into an L-by-N PA. (2) We then schedule the computations in the
L-by-N PA in order to satisfy the dependencies. (3) Based on the new
mapping, we derive speeds and directions of data flows and the number of
buffers required in each PE. (4) Finally, we derive the microprogram in
each PE for correct execution. We show that our design specializes to
efficient designs in the boundary cases when L=1 (linear PA) and L=N (square
mesh).
In this paper, we study the performance of various IDA*-style searches and
investigate methods to improve their performance by predicting in each
stage the threshold to be used for pruning. Without loss of generality, we
consider minimization problems in this paper. We first present three
models to approximate the distribution of the number of search nodes by
lower bounds: exponential, geometric, and linear, and illustrate these
distributions based on some well-known combinatorial search problems. Based on
these distributions, we show the performance of an ideal IDA* algorithm and
identify reasons why existing IDA*-style algorithms perform well. In
practice, the type of distribution for a given problem instance can often be
known from experience, but the parameters of this distribution are not known
until the instance is solved. Hence, we develop RIDA*, a
method that estimates dynamically the parameters of the distribution, and
predicts the best threshold to be used in each stage. Finally, we compare
the performance of several IDA*-style algorithms --- Korf's IDA* and RBFS,
RIDA*, IDA*_CR and DFS* --- on several application problems, and identify
conditions under which each of these algorithms will perform well.
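A minimal sketch of threshold prediction under an assumed exponential growth model follows; the two-point fit and the target growth ratio are illustrative stand-ins for RIDA*'s dynamic parameter estimation.

```python
import math

def predict_next_threshold(history, growth_target=2.0):
    """history: list of (threshold, nodes_expanded) from completed IDA* stages.
    Assumes node counts grow roughly exponentially with the threshold and picks
    the next threshold so the predicted work is growth_target times the last
    stage's work (both assumptions are illustrative)."""
    if len(history) < 2:
        return history[-1][0] + 1          # fall back to a unit increase
    (t0, n0), (t1, n1) = history[-2], history[-1]
    rate = (math.log(n1) - math.log(n0)) / (t1 - t0)   # fitted growth rate
    return t1 + math.log(growth_target) / rate

# Example: node counts observed in the first stages of a search.
stages = [(10, 120), (12, 480), (14, 1900)]
print(round(predict_next_threshold(stages), 2))       # next pruning threshold
```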
This paper describes Dynamic Workload Generator (DWG), a facility for
generating realistic and reproducible synthetic workloads for use in
load-balancing experiments. For such experiments, the generated workload must
not only mimic the highly dynamic resource-utilization patterns found on
today's distributed systems but also behave as a real workload does when
test jobs are run concurrently with it. The latter requirement is
important in testing alternative load-balancing strategies, a process that
requires running the same job multiple times, each time at a different site
but under an identical network-wide workload. Parts of DWG are implemented
inside the operating-system kernel and have complete control over the
utilization levels of four key resources: CPU, memory, disk, and network.
Besides accurately replaying network-wide load patterns recorded earlier,
DWG gives up a fraction of its resources each time a new job arrives and
reclaims these resources upon job completion. The latter operation is
controlled by pattern-doctoring rules implemented in DWG. We present DWG's
architecture, its doctoring rules, systematic methods for adjusting and
evaluating doctoring rules, and experimental results on a network of Sun
workstations.
In this paper we present a method called Novel (Nonlinear Optimization via
External Lead) for solving continuous and discrete global optimization
problems. Novel addresses the balance between global search and local
search, using a trace to aid in identifying promising regions before
committing to local searches. We discuss Novel for solving continuous
constrained optimization problems and show how it can be extended to solve
constrained satisfaction and discrete satisfiability problems. We first
transform the problem using Lagrange multipliers into an unconstrained
version. Since a stable solution in a Lagrangian formulation only
guarantees a local optimum satisfying the constraints, we propose a global
search phase in which an aperiodic and bounded trace function is added to
the search to first identify promising regions for local search. The trace
generates an information-bearing trajectory from which good starting points
are identified for further local searches. Taking only a small portion of
the total search time, this elegant approach significantly reduces
unnecessary local searches in regions leading to the same local optimum.
We demonstrate the effectiveness of Novel on a collection of continuous
optimization benchmark problems, finding the same or better solutions while
satisfying the constraints. We extend Novel to discrete constraint
satisfaction problems (CSPs) by showing an efficient transformation method
for CSPs and the associated representation in finite-difference equations
in Novel. We apply Novel to solve Boolean satisfiability instances in
circuit fault detection and circuit synthesis applications, and show
comparable performance when compared to the best existing method.
In this paper, we develop TCGD, a problem-independent, time-constrained,
approximate guided depth-first search (GDFS) algorithm. The algorithm is
designed to achieve the best ascertained approximation degree under a fixed
time constraint. We consider only searches with finite search space and
admissible heuristic functions. We study NP-hard combinatorial
optimization problems with polynomial-time computable feasible solutions.
For the problems studied, we observe that the execution time increases
exponentially as approximation degree decreases, although anomalies may
happen. The algorithms we study are evaluated by simulations using the
symmetric traveling-salesperson problem.
In this paper, we present two learning mechanisms for artificial neural
networks (ANNs) that can be applied to solve classification problems with
binary outputs. These mechanisms are used to reduce the number of hidden
units of an ANN when trained by the cascade-correlation learning algorithm
(CAS). Since CAS adds hidden units incrementally as learning proceeds, it
is difficult to predict the number of hidden units required when
convergence is reached. Further, learning must be restarted when the number of
hidden units is larger than expected. Our key idea in this paper is to
provide alternatives in the learning process and to select the best
alternative dynamically based on run-time information obtained. Mixed-mode
learning (MM), our first algorithm, provides alternative output matrices so
that learning is extended to find one of the many one-to-many mappings
instead of finding a unique one-to-one mapping. Since the objective of
learning is relaxed by this transformation, the number of learning epochs
can be reduced. This in turn leads to a smaller number of hidden units
required for convergence. Population-based Learning for ANNs (PLAN), our
second algorithm, maintains alternative network configurations and selects
at run time promising networks to train, based on the error information
obtained and the time remaining. This dynamic scheduling avoids training
possibly unpromising ANNs to completion before exploring new ones. We show
the performance of these two mechanisms by applying them to solve the
two-spiral problem, a two-region classification problem, and the Pima Indian
Diabetes Diagnosis problem.
In this paper, we present new results on the automated generalization of
performance-related heuristics learned for knowledge-lean applications. By
first applying genetics-based learning to learn new heuristics for some
small subsets of test cases in a problem space, we study methods to
generalize these heuristics to unlearned subdomains of test cases. Our method
uses a new statistical metric called probability of win. By assessing the
performance of heuristics in a range-independent and distribution-independent manner, we can compare heuristics across problem subdomains in
a consistent manner. To illustrate our approach, we show experimental
results on generalizing heuristics learned for sequential circuit testing,
VLSI cell placement and routing, branch-and-bound search, and blind
equalization. We show that generalization can lead to new and robust heuristics
that perform better than the original heuristics across test cases of
different characteristics.
Problem solvers employ strategies in searching for solutions to given problem instances.
Strategies have traditionally been designed by experts using prior knowledge and refined
manually using trial and error. Recent attempts to automate these processes have produced
strategy-learning systems. This paper shows how various issues in strategy learning
are affected by the nature of performance tasks, problem solvers, and learning environments.
Surveyed learning systems are grouped by the commonality of their approaches into four
general architectures.
Satisfiability is a class of NP-complete problems that model a wide range
of real-world applications. These problems are difficult to solve because
they have many local minima in their search space, often trapping greedy
search methods that utilize some form of descent. In this paper, we
propose a new discrete Lagrange-multiplier-based global-search method for
solving satisfiability problems. We derive new approaches for applying
Lagrangian methods in discrete space, show that equilibrium is reached when
a feasible assignment to the original problem is found, and present
heuristic algorithms to look for equilibrium points. Our method and
analysis provide a theoretical foundation and a generalization of local
search schemes that optimize the objective alone and clause-weight schemes
that optimize the constraints alone. In contrast to local search methods
that restart from a new starting point when a search reaches a local trap,
the Lagrange multipliers in our method provide a force to lead the search
out of a local minimum and move it in the direction provided by the
Lagrange multipliers. In contrast to clause-weight schemes that rely only
on the weights of violated constraints to escape from local minima, our
method also uses the value of an objective function (in this case the
number of violated constraints) to provide further guidance. The dynamic
shift in emphasis between the objective and the constraints, depending on
their relative values, is the key of Lagrangian methods. One of the major
advantages of our method is that it has very few algorithmic parameters to
be tuned by users, and the search procedure can be made deterministic and
the results, reproducible. We demonstrate our method by applying it to
solve an extensive set of benchmark problems archived in DIMACS of Rutgers
University. Our method often performs better than the best existing
methods and can achieve an order-of-magnitude speedup for some problems.
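The following sketch shows the basic discrete Lagrangian iteration on a tiny SAT instance: descend by flipping a variable that reduces the weighted count of unsatisfied clauses, and when no flip helps, increase the multipliers of the violated clauses instead of restarting. The clause encoding, tie-breaking, and update step are illustrative simplifications of the method described above.

```python
def unsat(clauses, assign):
    # A clause is a list of literals; literal v means variable v true, -v false.
    return [c for c in clauses if not any((lit > 0) == assign[abs(lit)] for lit in c)]

def dlm_sat(clauses, n_vars, max_steps=10000):
    assign = {v: False for v in range(1, n_vars + 1)}
    lam = {i: 1.0 for i in range(len(clauses))}            # one multiplier per clause

    def lagrangian(a):
        return sum(lam[i] for i, c in enumerate(clauses)
                   if not any((lit > 0) == a[abs(lit)] for lit in c))

    for _ in range(max_steps):
        if not unsat(clauses, assign):
            return assign                                   # feasible: all clauses satisfied
        # Descent in the variable space: take the best single flip, if any helps.
        best_v, best_val = None, lagrangian(assign)
        for v in assign:
            assign[v] = not assign[v]
            val = lagrangian(assign)
            assign[v] = not assign[v]
            if val < best_val:
                best_v, best_val = v, val
        if best_v is not None:
            assign[best_v] = not assign[best_v]
        else:
            # Ascent in the multiplier space: penalize the clauses still violated,
            # which reshapes the surface and leads the search out of the local minimum.
            for i, c in enumerate(clauses):
                if not any((lit > 0) == assign[abs(lit)] for lit in c):
                    lam[i] += 1.0
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(dlm_sat([[1, 2], [-1, 3], [-2, -3]], n_vars=3))
```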
Dynamic load-balancing strategies for distributed systems seek to improve
average completion time of independent tasks by migrating each incoming
task to the site where it is expected to finish the fastest: usually the
site having the smallest load index. SMALL is an off-line learning system
for developing configuration-specific load-balancing strategies; it learns
new load indices as well as tunes the parameters of given migration
policies. Using a dynamic workload generator, a number of typical
systemwide load patterns are first recorded; the completion times of
several benchmark jobs are then measured at each site, under each of the
recorded load patterns. These measurements are used to simultaneously train
comparator neural networks, one per site. The comparators collectively
model a set of perfect load indices in that they seek to rank, at arrival
time, the possible destinations for an incoming task by their (not yet
known) respective completion times. The numerous parameters of the
decentralized dynamic load-balancing policy are then tuned using a genetic
algorithm. We present experimental results for a mix of scientific and
interactive workloads on Sun workstations connected by Ethernet. The
policies tuned by SMALL are shown to intelligently and effectively exploit
idle resources.
In this paper, we study the design of a coprocessor (CoP) to execute
efficiently recursive algorithms with uniform dependencies. Our design is
based on two objectives: (a) fixed bandwidth to main memory (MM) and (b)
scalability to higher performance without increasing MM bandwidth. Our CoP
has an access unit (AU) organized as multiple queues, a processor array
(PA) with regularly connected processing elements (PEs), and input/output
networks for data routing. Our design is unique because it addresses
input/output bottleneck and scalability, two of the most important issues
in integrating processor arrays in current systems. To allow processor
arrays to be widely usable, they must be scalable to high performance with
little or no impact on the supporting memory system. The use of multiple
queues in AU also eliminates the use of explicit data addresses, thereby
simplifying the design of the control program. We present a mapping
algorithm that partitions a data dependence graph (DG) of an application
into regular blocks, sequences the blocks through AU, and schedules the
execution of the blocks, one at a time, on PA. We show that our mapping
procedure minimizes the amount of communication between blocks in the
partitioned DG, and sequences the blocks through AU to reduce the
communication between AU and MM. Using the matrix-product and
transitive-closure applications, we study design trade-offs involving (a) division of
a fixed chip area between PA and AU, and (b) improvements in speedup with
respect to increases in chip area. Our results show, for a fixed chip
area, (a) that there is little degradation in throughput in using a linear
PA as compared to a PA organized as a square mesh, and (b) that the design
is not sensitive to the division of chip area between PA and AU. We
further show that, for a fixed throughput, there is an inverse square root
relationship between speedup and total chip area. Our study demonstrates
the feasibility of a low-cost, memory bandwidth-limited, and scalable
coprocessor system for evaluating recurrent algorithms with uniform
dependencies.
In this paper, we present a new method to handle inequality constraints and
apply it in Novel (Nonlinear Optimization via External Lead), a system we
have developed for solving constrained continuous nonlinear optimization
problems. In general, in applying Lagrange-multiplier methods to solve
these problems, inequality constraints are first converted into equivalent
equality constraints. One such conversion method adds a slack variable to
each inequality constraint in order to convert it into an equality
constraint. The disadvantage of this conversion is that when the search is
inside a feasible region, some satisfied constraints may still carry a
nonzero weight in the Lagrangian function, leading to possible oscillations
and divergence when a local optimum lies on the boundary of a feasible
region. We propose a new conversion method called the MaxQ method such
that all satisfied constraints in a feasible region always carry zero
weight in the Lagrange function; hence, minimizing the Lagrange function in
a feasible region always leads to local minima of the objective function.
We demonstrate that oscillations do not happen in our method. We also
propose methods to speed up convergence when a local optimum lies on the
boundary of a feasible region. Finally, we show improved experimental
results in applying our proposed method in Novel on some existing benchmark
problems and compare them to those obtained by applying the method based on
slack variables.
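The contrast between the two conversions can be illustrated with a short sketch, assuming a single inequality constraint g(x) <= 0; the exponent in the MaxQ term and the example values are illustrative.

```python
def slack_transform(g_val, z):
    # Slack-variable conversion: g(x) <= 0 becomes the equality g(x) + z**2 = 0,
    # a term that still appears in the Lagrangian even when g(x) < 0.
    return g_val + z ** 2

def maxq_transform(g_val, q=2):
    # MaxQ conversion: max(0, g(x))**q is identically zero whenever g(x) <= 0,
    # so satisfied constraints carry zero weight inside the feasible region.
    return max(0.0, g_val) ** q

for g in (-1.5, -0.2, 0.0, 0.4):
    print(f"g={g:5.2f}  slack(z=0)={slack_transform(g, 0.0):6.2f}  maxq={maxq_transform(g):5.2f}")
```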
Lagrangian methods are popular in solving continuous constrained optimization
problems. In this paper, we address three important issues in applying
Lagrangian methods to solve optimization problems with inequality constraints.
First, we study methods to transform inequality constraints into equality constraints.
An existing method, called the slack-variable method, adds a slack variable to each
inequality constraint in order to transform it into an equality constraint.
Its disadvantage is that when the search trajectory is inside a feasible region,
some satisfied constraints may still exert an effect on the Lagrangian function,
leading to possible oscillations and divergence when a local minimum lies on the
boundary of the feasible region. To overcome this problem, we propose the
MaxQ method, in which satisfied constraints have no effect.
Hence, minimizing the Lagrangian function in a feasible region always
leads to a local minimum of the objective function. We also study some
strategies to speed up its convergence.
Second, we study methods to improve the convergence speed
of Lagrangian methods without affecting the solution quality.
This is done by an adaptive-control strategy that dynamically adjusts the
relative weights between the objective and the Lagrangian part, leading to
better balance between the two and faster convergence.
Third, we study a trace-based method to pull the search trajectory from
one saddle point to another in a continuous fashion without restarts.
This overcomes a problem in existing Lagrangian methods, which converge
to only one saddle point and require random restarts to look for new saddle points,
often missing good saddle points in the vicinity of those already found.
Finally, we describe a prototype Novel (Nonlinear Optimization via External
Lead) that implements our proposed strategies and present improved solutions in
solving a collection of benchmarks.
In this paper, we define the generalization problem, summarize
various approaches in generalization, identify the credit assignment
problem, and present the problem and some solutions in measuring
generalizability. We discuss anomalies in the ordering of hypotheses
in a subdomain when performance is normalized and averaged, and show
conditions under which anomalies can be eliminated. To generalize performance
across subdomains, we present a measure called probability of win
that measures the probability that one hypothesis is better than another.
Finally, we discuss some limitations in using probabilities of win and
illustrate their application in finding new parameter values for
TimberWolf, a package for VLSI cell placement and routing.
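A rough sketch of a probability-of-win style comparison follows; the normal approximation on paired differences is an illustrative stand-in and not necessarily the exact statistic used in the paper.

```python
import statistics
from math import erf, sqrt

def probability_of_win(scores_a, scores_b):
    """Rough estimate of P(heuristic A is better than B) from paired scores on
    the same test cases; uses a normal approximation on the mean paired
    difference (an illustrative stand-in, not necessarily the paper's formula)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / sqrt(len(diffs))
    if sem == 0:
        return 1.0 if mean > 0 else 0.0
    z = mean / sem
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Speedups of a new heuristic vs. the incumbent on ten test cases.
new = [1.20, 0.95, 1.40, 1.10, 1.05, 1.30, 0.90, 1.25, 1.15, 1.08]
old = [1.00] * 10
print(round(probability_of_win(new, old), 3))
```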
In this paper, we present a new discrete Lagrangian method
for designing multiplierless QMF (quadrature mirror filter) banks.
The filter coefficients in these filter banks are in powers-of-two
(PO2), where numbers are represented as sums or differences of powers of two
(also called Canonical Signed Digit--CSD--representation), and multiplications
are carried out as additions, subtractions and shifts.
We formulate the design problem as a nonlinear discrete constrained
optimization problem, using reconstruction error as the objective,
and stopband and passband energies, stopband and passband ripples
and transition bandwidth as constraints.
Using the performance of the best existing designs as constraints,
we search for designs that improve over the best existing designs
with respect to all the performance metrics.
We propose a new discrete Lagrangian method for finding good designs,
and study methods to improve the convergence speed of Lagrangian
methods without affecting their solution quality.
This is done by adjusting dynamically the relative weights between
the objective and the Lagrangian part.
We show that our method can find designs that improve over
Johnston's benchmark designs using a maximum of three to six ONE
bits in each filter coefficient instead of using floating-point representations.
Our approach is general and is applicable to
the design of other types of multiplierless filter banks.
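As a small illustration of the PO2 (CSD-like) representation, the sketch below greedily approximates a coefficient with a few signed powers of two and shows how multiplication then reduces to shifts and adds; the greedy quantizer is illustrative and is not the constrained search described above.

```python
import math

def to_po2_terms(coeff, max_terms=3, min_exp=-10):
    """Greedy approximation of `coeff` as a signed sum of powers of two
    (a CSD-like form with at most `max_terms` nonzero digits)."""
    terms, residual = [], coeff
    for _ in range(max_terms):
        if residual == 0:
            break
        exp = max(min_exp, min(0, round(math.log2(abs(residual)))))
        sign = 1 if residual > 0 else -1
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return terms

def multiply_by_po2(x, terms):
    # With PO2 coefficients, multiplication reduces to shifts and adds/subtracts.
    return sum(sign * x * 2.0 ** exp for sign, exp in terms)

terms = to_po2_terms(0.483)
print(terms, multiply_by_po2(1.0, terms))   # approximate value of the coefficient
```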
In this paper, we present a new search method based on the theory of discrete Lagrange multipliers
for designing multiplierless PR (perfect reconstruction) LP (linear phase) filter banks.
To satisfy the PR constraints, we choose a lattice structure that, under
certain conditions, can guarantee the resulting two filters to be a PR pair.
Unlike the design of multiplierless QMF filter banks that represents
filter coefficients directly using PO2 (powers-of-two) form (also called
Canonical Signed Digit or CSD representation), we use PO2 forms to represent
the parameters associated with the lattice structure.
By representing these parameters as sums or differences of powers of two,
multiplications can be carried out as additions, subtractions, and shifts.
Using the lattice representation, we decompose the
design problem into a sequence of four subproblems.
The first two subproblems find a good starting point with continuous
parameters using a single-objective, multi-constraint formulation.
The last two subproblems first transform the continuous solution found by the
second subproblem into a PO2 form, then search for a design in a mixed-integer space.
We propose a new search method based on the theory of discrete
Lagrange multipliers for finding good designs, and study methods to improve
its convergence speed by adjusting dynamically the relative weights between
the objective and the Lagrangian part.
We show that our method can find good designs using at most four terms in PO2 form
in each lattice parameter.
Our approach is unique because our results are the first successful designs of
multiplierless PR-LP filter banks. It is general because it is applicable to the
design of other types of multiplierless filter banks.
In this paper, we explore the loss behavior encountered in
transmitting real-time voice over the Internet and propose a
new loss-concealment scheme to improve its received quality.
One known technique to conceal loss is to send interleaved
streams of voice samples and reconstruct missing or late
samples by interpolation at the receiver. Based on this
method, we propose a new transformation-based reconstruction
algorithm. Its basic idea is for the sender to transform an
input voice stream, according to the interpolation method
used at the receiver and the predicted loss behavior, before
interleaving the stream. The transformation is derived by
minimizing reconstruction error in case of loss. We show
that our method is computationally efficient and can be
extended to various interleaving factors and interpolation-
based reconstruction methods. Finally, we show performance
improvements of our method by testing it over the Internet.
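For reference, the sketch below shows the baseline interleaving-and-interpolation scheme that the transformation builds on: the sender splits the stream into interleaved packets, and the receiver rebuilds samples from a lost packet by interpolating surviving neighbors. The sender-side transformation itself is not included, and the packetization details are illustrative.

```python
def interleave(samples, factor=2):
    # Split the stream into `factor` packets, each holding every factor-th sample.
    return [samples[i::factor] for i in range(factor)]

def reconstruct(packets, lost, factor=2):
    # Re-merge the packets; samples from lost packets are interpolated from
    # their surviving neighbors (simple averaging at the stream boundaries).
    n = sum(len(p) for p in packets)
    out = [None] * n
    for i, p in enumerate(packets):
        if i not in lost:
            out[i::factor] = p
    for j, v in enumerate(out):
        if v is None:
            left = out[j - 1] if j > 0 and out[j - 1] is not None else None
            right = out[j + 1] if j + 1 < n and out[j + 1] is not None else None
            known = [x for x in (left, right) if x is not None]
            out[j] = sum(known) / len(known) if known else 0.0
    return out

voice = [0, 10, 20, 30, 40, 50, 60, 70]
packets = interleave(voice)            # packet 0: even samples, packet 1: odd samples
print(reconstruct(packets, lost={1}))  # odd samples rebuilt by interpolation
```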
Nonlinear constrained optimization problems in discrete and continuous
spaces are an important class of problems studied extensively in operations
research and artificial intelligence.
These problems can be solved by a Lagrange-multiplier method in continuous
space and by an extended discrete Lagrange-multiplier method in discrete space.
When constraints are satisfied, these methods rely on gradient
descents in the objective space to find high-quality solutions.
On the other hand, when constraints are violated, these methods rely on
gradient ascents in the Lagrange-multiplier space in order to increase the
penalties on unsatisfied constraints and to force these constraints into satisfaction.
The balance between gradient descents and gradient ascents depends on the
relative weights between the objective function and the constraints, which indirectly
control the convergence speed and solution quality of the Lagrangian method.
To improve convergence speed without degrading solution quality, we
propose an algorithm to dynamically control the relative weights between the
objective and the constraints. Starting from an initial weight, the algorithm
automatically adjusts the weights based on the behavior of the search progress.
With this strategy, we are able to eliminate divergence, reduce oscillations,
and speed up convergence. We show improved convergence behavior of our proposed
algorithm on both nonlinear continuous and discrete problems.
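A toy sketch of the dynamic weight-control idea follows, on a one-dimensional problem with a single equality constraint; the update thresholds and factors are illustrative assumptions, not the adaptive-control rules of the paper.

```python
def adaptive_weight_descent(f, g, x0, steps=3000, lr=0.01):
    """Minimize f(x) subject to g(x) = 0 using L(x) = w*f(x) + lam*g(x)**2.
    The relative weight w and the multiplier lam are adjusted from the observed
    behavior of the constraint violation; the rules below are illustrative."""
    x, lam, w = x0, 1.0, 1.0
    prev_violation = abs(g(x))
    eps = 1e-5
    for i in range(steps):
        L = lambda z: w * f(z) + lam * g(z) ** 2
        x -= lr * (L(x + eps) - L(x - eps)) / (2.0 * eps)   # descent in x
        if i % 50 == 49:
            violation = abs(g(x))
            if violation > 1e-6 and violation > 0.9 * prev_violation:
                w *= 0.8                      # progress stalls: emphasize the constraint
            elif violation <= 1e-6:
                w = min(1.0, w * 1.1)         # feasible: restore weight on the objective
            lam += violation                  # ascent on the multiplier of the violated constraint
            prev_violation = violation
    return x

# Toy problem: minimize (x - 3)**2 subject to x = 1; the iterate moves toward x = 1.
print(round(adaptive_weight_descent(lambda x: (x - 3) ** 2, lambda x: x - 1, x0=0.0), 3))
```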
This paper studies various strategies in constrained simulated annealing (CSA),
a global optimization algorithm that achieves asymptotic convergence to
constrained global minima (CGM) with probability one for solving discrete
constrained nonlinear programming problems (NLPs).
The algorithm is based on the necessary and sufficient condition for discrete
constrained local minima (CLM) in the theory of discrete Lagrange multipliers
and its extensions to continuous and mixed-integer constrained NLPs.
The strategies studied include adaptive neighborhoods, distributions to
control sampling, acceptance probabilities, and cooling schedules.
We report much better solutions than the best-known solutions in the literature
on two sets of continuous benchmarks and their discretized versions.
In this paper we study the solution of SAT problems formulated as
discrete decision and discrete constrained optimization problems.
Constrained formulations are better than traditional unconstrained formulations because violated
constraints may provide additional forces to lead a search towards a satisfiable assignment.
We summarize the theory of extended saddle points in penalty formulations for solving discrete
constrained optimization problems and the associated discrete penalty method (DPM).
We then examine various formulations of the objective function, choices of neighborhood in
DPM, strategies for updating penalties, and heuristics for avoiding traps.
Experimental evaluations on hard benchmark instances show that traps contribute
significantly to the inefficiency of DPM and force a trajectory to repeatedly visit the
same set of points, or nearby points, in the original variable space.
To address this issue, we propose and study two trap-avoidance strategies.
The first strategy adds extra penalties on unsatisfied clauses inside a trap,
leading to very large penalties for unsatisfied clauses that are trapped more
often and making these clauses more likely to be satisfied in the future.
The second strategy stores information on points visited
before, whether inside traps or not, and avoids visiting points that are close to points visited before.
It can be implemented by modifying the penalty function in such a way that, if a trajectory gets close
to points visited before, an extra penalty will take effect and force the trajectory to a new region.
It specializes to the first strategy because traps are special cases of points visited before.
Finally, we show experimental results on evaluating benchmarks in the DIMACS and SATLIB
archives and compare our results with existing results on GSAT, WalkSAT, LSDL, and Grasp. The results
demonstrate that DPM with trap avoidance is robust as well as effective for solving hard SAT problems.
Packet and compression losses are two sources of quality losses when streaming
compressed video over unreliable IP networks, such as the Internet.
In this paper, we propose two new approaches for concealing such losses.
First, we present a joint sender-receiver approach for designing transforms in multi-description
coding (MDC).
In the receiver, we use a simple interpolation-based reconstruction algorithm,
as sophisticated concealment techniques cannot be employed in real time.
In the sender, we design an optimized reconstruction-based discrete cosine transform
(ORB-DCT) with an objective of minimizing the mean squared error, assuming that some of the
descriptions are lost and that the missing information is reconstructed by simple averaging
at the destination. Second, we propose an artificial neural network
to compensate for compression losses introduced in MDC.
Experimental results show that our proposed algorithms perform well in real Internet tests.
Time-series predictions by artificial neural networks (ANNs) are traditionally
formulated as unconstrained optimization problems. As an unconstrained
formulation provides little guidance on search directions when a search
gets stuck in a poor local minimum, we have proposed to use a constrained
formulation in order to use constraint violations to provide additional
guidance. In this paper, we formulate ANN learning with cross-validations
for time-series predictions as a non-differentiable nonlinear constrained
optimization problem. Based on our theory of Lagrange multipliers for discrete
constrained optimization, we propose an efficient learning algorithm, called
violation guided back-propagation (VGBP), that computes an approximate
gradient using back-propagation (BP), that introduces annealing to avoid
blind acceptance of trial points, and that applies a relax-and-tighten
(R&T) strategy to achieve faster convergence. Extensive experimental
results on well-known benchmarks, when compared to previous work, show
one to two orders-of-magnitude improvement in prediction quality, while
using fewer weights.
In this paper, we present a new approach for designing filter banks for image compression.
This approach has two major components: optimization and generalization.
In the optimization phase, we formulate the design problem as a nonlinear optimization
problem whose objective consists of both the performance metrics of the image coder,
such as the peak signal-to-noise ratio (PSNR), and those of individual filters.
Filter banks are optimized in the optimization phase based on a set of training images.
In the generalization phase, the filter bank that can be generalized to other images
is selected from the candidates obtained in the optimization phase to be the final result.
The filter bank selected should perform well not only on the training examples used in
the design process but also on test cases not seen.
In contrast to existing methods that design filter banks independently from the other
operations in an image compression algorithm, our approach allows us to find filter
banks that work best in a specific image compression algorithm for a certain class of images.
In system prototype development, we adopt the agent-based approach to achieve better modularity,
portability, and scalability. Agents in the multi-agent system are specialized in performing
problem formulation, image compression, optimization, and generalization. In the experiments,
we show that on a set of benchmark images our approach has found filter banks that perform
better than the existing filter banks in different image compression algorithms and at different
compression ratios.
Quality-delay trade-offs can be made in transmitting subband-coded
images in the Internet by using either the TCP or the UDP protocol.
Delivery by TCP gives superior decoding quality but with very long
delays when the network is unreliable, whereas delivery by UDP has
negligible delays but with degraded quality when packets are lost.
Although images are delivered primarily by TCP today, we study in this paper the use
of UDP to deliver multi-description reconstruction-based subband-coded images and the
reconstruction of missing information at the receiver based on information received.
We first determine empirically the interleaving factors that should be used in
order to keep the probability of unrecoverable packet losses sufficiently small.
Next, we propose a joint sender-receiver approach for designing
transforms in multi-description subband coding.
In the receiver, we use a simple interpolation-based reconstruction algorithm,
as sophisticated concealment techniques cannot be employed in practice.
In the sender, we design an optimized reconstruction-based subband
transform (ORB-ST), with an objective of minimizing the mean squared error,
assuming that some of the descriptions are lost and that the missing
information is reconstructed by simple averaging at the destination.
Experimental results show that our proposed ORB-ST performs well in real Internet
tests, and UDP delivery of MDC images is an attractive alternative to TCP delivery.
A fundamental issue in real-time interactive voice transmissions over
unreliable IP networks is the loss or late arrival of packets for playback.
This problem is especially serious when transmitting low bit rate-coded
speech, in which pervasive dependencies are introduced between frames.
In this case, the loss or late arrival of a single packet
will lead to the loss of subsequent dependent frames.
In this paper, we study end-to-end loss-concealment
schemes for ensuring high quality in playback.
We propose a novel multiple description-coding method for
concealing packet losses in transmitting low bit rate-coded speech.
Based on high correlations observed in linear predictor parameters--in the form
of Line Spectral Pairs (LSPs)--of adjacent frames, we generate
multiple descriptions in senders by interleaving LSPs, and
reconstruct lost LSPs in receivers by linear interpolations.
As excitation codewords have low correlations, we further enlarge the
segment size for excitation generation and replicate excitation codewords
in all the descriptions in order to maintain the same transmission bandwidth.
Our proposed scheme can be extended easily to more than two descriptions
and can adapt its number of descriptions dynamically to network-loss conditions.
Experimental results on FS-1016 CELP, ITU G.723.1, and
FS MELP coders show good performance of our scheme.
At the 2001 IEEE International Conference on Data Mining in San Jose, California on
November 29 - December 2, 2001, there was a panel discussion on how data mining research
meets practical development. One of the motivations for organizing the panel discussion
was to provide useful advice for industrial people to explore their directions in data
mining development. Based on the panel discussion, this paper presents the views and
arguments from the panel members, the Conference Chair and the Program Committee Co-Chairs.
These people as a group have both academic and industrial experiences in different data
mining-related areas such as databases, machine learning, and neural networks. We will
answer questions such as: (1) how far is data mining from practical development, (2) how data mining
research differs from practical development, and (3) what are the most promising areas in
data mining for practical development?
This paper presents a procedural framework that unifies various
mechanisms to look for discrete-neighborhood saddle points in
solving discrete constrained optimization problems (DCOPs).
Our approach is based on the necessary and sufficient condition on local
optimality in discrete space, which shows the one-to-one correspondence between the discrete-space
constrained local minima of a problem and the saddle points of the corresponding Lagrangian function.
To look for such saddle points, we study various mechanisms for performing ascents of the Lagrangian
function in the original-variable subspace and descents in the Lagrange-multiplier subspace.
Our results show that CSAEA, a combined constrained simulated
annealing and evolutionary algorithm, performs well when using
mutations and crossovers to generate trial points and accepting them
based on the Metropolis probability. We apply iterative deepening to
determine the optimal number of generations in CSAEA and show that
its performance is robust with respect to changes in population
size. To test the performance of the procedures developed, we apply them
to solve some continuous and mixed-integer nonlinear programming (NLP)
benchmarks and show that they obtain better results
than those of existing algorithms.
In this paper, we study the partitioning of constraints in temporal planning
problems formulated as mixed-integer nonlinear programming (MINLP)
problems. Constraint partitioning is attractive because it leads to much
easier subproblems, where each is a significant relaxation of the original
problem. Moreover, each subproblem is very similar to the original problem
and can be solved by any existing solver with little or no modification.
Constraint partitioning, however, introduces global constraints that may be
violated when subproblems are evaluated independently. To reduce the overhead
in resolving such global constraints, we develop in this paper new
conditions and algorithms for limiting the search space to be backtracked
in each subproblem. Using a penalty formulation of a MINLP where the constraint
functions of the MINLP are transformed into non-negative functions,
we present a necessary and sufficient extended saddle-point condition (ESPC)
for constrained local minimization. When the penalties are larger than
some thresholds, our theory shows a one-to-one correspondence between a
constrained local minimum of the MINLP and an extended saddle point of the
penalty function. Hence, one way to find a constrained local minimum is to
increase gradually the penalties of those violated constraints and to look
for a local minimum of the penalty function using any existing algorithm
until a solution to the constrained model is found. Next, we extend the
ESPC to constraint-partitioned MINLPs and propose a partition-and-resolve
strategy for resolving violated global constraints across subproblems. Using
the discrete-space ASPEN and the mixed-space MIPS planners to solve
subproblems, we show significant improvements on some planning benchmarks,
both in terms of the quality of the plans generated and the execution times
to find them.
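A toy sketch of the partition-and-resolve idea follows, with two integer subproblems coupled by one global constraint; the subproblem "solver" and the penalty update are illustrative stand-ins for the planners and the ESPC-based updates used in the paper.

```python
def solve_subproblem(domain, local_cost, penalty_terms):
    # Stand-in for an existing solver: choose the value minimizing the local cost
    # plus the weighted violations of the global constraints it participates in.
    return min(domain, key=lambda v: local_cost(v) + sum(mu * h(v) for mu, h in penalty_terms))

def partition_and_resolve(max_iters=50):
    domain = range(0, 11)
    cost1 = lambda x: (x - 2) ** 2                 # local objective of subproblem 1
    cost2 = lambda y: (y - 3) ** 2                 # local objective of subproblem 2
    violation = lambda x, y: max(0, 9 - (x + y))   # global constraint: x + y >= 9
    mu, x, y = 0.0, 0, 0
    for _ in range(max_iters):
        x = solve_subproblem(domain, cost1, [(mu, lambda v: max(0, 9 - (v + y)))])
        y = solve_subproblem(domain, cost2, [(mu, lambda v: max(0, 9 - (x + v)))])
        if violation(x, y) == 0:
            return x, y                            # the global constraint is resolved
        mu += 1.0                                  # raise its penalty and re-solve the parts
    return x, y

print(partition_and_resolve())                     # a pair with x + y >= 9 and low local costs
```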
``Data Mining: How Research Meets Practical Development,''
Knowledge and Information Systems: An International Journal,
Springer-Verlag, vol. 5, no. 2, April 2003, pp. 248-261.
We study in this paper the partitioning of the constraints of a temporal planning
problem by subgoals, their sequential evaluation before parallelizing the actions,
and the resolution of inconsistent global constraints across subgoals.
Using a Lagrangian formulation and the theory of extended saddle points,
we propose a global-search strategy that looks for local minima in the
original-variable space of the Lagrangian function and for local maxima in the
Lagrange-multiplier space. Our approach improves over a previous scheme that
partitions constraints along the temporal horizon. The previous scheme leads to many global
constraints that relate states in adjacent stages, which means that an incorrect
assignment of states in an earlier stage of the horizon may violate a global constraint
in a later stage of the horizon. To resolve the violated global constraint in
this case, state changes will need to propagate sequentially through multiple stages, often
leading to a search that gets stuck at an infeasible point for an extended period of time.
In this paper, we propose to partition all the constraints by subgoals and to add new global constraints in
order to ensure that state assignments of a subgoal are consistent with those in other subgoals.
Such an approach allows the information on incorrect state assignments in one subgoal to propagate quickly
to other subgoals. Using MIPS as the basic planner in a partitioned implementation, we demonstrate significant
improvements in time and quality in solving some PDDL2.1 benchmark problems.
``Subgoal Partitioning and Global Search for Solving Temporal Planning Problems in Mixed Space,''
Int'l J. of Artificial Intelligence Tools, World Scientific Publishing Co. Pte. Ltd.,
vol. 13, no. 4, Dec. 2004, pp. 767-790.
For packet video, information loss and bandwidth limitation are two factors that affect video playback quality.
Traditional rate allocation approaches have focused on optimizing video quality under bandwidth constraint alone.
However, in the best-effort Internet, packets carrying video data are susceptible to losses, which need to be
reconstructed at the receiver side. In this paper, we propose loss-aware rate allocations at both the group-of-block
(GOB) and macroblock levels, given that certain packets are lost during transmission and reconstructed
using simple interpolation methods at the receiver side. Experimental results show that our proposed algorithms
can produce videos of higher quality when sent over the lossy Internet.
``Loss Aware Rate Allocations in H.263 Coded Video Transmissions,''
Journal of Circuits, Systems, and Computers, World Scientific Publishing Co. Pte. Ltd.,
vol. 14, no. 6, Dec. 2005, pp. 1157-1171.
Real-time surveillance systems, telecommunication systems, and other dynamic environments often
generate tremendous (potentially infinite) volumes of stream data: the volume is too huge to be scanned multiple
times. Much of such data resides at a rather low level of abstraction, whereas most analysts are interested in relatively
high-level dynamic changes (such as trends and outliers). To discover such high-level characteristics, one may need
to perform on-line multi-level, multi-dimensional analytical processing of stream data. In this paper, we propose
an architecture, called stream cube, to facilitate on-line, multi-dimensional, multi-level analysis of stream data.
For fast online multi-dimensional analysis of stream data, three important techniques are proposed for efficient
and effective computation of stream cubes. First, a tilted time frame model is proposed as a multi-resolution model
to register time-related data: the more recent data are registered at finer resolution, whereas the more distant data
are registered at coarser resolution. This design reduces the overall storage of time-related data and adapts nicely
to the data analysis tasks commonly encountered in practice. Second, instead of materializing cuboids at all levels,
we propose to maintain a small number of critical layers. Flexible analysis can be efficiently performed based on
the concept of observation layer and minimal interesting layer. Third, an efficient stream data cubing algorithm
is developed which computes only the layers (cuboids) along a popular path and leaves the other cuboids for
query-driven, on-line computation. Based on this design methodology, stream data cube can be constructed and
maintained incrementally with a reasonable amount of memory, computation cost, and query response time. This
is verified by our substantial performance study.
The stream data cube architecture facilitates online analytical processing of stream data. It also forms a preliminary
data structure for online stream data mining. The impact of the design and implementation of the stream data cube in
the context of stream data mining is also discussed in the paper.
``The Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams,''
Distributed and Parallel Databases: Special Issue on Data Warehousing and Data Streams,
Springer-Verlag, Sept. 2005.
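As a concrete illustration of the tilted time frame idea above (recent data kept at fine resolution, older data rolled up into coarser windows), here is a toy sketch; the window capacities and the sum aggregate are illustrative assumptions only, not the paper's design.

    from collections import deque

    class TiltedTimeFrame:
        # Toy tilted time frame: level 0 holds the most recent measurements at
        # the finest resolution; when a level fills up, its contents are rolled
        # up into a single coarser aggregate at the next level.
        def __init__(self, capacities=(4, 24, 31, 12)):   # e.g., quarter/hour/day/month
            self.levels = [deque(maxlen=c) for c in capacities]

        def register(self, value):
            self._push(0, value)

        def _push(self, level, value):
            q = self.levels[level]
            if len(q) == q.maxlen and level + 1 < len(self.levels):
                self._push(level + 1, sum(q))   # roll up the full level
                q.clear()
            q.append(value)

Registering a stream of measurements (for v in stream: ttf.register(v)) then retains only on the order of the summed capacities instead of the full history.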
In this paper, we present the partitioning of mutual-exclusion (mutex) constraints in
temporal planning problems and its implementation in the SGPlan planner. Based on the strong
locality of mutex constraints observed in many benchmarks of the Fourth International Planning Competition (IPC4),
we propose to partition the constraints of a planning problem into groups based on their subgoals.
Constraint partitioning leads to significantly easier subproblems that are similar to the original problem and
that can be efficiently solved by the same planner with some modifications to its objective function.
We present a partition-and-resolve strategy that looks for locally optimal subplans in constraint-partitioned
temporal planning subproblems and that resolves those inconsistent global constraints across the subproblems.
We also discuss some implementation details of SGPlan, which include the resolution of
violated global constraints, techniques for handling producible resources, landmark analysis,
path finding and optimization, search-space reduction, and modifications of Metric-FF when
used as a basic planner in SGPlan. Last, we show results on the sensitivity of each
of these techniques in quality-time trade-offs and experimentally demonstrate that SGPlan
is effective for solving the IPC3 and IPC4 benchmarks.
``Temporal Planning using Subgoal Partitioning and Resolution in SGPlan,''
J. of Artificial Intelligence Research,
AAAI Press, vol. 26, Aug. 2006, pp. 323-369.
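The partition-and-resolve strategy described above can be summarized by the following sketch. It is a schematic outline rather than SGPlan's actual code; solve_local (the basic planner applied to one subproblem with a penalty-modified objective) and the violation method of each global constraint are hypothetical interfaces.

    def partition_and_resolve(subproblems, global_constraints, solve_local,
                              max_rounds=50):
        # Iteratively solve each constraint-partitioned subproblem, then raise
        # the weights of violated global constraints that couple the subproblems.
        penalties = {g: 0.0 for g in global_constraints}
        plans = {}
        for _ in range(max_rounds):
            for sub in subproblems:
                plans[sub] = solve_local(sub, penalties)   # locally optimal subplan
            violated = [g for g in global_constraints
                        if g.violation(plans) > 0]
            if not violated:
                return plans                               # all global constraints resolved
            for g in violated:
                penalties[g] += g.violation(plans)         # re-weight and iterate
        return None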
A temporal flexible planning problem that involves contingent and
requirement events can be formulated as a simple temporal network
with uncertainty (STNU). An STNU is controllable when there
is a strategy for executing the requirement events (or actions) in
such a way that all the conditions involving contingent events
can be satisfied in all situations. The most interesting and
useful controllability property is dynamic controllability in
which the remaining actions in an STNU can always be scheduled
under all possible feasible durations of future contingent events when all the
past contingent events are known. In this paper, we propose and study a
novel problem of assigning bounds on the duration of each requirement
link in order for the resulting STNU to be dynamically controllable and
to minimize the total cost over the allowed durations of all requirement links.
We first prove the NP-hardness of the problem with a
linear cost function. We then formulate the dynamic
controllability of an STNU as the constraints in a nonlinear
optimization problem. Finally, we present methods for reducing the
number of constraints in order to make the problem tractable, and
we demonstrate the computational performance of our methods.
``Optimization of Bounds in Temporal Flexible Plans with Dynamic Controllability,''
Int'l J. of Artificial Intelligence Tools, World Scientific Publishing Co. Pte. Ltd.,
Feb. 2007, vol. 16, no. 1, pp. 17-44.
As OLAP engines are widely used nowadays to support multidimensional data analysis and
intelligent decision making, it is desirable to support in data cubes advanced statistical measures,
such as regression and filtering measures, in addition to the traditional simple measures such as
count and average. Such advanced measures will allow users to model, smooth, and predict the
trends and patterns of data. Although raw data is generally collected at the lowest level, an analyst
is often more interested in generating higher and multiple levels of abstraction without accessing
the raw data. Efficient computation of aggregated statistical measures in multidimensional spaces
is a challenging problem as existing aggregation algorithms for simple distributive and algebraic
measures are inadequate for such computation.
``Regression Cubes with Lossless Compression and Aggregation,''
IEEE Trans. on Knowledge and Data Engineering,
vol. 18, no. 12, Dec. 2006, pp. 1585-1599.
In this paper, we study strategies in incremental planning for ordering and grouping subproblems partitioned
by the subgoals of a planning problem. To generate a rich set of partial orders for ordering subproblems,
we propose an algorithm based on a relaxed plan that ignores the delete lists. The new algorithm considers
both the initial and the goal states and can effectively order subgoals in a way that greatly reduces the
number of invalidations during incremental planning. We have also considered trade-offs between the granularity
of the subgoal sets and the complexity of solving the overall planning problem. We propose an efficient strategy
for dynamically adjusting the grain size in partitioning in order to minimize the total complexity.
We further evaluate a redundant-ordering scheme that uses two different subgoal orders to improve the solution
quality, without greatly sacrificing run-time efficiency. Experimental results on using Metric-FF, YAHSP, and
LPG-TD-speed as the embedded planners in incremental planning show that our strategies are general for improving
the time and quality of these planners across various benchmarks. Finally, we compare the performance of the
three planners, the incremental versions using these planners as embedded planners, and SGPlan4.1.
``Subgoal Ordering and Granularity Control for Incremental Planning,''
Int'l J. of Artificial Intelligence Tools, World Scientific Publishing Co. Pte. Ltd.,
vol. 16, no. 4, Aug. 2007, pp. 707-723.
Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have
developed two major approaches, top-down versus bottom-up. The former, represented by the MultiWay Array Cube (called the
MultiWay) algorithm, aggregates simultaneously on multiple dimensions; however, it cannot take advantage of a priori pruning
when computing iceberg cubes (cubes that contain only aggregate cells whose measure values satisfy a threshold, called the
iceberg condition). The latter, represented by BUC, computes the iceberg cube bottom-up and facilitates a priori pruning.
BUC explores fast sorting and partitioning techniques; however, it does not fully explore multidimensional simultaneous
aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous two
algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure,
extends the simultaneous aggregation methods, and enables the pruning of the group-bys that do not satisfy the
iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms the previous methods.
IEEE Trans. on Knowledge and Data Engineering,
vol. 19, no. 1, Jan. 2007, pp. 111-126.
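For readers unfamiliar with iceberg cubes, the sketch below shows the iceberg condition itself on a tiny example. It is a naive enumeration, not Star-Cubing: only group-by cells whose count meets a minimum support are materialized, which is what makes a priori pruning of failing cells' descendants possible.

    from collections import Counter
    from itertools import combinations

    def iceberg_cube_counts(rows, dims, min_sup):
        # rows: list of dicts mapping dimension name -> value.
        # Materialize only the cells whose count satisfies the iceberg condition.
        cube = {}
        for k in range(1, len(dims) + 1):
            for group_by in combinations(dims, k):
                counts = Counter(tuple(row[d] for d in group_by) for row in rows)
                cells = {cell: c for cell, c in counts.items() if c >= min_sup}
                if cells:
                    cube[group_by] = cells
        return cube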
In this paper, we develop a new constrained artificial-neural-network (ANN) formulation and the
associated learning algorithm for predicting stock prices, a difficult time-series prediction problem.
We characterize daily stock prices as a noisy non-stationary time
series and identify its predictable low-frequency components.
Using a recurrent FIR (finite-impulse-response) ANN, we formulate the learning problem
as a constrained optimization problem, develop constraints for incorporating
cross validations, and solve the learning problem using algorithms based on the
theory of extended saddle points for nonlinear constrained optimization.
Finally, we illustrate our prediction results on ten stock-price time series.
Our main contributions in this paper are the channel-specific low-pass filtering of noisy time series
obtained by wavelet decomposition, the transformation of the low-pass signals to improve their stationarity,
and the incorporation of constraints on cross validation that can improve the accuracy of predictions.
Our experimental results demonstrate good prediction accuracy and annual returns.
``Constrained Formulations and Algorithms for Predicting Stock Prices by Recurrent FIR Neural Networks,''
Int'l J. of Information Technology and Decision Making, World Scientific Pub. Co.
vol. 5, no. 4, Dec. 2006, pp. 638-658.
In this paper, we present constrained simulated annealing (CSA), an algorithm that extends
conventional simulated annealing to look for constrained local minima of nonlinear constrained optimization problems.
The algorithm is based on the theory of extended saddle points (ESPs) that shows the one-to-one
correspondence between a constrained local minimum and an ESP of the corresponding penalty function.
CSA finds ESPs by systematically controlling probabilistic descents in the
problem-variable subspace of the penalty function and probabilistic ascents in the penalty subspace.
Based on the decomposition of the necessary and sufficient ESP condition
into multiple necessary conditions, we present constraint-partitioned simulated annealing
(CPSA) that exploits the locality of constraints in nonlinear optimization problems.
CPSA leads to much lower complexity as compared to that of CSA by partitioning the constraints of a problem into
significantly simpler subproblems, solving each independently, and resolving those violated global constraints
across the subproblems.
We prove that both CSA and CPSA asymptotically converge to a constrained global minimum with probability one
in discrete optimization problems. The result extends conventional simulated annealing (SA), which guarantees asymptotic
convergence in discrete unconstrained optimization, to that in discrete constrained optimization.
Moreover, it establishes the condition under which optimal solutions can
be found in constraint-partitioned nonlinear optimization problems.
Finally, we evaluate CSA and CPSA by applying them to solve some continuous constrained
optimization benchmarks and compare their performance to that of other penalty methods.
``Simulated Annealing with Asymptotic Convergence for Nonlinear Constrained Optimization,''
Journal of Global Optimization, Springer-Verlag,
vol. 39, pp. 1-37, 2007.
This paper presents a general method for comparing the conversational quality of speech communication
systems over networks with delays and its application in studying four VoIP systems.
The perceptual evaluation of an interactive conversation depends on the quality and the latency of
the one-way speech segments received. Due to path-dependent and non-stationary conditions in the Internet,
the delays incurred when a conversation switches from one client to another are asymmetric and may lead
to a degraded perceptual quality as compared to that of a face-to-face setting. The conversational dynamics
can further be disrupted by variations in the one-way speech quality and latency. As the factors that
affect conversational quality may counteract one another, trade-offs must be made among them.
Currently available objective and subjective standards, however, do not adequately capture these
trade-offs, and there are no good methods for evaluating the conversational quality of VoIP systems.
In this paper, we present a method for studying these trade-offs under repeatable network and conversational
conditions. When the method is applied to four commonly used VoIP systems, our results show that each achieves some trade-offs,
but none attains the best quality under all conditions. Lastly, we discuss a systematic approach for
predicting user preference between two VoIP systems in terms of objective measures that can be easily acquired.
``Analyzing Voice Quality in Popular VoIP Applications,''
IEEE Multimedia Magazine, vol. 16, Jan-Mar 2009, pp. 46-58.
This paper describes our work on real-time two-party and multi-party VoIP
(voice-over-IP) systems that can achieve high perceptual conversational quality.
It focuses on the fundamental understanding of conversational quality and its trade-offs among the
design of speech codecs and strategies for network control, playout scheduling, and loss concealments.
We have studied three key aspects that address the limitations of existing work and improve the perceptual quality of VoIP systems.
Firstly, we have developed a statistical approach based on just-noticeable difference (JND) to significantly reduce the large number
of subjective tests, as well as a classification method to automatically learn and generalize the results to unseen conditions.
Using network and conversational conditions measured at run time, the classifier learned
helps adjust the control algorithms in achieving high perceptual conversational quality.
Secondly, we have designed a cross-layer speech codec to interface with the loss-concealment and playout
scheduling algorithms in the packet-stream layer in order to be more robust and effective against packet losses.
Thirdly, we have developed a distributed algorithm for equalizing mutual silences and an overlay network for multi-party VoIP
systems. The approach leads to multi-party conversations with high listening-only speech quality and balanced mutual silences.
``The Design of VoIP Systems With High Perceptual Conversational Quality,''
Journal of Multimedia, Special Issue on Advances in Interactive Digital Entertainment Technologies,
vol. 4, no. 2, 2009, pp. 49-62.
In this paper, we study the statistical scheduling of offline subjective tests for
evaluating alternative control schemes in real-time multimedia applications.
These applications are characterized by multiple counteracting objective quality metrics
(such as delay and signal quality) that can be affected by various control schemes.
However, the trade-offs among these metrics with respect to the subjective preferences
of users are not defined. As a result, it is difficult to select the proper control schemes
that lead to the best subjective quality at run time. Since subjective tests are expensive to
conduct and the number of possible control schemes and run-time conditions is prohibitively large,
it is important that a minimum number of such tests be conducted offline, and that the results
learned can be generalized to unseen conditions with statistical confidence. To this end, we
study in this paper efficient algorithms for scheduling a sequence of subjective tests,
while leaving the generalization of limited offline subjective tests to guide the operation
of the control schemes at run time to a future paper. Using Monte-Carlo simulations, we verify
the robustness of our algorithms in terms of their accuracy and efficiency.
``Statistical Scheduling of Offline Comparative Subjective Evaluations for Real-Time Multimedia,''
IEEE Trans. on Multimedia, vol. 11, no. 6, Oct. 2009, pp. 1114-1130.
The electronic learning (or e-learning) industry, including Web-based learning, has been booming worldwide in recent years, due to factors
such as reduced environmental costs and impact, quality education made affordable, and convenience and flexibility to learners.
The Internet's ubiquity is promoting new sophisticated techniques and powerful applications for a new paradigm in e-learning that uses
the Web as an interface for single-user as well as collaborative learning. Web-based learning and new concepts such as virtual classrooms, laboratories, and
universities introduce many new issues. The articles in this special issue of IEEE Internet Computing highlight three efforts in creating Internet-based
e-learning technologies that tackle challenges such as management of learning objects in an open and scalable architecture, incorporation of
learners' pedagogical features in Web-based learning environments, and digital game-based learning.
``Emerging Internet Technologies for E-Learning (Guest Editors' Introduction),''
IEEE Internet Computing, July/August 2009, pp. 11-17.
Multimedia and networking technologies have significantly impacted our daily activities, particularly in terms
of how we learn. Nowadays, classroom teaching no longer simply relies on chalk and blackboard as the prime medium
for course dissemination. E-learning technologies have made it possible to provide a virtual classroom environment
on the Web through supporting teacher-student and student-student communications, course material distribution as
well as online student assessments. They provide students with more control over their learning schedule and pace.
On top of this, multimedia technologies further offer students different forms of media to match their learning styles,
leading to enhancements of their learning effectiveness. This extended introduction discusses the latest e-learning-specific
multimedia technologies, their research challenges, and future trends from both pedagogical and technological
perspectives. We also summarize the papers included in this special issue.
``Recent development in multimedia e-learning technologies,''
World Wide Web, vol. 17, no. 2, Springer, pp. 189-198.
A distributed virtual environment (DVE) is a shared virtual environment (VE) that allows remote users to interact with each
other through networks. DVEs are becoming very popular due to some prominent applications, such as online games and virtual
worlds. To support a large number of users, a multi-server DVE architecture may be adopted, with each server managing a
subset of users. However, there are two critical problems with this architecture: view inconsistency caused by delays
and server overloading caused by uneven distribution of users. While the first problem affects users' perception of the
VE and causes user disputes, the second problem affects the system response time. In this paper, we first show that the
view inconsistency problem and the load balancing problem are conflicting objectives. We then propose an efficient joint
optimization framework to address both problems. Our results show that the proposed method can significantly reduce
view inconsistency, which is important to the interactivity of DVE applications.
``On View Consistency in Multi-Server Distributed Virtual Environments,''
IEEE Trans. on Visualization and Computer Graphics, vol. 20, no. 10, Oct. 2014, pp. 1428-1440.
In recent years, the rapid development of the Internet, the Internet of Things, and Cloud Computing has led to
the explosive growth of data in almost every industry and business area. Big data has rapidly developed
into a hot topic that attracts extensive attention from academia, industry, and governments around the
world. In this position paper, we first briefly introduce the concept of big data, including its definition,
features, and value. We then identify from different perspectives the significance and opportunities that
big data brings to us. Next, we present representative big data initiatives all over the world. We describe
the grand challenges (namely, data complexity, computational complexity, and system complexity), as
well as possible solutions to address these challenges. Finally, we conclude the paper by presenting
several suggestions on carrying out big data projects.
``Significance and Challenges of Big Data Research,''
Big Data Research, vol. 2, Elsevier, March 2015, pp. 59-64.
Designing a multimedia system to achieve high perceptual quality requires a black-box approach to tune
control inputs according to preferences from subjective tests. The authors present OptPQ, a systematic
method for optimizing perceptual quality using just-noticeable difference (JND) profiles.
``Optimizing the Perceptual Quality of Real-Time Multimedia Applications,''
IEEE Multimedia, vol. 22, no. 4, Oct.-Dec. 2015, pp. 14-28.
The authors explore approaches for understanding CUHK's impact on society,
including reputation surveys, altmetrics, and bibliometrics. They suggest that deep analysis
of citations may be more meaningful than the current dependence on the number of citations as
a proxy for a paper's quality.
``Designing World Class Research for Societal Impact,''
The Academic Executive Brief, Elsevier, vol. 6, 2016.
Tuning multimedia applications at run time to achieve high perceptual quality entails the search of nonlinear
mappings that determine how control inputs should be set in order to lead to high user-perceived quality.
Offline subjective tests are often used for this purpose but they are expensive to conduct because each
can only evaluate one mapping at a time and there can be infinitely many such mappings to be evaluated.
In this paper, we present a greedy algorithm that uses a small number of subjective test results to
accurately approximate this space of mappings. Based on an axiom on monotonicity and the property of
just-noticeable differences, we prove its optimality in minimizing the average absolute error between
the approximate and the original mappings. We further demonstrate the results using numerical simulations
and the application of the mappings found to tune the control of the multimedia game BZFlag.
``Optimality of Greedy Algorithm for Generating Just-Noticeable Difference Surfaces,''
IEEE Trans. on Multimedia, vol. 18, no. 7, July 2016, pp. 1330-1337.
When running multi-player online games on IP networks with losses and delays, the order of actions may be changed when
compared to the order run on an ideal network with no delays and losses.
To maintain a proper ordering of events, traditional approaches either use rollbacks to undo certain actions,
or use local lags to introduce additional delays. Both may be perceived by players when their changes are
beyond the just-noticeable-difference (JND) threshold.
In this paper we propose a novel method for ensuring a strongly consistent completion order of actions, where
strong consistency refers to the same completion order as well as the same interval between any completion time
and the corresponding ideal reference completion time under no network delay. We find that small adjustments
within the JND on the duration of an action would not be perceivable, as long as the duration is comparable
to the network round-trip time (RTT). We utilize this property to control the vector of durations of actions
and formulate the search of the vector as a multi-dimensional optimization problem. By using the property that
players are generally more sensitive to the most prominent delay effect (with the highest probability of noticeability,
or the probability of correctly noticing a change when compared to the reference), we prove that the optimal solution
occurs when the probabilities of noticeability of the individual adjustments are equal. As this search can be done
efficiently in polynomial time (around 5 ms) with a small amount of space (around 160 KB), it can be carried out at
run time to determine the optimal control. Lastly, we evaluate our approach on the popular open-source online shooting game BZFlag.
``Consistent Synchronization of Action Order with Least Noticeable Delays in Online Games,''
ACM Trans. on Multimedia Computing, Communications, and Applications (TOMM), vol. 13, no. 1, Jan. 2017.
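Restated in symbols (with hypothetical notation), the optimality condition above says that, at the optimal vector of duration adjustments (delta_1, ..., delta_n), the per-action probabilities of noticeability are balanced:

    P_1(\delta_1) \;=\; P_2(\delta_2) \;=\; \cdots \;=\; P_n(\delta_n),

so that no single action carries a more prominent (more noticeable) delay effect than the others.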
In dense matching (including stereo matching and optical flow), nearly all existing approaches are based on simple
features, such as gray or RGB color, gradient or simple transformations like census, to calculate matching costs. These features do
not perform well in complex scenes that may involve radiometric changes, noise, overexposure, and/or textureless regions. Various
problems may appear, such as wrong matching at the pixel or region level, flattening/breaking of edges and/or even entire
structural collapse. In this paper, we propose two fundamental principles based on the consistency and the distinctiveness of
features. We show that almost all existing problems in dense matching are caused by features that violate one or both of
these principles. To systematically learn good features for dense matching, we develop a general multi-objective optimization
based on these two principles and apply convolutional neural networks (CNNs) to find new features that lie on the Pareto
frontier. By using two-frame optical flow and stereo matching as applications, our experimental results show that the features
learned can significantly improve the performance of state-of-the-art approaches. Based on the KITTI benchmarks, our method
ranks first on the two stereo benchmarks and is the best among existing two-frame optical-flow algorithms on flow benchmarks.
``Fundamental Principles on Learning New Features for Effective Dense Matching,''
IEEE Trans. on Image Processing (TIP), vol. 27, no. 2, Feb. 2018, pp. 822-836.
The rapid advancement of computing technologies has facilitated the implementation of AIED (Artificial Intelligence
in Education) applications. AIED refers to the use of AI (Artificial Intelligence) technologies or application
programs in educational settings to facilitate teaching, learning, or decision making. With the help of AI technologies,
which simulate human intelligence to make inferences, judgments, or predictions, computer systems can provide personalized
guidance, support, or feedback to students as well as assist teachers or policymakers in making decisions.
Although AIED has been identified as the primary research focus in the field of computers and education, the
interdisciplinary nature of AIED presents a unique challenge for researchers with different disciplinary backgrounds.
In this paper, we present the definition and roles of AIED studies from the perspective of educational needs.
We propose a framework that lays out the considerations involved in implementing AIED in different learning and teaching settings.
The framework can help guide researchers from both computing and education backgrounds in conducting AIED studies.
We outline 10 potential research topics in AIED that are of particular interest to this journal.
Finally, we describe the type of articles we would like to solicit and the management of the submissions.
``Vision, Challenges, Roles and Research Issues of Artificial Intelligence in Education,''
Computers and Education: Artificial Intelligence,
vol. 1, 2020, pp. 1-5.
This paper presents the concept of kernels to address the complexity of solving big-data applications.
The solution strategies in these applications often require evaluating alternative subspaces on the big data and selecting the best solution.
As the data space in these problems is so vast that it is infeasible to scan the data once, we need domain-specific methods
for identifying a promising and manageable subspace with a high density of solutions before looking for individual solutions.
To this end, we introduce kernels to represent some properties of the statistical quality, average density, or probability of
solutions in a subspace to prune subspaces with suboptimal kernels. We classify various past approaches based on their analysis
methods and illustrate each by an example. Our results confirm that kernels can effectively harness the complexity of solving big-data applications.
``Using Kernels to Harness the Complexity of Big Data Applications,''
Int'l Journal on Artificial Intelligence Tools,
vol. 31, no. 3, 2022, pp. 1-11.
This paper presents the design of dominance relations to reduce the space traversed in machine learning for solving
two applications in financial trading and real-time multimedia. A machine-learning algorithm designed for an application with a
huge search space will need to perform an efficient traversal of the space during learning. It will be more effective if it employs
a powerful pruning mechanism to eliminate suboptimal candidates before using them in the learning algorithm. In our approach,
we present dominance relations for pruning subspaces with suboptimal kernels that are otherwise evaluated in learning, where
kernels represent the statistical quality, average density, or probability of solutions in a subspace. Specifically, when one subspace
dominates another by a dominance relation, we can prune the latter and guarantee, without searching both, that the kernel of the
latter cannot be better than that of the former. As a result, a significant portion of the search space will be pruned by those nondominated
subspaces during learning. In the financial trading application studied, we use mean reversion as our strategy for
learning the set of promising stocks and Pareto-optimality as our dominance relation to reduce the space evaluated in learning.
In the multimedia application, we propose a dominance relation using an axiom from our past work to approximate the subspace
of perceptual qualities within an error threshold. The pruning mechanism allows the learning of the mapping from controls to
perceptual qualities while eliminating the evaluation of all those mappings that are within the error thresholds. In both cases, we
can harness the complexity of machine learning by reducing the candidate space evaluated.
``Dominance Pruning in Machine Learning for Solving Financial Trading and Real-Time Multimedia Applications,''
American Journal of Artificial Intelligence,
Science Publishing Group, vol. 6, no. 2, 2022, pp. 36-47.
In this paper, we present a search algorithm called real-time search for
solving combinatorial optimization problems under real-time constraints.
The problems we study are NP-hard and have good heuristic algorithms for
generating feasible solutions. The algorithm we develop aims at finding
the best possible solution within a given deadline. Since this objective is
generally not achievable without first solving the problem, we use an
alternative heuristic objective that looks for the solution with the best
ascertained approximation degree. Our algorithm schedules a sequence of
guided depth-first searches, each searching for a more accurate solution
(based on the approximation degree set), or solutions deeper in the search
tree (based on the threshold set), or a combination of both. Five versions
of the RTS algorithm for setting approximation degrees and/or thresholds
are formulated, evaluated, and analyzed. We develop an exponential model
for characterizing the relationship between the approximation degrees
achieved and the time taken. Our results show that our RTS algorithm finds
solutions better than a guided depth-first search, and solutions found are
very close to the best possible solution that can be found in the time
allowed. We evaluate our results using the symmetric traveling-salesman,
the knapsack, and the production-planning problems.
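A minimal sketch of the scheduling idea described above is given below; guided_dfs(eps, time_left) is an assumed interface returning a solution within relative approximation degree eps (or None), and the halving schedule is illustrative rather than one of the five RTS versions studied.

    import time

    def real_time_schedule(guided_dfs, deadline, eps0=0.5, shrink=0.5):
        # Run a sequence of guided depth-first searches, each demanding a more
        # accurate solution, and keep the best one found before the deadline.
        best, eps = None, eps0
        end = time.time() + deadline
        while time.time() < end and eps > 1e-6:
            sol = guided_dfs(eps, end - time.time())
            if sol is not None:
                best = sol
            eps *= shrink          # tighten the approximation degree
        return best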
In this paper we present a parameter-based approach for synthesizing
systolic architectures from uniform recurrence equations. The scheme presented
is a generalization of the Parameter Method proposed by Li and Wah. The
approach synthesizes optimal arrays of any lower dimension from a general
uniform recurrence description of the problem. In other previous attempts
for mapping uniform recurrences into lower-dimensional arrays, optimality
of the resulting designs is not guaranteed. As an illustration of the
technique, optimal linear arrays for matrix multiplication are given. Due
to space limitations, proofs of the theorems in this paper are not shown. In a
companion paper, we present details of the proof of Theorems 1 and 3 and a
different approach for defining the constraints shown in Theorem 2. A
detailed design for solving path-finding problems is also presented there.
In this paper, we develop parallel multistage selection strategies
for guiding the concurrent evaluation of candidates from a large pool of
candidates, with the objective of finding the candidate with the maximum
(resp. minimum) value in a limited amount of time.
We present a statistical model for finding the appropriate number
of candidates to be sampled and when each should be sampled.
We verify the analytical results by simulations and evaluate the
performance of strategies that are analytically intractable.
Our performance results show that there is linear speedup
in parallel processing over selection done sequentially.
Results obtained from application of our selection method
to learn heuristics for a real-world application demonstrate
that heuristics with improved performance can be found
by a systematic search of the space of feasible heuristics in limited time.
Due to space limitations, we are unable to present the details
of the experiments and the exact heuristics found.
Most existing methods for synthesizing systolic architectures can only map
n-dimensional recurrences to n-1-dimensional arrays. In this paper, we
generalize the parameter-based approach of Li and Wah to map n-dimensional
uniform recurrences to any k-dimensional processor arrays, where k is less than n. In
our approach, operations of the target array are captured by a set of
parameters, and constraints are derived to avoid computational conflicts
and data collisions. We show that the optimal array for any objective
function expressed in terms of these parameters can be found by a systematic
enumeration over a polynomial search space. In contrast, previous
attempts do not guarantee the optimality of the resulting designs. We
illustrate our method with optimal single-pass linear arrays for the re-indexed
Warshall-Floyd path-finding algorithm. Finally, we show the application of
GPM to practical situations characterized by restrictions on resources,
such as processors or completion time, rather than a well-defined objective.
In this paper, we propose a novel search algorithm called band search that
generalizes guided depth-first and best-first searches. The search
allocates a band of at most W nodes in each level of the search tree for
storing active nodes in that level, D priority lists, one for each level,
for storing overflow nodes, and D counters, one for each level, for keeping
track of and limiting the degree of backtracking allowed. The algorithm
has three major features: a) it selects for expansion in a best-first
fashion from all nodes in the bands, b) it moves nodes from an overflow
list into the corresponding band in a depth-first fashion, and c) it
restricts backtracking so that at most W nodes in a level are fully searched
before allowing new nodes in this level into the band. The algorithm
specializes to be a guided depth-first search (GDFS) when W is one, and a
best-first search (BFS) when W is unlimited. By choosing W appropriately,
we show empirically that the algorithm can often outperform GDFS and has
performance close to that of BFS. Moreover, the algorithm can be used
instead of GDFS in iterative searches, such as IDA*, MIDA*, and DFS*. Finally,
we identify conditions under which the algorithm behaves like BFS, and
the anomalous behavior when the algorithm performs worse than GDFS.
This paper presents a method for tuning parameters under a fixed time
constraint for a general binocular stereo-vision algorithm. A major difficulty
in stereo vision, as well as in other vision algorithms, lies in
adjusting the large variety of parameters for maximizing performance. This
effort is usually performed by human experts with a minimum of formal
guidelines. To automate this process, we develop Teacher 4.2, a generate-and-test
system that systematically generates new parameter values by analyzing
the results of previous tests, and that performs limited and controlled
tests on the candidates generated using high-speed computers. The system
is modeled as a statistical selection problem operating under a given time
constraint. It divides the time allowed into stages, where promising
parameter-value sets found in one stage are passed to the next stage for
further testing, and selects the parameter-value set deemed best by the
final stage as the result. We show experimentally that our system can find
new parameter-value sets which in some cases are better than the ones
originally found by extensive hand-tuning and commonly used heuristics. Our
experiments further show that different parameter values may be required
under different objectives and performance constraints. Our system
provides an automated tool for generating new parameters that can be part of
an adaptive stereo-vision system, capable of adapting to changing algorithm
specifications as well as changing goals and target domains.
We present in this paper systolic arrays with constant number of
input/output (I/O) ports for two-dimensional (2-D) FIR and IIR filtering.
Our design has an array of LxN processing elements (PEs), where L (L <= N) is
a technology-dependent parameter related to the number of I/O ports. Each
PE in our design has a microprogrammed arithmetic logic unit (ALU), a
control unit, a fixed number of I/O buffers, and O(N/L) memory. Our design
specializes to a square mesh when L=N, and a linear array when L=1. It can
implement both FIR and IIR filtering in O(N^2 M/L) time, which is
asymptotically optimal.
Load-balancing systems use workload indices to dynamically schedule jobs.
We present a novel method of automatically learning such indices. Our
approach uses comparator neural networks, one per site, which learn to
predict the relative speedup of an incoming job using only the
resource-utilization patterns observed prior to the job's arrival. Our load indices
combine information from the key resources of contention: CPU, disk,
network, and memory. Our learning algorithm overcomes the lack of job-specific
information by learning to compare the relative speedups of different sites
with respect to the same job, rather than attempting to
predict absolute speedups. We present conditions under which such learning
is viable. Our results show that indices learnt using comparator networks
correctly pick the best destination in most cases when incoming jobs are
short; accuracy degrades as execution time increases.
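To make the usage concrete, the sketch below shows how such a comparator might be consulted at job-arrival time; comparator(w_a, w_b) is an assumed interface that returns a positive score when site A is predicted to yield the larger speedup, and the utilization windows contain only measurements taken before the job arrives.

    def pick_site(comparator, windows):
        # windows: dict mapping site -> recent resource-utilization window
        # (CPU, disk, network, memory); no job-specific information is used.
        sites = list(windows)
        best = sites[0]
        for s in sites[1:]:
            if comparator(windows[s], windows[best]) > 0:
                best = s            # s predicted to give the larger speedup
        return best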
In designing application-specific bit-level architectures and in
programming existing bit-level processor arrays, it is necessary to expand a
word-level algorithm into its bit-level form before dependence analysis can
be performed. In this paper, we consider dependence structures of
bit-level algorithms as functions of three components -- dependence structures
of word-level algorithms, dependence structures of the arithmetic
algorithms implementing word-wise operations, and algorithm expansions. Based
on these components, we can derive dependence structures of bit-level
algorithms without using time-consuming general dependence analysis methods.
To illustrate our approach, we derive two dependence structures for
bit-level matrix multiplication and apply a method developed earlier to design
two bit-level architectures. One of these architectures is O(p) times
faster than the best word-level architecture, where p is the word length. The
speedup found here holds in general because a bit in a bit-level
architecture goes to the next processor for processing as soon as it is
available.
In this paper, we present efficient algorithms for adjusting configuration
parameters of genetic algorithms that operate in a noisy environment.
Assuming that the population size is given, we address two problems
specifically important in a noisy environment. First, we study the duration-sizing
problem that determines dynamically the duration of each generation.
Next, we study the sample-allocation (sizing) problem that determines
adaptively the number of evaluations taken from each population in a
generation. For these two problems, we model the search process as a statistical
selection process and derive equations useful for controlling the duration
and the sample sizes. Our results show that these adaptive procedures
improve the performance of genetic algorithms over those of commonly used
static ones.
The scheduling of tasks for applications with dynamic behavior
traditionally relies on externally observable metrics such as the number of active
processes. This paper presents a new scheduling strategy based on the
observation that it may be possible to capture the near-term resource
requirements of active tasks by expressions involving task parameters.
Run-time evaluation of these expressions yields estimates of task behavior
that are valid over a short, future interval of time. The proposed heuristics,
when used in conjunction with information supplied by profiling, can
be used to annotate the source program with such expressions. Preliminary
simulation results show that the use of near-future estimates in a dynamic
scheduling strategy for divide-and-conquer algorithms consistently improves
over traditional dynamic strategies. The performance of this strategy
approaches that of the best-known deterministic strategy while incurring an
overhead of the same order as other dynamic strategies.
In this paper, we propose a new supervised learning method to improve the
learning speed for feed-forward neural networks. Our method is different
from traditional supervised learning methods (such as back-propagation)
that find a one-to-one mapping between the given input pattern matrix and
the desired output pattern matrix. Instead, it finds one of the
one-to-many mappings between the input matrix and an intermediate output matrix,
and transforms the intermediate output matrix to the desired output matrix
in one step using linear mapping techniques. Learning is faster with our
method because there exist many intermediate output matrices, and learning
can stop whenever one such matrix is found. Our extensive experimental
results show that our learning algorithm converges to within a reasonable
range of error after a few training epochs, making it suitable for dynamic
real-time applications in which the network may need to be re-trained
periodically.
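The one-step linear mapping at the end of the method can be illustrated with a least-squares solve. This is a sketch under the assumption that the transformation is fit in the least-squares sense; how the intermediate output matrix H is produced by the network is not shown.

    import numpy as np

    def one_step_linear_map(H, Y):
        # H: intermediate output matrix (one row per training pattern);
        # Y: desired output matrix.  Solve H @ W ~= Y in a single linear step.
        W, *_ = np.linalg.lstsq(H, Y, rcond=None)
        return W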
Effective load-balancing policies use dynamic resource information to
schedule tasks in a distributed computer system. In this paper, we present
a novel method for automatically learning such policies. At each site in
our system, we use a comparator neural network to predict the relative
speedup of an incoming task using only the resource-utilization patterns
obtained prior to the task's arrival. Outputs of these comparator networks
are broadcast periodically over the distributed system, and the resource
schedulers at each site use these values to determine the best site for
executing an incoming task. The delays incurred in propagating workload
information and tasks from one site to another, as well as the dynamic and
unpredictable nature of workloads in multiprogrammed multiprocessors, may
cause the workload pattern at the time of execution to differ from patterns
prevailing at the times of load-index computation and decision making. Our
load-balancing policy accommodates this uncertainty by using certain
tunable parameters. We present a population-based machine-learning algorithm
that adjusts these parameters in order to achieve high average speedups
with respect to local execution. Our learning algorithm overcomes the lack
of task-specific information by learning to compare the relative speedups
of different sites with respect to the same task, rather than attempting to
predict absolute speedups. We present experimental results based on the
execution of benchmark programs on a network of Sun workstations connected
by a local area network. Our results show that our load-balancing policy,
when combined with the comparator neural network for workload
characterization, is effective in exploiting idle resources in a distributed computer
system.
In this paper, we present the design of an application-specific coprocessor
for algorithms that can be modeled as uniform recurrences or
``uniformized'' affine recurrences. The coprocessor has a regular array of processors
connected to an access-unit for intermediate storage of data. The
distinguishing feature of our approach is that we assume the coprocessor to be
interfaced to a standard, slow (single-ported) memory with low bandwidth.
Hence, good performance is achieved by effectively exploiting data locality
in the applications by the compiler, and the final architecture is chosen
by a tradeoff analysis driven by the mapping process. Preliminary results
indicate that the coprocessor has significantly lower clock rates or higher
performance than that of existing RISC processors and is cost-effective for
executing loop computations.
This paper presents the position statements of the panelists in a panel
discussion at the 13th International Joint Conference on Artificial
Intelligence.
In this paper, we present a new learning mechanism called mixed-mode
learning. This learning mechanism transforms an existing supervised learning
algorithm from one that finds a unique one-to-one mapping into an algorithm
that finds one of the many one-to-many mappings. Since the objective of
learning is relaxed by this transformation, the number of learning epochs
can be reduced significantly. We show in this paper that mixed-mode
learning can be applied in the well-known cascade correlation learning algorithm
to reduce the number of hidden units required for convergence when applied
to classification problems whose desired outputs are binary. Our
experimental results confirm this reduction, although they show that more learning
time is required than that of cascade correlation. In general, reducing
the number of hidden units at the expense of additional learning time
is desirable.
In this paper we present an improved search procedure for the General
Parameter Method (GPM) [GanWah92a, GanWah92b]. Our procedure maps uniform
dependence algorithms to application-specific processor arrays (PAs). It
can optimize general design objectives with certain non-monotonicity
properties, i.e., those that do not increase monotonically with the
parameters. An example of such an objective is the minimization of the total
completion time, including load and drain times. In contrast, earlier
design methods can only deal with monotonic objectives. We present results
for the matrix-product problem using this search technique; application of
the technique on the transitive-closure problem is presented elsewhere
[ForWahShaGan93]. We also show that the parameters in GPM can be expressed
in terms of the schedule vector Pi and allocation matrix S in the popular
dependence-based method (DM), thereby allowing GPM to be used in DM for
finding optimal designs for uniform dependence algorithms.
In this paper, we present a method for finding the minimal configuration of
a feed-forward artificial neural network (ANN) for solving a given
application problem. We assume that the cascade-correlation (CAS) training algorithm
is used to train the weights of the ANNs concerned. Under a given
time constraint that is long enough to train tens of ANNs completely, we
divide the time into quanta, and present a method for scheduling
dynamically the ANN to be trained in each quantum from a pool of partially trained
ANNs. Our goal is to find an ANN configuration with a smaller number of
hidden units as compared to the alternative of applying the CAS algorithm
repeatedly to train each ANN to completion before exploring new ANNs. Our
system is a population-based generate-and-test method that maintains a
population of candidate ANNs, and that selectively trains those that are
predicted to require smaller configurations. Since it is difficult to
predict the exact number of hidden units required when the CAS algorithm
terminates, our system compares two partially trained ANNs and predicts
which one will converge with a smaller number of hidden units relative to
the other. Our prediction mechanism is based on a comparator neural
network (CANN) that takes as inputs the TSSE-versus-time behavior of training
performed already on two ANNs, and that predicts which one will require a
smaller number of hidden units when convergence is reached. We show that
our CANN can predict correctly most of the time, and present experimental
results on better configurations found in a given time limit for a
classification problem and the two-spiral problem.
In this paper, we discuss twelve application-specific issues in strategy
learning. These issues are important when designing strategy learning
systems as well as in learning new application-specific strategies. In
each issue, we discuss the problems involved and identify some effective
solutions.
Finding the global optimal solution for a general nonlinear program is a
difficult task except for very small problems. In this paper we identify a
class of nonlinear programming problems called polynomial programming
problems (PP). A polynomial program is an optimization problem with a scalar
polynomial objective function and a set of polynomial constraints. By using
Groebner Bases, we can determine the global minimum of a polynomial program
in a reasonable amount of time and memory.
In this paper, we study the performance of various IDA*-style searches and
investigate methods to improve their performance by predicting in each
stage the threshold to use for pruning. We first present three models to
approximate the distribution of the number of search nodes by lower bounds:
exponential, geometric, and linear, and illustrate these distributions based
on some well-known combinatorial search problems. We then show the
performance of an ideal IDA* algorithm and identify reasons why existing
IDA*-style algorithms perform well. In practice, we will be able to know from
previous experience the distribution for a given problem instance but will
not be able to determine the parameters of the distribution. Hence, we
develop RIDA*, a method that estimates dynamically the parameters of the
distribution, and predicts the best threshold to use. Finally, we compare
the performance of several IDA*-style algorithms --- Korf's IDA*, RIDA*,
IDA*-CR and DFS* --- on several application problems, and identify
conditions under which each of these algorithms will perform well.
In this paper we present the design of a comparator neural network (CANN)
and its associated learning algorithms. This network is useful for
comparing two time series in order to predict which one will have a better future
value based on a user-specified objective function. We illustrate its
application in distributed load balancing and in an automated design system
for neural networks. In the first application, we use a CANN as a workload
predictor that takes as input a time window of measurable values of CPU
occupancy, network traffic, disk traffic, and memory occupancy from two
computers, and predicts which one of the two computers will have a shorter
turnaround time for executing an incoming job. In the second application,
we use a CANN to predict which one of two partially trained neural networks
will have a shorter convergence time, given current TSSE traces of these
networks. In each application, experimental results show that prediction
is improved using the proposed CANNs.
We consider a class of zero-one integer programming feasibility problems
(0-1 ILPF problems) in which the coefficients of variables can be integers,
and the objective is to find an assignment of binary variables so that all
constraints are satisfied. We propose a Lagrangian formulation in the
continuous space and develop a gradient search in this space. By using two
counteracting forces, one performing gradient search in the primal space
(of the original variables) and the other in the dual space (of the
Lagrangian variables), we show that our search algorithm does not get
trapped in local minima and reaches equilibrium only when a feasible
assignment to the original problem is found. We present experimental
results comparing our method with backtracking and local search (based on
random restarts). Our results show that 0-1 ILPF problems of reasonable
sizes can be solved an order of magnitude faster than by existing methods.
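Schematically, the two counteracting gradient forces described above amount to simultaneous updates of the form (with step size eta and L the continuous Lagrangian of the relaxed problem; this rendering is illustrative, not the paper's exact update rule):

    x^{(k+1)} = x^{(k)} - \eta\,\nabla_x L\big(x^{(k)},\lambda^{(k)}\big), \qquad
    \lambda^{(k+1)} = \lambda^{(k)} + \eta\,\nabla_\lambda L\big(x^{(k)},\lambda^{(k)}\big),

so the search descends in the original (primal) variables while ascending in the Lagrangian (dual) variables, and the two forces balance only at a feasible assignment.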
In this paper, we present new results on the automated generalization of
performance-related heuristics learned for knowledge-lean applications. We
study methods to statistically generalize new heuristics learned for some
small subsets of a problem space (using methods such as genetics-based
learning) to unlearned problem subdomains. Our method uses a new
statistical metric called probability of win. By assessing the performance
of heuristics in a range-independent and distribution-independent manner,
we can compare heuristics across problem subdomains in a consistent manner.
To illustrate our approach, we show experimental results on generalizing
heuristics learned for sequential circuit testing, VLSI cell placement and
routing, and branch-and-bound search. We show that generalization can lead
to new and robust heuristics that perform better than the original
heuristics across problem instances of different characteristics.
In this paper, we discuss a new approach to generalize heuristic methods
(HMs) to new test cases of an application, and conditions under which such
generalization is possible. Generalization is difficult when performance
values of HMs are characterized by multiple statistical distributions
across subsets of test cases of an application. We define a new measure
called probability of win and propose three methods to evaluate it:
interval analysis, maximum likelihood estimate, and Bayesian analysis. We show
experimental results on new HMs found for blind equalization and branch-and-bound search.
In this paper, we present a new supervised learning method called Novel
(Nonlinear Optimization Via External Lead) for training feed-forward neural
networks. In general, such learning can be considered as a nonlinear global
optimization problem in which the goal is to minimize the nonlinear error
function that spans the space of weights. Novel is a trajectory-based
nonlinear optimization method that combines global and local searches to
find good local minima. It relies on an external force to pull a search
out of a local minimum in its global search and employs local descents to
locate local minima in its local search. The result is an efficient search
method that identifies good basins without spending a lot of time in them.
We have shown improved training and testing results for five neural-network
benchmark problems as compared to those of existing minimization and
neural-network learning algorithms. For the two-spiral problem, Novel has
found a design using only four hidden units and 25 weights. (The best
known design requires nine hidden units and 75 weights.) In short, Novel
represents a significant advance in the state-of-the-art in supervised
learning of neural networks and general optimization of continuous
nonlinear functions.
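The trace idea can be illustrated by the Python sketch below, which assumes a
global-search trajectory of the form dx/dt = -mu_g * grad E(x) + mu_t * (T(t) - x),
where T(t) is the user-defined trace; the actual trace functions, coefficients,
and the local-search stage of Novel are not reproduced here.

    import numpy as np

    def novel_global_stage(grad_E, trace, x0, mu_g=1.0, mu_t=0.5, dt=0.01, steps=20000):
        """Sketch of a trace-driven global-search trajectory (Euler discretization).
        grad_E: gradient of the error function; trace: user-defined function of time."""
        x = np.array(x0, dtype=float)
        basins = []                           # starting points for later local descents
        for k in range(steps):
            t = k * dt
            x += dt * (-mu_g * grad_E(x) + mu_t * (trace(t) - x))
            if k % 1000 == 0:
                basins.append(x.copy())       # candidate points to refine locally
        return basins

    # Toy example: a one-dimensional multimodal error function E(x) = sin(5x) + 0.1x^2.
    grad = lambda x: np.array([5.0 * np.cos(5 * x[0]) + 0.2 * x[0]])
    trace = lambda t: np.array([3.0 * np.sin(0.1 * t)])   # slowly sweeping trace
    print(len(novel_global_stage(grad, trace, [0.0])))    # number of candidate points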
Satisfiability is a class of NP-complete problems that model a wide range
of real-world applications. These problems are difficult to solve because
they have many local minima in their search space, often trapping greedy
search methods that utilize some form of descent. Further, gradient
changes in the search space take integer values, so many directions may
share the same gradient. In this paper, we propose a new discrete
Lagrange-multiplier-based global-search method for solving satisfiability
problems. We derive new approaches for applying Lagrangian methods in
discrete space, show that equilibrium is reached when a feasible assignment
to the original problem is found, and present heuristic algorithms to look
for equilibrium points. Instead of restarting from a new starting point
when a search reaches a local trap, the Lagrangian variables in our method
provide a force to lead the search out of a local minimum and move it in
the direction provided by the Lagrangian variables. One of the major
advantages of our method is that it has very few algorithmic parameters to be
tuned by users, and the search procedure can be made deterministic and the
results, reproducible. We demonstrate our method by applying it to solve
an extensive set of benchmark problems archived in the DIMACS archive at
Rutgers University. Our method generally performs better than the best existing
methods and can achieve an order-of-magnitude speedup for some problems.
Moreover, our method can solve some new benchmark problems that cannot be
solved by other local-search methods.
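A minimal Python sketch of the discrete Lagrangian search follows. It uses one
simple variant of the formulation, L(x, lambda) = sum_i lambda_i * U_i(x) with
U_i = 1 for each unsatisfied clause, greedy single-variable flips for descent,
and unit multiplier increases at local minima; the published algorithm adds
tuned scaling, flat moves, and other heuristics not shown here.

    def dlm_sat(clauses, n_vars, max_steps=200000):
        """Sketch of a discrete Lagrangian search for SAT.  clauses: list of lists
        of nonzero ints (positive literal = variable, negative = its negation)."""
        x = [False] * (n_vars + 1)
        lam = [1.0] * len(clauses)

        def unsat(assign):
            return [i for i, c in enumerate(clauses)
                    if not any((lit > 0) == assign[abs(lit)] for lit in c)]

        def L(assign):
            return sum(lam[i] for i in unsat(assign))

        for _ in range(max_steps):
            bad = unsat(x)
            if not bad:
                return x[1:]                          # satisfying assignment found
            best_var, best_val = None, sum(lam[i] for i in bad)
            for v in range(1, n_vars + 1):            # greedy descent on L over flips
                x[v] = not x[v]
                val = L(x)
                x[v] = not x[v]
                if val < best_val:
                    best_var, best_val = v, val
            if best_var is None:                      # local minimum: ascend in lambda
                for i in bad:
                    lam[i] += 1.0
            else:
                x[best_var] = not x[best_var]
        return None

    # (x1 or x2) and (not x1 or x2) and (x1 or not x2)
    print(dlm_sat([[1, 2], [-1, 2], [1, -2]], n_vars=2))

On this toy instance the search reaches a local minimum immediately, raises the
multiplier of the one trapped clause, and then escapes to the satisfying
assignment (x1, x2) = (true, true).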
Uniform recurrent algorithms belong to a fundamental class of operations
used in almost every scientific and engineering computation. An important
example of these algorithms is matrix multiplication. In this paper, we
present optimal designs of bit-level processor arrays for matrix
multiplication, and analyze trade-offs between bit-level and word-level designs in
terms of execution time and chip area. We formulate the design as an
optimization problem and search for optimal designs using the General Parameter Method developed by Ganapathy and Wah. We show that previous bit-level
processor arrays for matrix multiplication are two special cases of our
more general designs, and prove that one of the previous designs is optimal
in execution time. Finally, we show that the fastest bit-level design is
only O(log p) faster than the fastest word-level design, instead of
O(p) as claimed in the previous work of Shang and Wah, where p is the
number of bits per word.
The satisfiability (SAT) problem is a core problem in mathematical logic
and computing theory. In practice, SAT is fundamental in solving many
problems in automated reasoning, computer-aided design, computer-aided
manufacturing, machine vision, databases, robotics, integrated circuit
design, computer architecture design, and computer network design.
Traditional methods treat SAT as a discrete, constrained decision problem.
In recent years, many optimization methods, parallel algorithms, and
practical techniques have been developed for solving SAT. In this survey,
we present a general framework (an algorithm space) that integrates
existing SAT algorithms into a unified perspective. We describe sequential
and parallel SAT algorithms including variable splitting, resolution, local
search, global optimization, mathematical programming, and practical SAT
algorithms. We give a performance evaluation of some existing SAT
algorithms. Finally, we provide a set of practical applications of the
satisfiability problems.
In this paper, we present a new global optimization method for designing
QMF (quadrature mirror filters) filter banks. We formulate the design
problem as a nonlinear constrained optimization problem, using the
reconstruction error as the objective, and the stopband ripple, stopband
energy, passband ripple, passband energy and transition bandwidth as
constraints. This formulation allows us to search for solutions that
improve with respect to the objective and that perform at least as well as
the best existing designs with respect to the constrained measures. We
present Novel, a global optimization method we have developed
for solving nonlinear continuous constrained optimization problems, and
apply it to find improved designs. We also show that relaxing the
constraints on transition bandwidth and stopband energy will lead to
significant improvements in the other performance measures.
In this paper, we present a new method to handle inequality constraints and
apply it in Novel (Nonlinear Optimization via External Lead), a system we
have developed for solving constrained continuous nonlinear optimization
problems. In general, in applying Lagrange-multiplier methods to solve
these problems, inequality constraints are first converted into equivalent
equality constraints. One such conversion method adds a slack variable to
each inequality constraint in order to convert it into an equality
constraint. The disadvantage of this conversion is that when the search is
inside a feasible region, some satisfied constraints may still carry a nonzero weight in the Lagrangian function, leading to possible oscillations
and divergence when a local optimum lies on the boundary of a feasible
region. We propose a new conversion method called the MaxQ method such
that all satisfied constraints in a feasible region always carry zero
weight in the Lagrange function; hence, minimizing the Lagrange function in
a feasible region always leads to local minima of the objective function.
We demonstrate that oscillations do not happen in our method. We also
propose methods to speed up convergence when a local optimum lies on the
boundary of a feasible region. Finally, we show improved experimental
results in applying our proposed method in Novel on some existing benchmark
problems and compare them to those obtained by applying the method based on
slack variables.
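For reference, the two conversions can be contrasted as follows, where the
exponent q is shown only as an assumption about the form of the transformation:

    slack variable:  g_i(x) <= 0   becomes   g_i(x) + z_i^2 = 0
    MaxQ (sketch):   g_i(x) <= 0   becomes   max(0, g_i(x))^q = 0,   q > 1

Inside the feasible region, where g_i(x) <= 0, the MaxQ term and its gradient
vanish, so the constraint carries zero weight in the Lagrangian; the
slack-variable term, in contrast, can remain nonzero for the current value of z_i.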
In this paper, we study various global optimization methods for designing
QMF (quadrature mirror filter) filter banks. We formulate the design
problem as a nonlinear constrained optimization problem, using the
reconstruction error as the objective, and other performance metrics as
constraints. This formulation allows us to search for designs that improve
over the best existing designs. We present Novel, a global optimization
method for solving nonlinear continuous constrained optimization problems.
We show that Novel finds better designs than simulated annealing
and genetic algorithms in solving QMF benchmark design problems. We also
show that relaxing the constraints on transition bandwidth and stopband
energy leads to significant improvements in the other performance measures.
Weighted maximum satisfiability problems (MAX-SAT) are difficult to solve
due to the large number of local minima in their search space. In this
paper, we propose a new discrete Lagrangian-based search method (DLM) for
solving these problems. Instead of restarting from a new point when the
search reaches a local minimum, the Lagrange multipliers in DLM provide a
force to lead the search out of the local minimum and move it in a
direction provided by the multipliers. Since DLM has very few parameters
to be tuned, it can be made deterministic and the results, reproducible.
We compare DLM with GRASP in solving a large set of test problems, and show
that it finds better solutions and is substantially faster. DLM has a
solid theoretical foundation that can be used as a systematic approach for
solving general discrete optimization problems.
In this paper, we present a new discrete Lagrangian optimization method for
designing multiplierless QMF (quadrature mirror filter) filter banks. In
multiplierless QMF filter banks, filter coefficients are expressed in powers-of-two
(PO2) form, in which numbers are represented as sums or differences of powers of two
(also called the Canonical Signed Digit, or CSD, representation), and
multiplications can be carried out as additions, subtractions, and shifts.
We formulate the design problem as a nonlinear discrete constrained
optimization problem, using the reconstruction error as the objective, and
other performance metrics as constraints. One of the major advantages of
this formulation is that it allows us to search for designs that improve
over the best existing designs with respect to all performance metrics,
rather than finding designs that trade one performance metric for another.
We show that our design method can find designs that improve over
Johnston's benchmark designs using a maximum of three to six ONE bits in
each filter coefficient.
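The powers-of-two idea can be illustrated by the greedy Python sketch below,
which approximates a single coefficient as a sum or difference of a few signed
powers of two; a real CSD encoder and the constrained search over all filter
coefficients are considerably more involved.

    import math

    def po2_approx(coef, max_terms=4, min_exp=-10):
        """Greedily subtract the signed power of two nearest to the residual.
        Illustrative only; not the design procedure of the paper."""
        terms, residual = [], float(coef)
        for _ in range(max_terms):
            if abs(residual) < 2.0 ** min_exp:
                break
            e = math.floor(math.log2(abs(residual)))
            # pick 2^e or 2^(e+1), whichever is nearer to |residual|
            if abs(abs(residual) - 2.0 ** (e + 1)) < abs(abs(residual) - 2.0 ** e):
                e += 1
            term = math.copysign(2.0 ** max(e, min_exp), residual)
            terms.append(term)
            residual -= term
        return terms, sum(terms)

    print(po2_approx(0.718, max_terms=3))   # -> ([0.5, 0.25, -0.03125], 0.71875)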
Nonlinear constrained optimization problems can be solved by a
Lagrange-multiplier method in a continuous space or by its extended
discrete version in a discrete space. These methods rely on gradient
descents in the objective space to find high-quality solutions, and
gradient ascents in the Lagrangian space to satisfy the constraints.
The balance between descents and ascents depends on the relative weights
between the objective and the constraints that indirectly control the
convergence speed and solution quality of the method. To improve
convergence speed without degrading solution quality, we propose an
algorithm to dynamically control these relative weights. Starting from
an initial weight, the algorithm automatically adjusts the weights based
on the behavior of the search progress. With this strategy, we are able
to eliminate divergence, reduce oscillations, and speed up convergence.
We show improved convergence behavior of our proposed algorithm on both
nonlinear continuous and discrete problems.
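A purely illustrative Python sketch of such a weight-control rule is given
below; the actual conditions, monitoring windows, and adjustment factors used
by the proposed algorithm are not reproduced here.

    def adjust_weight(w, viol_history, window=10, shrink=0.5, grow=1.1):
        """Illustrative rule: if the maximum constraint violation has not decreased
        over the last `window` probes, shrink the objective weight w so ascents on
        the constraints dominate; otherwise grow w slightly to speed up descent on
        the objective.  Constants are assumptions, not the paper's values."""
        if len(viol_history) < window:
            return w
        recent = viol_history[-window:]
        if recent[-1] >= recent[0]:          # no progress toward feasibility
            return w * shrink
        return w * grow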
In this paper, we address three important issues in applying Lagrangian
methods to solve optimization problems with inequality constraints. First,
we propose a MaxQ method that transforms inequality constraints to
equality constraints. It overcomes divergence and oscillations that
occur in the slack-variable method. Some strategies to speed up its
convergence are also examined. Second, we develop a method to monitor
the balance between descents in the original-variable space and ascents
in the Lagrange-multiplier space in Lagrangian methods. During the
search, we adjust this balance adaptively in order to improve
convergence speed. Third, we introduce a nonlinear traveling trace to
pull a search trajectory out of a local equilibrium point in a continuous
fashion without restarting the search and without losing information
already obtained in the local search. This strategy extends existing
Lagrangian methods from a local search of equilibrium points to a global
search. We implement these three strategies in Novel (Nonlinear
Optimization via External Lead) and apply it to find improved solutions
for a collection of benchmark problems.
In this paper, we study performance measures for designing two-band
perfect reconstruction (PR) linear phase (LP) filter banks. Based on
these performance metrics, we formulate the design problem as a
nonlinear constrained optimization problem in which some metrics, such as
stopband energy, have closed forms, whereas others, such as transition
width, do not. Our formulation allows us to search for designs that
improve over existing designs. More importantly, given user-specified
performance bounds such as maximal transition width, we are able to
design filter banks whenever solutions exist, and trade-offs among
different performance metrics can be achieved easily. Finally, extensive
experimental results show the feasibility and efficiency of our
filter-bank design method.
In this paper, we propose and evaluate an efficient multiaccess protocol
for cell-based wireless networks. Our protocol addresses two problems in
existing random-access protocols for wireless networks: long-term as well as
short-term fairness in accessing a shared channel, and the detection
of hidden and exposed collisions. Our proposed protocol is a limited-contention
protocol in which the set of contending mobiles is chosen based on a global
common contention window maintained by every mobile station.
The contention window is adjusted based on three possible
channel states: no transmission, success, and collision.
We assume that the channel state at the end of each contention
slot is broadcast by a base station in a control channel.
We show analytically that the time interval between two successive
accesses to the channel by any station is geometrically distributed,
and that each station has an equal chance to access the channel in every
contention period. This is significantly better than existing
random-access protocols based on the binary exponential backoff
algorithm, which results in large variances in inter-access delays.
Our experimental results also show that the number of contention slots
to resolve collisions is constant on the average, independent of the number
of contending stations.
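One plausible window-update rule in the spirit of limited-contention protocols
is sketched below in Python; the specific adjustment factors and the treatment
of the success state are assumptions, not the rules analyzed in the paper.

    def update_window(cw, state, cw_min=1, cw_max=1024):
        """Illustrative update of the global contention window shared by all
        mobiles, driven by the base station's feedback on the last slot."""
        if state == "idle":          # no transmission: too few contenders admitted
            cw = max(cw_min, cw // 2)
        elif state == "collision":   # too many contenders admitted
            cw = min(cw_max, cw * 2)
        # on "success" the window is left unchanged in this sketch
        return cw

    cw = 16
    for s in ["collision", "idle", "success", "idle"]:
        cw = update_window(cw, s)
    print(cw)   # -> 8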
In this paper, we present a new global-search method
for designing QMF (quadrature-mirror-filter) filter banks. We
formulate the design problem as a nonlinear constrained
optimization problem, using the reconstruction error as the
objective, and the other performance metrics as constraints.
This formulation allows us to search for designs that improve
over the best existing designs. Due to the nonlinear nature
of the performance metrics, the design problem is a nonlinear
constrained optimization problem with many local minima.
We propose to solve this design problem using global-search
methods based on Lagrangian formulations.
After transforming the original constrained optimization
problem into an unconstrained form using Lagrange multipliers,
we apply a new global-search method to find good solutions.
The method consists of a coarse-level global-search phase, a fine-level
global-search phase, and a local search phase, and is
suitable for parallel computation due to the minimal dependency
between various key components.
In our experiments, we show that
our method finds better designs than existing global-search methods,
including simulated annealing and genetic algorithms.
The design of filter banks in subband image coding is critical for
achieving high image quality. In this paper, we study the design from
both the signal processing domain and the theory of wavelets.
We formulate the design of filter banks as a two-stage nonlinear constrained
optimization problem, each stage of which is solved by sequential quadratic
programming (SQP). Using a wavelet image-coding prototype, we show improved
quality of the designed filter banks in terms of compression ratios and peak
signal-to-noise ratios (PSNRs).
Recently, we have proposed the discrete Lagrange-multiplier
method (DLM) to solve satisfiability problems.
Instead of restarting from a new starting point when the search reaches
a local minimum in the objective space, the Lagrange multipliers of violated
constraints in DLM provide a force to lead the search out of the local
minimum and move it in a direction provided by the multipliers.
In this paper, we present the theoretical foundation of DLM
for solving SAT problems and discuss some implementation issues.
We study the performance of DLM on a set of hard satisfiability
benchmark instances, and show the importance of dynamic scaling
of Lagrange multipliers and the flat-move strategy.
We show that DLM can perform better than competing local-search methods
when its parameters are selected properly.
This paper studies the design of data transformation algorithms for audio data
transmitted over the Internet, with a goal of reconstructing the original signals
by the receiver with little distortion in the presence of bursty packet losses.
It assumes that a single audio stream is interleaved into multiple packets, and a lost
sample at the receiver is reconstructed as the interpolation of adjacent samples received.
We propose a non-redundant transformation-based reconstruction
algorithm that can minimize the reconstruction error for any fixed,
interpolation-based reconstruction algorithm.
Its basic idea is that the sender, before sending the data packets, transforms the
input audio stream optimally with respect to the reconstruction method used at the receiver.
Consequently, the receiver is able to recover much better from losses
of packets than without any knowledge of what the signals should be.
In particular, we study our transformation algorithm based on
one popular linear interpolation-based reconstruction algorithm.
We found that our scheme can improve the signal-to-noise
ratio (SNR) by 1 to 2 dB with very little extra computational effort as compared
to the scheme without transformation.
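The receiver-side setting assumed above can be illustrated by the short Python
sketch below, which interleaves a stream into two packets and, when the even
packet is lost, reconstructs each missing sample as the average of its received
neighbors; the sender-side optimal transform, the contribution of the paper, is
not reproduced here.

    def interleave(samples):
        """Split one audio stream into two packets of even- and odd-indexed samples."""
        return samples[0::2], samples[1::2]

    def reconstruct_even_lost(odd):
        """Conceal a lost even packet: estimate each missing sample as the average
        of its received odd-indexed neighbors (simple linear interpolation)."""
        out, prev = [], odd[0]
        for cur in odd:
            out.append((prev + cur) / 2.0)   # estimate of the missing even sample
            out.append(cur)
            prev = cur
        return out

    even, odd = interleave([1, 2, 3, 4, 5, 6])
    print(reconstruct_even_lost(odd))        # -> [2.0, 2, 3.0, 4, 5.0, 6]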
Real-time video streaming over the Internet requires robust
delivery mechanisms with low overhead. Traditional error control
schemes are not attractive because they either add redundant information
that may worsen network traffic, or rely solely on the inadequate capability
of the decoder to do error concealment.
As sophisticated concealment techniques cannot be employed in a real-time
software playback scheme, we propose in this paper a simple yet
efficient transformation-based error concealment algorithm.
The algorithm applies a linear transformation to the original video signals,
with the objective of minimizing the mean squared error if missing information
were restored by simple averaging at the destination.
We also describe two strategies to cope with error
propagation in temporally differentially coded frames.
Experimental results show that our proposed transformation-based reconstruction
algorithm performs well in real Internet tests.
In this paper, we present efficient trap-escaping strategies in a search
based on discrete Lagrange multipliers to solve difficult SAT problems.
Although a basic discrete Lagrangian method (DLM) can solve most of the
satisfiable DIMACS SAT benchmarks efficiently, a few of the large benchmarks
have eluded solutions by any local-search methods today. These difficult
benchmarks generally have many traps that attract local-search trajectories.
We identify a trap as a point at which any change to a single
variable causes the Lagrangian value to increase.
Using the hanoi4 and par16-1 benchmarks, we illustrate
that some unsatisfied clauses are trapped more often than others.
Since it is too difficult to remember explicitly all the traps encountered,
we propose to remember these traps implicitly by giving larger increases
to Lagrange multipliers of unsatisfied clauses that are trapped more often.
We illustrate the merit of this new update strategy by solving
some of the most difficult but satisfiable SAT benchmarks in the DIMACS archive (hanoi4,
hanoi4-simple, par16-1 to par16-5, f2000,
and par32-1-c to par32-3-c). Finally, we apply the same algorithm to
improve on the solutions of some benchmark MAX-SAT problems that we solved before.
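A small Python sketch of the trap-weighted multiplier update follows; the
increment sizes and bookkeeping are illustrative, not the tuned values used in
the experiments.

    def escape_trap(lam, unsat_clauses, trap_count, base_inc=1.0, extra=0.1):
        """When a trap is detected, give larger multiplier increases to clauses
        that have appeared in traps more often, implicitly steering the search
        away from regions where it has been trapped before."""
        for i in unsat_clauses:
            trap_count[i] = trap_count.get(i, 0) + 1
            lam[i] += base_inc + extra * trap_count[i]
        return lam, trap_count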
In this paper, we present constrained simulated annealing (CSA),
a global minimization algorithm that converges to constrained global
minima with probability one, for solving nonlinear discrete non-convex constrained
minimization problems.
The algorithm is based on the necessary and sufficient condition for constrained local
minima in the theory of discrete Lagrange multipliers we developed earlier.
The condition states that the set of discrete saddle points is the same as the set of
constrained local minima when all constraint functions are non-negative.
To find the discrete saddle point with the minimum objective value, we model
the search by a finite inhomogeneous Markov chain that carries out (in an annealing
fashion) both probabilistic descents of the discrete Lagrangian function
in the original-variable space and probabilistic ascents in the
Lagrange-multiplier space.
We then prove the asymptotic convergence of the algorithm to constrained
global minima with probability one.
Finally, we extend CSA to solve nonlinear constrained problems with continuous
variables and those with mixed (both discrete and continuous) variables.
Our results on a set of nonlinear benchmarks are much better than those reported by others.
By achieving asymptotic convergence, CSA is one of the major
developments in nonlinear constrained global optimization today.
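A minimal Python sketch of the search structure follows, assuming
Metropolis-style acceptance for descents in the x space and ascents in the
lambda space under one decreasing temperature; the neighborhood generation,
acceptance rules, and cooling schedule of the published algorithm are
simplified here.

    import math, random

    def csa(L, neighbor_x, neighbor_lam, x, lam,
            T0=10.0, alpha=0.95, iters_per_T=100, T_min=1e-3):
        """Sketch of constrained simulated annealing: probabilistic descents of
        the Lagrangian L(x, lam) in x and probabilistic ascents in lam."""
        T = T0
        while T > T_min:
            for _ in range(iters_per_T):
                if random.random() < 0.5:                   # trial move in x space
                    x_new = neighbor_x(x)
                    d = L(x_new, lam) - L(x, lam)
                    if d <= 0 or random.random() < math.exp(-d / T):
                        x = x_new                           # probabilistic descent
                else:                                       # trial move in lam space
                    lam_new = neighbor_lam(lam)
                    d = L(x, lam_new) - L(x, lam)
                    if d >= 0 or random.random() < math.exp(d / T):
                        lam = lam_new                       # probabilistic ascent
            T *= alpha
        return x, lam

In a call, L evaluates the discrete Lagrangian of the problem at hand, and
neighbor_x and neighbor_lam generate trial points in the respective subspaces.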
In this paper we present a Lagrange-multiplier formulation of discrete constrained optimization problems,
the associated discrete-space first-order necessary and sufficient conditions for saddle points, and an
efficient first-order search procedure that looks for saddle points in discrete space.
Our new theory provides a strong mathematical foundation for
solving general nonlinear discrete optimization problems.
Specifically, we propose a new vector-based definition of descent directions in discrete space and
show that the new definition does not obey the rules of calculus in continuous space.
Starting from the concept of saddle points and using only vector calculus, we then prove
the discrete-space first-order necessary and sufficient conditions for saddle points.
Using well-defined transformations on the constraint functions,
we further prove that the set of discrete-space
saddle points is the same as the set of constrained local minima, leading
to the first-order necessary and sufficient conditions for constrained local minima.
Based on the first-order conditions, we propose a local-search
method to look for saddle points that satisfy the first-order conditions.
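For reference, one standard statement of the discrete saddle-point condition on
which these results build is, writing L_d for the discrete Lagrangian and
N_d(x*) for the discrete neighborhood of x*:

    L_d(x*, lambda) <= L_d(x*, lambda*) <= L_d(x, lambda*)
    for all x in N_d(x*) and all lambda,

i.e., x* minimizes L_d over its discrete neighborhood at lambda*, while lambda*
maximizes L_d at x*.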
This paper improves constrained simulated annealing (CSA), a discrete global minimization
algorithm with asymptotic convergence to discrete constrained global minima with probability one.
The algorithm is based on the necessary and sufficient conditions for discrete
constrained local minima in the theory of discrete Lagrange multipliers.
We extend CSA to solve nonlinear continuous constrained
optimization problems whose variables take continuous values.
We evaluate many heuristics, such as dynamic neighborhoods, gradual resolution of nonlinear equality
constraints and reannealing, in order to greatly improve the efficiency of solving continuous problems.
We report much better solutions than the best-known solutions in
the literature on two sets of continuous optimization benchmarks.
In this paper, we present improved strategies in DLM-99-SAT to escape from traps and to select
proper parameter sets to use when applied to solve some difficult but satisfiable SAT problems.
One of the main issues in DLM-99-SAT is that it has a large number of tunable parameters,
making it difficult to determine the best parameters to use when given a new instance.
To this end, we propose a set of performance metrics and a set of rules using these
metrics that determine at run time the best parameters to use for a specific instance.
Our experimental results show that these rules are able to select the most
suitable parameter set for each instance with very little overhead.
Finally, we verify the performance of DLM-99-SAT by solving some benchmarks in SATLIB and
some of the most difficult but satisfiable DIMACS SAT problems, including par32-1-c to
par32-4-c and hanoi4-simple.
In this paper, we propose a new probabilistic sampling procedure and its application
in simulated annealing (SA).
The new procedure uses Bayesian analysis to evaluate the samples drawn so far and
generates the next sample from a density function constructed through this analysis. After
integrating our procedure in SA, we apply it to solve a set of optimization benchmarks.
Our results show that our proposed procedure, when used in SA, is very effective in
generating high-quality samples that are more reliable and robust in leading to global solutions.
In this paper, we formulate neural-network training as a constrained optimization
problem instead of the traditional formulation based on unconstrained optimization.
We show that constraints violated during a search provide additional force to help escape
from local minima using our newly developed constrained simulated annealing (CSA) algorithm.
We demonstrate the merits of our approach by training neural networks to solve the two-spiral problem.
To enhance the search, we have developed a strategy to adjust the gain factor of the activation function.
We show converged training results for networks with 4, 5, and 6 hidden units. Our work is
the first successful attempt to solve the two-spiral problem with 19 weights.
One fundamental problem with streaming audio and video data over unreliable
IP networks is that packets may be dropped or arrive too late for playback.
Traditional error control schemes are not attractive because they either add redundant information that may
worsen network traffic, or rely solely on the inadequate capability of the decoder to do error concealment.
In this paper, we propose a simple yet efficient transformation-based algorithm in order to conceal network
losses in streaming real-time audio and video data over the Internet.
On the receiver side, we adopt a simple reconstruction algorithm based on interpolation,
as sophisticated concealment techniques cannot be employed in software-based real-time playback.
On the sender side, we design a linear transformation with the objective of minimizing
the mean squared error, assuming that some of the descriptions are lost and that the missing
information is reconstructed by simple averaging at the destination.
We further integrate the transformations in case of video streaming in the
discrete cosine transform (DCT) to produce an optimized reconstruction-based DCT.
Experimental results show that our proposed algorithm performs well in real Internet tests.
In this paper, we present an efficient global-search strategy in an algorithm based
on the theory of discrete Lagrange multipliers for solving difficult SAT instances.
These difficult benchmarks generally have many traps and basins that attract local-search trajectories.
In contrast to trap-escaping strategies proposed earlier that only focus on traps, we propose a
global-search strategy that penalizes a search for visiting points close to points visited before in
the trajectory, where penalties are computed based on the Hamming distances between the current and
historical points in the trajectory. The new strategy specializes to the earlier trap-escaping strategies
because when a trajectory is inside a trap, its historical information will contain many points in close
vicinity to each other. It is, however, more general than trap escaping because it tries to avoid
visiting the same region repeatedly even when the trajectory is not inside a trap.
By comparing our results to existing results in the area, we conclude that our proposed strategy
is both effective and general.
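A small Python sketch of the distance-based penalty follows; the penalty
shape, radius, and weight are illustrative, and in the algorithm the penalty
augments the discrete Lagrangian value that guides the search.

    def distance_penalty(x, history, radius=4, weight=1.0):
        """Penalize points within a Hamming radius of previously visited points,
        steering the trajectory toward unexplored regions of the space."""
        def hamming(a, b):
            return sum(ai != bi for ai, bi in zip(a, b))
        return weight * sum(max(0, radius - hamming(x, h)) for h in history)

    # Augmented value used during the search (L is the discrete Lagrangian):
    #   L_aug(x, lam) = L(x, lam) + distance_penalty(x, recent_history)
    print(distance_penalty([1, 0, 1, 1], [[1, 0, 1, 0], [0, 1, 0, 0]], radius=3))  # -> 2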
One fundamental problem with streaming video data over unreliable IP
networks is that packets may be dropped or arrive too late for real-time playback.
Traditional error-control schemes are not attractive because they
either add redundant information that may worsen network traffic, or rely
solely on decoders with inadequate error concealment. This paper presents
a joint sender-receiver approach in designing transforms for multiple-description
coding in order to conceal network losses in streaming real-time video
over the Internet. On the receiver side, we adopt a simple
interpolation-based reconstruction, as sophisticated concealment techniques
cannot be employed in software-based real-time playback. On the sender
side, we design an optimized reconstruction-based discrete cosine transform
(ORB-DCT) with the objective of minimizing the mean squared error,
assuming that some of the descriptions are lost and that the missing
information is reconstructed by simple averaging at the destination.
Experimental results show that our proposed ORB-DCT performs better than
the original DCT in real Internet tests. Future research includes finding
a perceptual-based quantization matrix based on extended basis images derived
for reconstruction, and incorporating the effects of quantization and
inverse quantization in the design.
In this paper, we formulate the training of artificial neural networks for
time-series prediction as a constrained optimization problem, instead of the
traditional formulation as an unconstrained optimization. A constrained
formulation is better because violated constraints can provide more guidance
during a search than the traditional sum-of-squared error of an unconstrained formulation.
Using a constrained formulation, we propose to add constraints on validation
errors in order to monitor the prediction quality during training of an ANN.
We then solve the constrained problem using our newly developed constrained simulated
annealing algorithm (CSA). Experimental results on the sunspot and laser time series
show that constrained formulations on training and cross-validation, when solved by CSA,
lead to better prediction quality with fewer weights than previous designs.
In this paper we propose an optimal anytime version of constrained simulated
annealing (CSA) for solving constrained nonlinear programming problems (NLPs).
One of the goals of the algorithm is to generate feasible solutions of certain prescribed
quality using an average time of the same order of magnitude as that spent by the original CSA with
an optimal cooling schedule in generating a solution of similar quality. Here, an optimal
cooling schedule is one that leads to the shortest average total number of probes when the original CSA
with the optimal schedule is run multiple times until it finds a solution.
Our second goal is to design an anytime version of CSA that generates gradually improving feasible
solutions as more time is spent, eventually finding a constrained global minimum (CGM).
In our study, we have observed a monotonically non-decreasing function relating the success
probability of obtaining a solution and the average completion time of CSA, and an exponential
function relating the objective target that CSA is looking for and the average completion time.
Based on these observations, we have designed CSA_{AT-ID}, the anytime CSA with iterative deepening
that schedules multiple runs of CSA using a set of increasing cooling schedules and a set of improving
objective targets.
We then prove the optimality of our schedules and demonstrate
experimentally the results on four continuous constrained NLPs.
CSA_{AT-ID} can be generalized to solving discrete, continuous, and mixed-integer
NLPs, since CSA is applicable to problems in all three classes.
Our approach can also be generalized to other stochastic search algorithms, such as genetic
algorithms, and be used to determine the optimal time for each run of such algorithms.
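The iterative-deepening schedule can be sketched in Python as follows,
assuming minimization, a geometric growth factor for the length of the cooling
schedule, and a multiplicative tightening of the objective target; the optimal
constants derived in the paper are not reproduced here, and run_csa is a
hypothetical wrapper that runs CSA once and reports (best value, feasible).

    def csa_at_id(run_csa, target0, n0=1000, grow=2.0, tighten=0.9, max_rounds=20):
        """Anytime CSA with iterative deepening (illustrative schedule): run CSA
        with geometrically longer cooling schedules; whenever a run meets the
        current objective target with a feasible solution, tighten the target."""
        n, target, best = n0, target0, None
        for _ in range(max_rounds):
            value, feasible = run_csa(int(n), target)
            if feasible and value <= target:    # assumes minimization, positive values
                best = value
                target = tighten * value        # ask later runs for a better solution
            n *= grow                           # deepen: longer cooling schedule
        return best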
This paper presents a problem-independent framework that unifies various
mechanisms for solving discrete constrained nonlinear programming
(NLP) problems whose functions are not necessarily differentiable or
continuous. The framework is based on the first-order necessary and sufficient
conditions in the theory of discrete constrained optimization using
Lagrange multipliers. It implements the search for discrete-neighborhood
saddle points by performing descents in the original-variable subspace
and ascents in the Lagrange-multiplier subspace. Our study on the
various mechanisms shows that CSAGA, a combined constrained simulated
annealing and genetic algorithm, performs well. Finally, we apply iterative
deepening to determine the optimal number of generations in CSAGA.
Real-time audio and video data streamed over unreliable IP networks,
such as the Internet, may encounter losses due to dropped packets or
late arrivals. This paper reviews error-concealment schemes developed for
streaming real-time audio and video data over the Internet. Based on
their interactions with (video or audio) source coders, we classify existing
techniques into source coder-independent schemes that treat underlying
source coders as black boxes, and source coder-dependent schemes that
exploit coder-specific characteristics to perform reconstruction. Finally, we
identify possible future research directions.
This paper presents a framework that unifies various search
mechanisms for solving constrained nonlinear programming (NLP)
problems. These problems are characterized by functions that are
not necessarily differentiable or continuous. Our proposed
framework is based on the first-order necessary and sufficient
condition for constrained local minimization in discrete space
that shows the equivalence between discrete-neighborhood saddle
points and constrained local minima. To look for discrete-neighborhood
saddle points, we formulate a discrete constrained
NLP in an augmented Lagrangian function and study various
mechanisms for performing descents of the augmented function in
the original-variable subspace and ascents in the Lagrange-
multiplier subspace. Our results show that CSAGA, a combined
constrained simulated annealing (SA) and genetic algorithm (GA),
performs well. Finally, we apply iterative deepening to
determine the optimal number of generations in CSAGA and show
that performance is robust with respect to changes in population size.
Time-series predictions by artificial neural networks (ANNs) are traditionally formulated as
unconstrained optimization problems. As an unconstrained formulation provides little guidance on
search directions when a search gets stuck in a poor local minimum, we have recently proposed
to use a constrained formulation in which constraint violations provide
additional guidance. In this paper, we formulate ANN learning
with cross-validations for time-series predictions as a non-differentiable nonlinear constrained
optimization problem. Based on our theory of Lagrange multipliers for discrete constrained
optimization, we propose an efficient learning algorithm, called violation guided back-propagation
(VGBP), that computes an approximate gradient using back-propagation (BP), that introduces annealing
to avoid blind acceptance of trial points, and that applies a relax-and-tighten (R&T) strategy
to achieve faster convergence. Extensive experimental results on well-known benchmarks,
when compared to previous work, show one to two orders of magnitude improvement in
prediction quality, while using fewer weights.
Subband-coded images can be transmitted in the Internet using either the TCP or the UDP protocol.
Delivery by TCP gives superior decoding quality but with very long
delays when the network is unreliable, whereas delivery by UDP has
negligible delays but with degraded quality when packets are lost.
Although images are currently delivered over the Internet by TCP, we study in this paper the use
of UDP to deliver multi-description reconstruction-based subband-coded images.
First, in order to facilitate recovery from UDP packet losses, we propose a
joint sender-receiver approach for designing
optimized reconstruction-based subband transform (ORB-ST)
in multi-description coding (MDC).
Second, we carefully evaluate the delay-quality trade-offs between the TCP
delivery of single-description coded (SDC) images and the UDP and combined TCP/UDP delivery of MDC images.
Experimental results show that our proposed ORB-ST performs well in real Internet
tests, and UDP and combined TCP/UDP delivery of MDC images provide a range of
attractive alternatives to TCP delivery.
In this paper, we develop new constrained artificial-neural-network (ANN) formulations and learning
algorithms to predict future stock prices, a difficult time-series prediction problem.
Specifically, we characterize stock prices as a non-stationary noisy time series,
identify its predictable low-frequency components, develop strategies to predict
missing low-frequency information in the lag period of a filtered time series,
model the prediction problem by a recurrent FIR ANN, formulate the training
problem of the ANN as a constrained optimization problem, develop new constraints
to incorporate the objectives in cross validation, solve the learning problem using
algorithms based on the theory of Lagrange multipliers for nonlinear discrete constrained
optimization, and illustrate our prediction results on three stock time series.
There are two main contributions of this paper. First, we present a new approach
to predict missing low-pass data in the lag period when low-pass filtering is applied
on a time series. Such predictions allow learning to be carried out more accurately.
Second, we propose new constraints on cross validation that can improve significantly
the accuracy of learning in a constrained formulation. Our experimental results
demonstrate good prediction accuracy over a 10-day horizon.
Subband-coded images can be transmitted in the Internet using either the TCP or the UDP protocol.
Because TCP employs congestion control and retransmissions, and because UDP and TCP packets
are treated differently by routers, TCP incurs much longer delays than UDP in delivering an image,
but packet losses in UDP may lead to poor decoding quality if the image is single-description coded
(SDC) and the losses cannot be concealed.
In this paper, we study the use of UDP to deliver multi-description coded (MDC) reconstruction-based
subband-transformed (ST) images and the reconstruction of missing information at a receiver
based on information received.
To facilitate recovery from UDP packet losses, we propose a joint sender-receiver approach for
designing optimized reconstruction-based subband transform (ORB-ST) in MDC.
We then carefully evaluate the delay-quality trade-offs between the TCP
delivery of SDC images and the UDP and combined TCP/UDP delivery of MDC images.
Experimental results show that our proposed ORB-ST performs well in real Internet tests, and UDP and
combined TCP/UDP delivery of MDC images provide a range of attractive alternatives to TCP delivery.
A fundamental issue in real-time interactive voice transmissions over
unreliable IP networks is the loss or late arrival of packets for playback.
Such losses cannot be recovered by retransmissions due to tight time
constraints in interactive applications.
This problem is especially serious in transmitting low bit-rate coded speech when pervasive
dependencies are introduced in a bit stream, leading to the loss of subsequent
dependent frames when a single packet is lost or arrives late.
In this paper, we propose a novel LSP-based multiple-description coding method
that adapts its number of descriptions to network loss conditions in order to conceal
packet losses in transmitting low-bit-rate coded speech over lossy packet networks.
Based on the high correlations observed in the linear-predictor parameters, in the form of Line Spectral
Pairs (LSPs), of adjacent frames, we generate multiple descriptions at the sender by interleaving
LSPs, and reconstruct lost LSPs at the receiver by linear interpolation.
Without increasing the transmission bandwidth, our scheme represents a trade-off
between the quality of received packets and the ability to reconstruct lost packets.
Our experimental results on FS CELP show good performance.
Real-time production systems and other dynamic environments often generate tremendous
(potentially infinite) amounts of stream data; the volume of data is too large to be stored
on disks or scanned multiple times. Can we perform on-line, multi-dimensional analysis and
data mining of such data to alert people about dramatic changes of situations and to initiate
timely, high-quality responses? This is a challenging task.
In this paper, we propose new dominance relations that can
speed up significantly the solution process of nonlinear constrained
dynamic optimization problems in discrete time and space.
We first show that path dominance in dynamic programming cannot be
applied when there are general constraints that span across multiple stages,
and that node dominance, in the form of Euler-Lagrange conditions developed in optimal
control theory in continuous space, cannot be extended to that in discrete space.
This paper is the first to propose efficient dominance relations,
in the form of local saddle-point conditions in each stage of a problem,
for pruning states that will not lead to locally optimal paths.
By utilizing these dominance relations, we develop efficient search algorithms
whose complexity, though still exponential, has a much smaller base as compared to
that without using the relations. Finally, we demonstrate the performance
of our algorithms on some spacecraft planning and scheduling benchmarks and show
significant improvements in CPU time and solution quality as compared to those obtained
by the existing ASPEN planner.
In this paper, we propose new dominance relations that can speed up significantly
the solution process of planning problems formulated as nonlinear constrained dynamic
optimization in discrete time and space.
We first show that path dominance in dynamic programming cannot be applied when there are
general constraints that span across multiple stages, and that node dominance, in the form
of Euler-Lagrange conditions developed in optimal control theory in continuous space,
cannot be extended to that in discrete space.
This paper is the first to propose efficient node-dominance relations, in the form of local
saddle-point conditions in each stage of a discrete-space planning problem, for pruning
states that will not lead to locally optimal paths.
By utilizing these dominance relations, we present efficient search algorithms whose complexity,
though still exponential, has a much smaller base as compared to that without using the relations.
Finally, we demonstrate the performance of our approach by integrating it in the ASPEN
planner and show significant improvements in CPU time and solution quality on some
spacecraft scheduling and planning benchmarks.