Project - Voice Activated eBlock
Team Members: Eric Frohnhoefer
CS 179J: Senior Design Project in Architecture / Embedded
Systems
Project Description:
We will design a new eBlock that recognizes speech, or possibly a clap, from the
user and outputs a logical “yes” or a logical “no” in response. For more
background information about eBlocks, see the eBlocks site. The eBlock will
interface with the common eBlock protocol, described in the linked protocol
specification, and will fit within our pooled budget of approximately one
hundred and fifty dollars. It will be a speaker-independent system, so multiple
users can operate it without training it for each voice.
Small Group Option #1
The Plan:
We want to create a voice eBlock that can interface with the rest of the
eBlock set. This eBlock would take voice commands from either everyone
or a specified user, depending on the mode selected. In normal mode, the
block would listen for a specific two-word phrase from any user. The
first word would be its “name”, which is programmed by the user or set
through DIP switches. Once the device hears its name, it awaits a
command, either ON (device outputs a “Yes”) or OFF (device outputs a
“No”). Because each device is named, the user does not need to worry
about multiple voice eBlocks interfering with each other. All this is
done with no training and no need for a PC.
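The normal-mode behavior described above can be sketched as a two-state listener. This is only a rough model, not eBlock firmware: the class `VoiceBlock`, the method `hear`, and the power-up output of "no" are our own assumptions, and the recognizer is assumed to deliver one word at a time.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()              # waiting to hear the block's name
    AWAITING_COMMAND = auto()  # name heard, waiting for "on" or "off"


class VoiceBlock:
    """Hypothetical model of the normal-mode listener (not real eBlock firmware)."""

    def __init__(self, name):
        self.name = name.lower()  # would be programmed or set by DIP switches
        self.state = State.IDLE
        self.output = "no"        # assumed power-up output

    def hear(self, word):
        """Feed one recognized word; return the current output ("yes" or "no")."""
        word = word.lower()
        if self.state is State.IDLE:
            if word == self.name:
                self.state = State.AWAITING_COMMAND
        else:
            if word == "on":
                self.output = "yes"
            elif word == "off":
                self.output = "no"
            # Any second word, valid or not, ends the two-word phrase.
            self.state = State.IDLE
        return self.output


lamp = VoiceBlock("lamp")
fan = VoiceBlock("fan")
for word in ["lamp", "on"]:
    lamp.hear(word)
    fan.hear(word)
print(lamp.output, fan.output)  # yes no
```

Because the block named "fan" never hears its own name, the phrase "lamp on" changes only the first block, which is the interference-free behavior we are after.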
In security mode, the device does the same thing, except only for a specified user. That user would first train the block to recognize his or her voice, after which the block would respond only to that user. Security measures would be put in place so that retraining cannot be used to circumvent security mode.
The device's range would be approximately 3-4 meters with a slightly raised voice. Background noise, within reason, should not affect the device.
Possible Features (Time permitting):
Trade-off:
The voice eBlock would be a much more "exciting" design, more versatile, and
potentially more attractive in the market, in addition to being a greater
personal achievement. However, the voice eBlock will be very complicated. I'm
not sure we'll have enough time to properly research and implement something
like this. Additionally, this design approach would put us way over budget due
to the cost of development tools.
Small Group Option #2
The Plan:
This type of eBlock responds to sharp sounds, namely claps. It would be able to
interface fully with other existing eBlocks. Operation would be very simple. The
eBlock would recognize a specific number of claps. Once an eBlock hears the
right number of claps, it would invert its output. If it was outputting a "Yes"
when it hears the right number of claps, it would output a "No", and vice versa.
Because eBlocks are distinguished by the number of claps, multiple eBlocks can
be placed in close proximity. In this way, a user could control one eBlock
without worrying about inadvertently triggering other eBlocks within earshot.
The number of claps would be set by DIP switches, so no programming or PC is
required.
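The clap-counting behavior above can be sketched as follows. This is a minimal model under our own assumptions: the class `ClapperBlock`, the 0.75 second window that separates one burst of claps from the next, and the power-up output of "no" are all illustrative values, not part of the design.

```python
class ClapperBlock:
    """Hypothetical clap-count toggle; timing values are illustrative."""

    def __init__(self, target_claps, window_s=0.75):
        self.target_claps = target_claps  # would come from the DIP switches
        self.window_s = window_s          # assumed max gap between claps in a burst
        self.output = "no"                # assumed power-up output
        self._count = 0
        self._last_clap = None

    def clap(self, t):
        """Register a clap detected at time t (seconds)."""
        if self._last_clap is not None and t - self._last_clap > self.window_s:
            self._finish_burst()          # previous burst ended before this clap
        self._count += 1
        self._last_clap = t

    def tick(self, t):
        """Call periodically; closes a burst once the window has elapsed."""
        if self._last_clap is not None and t - self._last_clap > self.window_s:
            self._finish_burst()
        return self.output

    def _finish_burst(self):
        # Invert the output only when the burst matches the configured count.
        if self._count == self.target_claps:
            self.output = "yes" if self.output == "no" else "no"
        self._count = 0
        self._last_clap = None


two = ClapperBlock(target_claps=2)
three = ClapperBlock(target_claps=3)
for t in (0.0, 0.3, 0.6):  # three quick claps heard by both blocks
    two.clap(t)
    three.clap(t)
print(two.tick(2.0), three.tick(2.0))  # no yes
```

Waiting for the burst to end before deciding is what keeps a block set to two claps from toggling partway through a three-clap burst meant for its neighbor.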
Trade-off:
A clapper-type eBlock would be much simpler and easier to implement. The
components are also cheaper and easier to work with. However, a clapper eBlock
would not be very "exciting". Additionally, it does not meet our goal of
producing a voice-activated eBlock.
Project Dilemmas:
One initial problem we had as separate groups was that our project budgets were
too low to purchase any of the design kits needed to build the speech
recognition eBlock around a speech recognition IC. A design kit would come
complete with a speech recognition IC, design boards, programs, etc., which
would have made the development process much easier. This made doing the
project in individual groups of two using development boards nearly impossible.
Our time constraints also ruled out building a custom speech recognition eBlock
purely by programming the PIC microcontroller. That approach would have
required too much time: we would first have had to learn about speech patterns
and recognition, a subject to which entire quarter-long classes are devoted.
After that, we would have needed to implement the entire method ourselves using
equations and the Fast Fourier Transform. These techniques would have been too
taxing for the PIC, since they require many multiplications and other intensive
computations that would have made the design respond too slowly to the user.
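To give a rough sense of the multiplication load mentioned above, the sketch below counts the real multiplications in a textbook radix-2 FFT. The function name `fft_real_multiplies` and the frame sizes are illustrative assumptions, not values from our design, and the count ignores the small savings from trivial twiddle factors.

```python
def fft_real_multiplies(n):
    """Real multiplications in a textbook radix-2 FFT of n points.

    Counts (n/2) * log2(n) butterflies, each with one complex multiply
    implemented as four real multiplies; n must be a power of two.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1      # log2(n) for a power of two
    butterflies = (n // 2) * stages
    return 4 * butterflies


for n in (64, 128, 256):             # assumed frame sizes, for illustration only
    print(n, fft_real_multiplies(n))
# 64 768
# 128 1792
# 256 4096
```

Each of these multiplications has to be repeated for every frame of audio, so on a small microcontroller where a multiply takes many instruction cycles the total quickly adds up.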
Reasoning for larger groups:
To complete our task of designing the clapper or voice recognition eBlock, we
plan to make the clapper eBlock first and then work together as one big group
to complete the voice recognition eBlock if time permits. This is a compromise
that fits our time constraints. It also gives us a failsafe: if we are not able
to complete the voice recognition eBlock, we will still have the clapper eBlock
to turn in. Our group could pool our budgets to purchase the development kit,
which exceeded our previous individual group budgets of fifty dollars. We have
come up with two possible paths to follow if we continue with this idea.
Large Group Option #1
The Plan:
Since we plan on doing the entire project as a group, one possible path is to
have one group of two work on finishing a clapper eBlock, which would be
simpler and easier to develop than the voice recognition eBlock. The remaining
people in the big group can begin working on the more complicated and
time-consuming voice recognition eBlock using the development kit. For example,
one group of two can research the development kits and decide which one is best
to purchase, while another group can learn as much as possible about eBlocks.
We would then relay all the information to each other towards the end. When the
group of two finishes the clapper eBlock, it can present the eBlock to the rest
of the group to bring everyone up to date. Using this method, we will already
know how to interface our eBlocks with the eBlock protocol from this initial
run with the clapper eBlock. The clapper eBlock will be our initial prototype
for communicating with the other eBlocks.
Large Group Option #2
The Plan:
Our second possible path would be similar to the first, except each group of
two would build its own variation of the clapper eBlock. After every group had
finished its variation, we would all work together to develop the voice
recognition eBlock. Again, we would divide into smaller groups with specific
tasks in order to complete the voice recognition eBlock. This would still allow
us to pool our remaining budgets, after the initial spending on the clapper
eBlocks, to purchase a development kit.
Trade-off:
If we follow this path, we would look more productive as a group because we
could turn in three clapper eBlocks, and everyone in the group could start on
the voice recognition eBlock at the same time. Another benefit is that everyone
will be familiar with the clapper eBlock because every group of two had to
build its own. We also make better use of our time by dividing up specific
tasks and helping each other out. The drawback of this path is that we will
have three clapper eBlocks with only slight variations among them. We would
spend money developing two additional eBlocks that we did not really need. This
path also reinvents the wheel by repeating work already done.
Technology Trade-Off Analysis for Voice eBlock (also see: Processor comparison sheet):
| Technology | Technology Description | Cons | Pros | Design Decision |
| --- | --- | --- | --- | --- |
| Speaker Independent vs. Speaker Dependent | Speaker independent voice recognition requires no training on the part of the user, whereas speaker dependent recognition requires some initial training. | Speaker Independent: Speaker Dependent: | Speaker Independent: Speaker Dependent: | We decided to go with the speaker independent technology because the product will be easier for the consumer to use. In addition, the time-to-market will not be greatly affected if we go with the ASSP (Application Specific Standard Part) option. In fact, it may be shorter because we would not have to worry about the extra logic and programming required for voice training. |
| Software Recognition vs. Integrated Circuit Recognition | Using an IC to handle voice recognition in addition to a general purpose processor for control functions, in contrast to using a general purpose processor to handle voice recognition as well as control functions. | Software Recognition: | Software Recognition: IC Recognition: | The decision was made to go with IC recognition due to the short time-to-market constraint. This is done at the expense of power and cost constraints. However, we may be able to shut the PIC down when no words are being recognized, saving us some power. |
| Unique Block vs. Common Block | Can multiple blocks be controlled independently when in close proximity? The unique block has an additional word to differentiate it from other blocks, whereas with the common block there would be no way to differentiate between different voice blocks. | Unique Block: Common Block: | Unique Block: Common Block: | We decided to go with the unique block because it adds flexibility to the product without making it extremely difficult to use. As far as engineering goes, we would have more complex control software and additional hardware (DIP switches). |
| Continuous Listening vs. Non-Continuous Listening | With continuous listening, the voice block would listen constantly for commands, whereas the non-continuous block would only start to listen when given some “yes” input, such as a button press from a button block. | Continuous Listening: Non-Continuous Listening: | Continuous Listening: Non-Continuous Listening: | We decided to go with continuous listening mode; however, time permitting, we may add a low-power mode so that both are possible. |
| Multiple Word Recognition vs. Isolated Word Recognition | With multiple word recognition, small phrases are possible, with more natural speech, whereas with isolated word recognition, the user would speak more slowly and be able to use only single words. | Multiple Word Recognition: Isolated Word: | Multiple Word Recognition: Isolated Word: | We decided to go with multiple word recognition because it allows for more natural speech, which would make the block easier to use. Currently, most voice ASSPs support multiple word recognition. |
Cost:
$129 for development kit and speech hardware.
Decisions:
No decisions have been finalized yet. We're still waiting to see how
tomorrow's (Saturday) meeting with the whole class goes. Either way, there's
a good chance that both designs will be implemented in some way, and
depending on how the quarter goes, the two designs may merge into a new
design at the end.