Project - Voice Activated eBlock

Team Members: Eric Frohnhoefer
CS 179J: Senior Design Project in Architecture / Embedded Systems


Proposals

Project Description:
We will design a new eBlock that recognizes speech, or possibly a clap, from the user and uses it to decide whether the eBlock outputs a logical “yes” or a logical “no.” For more background information about eBlocks click on this site. The eBlock will interface with the common eBlock protocol, which can be found here, and will fit within our pooled budget of approximately one hundred and fifty dollars. It will be a speaker-independent system, meaning that multiple users can operate it without training.

Small Group Option #1

The Plan:
We want to create a voice eBlock that can interface with the rest of the eBlock set. This eBlock would take voice commands from either anyone or a specified user, depending on the mode selected. In normal mode, the block would listen for a specific two-word phrase from any user. The first word would be its “name,” which is programmed by the user or set through DIP switches. Once the device hears its name, it awaits a command: either On (device outputs a “Yes”) or Off (device outputs a “No”). Because each device is named, the user does not need to worry about multiple voice eBlocks interfering with each other. All of this is done with no training and no need for a PC.

In security mode, the device does the same thing, but only for a specified user. That user would first have to train the block on his or her voice; after that, the block would function only for that user. Some security measures will be put in place to prevent someone from retraining the block to circumvent security mode.

The range of the device would be approximately 3-4 meters with a slightly raised voice. Background noise, within reason, would not affect the device.
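
As a rough illustration of the normal-mode behavior described above (name first, then On or Off), here is a minimal Python sketch. The class name `VoiceBlock`, the example names, and the power-up output of "No" are assumptions made for illustration only; the real block would receive its words from the speech recognition hardware rather than from strings.

```python
class VoiceBlock:
    """Hypothetical model of normal mode: the block acts only on "<name> on/off"."""

    def __init__(self, name):
        self.name = name.lower()
        self.awaiting_command = False  # True right after the block hears its name
        self.output = "No"             # assumed power-up state (not stated in the plan)

    def hear(self, word):
        """Feed one recognized word and return the current output."""
        word = word.lower()
        if self.awaiting_command and word == "on":
            self.output = "Yes"
        elif self.awaiting_command and word == "off":
            self.output = "No"
        # The block waits for a command only immediately after its own name.
        self.awaiting_command = (word == self.name)
        return self.output


# Two blocks within earshot of the same speaker.
lamp = VoiceBlock("lamp")
fan = VoiceBlock("fan")
for spoken in ["lamp", "on"]:
    lamp.hear(spoken)
    fan.hear(spoken)
```

Because each block ignores commands that do not follow its own name, only `lamp` changes its output in this run, which is the non-interference property the plan relies on.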

Possible Features (Time permitting):

Trade-off:
The voice eBlock would be a much more "exciting" design: more versatile, potentially more attractive in the market, and a greater personal achievement. However, the voice eBlock will be very complicated. I'm not sure we'll have enough time to properly research and implement something like this. Additionally, this design approach would put us well over budget due to the cost of development tools.

Small Group Option #2

The Plan:
This type of eBlock responds to sharp sounds, namely claps. It would be able to interface fully with other existing eBlocks. Operation would be very simple. The eBlock would recognize a specific number of claps. Once an eBlock hears the right number of claps, it would invert its output: if it was outputting a "Yes" when it hears the right number of claps, it would output a "No," and vice versa. Because eBlocks are distinguishable by their number of claps, multiple eBlocks can be placed in close proximity. In this way, a user could control one eBlock without worrying about inadvertently triggering other eBlocks within earshot. The number of claps would be set by DIP switches, so no programming or PC is required.
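
The clap-counting behavior can be sketched in a few lines of Python. This is only an illustration: the plan does not say how claps are grouped, so the sketch assumes that claps no more than `max_gap` seconds apart belong to one burst, and the function names and the 0.5 second default are hypothetical.

```python
def group_claps(times, max_gap=0.5):
    """Split sorted clap timestamps (seconds) into bursts; return claps per burst."""
    counts = []
    last = None
    for t in times:
        if last is not None and t - last <= max_gap:
            counts[-1] += 1      # close enough to the previous clap: same burst
        else:
            counts.append(1)     # long silence (or first clap): start a new burst
        last = t
    return counts


def run_clapper(times, required_claps, output="No", max_gap=0.5):
    """Invert the output once for every burst that has exactly the required claps."""
    for count in group_claps(times, max_gap):
        if count == required_claps:
            output = "Yes" if output == "No" else "No"
    return output


# One burst of two claps, then one burst of three claps.
claps = [0.0, 0.3, 2.0, 2.3, 2.6]
two_clap_block = run_clapper(claps, required_claps=2)
four_clap_block = run_clapper(claps, required_claps=4)
```

Here `required_claps` stands in for the DIP switch setting, so a block set to four claps is unaffected by bursts meant for its neighbors.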

Trade-off:
A clapper-type eBlock would be much simpler and easier to implement. The components are also cheaper and easier to work with. However, a clapper eBlock would not be very "exciting." Additionally, it does not meet our goal of producing a voice-activated eBlock.


Project Dilemmas:
An initial problem we had as separate groups was that our project budgets were too low to purchase any of the design kits needed to build the speech recognition eBlock around a speech recognition IC. A design kit would come complete with a speech recognition IC, design boards, programs, etc., which would have made the development process much easier. This made doing the project in individual groups of two with development boards nearly impossible. Our time constraints also led us to scrap the idea of building a custom speech recognition eBlock by programming the PIC microprocessor directly. This method would have required more time than we have. The design procedure would include learning about speech patterns and recognition, subjects to which entire quarter-long classes are devoted. After that, we would have needed to program the entire method ourselves using equations and the Fast Fourier Transform. These techniques would have been too taxing on the PIC, since they need many multiply operations and intensive computation, which would have made the design too slow in responding to the user.
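
To give a feel for the computational load mentioned above, the following Python sketch estimates the number of real multiplications a textbook radix-2 FFT needs. The formula, (N/2)·log2(N) complex multiplies at four real multiplies each, is the standard upper-bound count; the 8 kHz sample rate and 256-point frame size are assumed values for illustration, not figures from this project.

```python
def fft_real_multiplies(n_points):
    """Real multiplies for one radix-2 FFT of n_points (must be a power of two)."""
    stages = n_points.bit_length() - 1            # log2(n_points) for a power of two
    if n_points < 2 or 2 ** stages != n_points:
        raise ValueError("n_points must be a power of two")
    complex_multiplies = (n_points // 2) * stages
    return complex_multiplies * 4                 # 4 real multiplies per complex multiply


def multiplies_per_second(sample_rate_hz, n_points):
    """Multiply rate when frames are processed back to back with no overlap."""
    frames_per_second = sample_rate_hz / n_points
    return fft_real_multiplies(n_points) * frames_per_second


# Assumed audio parameters, for illustration only.
per_frame = fft_real_multiplies(256)
per_second = multiplies_per_second(8000, 256)
```

Under these assumptions the FFT alone needs on the order of a hundred thousand multiplications every second, before any recognition logic runs, which is why a processor without a fast multiplier would respond too slowly.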

Reasoning for larger groups:
To complete our task of designing the clapper or voice recognition eBlock, we propose making the clapper eBlock first and then working together as one large group to complete the voice recognition eBlock if time permits. This is a compromise that addresses our time constraints. It also gives us a failsafe: if we are not able to complete the voice recognition eBlock, we would still have the clapper eBlock to turn in. Our group could pool its budgets to purchase the development kit, which exceeded our previous individual group budgets of fifty dollars. We have come up with two possible paths to follow if we continue with this idea.

Large Group Option #1

The Plan:
Since we plan on doing the entire project as a group, one possible path is to have one group of two work on finishing a clapper eBlock, which would be simpler and easier to develop than the voice recognition eBlock. The remaining members of the large group can begin working on the more complicated and time-consuming voice recognition eBlock using the development kit. For example, one group of two can research the development kits and decide which one is best to purchase, while another group learns as much as possible about eBlocks. We would then share all of this information with each other toward the end. When the group of two finishes the clapper eBlock, it can present the eBlock to the rest of the group to bring everyone up to date. With this method, we will already know how to interface our eBlocks with the eBlock protocol from the initial run with the clapper eBlock. The clapper eBlock will be our initial prototype for communicating with the other eBlocks.

Trade-off:
The benefits of this method are that we make better use of our budget by buying only enough parts for one clapper eBlock, while the rest can go toward the voice recognition eBlock. We would be able to spend more man-hours on the voice recognition eBlock and waste less time on unnecessary clapper eBlocks. Each smaller group would also have specific tasks that contribute to our overall goal. The drawbacks are that we would look less productive as a group, and there is the possibility that we will have nothing to turn in besides one clapper eBlock. Another drawback is that no one except the original group of two would have an intimate knowledge of the clapper eBlock. That group of two would also be behind everyone else in knowledge of the voice recognition eBlock and would need some time to catch up.

Large Group Option #2

The Plan:
The second path would be similar to the first, except that each group of two would build its own variation of the clapper eBlock. After every group had finished its clapper eBlock, we would all work together to develop the voice recognition eBlock. Again, we would divide into smaller groups with specific tasks in order to complete the voice recognition eBlock. This would still allow us to pool our remaining budgets, after the initial spending on the clapper eBlocks, in order to purchase a development kit.

Trade-off:
If we follow this path, we would look more productive as a group because we could turn in three clapper eBlocks, and everyone in the group can start on the voice recognition eBlock at the same time. Another benefit is that everyone will be familiar with the clapper eBlock, because every group of two had to build its own. We also make better use of our time by dividing up specific tasks and helping each other out. The drawback is that we will have three clapper eBlocks with only slight variations among them. We would spend more money on developing two additional eBlocks that we did not really need. This path also amounts to re-inventing the wheel, since each group repeats work that has already been done.


Technology Trade-Off Analysis for Voice eBlock (also see: Processor comparison sheet):

Each comparison below gives the technology, a technology description, cons, pros, and the design decision.

Speaker Independent vs. Speaker Dependent

Description:
Speaker-independent voice recognition requires no training on the part of the user, whereas speaker-dependent recognition requires some initial training.

Cons:

Speaker Independent:
  • Cannot be used for security applications.
  • Harder to implement in software (more complicated).
  • Less accurate (97% on most ICs)

Speaker Dependent:
  • Requires training (more complicated for user)
  • More memory required to hold training

Pros:

Speaker Independent:
  • Anyone can use it.
  • Ease of use
  • Less complicated (no initial programming)

Speaker Dependent:
  • Can be used for security purposes.
  • More accurate (99% on most ICs)

Design Decision:
We decided to go with the speaker-independent technology because the product will be easier for the consumer to use. In addition, the time-to-market will not be greatly affected if we go with the ASSP (Application Specific Standard Part) option. In fact, it may be shorter because we would not have to worry about the extra logic and programming required for voice training.
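
The accuracy gap matters a little more than the single-word figures suggest, because a command is a two-word phrase (name, then On or Off). As a rough worked example, the Python sketch below treats the quoted 97% and 99% as per-word accuracies and assumes the two words are recognized independently; both are simplifying assumptions, not figures from an IC datasheet.

```python
def phrase_accuracy(word_accuracy, words=2):
    """Chance that every word in a phrase is recognized, assuming independent errors."""
    return word_accuracy ** words


independent = phrase_accuracy(0.97)  # speaker-independent figure from the comparison
dependent = phrase_accuracy(0.99)    # speaker-dependent figure from the comparison
```

Under these assumptions a two-word command succeeds about 94% of the time with speaker-independent recognition and about 98% of the time with speaker-dependent recognition.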

Software Recognition vs. Integrated Circuit Recognition

Description:
IC recognition uses an IC to handle voice recognition in addition to a general-purpose processor for control functions. Software recognition, in contrast, uses a general-purpose processor to handle voice recognition as well as control functions.

Cons:

Software Recognition:
  • Complicated to program, much more complex software.
  • High NRE and maintenance costs.
  • Requires additional processing power. (Requires a more powerful processor with large multiplier)
  • Software development is time consuming.

IC Recognition:
  • Higher per unit costs (approx. $15-20 for low volumes).
  • Additional cost of development kit. (Approx. $100-150)
  • Additional power consumption from extra IC.
  • Less customization possible

Pros:

Software Recognition:
  • Cheaper in high volumes (able to amortize cost)
  • Less hardware
  • Possibly lower power consumption

IC Recognition:
  • Simpler to program, simpler software.
  • Lower NRE cost (using off-the-shelf part)
  • Requires a less powerful processor because of simplified duties.
  • Shortened development time.

Design Decision:
The decision was made to go with IC recognition due to the short time-to-market constraint. This is done at the expense of the power and cost constraints. However, we may be able to shut the PIC down when no words are being recognized, saving us some power.
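
The "cheaper in high volumes" point can be made concrete with a simple break-even calculation. The Python sketch below uses the kit price and per-unit IC cost quoted in this document; the software NRE figure of $10,000 is purely hypothetical, and the model ignores the more powerful processor that software recognition would need.

```python
import math


def break_even_units(software_nre, kit_cost=129.0, ic_unit_cost=15.0):
    """Smallest production volume at which software recognition costs less overall.

    IC route total:       kit_cost + ic_unit_cost * units
    Software route total: software_nre (one-time; no added per-unit cost assumed)
    """
    if software_nre <= kit_cost:
        return 0  # software is cheaper from the very first unit
    return math.floor((software_nre - kit_cost) / ic_unit_cost) + 1


# Hypothetical software NRE of $10,000, at both ends of the quoted IC price range.
low_ic_price = break_even_units(10_000.0, ic_unit_cost=15.0)
high_ic_price = break_even_units(10_000.0, ic_unit_cost=20.0)
```

For a class project that builds only a handful of blocks, any realistic NRE puts the break-even volume far out of reach, which is consistent with choosing the IC route.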

Unique Block vs. Common Block

Description:
Can multiple blocks be controlled independently when in close proximity? The unique block has an additional word to differentiate it from other blocks, whereas with the common block there would be no way to differentiate between different voice blocks.

Cons:

Unique Block:
  • More complicated to program
  • Requires more hardware.
  • More complicated for the user.
  • Possibly more expensive because of more complicated design.

Common Block:
  • Less functional (cannot have more than one voice block in proximity)

Pros:

Unique Block:
  • Possible to have more than one voice block in close proximity without interference.
  • More control over functions

Common Block:
  • Less complicated design
  • Possibly cheaper (because of less complicated design)

Design Decision:
We decided to go with the unique block because it allows us to add more flexibility to the product without making the product extremely difficult to use. As far as engineering goes, we would have more complex control software and additional hardware (DIP switches).
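
One way the DIP switches could select a block's name is to treat the switch positions as a binary index into a table of names. The Python sketch below illustrates the idea only; the name list, the two-switch width, and the function `name_from_dip` are all hypothetical and are not part of the eBlock design.

```python
# Hypothetical table of block names; the real word list would depend on the chosen IC.
NAMES = ["alpha", "bravo", "charlie", "delta"]


def name_from_dip(switches):
    """Map DIP switch positions (True = on, most significant switch first) to a name."""
    index = 0
    for is_on in switches:
        index = (index << 1) | int(bool(is_on))  # build the binary index bit by bit
    if index >= len(NAMES):
        raise ValueError("switch setting has no assigned name")
    return NAMES[index]


first_block = name_from_dip([False, False])
third_block = name_from_dip([True, False])
```

With n switches, up to 2^n blocks could be told apart in the same room, at the cost of the extra hardware noted above.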

Continuous Listening vs. Non-Continuous Listening

Description:
With continuous listening, the voice block would listen constantly for commands, whereas the non-continuous block would only start to listen when given some “yes” input, such as a button press from a button block.

Cons:

Continuous Listening:
  • Requires more power.

Non-Continuous Listening:
  • Some input required to “alert” block.
  • More complicated (because of button and additional logic)

Pros:

Continuous Listening:
  • No input required for action. (More convenient)
  • Increased usability

Non-Continuous Listening:
  • Requires less power.

Design Decision:
We decided to go with continuous listening mode; however, time permitting, we may add a low-power mode so that both are possible.

Multiple Word Recognition vs. Isolated Word Recognition

Description:
With multiple word recognition, small phrases are possible, with more natural speech, whereas with isolated word recognition, the user would speak more slowly and be able to use only single words.

Cons:

Multiple Word Recognition:
  • Requires more complicated software
  • More effort for the device to get acclimated to the user.

Isolated Word:
  • Unnatural speech
  • More effort on the part of the user to get acclimated to the device.

Pros:

Multiple Word Recognition:
  • Natural speech
  • Easier for user to command block.

Isolated Word:
  • Requires simpler software
  • Easier for block to understand user.

Design Decision:
We decided to go with multiple word recognition because it allows for more natural speech, which would make the block easier to use. Currently, most voice ASSPs support multiple word recognition.

Cost:

$129 for development kit and speech hardware.
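
As a quick check that this cost fits the pooled budget, the Python sketch below works through the arithmetic. The count of three groups is inferred from the three clapper eBlocks and the roughly $150 pool mentioned earlier, and `clapper_parts` is a hypothetical placeholder because no clapper parts cost is given in this document.

```python
def remaining_budget(groups=3, per_group=50.0, kit_cost=129.0, clapper_parts=0.0):
    """Money left after pooling the group budgets and buying the kit and clapper parts."""
    return groups * per_group - kit_cost - clapper_parts


after_kit = remaining_budget()                              # kit only
after_kit_and_clapper = remaining_budget(clapper_parts=25.0)  # hypothetical parts cost
```

The kit alone fits within the pool with a small margin, so the amount spent on clapper eBlocks decides whether the voice eBlock stays within budget.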

Decisions:
No decisions have been finalized yet. We're still waiting to see how tomorrow's (Saturday) meeting with the whole class goes. Either way, there's a good chance that both designs will be implemented in some way, and depending on how the quarter goes, the two designs may merge into a new design at the end.