Statistical yield models:
Application of the Drake Equation:
- N = candidates that satisfy basic parameters
- R* = the growth rate of candidates in regions that satisfy the [r] criteria (detailed below)
- ƒp = the fraction of candidates in regions that satisfy the [r] criteria
- ne = the fraction of candidates that actually like me back
- ƒi = the fraction of candidates that satisfy the intellectual criteria [i] (detailed later on)
- ƒc = the fraction of candidates I am able to make contact with
- L = the length of time I have been releasing detectable signals into cyberspace
Solving for N gives me approximately 45 candidates within a roughly 120-mile radius who:
1. Satisfy the age condition
2. Fit the locational criteria [r]
3. Are mutually attracted
4. Fit the intellectual criteria [i]
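The factor product above can be sketched in a few lines. Every fraction below is a hypothetical placeholder chosen only so the arithmetic lands near N ≈ 45; they are not the actual figures used in the estimate.

```python
# A minimal sketch of the Drake-style yield estimate above. All fractions
# are hypothetical placeholders, not the real numbers behind N ~= 45.

def candidate_yield(pool, f_region, f_mutual, f_intellect, f_contact):
    """Multiply a base pool by each filtering fraction, Drake-style."""
    n = pool
    for fraction in (f_region, f_mutual, f_intellect, f_contact):
        n *= fraction
    return n

# Hypothetical inputs: a base pool of 25,000 people, successively filtered.
estimate = candidate_yield(25_000, f_region=0.30, f_mutual=0.10,
                           f_intellect=0.20, f_contact=0.30)
print(round(estimate))  # 45
```

The point is less the specific numbers than the structure: each criterion multiplies the pool down, so the harshest fractions dominate the final yield.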
Some notes: integrating n(t) → 25,000, together with the two-thirds rule for standard deviation (≈68% of a normal population falls within one σ), let me guesstimate the number of candidates who would be physically appealing. A further integration was used to find the number of intelligent candidates. The Triton-eyed equation is discussed briefly as well.
(The chance works out to about 1.2% when N ≈ 45; optimal stopping/CSP not applied.)
A brief introduction to optimal stopping and the Classic Secretary Problem (CSP)
Optimal stopping deals with choosing when to act so as to maximize expected yield at minimum cost.
The CSP is a famous example of a finite-horizon problem in optimal stopping. In this scenario, the optimal stopping rule is used to maximize the chance of finding the best possible secretary among n applicants.
- There is one position available
- The candidates are totally ordered from best to worst with no ties.
- The candidates arrive sequentially in random order.
- We can only determine the relative ranks of the candidates as they arrive. We cannot observe the absolute ranks.
- Our goal is to choose the very best candidate; no one less will do. The second-best candidate is of no more value to us than the worst.
- Once a candidate is rejected, she is gone forever and cannot be recalled.
- The number of candidates n is known
- Strategy: reject the first M−1 applicants out of N, then accept the first later applicant who is better than everyone seen so far; write the success probability as P(M,N)
- We want the M that maximizes P(M,N); under this rule the best applicant can only be accepted at a position K with M−1 < K ≤ N
- For M > 1: P(M,N) = ((M−1)/N) × Σ from K=M to N of 1/(K−1)
- P(N,N) = 1/N (skipping all but the last applicant leaves a 1/N chance)
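The cutoff rule can be checked numerically. A minimal sketch, assuming the standard look-then-leap formulation: `p_success` computes P(M,N) exactly with rational arithmetic, then we scan for the best M at N = 45 (the yield from the estimate above).

```python
from fractions import Fraction

def p_success(m, n):
    """P(M,N): probability that rejecting the first M-1 applicants and then
    taking the first one better than all seen so far selects the very best."""
    if m == 1:
        return Fraction(1, n)  # accept the first applicant: 1/n chance
    return Fraction(m - 1, n) * sum(Fraction(1, k - 1) for k in range(m, n + 1))

n = 45  # the N from the Drake-style estimate above
best_m = max(range(1, n + 1), key=lambda m: p_success(m, n))
print(best_m)                       # 17, i.e. reject the first 16 (~ n/e)
print(float(p_success(best_m, n)))  # ~0.375, near the classic 1/e limit
```

The optimal cutoff sits at roughly n/e rejected applicants, and the success probability hovers just above 1/e ≈ 0.37 regardless of n.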
Update: I have reverse-engineered OkC's platform in order to understand how the matching process works. Some admittedly unethical things were done -- I feel a bit dirty, but all is well. Using optimization methods, I was able to substantially enhance my profile - I think. Profile views have increased by 600% (heh) and messages are starting to get answered.
No mathematical formula or algorithm has been developed for good conversation, however. Yet.
Let's not get ahead of ourselves with that, lest desperation set in.
Lexicology and match-analysis
I'll write the explanation later I swear.
[insert stuff about ZCTA's/readability index/match-making/Flesch here]
[charts n' stuff 4 analysis]
Readability index/match algorithm show positive correlation! YAY!
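Since the write-up above is still a placeholder, here is a minimal sketch of the Flesch reading-ease score it refers to. The syllable counter is a crude vowel-group heuristic, not a dictionary-accurate one, and the sample sentence is invented.

```python
import re

def count_syllables(word):
    """Crude heuristic: count vowel groups, dropping one trailing silent 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    if word.lower().endswith("e") and len(groups) > 1:
        return len(groups) - 1
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Flesch score: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

print(round(flesch_reading_ease("I like math. Math likes me back."), 1))
```

Higher scores mean easier reading (90+ is roughly 5th-grade level), which is what makes the index usable as a profile-text feature to correlate against match percentages.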
- Search algorithm
- Aggregate score formula
- What networks gather the most relevant variables?
- Modes of conversation and data acquisition
- Heuristic optimization
So computational algorithms:
OK, before I lose myself in the subject of search trees and how I would implement a searching algorithm, let's look at my personality traits as well as my personal characteristics:
Male, [age], [location], [personality]
Variables interested in:
- Female, age range, location, personality
- point system derived from the aggregate point formula: points / no. of tests
- where gender =/= female: void
- no variable system for point aggregation yet
Data extrapolated from OkCupid tells me I am:
- Less optimistic
- Less political
- Less energetic
- Less artsy
- Less capitalistic
- Less trusting
So... binary trees now?
In this scenario, graphical trees provide a visual representation of probable pathways. Since we are not operating on a dating site that offers profiles (and therefore keywords), it is not possible to extrapolate data for potential matches; it is, however, possible to extrapolate keywords and phrases from conversation. By constructing some generic sentence fragments for conversation, a handy algorithm might compute the 'networks' that associate the most potential matches. This is not really 'true' computer programming (actually not really computer programming at all - I'm not into the whole A.I. girlfriend thing anymore), but it does establish a framework to work from and apply to statistical information that can be interpreted.
For a simpler explanation:
      o
    / | \
   x  x  x
[finding the best route to get to 'o' using a search method derived from statistical information about each 'x' network]
what exactly is a 'network'?
A network in this case is similar to a topological representation of a computer-network link; a reductionist example would be a social network like Facebook, Tumblr, or even Last.fm. A holistic statistical analysis of these tree networks aims to find the best maximum-yield networks using the point-aggregation method to be established at a later date.
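A toy sketch of that idea: score each 'x' network with an aggregate point value, then walk greedily toward the highest-scoring branch. All network names and scores below are made up for illustration.

```python
# Hypothetical sketch: each 'x' network gets an aggregate score, and we walk
# greedily toward the highest-scoring branch. All names and scores are made up.
networks = {
    "last.fm":  {"score": 42, "children": {"indie": 18, "metal": 7}},
    "tumblr":   {"score": 35, "children": {"art": 22, "memes": 9}},
    "facebook": {"score": 28, "children": {"groups": 12}},
}

def best_path(networks):
    """Greedy walk: pick the best root network, then its best sub-network."""
    root = max(networks, key=lambda name: networks[name]["score"])
    children = networks[root]["children"]
    return root, max(children, key=children.get)

print(best_path(networks))  # ('last.fm', 'indie')
```

A greedy walk is the simplest possible search here; with real data a proper tree search could weigh whole paths rather than one level at a time.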
what makes you so appealing?
Thinking like a travelling salesman and narrowing down the best possible street - somewhat similar to the travelling salesman problem (TSP) using Euclidean minimum spanning trees (MSTs) - I would need a derived topological architecture along with a framework for minimization, and I would need to think like a salesman in the sense that he is charismatic as well as keen. A good idea might be to implement a form of 'people-SEO' to increase my prominence. In the world of 'picking up chicks' (not trying to be misogynist, but just stop reading if you don't like where this is going) this would be similar to 'befriending' an 'ugly girl' to get her as your 'wingman' and get in with her 'hot' friends. Not a politically correct explanation, but it is succinct. To return to what was said about extrapolating data from conversation, this type of 'people-SEO' would require some insight into the art of conversation along with some real detective work.
I did some research into the art and 'algorithm' of conversation when programming amandabot (codenamed amanda for short). The architecture was built so that the program could grasp a basic understanding of conversational dialogue. For example, I assigned the introduction starter "START" to any conversation opener (e.g. "hello!", "how are you doing?", "What's up?"); these are preassigned, so they vary very little. By preassigning conversational phrases up to roughly the 5th degree (think of it as a Sierpinski triangle -- see below), conversations become more and more interesting as they take on a comprehensible format. The actual architecture behind amandabot is more complex, but I think that's a pretty easy-to-follow explanation. The conversational architecture was based on a Bayesian statistical format for artificial intelligence. Anyhow, I digress...
       1
     / | \
    2  2  2
  / \ / \ / \
side note: Don't be carried away with theoretical computer science yet - this should remain as simple as possible yet as pragmatic as possible (not really dealing with routes but actual people).
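In that spirit, a deliberately simple sketch of the preassigned-phrase idea: each conversational state maps to canned replies, with a fallback for anything unrecognized. All phrases and state names here are hypothetical, not amandabot's actual dialogue tables.

```python
import random

# Toy sketch of amandabot-style dialogue: each state maps to preassigned
# replies, as described above. All phrases and states are hypothetical.
DIALOGUE = {
    "START": ["hello!", "how are you doing?", "what's up?"],
    "greeting-reply": ["not bad, you?", "pretty good!"],
    "topic-opener": ["seen any good films lately?", "what music are you into?"],
}

def respond(state):
    """Pick a canned reply for the current conversational state."""
    return random.choice(DIALOGUE.get(state, ["hmm, tell me more."]))

print(respond("START"))           # one of the preassigned openers
print(respond("unknown-state"))   # falls back to a generic prompt
```

Adding more degrees just means adding more states, which is exactly why the phrase tables branch like a Sierpinski triangle.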
This all sounds great, but what is a real example of how you would use this algorithm?
The website Omegle allows you to talk to 'random strangers'; when you start a chat without adding your 'interests', however, you are likely to meet '17-year-old girls' offering to 'get on cam' for you rather than a real person. By adding an interest - for example, "calculus" - you are more likely to meet a 20-year-old undergraduate student. A plausible implementation would go further: research interests (an option on the site allows you to add Facebook interests) and gather statistics on which popular interests yield the maximum number of potential 'candidates'. This is a simplistic explanation (there are other variables involved: timezone, location, etc.).
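The interest-selection step reduces to a tiny argmax. The chat counts below are invented purely for illustration; real numbers would have to come from logging actual sessions.

```python
# Hypothetical logs: interest -> (chats started, chats that turned into a
# real conversation). All numbers are invented for illustration.
outcomes = {
    "calculus": (40, 18),
    "music":    (120, 21),
    "anime":    (90, 12),
}

def best_interest(outcomes):
    """Pick the interest with the highest hit rate (hits / chats)."""
    return max(outcomes, key=lambda i: outcomes[i][1] / outcomes[i][0])

print(best_interest(outcomes))  # 'calculus' (45% hit rate)
```

Note that raw volume is misleading: "music" produces the most chats, but "calculus" has the best hit rate, which is what actually matters.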
Obviously having a friend from across the globe is nice, but barely talking to them because of the time difference would be a problem. So what does this mean? Stake out! A method of naturalistic observation allows data to be gathered at different intervals. By observing UTC−8 from 9 a.m. to 11 p.m., and deviating toward 11 a.m. to 2 a.m., I predict an increase in the variables that correlate with location.
In summation, people who live close by and keep the same sleeping habits are more likely to log on to the aforementioned networks at similar time intervals.
OkCupid's algorithm for matchmaking:
- Data acquisition matches like-like as well as avoids conflicting answers
- OkCupid's aggregate formula targets what a person answers and what they want their match to answer as well as how important the questions are.
- Taking the geometric mean accounts for both directions of the match: the nth root of (your match percentage for them × their match percentage for you), where n is the number of questions answered in common
- Irrelevant: 0
- A little important: 1
- Somewhat important: 10
- Very important: 50
- Mandatory: 250
What does this mean? Answering questions you "really care about" is more efficient.
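The scheme above can be sketched directly, using the importance point values from the list. This is a simplification, with invented questions and answers: the geometric mean here is the square root of the two directional scores, whereas the description above takes the nth root over the n questions answered in common.

```python
# Hedged sketch of the matching scheme above; question texts and answers
# are invented for illustration.
POINTS = {"irrelevant": 0, "a little important": 1, "somewhat important": 10,
          "very important": 50, "mandatory": 250}

def satisfaction(prefs, their_answers):
    """Fraction of importance points the other person earns on your questions.
    prefs maps question -> (desired answer, importance level)."""
    earned = possible = 0
    for question, (wanted, importance) in prefs.items():
        pts = POINTS[importance]
        possible += pts
        if their_answers.get(question) == wanted:
            earned += pts
    return earned / possible if possible else 0.0

def match_percent(prefs_a, answers_a, prefs_b, answers_b):
    """Geometric mean of the two directional scores (simplified: the real
    formula takes the nth root over the n questions answered in common)."""
    s_ab = satisfaction(prefs_a, answers_b)  # how well B satisfies A
    s_ba = satisfaction(prefs_b, answers_a)  # how well A satisfies B
    return 100 * (s_ab * s_ba) ** 0.5

prefs_a = {"smoke?": ("no", "mandatory"), "cats?": ("yes", "a little important")}
answers_a = {"smoke?": "no", "cats?": "yes"}
prefs_b = {"smoke?": ("no", "very important")}
answers_b = {"smoke?": "no", "cats?": "no"}
print(round(match_percent(prefs_a, answers_a, prefs_b, answers_b), 1))  # 99.8
```

The 250-point mandatory weight is why answering only the questions you "really care about" moves the score most: one missed mandatory question swamps dozens of minor agreements.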
Some relevant courses: