Archive for the ‘uncategorized’ Category

Investigation of discrete statistical models for dyadic homophily in stochastic actor-based networks

Sunday, December 29th, 2013

Image: Erdos generated network-p0.01.jpg


Statistical yield models:

Application of the Drake Equation:

N = R_{\ast} \cdot f_p \cdot n_e \cdot f_{\ell} \cdot f_i \cdot f_c \cdot L
  • N = candidates that satisfy basic parameters
  • R* = the growth rate in regions which satisfy [r] criteria (detailed below)
  • ƒp = the fraction of candidates in regions which satisfy [r] criteria
  • ne = the fraction of candidates that actually like me back
  • ƒi = the fraction of candidates that satisfy the intellectual criteria [i] as detailed later on
  • ƒc = fraction of candidates that I am able to make contact with
  • L = the length of time for me to release detectable signals into cyberspace
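As a rough sketch of the arithmetic (Python, with placeholder numbers since the actual inputs are not listed here), N is just the product of the factors. The dictionary keys and values below are assumptions for illustration only; f_l appears in the formula but is not defined in the list, so it is set to 1.0 to leave N unchanged.

```python
from math import prod

def drake_estimate(factors):
    """Multiply the Drake-style factors together to get N."""
    return prod(factors.values())

# Placeholder values only: these are NOT the numbers used in the post.
example_factors = {
    "R_star": 1000.0,  # growth rate in regions meeting [r]
    "f_p": 0.5,        # fraction of candidates in regions meeting [r]
    "n_e": 0.1,        # fraction that like me back
    "f_l": 1.0,        # in the formula but undefined in the list; 1.0 is neutral
    "f_i": 0.2,        # fraction meeting intellectual criteria [i]
    "f_c": 0.5,        # fraction I can make contact with
    "L": 1.0,          # length of time releasing signals
}

print(drake_estimate(example_factors))
```

Swapping in real estimates for each factor is all it takes to reproduce a figure like the one below.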

Solving for N gives me approximately 45 candidates within an approximate 120 mile radius who:

1. Satisfy the age condition
2. Fit locational criteria [r]
3. Share a mutual attraction
4. Fit intellectual criteria [i]
5. Are available

Some notes: Integrating n(t) (--> 25,000), together with the 2/3rds rule for standard deviation, allowed me to guesstimate the number of candidates who would be physically appealing. Further integration was used to find the number of intelligent candidates. The Triton-eyed equation is discussed briefly as well.

(Chance being about 1.2% when N => 45; Optimal stopping/CSP not applied)


A brief introduction to the Optimal Stopping Classic Secretary Problem (CSP)

Optimal stopping deals with choosing the moment to act so as to maximize the expected yield or minimize the expected cost.

Definition of the problem

The CSP is a famous example of a finite-horizon optimal stopping problem. In this scenario, an optimal stopping rule maximizes the chance of picking the best possible secretary out of n applicants, under the following assumptions:

    1. There is one position available
    2. The candidates are totally ordered from best to worst with no ties.
    3. The candidates arrive sequentially in random order.
    4. We can only determine the relative ranks of the candidates as they arrive. We cannot observe the absolute ranks.
    5. Our goal is to choose the very best candidate; no one less will do. The second best candidate is of no more value to us than the worst candidate.
    6. Once a candidate is rejected, she is gone forever and cannot be recalled.
    7. The number of candidates n is known.

Rules and Criteria: UCLA

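  • Reject the first M-1 potential partners out of N applicants, then take the next one who beats all of them; the chance that this is the very best one is written P(M, N)
  • We want to find the M for which P(M, N) is largest. The sum runs over the positions M-1 < K ≤ N
  • Chance of success: P(M, N) = \sum_{K=M}^{N} \frac{M-1}{N(K-1)}
  • Edge case: P(N, N) = 1/N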


SO BASICALLY: meet the first 37% of the potential mates (in this instance: 37% of 45 is ~17) without committing, then pick the next one who is better than everyone seen so far.
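To sanity-check the 37% rule for a small pool, here is a minimal sketch that evaluates the standard cutoff-rule success probability for every possible cutoff and keeps the best one. The function names are mine, and `skip` is the number of candidates who are only observed (M-1 in the notation above).

```python
def success_probability(n, skip):
    """Chance of ending up with the single best of n candidates when the
    first `skip` are only observed and the next record-setter is chosen."""
    if skip == 0:
        return 1.0 / n  # no observation phase: we just take the first candidate
    return (skip / n) * sum(1.0 / j for j in range(skip, n))

def best_skip(n):
    """Return the (skip, probability) pair with the highest success chance."""
    return max(((s, success_probability(n, s)) for s in range(n)),
               key=lambda pair: pair[1])

skip, p = best_skip(45)
print(skip, round(p, 3))
```

For n = 45 the best cutoff should land right around the ~17 quoted above, with a success chance a little over 1/e.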

Image: StrategyLimit.png


Update: I have reverse engineered OkC's platform in order to understand how the matching process works. Some unethical things were done admittedly -- I feel a bit dirty, but all is well. Using optimization methods, I was able to essentially enhance my profile quite a bit - I think. Profile views have increased by 600% (heh) and messages are starting to get answered.

No mathematical formula or algorithm has been developed for good conversation, however. Yet.

Let's not get ahead of ourselves with that, lest it come across as desperation.


Lexicology and match-analysis 

I'll write the explanation later I swear.

[insert stuff about ZCTA's/readability index/match-making/Flesch here]


[charts n' stuff 4 analysis]

100.00% 32.9
96.00% 58.5
95.00% 70.6
95.00% 84
95.00% 80
92.00% 71
92.00% 83
91.00% 69
90.00% 96

Looks like:

[Screenshots from 2014-02-04: 17:37:30, 17:36:15, 18:51:55]





Readability index/match algorithm show positive correlation! YAY!


description omitted due to the nature of the intended use of this chart





Tentative Checklist

  1. Search algorithm
  2. Aggregate score formula
  3. What networks gather the most relevant variables?
  4. Modes of conversation and data acquisition
  5. Heuristic optimization

So computational algorithms:

OK, before I lose myself in the subject of search trees and how I would implement a searching algorithm, let's look at my personality traits as well as my personal characteristics:

Male, [age], [location], [personality]

Variables interested in:

  • Female, age range, location, personality
  • point system derived from aggregate point formula: points/no. tests
  • where gender =/= female: void
  • no var system for point aggregation yet

Data extrapolated from OkCupid tells me I am:

Dominant Traits:

  • Love-Driven
  • Scientific
  • Adventurous
  • Introverted
  • Romantic

Sub-dominant traits:

  • Less optimistic
  • Less political
  • Less energetic
  • Less artsy
  • Less capitalistic
  • Less trusting


So... binary trees now?

In this scenario, trees provide a visual representation of probable pathways. Since we are not operating on a dating site that offers profiles (meaning keywords), it is not possible to extrapolate data for potential matches directly; however, it is possible to extrapolate keywords and phrases from conversation. By constructing some generic sentence fragments for conversation, a handy algorithm might compute which 'networks' are associated with the most potential matches. This is not really 'true' computer programming (actually not really computer programming at all; I'm not into the whole a.i. girlfriend thing anymore), but it establishes a framework from which to work and applies it to statistical information that can be interpreted.

For a simpler explanation:

      / \
     x   x
    /   / \
   x   x   x

[finding the best route to get to 'o' using a search method derived from statistical information about each 'x' network]
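A minimal sketch of what such a search could look like, assuming each 'x' network already has a numeric score attached. The node names and scores below are made up, and plain depth-first search is just one reasonable choice, not a method settled on in this post.

```python
def best_path(tree, scores, node):
    """Return (total_score, path) for the highest-scoring route from
    `node` down to a leaf, found by depth-first search."""
    children = tree.get(node, [])
    if not children:
        return scores[node], [node]
    best_total, best_route = max(
        (best_path(tree, scores, child) for child in children),
        key=lambda result: result[0],
    )
    return scores[node] + best_total, [node] + best_route

# Hypothetical tree with the same shape as the drawing: two branches,
# the left with one child and the right with two.
tree = {"root": ["a", "b"], "a": ["a1"], "b": ["b1", "b2"]}
# Made-up scores standing in for per-network statistics.
scores = {"root": 0, "a": 2, "b": 1, "a1": 1, "b1": 4, "b2": 0}

total, path = best_path(tree, scores, "root")
print(total, path)
```

Note that the greedy first step ("a" scores higher than "b") is not the best overall route here, which is the reason for searching the whole tree.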

what exactly is a 'network'?

A network in this case is similar to a topological representation of a computer network link. A reductionistic explanation would be a social network like Facebook or Tumblr. A holistic statistical analysis of tree networks aims to find the best (highest-scoring) networks using the point aggregation method, which will be established at a later date.


what makes you so appealing?

With the mantra of thinking like a travelling salesman and narrowing down the best possible street (somewhat similar to the traveling salesman problem (TSP) using Euclidean minimum spanning trees (MSTs): a derived topological architecture along with a framework for minimization), I would need to think like a salesman in the sense that he is charismatic as well as keen. A good idea might be to implement a form of 'people-SEO' to increase my prominence. In the world of 'picking up chicks' (not trying to be misogynist, but just stop reading if you don't like where this is going) this would be similar to 'befriending' an 'ugly girl' to get her as your 'wingman' and get into the group with her 'hot' friends. Not really a politically correct explanation, but it is succinct. To bring back what was stated about extrapolating data from conversation, this type of 'people-SEO' would require some insight into the art of conversation along with some real detective work.


I had done some research into the art and 'algorithm' of conversation when programming amandabot (amanda for short). The architecture was designed so that the program could grasp a basic understanding of conversational dialogue. For example, I assigned the introduction starter as "START": any conversation starter (e.g. "hello!", "how are you doing?", "What's up?", etc.). These are preassigned, so they can vary very little; however, by preassigning conversational phrases up to the ~5th degree (think of it as a Sierpinski triangle -- see below), conversations become more and more interesting as they actually take a comprehensible format. The architecture of the programming behind amandabot is more complex, but I think this is a pretty easy-to-follow explanation. The conversational architecture was based upon the Bayesian statistical format for artificial intelligence. Anyhow, I digress...

        / \
       1   1
      / \ / \
     2   2   2
    / \ / \ / \

side note: Don't get carried away with theoretical computer science yet - this should remain as simple and as pragmatic as possible (we are not really dealing with routes but with actual people).


 this all sounds great and all, but what is a real example of how you would use this algorithm? 

The website Omegle allows you to talk to 'random strangers'; however, when starting a chat without adding your 'interests', you are likely to meet 17 year old girls offering to 'get on cam' for you rather than a real person. By adding an interest, for example "calculus", you are more likely to meet a 20 year old undergraduate student. A plausible implementation of this would be to go further and research interests (an option on the site allows you to add Facebook interests) and gather statistics on which popular interests yield the maximum number of potential 'candidates'. This is a simplistic explanation (there are other variables involved: timezone, location, etc.).



Obviously having a friend from across the globe is nice, but not talking to them much because of a time difference would be a problem. So what does this mean? Stake out! Naturalistic observation allows data to be gathered at different intervals. By observing during UTC-8 9 a.m. - 11 p.m., and then shifting towards 11 a.m. - 2 a.m., an increase in variables correlated with location is predicted.

In summary, people who live close to you and keep the same sleeping habits are more likely to log on to the appropriate networks at similar times.
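To put a number on that, here is a small sketch that counts how many hours per day two people's online windows overlap once both are converted to UTC. The window format (start hour, end hour, UTC offset) and the second person's timezone are assumptions for illustration.

```python
def active_hours_utc(start_local, end_local, utc_offset):
    """Set of UTC hours (0-23) covered by a daily local-time window.
    A window such as 11 -> 2 is treated as wrapping past midnight."""
    length = (end_local - start_local) % 24
    return {(start_local - utc_offset + h) % 24 for h in range(length)}

def overlap_hours(window_a, window_b):
    """Number of whole hours per day during which both windows are active."""
    return len(active_hours_utc(*window_a) & active_hours_utc(*window_b))

me = (9, 23, -8)          # 9 a.m. - 11 p.m. at UTC-8, as above
neighbour = (9, 23, -8)   # same habits, same timezone
far_away = (9, 23, 1)     # same habits, hypothetical UTC+1 location

print(overlap_hours(me, neighbour), overlap_hours(me, far_away))
```

Same habits in the same timezone overlap for the whole window, while the same habits nine timezones away leave only a few shared hours.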

OkCupid's algorithm for matchmaking: 


Video notes: 

  • Data acquisition matches like with like and avoids conflicting answers
  • OkCupid's aggregate formula takes into account what a person answers, what they want their match to answer, and how important each question is to them
  • Taking the geometric mean accounts for both sides of the match: nth root of (match percentage x match percentage), where n is the number of questions

Rating scale: 

  • Irrelevant: 0
  • A little important: 1
  • Somewhat important: 10
  • Very important: 50
  • Mandatory: 250
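Here is a rough sketch of how the weights and the geometric mean fit together, based only on the video notes above. The function names and the two-question example are made up. I take the square root because two scores are being multiplied, which matches the "nth root" note only when n = 2, so treat that choice as an assumption.

```python
from math import sqrt

# Importance weights from the rating scale above.
WEIGHTS = {
    "irrelevant": 0,
    "a little": 1,
    "somewhat": 10,
    "very": 50,
    "mandatory": 250,
}

def satisfaction(importances, acceptable):
    """Fraction of the possible points earned by the other person.

    importances: importance label per question, as rated by one person.
    acceptable:  per question, whether the other person's answer was acceptable.
    """
    possible = sum(WEIGHTS[label] for label in importances)
    earned = sum(WEIGHTS[label]
                 for label, ok in zip(importances, acceptable) if ok)
    return earned / possible if possible else 0.0

def match_percentage(sat_a, sat_b):
    """Geometric mean of the two satisfaction scores (assumed square root)."""
    return sqrt(sat_a * sat_b)

# Hypothetical two-question example: each side accepts the first answer only.
a_view = satisfaction(["very", "a little"], [True, False])      # 50 / 51
b_view = satisfaction(["somewhat", "a little"], [True, False])  # 10 / 11
print(round(match_percentage(a_view, b_view), 3))
```

Because the weights jump from 1 to 10 to 50 to 250, one missed "very important" question costs far more than several missed "a little important" ones.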



okcupid's scale for compatibility charted

What does this mean?  Answering questions you "really care about" is more efficient.



A [closer] look at lighttpd.conf & its constituents

Monday, January 21st, 2013

In the last guide, I showed you how to install the lighttpd web server.

In this post, I will discuss lighttpd.conf, located at /etc/lighttpd/lighttpd.conf, & the various configurations available.



  • Continued configuration
  • Configuration overview
  • Name based virtual hosting
  • Error handlers

Continued configuration 

This explains what to do after the previous post.


If you have recently installed the lighttpd web server, point your browser to your domain.

If all is well, you should see this placeholder page (named index.lighttpd.html).

To edit the directory and/or index page, navigate to your server document root (as defined by server.document-root in lighttpd.conf).



Configuration overview

This is a brief overview of the directives in your config file.



This is a shortened list of the configuration options; a full list is available here.



server.document-root: The top level document root

server.port: The server port, default 80.

server.upload-dirs: The upload directory

server.max-connections: Maximum number of concurrent connections supported by the server. Half of server.max-fds is recommended.

server.max-fds: Max file descriptors - set higher than max-connections (max-connections == max-fds/2)

server.max-keep-alive-idle: Maximum seconds before an idle keep-alive connection is dropped

connection.kbytes-per-second: Limits the throughput of each single connection. (kbyte/s)

server.kbytes-per-second: Limits the maximum kilobytes for the entire server. (kbyte/s)

server.max-request-size: Maximum size (kbyte) of request



mod_access: Access restrictions

mod_alias: Directory aliases

mod_compress: Reduces network load

mod_redirect: Redirects a set of URLs


Let's take a look at the server document root, given by the core directive:

server.document-root =

This specifies where the top level document root is located; you can change it to point at a different default directory.

(For example:  server.document-root = "/home/lighttpd/public_html/")

In order to have a fully running web server, you must specify the correct directory.


Name based virtual hosting

This enables you to host multiple websites from a single IP address.


This example shall create 'website1' and 'website2'.


  • Firstly, you must have the required directories for the websites you want to host.


To do this:


# mkdir -p /home/lighttpd/default/http/

This is going to be our default 'main' document root.


# mkdir -p /home/lighttpd/
# mkdir -p /home/lighttpd/
# mkdir -p /home/lighttpd/
# mkdir -p /home/lighttpd/

These are our default document root & logs for website1 & website2.


  • Next, we will edit the lighttpd.conf file:
# vi /etc/lighttpd/lighttpd.conf


Change the default document root & point it to our defined document root:

server.document-root = "/home/lighttpd/default/http/"


Add the following lines:

include ""
include ""


Save and exit (:wq).


  • Creating website virtual host configuration:


Edit website configuration:


# vi /etc/lighttpd/


Add the following lines:

$HTTP["host"] =~ "website1\.com" {
              server.document-root = "/home/lighttpd/"
              accesslog.filename = "/home/lighttpd/"
}

(Simply replace the domain and directory paths with those of your own website.)

Save and close (:wq)


Repeat this process for website2;


# vi /etc/lighttpd/


Add the following lines:


$HTTP["host"] =~ "website2\.com" {
              server.document-root = "/home/lighttpd/"
              accesslog.filename = "/home/lighttpd/"
}


Save and quit.


  • Force reload lighttpd:
/etc/init.d/lighttpd force-reload




Error Handlers

This will detail how to customize your 404 status page using lighttpd web server.


  • Edit your lighttpd.conf:


# vi /etc/lighttpd/lighttpd.conf

  • Add the line:


server.error-handler-404 = "/error-404.php"

This serves the file 'error-404.php', located in the website's document root, whenever a 404 error is returned.


  • Save and exit - you're done!




Sunday, January 20th, 2013

i'm bill

this is my blog,



server proudly supports

debian, lighttpd