Chris R's Weblog

Daily link December 9th, 2007

Where am I at?

While this is pseudocode and the actual code is thousands of lines, this is approximately what is going on, minus the text-searching functions.

So while this is nothing extraordinary and is easy to understand, I am posting it just to give you a little idea of the memory manager. Like I said, this is very much pseudocode, because the actual code is far more complex. I am finishing the IPC over the socket file now.

char *g_main_pointer = malloc(some_huge_amount);

// dump from the DB file into memory
// fopen("filenum", "r")
// iterate into structs here
typedef struct oRecord {
    type somedata1;
    type somedata2;
} oRecord, *PRECORD;

oRecord rc;
memcpy(&rc.somedata1, &from_file_stream[offset], sizeof(rc.somedata1));
memcpy(&g_main_pointer[offset], (void *)&rc, sizeof(rc));
// fclose

PRECORD baseRecord = (PRECORD)g_main_pointer;
for (int i = 0; i < 30; i++) {
    PRECORD pcRecord = &baseRecord[i];
    printf("STR%d: %s\n", i, pcRecord->somedata1);
}

free(g_main_pointer);

Since the data is all dumped from the initial DB into terabytes of 1.3 meg files, it reads files into a memory cache, gigabytes' worth, until it runs out. Then it notes the last file read, and if the search overruns the records in memory, it starts pumping through the rest of the 1.3 meg files. The 1.3 meg size is key because one huge file, like a VPC hard disk or whatever, would mean a LONG fseek to the right position in the stream. On Linux that would mean a lot of stress on the VFS subsystem in the kernel and on the disk driver. So it's almost like a little pseudo filesystem table, and there is an index that keeps all the records and the pagerank tables.
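The caching pass described above can be sketched like this. Everything specific here, the chunk-file naming scheme, the `.dat` extension, and the memory budget, is my assumption for illustration, not the actual code:

```c
// Sketch: read sequential 1.3 MB chunk files into one big cache until a
// memory budget runs out, remembering the first file that did NOT fit
// so a search that overruns the cache knows where to resume on disk.
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (1300 * 1024)   /* one 1.3 meg chunk file */

static int load_cache(char *cache, size_t budget, int first_file)
{
    size_t used = 0;
    int file_num = first_file;

    while (used + CHUNK_SIZE <= budget) {
        char name[64];
        snprintf(name, sizeof name, "%06d.dat", file_num);  /* assumed naming */

        FILE *fp = fopen(name, "rb");
        if (!fp)
            break;                  /* no more chunk files on disk */
        size_t got = fread(cache + used, 1, CHUNK_SIZE, fp);
        fclose(fp);

        used += got;
        file_num++;
    }
    return file_num;   /* first file not in memory; streaming resumes here */
}
```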

Each record in the 1.3 meg files is kept to exactly 255 bytes so it can be iterated through easily. I limited it to 255 chars to retain speed. Since 21 or so fields are shoved into such a small space, the accuracy of the search is greatly reduced.
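The payoff of fixed-size records is that finding any record is pure arithmetic, no scanning. A minimal sketch, assuming 255-byte records packed into 1.3 meg chunk files (the `locate_record` name and the exact constants are mine, not from the actual code):

```c
// Sketch: map a global record number to a (chunk file, byte offset)
// pair using nothing but integer arithmetic over fixed-size records.
#define RECORD_SIZE 255                       /* each record: exactly 255 bytes */
#define CHUNK_SIZE  (1300 * 1024)             /* one 1.3 meg chunk file */
#define RECORDS_PER_FILE (CHUNK_SIZE / RECORD_SIZE)   /* 5220 whole records */

/* Hypothetical helper: which file holds record `recno`, and where in it. */
static void locate_record(long recno, long *file_num, long *offset)
{
    *file_num = recno / RECORDS_PER_FILE;
    *offset   = (recno % RECORDS_PER_FILE) * RECORD_SIZE;
}
```

With this, one short fseek inside a small file replaces a long seek into one giant file, which is the point of the 1.3 meg split.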

I have to buy parts from TigerDirect, whereas the other engines can get custom hardware with 20-50 RAM sockets. It's not a very fair fight, to say the least, so I have to do it this way to keep the speed up. We have lots of 1Us, but they are not custom. They are stock, with stock parts, parts you can buy at TigerDirect.ca.

Daily link December 7th, 2007

IPC through named files

Well, it's that time, and I have to create a bridge between the search server process and the search requester process. For JIT this is done over TCP on an intranet IP, and for local compilation of keyword searches it is done through a named socket in /tmp.
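The named-socket side of that bridge can be sketched with a standard Unix-domain listener. The socket path and the helper name are my assumptions; the real bridge code is not shown here:

```c
// Sketch: create a named (Unix-domain) socket under /tmp that the
// local search requester process can connect to.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: bind and listen on a named socket file. */
static int make_listener(const char *path)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);

    unlink(path);   /* remove a stale socket file from a previous run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;   /* accept() clients on this descriptor */
}
```

Local clients then `connect()` to the same path, which skips the TCP/IP stack entirely, which is why this route is so fast compared to the intranet TCP path.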

So I am now working on this. It is insanely fast, and I can only hope to have this finished so I can build up the final hardware config and just do this.

I still haven't finished the recursive PBot which will scan the web, as right now I only have a Spock-like DB full of finite social data. That is not the goal, though; the goal is to have all web data. I will not rescan the social data; I will simply dump it into the format the web search will use. The social sites are among the largest and highest-traffic-ranked on the web, so rescanning them would be wasteful duplication for bandwidth as limited as ours.

So with that said I continue on this trek, to search engine sweetness.

Daily link November 2nd, 2007

Making more progress in a solution for our customers

I am making more progress toward finding a solution for our customers right now. Aside from accounting, that is all I am working on.

Once we have something concrete I will email all customers and talk with each of you individually so we can work through this together. The US currency still isn't bad enough to be an immediate problem, but it is quickly getting there, so we are trying hard to reduce the impact of that third-party factor on our relationship and on the quality of your work and service. Hopefully, with the progress I am making right now, the trains will keep running on time no matter how bad the exchange rate gets.

We respect our customers, and want nothing but the best for them.

UPDATE: I don’t want to get anybody’s expectations up too much. This could take 3-4 weeks to work out.

UPDATE2: I can imagine our customers are angry or unsure right now. I would be willing to tell you exactly what steps I am taking to make sure that your work gets done as per the agreement. I have no problem with that, and you may contact me via email, chat, or telephone if you would like that information.


Chris R. works at BeerCoSoftware.com (title: President of Development and Sales). This is Chris's work blog.

Disclaimer: BCS will not let personal views of any employee, including Chris, regarding any software product, company, standards or otherwise get in the way of any company that hires it to provide a solution. Companies pay BCS and BCS provides solutions regardless of the views of any employee. That’s part of being professional, and BCS is a professional software company.

Everything here is Chris's personal opinion and is not read or approved before it is posted. No warranties or other guarantees will be offered as to the quality of the opinions or anything else on this blog.
