I just got back from eating a banana split at DQ.
I am now going to write out all those millions and millions and millions of records as short data blobs in a dbm-derived database, and I am going to SHOVE all the search index data into 255 characters (255 × 8 bits), plus one long as the 64-bit key, in one super long hash.
This will cut search accuracy by about 80% (or more), but damn will it be way faster than (I got excited there) just as fast as Yahoo and Google.
It will convert this project from an accurate search to a “you may be pleasantly surprised” search. What are you going to do? This is Canada. You can’t expect miracles here.
I’m leaving immediately after the dump starts. It will probably take a day or so anyway.
Wow, I just love entering receipts into the vendor portion of QuickBooks. It’s so much fun. Wait, I was being sarcastic. So instead of writing search engine code, I am playing little Miss Secretary instead. I do this at least once a month, for two days at a time.
So I have corrected our pricing bug, and now I am trying to correct our “contracts sold too cheaply” bug. Hopefully by the time I get back from TechCrunch in Boston on November 16th I will have some sort of solution to the problem. I did briefly ponder the fact that US dollars are no longer worth what they were here, but they are still worth a dollar in the US of A. So that is a possible solution path. And no, we do not have the resources to open an American branch of BeerCo. If we did, this problem wouldn’t be a problem, because we could ping-pong services back and forth across the border like Wal-Mart does in times of currency fraud.
Oh, and this part of the Boston trip has nothing to do with TechCrunch. While I am there I will try to talk to at least two American affiliates about a solution. They are not affiliated with TechCrunch in any way, and they are not even in the Web 2.0 business at all. They do the same thing as us, but in the USA. I will simply stay an extra day and drive out to see some of them.
Getting back to accounting: QuickBooks does have a nice feature where you can enter the reference number from each receipt, so you can pull one of tens of thousands of receipts up in a flash by the ref number on it. It also categorizes your expenses, making it easy to sort them into deduction categories when you have to file your T2 and RL-1. I also do that with Intuit software for the company.
UPDATE: I am also looking into short-term emergency work visas, so possibilities are still open to resolve this problem. I am still open to finding solutions; I just know I can’t let the cost surpass the income any longer. If that happens, it will be worse for customers than finding a solution comparable to what they have now. So this is a huge problem, it needs to be solved, and it is my job to solve it within the parameters of existing agreements. When the US dollar hits the floor, and that could be in January or February at 70-80 cents, it will no longer be possible to continue unless the problem has been solved by then, so I will work very hard to do it as soon as possible.
OK, so if you read this blog, you have been following my data storage problems.
One thing that may have come to mind is: if MySQL or MaxDB is limited, why not just change the source and recompile them to your liking? Isn’t that what open source is about?
The answer to that question is yes, that’s what FOSS is about, BUT I only have limited time, and if they put a limit of 48 nodes on MySQL Cluster, it’s for a good reason: the manager likely could not handle the extra capacity if I hard-coded a 2000-node maximum and recompiled. SAP MaxDB is also a very complex program, and I would not have time to add features that a huge corporation like SAP did not have time to add themselves.
I would have to hire more people to do that and the budget is limited to a few thousand dollars right now.
So the quick and dirty solution is to code a smart network storage layer, hard-coded to work with the rest of the code and including only the storage commands needed to get this done. One that will auto-expand indefinitely by adding more 1U IP nodes. That is the only way to get it done in this bad situation.
MySQL didn’t care about scalability in high volume situations and that’s good for them. Good job MySQL, here’s a cookie. Same to SAP, another cookie for them.
Surprise, surprise, more custom software is needed. This network storage solution is going to be so to the metal, it’s going to leave rack rash. I am going to BURN through this like no person has ever done.
UPDATE: I also checked out PostgreSQL, and it too has limits unacceptable to this project. Basically, I am going to make a super lightweight manager in C, with networked queues and a hot cache for indexed requests. The manager server will serve as the data FAT, and the rest of the nodes will only store indexes and blobs. It will be very basic and open-ended. I really have no choice here; there is no premade data storage solution that will work for this setup. I should have realized that before (not changing the setup, but changing the storage solution). Also, since it only has R/W and keys, it will be much faster than SQL.
UPDATE 2: Due to these large-scale storage problems, the initial feature of the project will be isolated on a single 1U server and shipped first, to start getting advertising revenue while the other, larger engine is coded and the networked database is finished. Otherwise it will lose precious revenue while development is going on. So that yet-to-be-disclosed “cool search” feature of the site will be shipped soon and launched solo while its bigger cousin is getting its guts reworked. Perhaps other portions of the project can also be launched until the big search is completed.
UPDATE 3: This doesn’t deserve its own blog post, so here goes quickly.
Why not a SAN?
Because these aren’t merely data servers, they are application servers, and tens, and later hundreds, of servers will be calling the data synchronously across many, many machines, which means hanging inodes and data parity problems. I would rather go with an elegant custom-built C component that is stripped down and optimized for the application, instead of a generic managed solution that “kind of fits” and can be “corrected” by a journaled FS replaying transactions. There won’t be a DBA. We’re talking about indexing Billions of pages here, with a B.