Read a large file without wasting memory

There is a huge file of short strings (URLs), about 20 megabytes in total. An ordinary shared host will give out under the load if I read the whole file on every request just to return one random line from it.

How to read a random line from a huge file without eating up a lot of memory?


Answer 1, authority 100%

$fp     = fopen($filename, 'r');
$offset = mt_rand(0, filesize($filename) - 1);
fseek($fp, $offset);
// scan backwards to the start of the line the offset landed in
while ($offset > 0 && fgetc($fp) != "\n") {
    fseek($fp, --$offset);
}
$line = fgets($fp);
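The same idea wrapped in a reusable function (a sketch; the function name and sample file are mine). Note that a random byte offset favours longer lines, since they cover more bytes, so the distribution is not perfectly uniform:

```php
<?php
// Random line via a random byte offset plus a backwards scan
// (sketch of the answer above; longer lines are picked more often).
function random_line(string $filename): string
{
    $fp     = fopen($filename, 'r');
    $offset = mt_rand(0, filesize($filename) - 1);
    fseek($fp, $offset);
    // Scan backwards to the start of the line the offset landed in
    while ($offset > 0 && fgetc($fp) != "\n") {
        fseek($fp, --$offset);
    }
    $line = fgets($fp);
    if ($line === false) {      // landed on the trailing "\n" of the last line
        rewind($fp);
        $line = fgets($fp);
    }
    fclose($fp);
    return rtrim($line, "\n");
}

// Usage with a small sample file:
file_put_contents('/tmp/urls.txt', "http://a.example\nhttp://b.example\nhttp://c.example\n");
echo random_line('/tmp/urls.txt'), "\n";
```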

Answer 2, authority 86%

$avg_line_len = 30; // average line length (a rough estimate)
$file = fopen("list-url.txt", "r");
if ($file !== FALSE) {
    $stat = fstat( $file );
    do {
        do {
            $pos = rand( 0, $stat['size'] - $avg_line_len );
        } while( fseek($file, $pos) == -1 );
        fgets($file); // skip forward to the next "\n", since we most
                      // likely landed in the middle of a line
    } while( ($url = fgets($file)) == FALSE);
    // $url now holds a random line from the file
}

P.S.

If the file is large (several megabytes, as in the original task), the $avg_line_len value can be ignored, and filesize can be used instead of fstat – the result will be the same.
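With that simplification the whole routine shrinks to something like this (a sketch; the sample data is mine). Note that, as in the answer above, the very first line can never be selected, because one line is always skipped after seeking:

```php
<?php
// Sample data for the sketch (any file of short lines will do):
file_put_contents('list-url.txt', "http://a.example\nhttp://b.example\nhttp://c.example\n");

// Simplified variant from the P.S.: filesize() instead of fstat(),
// no average-line-length correction.
$size = filesize('list-url.txt');
$file = fopen('list-url.txt', 'r');
do {
    fseek($file, rand(0, $size - 1));
    fgets($file);                               // skip the (likely partial) line we landed in
} while (($url = fgets($file)) === false);      // landed in the last line – try again
fclose($file);
echo $url;
```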


Answer 3, authority 43%

I offer a simpler solution. Split the file itself into a dozen or two smaller ones. Say the original file has 100,000 lines and we split it into 100 files, named “file00.txt”, “file01.txt” … “file99.txt”. With rand(0, 99) we select the file, and with rand(0, 999) – the line. From there, proceed as in the other answers.

But the main thing is not to get carried away: 1000 files in one directory will already slow things down, so you need to experiment. It might make sense to create 10 directories named 0 – 9 and put 100 files in each, with 100 lines per file.

If the number of lines does not divide up evenly, you can either duplicate some of the lines or adjust the number of lines per file.
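A sketch of the selection step, assuming the split has already been done (the function name is mine; file names follow the answer's scheme):

```php
<?php
// Pick a random line from a set of chunk files named "file00.txt", "file01.txt", …
// Each chunk is small, so reading one chunk whole is cheap.
function random_url_from_chunks(int $num_files, int $lines_per_file): string
{
    $name  = sprintf('file%02d.txt', rand(0, $num_files - 1));  // which chunk
    $lines = file($name, FILE_IGNORE_NEW_LINES);
    return $lines[rand(0, $lines_per_file - 1)];                // which line inside it
}

// With the numbers from the answer: 100 files x 1000 lines = 100,000 URLs
// echo random_url_from_chunks(100, 1000);
```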

UPD:

Another variant has come to mind that can give a big speed-up, but it requires preparing the file in advance. The algorithm is this: go through the entire file and find the longest URL. Knowing its length, write a new file in which every line is padded with spaces to that length. Now, to jump to line N, you just need to do fseek($fp, N * ($max_len + 1)), where the + 1 accounts for the line feed (though check your platform: under Windows it can be 2 bytes). After the fseek, you can read the line.
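A sketch of both steps, the one-time preparation and the constant-time lookup (function names are mine; assumes Unix "\n" line endings):

```php
<?php
// One-time preparation: pad every line with spaces to the longest line's width.
// Loading the whole source file here is fine – this runs once, not per request.
function build_padded_file(string $src, string $dst): int
{
    $lines = file($src, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $width = max(array_map('strlen', $lines));
    $fp = fopen($dst, 'w');
    foreach ($lines as $line) {
        fwrite($fp, str_pad($line, $width) . "\n");
    }
    fclose($fp);
    return $width;
}

// Per-request lookup: every record is $width + 1 bytes (+1 for the "\n"),
// so line N starts at byte N * ($width + 1).
function padded_random_line(string $file, int $width): string
{
    $count = intdiv(filesize($file), $width + 1); // exact, since records are uniform
    $fp = fopen($file, 'r');
    fseek($fp, mt_rand(0, $count - 1) * ($width + 1));
    $line = rtrim(fgets($fp));
    fclose($fp);
    return $line;
}
```

On Windows-style CRLF files the record size would be $width + 2 instead.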


Answer 4, authority 14%

$rand_line_number = rand(0, 1000); // assumes the file has at least 1001 lines
$handle = @fopen("file.txt", "r");
if ($handle) {
    $i = 0;
    while (($buffer = fgets($handle, 4096)) !== false) {
        if($i==$rand_line_number){
            echo $buffer;
            break;
        }
        $i++;
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}

Answer 5, authority 14%

Dump this whole “huge” file into a database and pull lines out of the database. If you are afraid to run the import on the host, do it locally, and then upload the database to the host.


Answer 6, authority 14%

function fileRandLine($file)
{
    // Picks a uniformly random non-empty line in a single pass over the file
    // (reservoir sampling: line $i replaces the result with probability 1 / ($i + 1))
    $res = '';
    if (is_file($file)) {
        $filesize = filesize($file);
        if ($filesize > 0) {
            $fp = @fopen($file, 'rt');
            if (is_resource($fp)) {
                for ($i = 0; $str = fgets($fp, $filesize + 1); $i++) {
                    $trim = trim($str);
                    if ($trim === '') {
                        $i--; // skip empty lines entirely so they don't skew the odds
                        continue;
                    }
                    if (mt_rand(0, $i) == 0) {
                        $res = $trim;
                    }
                }
                fclose($fp);
            }
        }
    }
    return $res;
}

Usage:

echo fileRandLine('file.txt');